Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14319 )
Change subject: IMPALA-8984: Fix race condition in creating Kudu table ...................................................................... Patch Set 14: (4 comments) The fix makes sense to me. But it'd be better if we can pass the test_concurrent_create_kudu_table test I pasted below, since we are fixing race condition in creating the same kudu table in parallel. http://gerrit.cloudera.org:8080/#/c/14319/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14319/14//COMMIT_MSG@9 PS14, Line 9: This patch fixed the race condition when use 'CREATE IF NOT EXISTS' to : create kudu managed table without Kudu-HMS Integration. This bug would : caused table deleted in Kudu storage, but reserved in HMS when multiple : identical queries submitted at the same time. I think we should explain the 'race condition' more clear. Also there's some gramma errors. Maybe native speakers could give some advises. This paragraph may be better in gramma, but still not explanning what's the race condition: This patch fixes the race condition when using 'CREATE IF NOT EXISTS' to create the same managed kudu table in parallel. Note that it won't happend if Kudu-HMS integration is enable. The bug would cause the table being deleted in Kudu but reserving in HMS. http://gerrit.cloudera.org:8080/#/c/14319/14//COMMIT_MSG@14 PS14, Line 14: Fix is to check whether table already exists in HMS after obtain the : 'metastoreDdlLock_' lock. If true, just return 'Table already exists' : tip, otherwise create a new table in HMS and then update catalog. I think this is better: The solution is adding check for HMS table existence before creating it in HMS and after obtaining 'metastoreDdlLock_'. If the HMS table is created by other concurrent threads, just return as 'Table already exists'. So we don't fail in creating the HMS table and won't rollback the creation of kudu table. http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2245 PS14, Line 2245: synchronized (metastoreDdlLock_) { I think you can add some configurable sleep time before this. In test we set the sleep time long encough so that two threads can go here concurrently and reproduce the bug. IMPALA-7534 is a good example for learning how to add your tests. The bug it fixed is also a race condition which is hard to reproduce too. The solution is running the test in many iterations to increase the possibility of hitting the bug: https://github.com/apache/impala/blob/0a08217/tests/custom_cluster/test_local_catalog.py#L383-L385 I've tried adding a test in tests/custom_cluster/test_concurrent_ddls.py to reproduce this bug: @pytest.mark.execute_serially def test_concurrent_create_kudu_table(self, unique_database): test_self = self class ThreadLocalClient(threading.local): def __init__(self): self.client = test_self.create_impala_client() tls = ThreadLocalClient() def run_create_table_if_not_exists(): self.execute_query_expect_success( tls.client, "create table if not exists %s.kudu_tbl " "(id int, primary key(id)) stored as kudu" % unique_database) tls.client.close() NUM_ITERS = 100 for i in xrange(NUM_ITERS): # Run two commands in parallel pool = ThreadPool(processes=2) r1 = pool.apply_async(run_create_table_if_not_exists) r2 = pool.apply_async(run_create_table_if_not_exists) r1.get() r2.get() pool.terminate() self.execute_query("drop table if exists %s.kudu_tbl" % unique_database) This test will run two concurrent CreateTable statements and do it for 100 times. It may hit other bugs but your can recognize them in codes and fix them later, e.g. https://github.com/apache/impala/blob/0a08217/tests/custom_cluster/test_concurrent_ddls.py#L126 http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/test/java/org/apache/impala/service/MultiKuduDDLTest.java File fe/src/test/java/org/apache/impala/service/MultiKuduDDLTest.java: http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/test/java/org/apache/impala/service/MultiKuduDDLTest.java@67 PS14, Line 67: public void testMultiKuduTabeDDL() throws ImpalaException { This test is not quite convincing to me... Mostly because line72 couldn't happen in practice and this is not a concurrent test. -- To view, visit http://gerrit.cloudera.org:8080/14319 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1a4047bcdaa6b346765b96e8c36bb747a2b0091d Gerrit-Change-Number: 14319 Gerrit-PatchSet: 14 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Tue, 24 Dec 2019 08:50:20 +0000 Gerrit-HasComments: Yes
