Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14319 )

Change subject: IMPALA-8984: Fix race condition in creating Kudu table
......................................................................


Patch Set 14:

(4 comments)

The fix makes sense to me. But it'd be better if we can pass the 
test_concurrent_create_kudu_table test I pasted below, since we are fixing race 
condition in creating the same kudu table in parallel.

http://gerrit.cloudera.org:8080/#/c/14319/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14319/14//COMMIT_MSG@9
PS14, Line 9: This patch fixed the race condition when use 'CREATE IF NOT 
EXISTS' to
            : create kudu managed table without Kudu-HMS Integration. This bug 
would
            : caused table deleted in Kudu storage, but reserved in HMS when 
multiple
            : identical queries submitted at the same time.
I think we should explain the 'race condition' more clear. Also there's some 
gramma errors. Maybe native speakers could give some advises.

This paragraph may be better in gramma, but still not explanning what's the 
race condition:
This patch fixes the race condition when using 'CREATE IF NOT EXISTS' to create 
the same managed kudu table in parallel. Note that it won't happend if Kudu-HMS 
integration is enable. The bug would cause the table being deleted in Kudu but 
reserving in HMS.


http://gerrit.cloudera.org:8080/#/c/14319/14//COMMIT_MSG@14
PS14, Line 14: Fix is to check whether table already exists in HMS after obtain 
the
             : 'metastoreDdlLock_' lock. If true, just return 'Table already 
exists'
             : tip, otherwise create a new table in HMS and then update catalog.
I think this is better:

The solution is adding check for HMS table existence before creating it in HMS 
and after obtaining 'metastoreDdlLock_'. If the HMS table is created by other 
concurrent threads, just return as 'Table already exists'. So we don't fail in 
creating the HMS table and won't rollback the creation of kudu table.


http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2245
PS14, Line 2245:       synchronized (metastoreDdlLock_) {
I think you can add some configurable sleep time before this. In test we set 
the sleep time long encough so that two threads can go here concurrently and 
reproduce the bug.

IMPALA-7534 is a good example for learning how to add your tests. The bug it 
fixed is also a race condition which is hard to reproduce too. The solution is 
running the test in many iterations to increase the possibility of hitting the 
bug: 
https://github.com/apache/impala/blob/0a08217/tests/custom_cluster/test_local_catalog.py#L383-L385

I've tried adding a test in tests/custom_cluster/test_concurrent_ddls.py to 
reproduce this bug:

  @pytest.mark.execute_serially
  def test_concurrent_create_kudu_table(self, unique_database):
    test_self = self

    class ThreadLocalClient(threading.local):
      def __init__(self):
        self.client = test_self.create_impala_client()

    tls = ThreadLocalClient()

    def run_create_table_if_not_exists():
      self.execute_query_expect_success(
          tls.client, "create table if not exists %s.kudu_tbl "
                      "(id int, primary key(id)) stored as kudu" % 
unique_database)
      tls.client.close()

    NUM_ITERS = 100
    for i in xrange(NUM_ITERS):
      # Run two commands in parallel
      pool = ThreadPool(processes=2)
      r1 = pool.apply_async(run_create_table_if_not_exists)
      r2 = pool.apply_async(run_create_table_if_not_exists)
      r1.get()
      r2.get()
      pool.terminate()
      self.execute_query("drop table if exists %s.kudu_tbl" % unique_database)

This test will run two concurrent CreateTable statements and do it for 100 
times. It may hit other bugs but your can recognize them in codes and fix them 
later, e.g. 
https://github.com/apache/impala/blob/0a08217/tests/custom_cluster/test_concurrent_ddls.py#L126


http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/test/java/org/apache/impala/service/MultiKuduDDLTest.java
File fe/src/test/java/org/apache/impala/service/MultiKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14319/14/fe/src/test/java/org/apache/impala/service/MultiKuduDDLTest.java@67
PS14, Line 67:     public void testMultiKuduTabeDDL() throws ImpalaException {
This test is not quite convincing to me... Mostly because line72 couldn't 
happen in practice and this is not a concurrent test.



--
To view, visit http://gerrit.cloudera.org:8080/14319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a4047bcdaa6b346765b96e8c36bb747a2b0091d
Gerrit-Change-Number: 14319
Gerrit-PatchSet: 14
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Tue, 24 Dec 2019 08:50:20 +0000
Gerrit-HasComments: Yes

Reply via email to