Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc@2129
PS16, Line 2129:     CatalogUpdateResultIterator 
callback_ctx(catalog_update_result);
               :     TUpdateCatalogCacheRequest update_req;
               :     update_req.__set_is_delta(true);
               :     
update_req.__set_native_iterator_ptr(reinterpret_cast<int64_t>(&callback_ctx));
               :     // The catalog version is updated in WaitForCatalogUpdate 
below. So we need a
               :     // standalone field in the request to update the service 
ID without touching the
               :     // catalog version.
               :     
update_req.__set_catalog_service_id(catalog_update_result.catalog_service_id);
nit: Let's move these into the if-clause at line 2155.


http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@217
PS16, Line 217: 20
nit: Can we lower this down to 5?


http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@239
PS16, Line 239: sleep(self.UPDATE_FREQUENCY_S * 2)
While thinking about comments for this sleep, I realize this actually reveals a 
bug of our approach.

I add a log in CatalogServiceCatalog's constructor to print the service id. 
Let's say the 3 catalogd instances are using service id s0, s1 and s2. The 
WARNING log we want is

 Ignoring catalog update result of catalog service ID: s1. The expected catalog 
service ID: s1. Current catalog service ID: s2. Catalogd may be restarted more 
than once.

But what I saw is

 Ignoring catalog update result of catalog service ID: s2. The expected catalog 
service ID: s2. Current catalog service ID: s1. Catalogd may be restarted more 
than once.

This means the catalog of the first restarted catalogd is still propagated to 
the impalad, and we are ignoring the ddl update from the second restarted 
catalogd. This will lead to stale catalog until the next statestore update 
comes. The following test can reveal the trouble:

  UPDATE_FREQUENCY_S = 20

  @pytest.mark.execute_serially
  @CustomClusterTestSuite.with_args(
    statestored_args="--statestore_update_frequency_ms={frequency_ms}"
    .format(frequency_ms=(UPDATE_FREQUENCY_S * 1000)))
  def test_restart_catalogd_twice(self):
    self.execute_query_expect_success(self.client, "drop table if exists 
join_aa")
    self.execute_query_expect_success(self.client, "create table join_aa(id 
int)")
    # Make the catalog object version grow large enough
    self.execute_query_expect_success(self.client, "invalidate metadata")

    # No need to care whether the dll is executed successfully, it is just to 
make
    # the local catalog catche of impalad out of sync
    for i in range(0, 10):
      try:
        query = "alter table join_aa add columns (age" + str(i) + " int)"
        self.execute_query_async(query)
      except Exception, e:
        LOG.info(str(e))
    self.cluster.catalogd.restart()
    sleep(self.UPDATE_FREQUENCY_S * 2)
    self.cluster.catalogd.restart()

    self.execute_query_expect_success(self.client, "drop table join_aa")
    # Should not see stale metadata on 'join_aa'
    result = self.execute_query_expect_success(self.client, "show tables")
    assert 'join_aa' not in result.data

Do you have any ideas on fixing this?



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: liuyao <[email protected]>
Gerrit-Comment-Date: Mon, 09 Aug 2021 01:59:11 +0000
Gerrit-HasComments: Yes

Reply via email to