Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 )
Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync ...................................................................... Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc@2129 PS16, Line 2129: CatalogUpdateResultIterator callback_ctx(catalog_update_result); : TUpdateCatalogCacheRequest update_req; : update_req.__set_is_delta(true); : update_req.__set_native_iterator_ptr(reinterpret_cast<int64_t>(&callback_ctx)); : // The catalog version is updated in WaitForCatalogUpdate below. So we need a : // standalone field in the request to update the service ID without touching the : // catalog version. : update_req.__set_catalog_service_id(catalog_update_result.catalog_service_id); nit: Let's move these into the if-clause at line 2155. http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@217 PS16, Line 217: 20 nit: Can we lower this down to 5? http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@239 PS16, Line 239: sleep(self.UPDATE_FREQUENCY_S * 2) While thinking about comments for this sleep, I realize this actually reveals a bug of our approach. I add a log in CatalogServiceCatalog's constructor to print the service id. Let's say the 3 catalogd instances are using service id s0, s1 and s2. The WARNING log we want is Ignoring catalog update result of catalog service ID: s1. The expected catalog service ID: s1. Current catalog service ID: s2. Catalogd may be restarted more than once. But what I saw is Ignoring catalog update result of catalog service ID: s2. The expected catalog service ID: s2. Current catalog service ID: s1. Catalogd may be restarted more than once. This means the catalog of the first restarted catalogd is still propagated to the impalad, and we are ignoring the ddl update from the second restarted catalogd. This will lead to stale catalog until the next statestore update comes. The following test can reveal the trouble: UPDATE_FREQUENCY_S = 20 @pytest.mark.execute_serially @CustomClusterTestSuite.with_args( statestored_args="--statestore_update_frequency_ms={frequency_ms}" .format(frequency_ms=(UPDATE_FREQUENCY_S * 1000))) def test_restart_catalogd_twice(self): self.execute_query_expect_success(self.client, "drop table if exists join_aa") self.execute_query_expect_success(self.client, "create table join_aa(id int)") # Make the catalog object version grow large enough self.execute_query_expect_success(self.client, "invalidate metadata") # No need to care whether the dll is executed successfully, it is just to make # the local catalog catche of impalad out of sync for i in range(0, 10): try: query = "alter table join_aa add columns (age" + str(i) + " int)" self.execute_query_async(query) except Exception, e: LOG.info(str(e)) self.cluster.catalogd.restart() sleep(self.UPDATE_FREQUENCY_S * 2) self.cluster.catalogd.restart() self.execute_query_expect_success(self.client, "drop table join_aa") # Should not see stale metadata on 'join_aa' result = self.execute_query_expect_success(self.client, "show tables") assert 'join_aa' not in result.data Do you have any ideas on fixing this? -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: liuyao <[email protected]> Gerrit-Comment-Date: Mon, 09 Aug 2021 01:59:11 +0000 Gerrit-HasComments: Yes
