Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20192 )
Change subject: IMPALA-12267: DMLs/DDLs can hang as a result of catalogd restart
......................................................................

Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20192/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/20192/1/be/src/service/impala-server.cc@382
PS1, Line 382: 5
> When catalog_update_info_ really changes, we exit the loop.

hmm, we only exit the loop when its 'catalog_service_id' changes. It also contains other fields, like catalog_version, that can change in a statestore update, but we don't check those fields in the while loop. The suggestion is to check all fields of 'catalog_update_info_' to count statestore updates. If we receive, e.g., 10 statestore updates and still see the catalog service ID unchanged, we can exit the loop.

http://gerrit.cloudera.org:8080/#/c/20192/1/be/src/service/impala-server.cc@2269
PS1, Line 2269: we only got the updates about some but not all restarts
: // - the update about the catalogd that has 'catalog_service_id' has not
: // arrived yet
> I thought about a situation like this:

Yeah, exactly. In such a case (IMPALA-10875), the client might see stale metadata. In theory, there are two cases in which we exit the while loop because the catalog service ID changes: either the ID in the DDL response is stale, or the ID in the DDL response is newer. The client might see stale metadata in the latter case. The third case is a timeout.

http://gerrit.cloudera.org:8080/#/c/20192/1/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/20192/1/tests/custom_cluster/test_restart_services.py@234
PS1, Line 234: assert "Ignoring catalog update result of catalog service ID" in logs
> I added "age" in the select statement on L237.

I think we should check it before the alter statement on L235, because the alter statement might bring the metadata back to normal. What we want to check is that the client doesn't see stale metadata even if the timeout happens.

--
To view, visit http://gerrit.cloudera.org:8080/20192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib71bec8f67f80b0bdfe0a6cc46a16ef624163d8b
Gerrit-Change-Number: 20192
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Comment-Date: Wed, 19 Jul 2023 14:27:25 +0000
Gerrit-HasComments: Yes
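To make the impala-server.cc line 382 suggestion concrete: count every statestore update that changes any field of catalog_update_info_, and stop waiting after e.g. 10 such updates if the catalog service ID is still unchanged. Below is a minimal sketch, assuming a mutex/condition-variable wait. CatalogUpdateInfo, CatalogUpdateWaiter, OnStatestoreUpdate, WaitForServiceIdChange and max_updates are hypothetical names for illustration, not the code in this patch. The sketch also reports which exit path was taken (service ID change, update limit, or timeout), so the cases discussed at line 2269 can be told apart.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <string>

// Hypothetical stand-in for the fields of catalog_update_info_ (illustrative only).
struct CatalogUpdateInfo {
  std::string catalog_service_id;
  int64_t catalog_version = 0;

  bool operator==(const CatalogUpdateInfo& other) const {
    return catalog_service_id == other.catalog_service_id &&
        catalog_version == other.catalog_version;
  }
  bool operator!=(const CatalogUpdateInfo& other) const { return !(*this == other); }
};

// Why the wait loop returned.
enum class WaitExitReason { kServiceIdChanged, kUpdateLimitReached, kTimeout };

// Hypothetical helper, not Impala's API.
class CatalogUpdateWaiter {
 public:
  // Called for every catalog topic update received from the statestore.
  void OnStatestoreUpdate(const CatalogUpdateInfo& new_info) {
    {
      std::lock_guard<std::mutex> l(mu_);
      // Compare all fields, not only the service ID, so an update that only
      // bumps catalog_version is still counted.
      if (new_info != info_) {
        info_ = new_info;
        ++num_updates_;
      }
    }
    cv_.notify_all();
  }

  int64_t num_updates() {
    std::lock_guard<std::mutex> l(mu_);
    return num_updates_;
  }

  // Waits until the known catalog service ID differs from 'initial_service_id',
  // until 'max_updates' counted updates arrive without such a change, or until
  // 'timeout' expires, whichever comes first.
  WaitExitReason WaitForServiceIdChange(const std::string& initial_service_id,
      int64_t max_updates, std::chrono::milliseconds timeout) {
    assert(max_updates > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::unique_lock<std::mutex> l(mu_);
    const int64_t start = num_updates_;
    while (info_.catalog_service_id == initial_service_id) {
      // Enough updates arrived but the service ID never changed: stop waiting.
      if (num_updates_ - start >= max_updates) {
        return WaitExitReason::kUpdateLimitReached;
      }
      // Spurious wakeups simply loop back and re-check the conditions.
      if (cv_.wait_until(l, deadline) == std::cv_status::timeout &&
          info_.catalog_service_id == initial_service_id) {
        return WaitExitReason::kTimeout;
      }
    }
    return WaitExitReason::kServiceIdChanged;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  CatalogUpdateInfo info_;
  int64_t num_updates_ = 0;
};
```

Counting inside the update handler, rather than comparing snapshots in the waiting thread, avoids undercounting when several updates land between two wakeups. Telling a stale service ID in the DDL response apart from a newer one is not modelled here.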
