Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20192 )
Change subject: IMPALA-12267: DMLs/DDLs can hang as a result of catalogd restart ...................................................................... Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/20192/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/20192/1/be/src/service/impala-server.cc@382 PS1, Line 382: -1 > Quanlong may have more experience for safe timeout value. I think 5 minutes should be enough but maybe don't need that long. There are two cases we need this timeout: 1) the case of IMPALA-12267 to avoid coordinator wait infinitely. Any timeout value is ok. 2) the case of MPALA-5476: the DDL response comes from the new catalogd and coordinator need to wait for the statestore update from it. Note that when catalogd launches, it sends the inital update to statestore. So any update in statestore should be usable. So I think 2x or 3x of statestore_update_frequency_ms should be ok in normal cases. In case the new catalogd has intermittent connection issue with statestore that causes the catalog update not being sent, coordinator might need to wait more time. I think limiting the iterations might be more robust than limiting the wait time. Each iteration corresponds to an applied statestore update: while (cur_service_id == catalog_update_info_.catalog_service_id) { catalog_version_update_cv_.Wait(ver_lock); } It seems 3 iterations should be enough. The worse case that needs 3 iterations is: iter0: coordinator is applying the statestore update of the old catalogd while the DDL responses from the new catalogd comes and waits iter1: coordinator starts to apply the statestore update of the new catalogd and sees the catalog_service_id changes. It then triggers a full update from statestore. iter2: coordinator applies the full update from statestore then updates the catalog_service_id to the latest, which unblocks the DDL. In case we miss some corner cases, we can choose a larger iteration limit, e.g. 10. http://gerrit.cloudera.org:8080/#/c/20192/1/be/src/service/impala-server.cc@2241 PS1, Line 2241: catalog_version_update_cv_.Wait(ver_lock); Can we add a log after each wake up? It helps to know how many iterations we have waited. -- To view, visit http://gerrit.cloudera.org:8080/20192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib71bec8f67f80b0bdfe0a6cc46a16ef624163d8b Gerrit-Change-Number: 20192 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Mon, 17 Jul 2023 11:15:13 +0000 Gerrit-HasComments: Yes
