Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/23174 )
Change subject: IMPALA-14227: In HA failover, passive catalogd should apply pending HMS events before being active
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/23174/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23174/2//COMMIT_MSG@15
PS2, Line 15: This patch adds a wait during HA failover to ensure HMS events before
            : the failover happens are all applied on the new active catalogd.
This may become problematic in case the event processor is lagging - if the passive coordinator is lagging 1 hour behind HMS, does this mean the failover will need 1 hour, so there won't be an active catalogd for a prolonged time?


http://gerrit.cloudera.org:8080/#/c/23174/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/23174/2/be/src/catalog/catalog-server.cc@873
PS2, Line 873: SleepForMs(FLAGS_hms_event_polling_interval_s * 1000L);
Wouldn't it be better to do an HMS RPC here to get the latest event id, and wait until last_synced_hms_event_id reaches that?

I have the following problems with the sleep:
- if there are not many writes to HMS at the moment, then catalogd may sleep unnecessarily, making failover slower
- in case the polling is delayed (e.g. slow HMS RPC), sleeping this much may not be enough


http://gerrit.cloudera.org:8080/#/c/23174/2/be/src/catalog/catalog-server.cc@879
PS2, Line 879: while (last_synced_hms_event_id < latest_hms_event_id)
What will happen if the event processor runs into an error state? Will this loop wait forever?

It may be useful to have a timeout for syncing events, and if it passes, revert to globally invalidating metadata.

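
To make the two suggestions on catalog-server.cc more concrete, here is a rough, standalone sketch of a bounded wait. It assumes the target event id has already been fetched once from HMS by the caller. `EventSyncHooks`, `WaitForHmsEventSync` and `SyncResult` are hypothetical names used only for illustration; they are not existing Impala interfaces, and the real patch may need a different shape.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Outcome of waiting for the event processor to catch up.
enum class SyncResult { kSynced, kTimedOut, kProcessorError };

// Hypothetical hooks standing in for catalogd state; not Impala's real API.
struct EventSyncHooks {
  // Returns the id of the last HMS event applied by the event processor.
  std::function<int64_t()> get_last_synced_event_id;
  // Returns true if the event processor has stopped in an error state.
  std::function<bool()> processor_in_error;
};

// Waits until the last synced event id reaches 'target_event_id', which the
// caller is assumed to have fetched from HMS once at the start of failover.
// Gives up when the processor reports an error or when 'timeout' elapses, so
// the caller can fall back to a global metadata invalidation.
SyncResult WaitForHmsEventSync(const EventSyncHooks& hooks, int64_t target_event_id,
    std::chrono::milliseconds timeout, std::chrono::milliseconds poll_interval) {
  assert(poll_interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    // Check progress first so an already caught-up catalogd never sleeps.
    if (hooks.get_last_synced_event_id() >= target_event_id) {
      return SyncResult::kSynced;
    }
    if (hooks.processor_in_error()) return SyncResult::kProcessorError;
    if (std::chrono::steady_clock::now() >= deadline) return SyncResult::kTimedOut;
    std::this_thread::sleep_for(poll_interval);
  }
}
```

With something like this, the failover path would return immediately when there are no pending events, and any result other than kSynced could trigger the global invalidation. The timeout also bounds how long the cluster is left without an active catalogd when the event processor is lagging far behind.
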
--
To view, visit http://gerrit.cloudera.org:8080/23174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf4fcb0e27c14197f79625749949b47c033a5f31
Gerrit-Change-Number: 23174
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 15 Jul 2025 12:12:39 +0000
Gerrit-HasComments: Yes
