Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23174 )

Change subject: IMPALA-14227: In HA failover, passive catalogd should apply pending HMS events before being active
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/23174/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23174/2//COMMIT_MSG@15
PS2, Line 15: This patch adds a wait during HA failover to ensure HMS events before
            : the failover happens are all applied on the new active catalogd.
> This may become problematic in case the event processor is lagging - if the
Yeah, that could be a problem. I'm assuming the passive catalogd won't have a
long lag, since it doesn't run any DDLs that could block event processing. I
agree that adding a timeout is useful when external systems (e.g. HMS) are slow.

I plan to add an improvement in which EventProcessor can enter a "catching up"
state where it just invalidates tables when processing events. Such a state can
be used in this failover scenario, or when the active catalogd starts to lag far
behind. It'd be a larger change, so I'll do it in a separate patch.
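A rough idea of the proposed "catching up" state, as a hypothetical sketch only: EventProcessorMode, ChooseMode, ActionForTableEvent and the lag threshold are illustrative names, not existing Impala code, and measuring lag as the difference between event ids is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical modes; the current EventProcessor has no such state.
enum class EventProcessorMode { kNormal, kCatchingUp };

// Enter "catching up" when the number of pending HMS events exceeds a
// threshold. Measuring lag by event id difference is an assumption.
EventProcessorMode ChooseMode(int64_t latest_hms_event_id,
                              int64_t last_synced_hms_event_id,
                              int64_t max_lag_events) {
  const int64_t lag = latest_hms_event_id - last_synced_hms_event_id;
  return lag > max_lag_events ? EventProcessorMode::kCatchingUp
                              : EventProcessorMode::kNormal;
}

// In catching-up mode a table event only invalidates the table (cheap; the
// metadata is reloaded on next access). Otherwise the event is applied to
// the cached metadata as usual.
std::string ActionForTableEvent(EventProcessorMode mode) {
  return mode == EventProcessorMode::kCatchingUp ? "invalidate" : "apply";
}
```

Invalidating trades a slower first query on each affected table for a much faster drain of the event backlog, which is what matters during failover.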


http://gerrit.cloudera.org:8080/#/c/23174/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/23174/2/be/src/catalog/catalog-server.cc@873
PS2, Line 873:   SleepForMs(FLAGS_hms_event_polling_interval_s * 1000L);
> Wouldn't it be better to do an HMS RPC here to get the latest id, and wait
I tried to avoid adding new JNI methods, and the sleep is just 1s by default. But
yeah, fetching the latest event id from HMS directly is more robust. I'll change this.
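A minimal sketch of the suggested approach, assuming two hypothetical callbacks: one standing in for an HMS RPC (e.g. exposed through a new JNI method) that returns the current notification event id, and one that reads the event processor's last synced event id. Neither name is an existing Impala function.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Both callbacks are hypothetical stand-ins for catalogd internals.
int64_t WaitForHmsEventsSynced(
    const std::function<int64_t()>& fetch_latest_hms_event_id,
    const std::function<int64_t()>& get_last_synced_hms_event_id,
    std::chrono::milliseconds poll_interval) {
  // Take the target from HMS once, instead of sleeping for one polling
  // interval and hoping the event processor has seen the latest events.
  const int64_t latest_hms_event_id = fetch_latest_hms_event_id();
  // No timeout here: this loop spins forever if the event processor stops
  // making progress.
  while (get_last_synced_hms_event_id() < latest_hms_event_id) {
    std::this_thread::sleep_for(poll_interval);
  }
  return latest_hms_event_id;
}
```

The wait target no longer depends on how quickly the event processor polls HMS; the unbounded loop is the concern raised in the next comment.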


http://gerrit.cloudera.org:8080/#/c/23174/2/be/src/catalog/catalog-server.cc@879
PS2, Line 879: while (last_synced_hms_event_id < latest_hms_event_id)
> What will happen if the event processor runs into an error state? Will this
Yeah, I'm assuming EventProcessor won't go into the error state after
IMPALA-12832, but there could still be some unhandled cases. I'll add a timeout
for this.
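One way the loop could be bounded, as a hedged sketch: it gives up after a deadline and also stops early if the event processor reports an error, since it would then never catch up. EventProcessorStatus, WaitResult and the callbacks are hypothetical names, not the real catalogd API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical status values; the real EventProcessor exposes its own set.
enum class EventProcessorStatus { kActive, kError };

enum class WaitResult { kSynced, kTimedOut, kEventProcessorError };

WaitResult WaitForHmsEventsSyncedWithTimeout(
    int64_t latest_hms_event_id,
    const std::function<int64_t()>& get_last_synced_hms_event_id,
    const std::function<EventProcessorStatus()>& get_status,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval) {
  // steady_clock is monotonic, so wall-clock adjustments can't skew the wait.
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (get_last_synced_hms_event_id() < latest_hms_event_id) {
    // An event processor in the error state will not make progress.
    if (get_status() == EventProcessorStatus::kError) {
      return WaitResult::kEventProcessorError;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return WaitResult::kTimedOut;
    }
    std::this_thread::sleep_for(poll_interval);
  }
  return WaitResult::kSynced;
}
```

On kTimedOut or kEventProcessorError the caller would presumably log a warning and proceed with the failover anyway, rather than leave the cluster without an active catalogd.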



--
To view, visit http://gerrit.cloudera.org:8080/23174

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf4fcb0e27c14197f79625749949b47c033a5f31
Gerrit-Change-Number: 23174
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 15 Jul 2025 14:30:10 +0000
Gerrit-HasComments: Yes
