Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23174 )

Change subject: IMPALA-14227: In HA failover, passive catalogd should apply 
pending HMS events before being active
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/23174/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23174/4//COMMIT_MSG@11
PS4, Line 11:  However, it could still use
            : a stale metadata cache when some pending HMS events generated by the
            : previous active catalogd are not applied yet.
> . For storages like S3 that don't have block locations, reload might have the 
> same performance as the initial loading since both are dominated by file 
> listing.

This is true for external tables, but for Hive ACID tables we could skip file 
listing if validWriteIdList didn't change. In Iceberg tables we don't even need 
file listing.
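
A minimal sketch of the distinction above, not Impala code. TableKind, 
needs_file_listing and the string form of the write id list are hypothetical 
names chosen for illustration only:

```python
from enum import Enum, auto


class TableKind(Enum):
    EXTERNAL = auto()
    HIVE_ACID = auto()
    ICEBERG = auto()


def needs_file_listing(kind, cached_write_ids=None, current_write_ids=None):
    """Return True if a reload would have to list files on storage.

    Hypothetical rule that only mirrors the three cases in the comment.
    """
    if kind is TableKind.ICEBERG:
        # Iceberg metadata already names the data files, so no listing.
        return False
    if kind is TableKind.HIVE_ACID:
        # Assumption: an unchanged validWriteIdList means nothing new to find.
        return cached_write_ids != current_write_ids
    # External tables: the file system is the only source of truth.
    return True
```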

I will think this through and create a Jira. Marking tables stale could also be 
very useful to reduce load during event processing. For example:
1. mark a table stale if no catalog operations happened to it for N minutes
2. while a table is stale HMS events are ignored (with the exception of drop 
table/rename) but the cached data is kept in memory in catalogd
3. the staleness is propagated via catalogd to coordinators through statestore
4. if a coordinator wants to use a stale table, it has to request data from the 
catalog again
5. the new request to catalog "revives" the table, leading to a REFRESH while 
assuming that existing file descriptors are still valid
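
The five steps above could be modelled roughly as follows. This is a toy 
sketch under stated assumptions, not catalogd code: CatalogSketch, CachedTable, 
the "drop"/"rename" event strings and the refresh counter are all hypothetical, 
and statestore propagation (step 3) is reduced to a returned list of names.

```python
from dataclasses import dataclass


@dataclass
class CachedTable:
    name: str
    last_access: float        # time of the last catalog operation, in seconds
    stale: bool = False
    refresh_count: int = 0    # stands in for the REFRESH done on revival


class CatalogSketch:
    def __init__(self, stale_after_s):
        self.stale_after_s = stale_after_s   # the "N minutes", in seconds
        self.tables = {}

    def add_table(self, name, now):
        self.tables[name] = CachedTable(name, last_access=now)

    def mark_idle_tables_stale(self, now):
        """Step 1. The returned names are what step 3 would propagate."""
        newly_stale = []
        for t in self.tables.values():
            if not t.stale and now - t.last_access >= self.stale_after_s:
                t.stale = True
                newly_stale.append(t.name)
        return newly_stale

    def on_hms_event(self, name, kind, new_name=None):
        """Step 2: ignore events on stale tables, except drop and rename."""
        t = self.tables.get(name)
        if t is None:
            return "unknown"
        if kind == "drop":
            del self.tables[name]
            return "dropped"
        if kind == "rename":
            del self.tables[name]
            t.name = new_name
            self.tables[new_name] = t
            return "renamed"
        # Cached data of a stale table is kept in memory but not updated.
        return "ignored" if t.stale else "applied"

    def get_table(self, name, now):
        """Steps 4 and 5: a coordinator request revives a stale table."""
        t = self.tables[name]
        if t.stale:
            t.refresh_count += 1   # placeholder for the REFRESH
            t.stale = False
        t.last_access = now
        return t
```

One design question the sketch leaves open is whether HMS events themselves 
count as catalog operations. Here they do not, so a table that is only written 
externally still goes stale.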

This could be really efficient, for example, for Iceberg tables that are 
frequently written but rarely read by Impala - between the rare reads the table 
could go stale, so the write events wouldn't always trigger re-reading Iceberg 
metadata. When Impala reads the table again, only the freshest Iceberg snapshot 
would need to be read, and block locations would need to be fetched only for 
new files.
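
As a rough illustration of the revival cost, assuming file paths are enough to 
identify a data file (plan_revival and its arguments are hypothetical, not an 
Impala or Iceberg API):

```python
def plan_revival(cached_files, snapshot_files):
    """Compare cached file paths with those in the freshest snapshot.

    Returns (to_fetch, to_drop): files that still need block locations and
    cached files that are no longer part of the table.
    """
    cached = set(cached_files)
    current = set(snapshot_files)
    to_fetch = sorted(current - cached)   # only new files need block locations
    to_drop = sorted(cached - current)    # e.g. removed by later snapshots
    return to_fetch, to_drop
```

Files present in both sets keep their existing descriptors, which matches the 
assumption in step 5 above.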

The cost is that catalogd <-> coordinator traffic would increase.



--
To view, visit http://gerrit.cloudera.org:8080/23174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf4fcb0e27c14197f79625749949b47c033a5f31
Gerrit-Change-Number: 23174
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Thu, 17 Jul 2025 10:56:58 +0000
Gerrit-HasComments: Yes
