Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20022 )

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1285
PS3, Line 1285:             // update refresh event id at partition level
This and the comment at L1298 confuse me since no codes after them are actually 
updating the refresh event id. Can we remove them?


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1923
PS3, Line 1923:     long latestEventId = getLatestEventId(client);
I think we should get this before loadFileMetadataForPartitions() as well.


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1930
PS3, Line 1930:       p.setLastRefreshEventId(latestEventId);
We'd better check and only update it if 'latestEventId' is not -1 or lower than 
the current tracked refresh event id.


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1932
PS3, Line 1932:     LOG.info("Setting the latest refresh event id to {} for the 
loaded partitions",
Let's add the table name in the log.


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2947
PS3, Line 2947:   public long getLatestEventId(IMetaStoreClient client) {
This seems a static method and can be moved to MetastoreEventsProcessor. 
Actually, there is a similar method: 
MetastoreEventsProcessor.getCurrentEventId(). We can add a new one like 
getCurrentEventIdNoThrow() for this.


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1053
PS3, Line 1053:           infoLog("Incremented events skipped counter to {}",
We'd better add more info about why an event is skipped or not skipped. They 
would be useful in debugging issues when we lose track of the event ids, e.g. 
IMPALA-12256

We already have partition level logs at 
https://github.com/apache/impala/blob/fed883379383b368bbce4f3b36b95e01a7378578/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L3903
So we just need to add the table level refresh event id.


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1529
PS3, Line 1529:         eventId = getCurrentEventId(msClient);
This happens after we reload the table metadata which is unsafe if there are 
external DDLs finish between we reload the HMS schema and get the latest event 
id here. Those DDLs will be incorrectly skipped. To be safe, I think we should 
get the latest event id before L1522.

BTW, do we need this extra RPC call for non-hdfs tables, e.g. kudu table?


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1537
PS3, Line 1537:       throw new CatalogException("Unable to load table 
metadata.", ex);
It seems too strict to fail the DDL just for failures in getting the latest 
event id. It'd be better to just log a warning about older events might not be 
skipped due to failures in fetching the latest event id.


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1615
PS3, Line 1615:         eventId = getCurrentEventId(msClient);
The same as above, I think this should run before the load() and exceptions in 
it should not fail the DDL.


http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20022/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3219
PS3, Line 3219: setEnableSyncToLatestEventOnDdls(true)
Is this required to turn on this optimization?


http://gerrit.cloudera.org:8080/#/c/20022/3/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/20022/3/tests/custom_cluster/test_events_custom_configs.py@316
PS3, Line 316: --enable_reload_events=true
Do we need this flag?



--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 31 Jul 2023 02:40:25 +0000
Gerrit-HasComments: Yes

Reply via email to