Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 )
Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID ...................................................................... Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/20022/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20022/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1491 PS6, Line 1491: if (isOlderEvent(null)) { : infoLog("Not processing the alter table event {} as it is an older event", : getEventId()); : return; : } : : if (isRename_) { : processRename(); : return; : } I am trying to understand the RENAME case. I think that it would be clearer to skip isOlderEvent() logic for RENAME, as renameTableFromEvent() has its own handling of edge cases like renaming a table and then creating a table with the same name. http://gerrit.cloudera.org:8080/#/c/20022/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2341 PS6, Line 2341: isOlderEvent I am trying to understand what will happen in the batched case. Wouldn't it make sense to call isOlderEvent for batchedEvents_ individually? If I understand correctly, currently we consider the eventId of the last event in batchedEvents_, so if the refresh happened somewhere in the middle, then we won't skip the early events. For example: from Hive: insert to partition of tbl from Impala: refresh tbl from Hive: insert to partition of tbl The events from the two events could be batched, right? Also, this path looks untested as we always do only a single insert in the tests. This could be covered by an insert with dynamic partitioning in Hive. http://gerrit.cloudera.org:8080/#/c/20022/6/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20022/6/tests/custom_cluster/test_events_custom_configs.py@315 PS6, Line 315: catalogd_args Is there a reason why enable_sync_to_latest_event_on_ddls is true in the Java test but the default false here? Is the long term goal to enable that flag by default? -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 6 Gerrit-Owner: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 14 Aug 2023 16:01:31 +0000 Gerrit-HasComments: Yes
