Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20022 )
Change subject: IMPALA-11535: Skip older events in the event processor based on the latestRefreshEventID ...................................................................... Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1527 PS4, Line 1527: reloadFileMetadata, reloadTableSchema, false, partitionsToUpdate, : debugAction, partitionToEventId, reason); > Shouldn't we consider reloadFileMetadata/reloadTableSchema/partitionsToUpda We always reloadFileMetadat, reloadTableSchema in catalogOpExecutor whenever there is a schema change or file metadata change. If you see the methods calling loadTableMetadata(), we always set them to true whenever they are changed. Regarding PartitionsToUpdate, we set the lastRefreshEventId on updated partitions only (see HdfsTable.java table L#1930) and never set it at the table level. http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1532 PS4, Line 1532: // Update the lastRefreshEventId at the table level if it is unpartitioned table : // if it is partitioned table, partitions are updated in HdfsTable#load() method > This also applies to other table types, e.g. Iceberg/Kudu/HBase, right? For unpartitioned tables, Yes. For partitioned tables in Iceberg, kudu, there is no way of setting lastRefreshEventID at partition level. http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3222 PS4, Line 3222: createTable(testTblName, true); > Both this and the Python test used a partitioned table. As this feature is Ack http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3231 PS4, Line 3231: alterTableAddParameter(testTblName, "somekey", "someval"); > Why is calling alterTable from Impala relevant here? Ideally, not really. Just want to emphasize that lastrefreshEventId would skip this event. http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3233 PS4, Line 3233: "test", null); : eventsProcessor_.processEvents(); : assertTrue(testTbl.getLastRefreshEventId() > Shouldn't we load a table and call getLastRefreshEventId before eventsProce Ack. we would want to compare currentEventId of event processor to be equal to the lastrefreshEventId after the refresh event. Made the necessary change. -- To view, visit http://gerrit.cloudera.org:8080/20022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1 Gerrit-Change-Number: 20022 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 08 Aug 2023 03:09:10 +0000 Gerrit-HasComments: Yes
