Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20022 )

Change subject: IMPALA-11535: Skip older events in the event processor based on 
the latestRefreshEventID
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1527
PS4, Line 1527:             reloadFileMetadata, reloadTableSchema, false, 
partitionsToUpdate,
              :             debugAction, partitionToEventId, reason);
> Shouldn't we consider reloadFileMetadata/reloadTableSchema/partitionsToUpda
We always reloadFileMetadat, reloadTableSchema in catalogOpExecutor whenever 
there is a schema change or file metadata change. If you see the methods 
calling loadTableMetadata(), we always set them to true whenever they are 
changed.
Regarding PartitionsToUpdate, we set the lastRefreshEventId on updated 
partitions only (see HdfsTable.java table L#1930) and never set it at the table 
level.


http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1532
PS4, Line 1532:       // Update the lastRefreshEventId at the table level if it 
is unpartitioned table
              :       // if it is partitioned table, partitions are updated in 
HdfsTable#load() method
> This also applies to other table types, e.g. Iceberg/Kudu/HBase, right?
For unpartitioned tables, Yes.
For partitioned tables in Iceberg, kudu, there is no way of setting 
lastRefreshEventID at partition level.


http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3222
PS4, Line 3222:     createTable(testTblName, true);
> Both this and the Python test used a partitioned table. As this feature is
Ack


http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3231
PS4, Line 3231:     alterTableAddParameter(testTblName, "somekey", "someval");
> Why is calling alterTable from Impala relevant here?
Ideally, not really. Just want to emphasize that lastrefreshEventId would skip 
this event.


http://gerrit.cloudera.org:8080/#/c/20022/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3233
PS4, Line 3233:         "test", null);
              :     eventsProcessor_.processEvents();
              :     assertTrue(testTbl.getLastRefreshEventId()
> Shouldn't we load a table and call getLastRefreshEventId before eventsProce
Ack. we would want to compare currentEventId of event processor to be equal to 
the lastrefreshEventId after the refresh event. Made the necessary change.



--
To view, visit http://gerrit.cloudera.org:8080/20022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0dc5c7396d80616680d8a5805ce80db293b72e1
Gerrit-Change-Number: 20022
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 08 Aug 2023 03:09:10 +0000
Gerrit-HasComments: Yes

Reply via email to