Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id 
for all DDLS from Impala clients
......................................................................


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1045
PS4, Line 1045:           alterTableOrViewRename(tbl,
              :               
TableName.fromThrift(params.getRename_params().getNew_table_name()),
              :               newCatalogVersion, wantMinimalResult, response);
              :           return;
I don't see this change applied to renames, while the commit message suggests 
that it is applied to all DDLs.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1263
PS4, Line 1263:       boolean syncToLatestEventId =
              :           
BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls();
              :       if (syncToLatestEventId) {
              :         MetastoreEventsProcessor.syncToLatestEventId(catalog_, 
tbl,
              :             catalog_.getEventFactoryForSyncToLatestEvent());
refactor idea: this could be moved to a function like 
CatalogOpExecutor.trySyncToLatestEventId(). If it returns true, then reloading 
the table / handling in-flight events could be skipped.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1270
PS4, Line 1270:         if (tbl.hasInProgressModification()) 
Preconditions.checkState(reloadMetadata);
This seems valid even in the syncToLatestEventId case.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276
PS4, Line 1276:           catalog_.addVersionsForInflightEvents(false, tbl, 
newCatalogVersion);
Are we using in flight event lists for anything when 
enableSyncToLatestEventOnDdls is true? If not, then checking the lists could be 
skipped in the event processor. This would solve IMPALA-12461.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1658
PS4, Line 1658: syncToLatestEventId
What kind of guarantees do we have for the state of the event processor when 
syncing the events for a table? I don't see any synchronization - is it 
possible that it processes events from HMS polling at the same time? My concern 
is about processing events for the same table - can't this lead to processing 
the event twice, once in syncToLatestEventId, and once on the the event 
processor thread?


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004
PS4, Line 7004:       try {
              :         createInsertEvents((FeFsTable) table, 
update.getUpdated_partitions(),
              :             addedPartitionNames, update.is_overwrite, tblTxn);
              :       } catch (Exception e) {
              :         LOG.error("Failed to fire insert events for table {}", 
table.getFullName(), e);
              :       }
Allowing createInsertEvents() to fail is more problematic in case of 
enableSyncToLatestEventOnDdls, as we rely on creating these events to reload 
the table.


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031
PS4, Line 7031:           catalog_.addVersionsForInflightEvents(false, table, 
newCatalogVersion);
Can't we skip this too if the events will by synced for the table?


http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040
PS4, Line 7040:       // update metastore table/partitions for insert command
              :       boolean syncToLatestEventId =
              :           
BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls();
              :       if (syncToLatestEventId) {
              :         MetastoreEventsProcessor.syncToLatestEventId(catalog_, 
table,
              :             catalog_.getEventFactoryForSyncToLatestEvent());
              :       } else {
              :         loadTableMetadata(table, newCatalogVersion, true, 
false, partsToLoadMetadata,
              :             partitionToEventId, "INSERT", 
update.getDebug_action());
              :       }
The commit message and the flag name suggests that we only apply this for DDLs, 
while here it is done for a DML. I also see that it is applied to TRUNCATE.



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Fri, 27 Oct 2023 17:08:10 +0000
Gerrit-HasComments: Yes

Reply via email to