Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 )
Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id for all DDLS from Impala clients ...................................................................... Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1045 PS4, Line 1045: alterTableOrViewRename(tbl, : TableName.fromThrift(params.getRename_params().getNew_table_name()), : newCatalogVersion, wantMinimalResult, response); : return; I don't see this change applied to renames, while the commit message suggests that it is applied to all DDLs. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1263 PS4, Line 1263: boolean syncToLatestEventId = : BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls(); : if (syncToLatestEventId) { : MetastoreEventsProcessor.syncToLatestEventId(catalog_, tbl, : catalog_.getEventFactoryForSyncToLatestEvent()); refactor idea: this could be moved to a function like CatalogOpExecutor.trySyncToLatestEventId(). If it returns true, then reloading the table / handling in-flight events could be skipped. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1270 PS4, Line 1270: if (tbl.hasInProgressModification()) Preconditions.checkState(reloadMetadata); This seems valid even in the syncToLatestEventId case. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276 PS4, Line 1276: catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion); Are we using in flight event lists for anything when enableSyncToLatestEventOnDdls is true? If not, then checking the lists could be skipped in the event processor. This would solve IMPALA-12461. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1658 PS4, Line 1658: syncToLatestEventId What kind of guarantees do we have for the state of the event processor when syncing the events for a table? I don't see any synchronization - is it possible that it processes events from HMS polling at the same time? My concern is about processing events for the same table - can't this lead to processing the event twice, once in syncToLatestEventId, and once on the the event processor thread? http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004 PS4, Line 7004: try { : createInsertEvents((FeFsTable) table, update.getUpdated_partitions(), : addedPartitionNames, update.is_overwrite, tblTxn); : } catch (Exception e) { : LOG.error("Failed to fire insert events for table {}", table.getFullName(), e); : } Allowing createInsertEvents() to fail is more problematic in case of enableSyncToLatestEventOnDdls, as we rely on creating these events to reload the table. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031 PS4, Line 7031: catalog_.addVersionsForInflightEvents(false, table, newCatalogVersion); Can't we skip this too if the events will by synced for the table? http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040 PS4, Line 7040: // update metastore table/partitions for insert command : boolean syncToLatestEventId = : BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls(); : if (syncToLatestEventId) { : MetastoreEventsProcessor.syncToLatestEventId(catalog_, table, : catalog_.getEventFactoryForSyncToLatestEvent()); : } else { : loadTableMetadata(table, newCatalogVersion, true, false, partsToLoadMetadata, : partitionToEventId, "INSERT", update.getDebug_action()); : } The commit message and the flag name suggests that we only apply this for DDLs, while here it is done for a DML. I also see that it is applied to TRUNCATE. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Fri, 27 Oct 2023 17:08:10 +0000 Gerrit-HasComments: Yes
