Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 )
Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id after the DDL operation for all DDLS from Impala clients ...................................................................... Patch Set 5: (14 comments) http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@8 PS4, Line 8: after the DD > Can you add to the title whether Impala syncs the events before or after th Ack http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@19 PS4, Line 19: n: when the enable_sync_to_latest_event_on_ddls flag is : set to true, we do t > This is stale now. Ack http://gerrit.cloudera.org:8080/#/c/20367/4//COMMIT_MSG@20 PS4, Line 20: DDL operation first, i.e., perform HMS operation : and then sync the db/table in the catalogD's cache to the latest event : in HMS for the corresponding db/table. Currently we fetch all events > Won't there be always a pending event, as the DDL from Impala creates a new Agreed. But we may also fetch events that are from other databases/tables. Ideally, we want to fetch only events that only required for the current metadata object. http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20367/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1396 PS3, Line 1396: Preconditions.checkNotNull(topicUpdateEntry); > I think that we also need to hold versionLock_ before calling this, as the I have to remove this method. This is not required anymore. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1045 PS4, Line 1045: alterTableOrViewRename(tbl, : TableName.fromThrift(params.getRename_params().getNew_table_name()), : newCatalogVersion, wantMinimalResult, response); : return; > I don't see this change applied to renames, while the commit message sugges Rename is a complex operation, it has to change the catalogD's cache immediately after the HMS operation without waiting for MetastoreEventsProcessor to sync the metadata object's events. I'll update the commit message accordingly, regarding operations that don't depend upon the event processor to update the cache. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1113 PS4, Line 1113: reloadMetadata = false; > All the code paths that set 'reloadMetadata' to false have updated the cata reloadMetadata is set to false for addPartitions() -- https://gerrit.cloudera.org/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#4393 and dropPartitions() -- https://gerrit.cloudera.org/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#5265 I have also added tests CatalogHmsSyncToLatestEventIdTest to verify the same. Do you see any issues with not modifying the cache in place? Should I change this back to the original state for addPartitions() and dropPartitions()? Please let me know. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1263 PS4, Line 1263: // Make sure we won't forget finalizing the modification. : if (tbl.hasInProgressModification()) Preconditions.checkState(reloadMetadata); : if (!trySyncToLatestEventId(tbl) && reloadMetadata) { : loadTableMetadata(tbl, newCatalogVersion, reloadFileMetadata, : reloadTableSchema, null, "ALTER TABLE " + params > refactor idea: this could be moved to a function like CatalogOpExecutor.try Ack http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1270 PS4, Line 1270: catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion); > This seems valid even in the syncToLatestEventId case. Ack http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1276 PS4, Line 1276: } > Are we using in flight event lists for anything when enableSyncToLatestEven This is used to identify if the events are from other Impala services. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1658 PS4, Line 1658: Id(eventId); > What kind of guarantees do we have for the state of the event processor whe I understand your concern. By setting the syncToLatestEventId on the metadata object (tbl/database) during to syncToLatestEventId() operation, these events are skipped by the event processor by comparing the current event id to the lastSyncEventId on the metadata object. E.g.: See CatalogHmsSyncToLatestEventIdTestL#246 where alter database Event is skipped on the event processor once the eventId is synced. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7004 PS4, Line 7004: // Add catalog service id and the 'newCatalogVersion' to the table parameters. : // This way we can avoid reloading the table on self-events (Iceberg generates : // an ALTER TABLE statement to set the new metadata_location). : IcebergCatalogOpExecutor.addCatalogVersionToTxn(iceTxn, : catalog_.getCatalogServiceId(), newCatalogVersion); : > Allowing createInsertEvents() to fail is more problematic in case of enable Ack. Reverted the change for insert operations. http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7031 PS4, Line 7031: } > Can't we skip this too if the events will by synced for the table? I wasn't too sure about Iceberg tables so I didn't touch this part. Also, lastSyncEventId doesn't exist for the Iceberg table definition, right? http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7040 PS4, Line 7040: /** : * Populates insert event data and calls fireInsertEvents() if external event : * processing is enabled. : * This method is replicating what Hive does in case a table or partition is inserts : * into. There are 2 cases: : * 1. If the table is transactional, we should first generate ADD_PARTITION events : * for new partitions which are generated. This is taken care of in the updateCatalog : * method. Additionally, for each partition including existing partitions which were : * inserted into, this method creates an ACID_WRITE event using the HMS API : * ad > The commit message and the flag name suggests that we only apply this for D IMPALA-10926, https://gerrit.cloudera.org/c/17859/42/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java#1139 Addressed this for truncate operations also. So IMO, it makes sense to revert the change for the insert operation and only do this for the truncate operation. I'll update the commit message to notify which DML statements (truncate) are updating the caching. Do you agree with this? http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/20367/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1546 PS4, Line 1546: alterTableRename(testTblName, newTblName, TEST_DB_NAME); : eventsProcessor_.processEvents(); > What will happen if a rename is detected during a DDL and not in eventsProc Added a test for the above-suggested scenario and confirmed that the event processor is active. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Mon, 30 Oct 2023 23:56:17 +0000 Gerrit-HasComments: Yes
