Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21175 )
Change subject: IMPALA-12829: Skip processing transaction events if the table is HMS sync disabled. ...................................................................... Patch Set 23: (7 comments) http://gerrit.cloudera.org:8080/#/c/21175/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21175/19//COMMIT_MSG@24 PS19, Line 24: 1) CatalogServiceCatalog#reloadTableIfExists() didn't verify if the : current eventId is older than the table's lastSyncEventId which leads to : unecessary reloading of table for commit txns. : 2) Insert queries from impala didn't update the validWriteIdList for : transactional tables in the cache, so CommitTxn events triggered by : insert events are triggering reload on unpartitioned transactional : tables again while consuming these CommitTxn events. Fixed it by : updating the validWriteIdList in the cache. : 3) CommitTxn events generated after AlterTable events are leading to : incorrect results if file metadata reload is skipped in AlterTable : events. Reason being AlterTable event will update the writeId from : metastore but doesn't reload filemetadata which yields incorrect : results. This is fixed in HdfsTable class to not skip filemetadata : reload if writeId is changed. > I have tried moving out some fixes, but all of them stem from https://gerri Ack http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@4568 PS23, Line 4568: HMS nit: "Hms" since we use camel case naming convention. http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2987 PS23, Line 2987: prevWriteIdChanged = !writeIdList.toString().equals( : validWriteIds_.writeToString()); This seems too conservative that if there are only some new open write ids, we still reload the file metadata. But it seems comparing two ValidWriteIdList in details is quite complex. Can we file a follow-up JIRA to improve this? It'd be nice to add a log saying ValidWriteIdList changed from what to what. http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1648 PS23, Line 1648: blacklisted nit: "blacklisted" is confused with the blacklisted_dbs, blacklisted_tables flags. Might be more clear to say something like "skip txn events if db/table has disabled applying HMS events". http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1661 PS23, Line 1661: self.run_stmt_in_hive( : """ALTER TABLE {} SET TBLPROPERTIES ('impala.disableHmsSync'='true')""" : .format(full_tbl_name)) Can we add a similar test for setting this in db properties? http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1696 PS23, Line 1696: # HMS Event Processor, so that transactional events are synced correctly. This sleep is added in IMPALA-12543 and it's intended to use an interval longer than the event polling interval. When a DDL finishes its HMS RPC, it sleeps 2s before reloading the table metadata. The event generated by the HMS RPC will arrive before this reload and should be correctly treated as an self-event. Could you explain more how it causes transactional events not being synced correctly? http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1713 PS23, Line 1713: 1 Why do we change this? -- To view, visit http://gerrit.cloudera.org:8080/21175 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5d0ecb3b756755bc04c66a538a9ae6b88011a019 Gerrit-Change-Number: 21175 Gerrit-PatchSet: 23 Gerrit-Owner: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Tue, 08 Jul 2025 12:43:15 +0000 Gerrit-HasComments: Yes
