Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21175 )

Change subject: IMPALA-12829: Skip processing transaction events if the table 
is HMS sync disabled.
......................................................................


Patch Set 23:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21175/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21175/19//COMMIT_MSG@24
PS19, Line 24: 1) CatalogServiceCatalog#reloadTableIfExists() didn't verify if 
the
             : current eventId is older than the table's lastSyncEventId which 
leads to
             : unecessary reloading of table for commit txns.
             : 2) Insert queries from impala didn't update the validWriteIdList 
for
             : transactional tables in the cache, so CommitTxn events triggered 
by
             : insert events are triggering reload on unpartitioned 
transactional
             : tables again while consuming these CommitTxn events. Fixed it by
             : updating the validWriteIdList in the cache.
             : 3) CommitTxn events generated after AlterTable events are 
leading to
             : incorrect results if file metadata reload is skipped in 
AlterTable
             : events. Reason being AlterTable event will update the writeId 
from
             : metastore but doesn't reload filemetadata which yields incorrect
             : results. This is fixed in HdfsTable class to not skip 
filemetadata
             : reload if writeId is changed.
> I have tried moving out some fixes, but all of them stem from https://gerri
Ack


http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@4568
PS23, Line 4568: HMS
nit: "Hms" since we use camel case naming convention.


http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/21175/23/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2987
PS23, Line 2987:         prevWriteIdChanged = !writeIdList.toString().equals(
               :             validWriteIds_.writeToString());
This seems too conservative that if there are only some new open write ids, we 
still reload the file metadata. But it seems comparing two ValidWriteIdList in 
details is quite complex. Can we file a follow-up JIRA to improve this? It'd be 
nice to add a log saying ValidWriteIdList changed from what to what.


http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1648
PS23, Line 1648: blacklisted
nit: "blacklisted" is confused with the blacklisted_dbs, blacklisted_tables 
flags. Might be more clear to say something like "skip txn events if db/table 
has disabled applying HMS events".


http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1661
PS23, Line 1661:       self.run_stmt_in_hive(
               :         """ALTER TABLE {} SET TBLPROPERTIES 
('impala.disableHmsSync'='true')"""
               :         .format(full_tbl_name))
Can we add a similar test for setting this in db properties?


http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1696
PS23, Line 1696:     # HMS Event Processor, so that transactional events are 
synced correctly.
This sleep is added in IMPALA-12543 and it's intended to use an interval longer 
than the event polling interval. When a DDL finishes its HMS RPC, it sleeps 2s 
before reloading the table metadata. The event generated by the HMS RPC will 
arrive before this reload and should be correctly treated as an self-event.

Could you explain more how it causes transactional events not being synced 
correctly?


http://gerrit.cloudera.org:8080/#/c/21175/23/tests/custom_cluster/test_events_custom_configs.py@1713
PS23, Line 1713: 1
Why do we change this?



--
To view, visit http://gerrit.cloudera.org:8080/21175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d0ecb3b756755bc04c66a538a9ae6b88011a019
Gerrit-Change-Number: 21175
Gerrit-PatchSet: 23
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Tue, 08 Jul 2025 12:43:15 +0000
Gerrit-HasComments: Yes

Reply via email to