Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23805 )

Change subject: IMPALA-14637: COMMIT_TXN events should trigger reload for truncate ops
......................................................................


Patch Set 10:

(3 comments)

Thanks for the reviews! Update summary:
- Replaced writeIdToTxnId_ and txnToTruncateOps_ with a single map keyed by TableWriteId and mapped to truncate ops. CommitTxnEvent uses txnToWriteIds_ and this map to get the truncate ops.
- Added tests for partition level truncates.
- Reworded some comments.

http://gerrit.cloudera.org:8080/#/c/23805/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23805/8//COMMIT_MSG@17
PS8, Line 17: ip the reloads.
:
> Do they always have 0 data files?
I think truncates always have 0 data files (with one hidden metadata file). But I misremembered that as the cause. The actual cause is that truncate doesn't add any TXN_WRITE_NOTIFICATION_LOG entries. Filed HIVE-29677 to track this and updated the commit message.

> Can't we catch truncate events by checking that that there is a new base dir,
> but there are no files written for it?
I'm not sure I understand this correctly. But if we don't track the truncate operations, we don't know which tables were truncated when processing CommitTxnEvent. Note that the COMMIT_TXN event only has the transaction id. It doesn't contain the db/table names.

> At the first glance the logic looks slightly broken to me, for example what
> happens if catalogd starts up after the alter but before the commit?
That's a good point. I think we need HMS side changes (e.g. HIVE-29677 to persist the truncates) to handle this better. It's an existing limitation of tracking transactional info in memory, e.g. txnToWriteIds_ in Catalog.java also has this issue.

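For readers following the update summary, here is a minimal sketch of the bookkeeping it describes. It is an illustration under assumptions, not the code in the patch: only txnToWriteIds_ and the idea of a map keyed by TableWriteId come from the comments above. The class name, the TruncateOp type, the writeIdToTruncateOp_ field name, all method names, and the convention that null partition names mean a whole-table truncate in this map are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not the patch itself. Only txnToWriteIds_ and the
// TableWriteId-keyed map are taken from the review; the rest is assumed.
class TruncateTrackingSketch {
  /** Simplified stand-in for a (db, table, writeId) key. */
  static final class TableWriteId {
    final String db;
    final String tbl;
    final long writeId;

    TableWriteId(String db, String tbl, long writeId) {
      this.db = db;
      this.tbl = tbl;
      this.writeId = writeId;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof TableWriteId)) return false;
      TableWriteId other = (TableWriteId) o;
      return writeId == other.writeId && db.equals(other.db) && tbl.equals(other.tbl);
    }

    @Override
    public int hashCode() { return Objects.hash(db, tbl, writeId); }
  }

  /** A pending truncate. Assumption: partNames == null means the whole table. */
  static final class TruncateOp {
    final TableWriteId tableWriteId;
    final List<String> partNames;

    TruncateOp(TableWriteId tableWriteId, List<String> partNames) {
      this.tableWriteId = tableWriteId;
      this.partNames = partNames;
    }
  }

  // Transaction id -> write ids allocated in that transaction.
  private final Map<Long, List<TableWriteId>> txnToWriteIds_ = new HashMap<>();
  // Single map from write id to the truncate recorded for it (assumed name).
  private final Map<TableWriteId, TruncateOp> writeIdToTruncateOp_ = new HashMap<>();

  /** Records that a write id was allocated within a transaction. */
  synchronized void addWriteId(long txnId, TableWriteId id) {
    txnToWriteIds_.computeIfAbsent(txnId, k -> new ArrayList<>()).add(id);
  }

  /** Records a truncate seen before its transaction commits. */
  synchronized void addTruncateOp(TableWriteId id, List<String> partNames) {
    writeIdToTruncateOp_.put(id, new TruncateOp(id, partNames));
  }

  /**
   * On commit, the event only carries the transaction id, so the truncated
   * tables are found by joining the two maps. Entries are removed once used.
   */
  synchronized List<TruncateOp> removeTruncateOpsOnCommit(long txnId) {
    List<TruncateOp> result = new ArrayList<>();
    List<TableWriteId> writeIds = txnToWriteIds_.remove(txnId);
    if (writeIds == null) return result;
    for (TableWriteId id : writeIds) {
      TruncateOp op = writeIdToTruncateOp_.remove(id);
      if (op != null) result.add(op);
    }
    return result;
  }
}
```

Both maps in this sketch live only in memory, so a restart between the truncate and the commit would lose the pending entries. That matches the limitation acknowledged in the reply above.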
http://gerrit.cloudera.org:8080/#/c/23805/9/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/23805/9/fe/src/main/java/org/apache/impala/catalog/Catalog.java@116
PS9, Line 116: // here and reload the affected tables / partitions when the COMMIT_TXN event is
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/23805/8/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/23805/8/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@4470
PS8, Line 4470: loadTable(tblName);
: HdfsTable table = (HdfsTable) catalog_.getTable(TE
> Is it possible to truncate per partition? If yes, it would make sense to ha
Nice catch! I used null here as the partition names so the whole table is truncated. Added test for partition level truncate.



--
To view, visit http://gerrit.cloudera.org:8080/23805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89aac12819f08dd9ed42d5d8b21a96c04b04d75c
Gerrit-Change-Number: 23805
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 22 Jun 2026 07:35:13 +0000
Gerrit-HasComments: Yes
