Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23805 )

Change subject: IMPALA-14637: COMMIT_TXN events should trigger reload for 
truncate ops
......................................................................


Patch Set 10:

(3 comments)

Thanks for the reviews! Update summary:

 - Replaced writeIdToTxnId_ and txnToTruncateOps_ with a single map from 
TableWriteId to truncate ops. CommitTxnEvent looks up the transaction's write 
ids in txnToWriteIds_ and then uses this map to get the truncate ops.
 - Added tests for partition-level truncates.
 - Reworded some comments.
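
To illustrate the lookup described in the first item, here is a minimal sketch.
It is not the code in the patch: TruncateTrackerSketch, its methods, the
simplified TableWriteId fields, and the use of an empty partition list to mean
"whole table" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-in; not the actual Impala classes.
class TruncateTrackerSketch {
  // Simplified key (assumption): one write id on one table.
  static final class TableWriteId {
    final String db;
    final String tbl;
    final long writeId;

    TableWriteId(String db, String tbl, long writeId) {
      this.db = db;
      this.tbl = tbl;
      this.writeId = writeId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof TableWriteId)) return false;
      TableWriteId other = (TableWriteId) o;
      return db.equals(other.db) && tbl.equals(other.tbl)
          && writeId == other.writeId;
    }

    @Override
    public int hashCode() {
      return Objects.hash(db, tbl, writeId);
    }
  }

  // Transaction id -> write ids allocated in it (plays the role of
  // txnToWriteIds_).
  private final Map<Long, Set<TableWriteId>> txnToWriteIds = new HashMap<>();
  // Write id -> truncated partition names. An empty list means the whole
  // table was truncated (a convention of this sketch only).
  private final Map<TableWriteId, List<String>> writeIdToTruncate =
      new HashMap<>();

  // Records that a write id was allocated within a transaction.
  void addWriteId(long txnId, TableWriteId id) {
    txnToWriteIds.computeIfAbsent(txnId, k -> new HashSet<>()).add(id);
  }

  // Records a truncate seen before the transaction commits.
  void addTruncate(TableWriteId id, List<String> partNames) {
    writeIdToTruncate.put(id, new ArrayList<>(partNames));
  }

  // On commit, only the transaction id is known. Resolve it to write ids,
  // then to the pending truncates, and drop the consumed entries.
  Map<TableWriteId, List<String>> onCommit(long txnId) {
    Map<TableWriteId, List<String>> result = new HashMap<>();
    Set<TableWriteId> writeIds = txnToWriteIds.remove(txnId);
    if (writeIds == null) return result;
    for (TableWriteId id : writeIds) {
      List<String> parts = writeIdToTruncate.remove(id);
      if (parts != null) result.put(id, parts);
    }
    return result;
  }
}
```

In this sketch the caller would reload each table or partition returned by
onCommit(). Write ids with no recorded truncate are simply skipped.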

http://gerrit.cloudera.org:8080/#/c/23805/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23805/8//COMMIT_MSG@17
PS8, Line 17: ip the reloads.
            :
> Do they always have 0 data files?

I think truncates always have 0 data files (with one hidden metadata file). But 
I misremembered that as the cause. The actual cause is that truncate doesn't 
add any TXN_WRITE_NOTIFICATION_LOG entries. Filed HIVE-29677 to track this and 
updated the commit message.

> Can't we catch truncate events by checking that there is a new base dir, 
> but there are no files written for it?

I'm not sure I understand this correctly. But if we don't track the truncate 
operations, we don't know which tables were truncated when processing 
CommitTxnEvent. Note that the COMMIT_TXN event just has the transaction id. It 
doesn't contain the db/table names.

> At the first glance the logic looks slightly broken to me, for example what 
> happens if catalogd starts up after the alter but before the commit?

That's a good point. I think we need HMS-side changes (e.g. HIVE-29677 to 
persist the truncates) to handle this better. It's an existing limitation of 
tracking transactional info, e.g. txnToWriteIds_ in Catalog.java also has this 
issue.


http://gerrit.cloudera.org:8080/#/c/23805/9/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/23805/9/fe/src/main/java/org/apache/impala/catalog/Catalog.java@116
PS9, Line 116:   // here and reload the affected tables / partitions when the 
COMMIT_TXN event is
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/23805/8/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/23805/8/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@4470
PS8, Line 4470:     loadTable(tblName);
              :     HdfsTable table = (HdfsTable) catalog_.getTable(TE
> Is it possible to truncate per partition? If yes, it would make sense to ha
Nice catch! I used null here as the partition names so the whole table is 
truncated. Added a test for partition-level truncates.



--
To view, visit http://gerrit.cloudera.org:8080/23805
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89aac12819f08dd9ed42d5d8b21a96c04b04d75c
Gerrit-Change-Number: 23805
Gerrit-PatchSet: 10
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 22 Jun 2026 07:35:13 +0000
Gerrit-HasComments: Yes
