Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24509 )

Change subject: IMPALA-13794: More Accurate Iceberg Metadata Memory Estimates
......................................................................


Patch Set 2:

(3 comments)

I am not sure whether the object size data I gathered came from a branch that 
included IMPALA-14564.  I want to re-run the EE/Custom Cluster tests with that 
Jira definitely included in the branch, but I will wait until after we resolve 
the open questions.

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@652
PS2, Line 652: 494
> The numbers look a bit ad hoc to me, for example why are data files without
To determine these numbers, I used org.github.jamm.MemoryMeter#measureDeep to 
determine the size of individual objects stored in the various collections (for 
the exact code, see 
https://github.com/jasonmfehr/impala/commit/163e20fe1d02ac096f82779d26f5637528886938).
  Specifically, the size of the decoded IcebergFileDescriptor object was 
measured (perhaps I should have measured the size of the EncodedFileDescriptor 
object instead?).  I then ran the full EE/Custom Cluster test suite and 
averaged the reported sizes for each object type.
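
For illustration, the averaging step can be sketched roughly as below.  This is 
a minimal sketch, not the code from the linked commit: the class 
ObjectSizeAverager and its methods are hypothetical names, and the sizing 
function is injected so the example runs without jamm on the classpath.  In a 
real run the sizer would presumably wrap MemoryMeter#measureDeep.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical helper: accumulates measured sizes per label and reports the mean. */
public class ObjectSizeAverager {
  // Running {sum of sizes, number of samples} for each label.
  private final Map<String, long[]> totals = new HashMap<>();
  // Assumption: in practice this would delegate to a deep-size measurement.
  private final ToLongFunction<Object> sizer;

  public ObjectSizeAverager(ToLongFunction<Object> sizer) {
    this.sizer = sizer;
  }

  /** Measures obj with the injected sizer and adds it to the label's totals. */
  public void record(String label, Object obj) {
    long[] t = totals.computeIfAbsent(label, k -> new long[2]);
    t[0] += sizer.applyAsLong(obj);
    t[1]++;
  }

  /** Returns the rounded mean size for the label, or 0 if nothing was recorded. */
  public long average(String label) {
    long[] t = totals.get(label);
    if (t == null || t[1] == 0) return 0;
    return Math.round((double) t[0] / t[1]);
  }
}
```

Keying the totals by label keeps the averages for different object types (for 
example data files with and without deletes) separate.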

The FileDescriptor objects store a FbFileDesc object which is an internal 
representation of a flatbuffer file descriptor which in turn stores data about 
the blocks.  Should this flatbuffer have been excluded from the FileDescriptor 
object size measurements?

I'm not sure why there is a difference between the sizes of data files with and 
without deletes.  The actual delete file metadata is stored in 
position/equality delete files, thus I would not expect much of a difference 
between the data files with and without deletes.


http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@653
PS2, Line 653: fileStore_
> As this mainly uses members from IcebergContentFileStore, that class could
Are you saying I should have run MemoryMeter#measureDeep on the fileStore_ 
object instead?


http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@657
PS2, Line 657: 163
> HdfsTable uses much higher estimate:
HdfsTable stores instances of the HdfsPartition object while 
IcebergContentFileStore stores instances of the FbIcebergPartition flatbuffers 
object.

Possibly the memUsageEstimate should also store PER_PARTITION_MEM_USAGE_BYTES * 
hdfsTable_.getPartitions().size()?  That does not seem quite right since 
hdfsTable_ skips loading the file information for each of its internally stored 
partitions.
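
As a rough sketch of the arithmetic in question only: the class and method 
below are hypothetical, and the parameters are illustrative inputs rather than 
values taken from Impala.  It adds a per-partition term on top of an existing 
estimate, which is the option being weighed above, not a settled design.

```java
/** Hypothetical sketch of the extra per-partition term; not the code under review. */
public final class PartitionTermSketch {
  private PartitionTermSketch() {}

  /**
   * Adds perPartitionBytes * partitionCount to an existing estimate.
   * Uses exact arithmetic so an overflow fails loudly instead of wrapping.
   */
  public static long withPartitionTerm(
      long baseEstimateBytes, long perPartitionBytes, int partitionCount) {
    long partitionBytes = Math.multiplyExact(perPartitionBytes, (long) partitionCount);
    return Math.addExact(baseEstimateBytes, partitionBytes);
  }
}
```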

I also gathered the size of the icebergApiTable_ object using 
MemoryMeter#measureDeep.  The average size came out to 573,991 bytes.  I have 
not added that size to the estimate yet either.



--
To view, visit http://gerrit.cloudera.org:8080/24509
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I471e6460ac0f2f924c0a701a077bfc22def8aa7b
Gerrit-Change-Number: 24509
Gerrit-PatchSet: 2
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 25 Jun 2026 19:20:00 +0000
Gerrit-HasComments: Yes
