Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/24509 )
Change subject: IMPALA-13794: More Accurate Iceberg Metadata Memory Estimates
......................................................................


Patch Set 2:

(3 comments)

I am not sure whether the object size data I gathered came from a branch that included IMPALA-14564. I want to re-run the EE/Custom Cluster tests with that Jira definitely included in the branch. I will wait until after we resolve the open questions, though.

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@652
PS2, Line 652: 494
> The numbers look a bit ad hoc to me, for example why are data files without
To determine these numbers, I used org.github.jamm.MemoryMeter#measureDeep to measure the size of individual objects stored in the various collections (for the exact code, see https://github.com/jasonmfehr/impala/commit/163e20fe1d02ac096f82779d26f5637528886938). Specifically, I measured the size of the decoded IcebergFileDescriptor object (perhaps I should have measured the size of the EncodedFileDescriptor object instead?). I then ran the full EE/Custom Cluster test suite and averaged the sizes reported for each object type.

The FileDescriptor objects store an FbFileDesc object, which is an internal representation of a flatbuffer file descriptor that in turn stores data about the blocks. Should this flatbuffer have been excluded from the FileDescriptor object size measurements?

I'm not sure why the sizes differ between data files with and without deletes. The actual delete file metadata is stored in position/equality delete files, so I would not expect much of a difference between the two.

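For reference, the averaging step described above could be sketched roughly as follows. This is only an illustration, not the code from the linked commit: SizeAverager, averageByClass, and estimateBytes are hypothetical names, and the sizer parameter is an assumption that stands in for a deep-size measurement such as MemoryMeter#measureDeep so the sketch has no dependency on jamm.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical helper, not part of Impala: averages measured deep sizes per class. */
final class SizeAverager {
  private SizeAverager() {}

  /**
   * Returns the mean measured size in bytes for each runtime class in samples.
   * The sizer is an assumed stand-in for a deep-size measurement function.
   */
  static Map<String, Long> averageByClass(List<?> samples, ToLongFunction<Object> sizer) {
    // Simple class name -> {sum of sizes, number of samples}.
    Map<String, long[]> totals = new LinkedHashMap<>();
    for (Object sample : samples) {
      long[] acc = totals.computeIfAbsent(
          sample.getClass().getSimpleName(), k -> new long[2]);
      acc[0] += sizer.applyAsLong(sample);
      acc[1]++;
    }
    Map<String, Long> averages = new LinkedHashMap<>();
    for (Map.Entry<String, long[]> e : totals.entrySet()) {
      long sum = e.getValue()[0];
      long count = e.getValue()[1];
      averages.put(e.getKey(), Math.round((double) sum / count));
    }
    return averages;
  }

  /** Estimate for a collection: object count times the averaged per-object size. */
  static long estimateBytes(long objectCount, long avgBytesPerObject) {
    return Math.multiplyExact(objectCount, avgBytesPerObject);
  }
}
```

Grouping by runtime class keeps each object type's average separate, which is what a per-type constant in the estimate would need.
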
http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@653
PS2, Line 653: fileStore_
> As this mainly uses members from IcebergContentFileStore, that class could
Are you saying I should have run MemoryMeter#measureDeep on the fileStore_ object instead?


http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@657
PS2, Line 657: 163
> HdfsTable uses much higher estimate:
HdfsTable stores instances of the HdfsPartition object while IcebergContentFileStore stores instances of the FbIcebergPartition flatbuffers object.

Possibly the memUsageEstimate should also store PER_PARTITION_MEM_USAGE_BYTES * hdfsTable_.getPartitions().size()? That does not seem quite right since hdfsTable_ skips loading the file information for each of its internally stored partitions.

I also gathered the size of the icebergApiTable_ object using MemoryMeter#measureDeep. The average size came out to 573,991. I have not added that size in yet either.


-- 
To view, visit http://gerrit.cloudera.org:8080/24509
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I471e6460ac0f2f924c0a701a077bfc22def8aa7b
Gerrit-Change-Number: 24509
Gerrit-PatchSet: 2
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 25 Jun 2026 19:20:00 +0000
Gerrit-HasComments: Yes
