Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/24509 )
Change subject: IMPALA-13794: More Accurate Iceberg Metadata Memory Estimates
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@652
PS2, Line 652: 494
> To determine these numbers, I used org.github.jamm.MemoryMeter#measureDeep
>perhaps I should have measured the size of the EncodedFileDescriptor object?
Yes, that is more relevant for the long-term memory need. EncodedFileDescriptor was created exactly to spare the flatbuffer + ByteBuffer overhead in front of the actual arrays.
Btw, where did you measure: in catalogd or in the coordinator?

>I'm not sure why there is a difference between the sizes of data files with
>and without deletes.
Those differences look small and can come from tiny variations, such as the path lengths used in different tests.


http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@653
PS2, Line 653: fileStore_
> Are you saying I should have run MemoryMeter#measureDeep on the fileStore_
I meant adding a function like IcebergContentFileStore.getMemoryEstimate().


http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@657
PS2, Line 657: 163
> HdfsTable stores instances of the HdfsPartition object while IcebergContent
About partitions: FYI, there is a bug that currently leads to much higher sizes in the coordinator; the fix is in review: https://gerrit.cloudera.org/#/c/24515/

About icebergApiTable_: I think that the way we cache those (especially in the coordinator) is very wrong. I created a ticket for this: IMPALA-14858

>573,991
I saw similar sizes, regardless of the actual table size.
My assumption is that this is mainly the size of the underlying HMS client, which is shared between tables in the same catalog. So I think that if you measure two API tables, their total size won't be double this, as these classes are not "self-contained" like IcebergContentFileStore. This is mainly a problem on the coordinator side, where we use the measured size for eviction decisions. It would be good to investigate this more deeply; probably the right decision would be to ignore API tables when determining size.


-- 
To view, visit http://gerrit.cloudera.org:8080/24509
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I471e6460ac0f2f924c0a701a077bfc22def8aa7b
Gerrit-Change-Number: 24509
Gerrit-PatchSet: 2
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 26 Jun 2026 12:12:54 +0000
Gerrit-HasComments: Yes
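
Illustrative note on the shared-client concern raised in the last comment: the following is a minimal Java sketch, not Impala or Iceberg code, of why adding up per-table deep measurements can overcount when tables reference a shared object. The names SharedClient, ApiTable, naiveTotal and dedupedTotal are hypothetical, and deepSize() only mimics, under that assumption, what a reference-following measurement would report for an object graph that includes a shared client.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SharedSizeSketch {
  /** Hypothetical stand-in for a client object shared by many tables. */
  static final class SharedClient {
    final long sizeBytes;
    SharedClient(long sizeBytes) { this.sizeBytes = sizeBytes; }
  }

  /** Hypothetical stand-in for an API table that references a shared client. */
  static final class ApiTable {
    final long ownBytes;
    final SharedClient client;
    ApiTable(long ownBytes, SharedClient client) {
      this.ownBytes = ownBytes;
      this.client = client;
    }
    /** Mimics a deep measurement that follows every reference from this table. */
    long deepSize() { return ownBytes + client.sizeBytes; }
  }

  /** Sums per-table deep sizes, so a shared client is counted once per table. */
  static long naiveTotal(List<ApiTable> tables) {
    long total = 0;
    for (ApiTable t : tables) total += t.deepSize();
    return total;
  }

  /** Counts each distinct client instance only once, using identity comparison. */
  static long dedupedTotal(List<ApiTable> tables) {
    Set<SharedClient> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    long total = 0;
    for (ApiTable t : tables) {
      total += t.ownBytes;
      if (seen.add(t.client)) total += t.client.sizeBytes;
    }
    return total;
  }
}
```

With two tables of 1,000 and 2,000 own bytes sharing one 500,000-byte client (made-up numbers, not measurements), naiveTotal returns 1,003,000 while dedupedTotal returns 503,000. If per-table sizes feed eviction decisions, the naive figure would overstate what evicting a single table can actually free.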
