Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24509 )

Change subject: IMPALA-13794: More Accurate Iceberg Metadata Memory Estimates
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@652
PS2, Line 652: 494
> To determine these numbers, I used org.github.jamm.MemoryMeter#measureDeep
>perhaps I should have measured the size of the EncodedFileDescriptor object?

Yes, that is more relevant for long-term memory needs. EncodedFileDescriptor 
was created exactly to spare the flatbuffer + ByteBuffer overhead in front of 
the actual arrays.

Btw, where did you measure: in catalogd or in the coordinator?

>I'm not sure why there is a difference between the sizes of data files with 
>and without deletes.

Those differences look small and could come from minor variations between 
tests, such as the path lengths used.


http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@653
PS2, Line 653: fileStore_
> Are you saying I should have run MemoryMeter#measureDeep on the fileStore_
I meant adding a function like IcebergContentFileStore.getMemoryEstimate().
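
A minimal sketch of what such a function could look like. FileStoreSketch is a 
hypothetical stand-in, not the real IcebergContentFileStore, and the per-file 
overhead constant is a placeholder rather than a measured value:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IcebergContentFileStore; not the real Impala class.
final class FileStoreSketch {
  // Assumed fixed cost per stored file (array header, reference, map entry).
  // Placeholder only; a real constant would come from measurement.
  static final long PER_FILE_OVERHEAD_BYTES = 64;

  // Each entry stands in for the raw bytes held by one encoded file descriptor.
  private final List<byte[]> encodedDescriptors = new ArrayList<>();

  void add(byte[] encodedDescriptor) {
    encodedDescriptors.add(encodedDescriptor);
  }

  // Sums the payload lengths plus the assumed per-file overhead.
  long getMemoryEstimate() {
    long total = 0;
    for (byte[] descriptor : encodedDescriptors) {
      total += PER_FILE_OVERHEAD_BYTES + descriptor.length;
    }
    return total;
  }
}
```

The point is that the store computes the estimate from its own contents, so 
the caller in IcebergTable would not need a hard-coded per-file number.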


http://gerrit.cloudera.org:8080/#/c/24509/2/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@657
PS2, Line 657: 163
> HdfsTable stores instances of the HdfsPartition object while IcebergContent
About partitions:
FYI, there is a bug that currently leads to much larger sizes in the 
coordinator; the fix is in review:
https://gerrit.cloudera.org/#/c/24515/

About icebergApiTable_:
I think that the way we cache those (especially in the coordinator) is very 
wrong. I created a ticket for this: IMPALA-14858

>573,991

I saw similar sizes, regardless of the actual table size. My assumption is 
that this is mainly the size of the underlying HMS client, which is shared 
between tables in the same catalog. So I think that if you measure two API 
tables, their total size won't be double this, as these objects are not 
"self-contained" like IcebergContentFileStore. This is mainly a problem on the 
coordinator side, where we use the measured size for eviction decisions. It 
would be good to investigate this further; probably the right decision would 
be to ignore API tables when determining size.
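
To illustrate the double-counting concern, here is a rough sketch of a deep 
size walk that counts each reachable object once. The Node class and all byte 
counts are made up for the example; this is not how jamm or Impala model 
objects, only the general idea of identity-based deduplication:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class SharedSizeSketch {
  // Hypothetical object-graph node: its own size plus outgoing references.
  static final class Node {
    final long ownBytes;
    final List<Node> refs;

    Node(long ownBytes, List<Node> refs) {
      this.ownBytes = ownBytes;
      this.refs = refs;
    }
  }

  // Sums the sizes of everything reachable from the roots, visiting each
  // object once (compared by identity, not equals()).
  static long deepSize(Node... roots) {
    Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
    Deque<Node> pending = new ArrayDeque<>(Arrays.asList(roots));
    long total = 0;
    while (!pending.isEmpty()) {
      Node node = pending.pop();
      if (!seen.add(node)) {
        continue; // already counted, e.g. a shared client
      }
      total += node.ownBytes;
      pending.addAll(node.refs);
    }
    return total;
  }
}
```

With a shared "client" node of 500,000 bytes and two "table" nodes of 1,000 
bytes each, measuring either table alone gives 501,000, while measuring both 
together gives 502,000 rather than 1,002,000. Summing per-table measurements 
would therefore overstate the memory actually held.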



--
To view, visit http://gerrit.cloudera.org:8080/24509
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I471e6460ac0f2f924c0a701a077bfc22def8aa7b
Gerrit-Change-Number: 24509
Gerrit-PatchSet: 2
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 26 Jun 2026 12:12:54 +0000
Gerrit-HasComments: Yes
