Noemi Pap-Takacs has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/23838 )
Change subject: IMPALA-14564: Remove redundant partition info from Iceberg file descriptors
......................................................................

IMPALA-14564: Remove redundant partition info from Iceberg file descriptors

Iceberg file descriptors used to contain information about the partition they belong to: the spec id and the partition values. These fields uniquely identify the partition the file belongs to and depend only on the partition, not on the file itself. Therefore it is redundant to store these fields in every file descriptor in the Catalog.

This change normalizes partition-related metadata out of the file descriptors in the Catalog and the Frontend. The partition information is stored separately in the IcebergContentFileStore in FlatBuffers binary format. File descriptors only store a unique id of the partition they belong to.

Since the scan nodes need the partition information for execution, we denormalize the file and partition metadata: in the Planner we look up the relevant partitions by their ids and put them into the file descriptors that are sent to the executors. Metadata handling during scheduling and execution remains the same.

This change introduces a separate FlatBuffer schema to serialize the necessary Iceberg file metadata from the Catalog to the Frontend and from the Frontend to the executors (in scan ranges). For maximum memory efficiency, the schema contains only the relevant fields.
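The normalize/denormalize round trip described above can be illustrated with a minimal sketch. It assumes partition values can be modeled as a list of strings and that partition ids are dense integers assigned in first-seen order. All type and method names (PartitionInfo, PartitionStore, normalize, denormalize) are hypothetical and are not Impala's actual IcebergContentFileStore or IcebergFileDescriptor APIs. FlatBuffers serialization is not modeled. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these are not Impala's real classes.
final class PartitionNormalizationSketch {

  /** Partition identity: spec id plus partition values (simplified to strings). */
  record PartitionInfo(int specId, List<String> values) {}

  /** Normalized (catalog-side) descriptor: the file stores only a partition id. */
  record NormalizedFileDesc(String path, int partitionId) {}

  /** Denormalized descriptor sent to executors: partition info is inlined again. */
  record ScanFileDesc(String path, PartitionInfo partition) {}

  /** Stores each distinct partition exactly once and hands out dense integer ids. */
  static final class PartitionStore {
    private final Map<PartitionInfo, Integer> idByPartition = new HashMap<>();
    private final List<PartitionInfo> partitionById = new ArrayList<>();

    /** Returns the existing id for an equal partition, or assigns the next free id. */
    int intern(PartitionInfo p) {
      return idByPartition.computeIfAbsent(p, key -> {
        partitionById.add(key);
        return partitionById.size() - 1;
      });
    }

    /** Looks up a partition by id (a plain list index, since ids are dense). */
    PartitionInfo get(int id) {
      return partitionById.get(id);
    }

    /** Number of distinct partitions stored. */
    int size() {
      return partitionById.size();
    }
  }

  /** Catalog side: drop the per-file partition fields and keep only the id. */
  static NormalizedFileDesc normalize(String path, PartitionInfo p, PartitionStore store) {
    return new NormalizedFileDesc(path, store.intern(p));
  }

  /** Planner side: resolve the id and inline the partition for the scan ranges. */
  static ScanFileDesc denormalize(NormalizedFileDesc fd, PartitionStore store) {
    return new ScanFileDesc(fd.path(), store.get(fd.partitionId()));
  }
}
```

In this sketch, files from the same partition share one PartitionInfo instance in the store, so the per-file cost drops to a single int. The planner-side lookup is an index into a list. Keying the map by value (record equality) is what makes two separately constructed but equal partitions collapse to one id.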
Testing:
 - ran existing e2e tests
 - extended FileMetadataLoaderTest to check the partitions

Assisted by Copilot (Claude Sonnet 4.6)

Change-Id: I57c2fd6f1ebb636aa9e7ca925413ca51858cbc2a
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/fbs/CatalogObjects.fbs
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
16 files changed, 302 insertions(+), 146 deletions(-)


git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/23838/16
--
To view, visit http://gerrit.cloudera.org:8080/23838
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57c2fd6f1ebb636aa9e7ca925413ca51858cbc2a
Gerrit-Change-Number: 23838
Gerrit-PatchSet: 16
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
