Peter Rozsa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/24143 )
Change subject: IMPALA-14755: (part 2) Implement Iceberg deletion vector reading/writing
......................................................................

IMPALA-14755: (part 2) Implement Iceberg deletion vector reading/writing

This is the second part of a multi-part implementation adding support
for Iceberg deletion vectors stored in Puffin files. This commit wires
the Puffin reader/writer infrastructure from part 1 into the query
execution pipeline and catalog layer, enabling DELETE on Iceberg V3
tables using deletion vectors.

Puffin file writing is partition-scoped: the delete sink creates one
Puffin file per output partition, and each blob inside that file is a
serialised RoaringBitmap64 covering exactly the deleted row positions of
one data file in that partition. When a data file already has a deletion
vector from a previous DELETE, the existing bitmap is OR-ed with the new
one before the merged blob is written, so each Puffin file always holds
the complete, up-to-date set of deleted positions for its partition.

DELETE on a V3 table is blocked at analysis time if the table has any
existing V2 position- or equality-delete files. The table must first be
compacted with OPTIMIZE TABLE to remove those files before DELETE can be
used on it.

Testing:
 - iceberg-v3-delete.test and iceberg-v3-delete-partition-sort.test
   added.
 - Manually validated that deletion vectors written by Spark can be read
   from Impala and deletion vectors written by Impala can be read from
   Spark.

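The merge step described in the commit message above can be illustrated
with a minimal sketch. It is not the code from this change: plain Python
sets stand in for RoaringBitmap64, and the names merge_deletion_vectors,
group_blobs_by_partition and partition_of are hypothetical. The sketch
covers only data files touched by the current DELETE; how untouched
files in the same partition are handled is not modelled here.

```python
from collections import defaultdict


def merge_deletion_vectors(existing, new_deletes):
    """OR each new deletion vector with the existing one for the same data file.

    Both arguments map a data file path to a set of deleted row positions.
    Sets are an assumed stand-in for RoaringBitmap64; set union plays the
    role of the bitmap OR.
    """
    merged = {}
    for data_file, new_positions in new_deletes.items():
        old_positions = existing.get(data_file, set())
        merged[data_file] = set(old_positions) | set(new_positions)
    return merged


def group_blobs_by_partition(partition_of, merged):
    """Group merged vectors so that each partition gets one 'Puffin file'.

    The result maps partition -> {data file path -> deleted positions},
    i.e. one blob per data file inside one file per partition.
    """
    puffin_files = defaultdict(dict)
    for data_file, positions in merged.items():
        puffin_files[partition_of[data_file]][data_file] = positions
    return dict(puffin_files)


# Hypothetical example data: a.parquet already has a deletion vector.
existing = {"p=1/a.parquet": {0, 5}}
new_deletes = {
    "p=1/a.parquet": {5, 7},
    "p=1/b.parquet": {2},
    "p=2/c.parquet": {1},
}
partition_of = {
    "p=1/a.parquet": "p=1",
    "p=1/b.parquet": "p=1",
    "p=2/c.parquet": "p=2",
}

puffin_files = group_blobs_by_partition(
    partition_of, merge_deletion_vectors(existing, new_deletes)
)
```

In this example the blob for p=1/a.parquet ends up holding positions 0,
5 and 7, so a reader needs only the newest vector for that data file.
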
Change-Id: I5613c31a7aa46b94b7c70386c939c08cc68632cd
---
M be/src/exec/blob-reader.h
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/iceberg-delete-sink-base.cc
M be/src/exec/iceberg-delete-sink-base.h
M be/src/exec/iceberg-delete-sink-config.cc
M be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/puffin/puffin-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/scheduling/scheduler.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M common/fbs/IcebergObjects.fbs
M fe/src/main/java/org/apache/impala/analysis/IcebergDeleteImpl.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
R fe/src/main/java/org/apache/impala/planner/IcebergDeleteJoinNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-delete-partition-sort.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-delete.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-negative.test
M tests/query_test/test_iceberg.py
32 files changed, 1,957 insertions(+), 126 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/24143/3
--
To view, visit http://gerrit.cloudera.org:8080/24143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5613c31a7aa46b94b7c70386c939c08cc68632cd
Gerrit-Change-Number: 24143
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
