Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20753 )
Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables
......................................................................

IMPALA-12597: Basic Equality delete read support for Iceberg tables

In general, applying equality deletes is similar to applying position deletes to data files: a LEFT ANTI JOIN is used, where the SCAN for the data rows is on the left side and the SCAN for the delete rows is on the right side of the JOIN. The difference lies in the virtual columns and the conjuncts being used.

For equality deletes, the data sequence number of a delete file has to be greater than the data sequence number of the data file being investigated. This information is added as a virtual column to the scans, and a conjunct is created in the JOIN node to check the relation. The equality delete fields from the delete files are checked against the respective columns of the data SCAN.

This patch makes it possible for Impala to read Iceberg tables with basic equality delete files. The Iceberg spec gives engines great flexibility in how they write equality deletes; in practice, however, Flink, one of the engines that write EQ-deletes, supports only a subset of the use cases. This patch focuses on reading the EQ-deletes written by Flink. The restrictions are the following:
- All equality delete files in a table should have the same equality field ID list.
- For partitioned Iceberg tables, the partition values are expected to also be written into the equality delete files.
- Tables with equality deletes shouldn't have partition or schema evolution.
- Floating point equality columns aren't supported.
- If a malformed equality delete file lacks some of the equality field IDs, the Parquet reader fills those missing fields with NULLs. As a side effect, rows whose corresponding data columns have a NULL value are dropped from the result.

See the IMPALA-11388 epic Jira for more details.
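To make the JOIN semantics above concrete, here is a minimal Python sketch of what the LEFT ANTI JOIN computes for equality deletes. It is not Impala code: `DataFile`, `EqDeleteFile`, `apply_equality_deletes` and `check_same_equality_ids` are hypothetical names, files are modeled as in-memory lists of dicts, partitioning is ignored, and equality columns are compared with NULL-safe equality (Python's `None == None` is `True`). NULL-safe matching is an assumption here, chosen because it is consistent with the NULL side effect described in the restrictions. A data row survives only if no delete row with a strictly greater data sequence number matches it on every equality column.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


@dataclass
class DataFile:
    """Hypothetical stand-in for an Iceberg data file (not an Impala class)."""
    data_seq: int                    # data sequence number of the data file
    rows: List[Row]


@dataclass
class EqDeleteFile:
    """Hypothetical stand-in for an Iceberg equality delete file."""
    data_seq: int                    # data sequence number of the delete file
    equality_cols: Tuple[str, ...]   # columns named by the equality field IDs
    rows: List[Row]


def check_same_equality_ids(deletes: Sequence[EqDeleteFile]) -> None:
    """Reject tables whose equality delete files use different field lists."""
    distinct = {d.equality_cols for d in deletes}
    if len(distinct) > 1:
        raise ValueError(f"different equality field lists: {sorted(distinct)}")


def apply_equality_deletes(data_files: Sequence[DataFile],
                           deletes: Sequence[EqDeleteFile]) -> List[Row]:
    """Return the data rows that no equality delete row removes.

    Semantically a LEFT ANTI JOIN (data rows left, delete rows right) with:
      1. delete data sequence number > data file data sequence number;
      2. every equality column matching (NULL-safe: None matches None).
    """
    check_same_equality_ids(deletes)
    result: List[Row] = []
    for data_file in data_files:
        for row in data_file.rows:
            deleted = any(
                delete_file.data_seq > data_file.data_seq
                # dict.get() yields None for a column missing from the delete
                # row, mimicking the Parquet reader filling it with NULL.
                and all(row[col] == delete_row.get(col)
                        for col in delete_file.equality_cols)
                for delete_file in deletes
                for delete_row in delete_file.rows
            )
            if not deleted:
                result.append(row)
    return result


if __name__ == "__main__":
    data = [
        DataFile(data_seq=1, rows=[{"id": 1, "name": "a"},
                                   {"id": 2, "name": None},
                                   {"id": 3, "name": "c"}]),
        # A row with id=1 re-inserted after the delete was committed.
        DataFile(data_seq=3, rows=[{"id": 1, "name": "a"}]),
    ]
    deletes = [EqDeleteFile(data_seq=2, equality_cols=("id",), rows=[{"id": 1}])]
    print(apply_equality_deletes(data, deletes))
    # [{'id': 2, 'name': None}, {'id': 3, 'name': 'c'}, {'id': 1, 'name': 'a'}]
```

In the sample run, the first `id=1` row is removed because the delete (sequence number 2) is newer than its data file (1), while the re-inserted `id=1` row in the data file with sequence number 3 survives. If a delete file declared `("id", "name")` but only stored `id`, the sketch would read `name` as `None` and remove matching data rows whose `name` is NULL, which mirrors the malformed-file side effect listed above.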
Testing:
- Checked that the existing functional_parquet.iceberg_v2_delete_equality table can be read successfully.
- Added new test tables so that E2E tests can validate correctness.

Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Reviewed-on: http://gerrit.cloudera.org:8080/20753
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.h
M common/thrift/CatalogObjects.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/00000-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e00000001_800513971_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-8985205515767142888-1-0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-911559291487642581-1-bb4b8c07-84e1-421a-bb6c-594f297d118e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f00000000_1372039299_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e035009600000001_138281890_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a00000000_909175610_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/data/a94b351bfa56dbd8-ddb31c6400000000_1397530881_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/data/delete-494fbbd4b792bdd2-aabb8e2000000000_1239775374_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/data/e6484a6fb0a2b4e6-d242a84000000000_1980761347_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/25ce5480-23b6-4c70-a724-63931f8d84c6-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/87d3b6df-f00d-40a4-aafa-5d7f20e3299b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/c8b17188-94bf-4496-9069-3eda900cd71d-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/snap-4301391241829251636-1-25ce5480-23b6-4c70-a724-63931f8d84c6.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/snap-4346796256488077976-1-87d3b6df-f00d-40a4-aafa-5d7f20e3299b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/snap-9091814429631192676-1-c8b17188-94bf-4496-9069-3eda900cd71d.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_nulls/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-bdb0c103-bfaf-49b4-935a-16a951e55b1c-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-bdb0c103-bfaf-49b4-935a-16a951e55b1c-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-fe90d7bb-47e7-4221-8e06-f63922d3fa23-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/data/d=2023-12-24/00000-0-fe90d7bb-47e7-4221-8e06-f63922d3fa23-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/807d7d2a-0557-4bbe-9f07-9467de72598a-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/807d7d2a-0557-4bbe-9f07-9467de72598a-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/c9c459ff-9747-4dab-b65f-cace0f31e669-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/c9c459ff-9747-4dab-b65f-cace0f31e669-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/snap-3409033829588781878-1-c9c459ff-9747-4dab-b65f-cace0f31e669.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/snap-586812101365618837-1-807d7d2a-0557-4bbe-9f07-9467de72598a.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partition_evolution/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-0c3800d3-c638-4591-b40c-158dcd5ebe25-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-0c3800d3-c638-4591-b40c-158dcd5ebe25-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-c6e2da66-fe58-44b5-81bd-575da62c7a91-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-24/00000-0-c6e2da66-fe58-44b5-81bd-575da62c7a91-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-759289e0-d713-41a1-bdaf-f9feab643720-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-759289e0-d713-41a1-bdaf-f9feab643720-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-e1567ae8-d9c3-4071-b671-8bbbe79d36d1-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/data/d=2023-12-25/00000-0-e1567ae8-d9c3-4071-b671-8bbbe79d36d1-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/09ca0acc-c19e-4073-80f7-b476a6e568c7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/09ca0acc-c19e-4073-80f7-b476a6e568c7-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/57f8cd32-619c-46fc-a683-1dee7473c990-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/57f8cd32-619c-46fc-a683-1dee7473c990-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/db5041df-259d-48b9-ade1-1bf382a93d5a-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/db5041df-259d-48b9-ade1-1bf382a93d5a-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/e3dac70e-a8aa-4d15-9d35-20c4f25f36d5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/e3dac70e-a8aa-4d15-9d35-20c4f25f36d5-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-3217166167862484560-1-db5041df-259d-48b9-ade1-1bf382a93d5a.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-3979814664665937114-1-09ca0acc-c19e-4073-80f7-b476a6e568c7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-4821964189199835313-1-e3dac70e-a8aa-4d15-9d35-20c4f25f36d5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/snap-7564375228633944060-1-57f8cd32-619c-46fc-a683-1dee7473c990.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_partitioned/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
M tests/query_test/test_iceberg.py
107 files changed, 3,229 insertions(+), 251 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
