[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21348 ) Change subject: IMPALA-13029: Tests for multi format equality deletes .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README@1193 PS1, Line 1193:set tblproperties ('write.format.default'='avro'); > Would it be possible to do schema evolution + Avro delete files? I.e. using not with schema evolution but with some hacking I created another test table to have avro eq-deletes with different eq-IDs. -- To view, visit http://gerrit.cloudera.org:8080/21348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba Gerrit-Change-Number: 21348 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 24 Apr 2024 11:53:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes
Hello Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21348 to look at the new patch set (#2). Change subject: IMPALA-13029: Tests for multi format equality deletes .. IMPALA-13029: Tests for multi format equality deletes So far we only had test coverage for Parquet equality deletes. This patch adds new tests where we have equality deletes also in ORC and AVRO. Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba --- M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-10-937fe984-34d8-4351-a419-842ad9d30758-2.orc A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-11-1bd90c11-bab5-4f53-9cd9-7bf85adaa97a-2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-9-77e491ad-7b25-4b81-b10a-53b383ae0355-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/61480fd29dfdfefb-48a79ca5_1414128186_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/1dd8f48c-c2ef-4239-8fb2-25bcbef7026c-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/454a3281-55e9-4e00-a8a2-fac5c23ff043-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/61ebd3da-da91-443e-9413-2a010f77443b-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/dab4e396-8cc2-4f57-b856-3864822ab5d3-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1904885991593677469-1-1dd8f48c-c2ef-4239-8fb2-25bcbef7026c.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1935861967137943703-1-454a3281-55e9-4e00-a8a2-fac5c23ff043.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-4400093814370842303-1-61ebd3da-da91-443e-9413-2a010f77443b.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-8244791200683984727-1-dab4e396-8cc2-4f57-b856-3864822ab5d3.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v6.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v7.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/data/0-8-2283e34a-7899-47f8-87c5-9cf060a1736c-2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/data/0-9-a93dc3c5-a017-41f9-adea-44b59e71dcd7-2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/data/0a47c8ccf4f25b0c-5ab55354_1092617260_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/224885a4-3e53-430d-88a2-3a91a7705c89-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/2ab761d1-2a51-4432-aac4-d48af6a40b0c-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/53cb7c86-e535-4516-8d16-5b2a228ac8a0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/snap-2698672561374065751-1-2ab761d1-2a51-4432-aac4-d48af6a40b0c.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/snap-792611178233942655-1-224885a4-3e53-430d-88a2-3a91a7705c89.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/snap-8134794228158837818-1-53cb7c86-e535-4516-8d16-5b2a228ac8a0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/v2.metadata.json A
[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21348 Change subject: IMPALA-13029: Tests for multi format equality deletes .. IMPALA-13029: Tests for multi format equality deletes So far we only had test coverage for Parquet equality deletes. This patch adds new tests where we have equality deletes also in ORC and AVRO. Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba --- M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-10-937fe984-34d8-4351-a419-842ad9d30758-2.orc A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-11-1bd90c11-bab5-4f53-9cd9-7bf85adaa97a-2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-9-77e491ad-7b25-4b81-b10a-53b383ae0355-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/61480fd29dfdfefb-48a79ca5_1414128186_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/1dd8f48c-c2ef-4239-8fb2-25bcbef7026c-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/454a3281-55e9-4e00-a8a2-fac5c23ff043-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/61ebd3da-da91-443e-9413-2a010f77443b-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/dab4e396-8cc2-4f57-b856-3864822ab5d3-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1904885991593677469-1-1dd8f48c-c2ef-4239-8fb2-25bcbef7026c.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1935861967137943703-1-454a3281-55e9-4e00-a8a2-fac5c23ff043.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-4400093814370842303-1-61ebd3da-da91-443e-9413-2a010f77443b.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-8244791200683984727-1-dab4e396-8cc2-4f57-b856-3864822ab5d3.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v6.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v7.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-equality-deletes.test M tests/query_test/test_iceberg.py 25 files changed, 892 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21348/1 -- To view, visit http://gerrit.cloudera.org:8080/21348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba Gerrit-Change-Number: 21348 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Tue, 23 Apr 2024 09:06:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2024 08:54:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java: http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87 PS1, Line 87: if (desc.hdfsTable.isSetAvroSchema()) { I guess the issue is also true for AVRO equality delete files. Should we also add test coverage for that? (could be separate patch) http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test: http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92 PS1, Line 92: row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*' there should be 2 ORC data files in the j_trunc=2, right? One for (2,2) and one for (3,3). You only check for 1 of such line. -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 22 Apr 2024 15:20:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21269 ) Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@227 PS7, Line 227: RETURN_IF_ERROR(GuardType::create(env, jbuffer, _guard)); if create() returned an error here we'd leak memory because of 'jbuffer', right? http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@804 PS7, Line 804: select data_file from functional_parquet.iceberg_query_metadata.entries; The query should go into the QUERY section -- To view, visit http://gerrit.cloudera.org:8080/21269 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2 Gerrit-Change-Number: 21269 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Mon, 22 Apr 2024 10:46:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 16:10:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455 PS7, Line 1455: if vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0: > We only test DIRECTED mode + V2 operator (which go in hand with each other) Here the code says that "if we don't disable the V2 read optimisations then skip the test" or in other words "if we have V2 read opts then skip". I might misread this code or we don't test the DIRECTED mode here only the old ANTI JOIN case. -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Apr 2024 09:02:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21285 ) Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs .. Patch Set 1: Code-Review+2 (1 comment) Thanks! +2 with a nit http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG@9 PS1, Line 9: useing nit: using -- To view, visit http://gerrit.cloudera.org:8080/21285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581 Gerrit-Change-Number: 21285 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Apr 2024 07:19:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21267 ) Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@107 PS1, Line 107: fileDescs_ = new ArrayList<>(fileDescs_); : Collections.sort(fileDescs_); > Alternatively we could do the sorting during file metadata loading, so we w I'm experimenting with your suggestion and see that it would bring too much complexity into the code. We could maintain an invariant in the ContentFileStore that all the file descriptor lists are sorted, however in the case when we can't get the FDs directly from ContentFileStore (timetravel or predicate on partition cols) we get the FDs one by one and we'd need to sort the resulting lists anyway in planning time. So this idea would help with very basic queries onyl, but would bring too much complexity and an extra invariant we have to keep maintaining. So I propose to go with the original approach and let IcebergScanNode create a copy and sort the FDs itself. -- To view, visit http://gerrit.cloudera.org:8080/21267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Gerrit-Change-Number: 21267 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 11:54:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@ PS7, Line : (filename_value_ss.len == 0 && prev_channels.empty())); This is triggered when there is 2 consecutive rows in the delete file where the file_path is null, right? I wonder if the new test table also covers this. http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql@3957 PS7, Line 3957: iceberg_v2_null_delete_record Could you add some details about this table to the README? Would be nice to understand the content. http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455 PS7, Line 1455: if vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0: Would it make sense to test where we have DIRECTED mode to see that the KrpcDataStreamSender is capable of filtering the null file_paths? -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Apr 2024 07:47:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21267 ) Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/21267/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21267/1//COMMIT_MSG@15 PS1, Line 15: it > nit: its Done -- To view, visit http://gerrit.cloudera.org:8080/21267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Gerrit-Change-Number: 21267 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 09 Apr 2024 08:43:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
Hello Peter Rozsa, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21267 to look at the new patch set (#2). Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans .. IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans When a table is partitioned IcebergScanNode sorts the file descriptors for better scheduling. However, the list of file descriptors comes from IcebergContentFileStore and is shared between different select queries on the table. When another query tries to iterate the list of file descriptors and at the same time the IcebergScanNode sorts them we get a ConcurrentModificationException. To solve this IceberScanNode now creates its own copy of the file descriptor list not to interfere with other queries. Manual testing: 300-400 SELECT * Iceberg queries were sent into Impala in a loop that confidently reproduced the original issue. With the fix the issue is gone. The queries used for the repro: 1: select * from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a, functional_parquet.iceberg_partitioned_orc_external b where a.action = b.action and b.id=3; 2: select * from functional_parquet.iceberg_v2_equality_delete_schema_evolution; Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/21267/2 -- To view, visit http://gerrit.cloudera.org:8080/21267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Gerrit-Change-Number: 21267 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21258 ) Change subject: IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@9 PS2, Line 9: BROADCAST DIRECTED http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@13 PS2, Line 13: IcebergDeleteBuilder now also tolerates NULL file paths which are I'm wondering if there is a valid use case when a file path is null. Is it Impala or some other engine that writes such rows? -- To view, visit http://gerrit.cloudera.org:8080/21258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d Gerrit-Change-Number: 21258 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 09 Apr 2024 08:40:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21267 Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans .. IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans When a table is partitioned IcebergScanNode sorts the file descriptors for better scheduling. However, the list of file descriptors comes from IcebergContentFileStore and is shared between different select queries on the table. When another query tries to iterate the list of file descriptors and at the same time the IcebergScanNode sorts them we get a ConcurrentModificationException. To solve this IceberScanNode now creates an own copy of the file descriptor list not to interfere with other queries. Manual testing: 300-400 SELECT * Iceberg queries were sent into Impala in a loop that confidently reproduced the original issue. With the fix the issue is gone. The queries used for the repro: 1: select * from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a, functional_parquet.iceberg_partitioned_orc_external b where a.action = b.action and b.id=3; 2: select * from functional_parquet.iceberg_v2_equality_delete_schema_evolution; Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/21267/1 -- To view, visit http://gerrit.cloudera.org:8080/21267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3 Gerrit-Change-Number: 21267 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21249 ) Change subject: IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba Gerrit-Change-Number: 21249 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 08 Apr 2024 08:58:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21236 to look at the new patch set (#4). Change subject: IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables .. IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables Similarly to how regular tables behave, the nested columns are omitted when we do a SELECT * on Iceberg metadata tables and the user needs to turn EXPAND_COMPLEX_TYPES on to include the nested columns into the result. This patch changes this behaviour to unconditionally include the nested columns from Iceberg metadata tables. Note, the behavior of handling nested columns from regular tables doesn't change with this patch. Testing: - Adjusted the SELECT * metadata table queries to add the nested columns into the results. - Added some new tests where both metadata tables and regular tables were queried in the same query. Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 2 files changed, 103 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/21236/4 -- To view, visit http://gerrit.cloudera.org:8080/21236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Gerrit-Change-Number: 21236 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21236 to look at the new patch set (#2). Change subject: IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables .. IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables Similarly to how regular tables behave, the nested columns are omitted when we do a SELECT * on Iceberg metadata tables and the user needs to turn EXPAND_COMPLEX_TYPES on to include the nested columns into the result. This patch changes this behaviour to unconditionally include the nested columns from Iceberg metadata tables. Note, the behavior of handling nested columns from regular tables doesn't change with this patch. Testing: - Adjusted the SELECT * metadata table queries to add the nested columns into the results. - Added some new tests where both metadata tables and regular tables were queried in the same query. Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 2 files changed, 103 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/21236/2 -- To view, visit http://gerrit.cloudera.org:8080/21236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Gerrit-Change-Number: 21236 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21236 ) Change subject: IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables > The title could include "select *", otherwise it's not clear what this refe Done http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@9 PS1, Line 9: > Nit: should come after "behave". Done http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@14 PS1, Line 14: Note, the behavior of handling nested columns from regular tables > We could mention that although this is technically a breaking change, metad I don't think this is a breaking change because we don't have a release ATM that contains metadata querying. I checked that the parser and planner parts went into 4.3 but the executor part is waiting for 4.4 so we are safe with this change. http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@25 PS1, Line 25: $NAMENODE/test-warehou > Is it intentional that these are changed from "$NAMENODE" to "hdfs://localh it isn't, thanks for noticing. dockerised GVO also broke because of this http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1110 PS1, Line 1110: select readable_metrics.i.* from functional_parquet.iceberg_query_metadata.`files`; > This example can be confusing at first because the 'readable_metrics' struc I changed this to expand readable_metrics.i -- To view, visit http://gerrit.cloudera.org:8080/21236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Gerrit-Change-Number: 21236 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 04 Apr 2024 05:03:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12612: Expand complex type columns from Iceberg metadata tables
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21236 Change subject: IMPALA-12612: Expand complex type columns from Iceberg metadata tables .. IMPALA-12612: Expand complex type columns from Iceberg metadata tables Similarly to how regular tables, behave the nested columns are omitted when we do a SELECT * on Iceberg metadata tables and the user needs to turn EXPAND_COMPLEX_TYPES on to include the nested columns into the result. This patch changes this behaviour to unconditionally include the nested columns from Iceberg metadata tables. Note, the behavior of handling nested columns from regular tables doesn't change with this patch. Testing: - Adjusted the SELECT * metadata table queries to add the nested columns into the results. - Added some new tests where both metadata tables and regular tables were queried in the same query. Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 2 files changed, 103 insertions(+), 82 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/21236/1 -- To view, visit http://gerrit.cloudera.org:8080/21236 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02 Gerrit-Change-Number: 21236 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 03 Apr 2024 10:54:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 02 Apr 2024 15:29:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@822 PS6, Line 822: 1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq','PARQUET',0,1,1606,'{2147483546:215,2147483545:51}','{2147483546:1,2147483545:1}','{2147483546:0,2147483545:0}','NULL','{2147483546:null,2147483545:null}','{2147483546:null,2147483545:null}','NULL','NULL','NULL',NULL,'{"d":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"i":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"s":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null}}' shouldn't these also be regexes similarly as above because of the paths? -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 02 Apr 2024 14:34:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 10 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 02 Apr 2024 13:22:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc: http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@156 PS8, Line 156: jobject map_entry; Shouldn't we release 'map_entry' at the end of this function? http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270 PS8, Line 270: env->DeleteLocalRef(collection_scanner); I think we can leak memory if any of the RETURN_IF_ERROR or RETURN_IF_CANCELLED returns from the function. Would it be possible to wrap these globalrefs into some custom object that we write and then we can release the memory in the desctructor? http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270 PS8, Line 270: DeleteLocalRef Isn't 'collection_scanner' a GlobalRef? We call DeleteLocalRef here so I'm a bit confused :) http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@288 PS8, Line 288: env->DeleteLocalRef(item); Same comment about leaking memory -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 02 Apr 2024 11:35:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@92 PS4, Line 92: const ColumnType& type = slot_desc->type(); > no need to extract this into a variable My bad, it in fact is used. -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 02 Apr 2024 09:20:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@92 PS4, Line 92: const ColumnType& type = slot_desc->type(); no need to extract this into a variable -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 02 Apr 2024 09:19:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21219 ) Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types .. Patch Set 4: (3 comments) Thanks for the patch! I general this looks good. http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@328 PS4, Line 328: return heap_byte_buffer_cl_; I understand this will be needed for the permanent solution and not the NULLing, but I feel that a patch should contain only what it is required for that patch. Would it be possible to remove what is not needed now? http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@813 PS4, Line 813: file_path":"hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/ic not sure about this but for dockerised builds don't the file paths start with '/test-warehouse/' without 'hdfs://localhost:20500'? I think it's safer to have a regexp here http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@819 PS4, Line 819: One possible extra test case is to set the EXPAND_COMPLEX_TYPES true and do a select * on a metadata table. -- To view, visit http://gerrit.cloudera.org:8080/21219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203 Gerrit-Change-Number: 21219 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 02 Apr 2024 08:56:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21210 ) Change subject: IMPALA-12600: Schema evolution with equality delete files .. Patch Set 5: Code-Review+2 Did a rebase to resolve git conflicts. carry +2 -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 02 Apr 2024 07:59:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Hello Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21210 to look at the new patch set (#5). Change subject: IMPALA-12600: Schema evolution with equality delete files .. IMPALA-12600: Schema evolution with equality delete files This patch adds test coverage for a table that has equality delete files and also schema evolution, where the schema changes didn't affect the primary key columns. Note, partition evolution on tables with equality deletes is still not supported. Testing: - Added a new test table for this use-case and some E2E tests on that table. Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 --- M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test 27 files changed, 1,034 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/5 -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h: http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@63 PS4, Line 63: /// Note that it returns a GlobalRef, that has to be released explicitly. Probably I get this comment wrong, but I was searching for some code that releases whatever this function returns but didn't think anything relevant. http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@123 PS3, Line 123: VLOG(3) << "Skipping unsupported column type: " << slot_desc->type().type; Does this set Binary cols NULL now? http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125 PS4, Line 125: env->DeleteLocalRef(accessed_value); I think this 'accessed_value' is allocated one level above in the call chain, so I'd make that level responsible for the deallocation. http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@212 PS4, Line 212: if constexpr (IS_ARRAY) { Let's consider moving these DCHECKs into the same IF-ELSE at L228-234. http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@220 PS4, Line 220: const TupleDescriptor* item_tuple_desc = slot_desc->children_tuple_descriptor(); nit: DCHECK item_tuple_desc is not null? http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@236 PS4, Line 236: remaining_array_size name is misleading now that this is not just for arrays. remaining_items? http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@297 PS4, Line 297: const jobject* key_struct_like_row = key_slot_desc->type().IsStructType() : ? : nullptr; Can the key of a map be a struct? http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@761 PS3, Line 761: > Added this query (except for "['spark.app.id']"). Can't we do a map_col.KEY and map_col.VALUE for maps? Would be nice to have some test coverage for these if possible. -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 5 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Mar 2024 15:01:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#10). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M tests/custom_cluster/test_lineage.py 18 files changed, 388 insertions(+), 119 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/10 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21149 ) Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. Patch Set 9: After a rebase with master some Iceberg metadata table scanning tests broke. Apparently, this patch collides with the Array col querying from metadata tables resulting a Precondition check fail in TupleDescriptor.getNumNullBits(). The reason apparently, is that a struct is expected to be nullable. However, now with this change when in SlotDescriptor.setPath() I update the nullability flag, for metadata tables all the IcebergColumns will be not nullable. The IcebergColumns nullability comes from the Iceberg Schema for metadata tables from the Iceberg lib, and apparently e.g. partitions.partition is a struct but also not nullable and this gets caught byt the mentioned precondition. As a workaround I set all the metadata table cols nullable. Since we don't write these this causes no harm. -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 28 Mar 2024 08:34:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#9). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M tests/custom_cluster/test_lineage.py 18 files changed, 386 insertions(+), 119 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/9 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21125 ) Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table columns .. Patch Set 3: (4 comments) Thanks for the change, Dani! Frankly, reviewing these JNI changes isn't the easiest so I mostly focused on the tests so far. I'll keep reading the c++ implementation meanwhile. http://gerrit.cloudera.org:8080/#/c/21125/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java: http://gerrit.cloudera.org:8080/#/c/21125/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@163 PS3, Line 163: return iterator_.next(); nit: merge with line above http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@761 PS3, Line 761: Checked the Iceberg-Spark page for metadata tables: https://iceberg.apache.org/docs/latest/spark-queries/#snapshots This query seems interesting: select h.made_current_at, s.operation, h.snapshot_id, h.is_current_ancestor, s.summary['spark.app.id'] from prod.db.table.history h join prod.db.table.snapshots s on h.snapshot_id = s.snapshot_id order by made_current_at; Do you know if such a query is feasible now with our map implementation? http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@764 PS3, Line 764: # Describe all the metadata tables once entries table seems to have embedded nested type cols. Could you try to add test coverage for them? http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@765 PS3, Line 765: Can you add a test that queries different collections, arrays, maps? -- To view, visit http://gerrit.cloudera.org:8080/21125 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a Gerrit-Change-Number: 21125 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 16:57:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21210 ) Change subject: IMPALA-12600: Schema evolution with equality delete files .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 16:09:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21210 ) Change subject: IMPALA-12600: Schema evolution with equality delete files .. Patch Set 3: Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 16:07:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21210 ) Change subject: IMPALA-12600: Schema evolution with equality delete files .. Patch Set 3: PS3 is rebase with master + conflict resolution -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 16:07:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Hello Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21210 to look at the new patch set (#3). Change subject: IMPALA-12600: Schema evolution with equality delete files .. IMPALA-12600: Schema evolution with equality delete files This patch adds test coverage for a table that has equality delete files and also schema evolution, where the schema changes didn't affect the primary key columns. Note, partition evolution on tables with equality deletes is still not supported. Testing: - Added a new test table for this use-case and some E2E tests on that table. Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 --- M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test 27 files changed, 1,034 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/3 -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#8). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M tests/custom_cluster/test_lineage.py 15 files changed, 315 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/8 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21210 ) Change subject: IMPALA-12600: Schema evolution with equality delete files .. Patch Set 2: Thanks for taking a look, Zoli! Added a PlannerTest too. -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 27 Mar 2024 15:39:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Hello Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21210 to look at the new patch set (#2). Change subject: IMPALA-12600: Schema evolution with equality delete files .. IMPALA-12600: Schema evolution with equality delete files This patch adds test coverage for a table that has equality delete files and also schema evolution, where the schema changes didn't affect the primary key columns. Note, partition evolution on tables with equality deletes is still not supported. Testing: - Added a new test table for this use-case and some E2E tests on that table. Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 --- M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test 27 files changed, 1,034 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/2 -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21149 ) Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492 PS3, Line 492: if (isIcebergTable()) { > Thanks. Maybe in the future Iceberg options will not be a subset of Kudu op Done http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@915 PS3, Line 915: > Ok, it it's difficult we can leave it as it is. Well, Iceberg by design only throws RuntimeExceptions, and at the call sites, it's not that easy to identify what the issue was. I mean the error messages are good, but I don't want to make decisions based on the error message http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@890 PS5, Line 890: doubl > "double" would be better in the name. Done -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 27 Mar 2024 15:10:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#6). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M tests/custom_cluster/test_lineage.py 15 files changed, 315 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/6 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21210 Change subject: IMPALA-12600: Schema evolution with equality delete files .. IMPALA-12600: Schema evolution with equality delete files This patch adds test coverage for a table that has equality delete files and also schema evolution, where the schema changes didn't affect the primary key columns. Note, partition evolution on tables with equality deletes is still not supported. Testing: - Added a new test table for this use-case and some E2E tests on that table. Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 --- M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test 26 files changed, 975 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/1 -- To view, visit http://gerrit.cloudera.org:8080/21210 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395 Gerrit-Change-Number: 21210 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21179 ) Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 26 Mar 2024 10:49:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21149 ) Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java@181 PS3, Line 181: if the column has options th > What are these incompatible with? Could you clarify the name or add a comme Done http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180 PS3, Line 180: > Could this be an ELSE IF? Done http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180 PS3, Line 180: isNullable_ = icebergColumn.isNullable(); > We could add a comment like on L174, or extend that one. Done http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@482 PS3, Line 482: > Nit: 'tables have' would be better, now that we modify this comment. Done http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@490 PS3, Line 490: if ( > Nit: ELSE IF should go on the previous line. Done http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492 PS3, Line 492: if (isIcebergTable()) { > Why don't we need a similar clause for Iceberg? A comment could clarify. added a comment http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492 PS3, Line 492: if (isIcebergTable()) { > Shouldn't we include this condition and return false in the "isIcebergTable no. This is the case when the table is not Iceberg, not Kudu but still some Kudu column options are provided. The reason we don't have to check for Iceberg options here is that Iceberg options are subset of Kudu options. http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@572 PS3, Line 572: } else if (!primaryKeyColNames_.isEmpty() && !isPrimaryKeyUnique() > If non-unique primary keys are now supported in Iceberg, shouldn't we modif L507 is a bit different. There are 2 ways to provide primary keys: 1: CREATE TABLE (i int not null primary key, j int); 2: CREATE TABLE (i int not null, j int, primary key(i)); With this patch we only support 2). That is more flexible btw, because you can add multiple cols into the PK, while with 1) you can't. Changed the comment at L89. http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@577 PS3, Line 577: > Are not enforced and non unique PKs the same? isPrimaryKeyUnique_ stores the info if NOT UNIQUE was provided or not. Added a comment about this at L89. http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java File fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java: http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@174 PS3, Line 174: given columns > Could mention 'primaryKeyColumnNames'. Done http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@192 PS3, Line 192: > Nit: Field. Done http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2446 PS3, Line 2446: not enforced > Is this a breaking change if this test used to work without "not enforced"? well, previously Iceberg tables accepted primary key but it was simply ignored. I don't think this qualifies for a breaking change but let me know if you feel otherwise. Anyway this mimics the SQL syntax in Flink. http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test: http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@619 PS3, Line 619: QUERY > Could you add comments for the cases which don't have them? It would be inf Done
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#5). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M tests/custom_cluster/test_lineage.py 15 files changed, 313 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/5 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#4). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M tests/custom_cluster/test_lineage.py 15 files changed, 311 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/4 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21138 ) Change subject: IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21138 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e Gerrit-Change-Number: 21138 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Mar 2024 15:21:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE POSITION for TEXT and JSON tables crashes Impala
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21148 ) Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Mar 2024 09:43:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21179 ) Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 22 Mar 2024 09:41:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE POSITION for TEXT and JSON tables crashes Impala
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21148 ) Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zihao Ye Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Mar 2024 15:29:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21179 ) Change subject: IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1192 PS1, Line 1192: select * from iceberg_avro_format where rand() < 0.5; Isn't this flaky because of the rand()? Or is it not that random? :) http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1271 PS1, Line 1271: Would it make sense to involve some time travel too? -- To view, visit http://gerrit.cloudera.org:8080/21179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342 Gerrit-Change-Number: 21179 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 21 Mar 2024 15:13:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21138 ) Change subject: IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/21138/2/be/src/scheduling/schedule-state.h File be/src/scheduling/schedule-state.h: http://gerrit.cloudera.org:8080/#/c/21138/2/be/src/scheduling/schedule-state.h@107 PS2, Line 107: bool is_root_coord_fragment; now that we have 2 similar things 'is_root_coord_fragment' and 'fragment.must_run_on_coordinator' could you add a comment here to make more clear what this is for. http://gerrit.cloudera.org:8080/#/c/21138/2/common/thrift/Planner.thrift File common/thrift/Planner.thrift: http://gerrit.cloudera.org:8080/#/c/21138/2/common/thrift/Planner.thrift@51 PS2, Line 51: 15: required bool must_run_on_coordinator I know the order of the IDs is confusing now, but I feel this would be better at the end of this struct. http://gerrit.cloudera.org:8080/#/c/21138/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21138/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@93 PS2, Line 93: mustRunOnCoord_ nit: 'coordinatorOnly_' ? http://gerrit.cloudera.org:8080/#/c/21138/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@193 PS2, Line 193: outputPartition_.equals(DataPartition.UNPARTITIONED)); 'outputPartition_' is unconditionally set to UNPARTITIONED in this function so this check doesn't make much sense. Also, I;m not sure we want to make this a constructor param. Can we default it to false and set it true in a setMustRunOnCoord() or such? We can then assert on the 'outputPartition_' there. Or did you mean 'dataPartition_'? -- To view, visit http://gerrit.cloudera.org:8080/21138 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e Gerrit-Change-Number: 21138 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Mar 2024 15:04:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#3). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M tests/custom_cluster/test_lineage.py 15 files changed, 280 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/3 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21149 to look at the new patch set (#2). Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test 14 files changed, 272 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/2 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE POSITION for TEXT and JSON tables crashes Impala
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21148 ) Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test: http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test@159 PS1, Line 159: select file__position, year, month, id, date_string_col nit: could you add a comment that in this test we prune partitions that doesn't support virtual columns so the query can run successfully? http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test: http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test@40 PS1, Line 40: Virtual column FILE__POSITION is not supported Could you replace some of the FILE__POSITIONS to some other virtual columns like INPUT__FILE__NAME? -- To view, visit http://gerrit.cloudera.org:8080/21148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0 Gerrit-Change-Number: 21148 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 14 Mar 2024 16:58:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21149 Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables .. IMPALA-12729: Allow creating primary keys for Iceberg tables There are writer engines that use Iceberg's identifier-field-ids from the Iceberg schema to identify the columns to be written into the equality delete files (Flink, NiFi). So far Impala wasn't able to populate this identifier-field-ids. This patch introduces the support for not enforced primary keys for Iceberg tables, where the primary key is going to be used for setting identifier-field-ids during Iceberg schema creation. Example syntax: CREATE TABLE ice_tbl ( i int NOT NULL, j int, s string NOT NULL primary key(i, s) not enforced) PARTITIONED BY SPEC (truncate(10, s)) STORED AS ICEBERG; There are some constraints with primary keys (PK) following the behavior of Flink: - Only NOT NULL columns can be in the PK. - PK is not allowed in the column definition level like 'i int NOT NULL PRIMARY KEY'. - If the table is partitioned then the partition columns have to be part of the PK. - Float and double columns are not allowed for the PK. - Not allowed to drop a column that is used as a PK. Testing: - New E2E tests added for different table creation scenarios. - Manual test to use Nifi for writing into a table with PK. Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/jflex/sql-scanner.flex M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test 13 files changed, 261 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/1 -- To view, visit http://gerrit.cloudera.org:8080/21149 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6 Gerrit-Change-Number: 21149 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21139 to look at the new patch set (#4). Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables .. IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables This is a part 1 change that turns off the count(*) optimisations for V2 tables as there is a correctness issue with it. The reason is that Spark compaction may leave some dangling delete files that mess up the logic in Impala. Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test M tests/query_test/test_iceberg.py 32 files changed, 1,009 insertions(+), 244 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/4 -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit
[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21139 ) Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/21139/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21139/2//COMMIT_MSG@11 PS2, Line 11: u > Nit: mess (plural). Done http://gerrit.cloudera.org:8080/#/c/21139/2/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/21139/2/testdata/data/README@1093 PS2, Line 1093: iceberg_spark_compaction_with_dangling_delete: > Could you also provide the SQL commands? It would then be possible to exact Done -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 Gerrit-Change-Number: 21139 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Mar 2024 14:05:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21139 to look at the new patch set (#3). Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables .. IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables This is a part 1 change that turns off the count(*) optimisations for V2 tables as there is a correctness issue with it. The reason is that Spark compaction may leave some dangling delete files that mess up the logic in Impala. Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test M tests/query_test/test_iceberg.py 32 files changed, 1,009 insertions(+), 244 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/3 -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21139 ) Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21139/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21139/1//COMMIT_MSG@12 PS1, Line 12: logic > logic? Done http://gerrit.cloudera.org:8080/#/c/21139/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/21139/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1473 PS1, Line 1473: > nit: missing space Done -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 Gerrit-Change-Number: 21139 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Mar 2024 13:39:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21139 to look at the new patch set (#2). Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables .. IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables This is a part 1 change that turns off the count(*) optimisations for V2 tables as there is a correctness issue with it. The reason is that Spark compaction may leave some dangling delete files that messes up the logic in Impala. Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test M tests/query_test/test_iceberg.py 32 files changed, 998 insertions(+), 244 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/2 -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21139 Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables .. IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables This is a part 1 change that turns off the count(*) optimisations for V2 tables as there is a correctness issue with it. The reason is that Spark compaction may leave some dangling delete files that messes up the login in Impala. Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test M tests/query_test/test_iceberg.py 6 files changed, 408 insertions(+), 244 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/1 -- To view, visit http://gerrit.cloudera.org:8080/21139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237 Gerrit-Change-Number: 21139 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables
Gabor Kaszab has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21061 ) Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables .. IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables This commit adds support for reading ARRAY columns inside Iceberg Metadata tables. The change starts with some refactoring, to consolidate accessing JVM through JNI a new backend class was introduced, IcebergMetadataScanner. This class is the C++ part of the Java IcebergMetadataScanner, it is responsible to manage the Java scanner object. In Iceberg array types do not have accessors, so structs inside arrays have to be accessed by position, for the value obtaining logics have been changed to allow access by position. The IcebergRowReader needed an IcebergMetadataScanner, so that it can iterate over the arrays returned by the scanner and add them to the collection. This change will not cover MAP, these slots are set to NULL, it will be done in IMPALA-12611. Testing: - Added E2E tests. Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Reviewed-on: http://gerrit.cloudera.org:8080/21061 Tested-by: Impala Public Jenkins Reviewed-by: Gabor Kaszab --- M be/src/exec/iceberg-metadata/CMakeLists.txt M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h A be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc A be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M be/src/service/impalad-main.cc M fe/src/main/java/org/apache/impala/analysis/FromClause.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 11 files changed, 734 insertions(+), 292 deletions(-) Approvals: Impala Public Jenkins: Verified Gabor Kaszab: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 13 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21061 ) Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 12 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 05 Mar 2024 14:56:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21099 ) Change subject: IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations .. Patch Set 2: Code-Review+2 Thanks for the answers, Zoltan! -- To view, visit http://gerrit.cloudera.org:8080/21099 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c Gerrit-Change-Number: 21099 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 05 Mar 2024 09:50:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21099 ) Change subject: IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21099/2/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test: http://gerrit.cloudera.org:8080/#/c/21099/2/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test@421 PS2, Line 421: BUFFERED DELETE FROM ICEBERG [functional_parquet.iceberg_v2_partitioned_position_deletes-POSITION-DELETE] I'm wondering if getting rid of DELETE FROM ICEBERG in this patch is how tightly related to the issue we are about to solve with this patch. For me it seems kind of a separate stuff at first sight. If you say that this holds low or non-existing risk then I'm fine keeping the patch as is now, I just want to make sure that we won't introduce any further risks with fixing one issue. e.g. perf degradation with switching from DeleteSink to BufferedDeleteSink http://gerrit.cloudera.org:8080/#/c/21099/2/tests/stress/test_update_stress.py File tests/stress/test_update_stress.py: http://gerrit.cloudera.org:8080/#/c/21099/2/tests/stress/test_update_stress.py@279 PS2, Line 279: # Prepare INSERT statement of 'num_rows' records. nit: I meant to add it to a comment that this code prefers the INSERT in such a way that all the rows will be added in a single insert to have a single file (opposed to the similar part of the test above that adds a separate file for each row). -- To view, visit http://gerrit.cloudera.org:8080/21099 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c Gerrit-Change-Number: 21099 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 04 Mar 2024 20:24:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21099 ) Change subject: IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations .. Patch Set 1: (7 comments) great job with this quick fix, Zoltan! I added some nits but in general the patch is good. http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h File be/src/runtime/dml-exec-state.h: http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h@135 PS1, Line 135: const std::vector& ReferencedDataFilesByPositionDeletes() { since it returns const ref, can the function also be const? http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h@170 PS1, Line 170: referenced_data_files_by_position_deletes_ nit: data_files_referenced_by_position_deletes http://gerrit.cloudera.org:8080/#/c/21099/1/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/21099/1/common/protobuf/control_service.proto@115 PS1, Line 115: referenced_data_files_by_position_deletes nit: data_files_referenced_by_position_deletes? http://gerrit.cloudera.org:8080/#/c/21099/1/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/21099/1/common/thrift/CatalogService.thrift@249 PS1, Line 249: referenced_data_files_by_position_deletes nit: data_files_referenced_by_position_deletes maybe? http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py File tests/stress/test_update_stress.py: http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@269 PS1, Line 269: test_concurrent_deletes_and_updates nit: this table name omits optimize. test_concurrent_write_and_optimize? http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@277 PS1, Line 277: for i in range(num_rows): Could you add a comment that here you prepare the INSERT query so that these rows can be put into the same file? http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@283 PS1, Line 283: flag I know we've been using this variable name for a while now, but I feel that the name doesn't suggest anything what it is used for. Can we rename this 'all_rows_deleted' or such? -- To view, visit http://gerrit.cloudera.org:8080/21099 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c Gerrit-Change-Number: 21099 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 04 Mar 2024 12:15:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21061 ) Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 8 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Feb 2024 15:29:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20951 ) Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. Patch Set 11: Code-Review+2 carry +2 from Zoltan -- To view, visit http://gerrit.cloudera.org:8080/20951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 Gerrit-Change-Number: 20951 Gerrit-PatchSet: 11 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Feb 2024 15:17:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20951 ) Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/20951/9/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/20951/9/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1507 PS9, Line 1507: > typo: field Done -- To view, visit http://gerrit.cloudera.org:8080/20951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 Gerrit-Change-Number: 20951 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Feb 2024 15:16:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Hello Andrew Sherman, Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20951 to look at the new patch set (#10). Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality field id lists for Iceberg tables This patch adds support for reading Iceberg tables that have different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink and NiFi was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro A
[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21061 ) Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/21061/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21061/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@250 PS7, Line 250: VLOG(3) << "Skipping unsupported column type: " << type.type; ahh, I just noticed that we just write a log line if there is an unsupported column type. Shouldn't we run into a DCHECK failure or such in this case? http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@715 PS6, Line 715: # Describe all the metadata tables once > Ok, I'm going to add support for MAPs probably soon anyway. But please ment Expand complex types would come after we have MAP support anyway. So I still think that giving an error instead of a null would make sense if it's not that complex to implement. -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 7 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Feb 2024 14:41:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21061 ) Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/21061/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21061/6//COMMIT_MSG@18 PS6, Line 18: alue nit: value http://gerrit.cloudera.org:8080/#/c/21061/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h File be/src/exec/iceberg-metadata/iceberg-row-reader.h: http://gerrit.cloudera.org:8080/#/c/21061/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h@21 PS6, Line 21: #include "exec/scan-node.h" I think even including scan-node.h in the header could be avoided by forward declarations http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@222 PS3, Line 222: if (ExecNode::EvalConjuncts(scan_node_->conjunct_evals().data(), > I just realised that with the current implementation it won't be possible t well, if this code doesn't work then let's remove it. What would be needed to make this work? FROM clause collections, and predicates on the ITEM, right? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@248 PS3, Line 248: } case TYPE_STRUCT: { > Yes, but I think it helps to understand what is happening here. But then we return nullptr for TYPE_STRUCT that could cause issues on the caller side. If this method is not expected to be called for TYPE_STRUCT then it's cleaner to get rid of that case. http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java File fe/src/main/java/org/apache/impala/analysis/FromClause.java: http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java@169 PS3, Line 169: throw new AnalysisException("Querying collection types (ARRAY/MAP) in FROM " + > Adding Jira: IMPALA-12853 Thanks for creating a Jira! Could you also mention the Jira ID here in a comment? http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@624 PS6, Line 624: '/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq','NULL' Could you also add the content col to the select list so that we can see which files are eq-delete files. Would make it easier to verify that only eq-delete files (content=2) have eq-id list http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@663 PS6, Line 663: select null_value_counts from functional_parquet.iceberg_query_metadata.all_files; This seems a duplicate test with the one on L715 http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@715 PS6, Line 715: select null_value_counts from functional_parquet.iceberg_query_metadata.all_files; Shouldn't we get a not supported error when querying map columns from a metadata table? Returning nulls might be misleading -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 6 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Feb 2024 11:33:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21061 ) Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables .. Patch Set 3: (14 comments) I managed to check the c++ part too http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@58 PS3, Line 58: jobject& result This should be pointer as Daniel said http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@93 PS3, Line 93: /// Accessor map for the scan result, pairs the slot ids with the java Accessor objects. This is not a map of accessors now but a slot ID to field ID mapping. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@103 PS3, Line 103: Accessor collection I think this comment is off now. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@109 PS3, Line 109: const SlotDescriptor* Don't we use const ref types for immutable params instead of const pointers? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@113 PS3, Line 113: SlotDescriptor* const ref instead http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@88 PS3, Line 88: Status IcebergMetadataScanner::InitSlotIdFieldIdMap(JNIEnv* env) { Could you help me understand if this functiondoes or should go into collections, e.g. get the field ids for struct members inside collections? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@145 PS3, Line 145: clazz I saw this param name elsewhere too, but not sure if this is something conventional or if it makes something, but seems a weird name for me. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@155 PS3, Line 155: Primitive inside an ARRAY Can we DCHECK on this fact? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h File be/src/exec/iceberg-metadata/iceberg-row-reader.h: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@22 PS3, Line 22: #include "exec/iceberg-metadata/iceberg-metadata-scanner.h" I think we can get rid of this include with a forward declaration. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@114 PS3, Line 114: env->DeleteLocalRef(accessed_value); I'm a bit lost with these refs, but aren't local refs released automatically when going out from the scope? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@222 PS3, Line 222: if (ExecNode::EvalConjuncts(scan_node_->conjunct_evals().data(), Could you add a test that exercises the path where EvalConjuncts is false? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@237 PS3, Line 237: For nit: I'd fing 'From' instead of 'For' easier to understand http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@248 PS3, Line 248: } case TYPE_STRUCT: { Would it work to simply remove this case? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@252 PS3, Line 252: return list_cl_; Is this java.lang.util.List? Please add a comment similarly to the other types -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 3 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 27 Feb 2024 16:43:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21061 ) Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables .. Patch Set 3: (9 comments) Thanks for the patch, Tamas! I checked the tests and the Java part. Will take a deeper look at the rest tomorrow. http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG@18 PS3, Line 18: the the nit: duplicate 'the' http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@21 PS3, Line 21: #include "exec/scan-node.h" Can we not include scan-node.h? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/service/impalad-main.cc File be/src/service/impalad-main.cc: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/service/impalad-main.cc@32 PS3, Line 32: #include "exec/iceberg-metadata/iceberg-metadata-scan-node.h" This include can be removed I think http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java File fe/src/main/java/org/apache/impala/analysis/FromClause.java: http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java@169 PS3, Line 169: tblRef.getDesc().getType().isMapType()) { If I'm not mistaken, this check is for the case when a collection is provided in the FROM clause. If this use case is deliberately supported then please add test coverage too. http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java: http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@116 PS3, Line 116: StructLke typo http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@137 PS3, Line 137: private Iterator iterator; If we store the iterator as a member, is there a guarantee that the underlying array is not destroyed? http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@146 PS3, Line 146: return iterator.next(); nit: could fit into the line above http://gerrit.cloudera.org:8080/#/c/21061/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21061/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@567 PS3, Line 567: # Test 11 : Query ARRAY type columns The test above is also "Test 11". I don't think that giving a number of these tests make sense, it just makes it complicated to maintain and to add anything to the middle. http://gerrit.cloudera.org:8080/#/c/21061/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@591 PS3, Line 591: row_regex:'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/.*','NULL' The name of the files in this table is fix, so we could spell the full path out in the expected results, in my opinion. -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 3 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 26 Feb 2024 16:22:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1261 PS6, Line 1261: // Test that even if we have privileges on a different table in the same db, we : // still can't access 'iceberg_query_metadata' if we don't > Added a comment to clarify. thanks! http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py File tests/authorization/test_authorization.py: http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@129 PS6, Line 129: # Make sure there is no privilege stuck from the previous execution of the test. : admin_client.execute("revoke {priv} on database {db} from user {user}".format( : priv=priv, db=unique_db, user=user), user=ADMIN) > Interestingly if I delete the database but do not drop the privileges next I think if we can't justify the existence of some code then that code shouldn't exist. What I can think of is that when that other test was being written in some intermediate state there were test issues resulting that the DB or the privilege remained and hence this extra safety net. I don't think that this can happen with this test. I saw in other tests that creating the DB is in an outer try-finally block while granting the privileges is in an internal try-finally (I recall I also wrote something similar not that long ago). Note that dropping the DB will not drop the privileges associated with it so once you recreate the DB with the same name the old privileges will be there. Hence the need for an extra step to drop the privileges too (from Ranger). http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@137 PS6, Line 137: assert str(exc1).strip() + non_existent_suffix == str(exc2).strip() > The error message contains the username (which depends on where the test is We can then assert on "AuthorizationException" and "does not have privileges to access" are part of the error message. -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 22 Feb 2024 16:15:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20951 ) Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/20951/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20951/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@520 PS5, Line 520:*/ > Nit: If you made equalityDeletesRecordCount_ a parameter you could write a Done http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/data/README@850 PS5, Line 850: primary key(i) > Impala doesn't do anything with this clause, right? ATM you are right, however I have a working PoC for IMPALA-12729 where I can populate the identifier-field-ids with primary keys so for this test I used that dev Impala to create the table. Since Nifi is not able to create tables, I did't want to involve 3 engines for this test: 1) create with Flink 2) populate with Nifi 3) query with Impala. http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1583 PS5, Line 1583: cardinality=8 > Would be nice to fix IMPALA-11797 later to get better cardinalities. Or, if I opened a jira for eq-deletes similarly to IMPALA-12371: IMPALA-12826 -- To view, visit http://gerrit.cloudera.org:8080/20951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 Gerrit-Change-Number: 20951 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 22 Feb 2024 13:31:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Hello Andrew Sherman, Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20951 to look at the new patch set (#7). Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality field id lists for Iceberg tables This patch adds support for reading Iceberg tables that have different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink and NiFi was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro A
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Hello Andrew Sherman, Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20951 to look at the new patch set (#6). Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality field id lists for Iceberg tables This patch adds support for reading Iceberg tables that have different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink and NiFi was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro A
[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21026 ) Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/main/cup/sql-parser.cup@2936 PS6, Line 2936: ident_or_default:tbl nit: this fits into the line above similarly to L2933 http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@2044 PS6, Line 2044: ParserError("SHOW TABLES IN db.tbl"); Can you add a test here where you don't provide a full path just the table name for SHOW METADATA TABLES? Or if it'd be an Analysis error could you add it to AnalyzeDDLTest? http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java File fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java: http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1261 PS6, Line 1261: test.error(accessError("functional_parquet.iceberg_query_metadata"), : onTable("functional_parquet", "alltypes", privilege)); I admit I'm not really familiar with these tests but I don't get this part where we expect access error on 'iceberg_query_metadat' but we also have this onTable on 'alltypes.' http://gerrit.cloudera.org:8080/#/c/21026/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/21026/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@857 PS6, Line 857: show metadata tables in functional_parquet.iceberg_query_metadata; Just a note, that this test could break when we bump the Iceberg version and the new version has some new metadata tables. Not much we can do about this. http://gerrit.cloudera.org:8080/#/c/21026/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@879 PS6, Line 879: show metadata tables in functional_parquet.alltypestiny; Could you try something similar on a view that is on top of an Iceberg table? http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py File tests/authorization/test_authorization.py: http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@123 PS6, Line 123: admin_client.execute("drop database if exists {} cascade".format(unique_db), : user=ADMIN) : admin_client.execute("create database {}".format(unique_db), user=ADMIN) I'm wondering what is the case when this db exists already. We drop this in the finally section so it couldn't be remaining from previous failed tests. http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@129 PS6, Line 129: # Make sure there is no privilege stuck from the previous execution of the test. : admin_client.execute("revoke {priv} on database {db} from user {user}".format( : priv=priv, db=unique_db, user=user), user=ADMIN) Similarly to above, there can't be any remaining privs from previous tests since you remove priv in the finally block. http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@137 PS6, Line 137: assert str(exc1).strip() + non_existent_suffix == str(exc2).strip() Can't you assert on the actual error string? -- To view, visit http://gerrit.cloudera.org:8080/21026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49 Gerrit-Change-Number: 21026 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 22 Feb 2024 08:28:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20951 ) Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. Patch Set 5: PS5 is a rebase with master and some conflict resolution -- To view, visit http://gerrit.cloudera.org:8080/20951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 Gerrit-Change-Number: 20951 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 08 Feb 2024 16:26:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Hello Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20951 to look at the new patch set (#5). Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality field id lists for Iceberg tables This patch adds support for reading Iceberg tables that have different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink and NiFi was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m1.avro A
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20951 ) Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. Patch Set 4: (12 comments) Thanks for taking a look Tamas and Zoli! About your question, Tamas: the eq IDs were stored on a ContentFileStore level when there was this restriction that all the eq-delete files have to have the same eq ID lists. However, with this change I had to move them from the ContentFileStore to a per FileDescriptor level, and that is stored in FB. So it's not a question of Thrift vs FB rather on what abstraction level we want to store these IDs. http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@7 PS3, Line 7: d > nit: d Done http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@9 PS3, Line 9: hav > nit: have Done http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs File common/fbs/IcebergObjects.fbs: http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs@53 PS3, Line 53: equality_field_ids > I saw this in multiple places, the new term became equality_fiel_ids, in ma Flink refers to these IDs as equality field IDs, while the Iceberg spec as equality IDs. Also in the vxmetadata.json in the Iceberg metadata this is called 'identifier-field-ids' so even Iceberg is not consistent with the naming. I think they can be used as synonyms, and I believe I spelled the whole 'equalityFiledID' out in most of the cases, while where I found a variable name too long I decided to omit the 'field' part. I feel that both naming are straightforward to understand. http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@126 PS3, Line 126: private Set allEqualityFieldIds_ = new HashSet<>(); : private Map, Set> equalityIdsToDeleteFiles_ = : new HashMap<>(); > nit: Here we use both equality ids and equality field ids. well, I found equalityFieldIdsToDeleteFiles_ too long for a member name so decided to shorten it a bit. It's still easy to understand what we mean by equalityIds I think. http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@136 PS3, Line 136: field > typo: field Done http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@510 PS3, Line 510: > nit: D Done http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@520 PS3, Line 520: > This way [11, 12] would come earlier than [2, 3]. It shouldn't make much di With this ordering I figured that the plan for a query should be deterministic and we shouldn't see a change in the order of the JOINs for the eq-deletes. For planner tests this would eliminate flakiness for example. So [11] coming before [2] isn't really an issue, only looks strange. Anyway, you're right, a custom Comparator makes sense here and then I won't need to do these string conversions back and forth. http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/data/README@884 PS3, Line 884: 4-Impala: > You also mention NiFi in the commit message, but here I don't see any refer When I created these table Nifi's PutIcebergCDC processor wrote all the columns into the eq-delete files so for these tests I only had Flink. However, since then there is a new version of the mentioned Nifi processor that writes onlt the identifier fields (PKs) into the eq-deletes so I can rewrite now one of the tables to use Nifi as the writer. http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/datasets/functional/functional_schema_template.sql@3643 PS3, Line 3643: > Do we need this? This is the default for Impala. I don't think this is needed indeed. Some other tables has this as well, removed those too. http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1396 PS3, Line 1396: ns=1/1 files= > All EQ delete group has cardinality=2, which means we don't have test for t Reworked iceberg_v2_delete_equality_multi_eq_ids to serve this
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables
Hello Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20951 to look at the new patch set (#4). Change subject: IMPALA-12598: Allow multiple equality field id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality field id lists for Iceberg tables This patch adds support for reading Iceberg tables that have different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink and NiFi was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m1.avro A
[Impala-ASF-CR] IMPALA-12072: Include snapshot id of Iceberg tables in query plan / profile
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20204 ) Change subject: IMPALA-12072: Include snapshot id of Iceberg tables in query plan / profile .. Patch Set 14: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20204 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee0b4967429ea733729ad8e44df32e3b24b88525 Gerrit-Change-Number: 20204 Gerrit-PatchSet: 14 Gerrit-Owner: Gergely Fürnstáhl Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 07 Feb 2024 11:42:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12072: Include snapshot id of Iceberg tables in query plan / profile
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20204 ) Change subject: IMPALA-12072: Include snapshot id of Iceberg tables in query plan / profile .. Patch Set 13: (3 comments) Thanks for the change Zoli and Gergo! In general I think the patch is good, I just had some nits. http://gerrit.cloudera.org:8080/#/c/20204/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/20204/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1327 PS13, Line 1327: runPlannerTestFile("iceberg-v2-update", "functional_parquet", I see you added the snapshot ids to the .test file. Shouldn't you also add the VALIDATE_ICEBERG_SNAPSHOT_IDS option here? Same goes for 'insert-sort-by-zorder' and 'iceberg-predicates'. http://gerrit.cloudera.org:8080/#/c/20204/13/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test File testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test: http://gerrit.cloudera.org:8080/#/c/20204/13/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test@172 PS13, Line 172: | Per-Host Resources: mem-estimate=20.00MB mem-reservation=4.01MB thread-reservation=2 nit: I see some changes in this test that are irrelevant for this patch. http://gerrit.cloudera.org:8080/#/c/20204/13/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test@356 PS13, Line 356: | tuple-ids=0 row-size=36B cardinality=2 Same as above with these cardinality numbers -- To view, visit http://gerrit.cloudera.org:8080/20204 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee0b4967429ea733729ad8e44df32e3b24b88525 Gerrit-Change-Number: 20204 Gerrit-PatchSet: 13 Gerrit-Owner: Gergely Fürnstáhl Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Feb 2024 15:42:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12787: Concurrent DELETE and UPDATE operations on Iceberg tables can be problematic
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20999 ) Change subject: IMPALA-12787: Concurrent DELETE and UPDATE operations on Iceberg tables can be problematic .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e581ea17fa8f6ccd9c87aaad1281bb694079f6e Gerrit-Change-Number: 20999 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 06 Feb 2024 13:11:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 26 Jan 2024 22:04:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20954 ) Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error .. Patch Set 1: Code-Review+1 (3 comments) Thanks for the quick fix, Zoltan! In general the patch seems fine, I only have some nits. http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc File be/src/exec/iceberg-delete-sink-base.cc: http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc@91 PS1, Line 91: DCHECK(!IsTimestamp(scalar_type)); I don't think we need these DCHECKs here since you give an error from the below if anyway. http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java: http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@21 PS1, Line 21: import org.apache.impala.catalog.PrimitiveType; Is this used? http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@562 PS1, Line 562: transformParam), Type.fromTScalarType(field.getType(; nit: For me it was a bit misleading that this param got into the same line as the end of IcebergPartitionTransform constructor call. Maybe in a new line it would express better for the reader that it belongs to the IcebergPartitionField creation. -- To view, visit http://gerrit.cloudera.org:8080/20954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8 Gerrit-Change-Number: 20954 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Comment-Date: Fri, 26 Jan 2024 11:37:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables
Hello Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20951 to look at the new patch set (#3). Change subject: IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables This patch adds support for reading Iceberg tables that has different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. - Also did some experiments creating test tables using NiFi. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro A
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20951 to look at the new patch set (#2). Change subject: IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables This patch adds support for reading Iceberg tables that has different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. - Also did some experiments creating test tables using NiFi. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro A
[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20951 Change subject: IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables .. IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables This patch adds support for reading Iceberg tables that has different equality field ID lists associated to different equality delete files. In practice this is a use case when one equality delete file deletes by e.g. columnA and columnB while another one deletes by columnB and columnC. In order to achieve such functionality the plan tree creation needed some adjustments so that it can create separate LEFT ANTI JOIN nodes for the different equality field ID lists. Testing: - Flink was used for creating some test tables with the desired equality field IDs. Coverage on these tables are added to the test suite. - Also did some experiments creating test tables using NiFi. Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79 --- M common/fbs/IcebergObjects.fbs M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/data/README D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro A
[Impala-ASF-CR] IMPALA-12673: Table migration fails if partition contains '/'
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20845 Change subject: IMPALA-12673: Table migration fails if partition contains '/' .. IMPALA-12673: Table migration fails if partition contains '/' Due to Iceberg #7612 migrating a table to Iceberg resulted in incorrect data and stats if some of the string partition fields contained '/' character. As a result we deliberately rejected migrating such tables. Now that Impala uses an Iceberg version that has the fix we can allow migrating such tables too. Change-Id: I05b4ca44c7edb81cee6747f83a5bd82c5a4b5c44 --- M fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test 2 files changed, 8 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/20845/1 -- To view, visit http://gerrit.cloudera.org:8080/20845 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I05b4ca44c7edb81cee6747f83a5bd82c5a4b5c44 Gerrit-Change-Number: 20845 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 11 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Dec 2023 07:29:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 10 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Dec 2023 13:11:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py@1264 PS8, Line 1264: : @SkipIfDockerizedCluster.internal_hostname : @SkipIf.hardcoded_uris : def test_multiple_equality_ids(self, unique_database): : """This test loads an Iceberg table that has 2 equality delete files with different : equality ID lists. A query on such a table fails due to lack of support.""" : SRC_DIR = os.path.join(os.environ['IMPALA_HOME'], : "testdata/data/iceberg_test/hadoop_catalog/ice/" : "iceberg_v2_delete_different_equality_ids") : DST_DIR = "/test-warehouse/iceberg_test/hadoop_catalog/ice/" \ : "iceberg_v2_delete_different_equality_ids" > I see, thanks for giving it a try. I think this would add some extra complexity for this test but the gain wouldn't be that much because this is an error-case anyway. -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 11 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Dec 2023 12:33:36 + Gerrit-HasComments: Yes