from:"Gabor Kaszab \(Code Review\)"

[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes

2024-04-24 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21348 )

Change subject: IMPALA-13029: Tests for multi format equality deletes
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21348/1/testdata/data/README@1193
PS1, Line 1193:set tblproperties ('write.format.default'='avro');
> Would it be possible to do schema evolution + Avro delete files? I.e. using
not with schema evolution but with some hacking I created another test table to 
have avro eq-deletes with different eq-IDs.



--
To view, visit http://gerrit.cloudera.org:8080/21348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 24 Apr 2024 11:53:08 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes

2024-04-24 Thread Gabor Kaszab (Code Review)

Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21348

to look at the new patch set (#2).

Change subject: IMPALA-13029: Tests for multi format equality deletes
..

IMPALA-13029: Tests for multi format equality deletes

So far we only had test coverage for Parquet equality deletes. This
patch adds new tests where we have equality deletes also in ORC and AVRO.

Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
---
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-10-937fe984-34d8-4351-a419-842ad9d30758-2.orc
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-11-1bd90c11-bab5-4f53-9cd9-7bf85adaa97a-2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-9-77e491ad-7b25-4b81-b10a-53b383ae0355-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/61480fd29dfdfefb-48a79ca5_1414128186_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/1dd8f48c-c2ef-4239-8fb2-25bcbef7026c-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/454a3281-55e9-4e00-a8a2-fac5c23ff043-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/61ebd3da-da91-443e-9413-2a010f77443b-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/dab4e396-8cc2-4f57-b856-3864822ab5d3-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1904885991593677469-1-1dd8f48c-c2ef-4239-8fb2-25bcbef7026c.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1935861967137943703-1-454a3281-55e9-4e00-a8a2-fac5c23ff043.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-4400093814370842303-1-61ebd3da-da91-443e-9413-2a010f77443b.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-8244791200683984727-1-dab4e396-8cc2-4f57-b856-3864822ab5d3.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v6.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v7.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/data/0-8-2283e34a-7899-47f8-87c5-9cf060a1736c-2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/data/0-9-a93dc3c5-a017-41f9-adea-44b59e71dcd7-2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/data/0a47c8ccf4f25b0c-5ab55354_1092617260_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/224885a4-3e53-430d-88a2-3a91a7705c89-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/2ab761d1-2a51-4432-aac4-d48af6a40b0c-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/53cb7c86-e535-4516-8d16-5b2a228ac8a0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/snap-2698672561374065751-1-2ab761d1-2a51-4432-aac4-d48af6a40b0c.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/snap-792611178233942655-1-224885a4-3e53-430d-88a2-3a91a7705c89.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/snap-8134794228158837818-1-53cb7c86-e535-4516-8d16-5b2a228ac8a0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multiple_avro_equality_deletes/metadata/v2.metadata.json
A

[Impala-ASF-CR] IMPALA-13029: Tests for multi format equality deletes

2024-04-23 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21348


Change subject: IMPALA-13029: Tests for multi format equality deletes
..

IMPALA-13029: Tests for multi format equality deletes

So far we only had test coverage for Parquet equality deletes. This
patch adds new tests where we have equality deletes also in ORC and AVRO.

Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
---
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-10-937fe984-34d8-4351-a419-842ad9d30758-2.orc
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-11-1bd90c11-bab5-4f53-9cd9-7bf85adaa97a-2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/0-9-77e491ad-7b25-4b81-b10a-53b383ae0355-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/data/61480fd29dfdfefb-48a79ca5_1414128186_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/1dd8f48c-c2ef-4239-8fb2-25bcbef7026c-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/454a3281-55e9-4e00-a8a2-fac5c23ff043-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/61ebd3da-da91-443e-9413-2a010f77443b-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/dab4e396-8cc2-4f57-b856-3864822ab5d3-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1904885991593677469-1-1dd8f48c-c2ef-4239-8fb2-25bcbef7026c.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-1935861967137943703-1-454a3281-55e9-4e00-a8a2-fac5c23ff043.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-4400093814370842303-1-61ebd3da-da91-443e-9413-2a010f77443b.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/snap-8244791200683984727-1-dab4e396-8cc2-4f57-b856-3864822ab5d3.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v6.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/v7.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_multi_format_equality_deletes/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-equality-deletes.test
M tests/query_test/test_iceberg.py
25 files changed, 892 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/21348/1
--
To view, visit http://gerrit.cloudera.org:8080/21348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f0ebf7f4d401877741eb3e1c990f1318ac2b4ba
Gerrit-Change-Number: 21348
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab

[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-23 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Tue, 23 Apr 2024 09:06:59 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-23 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 23 Apr 2024 08:54:29 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87
PS1, Line 87:   if (desc.hdfsTable.isSetAvroSchema()) {
I guess the issue is also true for AVRO equality delete files. Should we also 
add test coverage for that? (could be separate patch)


http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test:

http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92
PS1, Line 92: 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*'
there should be 2 ORC data files in the j_trunc=2, right? One for (2,2) and one 
for (3,3). You only check for 1 of such line.



--
To view, visit http://gerrit.cloudera.org:8080/21301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 22 Apr 2024 15:20:10 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested in complex types in select list

2024-04-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21269 )

Change subject: IMPALA-12973,IMPALA-11491,IMPALA-12651: Support BINARY nested 
in complex types in select list
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21269/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@227
PS7, Line 227:   RETURN_IF_ERROR(GuardType::create(env, jbuffer, 
_guard));
if create() returned an error here we'd leak memory because of 'jbuffer', right?


http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21269/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@804
PS7, Line 804: select data_file from 
functional_parquet.iceberg_query_metadata.entries;
The query should go into the QUERY section



--
To view, visit http://gerrit.cloudera.org:8080/21269
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7b1d7fa332a901f05a46e0199e13fb841d2687c2
Gerrit-Change-Number: 21269
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Mon, 22 Apr 2024 10:46:09 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-11 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Apr 2024 16:10:37 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-11 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455
PS7, Line 1455: if 
vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0:
> We only test DIRECTED mode + V2 operator (which go in hand with each other)
Here the code says that "if we don't disable the V2 read optimisations then 
skip the test" or in other words "if we have V2 read opts then skip".
I might misread this code or we don't test the DIRECTED mode here only the old 
ANTI JOIN case.



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Apr 2024 09:02:43 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs

2024-04-11 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21285 )

Change subject: IMPALA-12991: Eliminate unnecessary SORT for Iceberg DELETEs
..


Patch Set 1: Code-Review+2

(1 comment)

Thanks! +2 with a nit

http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21285/1//COMMIT_MSG@9
PS1, Line 9: useing
nit: using



--
To view, visit http://gerrit.cloudera.org:8080/21285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94a691e7990228a1ec2de03e6ad90ebb97931581
Gerrit-Change-Number: 21285
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 11 Apr 2024 07:19:13 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

2024-04-10 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21267 )

Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg 
table scans
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21267/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@107
PS1, Line 107:   fileDescs_ = new ArrayList<>(fileDescs_);
 :   Collections.sort(fileDescs_);
> Alternatively we could do the sorting during file metadata loading, so we w
I'm experimenting with your suggestion and see that it would bring too much 
complexity into the code. We could maintain an invariant in the 
ContentFileStore that all the file descriptor lists are sorted, however in the 
case when we can't get the FDs directly from ContentFileStore (timetravel or 
predicate on partition cols) we get the FDs one by one and we'd need to sort 
the resulting lists anyway in planning time.
So this idea would help with very basic queries onyl, but would bring too much 
complexity and an extra invariant we have to keep maintaining.

So I propose to go with the original approach and let IcebergScanNode create a 
copy and sort the FDs itself.



--
To view, visit http://gerrit.cloudera.org:8080/21267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Gerrit-Change-Number: 21267
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 11:54:34 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-10 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/21258/7/be/src/runtime/krpc-data-stream-sender.cc@
PS7, Line : (filename_value_ss.len == 0 && 
prev_channels.empty()));
This is triggered when there is 2 consecutive rows in the delete file where the 
file_path is null, right? I wonder if the new test table also covers this.


http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/21258/7/testdata/datasets/functional/functional_schema_template.sql@3957
PS7, Line 3957: iceberg_v2_null_delete_record
Could you add some details about this table to the README? Would be nice to 
understand the content.


http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/21258/7/tests/query_test/test_iceberg.py@1455
PS7, Line 1455: if 
vector.get_value('exec_option')['disable_optimized_iceberg_v2_read'] == 0:
Would it make sense to test where we have DIRECTED mode to see that the 
KrpcDataStreamSender is capable of filtering the null file_paths?



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 10 Apr 2024 07:47:52 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

2024-04-09 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21267 )

Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg 
table scans
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21267/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21267/1//COMMIT_MSG@15
PS1, Line 15: it
> nit: its
Done



--
To view, visit http://gerrit.cloudera.org:8080/21267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Gerrit-Change-Number: 21267
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Tue, 09 Apr 2024 08:43:51 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

2024-04-09 Thread Gabor Kaszab (Code Review)

Hello Peter Rozsa, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21267

to look at the new patch set (#2).

Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg 
table scans
..

IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

When a table is partitioned IcebergScanNode sorts the file descriptors
for better scheduling. However, the list of file descriptors comes from
IcebergContentFileStore and is shared between different select queries
on the table. When another query tries to iterate the list of file
descriptors and at the same time the IcebergScanNode sorts them we get
a ConcurrentModificationException.
To solve this IceberScanNode now creates its own copy of the file
descriptor list not to interfere with other queries.

Manual testing:
300-400 SELECT * Iceberg queries were sent into Impala in a loop that
confidently reproduced the original issue. With the fix the issue is
gone.
The queries used for the repro:
1:
select *
from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a,
functional_parquet.iceberg_partitioned_orc_external b
where a.action = b.action and b.id=3;
2:
select *
from functional_parquet.iceberg_v2_equality_delete_schema_evolution;

Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/21267/2
--
To view, visit http://gerrit.cloudera.org:8080/21267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Gerrit-Change-Number: 21267
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa

[Impala-ASF-CR] IMPALA-12810: Simplify IcebergDeleteNode and IcebergDeleteBuilder

2024-04-09 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21258 )

Change subject: IMPALA-12810: Simplify IcebergDeleteNode and 
IcebergDeleteBuilder
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@9
PS2, Line 9: BROADCAST
DIRECTED


http://gerrit.cloudera.org:8080/#/c/21258/2//COMMIT_MSG@13
PS2, Line 13: IcebergDeleteBuilder now also tolerates NULL file paths which are
I'm wondering if there is a valid use case when a file path is null. Is it 
Impala or some other engine that writes such rows?



--
To view, visit http://gerrit.cloudera.org:8080/21258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba02b33433990950b49628f11e732e01ed8a34d
Gerrit-Change-Number: 21258
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 09 Apr 2024 08:40:37 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

2024-04-09 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21267


Change subject: IMPALA-12970: Fix ConcurrentModificationException for Iceberg 
table scans
..

IMPALA-12970: Fix ConcurrentModificationException for Iceberg table scans

When a table is partitioned IcebergScanNode sorts the file descriptors
for better scheduling. However, the list of file descriptors comes from
IcebergContentFileStore and is shared between different select queries
on the table. When another query tries to iterate the list of file
descriptors and at the same time the IcebergScanNode sorts them we get
a ConcurrentModificationException.
To solve this IceberScanNode now creates an own copy of the file
descriptor list not to interfere with other queries.

Manual testing:
300-400 SELECT * Iceberg queries were sent into Impala in a loop that
confidently reproduced the original issue. With the fix the issue is
gone.
The queries used for the repro:
1:
select *
from functional_parquet.iceberg_v2_partitioned_position_deletes_orc a,
functional_parquet.iceberg_partitioned_orc_external b
where a.action = b.action and b.id=3;
2:
select *
from functional_parquet.iceberg_v2_equality_delete_schema_evolution;

Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
1 file changed, 3 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/21267/1
--
To view, visit http://gerrit.cloudera.org:8080/21267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iafe57f05ffa0fa6a0875c141cfafd5ee1607a5c3
Gerrit-Change-Number: 21267
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab

[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization

2024-04-08 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21249 )

Change subject: IMPALA-12894: Addendum: Re-enable 
test_plain_count_star_optimization
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Gerrit-Change-Number: 21249
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 08 Apr 2024 08:58:22 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables

2024-04-04 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21236

to look at the new patch set (#4).

Change subject: IMPALA-12612: SELECT * queries expand complex type columns from 
Iceberg metadata tables
..

IMPALA-12612: SELECT * queries expand complex type columns from Iceberg 
metadata tables

Similarly to how regular tables behave, the nested columns are omitted
when we do a SELECT * on Iceberg metadata tables and the user needs to
turn EXPAND_COMPLEX_TYPES on to include the nested columns into the
result. This patch changes this behaviour to unconditionally include
the nested columns from Iceberg metadata tables.
Note, the behavior of handling nested columns from regular tables
doesn't change with this patch.

Testing:
  - Adjusted the SELECT * metadata table queries to add the nested
columns into the results.
  - Added some new tests where both metadata tables and regular tables
were queried in the same query.

Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
2 files changed, 103 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/21236/4
--
To view, visit http://gerrit.cloudera.org:8080/21236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
Gerrit-Change-Number: 21236
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables

2024-04-03 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21236

to look at the new patch set (#2).

Change subject: IMPALA-12612: SELECT * queries expand complex type columns from 
Iceberg metadata tables
..

IMPALA-12612: SELECT * queries expand complex type columns from Iceberg 
metadata tables

Similarly to how regular tables behave, the nested columns are omitted
when we do a SELECT * on Iceberg metadata tables and the user needs to
turn EXPAND_COMPLEX_TYPES on to include the nested columns into the
result. This patch changes this behaviour to unconditionally include
the nested columns from Iceberg metadata tables.
Note, the behavior of handling nested columns from regular tables
doesn't change with this patch.

Testing:
  - Adjusted the SELECT * metadata table queries to add the nested
columns into the results.
  - Added some new tests where both metadata tables and regular tables
were queried in the same query.

Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
2 files changed, 103 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/21236/2
--
To view, visit http://gerrit.cloudera.org:8080/21236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
Gerrit-Change-Number: 21236
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12612: SELECT * queries expand complex type columns from Iceberg metadata tables

2024-04-03 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21236 )

Change subject: IMPALA-12612: SELECT * queries expand complex type columns from 
Iceberg metadata tables
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-12612: SELECT * queries expand complex type columns from 
Iceberg metadata tables
> The title could include "select *", otherwise it's not clear what this refe
Done


http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@9
PS1, Line 9:
> Nit: should come after "behave".
Done


http://gerrit.cloudera.org:8080/#/c/21236/1//COMMIT_MSG@14
PS1, Line 14: Note, the behavior of handling nested columns from regular tables
> We could mention that although this is technically a breaking change, metad
I don't think this is a breaking change because we don't have a release ATM 
that contains metadata querying. I checked that the parser and planner parts 
went into 4.3 but the executor part is waiting for 4.4 so we are safe with this 
change.


http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@25
PS1, Line 25: $NAMENODE/test-warehou
> Is it intentional that these are changed from "$NAMENODE" to "hdfs://localh
it isn't, thanks for noticing. dockerised GVO also broke because of this


http://gerrit.cloudera.org:8080/#/c/21236/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1110
PS1, Line 1110: select readable_metrics.i.* from 
functional_parquet.iceberg_query_metadata.`files`;
> This example can be confusing at first because the 'readable_metrics' struc
I changed this to expand readable_metrics.i



--
To view, visit http://gerrit.cloudera.org:8080/21236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
Gerrit-Change-Number: 21236
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 04 Apr 2024 05:03:06 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12612: Expand complex type columns from Iceberg metadata tables

2024-04-03 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21236


Change subject: IMPALA-12612: Expand complex type columns from Iceberg metadata 
tables
..

IMPALA-12612: Expand complex type columns from Iceberg metadata tables

Similarly to how regular tables, behave the nested columns are omitted
when we do a SELECT * on Iceberg metadata tables and the user needs to
turn EXPAND_COMPLEX_TYPES on to include the nested columns into the
result. This patch changes this behaviour to unconditionally include
the nested columns from Iceberg metadata tables.
Note, the behavior of handling nested columns from regular tables
doesn't change with this patch.

Testing:
  - Adjusted the SELECT * metadata table queries to add the nested
columns into the results.
  - Added some new tests where both metadata tables and regular tables
were queried in the same query.

Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
2 files changed, 103 insertions(+), 82 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/21236/1
--
To view, visit http://gerrit.cloudera.org:8080/21236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia298705ba54411cc439e99d5cb27184093541f02
Gerrit-Change-Number: 21236
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab

[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-03 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 03 Apr 2024 10:54:38 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 02 Apr 2024 15:29:44 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21219/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@822
PS6, Line 822: 
1,'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq','PARQUET',0,1,1606,'{2147483546:215,2147483545:51}','{2147483546:1,2147483545:1}','{2147483546:0,2147483545:0}','NULL','{2147483546:null,2147483545:null}','{2147483546:null,2147483545:null}','NULL','NULL','NULL',NULL,'{"d":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"i":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"s":{"column_size":null,"value_count":null,"null_value_count":null,"nan_value_count":null,"lower_bound":null,"upper_bound":null}}'
shouldn't these also be regexes similarly as above because of the paths?



--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 02 Apr 2024 14:34:26 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 02 Apr 2024 13:22:10 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@156
PS8, Line 156:   jobject map_entry;
Shouldn't we release 'map_entry' at the end of this function?


http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270
PS8, Line 270:   env->DeleteLocalRef(collection_scanner);
I think we can leak memory if any of the RETURN_IF_ERROR or RETURN_IF_CANCELLED 
returns from the function. Would it be possible to wrap these globalrefs into 
some custom object that we write and then we can release the memory in the 
desctructor?


http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@270
PS8, Line 270: DeleteLocalRef
Isn't 'collection_scanner' a GlobalRef? We call DeleteLocalRef here so I'm a 
bit confused :)


http://gerrit.cloudera.org:8080/#/c/21125/8/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@288
PS8, Line 288:   env->DeleteLocalRef(item);
Same comment about leaking memory



--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 02 Apr 2024 11:35:08 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@92
PS4, Line 92:   const ColumnType& type = slot_desc->type();
> no need to extract this into a variable
My bad, it in fact is used.



--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 02 Apr 2024 09:20:25 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@92
PS4, Line 92:   const ColumnType& type = slot_desc->type();
no need to extract this into a variable



--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 02 Apr 2024 09:19:41 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12899: Temporary workaround for BINARY in complex types

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21219 )

Change subject: IMPALA-12899: Temporary workaround for BINARY in complex types
..


Patch Set 4:

(3 comments)

Thanks for the patch! I general this looks good.

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21219/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@328
PS4, Line 328: return heap_byte_buffer_cl_;
I understand this will be needed for the permanent solution and not the 
NULLing, but I feel that a patch should contain only what it is required for 
that patch. Would it be possible to remove what is not needed now?


http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@813
PS4, Line 813: 
file_path":"hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/ic
not sure about this but for dockerised builds don't the file paths start with 
'/test-warehouse/' without 'hdfs://localhost:20500'? I think it's safer to have 
a regexp here


http://gerrit.cloudera.org:8080/#/c/21219/4/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@819
PS4, Line 819: 
One possible extra test case is to set the EXPAND_COMPLEX_TYPES true and do a 
select * on a metadata table.



--
To view, visit http://gerrit.cloudera.org:8080/21219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d834126c7d702a25e957bb6071ecbf0fda2c203
Gerrit-Change-Number: 21219
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 02 Apr 2024 08:56:31 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-04-02 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21210 )

Change subject: IMPALA-12600: Schema evolution with equality delete files
..


Patch Set 5: Code-Review+2

Did a rebase to resolve git conflicts. carry +2


--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 02 Apr 2024 07:59:13 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-04-02 Thread Gabor Kaszab (Code Review)

Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21210

to look at the new patch set (#5).

Change subject: IMPALA-12600: Schema evolution with equality delete files
..

IMPALA-12600: Schema evolution with equality delete files

This patch adds test coverage for a table that has equality delete
files and also schema evolution, where the schema changes didn't affect
the primary key columns.
Note, partition evolution on tables with equality deletes is still
not supported.

Testing:
  - Added a new test table for this use-case and some E2E tests on that
table.

Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
---
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
27 files changed, 1,034 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/5
--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy

[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-03-28 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h:

http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@63
PS4, Line 63:   /// Note that it returns a GlobalRef, that has to be released 
explicitly.
Probably I get this comment wrong, but I was searching for some code that 
releases whatever this function returns but didn't think anything relevant.


http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21125/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@123
PS3, Line 123:   VLOG(3) << "Skipping unsupported column type: " << 
slot_desc->type().type;
Does this set Binary cols NULL now?


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@125
PS4, Line 125:   env->DeleteLocalRef(accessed_value);
I think this 'accessed_value' is allocated one level above in the call chain, 
so I'd make that level responsible for the deallocation.


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@212
PS4, Line 212:   if constexpr (IS_ARRAY) {
Let's consider moving these DCHECKs into the same IF-ELSE at L228-234.


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@220
PS4, Line 220:   const TupleDescriptor* item_tuple_desc = 
slot_desc->children_tuple_descriptor();
nit: DCHECK item_tuple_desc is not null?


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@236
PS4, Line 236: remaining_array_size
name is misleading now that this is not just for arrays. remaining_items?


http://gerrit.cloudera.org:8080/#/c/21125/4/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@297
PS4, Line 297: const jobject* key_struct_like_row = 
key_slot_desc->type().IsStructType()
 : ?  : nullptr;
Can the key of a map be a struct?


http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@761
PS3, Line 761: 
> Added this query (except for "['spark.app.id']").
Can't we do a map_col.KEY and map_col.VALUE for maps? Would be nice to have 
some test coverage for these if possible.



--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Mar 2024 15:01:52 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#10).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
18 files changed, 388 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/10
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 9:

After a rebase with master some Iceberg metadata table scanning tests broke. 
Apparently, this patch collides with the Array col querying from metadata 
tables resulting a Precondition check fail in TupleDescriptor.getNumNullBits().
The reason apparently, is that a struct is expected to be nullable. However, 
now with this change when in SlotDescriptor.setPath() I update the nullability 
flag, for metadata tables all the IcebergColumns will be not nullable. The 
IcebergColumns nullability comes from the Iceberg Schema for metadata tables 
from the Iceberg lib, and apparently e.g. partitions.partition is a struct but 
also not nullable and this gets caught byt the mentioned precondition. As a 
workaround I set all the metadata table cols nullable. Since we don't write 
these this causes no harm.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 08:34:36 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#9).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
18 files changed, 386 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/9
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12611: Add support to MAP type Iceberg Metadata table columns

2024-03-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21125 )

Change subject: IMPALA-12611: Add support to MAP type Iceberg Metadata table 
columns
..


Patch Set 3:

(4 comments)

Thanks for the change, Dani! Frankly, reviewing these JNI changes isn't the 
easiest so I mostly focused on the tests so far. I'll keep reading the c++ 
implementation meanwhile.

http://gerrit.cloudera.org:8080/#/c/21125/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java:

http://gerrit.cloudera.org:8080/#/c/21125/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@163
PS3, Line 163: return iterator_.next();
nit: merge with line above


http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@761
PS3, Line 761: 
Checked the Iceberg-Spark page for metadata tables: 
https://iceberg.apache.org/docs/latest/spark-queries/#snapshots

This query seems interesting:
select
h.made_current_at,
s.operation,
h.snapshot_id,
h.is_current_ancestor,
s.summary['spark.app.id']
from prod.db.table.history h
join prod.db.table.snapshots s
  on h.snapshot_id = s.snapshot_id
order by made_current_at;
Do you know if such a query is feasible now with our map implementation?


http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@764
PS3, Line 764: # Describe all the metadata tables once
entries table seems to have embedded nested type cols. Could you try to add 
test coverage for them?


http://gerrit.cloudera.org:8080/#/c/21125/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@765
PS3, Line 765: 
Can you add a test that queries different collections, arrays, maps?



--
To view, visit http://gerrit.cloudera.org:8080/21125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a8b3a574ca45c893315c3b41b33ce4e0eff865a
Gerrit-Change-Number: 21125
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 16:57:14 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21210 )

Change subject: IMPALA-12600: Schema evolution with equality delete files
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 16:09:10 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21210 )

Change subject: IMPALA-12600: Schema evolution with equality delete files
..


Patch Set 3:

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 16:07:54 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21210 )

Change subject: IMPALA-12600: Schema evolution with equality delete files
..


Patch Set 3:

PS3 is rebase with master + conflict resolution


--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 16:07:40 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Gabor Kaszab (Code Review)

Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21210

to look at the new patch set (#3).

Change subject: IMPALA-12600: Schema evolution with equality delete files
..

IMPALA-12600: Schema evolution with equality delete files

This patch adds test coverage for a table that has equality delete
files and also schema evolution, where the schema changes didn't affect
the primary key columns.
Note, partition evolution on tables with equality deletes is still
not supported.

Testing:
  - Added a new test table for this use-case and some E2E tests on that
table.

Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
---
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
27 files changed, 1,034 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/3
--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#8).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 315 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/8
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21210 )

Change subject: IMPALA-12600: Schema evolution with equality delete files
..


Patch Set 2:

Thanks for taking a look, Zoli! Added a PlannerTest too.


--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:39:22 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Gabor Kaszab (Code Review)

Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21210

to look at the new patch set (#2).

Change subject: IMPALA-12600: Schema evolution with equality delete files
..

IMPALA-12600: Schema evolution with equality delete files

This patch adds test coverage for a table that has equality delete
files and also schema evolution, where the schema changes didn't affect
the primary key columns.
Note, partition evolution on tables with equality deletes is still
not supported.

Testing:
  - Added a new test table for this use-case and some E2E tests on that
table.

Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
---
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
27 files changed, 1,034 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/2
--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (isIcebergTable()) {
> Thanks. Maybe in the future Iceberg options will not be a subset of Kudu op
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@915
PS3, Line 915:
> Ok, it it's difficult we can leave it as it is.
Well, Iceberg by design only throws RuntimeExceptions, and at the call sites, 
it's not that easy to identify what the issue was. I mean the error messages 
are good, but I don't want to make decisions based on the error message


http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@890
PS5, Line 890: doubl
> "double" would be better in the name.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:10:43 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#6).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 315 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/6
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12600: Schema evolution with equality delete files

2024-03-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21210


Change subject: IMPALA-12600: Schema evolution with equality delete files
..

IMPALA-12600: Schema evolution with equality delete files

This patch adds test coverage for a table that has equality delete
files and also schema evolution, where the schema changes didn't affect
the primary key columns.
Note, partition evolution on tables with equality deletes is still
not supported.

Testing:
  - Added a new test table for this use-case and some E2E tests on that
table.

Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
---
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-20/3645a3085845c344-9698e594_1309071497_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-10-e4b47c78-9a7a-4d68-81d9-ab22e44a3630-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/0-11-a72caf13-6a91-4fd5-b509-54ec8b16864f-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-21/3645a3085845c344-9698e594_1656341410_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/data/d=2024-03-22/3645a3085845c344-9698e594_1634747934_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/94efa501-9664-420d-a524-30535c11d363-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/bf0f2c96-954e-4b3c-a686-8b06e9fd56e8-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/fb7a4022-ee0a-4540-87d8-b8fa8e4c8596-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-3986738438831924669-1-bf0f2c96-954e-4b3c-a686-8b06e9fd56e8.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-5816823095034839884-1-fb7a4022-ee0a-4540-87d8-b8fa8e4c8596.avro_tmp
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/snap-7131747670101362192-1-94efa501-9664-420d-a524-30535c11d363.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/v6.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_equality_delete_schema_evolution/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
26 files changed, 975 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21210/1
--
To view, visit http://gerrit.cloudera.org:8080/21210
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I125f72bade5b79bad5aaa6b676d6afaf3ca98395
Gerrit-Change-Number: 21210
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab

[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-26 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21179 )

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 26 Mar 2024 10:49:06 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 5:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java@181
PS3, Line 181:  if the column has options th
> What are these incompatible with? Could you clarify the name or add a comme
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180
PS3, Line 180:
> Could this be an ELSE IF?
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180
PS3, Line 180:   isNullable_ = icebergColumn.isNullable();
> We could add a comment like on L174, or extend that one.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@482
PS3, Line 482:
> Nit: 'tables have' would be better, now that we modify this comment.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@490
PS3, Line 490: if (
> Nit: ELSE IF should go on the previous line.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (isIcebergTable()) {
> Why don't we need a similar clause for Iceberg? A comment could clarify.
added a comment


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (isIcebergTable()) {
> Shouldn't we include this condition and return false in the "isIcebergTable
no. This is the case when the table is not Iceberg, not Kudu but still some 
Kudu column options are provided. The reason we don't have to check for Iceberg 
options here is that Iceberg options are subset of Kudu options.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@572
PS3, Line 572: } else if (!primaryKeyColNames_.isEmpty() && 
!isPrimaryKeyUnique()
> If non-unique primary keys are now supported in Iceberg, shouldn't we modif
L507 is a bit different. There are 2 ways to provide primary keys:
1: CREATE TABLE (i int not null primary key, j int);
2: CREATE TABLE (i int not null, j int, primary key(i));

With this patch we only support 2). That is more flexible btw, because you can 
add multiple cols into the PK, while with 1) you can't.

Changed the comment at L89.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@577
PS3, Line 577:
> Are not enforced and non unique PKs the same?
isPrimaryKeyUnique_ stores the info if NOT UNIQUE was provided or not. Added a 
comment about this at L89.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
File fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@174
PS3, Line 174: given columns
> Could mention 'primaryKeyColumnNames'.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@192
PS3, Line 192:
> Nit: Field.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2446
PS3, Line 2446: not enforced
> Is this a breaking change if this test used to work without "not enforced"?
well, previously Iceberg tables accepted primary key but it was simply ignored. 
I don't think this qualifies for a breaking change but let me know if you feel 
otherwise. Anyway this mimics the SQL syntax in Flink.


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@619
PS3, Line 619:  QUERY
> Could you add comments for the cases which don't have them? It would be inf
Done

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#5).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 313 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/5
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#4).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 311 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/4
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator

2024-03-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21138 )

Change subject: IMPALA-12809: Iceberg metadata table scanner should always be 
scheduled to the coordinator
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21138
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e
Gerrit-Change-Number: 21138
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Mar 2024 15:21:31 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE POSITION for TEXT and JSON tables crashes Impala

2024-03-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21148 )

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Mar 2024 09:43:17 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21179 )

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 22 Mar 2024 09:41:52 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE POSITION for TEXT and JSON tables crashes Impala

2024-03-21 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21148 )

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Mar 2024 15:29:49 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12879: Conjunct not referring to table field causes ERROR for Iceberg table

2024-03-21 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21179 )

Change subject: IMPALA-12879: Conjunct not referring to table field causes 
ERROR for Iceberg table
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1192
PS1, Line 1192: select * from iceberg_avro_format where rand() < 0.5;
Isn't this flaky because of the rand()? Or is it not that random? :)


http://gerrit.cloudera.org:8080/#/c/21179/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@1271
PS1, Line 1271: 
Would it make sense to involve some time travel too?



--
To view, visit http://gerrit.cloudera.org:8080/21179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id43a6798df3f4cc3a0e00ac610e25aa3b5781342
Gerrit-Change-Number: 21179
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 21 Mar 2024 15:13:10 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12809: Iceberg metadata table scanner should always be scheduled to the coordinator

2024-03-21 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21138 )

Change subject: IMPALA-12809: Iceberg metadata table scanner should always be 
scheduled to the coordinator
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21138/2/be/src/scheduling/schedule-state.h
File be/src/scheduling/schedule-state.h:

http://gerrit.cloudera.org:8080/#/c/21138/2/be/src/scheduling/schedule-state.h@107
PS2, Line 107:   bool is_root_coord_fragment;
now that we have 2 similar things 'is_root_coord_fragment' and 
'fragment.must_run_on_coordinator' could you add a comment here to make more 
clear what this is for.


http://gerrit.cloudera.org:8080/#/c/21138/2/common/thrift/Planner.thrift
File common/thrift/Planner.thrift:

http://gerrit.cloudera.org:8080/#/c/21138/2/common/thrift/Planner.thrift@51
PS2, Line 51:   15: required bool must_run_on_coordinator
I know the order of the IDs is confusing now, but I feel this would be better 
at the end of this struct.


http://gerrit.cloudera.org:8080/#/c/21138/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/21138/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@93
PS2, Line 93: mustRunOnCoord_
nit: 'coordinatorOnly_' ?


http://gerrit.cloudera.org:8080/#/c/21138/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@193
PS2, Line 193: outputPartition_.equals(DataPartition.UNPARTITIONED));
'outputPartition_' is unconditionally set to UNPARTITIONED in this function so 
this check doesn't make much sense. Also, I;m not sure we want to make this a 
constructor param. Can we default it to false and set it true in a 
setMustRunOnCoord() or such? We can then assert on the 'outputPartition_' there.
Or did you mean 'dataPartition_'?



--
To view, visit http://gerrit.cloudera.org:8080/21138
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4397f64e9def42d2b84ffd7bc14ff31df27d58e
Gerrit-Change-Number: 21138
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Mar 2024 15:04:04 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-21 Thread Gabor Kaszab (Code Review)

Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#3).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 280 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/3
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-19 Thread Gabor Kaszab (Code Review)

Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#2).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
14 files changed, 272 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/2
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-12903: Querying virtual column FILE POSITION for TEXT and JSON tables crashes Impala

2024-03-14 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21148 )

Change subject: IMPALA-12903: Querying virtual column FILE__POSITION for TEXT 
and JSON tables crashes Impala
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test
File 
testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test:

http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-generic.test@159
PS1, Line 159: select file__position, year, month, id, date_string_col
nit: could you add a comment that in this test we prune partitions that doesn't 
support virtual columns so the query can run successfully?


http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test:

http://gerrit.cloudera.org:8080/#/c/21148/1/testdata/workloads/functional-query/queries/QueryTest/virtual-column-file-position-negative.test@40
PS1, Line 40: Virtual column FILE__POSITION is not supported
Could you replace some of the FILE__POSITIONS to some other virtual columns 
like INPUT__FILE__NAME?



--
To view, visit http://gerrit.cloudera.org:8080/21148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e1af8d526f9046aceddb5944da9e6f9c63768b0
Gerrit-Change-Number: 21148
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 14 Mar 2024 16:58:28 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-14 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21149


Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
13 files changed, 261 insertions(+), 48 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/1
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab

[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21139

to look at the new patch set (#4).

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..

IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

This is a part 1 change that turns off the count(*) optimisations for
V2 tables as there is a correctness issue with it. The reason is that
Spark compaction may leave some dangling delete files that mess up
the logic in Impala.

Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
32 files changed, 1,009 insertions(+), 244 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/4
--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit

[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg 
tables
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21139/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21139/2//COMMIT_MSG@11
PS2, Line 11:  u
> Nit: mess (plural).
Done


http://gerrit.cloudera.org:8080/#/c/21139/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/21139/2/testdata/data/README@1093
PS2, Line 1093: iceberg_spark_compaction_with_dangling_delete:
> Could you also provide the SQL commands? It would then be possible to exact
Done



--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 14:05:18 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21139

to look at the new patch set (#3).

Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg 
tables
..

IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

This is a part 1 change that turns off the count(*) optimisations for
V2 tables as there is a correctness issue with it. The reason is that
Spark compaction may leave some dangling delete files that mess up
the logic in Impala.

Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
32 files changed, 1,009 insertions(+), 244 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/3
--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg 
tables
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21139/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21139/1//COMMIT_MSG@12
PS1, Line 12: logic
> logic?
Done


http://gerrit.cloudera.org:8080/#/c/21139/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/21139/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1473
PS1, Line 1473:
> nit: missing space
Done



--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 13:39:23 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Gabor Kaszab (Code Review)

Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21139

to look at the new patch set (#2).

Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg 
tables
..

IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

This is a part 1 change that turns off the count(*) optimisations for
V2 tables as there is a correctness issue with it. The reason is that
Spark compaction may leave some dangling delete files that messes up
the logic in Impala.

Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
32 files changed, 998 insertions(+), 244 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/2
--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

[Impala-ASF-CR] IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-12 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21139


Change subject: IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg 
tables
..

IMPALA-12894: Turn off the count(*) optimisation for V2 Iceberg tables

This is a part 1 change that turns off the count(*) optimisations for
V2 tables as there is a correctness issue with it. The reason is that
Spark compaction may leave some dangling delete files that messes up
the login in Impala.

Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
6 files changed, 408 insertions(+), 244 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/1
--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab

[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

2024-03-05 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
..

IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

This commit adds support for reading ARRAY columns inside Iceberg
Metadata tables.

The change starts with some refactoring, to consolidate accessing JVM
through JNI a new backend class was introduced, IcebergMetadataScanner.
This class is the C++ part of the Java IcebergMetadataScanner, it is
responsible to manage the Java scanner object.

In Iceberg array types do not have accessors, so structs inside arrays
have to be accessed by position, for the value obtaining logics have been
changed to allow access by position.

The IcebergRowReader needed an IcebergMetadataScanner, so that it can
iterate over the arrays returned by the scanner and add them to the
collection.

This change will not cover MAP, these slots are set to NULL, it will
be done in IMPALA-12611.

Testing:
 - Added E2E tests.

Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Reviewed-on: http://gerrit.cloudera.org:8080/21061
Tested-by: Impala Public Jenkins 
Reviewed-by: Gabor Kaszab 
---
M be/src/exec/iceberg-metadata/CMakeLists.txt
M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
A be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
A be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/service/impalad-main.cc
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
11 files changed, 734 insertions(+), 292 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Gabor Kaszab: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 13
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy

[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

2024-03-05 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 12
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 05 Mar 2024 14:56:39 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations

2024-03-05 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21099 )

Change subject: IMPALA-12860: Invoke validateDataFilesExist for RowDelta 
operations
..


Patch Set 2: Code-Review+2

Thanks for the answers, Zoltan!


--
To view, visit http://gerrit.cloudera.org:8080/21099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c
Gerrit-Change-Number: 21099
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 05 Mar 2024 09:50:37 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations

2024-03-04 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21099 )

Change subject: IMPALA-12860: Invoke validateDataFilesExist for RowDelta 
operations
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21099/2/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test:

http://gerrit.cloudera.org:8080/#/c/21099/2/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test@421
PS2, Line 421: BUFFERED DELETE FROM ICEBERG 
[functional_parquet.iceberg_v2_partitioned_position_deletes-POSITION-DELETE]
I'm wondering if getting rid of DELETE FROM ICEBERG in this patch is how 
tightly related to the issue we are about to solve with this patch. For me it 
seems kind of a separate stuff at first sight.
If you say that this holds low or non-existing risk then I'm fine keeping the 
patch as is now, I just want to make sure that we won't introduce any further 
risks with fixing one issue. e.g. perf degradation with switching from 
DeleteSink to BufferedDeleteSink


http://gerrit.cloudera.org:8080/#/c/21099/2/tests/stress/test_update_stress.py
File tests/stress/test_update_stress.py:

http://gerrit.cloudera.org:8080/#/c/21099/2/tests/stress/test_update_stress.py@279
PS2, Line 279: # Prepare INSERT statement of 'num_rows' records.
nit: I meant to add it to a comment that this code prefers the INSERT in such a 
way that all the rows will be added in a single insert to have a single file 
(opposed to the similar part of the test above that adds a separate file for 
each row).



--
To view, visit http://gerrit.cloudera.org:8080/21099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c
Gerrit-Change-Number: 21099
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 04 Mar 2024 20:24:51 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12860: Invoke validateDataFilesExist for RowDelta operations

2024-03-04 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21099 )

Change subject: IMPALA-12860: Invoke validateDataFilesExist for RowDelta 
operations
..


Patch Set 1:

(7 comments)

great job with this quick fix, Zoltan! I added some nits but in general the 
patch is good.

http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h@135
PS1, Line 135:  const std::vector& 
ReferencedDataFilesByPositionDeletes() {
since it returns const ref, can the function also be const?


http://gerrit.cloudera.org:8080/#/c/21099/1/be/src/runtime/dml-exec-state.h@170
PS1, Line 170: referenced_data_files_by_position_deletes_
nit: data_files_referenced_by_position_deletes


http://gerrit.cloudera.org:8080/#/c/21099/1/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/21099/1/common/protobuf/control_service.proto@115
PS1, Line 115: referenced_data_files_by_position_deletes
nit: data_files_referenced_by_position_deletes?


http://gerrit.cloudera.org:8080/#/c/21099/1/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/21099/1/common/thrift/CatalogService.thrift@249
PS1, Line 249: referenced_data_files_by_position_deletes
nit: data_files_referenced_by_position_deletes maybe?


http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py
File tests/stress/test_update_stress.py:

http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@269
PS1, Line 269: test_concurrent_deletes_and_updates
nit: this table name omits optimize. test_concurrent_write_and_optimize?


http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@277
PS1, Line 277: for i in range(num_rows):
Could you add a comment that here you prepare the INSERT query so that these 
rows can be put into the same file?


http://gerrit.cloudera.org:8080/#/c/21099/1/tests/stress/test_update_stress.py@283
PS1, Line 283: flag
I know we've been using this variable name for a while now, but I feel that the 
name doesn't suggest anything what it is used for. Can we rename this 
'all_rows_deleted' or such?



--
To view, visit http://gerrit.cloudera.org:8080/21099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4869eb863ff0afe8f691ccf2fd681a92d36b405c
Gerrit-Change-Number: 21099
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 04 Mar 2024 12:15:41 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

2024-02-29 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 8
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Feb 2024 15:29:28 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-29 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..


Patch Set 11: Code-Review+2

carry +2 from Zoltan


--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Feb 2024 15:17:35 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-29 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20951/9/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/20951/9/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1507
PS9, Line 1507:
> typo: field
Done



--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Feb 2024 15:16:08 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-29 Thread Gabor Kaszab (Code Review)

Hello Andrew Sherman, Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#10).

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

This patch adds support for reading Iceberg tables that have
different equality field ID lists associated to different equality
delete files. In practice this is a use case when one equality delete
file deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink and NiFi was used for creating some test tables with the
desired equality field IDs. Coverage on these tables are added to
the test suite.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro
A

[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

2024-02-29 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21061/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21061/7/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@250
PS7, Line 250:   VLOG(3) << "Skipping unsupported column type: " << 
type.type;
ahh, I just noticed that we just write a log line if there is an unsupported 
column type. Shouldn't we run into a DCHECK failure or such in this case?


http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@715
PS6, Line 715: # Describe all the metadata tables once
> Ok, I'm going to add support for MAPs probably soon anyway. But please ment
Expand complex types would come after we have MAP support anyway. So I still 
think that giving an error instead of a null would make sense if it's not that 
complex to implement.



--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 7
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Feb 2024 14:41:16 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

2024-02-29 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21061/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21061/6//COMMIT_MSG@18
PS6, Line 18: alue
nit: value


http://gerrit.cloudera.org:8080/#/c/21061/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h
File be/src/exec/iceberg-metadata/iceberg-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/21061/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h@21
PS6, Line 21: #include "exec/scan-node.h"
I think even including scan-node.h in the header could be avoided by forward 
declarations


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@222
PS3, Line 222:   if 
(ExecNode::EvalConjuncts(scan_node_->conjunct_evals().data(),
> I just realised that with the current implementation it won't be possible t
well, if this code doesn't work then let's remove it. What would be needed to 
make this work? FROM clause collections, and predicates on the ITEM, right?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@248
PS3, Line 248: } case TYPE_STRUCT: {
> Yes, but I think it helps to understand what is happening here.
But then we return nullptr for TYPE_STRUCT that could cause issues on the 
caller side. If this method is not expected to be called for TYPE_STRUCT then 
it's cleaner to get rid of that case.


http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java
File fe/src/main/java/org/apache/impala/analysis/FromClause.java:

http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java@169
PS3, Line 169:   throw new AnalysisException("Querying collection types 
(ARRAY/MAP) in FROM " +
> Adding Jira:  IMPALA-12853
Thanks for creating a Jira! Could you also mention the Jira ID here in a 
comment?


http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@624
PS6, Line 624: 
'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq','NULL'
Could you also add the content col to the select list so that we can see which 
files are eq-delete files. Would make it easier to verify that only eq-delete 
files (content=2) have eq-id list


http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@663
PS6, Line 663: select null_value_counts from 
functional_parquet.iceberg_query_metadata.all_files;
This seems a duplicate test with the one on L715


http://gerrit.cloudera.org:8080/#/c/21061/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@715
PS6, Line 715: select null_value_counts from 
functional_parquet.iceberg_query_metadata.all_files;
Shouldn't we get a not supported error when querying map columns from a 
metadata table? Returning nulls might be misleading



--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 6
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Feb 2024 11:33:24 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

2024-02-27 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
..


Patch Set 3:

(14 comments)

I managed to check the c++ part too

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@58
PS3, Line 58: jobject& result
This should be pointer as Daniel said


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@93
PS3, Line 93:   /// Accessor map for the scan result, pairs the slot ids with 
the java Accessor objects.
This is not a map of accessors now but a slot ID to field ID mapping.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@103
PS3, Line 103: Accessor collection
I think this comment is off now.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@109
PS3, Line 109: const SlotDescriptor*
Don't we use const ref types for immutable params instead of const pointers?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@113
PS3, Line 113: SlotDescriptor*
const ref instead


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@88
PS3, Line 88: Status IcebergMetadataScanner::InitSlotIdFieldIdMap(JNIEnv* env) {
Could you help me understand if this functiondoes or should go into 
collections, e.g. get the field ids for struct members inside collections?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@145
PS3, Line 145: clazz
I saw this param name elsewhere too, but not sure if this is something 
conventional or if it makes something, but seems a weird name for me.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@155
PS3, Line 155: Primitive inside an ARRAY
Can we DCHECK on this fact?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h
File be/src/exec/iceberg-metadata/iceberg-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@22
PS3, Line 22: #include "exec/iceberg-metadata/iceberg-metadata-scanner.h"
I think we can get rid of this include with a forward declaration.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@114
PS3, Line 114: env->DeleteLocalRef(accessed_value);
I'm a bit lost with these refs, but aren't local refs released automatically 
when going out from the scope?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@222
PS3, Line 222:   if 
(ExecNode::EvalConjuncts(scan_node_->conjunct_evals().data(),
Could you add a test that exercises the path where EvalConjuncts is false?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@237
PS3, Line 237: For
nit: I'd fing 'From' instead of 'For' easier to understand


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@248
PS3, Line 248: } case TYPE_STRUCT: {
Would it work to simply remove this case?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@252
PS3, Line 252: return list_cl_;
Is this java.lang.util.List? Please add a comment similarly to the other types



--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 27 Feb 2024 16:43:07 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables

2024-02-26 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
..


Patch Set 3:

(9 comments)

Thanks for the patch, Tamas! I checked the tests and the Java part. Will take a 
deeper look at the rest tomorrow.

http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG@18
PS3, Line 18: the the
nit: duplicate 'the'


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@21
PS3, Line 21: #include "exec/scan-node.h"
Can we not include scan-node.h?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/service/impalad-main.cc
File be/src/service/impalad-main.cc:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/service/impalad-main.cc@32
PS3, Line 32: #include "exec/iceberg-metadata/iceberg-metadata-scan-node.h"
This include can be removed I think


http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java
File fe/src/main/java/org/apache/impala/analysis/FromClause.java:

http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/analysis/FromClause.java@169
PS3, Line 169: tblRef.getDesc().getType().isMapType()) {
If I'm not mistaken, this check is for the case when a collection is provided 
in the FROM clause. If this use case is deliberately supported then please add 
test coverage too.


http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java:

http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@116
PS3, Line 116: StructLke
typo


http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@137
PS3, Line 137: private Iterator iterator;
If we store the iterator as a member, is there a guarantee that the underlying 
array is not destroyed?


http://gerrit.cloudera.org:8080/#/c/21061/3/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@146
PS3, Line 146: return iterator.next();
nit: could fit into the line above


http://gerrit.cloudera.org:8080/#/c/21061/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21061/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@567
PS3, Line 567: # Test 11 : Query ARRAY type columns
The test above is also "Test 11". I don't think that giving a number of these 
tests make sense, it just makes it complicated to maintain and to add anything 
to the middle.


http://gerrit.cloudera.org:8080/#/c/21061/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@591
PS3, Line 591: 
row_regex:'/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/.*','NULL'
The name of the files in this table is fix, so we could spell the full path out 
in the expected results, in my opinion.



--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 26 Feb 2024 16:22:30 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-02-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
File 
fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1261
PS6, Line 1261: // Test that even if we have privileges on a different table in 
the same db, we
  :   // still can't access 'iceberg_query_metadata' if we don't
> Added a comment to clarify.
thanks!


http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py
File tests/authorization/test_authorization.py:

http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@129
PS6, Line 129: # Make sure there is no privilege stuck from the previous 
execution of the test.
 :   admin_client.execute("revoke {priv} on database {db} from 
user {user}".format(
 :   priv=priv, db=unique_db, user=user), user=ADMIN)
> Interestingly if I delete the database but do not drop the privileges next
I think if we can't justify the existence of some code then that code shouldn't 
exist. What I can think of is that when that other test was being written in 
some intermediate state there were test issues resulting that the DB or the 
privilege remained and hence this extra safety net. I don't think that this can 
happen with this test.
I saw in other tests that creating the DB is in an outer try-finally block 
while granting the privileges is in an internal try-finally (I recall I also 
wrote something similar not that long ago).

Note that dropping the DB will not drop the privileges associated with it so 
once you recreate the DB with the same name the old privileges will be there. 
Hence the need for an extra step to drop the privileges too (from Ranger).


http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@137
PS6, Line 137:   assert str(exc1).strip() + non_existent_suffix == 
str(exc2).strip()
> The error message contains the username (which depends on where the test is
We can then assert on "AuthorizationException" and "does not have privileges to 
access" are part of the error message.



--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Feb 2024 16:15:57 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20951/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20951/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@520
PS5, Line 520:*/
> Nit: If you made equalityDeletesRecordCount_ a parameter you could write a
Done


http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/data/README@850
PS5, Line 850: primary key(i)
> Impala doesn't do anything with this clause, right?
ATM you are right, however I have a working PoC for IMPALA-12729 where I can 
populate the identifier-field-ids with primary keys so for this test I used 
that dev Impala to create the table. Since Nifi is not able to create tables, I 
did't want to involve 3 engines for this test: 1) create with Flink 2) populate 
with Nifi 3) query with Impala.


http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test:

http://gerrit.cloudera.org:8080/#/c/20951/5/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1583
PS5, Line 1583: cardinality=8
> Would be nice to fix IMPALA-11797 later to get better cardinalities. Or, if
I opened a jira for eq-deletes similarly to IMPALA-12371:
IMPALA-12826



--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Feb 2024 13:31:05 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-22 Thread Gabor Kaszab (Code Review)

Hello Andrew Sherman, Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#7).

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

This patch adds support for reading Iceberg tables that have
different equality field ID lists associated to different equality
delete files. In practice this is a use case when one equality delete
file deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink and NiFi was used for creating some test tables with the
desired equality field IDs. Coverage on these tables are added to
the test suite.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro
A

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-22 Thread Gabor Kaszab (Code Review)

Hello Andrew Sherman, Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#6).

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

This patch adds support for reading Iceberg tables that have
different equality field ID lists associated to different equality
delete files. In practice this is a use case when one equality delete
file deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink and NiFi was used for creating some test tables with the
desired equality field IDs. Coverage on these tables are added to
the test suite.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro
A

[Impala-ASF-CR] IMPALA-12609: Implement SHOW METADATA TABLES IN statement to list Iceberg Metadata tables

2024-02-22 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21026 )

Change subject: IMPALA-12609: Implement SHOW METADATA TABLES IN statement to 
list Iceberg Metadata tables
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/main/cup/sql-parser.cup@2936
PS6, Line 2936: ident_or_default:tbl
nit: this fits into the line above similarly to L2933


http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@2044
PS6, Line 2044: ParserError("SHOW TABLES IN db.tbl");
Can you add a test here where you don't provide a full path just the table name 
for SHOW METADATA TABLES? Or if it'd be an Analysis error could you add it to 
AnalyzeDDLTest?


http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
File 
fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/21026/6/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@1261
PS6, Line 1261: 
test.error(accessError("functional_parquet.iceberg_query_metadata"),
  :   onTable("functional_parquet", "alltypes", privilege));
I admit I'm not really familiar with these tests but I don't get this part 
where we expect access error on 'iceberg_query_metadat' but we also have this 
onTable on 'alltypes.'


http://gerrit.cloudera.org:8080/#/c/21026/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/21026/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@857
PS6, Line 857: show metadata tables in 
functional_parquet.iceberg_query_metadata;
Just a note, that this test could break when we bump the Iceberg version and 
the new version has some new metadata tables. Not much we can do about this.


http://gerrit.cloudera.org:8080/#/c/21026/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@879
PS6, Line 879: show metadata tables in functional_parquet.alltypestiny;
Could you try something similar on a view that is on top of an Iceberg table?


http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py
File tests/authorization/test_authorization.py:

http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@123
PS6, Line 123: admin_client.execute("drop database if exists {} 
cascade".format(unique_db),
 :   user=ADMIN)
 :   admin_client.execute("create database 
{}".format(unique_db), user=ADMIN)
I'm wondering what is the case when this db exists already. We drop this in the 
finally section so it couldn't be remaining from previous failed tests.


http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@129
PS6, Line 129: # Make sure there is no privilege stuck from the previous 
execution of the test.
 :   admin_client.execute("revoke {priv} on database {db} from 
user {user}".format(
 :   priv=priv, db=unique_db, user=user), user=ADMIN)
Similarly to above, there can't be any remaining privs from previous tests 
since you remove priv in the finally block.


http://gerrit.cloudera.org:8080/#/c/21026/6/tests/authorization/test_authorization.py@137
PS6, Line 137:   assert str(exc1).strip() + non_existent_suffix == 
str(exc2).strip()
Can't you assert on the actual error string?



--
To view, visit http://gerrit.cloudera.org:8080/21026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide10ccf10fc0abf5c270119ba7092c67e712ec49
Gerrit-Change-Number: 21026
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Feb 2024 08:28:05 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-08 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..


Patch Set 5:

PS5 is a rebase with master and some conflict resolution


--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 08 Feb 2024 16:26:49 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-08 Thread Gabor Kaszab (Code Review)

Hello Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#5).

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

This patch adds support for reading Iceberg tables that have
different equality field ID lists associated to different equality
delete files. In practice this is a use case when one equality delete
file deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink and NiFi was used for creating some test tables with the
desired equality field IDs. Coverage on these tables are added to
the test suite.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m1.avro
A

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-08 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..


Patch Set 4:

(12 comments)

Thanks for taking a look Tamas and Zoli!

About your question, Tamas: the eq IDs were stored on a ContentFileStore level 
when there was this restriction that all the eq-delete files have to have the 
same eq ID lists. However, with this change I had to move them from the 
ContentFileStore to a per FileDescriptor level, and that is stored in FB. So 
it's not a question of Thrift vs FB rather on what abstraction level we want to 
store these IDs.

http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@7
PS3, Line 7: d
> nit: d
Done


http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@9
PS3, Line 9: hav
> nit: have
Done


http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs
File common/fbs/IcebergObjects.fbs:

http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs@53
PS3, Line 53: equality_field_ids
> I saw this in multiple places, the new term became equality_fiel_ids, in ma
Flink refers to these IDs as equality field IDs, while the Iceberg spec as 
equality IDs. Also in the vxmetadata.json in the Iceberg metadata this is 
called 'identifier-field-ids' so even Iceberg is not consistent with the naming.
I think they can be used as synonyms, and I believe I spelled the whole 
'equalityFiledID' out in most of the cases, while where I found a variable name 
too long I decided to omit the 'field' part. I feel that both naming are 
straightforward to understand.


http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@126
PS3, Line 126:   private Set allEqualityFieldIds_ = new HashSet<>();
 :   private Map, Set> 
equalityIdsToDeleteFiles_ =
 :   new HashMap<>();
> nit: Here we use both equality ids and equality field ids.
well, I found equalityFieldIdsToDeleteFiles_ too long for a member name so 
decided to shorten it a bit. It's still easy to understand what we mean by 
equalityIds I think.


http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@136
PS3, Line 136: field
> typo: field
Done


http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@510
PS3, Line 510:
> nit: D
Done


http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@520
PS3, Line 520:
> This way [11, 12] would come earlier than [2, 3]. It shouldn't make much di
With this ordering I figured that the plan for a query should be deterministic 
and we shouldn't see a change in the order of the JOINs for the eq-deletes. For 
planner tests this would eliminate flakiness for example. So [11] coming before 
[2] isn't really an issue, only looks strange.

Anyway, you're right, a custom Comparator makes sense here and then I won't 
need to do these string conversions back and forth.


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/data/README@884
PS3, Line 884: 4-Impala:
> You also mention NiFi in the commit message, but here I don't see any refer
When I created these table Nifi's PutIcebergCDC processor wrote all the columns 
into the eq-delete files so for these tests I only had Flink. However, since 
then there is a new version of the mentioned Nifi processor that writes onlt 
the identifier fields (PKs) into the eq-deletes so I can rewrite now one of the 
tables to use Nifi as the writer.


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/datasets/functional/functional_schema_template.sql@3643
PS3, Line 3643:
> Do we need this? This is the default for Impala.
I don't think this is needed indeed. Some other tables has this as well, 
removed those too.


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test:

http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1396
PS3, Line 1396: ns=1/1 files=
> All EQ delete group has cardinality=2, which means we don't have test for t
Reworked iceberg_v2_delete_equality_multi_eq_ids  to serve this

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

2024-02-08 Thread Gabor Kaszab (Code Review)

Hello Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#4).

Change subject: IMPALA-12598: Allow multiple equality field id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality field id lists for Iceberg tables

This patch adds support for reading Iceberg tables that have
different equality field ID lists associated to different equality
delete files. In practice this is a use case when one equality delete
file deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink and NiFi was used for creating some test tables with the
desired equality field IDs. Coverage on these tables are added to
the test suite.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-22-09b63f85-c950-4d32-a9fd-0abd5229d768-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-23-1365d982-ca94-4e1d-9b86-2f973c000cf0-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-24-94d478ff-204b-4eb4-9a71-9876a31daf64-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-25-8814a344-4216-4d37-9e9b-091de336e3fa-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/2eb78a39-190a-4eab-b24a-df07f32f2cc0-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/58a6295d-0076-4e3e-ac77-84ce48c406cf-m1.avro
A

[Impala-ASF-CR] IMPALA-12072: Include snapshot id of Iceberg tables in query plan / profile

2024-02-07 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20204 )

Change subject: IMPALA-12072: Include snapshot id of Iceberg tables in query 
plan / profile
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20204
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee0b4967429ea733729ad8e44df32e3b24b88525
Gerrit-Change-Number: 20204
Gerrit-PatchSet: 14
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 07 Feb 2024 11:42:34 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12072: Include snapshot id of Iceberg tables in query plan / profile

2024-02-06 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20204 )

Change subject: IMPALA-12072: Include snapshot id of Iceberg tables in query 
plan / profile
..


Patch Set 13:

(3 comments)

Thanks for the change Zoli and Gergo! In general I think the patch is good, I 
just had some nits.

http://gerrit.cloudera.org:8080/#/c/20204/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/20204/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1327
PS13, Line 1327: runPlannerTestFile("iceberg-v2-update", 
"functional_parquet",
I see you added the snapshot ids to the .test file. Shouldn't you also add the 
VALIDATE_ICEBERG_SNAPSHOT_IDS option here?

Same goes for 'insert-sort-by-zorder' and 'iceberg-predicates'.


http://gerrit.cloudera.org:8080/#/c/20204/13/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
File testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test:

http://gerrit.cloudera.org:8080/#/c/20204/13/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test@172
PS13, Line 172: |  Per-Host Resources: mem-estimate=20.00MB 
mem-reservation=4.01MB thread-reservation=2
nit: I see some changes in this test that are irrelevant for this patch.


http://gerrit.cloudera.org:8080/#/c/20204/13/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test@356
PS13, Line 356: |  tuple-ids=0 row-size=36B cardinality=2
Same as above with these cardinality numbers



--
To view, visit http://gerrit.cloudera.org:8080/20204
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee0b4967429ea733729ad8e44df32e3b24b88525
Gerrit-Change-Number: 20204
Gerrit-PatchSet: 13
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Feb 2024 15:42:44 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12787: Concurrent DELETE and UPDATE operations on Iceberg tables can be problematic

2024-02-06 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20999 )

Change subject: IMPALA-12787: Concurrent DELETE and UPDATE operations on 
Iceberg tables can be problematic
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e581ea17fa8f6ccd9c87aaad1281bb694079f6e
Gerrit-Change-Number: 20999
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Feb 2024 13:11:08 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error

2024-01-26 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20954 )

Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE 
fails with error
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8
Gerrit-Change-Number: 20954
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jan 2024 22:04:33 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE fails with error

2024-01-26 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20954 )

Change subject: IMPALA-12742: DELETE/UPDATE Iceberg table partitioned by DATE 
fails with error
..


Patch Set 1: Code-Review+1

(3 comments)

Thanks for the quick fix, Zoltan! In general the patch seems fine, I only have 
some nits.

http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20954/1/be/src/exec/iceberg-delete-sink-base.cc@91
PS1, Line 91: DCHECK(!IsTimestamp(scalar_type));
I don't think we need these DCHECKs here since you give an error from the below 
if anyway.


http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@21
PS1, Line 21: import org.apache.impala.catalog.PrimitiveType;
Is this used?


http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/20954/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@562
PS1, Line 562:   transformParam), 
Type.fromTScalarType(field.getType(;
nit: For me it was a bit misleading that this param got into the same line as 
the end of IcebergPartitionTransform constructor call. Maybe in a new line it 
would express better for the reader that it belongs to the 
IcebergPartitionField creation.



--
To view, visit http://gerrit.cloudera.org:8080/20954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I506f95527e741efe18c71706e2cdea51b45958b8
Gerrit-Change-Number: 20954
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Fri, 26 Jan 2024 11:37:13 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Gabor Kaszab (Code Review)

Hello Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#3).

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

This patch adds support for reading Iceberg tables that has different
equality field ID lists associated to different equality delete
files. In practice this is a use case when one equality delete file
deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink was used for creating some test tables with the desired
equality field IDs. Coverage on these tables are added to the
test suite.
  - Also did some experiments creating test tables using NiFi.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro
A

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Gabor Kaszab (Code Review)

Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#2).

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

This patch adds support for reading Iceberg tables that has different
equality field ID lists associated to different equality delete
files. In practice this is a use case when one equality delete file
deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink was used for creating some test tables with the desired
equality field IDs. Coverage on these tables are added to the
test suite.
  - Also did some experiments creating test tables using NiFi.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro
A

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20951


Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

This patch adds support for reading Iceberg tables that has different
equality field ID lists associated to different equality delete
files. In practice this is a use case when one equality delete file
deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink was used for creating some test tables with the desired
equality field IDs. Coverage on these tables are added to the
test suite.
  - Also did some experiments creating test tables using NiFi.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro
A

[Impala-ASF-CR] IMPALA-12673: Table migration fails if partition contains '/'

2024-01-03 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20845


Change subject: IMPALA-12673: Table migration fails if partition contains '/'
..

IMPALA-12673: Table migration fails if partition contains '/'

Due to Iceberg #7612 migrating a table to Iceberg resulted in incorrect
data and stats if some of the string partition fields contained '/'
character. As a result we deliberately rejected migrating such tables.
Now that Impala uses an Iceberg version that has the fix we can allow
migrating such tables too.

Change-Id: I05b4ca44c7edb81cee6747f83a5bd82c5a4b5c44
---
M fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test
2 files changed, 8 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/20845/1
--
To view, visit http://gerrit.cloudera.org:8080/20845
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I05b4ca44c7edb81cee6747f83a5bd82c5a4b5c44
Gerrit-Change-Number: 20845
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab

[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-19 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 11
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Dec 2023 07:29:41 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-19 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 10
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Dec 2023 13:11:18 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-19 Thread Gabor Kaszab (Code Review)

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/20753/8/tests/query_test/test_iceberg.py@1264
PS8, Line 1264:
  :   @SkipIfDockerizedCluster.internal_hostname
  :   @SkipIf.hardcoded_uris
  :   def test_multiple_equality_ids(self, unique_database):
  : """This test loads an Iceberg table that has 2 equality 
delete files with different
  : equality ID lists. A query on such a table fails due to 
lack of support."""
  : SRC_DIR = os.path.join(os.environ['IMPALA_HOME'],
  : "testdata/data/iceberg_test/hadoop_catalog/ice/"
  : "iceberg_v2_delete_different_equality_ids")
  : DST_DIR = 
"/test-warehouse/iceberg_test/hadoop_catalog/ice/" \
  : "iceberg_v2_delete_different_equality_ids"
> I see, thanks for giving it a try.
I think this would add some extra complexity for this test but the gain 
wouldn't be that much because this is an error-case anyway.



--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Dec 2023 12:33:36 +
Gerrit-HasComments: Yes

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1122 matches

Mail list logo