[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-02-02 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..


Patch Set 3:

(6 comments)

Thanks Gabor for working on this, the change looks great!

http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@136
PS3, Line 136: filed
typo: field


http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@520
PS3, Line 520: [1,2] => "[1, 2]
This way [11, 12] would come earlier than [2, 3]. It shouldn't make much 
difference, as this ordering is quite arbitrary anyway, so it should be OK I 
think. Alternatively, can't we just use/write a comparator that is able to 
compare two lists of integers?


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/data/README@884
PS3, Line 884:
You also mention NiFi in the commit message, but here I don't see any reference 
to it. Probably those were manual tests only, and the tables weren't added to 
the test corpus, right? Maybe it'd be worth clarifying in the commit message.


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/datasets/functional/functional_schema_template.sql@3643
PS3, Line 3643: 'write.update.mode'='merge-on-read'
Do we need this? This is the default for Impala.


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test:

http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1396
PS3, Line 1396: cardinality=2
All EQ delete group has cardinality=2, which means we don't have test for the 
ordering between them.


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1543
PS3, Line 1543: 
Would be nice to see some time travel plans.



--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 02 Feb 2024 15:22:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-26 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..


Patch Set 3:

(6 comments)

Hi Gabor, thank you for the feature, it looks great.
Marked some nits and I have a question, what was the reason behind moving from 
Thrift to FlatBuffer?

http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@7
PS3, Line 7: d
nit: d


http://gerrit.cloudera.org:8080/#/c/20951/3//COMMIT_MSG@9
PS3, Line 9: has
nit: have
It refers to the plural object.


http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs
File common/fbs/IcebergObjects.fbs:

http://gerrit.cloudera.org:8080/#/c/20951/3/common/fbs/IcebergObjects.fbs@53
PS3, Line 53: equality_field_ids
I saw this in multiple places, the new term became equality_fiel_ids, in many 
places it was left as equality_ids.
I will mark some and I think we should use one of them to avoid confusion and 
to be consistent. Iceberg uses equality_ids to refer to equality field ids.
https://iceberg.apache.org/spec/


http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@126
PS3, Line 126:   private Set allEqualityFieldIds_ = new HashSet<>();
 :   private Map, Set> 
equalityIdsToDeleteFiles_ =
 :   new HashMap<>();
nit: Here we use both equality ids and equality field ids.


http://gerrit.cloudera.org:8080/#/c/20951/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@510
PS3, Line 510: d
nit: D


http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test:

http://gerrit.cloudera.org:8080/#/c/20951/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-equality-deletes.test@114
PS3, Line 114: filed
nit: field



--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 26 Jan 2024 10:33:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15053/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Jan 2024 13:33:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Gabor Kaszab (Code Review)
Hello Tamas Mate, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#3).

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

This patch adds support for reading Iceberg tables that has different
equality field ID lists associated to different equality delete
files. In practice this is a use case when one equality delete file
deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink was used for creating some test tables with the desired
equality field IDs. Coverage on these tables are added to the
test suite.
  - Also did some experiments creating test tables using NiFi.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro
A 

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/15051/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Jan 2024 10:49:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/15050/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Jan 2024 10:47:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Gabor Kaszab (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20951

to look at the new patch set (#2).

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

This patch adds support for reading Iceberg tables that has different
equality field ID lists associated to different equality delete
files. In practice this is a use case when one equality delete file
deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink was used for creating some test tables with the desired
equality field IDs. Coverage on these tables are added to the
test suite.
  - Also did some experiments creating test tables using NiFi.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro
A 

[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20951 )

Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20951/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20951/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@460
PS1, Line 460:   Set equalityDeleteFiles = 
equalityIdsToDeleteFiles_.get(equalityIds);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/20951/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@478
PS1, Line 478:   // TODO IMPALA-12608: As an optimization we can populate 
the conjuncts below that are
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
Gerrit-Change-Number: 20951
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 25 Jan 2024 10:28:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

2024-01-25 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20951


Change subject: IMPALA-12598: Allow multiple equality filed id lists for 
Iceberg tables
..

IMPALA-12598: Allow multiple equality filed id lists for Iceberg tables

This patch adds support for reading Iceberg tables that has different
equality field ID lists associated to different equality delete
files. In practice this is a use case when one equality delete file
deletes by e.g. columnA and columnB while another one deletes by
columnB and columnC.

In order to achieve such functionality the plan tree creation needed
some adjustments so that it can create separate LEFT ANTI JOIN nodes
for the different equality field ID lists.

Testing:
  - Flink was used for creating some test tables with the desired
equality field IDs. Coverage on these tables are added to the
test suite.
  - Also did some experiments creating test tables using NiFi.

Change-Id: I3e52d7a5800bf1b479f0c234679be92442d09f79
---
M common/fbs/IcebergObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/af4e128ee3256830-d9bd9e2f_1372039299_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-41417e7df44b347b-e03500960001_138281890_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/data/delete-61438487836ebfcc-95c9ce7a_909175610_data.0.parq
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/2d3fafd7-bce6-483f-be82-e0ccce9203fc-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/57a963d3-0e4e-4540-8080-a57afd51ba99-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/8bd425d8-25fb-4603-8cc7-aeb5ad2a3917-m0.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-397031335297740726-1-2d3fafd7-bce6-483f-be82-e0ccce9203fc.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-6117850509763739078-1-57a963d3-0e4e-4540-8080-a57afd51ba99.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/snap-8494861454990126958-1-8bd425d8-25fb-4603-8cc7-aeb5ad2a3917.avro
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v1.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v2.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v3.metadata.json
D 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_different_equality_ids/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-1483849a-0bdf-49f1-82ac-b3cfa757c541-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-a8488080-c95c-4b79-9db9-085ed10090d6-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/data/0-0-d92dc85b-efc8-4173-b96f-10a13c1d1e18-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/18458ea9-087c-4e3d-8264-5e8b1fe425b1-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_equality_multi_eq_ids/metadata/b7db365c-79e0-404d-8bcd-834bb3e958c0-m1.avro
A