[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..

IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

This is a part 1 change that turns off the count(*) optimisations for
V2 tables as there is a correctness issue with it. The reason is that
Spark compaction may leave some dangling delete files that mess up
the logic in Impala.

Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Reviewed-on: http://gerrit.cloudera.org:8080/21139
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
32 files changed, 1,009 insertions(+), 244 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit 

[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 18:59:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15500/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 14:23:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 14:13:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21139

to look at the new patch set (#4).

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..

IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

This is a part 1 change that turns off the count(*) optimisations for
V2 tables as there is a correctness issue with it. The reason is that
Spark compaction may leave some dangling delete files that mess up
the logic in Impala.

Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/0-8-7d506ac2-9987-4514-8310-505eb02c528a-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/2b4453538b945045-7ba1864b_1900113267_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/3549308fee10b145-141d9f69_502574269_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-3549308fee10b145-141d9f69_1919298510_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/data/delete-ca41ed5edf889878-632c88f10001_1119661503_data.0.parq
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/52100098-3c71-4111-8d7e-1c02e8343a0e-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/a69c2096-fc8b-4365-8b7b-3b561afdd7e2-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m1.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/aa501eb1-924a-4460-a2a0-ad577de8aef5-m3.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/f6475cdb-128e-4438-ab63-2251736670ad-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-1208327814823543579-1-52100098-3c71-4111-8d7e-1c02e8343a0e.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-37664836060851883-1-f6475cdb-128e-4438-ab63-2251736670ad.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-5278394901353853232-1-aa501eb1-924a-4460-a2a0-ad577de8aef5.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/snap-6274599306850878811-1-a69c2096-fc8b-4365-8b7b-3b561afdd7e2.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v3.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v4.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/v5.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_spark_compaction_with_dangling_delete/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
32 files changed, 1,009 insertions(+), 244 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/21139/4
--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit 

[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 14:14:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

2024-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21139 )

Change subject: IMPALA-12894: (part 1) Turn off the count(*) optimisation for 
V2 Iceberg tables
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10375/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Gerrit-Change-Number: 21139
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Mar 2024 14:14:15 +
Gerrit-HasComments: No