[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 25 Mar 2024 19:15:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 )

Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
..

Patch Set 3:

(2 comments)

I've only gone through the non-test files so far.

http://gerrit.cloudera.org:8080/#/c/21190/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21190/3//COMMIT_MSG@10
PS3, Line 10: files. During analysis we check the existence of delete files
Could you describe the cause of the bug in more detail?

http://gerrit.cloudera.org:8080/#/c/21190/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/21190/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@983
PS3, Line 983: use
Nit: superfluous "use".

--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Mon, 25 Mar 2024 14:58:13 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15650/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 25 Mar 2024 14:39:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10422/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 25 Mar 2024 14:19:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21190

to look at the new patch set (#3).

Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
..

IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

Impala can return incorrect results if a table has dangling delete
files. During analysis we check the existence of delete files based
on the snapshot summary. But during planning in IcebergScanPlanner we
do it based on planFiles(), i.e. dangling delete files don't count in
the latter case. Because of this Impala can create incorrect plans for
count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files. It also introduces a new query option,
"iceberg_disable_count_star_optimization", so users can completely
disable the statistic-based count(*)-optimization if necessary.

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
11 files changed, 336 insertions(+), 433 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/3

--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Impala Public Jenkins
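
The mismatch described in the IMPALA-12894 commit message, where one check looks at everything the snapshot tracks and the other only at delete files that still apply to a live data file, can be illustrated with a small standalone sketch. This is not Impala or Iceberg code: the class name, both method names, and the file names are hypothetical, and applicability is modeled as a simple path match (each delete file is assumed to reference exactly one data file), which is much simpler than what planFiles() actually does.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the two "does the table have delete
// files?" checks. Not the actual FeIcebergTable or IcebergScanPlanner logic.
public class DanglingDeleteSketch {

  // Analysis-style check (assumed): any delete file still tracked by the
  // snapshot counts, whether or not it applies to a live data file.
  public static boolean hasDeleteFilesBySummary(Map<String, String> deleteFileTargets) {
    return !deleteFileTargets.isEmpty();
  }

  // Planning-style check (assumed): a delete file counts only if the data
  // file it references is still live. Delete files that reference no live
  // data file are "dangling" and are ignored.
  public static boolean hasDeleteFilesByPlan(
      List<String> liveDataFiles, Map<String, String> deleteFileTargets) {
    return deleteFileTargets.values().stream().anyMatch(liveDataFiles::contains);
  }

  public static void main(String[] args) {
    // Key: delete file, value: the data file it references (illustrative names).
    Map<String, String> deletes = Map.of("delete-1.parquet", "data-1.parquet");

    // Before compaction: the delete file applies to a live data file.
    List<String> before = List.of("data-1.parquet");
    System.out.println(hasDeleteFilesBySummary(deletes));      // true
    System.out.println(hasDeleteFilesByPlan(before, deletes)); // true

    // After a rewrite: data-1 was replaced by data-2, but the delete file
    // is still tracked, so the two checks now disagree.
    List<String> after = List.of("data-2.parquet");
    System.out.println(hasDeleteFilesBySummary(deletes));      // true
    System.out.println(hasDeleteFilesByPlan(after, deletes));  // false
  }
}
```

In this model the fix described in the commit message corresponds to making the first check agree with the second, so that both stages reach the same conclusion about whether delete files need to be considered.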
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 22 Mar 2024 23:36:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15635/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 22 Mar 2024 19:19:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15634/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 22 Mar 2024 19:10:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21190 ) Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10416/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f Gerrit-Change-Number: 21190 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 22 Mar 2024 18:18:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Zoltan Borok-Nagy has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21190 )

Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
..

IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

Impala can return incorrect results if a table has dangling delete
files. During analysis we check the existence of delete files based
on the snapshot summary. But during planning in IcebergScanPlanner we
do it based on planFiles(), i.e. dangling delete files don't count in
the latter case. Because of this Impala can create incorrect plans for
count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files.

TODO:
 * introduce query option so we can completely disable the count(*) optimization

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
7 files changed, 307 insertions(+), 431 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/2

--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21190 )

Change subject: IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
..

IMPALA-12894: Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

Impala can return incorrect results if a table has dangling delete
files. During analysis we check the existence of delete files based
on the snapshot summary. But during planning in IcebergScanPlanner we
do it based on planFiles(), i.e. dangling delete files don't count in
the latter case. Because of this Impala can create incorrect plans for
count(*) optimization.

This patch fixes the FeIcebergTable.hasDeleteFiles() method, so it
ignores dangling delete files.

TODO:
 * introduce query option so we can completely disable the count(*) optimization

Testing:
 * e2e tests
 * planner tests

Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
7 files changed, 307 insertions(+), 430 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/21190/1

--
To view, visit http://gerrit.cloudera.org:8080/21190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie3aca0b0a104f9ca4589cde9643f3f341d4ff99f
Gerrit-Change-Number: 21190
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy