Zoltan Borok-Nagy has submitted this change and it was merged. (
http://gerrit.cloudera.org:8080/19494 )
Change subject: IMPALA-11802: Optimize count(*) queries for Iceberg V2 position
delete tables
......................................................................
IMPALA-11802: Optimize count(*) queries for Iceberg V2 position delete tables
The SCAN plan of a count star query for Iceberg V2 position delete tables
is as follows:
          AGGREGATE
          COUNT(*)
              |
          UNION ALL
         /         \
        /           \
       /             \
  SCAN all        ANTI JOIN
  datafiles       /      \
  without        /        \
  deletes      SCAN       SCAN
               datafiles  deletes
Since Iceberg provides the number of records in each file (record_count), we
can use it to optimize a simple count star query for Iceberg V2
position delete tables. First, the number of records in all DataFiles
without corresponding DeleteFiles is calculated from the Iceberg metadata
files. The query is then rewritten as follows:
       ArithmeticExpr(ADD)
        /             \
       /               \
      /                 \
  record_count       AGGREGATE
  of all             COUNT(*)
  datafiles              |
  without            ANTI JOIN
  deletes            /      \
                    /        \
                  SCAN       SCAN
                  datafiles  deletes
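
The arithmetic behind the rewrite can be illustrated with a toy model. This is only a sketch in Python, not Impala's planner code: DataFile, count_star_full_scan and count_star_rewritten are hypothetical names, and position deletes are modelled as a set of (file path, row position) pairs, assuming every pair refers to an existing row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for an Iceberg data file entry."""
    path: str
    record_count: int  # row count as recorded in Iceberg metadata


def count_star_full_scan(data_files, position_deletes):
    """Original plan: read every row and drop the ones hit by a delete."""
    total = 0
    for f in data_files:
        for pos in range(f.record_count):
            if (f.path, pos) not in position_deletes:
                total += 1
    return total


def count_star_rewritten(data_files, position_deletes):
    """Rewritten plan: metadata sum plus a count over the anti join only."""
    files_with_deletes = {path for path, _ in position_deletes}

    # Left operand of the ADD: files without deletes are never scanned,
    # their record_count values are simply summed.
    metadata_count = sum(
        f.record_count for f in data_files if f.path not in files_with_deletes
    )

    # Right operand of the ADD: only files that have deletes are scanned
    # and anti-joined against the delete positions.
    anti_join_count = sum(
        1
        for f in data_files
        if f.path in files_with_deletes
        for pos in range(f.record_count)
        if (f.path, pos) not in position_deletes
    )
    return metadata_count + anti_join_count


files = [
    DataFile("a.parquet", 100),  # no deletes, counted from metadata
    DataFile("b.parquet", 50),   # two rows deleted
    DataFile("c.parquet", 10),   # one row deleted
]
deletes = {("b.parquet", 0), ("b.parquet", 7), ("c.parquet", 3)}

print(count_star_full_scan(files, deletes))   # 157
print(count_star_rewritten(files, deletes))   # 157
```

Both functions return the same count, but the rewritten version never touches the rows of a.parquet, which is where the saving comes from when most data files have no delete files.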
Testing:
* Existing tests
* Added e2e tests
Change-Id: I8172c805121bf91d23fe063f806493afe2f03d41
Reviewed-on: http://gerrit.cloudera.org:8080/19494
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
---
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/rewrite/CountStarToConstRule.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
13 files changed, 485 insertions(+), 51 deletions(-)
Approvals:
Impala Public Jenkins: Verified
Zoltan Borok-Nagy: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/19494
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8172c805121bf91d23fe063f806493afe2f03d41
Gerrit-Change-Number: 19494
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Xiaoqing Gao <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>