Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23985 )

Change subject: IMPALA-11986: (part 1) Optimize partition key scans for Iceberg tables
......................................................................

IMPALA-11986: (part 1) Optimize partition key scans for Iceberg tables

This patch optimizes queries that only scan IDENTITY-partitioned
columns. The optimization only applies if:
* All materialized aggregate expressions have distinct semantics
  (e.g. MIN, MAX, NDV), i.e. their result does not depend on how many
  times a value repeats. For example, the optimization works for
  COUNT(DISTINCT c) but not for COUNT(c).
* All materialized columns are IDENTITY-partitioned in all partition
  specs (this can be relaxed later).

If the above conditions are met, then each data file without deletes
produces only a single record. This is safe because every row of such
a file has the same identity-partition values, so the repeated rows
cannot change the result of a distinct-semantics aggregate. The rest of
the table (data files with deletes, and the delete files themselves) is
scanned normally.

Testing:
 * added e2e tests

Change-Id: I32f78ee60ac4a410e91cf0e858199dd39d2e9afe
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-key-scans.test
M tests/query_test/test_iceberg.py
5 files changed, 206 insertions(+), 12 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/23985/1
--
To view, visit http://gerrit.cloudera.org:8080/23985

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32f78ee60ac4a410e91cf0e858199dd39d2e9afe
Gerrit-Change-Number: 23985
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
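The eligibility rules and the per-file behavior described in the commit message can be modeled in a few lines of Java. This is a minimal sketch under stated assumptions, not the code in IcebergScanPlanner or IcebergScanNode: the DataFile record, the Agg enum and the helpers isEligible, scanColumn and ndv are all hypothetical, and "scanned normally" for a file with deletes is modeled as emitting one row per surviving record. It shows that NDV stays the same when each delete-free file is collapsed to one row while COUNT does not, and why files with deletes must not be collapsed: a file whose rows are all deleted must not contribute its partition value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartitionKeyScanSketch {
  /** Hypothetical data-file model (not Impala's or Iceberg's classes): identity-partition
   *  values, number of rows that survive deletes, and whether any delete file applies. */
  record DataFile(Map<String, Integer> partition, long liveRows, boolean hasDeletes) {}

  /** Hypothetical aggregate kinds; only COUNT(c) lacks distinct semantics here. */
  enum Agg { MIN, MAX, NDV, COUNT_DISTINCT, COUNT }

  /** True if the aggregate's result is unaffected by duplicate input values. */
  static boolean hasDistinctSemantics(Agg agg) {
    return agg != Agg.COUNT;
  }

  /** Both conditions: distinct-semantics aggregates only, and every materialized
   *  column is IDENTITY-partitioned in every partition spec. */
  static boolean isEligible(List<Agg> aggs, Set<String> materializedCols,
                            List<Set<String>> identityColsPerSpec) {
    boolean distinctOnly = aggs.stream().allMatch(PartitionKeyScanSketch::hasDistinctSemantics);
    boolean allIdentity =
        identityColsPerSpec.stream().allMatch(spec -> spec.containsAll(materializedCols));
    return distinctOnly && allIdentity;
  }

  /** Values of `col` fed into the aggregation. With `optimize`, a delete-free file
   *  contributes a single row; a file with deletes is scanned normally (one row per
   *  live record, possibly zero). */
  static List<Integer> scanColumn(List<DataFile> files, String col, boolean optimize) {
    List<Integer> out = new ArrayList<>();
    for (DataFile f : files) {
      long rows = (optimize && !f.hasDeletes()) ? 1 : f.liveRows();
      for (long i = 0; i < rows; i++) {
        out.add(f.partition().get(col));
      }
    }
    return out;
  }

  /** Exact number of distinct values (stands in for NDV / COUNT(DISTINCT)). */
  static int ndv(List<Integer> values) {
    return new HashSet<>(values).size();
  }

  public static void main(String[] args) {
    List<DataFile> files = List.of(
        new DataFile(Map.of("p", 1), 1000, false),
        new DataFile(Map.of("p", 1), 500, false),
        new DataFile(Map.of("p", 2), 700, false),
        new DataFile(Map.of("p", 3), 0, true)); // every row of this file is deleted
    List<Integer> full = scanColumn(files, "p", false);
    List<Integer> opt = scanColumn(files, "p", true);
    System.out.println("eligible="
        + isEligible(List.of(Agg.NDV, Agg.MAX), Set.of("p"), List.of(Set.of("p"))));
    System.out.println("NDV full=" + ndv(full) + " optimized=" + ndv(opt));
    System.out.println("COUNT full=" + full.size() + " optimized=" + opt.size());
  }
}
```

Running main prints `eligible=true`, `NDV full=2 optimized=2` and `COUNT full=2200 optimized=3`: the distinct count is preserved while the plain row count changes, which is why COUNT(c) is excluded. If the fully deleted file with p=3 were also collapsed to a single row, NDV would wrongly become 3.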
