Alex Behm has uploaded this change for review.
http://gerrit.cloudera.org:8080/10444
Change subject: IMPALA-110: Planner support for multiple distinct aggregations.
..
IMPALA-110: Planner support for multiple distinct aggregations.
Adds planner support for multiple distinct aggregations in
a single SELECT block.
Design summary:
- The existing tree-based plan shape with a two-phased
aggregation is maintained.
- Existing plans are not changed.
- A single query block may contain multiple distinct aggregates.
- Aggregates are grouped into "aggregation classes" based on the
expressions in their DISTINCT portion, which is empty for
non-distinct aggregates.
- The aggregation framework is generalized to simultaneously process
multiple aggregation classes within the tree-based plan. This process
splits the results of different aggregation classes into separate rows,
so a final aggregation is needed to transpose the results into the
desired form.
- Main challenge: Each aggregation class consumes and produces
different tuples, so conceptually a union-type of tuples flows through
the runtime. The tuple union is represented by a TupleRow with one tuple
per aggregation class. Only one tuple in such a TupleRow is non-NULL.
- Backend exec nodes in the aggregation plan will be aware of this
tuple-union either explicitly in their implementation or by relying on
expressions that distinguish the aggregation classes.
- To distinguish the aggregation classes, e.g. in hash exchanges,
CASE expressions are crafted to hash/group on the appropriate slots.
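
The design above can be illustrated with a small, self-contained Python sketch. It is not the planner or backend code from this patch; it only models the idea under simplifying assumptions: a single grouping column `g`, no NULL inputs, and each aggregation class evaluated on its own rather than inside shared aggregation nodes. All names (`group_into_classes`, `valid_tuple_id`, `transpose`, the sample rows) are hypothetical. The helper `valid_tuple_id` is loosely named after the `valid-tuple-id` files in this change, whose actual semantics are not described in this message.

```python
from collections import defaultdict

# Hypothetical input for:
#   SELECT g, COUNT(DISTINCT a), COUNT(DISTINCT b), SUM(c) FROM t GROUP BY g
rows = [
    {"g": 1, "a": 10, "b": "x", "c": 5},
    {"g": 1, "a": 10, "b": "y", "c": 7},
    {"g": 1, "a": 20, "b": "y", "c": 1},
    {"g": 2, "a": 30, "b": "x", "c": 2},
]

# Each aggregate: (output name, function, argument column, distinct exprs).
aggregates = [
    ("cnt_a", "count", "a", ("a",)),
    ("cnt_b", "count", "b", ("b",)),
    ("sum_c", "sum", "c", ()),  # non-distinct: empty distinct portion
]


def group_into_classes(aggs):
    """Group aggregates by their distinct expressions (assumed class key)."""
    classes = defaultdict(list)
    for agg in aggs:
        classes[agg[3]].append(agg)
    return list(classes.items())


def aggregate_class(distinct_cols, aggs, input_rows):
    """Two-phase aggregation for one class; returns {g: result tuple}."""
    # Phase 1: dedupe on (group key, distinct exprs). A non-distinct
    # class keeps every row.
    if distinct_cols:
        seen, deduped = set(), []
        for r in input_rows:
            key = (r["g"],) + tuple(r[c] for c in distinct_cols)
            if key not in seen:
                seen.add(key)
                deduped.append(r)
    else:
        deduped = input_rows
    # Phase 2: evaluate the aggregate functions per group.
    out = defaultdict(dict)
    for r in deduped:
        for name, fn, col, _ in aggs:
            prev = out[r["g"]].get(name, 0)
            out[r["g"]][name] = prev + (1 if fn == "count" else r[col])
    return out


def union_rows(classes, input_rows):
    """Emit rows with one tuple slot per class; exactly one slot is set."""
    result = []
    for i, (distinct_cols, aggs) in enumerate(classes):
        for g, tup in aggregate_class(distinct_cols, aggs, input_rows).items():
            row = [None] * len(classes)
            row[i] = {"g": g, **tup}
            result.append(row)
    return result


def valid_tuple_id(row):
    """Index of the single non-None tuple in a union row."""
    return next(i for i, t in enumerate(row) if t is not None)


def group_key(row):
    """CASE-like selection: read the grouping slot from the set tuple."""
    return row[valid_tuple_id(row)]["g"]


def transpose(union):
    """Final aggregation: merge per-class rows into one row per group."""
    final = defaultdict(dict)
    for row in union:
        tup = row[valid_tuple_id(row)]
        final[group_key(row)].update(
            {k: v for k, v in tup.items() if k != "g"})
    return dict(final)


classes = group_into_classes(aggregates)
union = union_rows(classes, rows)
result = transpose(union)
```

In this sketch the three aggregates fall into three classes, the intermediate `union` holds one row per (class, group) pair with a single non-None tuple each, and `transpose` plays the role of the final aggregation that folds those rows back into one output row per group.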
Deferred FE work:
- Beautify/condense the long CASE exprs
- Push applicable conjuncts into individual aggregators before
the transposition step
- Added a few testing TODOs to reduce the size of this patch
- Decide whether we want to change existing plans to the new model
Testing:
- Added analyzer and planner tests
- Ran end-to-end queries based on a prototype BE implementation
- Ran hdfs/core tests
Change-Id: I4c5cb348f9431350d2e5bf2c84325dcc44d38d2f
---
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/scalar-expr.cc
A be/src/exprs/valid-tuple-id.cc
A be/src/exprs/valid-tuple-id.h
M common/thrift/Exprs.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/analysis/ValidTupleIdExpr.java
M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
A testdata/workloads/functional-planner/queries/PlannerTest/multiple-distinct-limit.test
A testdata/workloads/functional-planner/queries/PlannerTest/multiple-distinct-materialization.test
A testdata/workloads/functional-planner/queries/PlannerTest/multiple-distinct-predicates.test
A testdata/workloads/functional-planner/queries/PlannerTest/multiple-distinct.test
37 files changed, 4,809 insertions(+), 591 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/10444/1
--
To view, visit http://gerrit.cloudera.org:8080/10444
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4c5cb348f9431350d2e5bf2c84325dcc44d38d2f
Gerrit-Change-Number: 10444
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm