Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14216 )


Change subject: WIP: IMPALA-2138,IMPALA-1306: project tuples before exchanges
......................................................................

WIP: IMPALA-2138,IMPALA-1306: project tuples before exchanges

This patch adds an optimization pass to the planner
that adds projection operations to the distributed
plan before exchanges. The projections are added
if the row either has slots that are not used
upstream or if the row has multiple tuples, i.e.
is not flat.
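
The rule above can be sketched roughly as follows. This is an
illustration only, not the code in ProjectionPass.java: the class name,
the method name, and the representation of a row as a list of tuples
(each a list of slot ids) are all hypothetical, and "used upstream" is
read here as "referenced by whatever consumes the exchange output".

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and data shapes are assumptions, not Impala APIs.
class ProjectionRuleSketch {
    // rowTuples: each inner list holds the slot ids of one tuple in the row.
    // slotsUsedByConsumers: slot ids still referenced after the exchange.
    static boolean needsProjection(List<List<Integer>> rowTuples,
                                   Set<Integer> slotsUsedByConsumers) {
        // A row made of more than one tuple is not flat.
        if (rowTuples.size() > 1) return true;
        // Any slot that is carried but never referenced can be dropped.
        for (List<Integer> tuple : rowTuples) {
            for (int slotId : tuple) {
                if (!slotsUsedByConsumers.contains(slotId)) return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a single-tuple row whose slots are all referenced
is left alone, while either an unused slot or a second tuple triggers a
projection.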

The pass is performed late in the planning process,
after join ordering, runtime filters, etc. There is
a phase ordering problem because the projection
pass depends on the join order and strategy,
but those decisions depend on the amount of data
being exchanged across the network, which is reduced
by projection. Tacking the projection onto the end
sidesteps the problem, with the following pros and cons.
Pros:
* This is not invasive to the rest of the planner
* Other planner decisions like join ordering will not be
  affected by the change, making this change safer and
  minimizing the chance of serious perf regressions.
Cons:
* Join ordering and strategy may make sub-optimal decisions
  because their cost estimates still include the cost of
  exchanging slots that the projection later removes.

Note that the projection is implemented with a separate
UnionNode (called PROJECT in the plan). We could squeeze
out some more performance by doing the projection inline
in the exchange operator and avoiding a copy, but this
already appears to be a big win even with the extra
separate step: the cost of the PROJECT seems to be
offset by reduced serialisation and compression
time in the exchange. The UNION is codegen'd and quite
efficient, and we save work in the exchange node from
fewer slots, the flattened tuple representation, and
less data to compress and transfer.

The details of the optimisation pass are documented in
the class comment.

TODO in fe implementation:
* O(n^2) algorithms in tree walk? Expr lists are revisited
  each time.
* Audit expr method implementations
* Remove debug logging
* Consider removing the validation outside of testing

Testing:
TODO
* Need to update planner tests
* Exhaustive tests
* Run some tests with projection disabled - join queries, agg queries?

* Projection that should happen outside subplan
set explain_level=2; use functional_parquet; explain select
straight_join t1.id, m.key
from complextypestbl t1 join [broadcast] complextypestbl t2, t2.int_map m
where t1.id = t2.id and t2.nested_struct.a > 10;

This was based on a prototype by Alex Behm.

Change-Id: I94ccf2d1acecd9a593f4c29fc15202a799d2f7f5
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/OrderByElement.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
A fe/src/main/java/org/apache/impala/planner/ProjectionPass.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SelectNode.java
M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
35 files changed, 976 insertions(+), 48 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/14216/8
--
To view, visit http://gerrit.cloudera.org:8080/14216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I94ccf2d1acecd9a593f4c29fc15202a799d2f7f5
Gerrit-Change-Number: 14216
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
