Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14216 )
Change subject: WIP: IMPALA-2138,IMPALA-1306: project tuples before exchanges
......................................................................

WIP: IMPALA-2138,IMPALA-1306: project tuples before exchanges

This patch adds an optimization pass to the planner that adds
projection operations to the distributed plan before exchanges.
A projection is added if the row either has slots that are not
used upstream or has multiple tuples, i.e. is not flat.

The pass is performed late in the planning process, after join
ordering, runtime filters, etc. There is a phase ordering problem
because the projection pass depends on the join order and strategy,
but those decisions depend on the amount of data being exchanged
across the network, which is reduced by projection. Tacking the
projection onto the end sidesteps the problem, with the following
pros and cons.

Pros:
* This is not invasive to the rest of the planner.
* Other planner decisions like join ordering are not affected by
  the change, which makes it safer and minimizes the chance of
  serious perf regressions.

Cons:
* Join ordering and strategy may make sub-optimal decisions because
  they still include the cost of exchanging slots that are later
  projected away.

Note that the projection is implemented with a separate UnionNode
(called PROJECT in the plan). We could squeeze out some more
performance by doing the projection inline in the exchange operator
and avoiding a copy, but this already appears to be a big win even
with the extra step: the cost of the PROJECT seems to be offset by
reduced serialization and compression time in the exchange. The
UNION is codegen'd and quite efficient, and we save work in the
exchange node from fewer slots, the flattened tuple representation,
and less data to compress and transfer.

The details of the optimization pass are documented in the class
comment.

TODO in fe implementation
* O(n^2) algorithms in tree walk? Expr lists are revisited each time.
* Audit expr method implementations
* remove debug logging
* Consider removing the validation outside of testing

Testing: TODO
* Need to update planner tests
* Exhaustive tests
* Run some tests with projection disabled - join queries, agg queries?
* Projection that should happen outside subplan
  set explain_level=2;
  use functional_parquet;
  explain select straight_join t1.id, m.key
  from complextypestbl t1 join [broadcast] complextypestbl t2, t2.int_map m
  where t1.id = t2.id and t2.nested_struct.a > 10;

This was based on a prototype by Alex Behm.

Change-Id: I94ccf2d1acecd9a593f4c29fc15202a799d2f7f5
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/OrderByElement.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
A fe/src/main/java/org/apache/impala/planner/ProjectionPass.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SelectNode.java
M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
35 files changed, 976 insertions(+), 48 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/14216/8
--
To view, visit http://gerrit.cloudera.org:8080/14216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I94ccf2d1acecd9a593f4c29fc15202a799d2f7f5
Gerrit-Change-Number: 14216
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
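
For readers skimming the commit message, the rule for when a projection is added before an exchange (the row has slots that are not used upstream, or the row has multiple tuples) can be illustrated with a small standalone sketch. This is not the code in ProjectionPass.java: the class name ProjectionSketch, the method needsProjection, and the use of plain integer slot ids are all assumptions made for illustration only.

```java
import java.util.Set;

/**
 * Illustrative sketch only. The names here are hypothetical and do not
 * correspond to the actual ProjectionPass implementation in this change.
 */
public class ProjectionSketch {

    /**
     * Decides whether a row feeding an exchange would benefit from projection.
     *
     * @param numTuples           number of tuples making up the row
     * @param materializedSlotIds ids of all slots present in the row (assumed representation)
     * @param requiredSlotIds     ids of slots actually consumed on the other side of the exchange
     */
    public static boolean needsProjection(int numTuples,
                                          Set<Integer> materializedSlotIds,
                                          Set<Integer> requiredSlotIds) {
        // A row made of more than one tuple is "not flat".
        boolean notFlat = numTuples > 1;
        // At least one materialized slot is never read by a consumer.
        boolean hasUnusedSlots = !requiredSlotIds.containsAll(materializedSlotIds);
        return notFlat || hasUnusedSlots;
    }

    public static void main(String[] args) {
        // Flat row, every slot used: no projection needed under this rule.
        System.out.println(needsProjection(1, Set.of(1, 2), Set.of(1, 2)));
        // Flat row carrying an unused slot (id 3).
        System.out.println(needsProjection(1, Set.of(1, 2, 3), Set.of(1, 2)));
        // Two-tuple row, e.g. the output of a join, even if every slot is used.
        System.out.println(needsProjection(2, Set.of(1, 2), Set.of(1, 2)));
    }
}
```

The sketch only captures the yes/no decision; how the real pass walks the plan tree and tracks slot usage is described in the class comment of ProjectionPass.java.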
