Hello Tim Armstrong, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16864
to look at the new patch set (#2).
Change subject: IMPALA-10287: Include parallelism in cost comparison of
broadcast vs partition
......................................................................
IMPALA-10287: Include parallelism in cost comparison of broadcast vs partition
The current planner sometimes picks broadcast distribution even when
partition distribution would perform better (seen in TPC-DS
performance runs).
This patch adds 2 query options:
- use_dop_for_costing (type:boolean, default:true)
- broadcast_to_partition_factor (type:double, default:1.0)
With use_dop_for_costing enabled, the distributed planner will increase
the cost of the broadcast join's build side by C * sqrt(m), where
m = degree of parallelism of the join node, and
C = the broadcast_to_partition_factor.
This allows the planner to more favorably consider partition distribution
where appropriate.
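
For illustration only, here is a minimal sketch of how the adjustment
could enter the comparison. It is not the code in DistributedPlanner.java:
the class and method names are hypothetical, the input costs are made-up
numbers, applying C * sqrt(m) as a multiplier is one reading of the
description above, and clamping m to at least 1 is an assumption.

```java
public class BroadcastVsPartitionSketch {

    /**
     * Hypothetical helper: returns the broadcast build-side cost after the
     * adjustment. Assumes C * sqrt(m) is applied as a multiplier.
     */
    static double adjustedBroadcastCost(double broadcastBuildCost,
            boolean useDopForCosting, double broadcastToPartitionFactor, int dop) {
        if (!useDopForCosting) {
            return broadcastBuildCost; // option disabled: cost is unchanged
        }
        int m = Math.max(1, dop); // assumption: treat a dop below 1 as 1
        return broadcastBuildCost * broadcastToPartitionFactor * Math.sqrt(m);
    }

    /** Hypothetical helper: true if partition distribution is the cheaper choice. */
    static boolean prefersPartition(double broadcastBuildCost, double partitionCost,
            boolean useDopForCosting, double broadcastToPartitionFactor, int dop) {
        double broadcastCost = adjustedBroadcastCost(
                broadcastBuildCost, useDopForCosting, broadcastToPartitionFactor, dop);
        return partitionCost < broadcastCost;
    }

    public static void main(String[] args) {
        double broadcastBuildCost = 100.0; // made-up cost units
        double partitionCost = 150.0;      // made-up cost units

        // Option off: 100 < 150, so broadcast stays cheaper.
        System.out.println(prefersPartition(broadcastBuildCost, partitionCost, false, 1.0, 4));

        // Option on, C = 1.0, m = 4: 100 * 1.0 * sqrt(4) = 200 > 150, so partition wins.
        System.out.println(prefersPartition(broadcastBuildCost, partitionCost, true, 1.0, 4));
    }
}
```

With these made-up inputs the first call prints false and the second
prints true. Under the same assumptions, a broadcast_to_partition_factor
above 1.0 pushes the choice further toward partition distribution and a
value below 1.0 pushes it back toward broadcast.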
The use of sqrt in the calculation is not final at this point; it is
intended to model a non-linear relationship between mt_dop and query
performance. After further performance testing with different values
of the above factor, we can establish a better correlation and refine
the formula (tracked by IMPALA-10395).
Testing:
- Added a new test file with TPC-DS Q78 which shows partition
distribution for a left-outer join (with store_returns on the right
input) in the query when the query options are enabled (it chooses
broadcast otherwise).
- Ran PlannerTest and TpcdsPlannerTest.
- Ran e2e tests for Tpcds and Tpch.
Change-Id: Idff569299e5c78720ca17c616a531adac78208e1
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
7 files changed, 603 insertions(+), 4 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/16864/2
--
To view, visit http://gerrit.cloudera.org:8080/16864
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idff569299e5c78720ca17c616a531adac78208e1
Gerrit-Change-Number: 16864
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>