Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16864 )
Change subject: IMPALA-10287: Include parallelism in cost comparison of broadcast vs partition
......................................................................

IMPALA-10287: Include parallelism in cost comparison of broadcast vs partition

The current planner tends to pick broadcast distribution in some cases
even when partition distribution would be the better choice (seen in
TPC-DS performance runs).

This patch adds 2 query options:
 - use_dop_for_costing (type: boolean, default: true)
 - broadcast_to_partition_factor (type: double, default: 1.0)

With use_dop_for_costing enabled, the distributed planner will increase
the cost of the broadcast join's build side by C * sqrt(m), where
  m = degree of parallelism of the join node, and
  C = the broadcast_to_partition_factor.
This allows the planner to consider partition distribution more
favorably where appropriate.

The choice of sqrt in the calculation is not final at this point; it is
intended to model a non-linear relationship between mt_dop and query
performance. After further performance testing with tuning of the above
factor, we can establish a better correlation and refine the formula
(tracked by IMPALA-10395).

Testing:
 - Added a new test file with TPC-DS Q78, which shows partition
   distribution for a left-outer join (with store_returns on the right
   input) when the query options are enabled (the planner chooses
   broadcast otherwise).
 - Ran PlannerTest and TpcdsPlannerTest.
 - Ran e2e tests for Tpcds and Tpch.
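
For illustration only, the cost adjustment described above might be sketched as follows. This is not the code from DistributedPlanner.java: the class name, method names, and parameters are hypothetical, and the sketch assumes that "increase by C * sqrt(m)" means the build-side cost is multiplied by that factor.

```java
/**
 * Hypothetical sketch of the DOP-aware broadcast costing described in the
 * commit message. All names here are illustrative, not Impala's.
 */
final class BroadcastCostSketch {
  private BroadcastCostSketch() {}

  /**
   * Returns the broadcast build-side cost after the adjustment.
   * Assumption: the cost is scaled multiplicatively by C * sqrt(m).
   *
   * @param buildCost unadjusted cost of the broadcast build side
   * @param dop degree of parallelism of the join node (m)
   * @param factor value of broadcast_to_partition_factor (C)
   * @param useDopForCosting value of use_dop_for_costing
   */
  static double adjustedBuildCost(
      double buildCost, int dop, double factor, boolean useDopForCosting) {
    if (!useDopForCosting) return buildCost;
    // Guard against a non-positive DOP so the scale factor stays defined.
    int m = Math.max(1, dop);
    return buildCost * factor * Math.sqrt(m);
  }

  /**
   * Returns true if partition distribution is cheaper than broadcast once
   * the broadcast build-side cost has been adjusted.
   */
  static boolean prefersPartition(double partitionCost, double broadcastBuildCost,
      int dop, double factor, boolean useDopForCosting) {
    double broadcastCost =
        adjustedBuildCost(broadcastBuildCost, dop, factor, useDopForCosting);
    return partitionCost < broadcastCost;
  }
}
```

Under these assumptions, a broadcast build cost of 100 with m = 4 and C = 1.0 is treated as 200, so a partitioned plan costing 150 would win where it previously lost. Lowering C below 1.0 moves the comparison back toward broadcast.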

Change-Id: Idff569299e5c78720ca17c616a531adac78208e1
Reviewed-on: http://gerrit.cloudera.org:8080/16864
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
7 files changed, 603 insertions(+), 4 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idff569299e5c78720ca17c616a531adac78208e1
Gerrit-Change-Number: 16864
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
