Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/23258 )
Change subject: IMPALA-14263: Add broadcast_cost_scale_factor option
......................................................................

IMPALA-14263: Add broadcast_cost_scale_factor option

This commit enhances the distributed planner's costing model for
broadcast joins by introducing the `broadcast_cost_scale_factor` query
option. This option lets users fine-tune the planner's decision between
broadcast and partitioned joins.

Key changes:
- The total broadcast cost is scaled by the new
  `broadcast_cost_scale_factor` query option, allowing users to favor or
  penalize broadcast joins as needed when setting a query hint is not
  feasible.
- Updated the planner logic and test cases to reflect the new costing
  model and options.

This addresses scenarios where the default costing could lead to
suboptimal join distribution choices, particularly in a large-scale
cluster where the number of executors can increase broadcast cost, while
choosing a partitioned strategy can lead to data skew. An admin can set
`broadcast_cost_scale_factor` to less than 1.0 to make DistributedPlanner
favor broadcast over partitioned joins (with the possible downside of
higher memory usage per query and higher network transmission).

Existing query hints still take precedence over this option. Note that
this option is applied independently of the `broadcast_to_partition_factor`
option (see IMPALA-10287). In an MT_DOP>1 setup, it should be sufficient
to set `use_dop_for_costing=True` and tune only
`broadcast_to_partition_factor`.

Testing: Added FE tests.
Change-Id: I475f8a26b2171e87952b69f66a5c18f77c2b3133
Reviewed-on: http://gerrit.cloudera.org:8080/23258
Reviewed-by: Wenzhe Zhou <[email protected]>
Reviewed-by: Aman Sinha <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
6 files changed, 454 insertions(+), 3 deletions(-)

Approvals:
  Wenzhe Zhou: Looks good to me, but someone else must approve
  Aman Sinha: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/23258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I475f8a26b2171e87952b69f66a5c18f77c2b3133
Gerrit-Change-Number: 23258
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
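
Illustrative note: the effect of scaling the broadcast cost, as described in the commit message, can be sketched roughly as below. This is a toy model and not the code in DistributedPlanner.java. The function name, the precomputed cost inputs, the hint strings, the neutral default of 1.0, and the tie-breaking rule are all assumptions made for illustration. Only the ideas that the broadcast cost is multiplied by `broadcast_cost_scale_factor` and that hints take precedence come from the commit message.

```python
from typing import Optional


def choose_join_distribution(
    broadcast_cost: float,
    partition_cost: float,
    broadcast_cost_scale_factor: float = 1.0,  # assumed neutral value
    hint: Optional[str] = None,
) -> str:
    """Toy decision between a broadcast and a partitioned join.

    The cost inputs are hypothetical, already-computed planner estimates;
    the real planner derives them from table stats and cluster size.
    """
    # Per the commit message, an explicit query hint wins over the option.
    # The hint strings used here are illustrative placeholders.
    if hint in ("broadcast", "partitioned"):
        return hint

    # Scale only the broadcast side of the comparison.
    scaled_broadcast_cost = broadcast_cost * broadcast_cost_scale_factor

    # Tie-breaking in favor of broadcast is an assumption of this sketch.
    if scaled_broadcast_cost <= partition_cost:
        return "broadcast"
    return "partitioned"


if __name__ == "__main__":
    # With the neutral factor the more expensive broadcast plan loses.
    print(choose_join_distribution(120.0, 100.0))
    # A factor below 1.0 tilts the same comparison toward broadcast.
    print(choose_join_distribution(120.0, 100.0, 0.5))
```

In this model a factor below 1.0 shrinks the broadcast side of the comparison and a factor above 1.0 inflates it, which matches the "favor or penalize" wording of the commit message.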
