ekoifman commented on a change in pull request #34464:
URL: https://github.com/apache/spark/pull/34464#discussion_r775074373
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala
##########
@@ -69,16 +77,16 @@ object DynamicJoinSelection extends Rule[LogicalPlan] {
}
def apply(plan: LogicalPlan): LogicalPlan = plan.transformDown {
- case j @ ExtractEquiJoinKeys(_, _, _, _, _, left, right, hint) =>
+ case j @ ExtractEquiJoinKeys(joinType, _, _, _, _, left, right, hint) =>
var newHint = hint
if (!hint.leftHint.exists(_.strategy.isDefined)) {
- selectJoinStrategy(left).foreach { strategy =>
+ selectJoinStrategy(left, joinType).foreach { strategy =>
Review comment:
For LOJ with many empty partitions on the left, the local join can
short-circuit whether you broadcast or shuffle. I'm not sure how to determine
which strategy will send less data around. Is there another heuristic that can
be used?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]