[GitHub] [spark] imback82 commented on a change in pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

GitBox Fri, 26 Jun 2020 22:51:39 -0700


imback82 commented on a change in pull request #28676:
URL: https://github.com/apache/spark/pull/28676#discussion_r446489214




##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -554,7 +554,7 @@ class AdaptiveQueryExecSuite
       val smj = findTopLevelSortMergeJoin(plan)
       assert(smj.size == 2)
       val smj2 = findTopLevelSortMergeJoin(adaptivePlan)
-      assert(smj2.size == 2, origPlan.toString)
+      assert(smj2.size == 1, origPlan.toString)

Review comment:
       Simply changing it to an outer join may not work. For example,
   ```
   SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.c LEFT JOIN t2 AS t3 ON t2.c = t3.c
   ```
   For the left outer join between `t1` and `t2`, only the right side (`t2`) can be the build side, but the output partitioning of this join comes from the left side (`t1`). Thus, the join between `t2` and `t3` will always introduce a shuffle, so this will not help in getting the higher cost.
   
   (On top of this, adding a `WHERE` clause that filters out nulls on the right side would convert the outer join to an inner join. :))
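
   To check this, the plans can be inspected directly. Below is a minimal sketch, assuming a local Spark session and two small temp views named `t1` and `t2` as in the query above. The columns `b` and `d`, the sample rows, and the `t2.c > 0` filter are hypothetical and only there to make the example self-contained. The exact plans depend on the Spark version and on settings such as `spark.sql.autoBroadcastJoinThreshold`.

   ```scala
   import org.apache.spark.sql.SparkSession

   object OuterJoinPlanCheck {
     def main(args: Array[String]): Unit = {
       // Local session for illustration only; a test suite would reuse its own session.
       val spark = SparkSession.builder()
         .master("local[2]")
         .appName("outer-join-plan-check")
         .getOrCreate()
       import spark.implicits._

       // Hypothetical sample data; only the join columns `a` and `c` come from the query above.
       Seq((1, "x"), (2, "y")).toDF("a", "b").createOrReplaceTempView("t1")
       Seq((1, "p"), (3, "q")).toDF("c", "d").createOrReplaceTempView("t2")

       val query =
         "SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.c LEFT JOIN t2 AS t3 ON t2.c = t3.c"

       // Print the physical plan to see which side each join builds
       // and whether an exchange appears before the second join.
       spark.sql(query).explain()

       // Same query with a filter on t2 (hypothetical predicate); print the extended
       // plans to see whether the optimizer rewrites the first join as an inner join.
       spark.sql(query + " WHERE t2.c > 0").explain(true)

       spark.stop()
     }
   }
   ```

   Comparing the two outputs shows the join types and build sides the planner actually picks, which is easier to reason about than the `smj2.size` assertion alone.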




