imback82 commented on a change in pull request #28676:
URL: https://github.com/apache/spark/pull/28676#discussion_r446489214
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -554,7 +554,7 @@ class AdaptiveQueryExecSuite
val smj = findTopLevelSortMergeJoin(plan)
assert(smj.size == 2)
val smj2 = findTopLevelSortMergeJoin(adaptivePlan)
- assert(smj2.size == 2, origPlan.toString)
+ assert(smj2.size == 1, origPlan.toString)
Review comment:
Simply changing it to an outer join may not work. For example,
```
SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.c LEFT JOIN t2 as t3 ON t2.c = t3.c
```
For the left outer join between `t1` and `t2`, you can only build the right side
(`t2`), but the output partitioning of this join comes from the left side
(`t1`). Thus, the join between `t2` and `t3` will always introduce a shuffle,
and this will not help in getting the higher cost.
(On top of this, putting a `WHERE` filter on columns of the null-supplying side
would typically convert the outer join to an inner join. :))
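
To illustrate the partitioning point, here is a toy sketch in plain Scala. It is not Spark's planner code, and every name in it (`Row`, `leftOuterJoin`, `partitionOf`, `isClusteredBy`) is hypothetical. It assumes the streamed side `t1` is already hash-partitioned on `t1.a`, with a simple modulo standing in for the hash. It shows one way to see why the output of the left outer join is clustered by `t1.a` but not by `t2.c`: unmatched rows carry a null `t2.c` and can sit in any partition.

```scala
object LeftOuterPartitioningSketch {
  // One output row of `t1 LEFT JOIN t2 ON t1.a = t2.c`.
  // `c` is None when the t1 row has no match in t2.
  final case class Row(a: Int, c: Option[Int])

  // Toy left outer join on a single key column (hypothetical helper).
  def leftOuterJoin(t1: Seq[Int], t2: Set[Int]): Seq[Row] =
    t1.map(a => Row(a, if (t2.contains(a)) Some(a) else None))

  // Assumption: rows are placed by the streamed-side key t1.a.
  // Modulo is used here as a stand-in for a real hash function.
  def partitionOf(a: Int, numPartitions: Int): Int =
    Math.floorMod(a, numPartitions)

  // True when every distinct value of `key` lives in exactly one partition.
  def isClusteredBy[K](rowsByPartition: Map[Int, Seq[Row]])(key: Row => K): Boolean =
    rowsByPartition.toSeq
      .flatMap { case (pid, rows) => rows.map(r => key(r) -> pid) }
      .groupBy(_._1)
      .values
      .forall(_.map(_._2).distinct.size == 1)

  def main(args: Array[String]): Unit = {
    // t1.a values 3 and 4 have no match in t2, so their t2.c is None.
    val joined = leftOuterJoin(Seq(1, 2, 3, 4), Set(1, 2))
    val parts = joined.groupBy(r => partitionOf(r.a, 2))

    println(isClusteredBy(parts)(_.a)) // prints true
    println(isClusteredBy(parts)(_.c)) // prints false: None appears in both partitions
  }
}
```

In this toy model, a following join keyed on `t2.c` cannot reuse the existing layout, which matches the shuffle described above.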