JoshRosen commented on code in PR #43938:
URL: https://github.com/apache/spark/pull/43938#discussion_r1402752963
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala:
##########
@@ -138,9 +138,8 @@ trait HashJoin extends JoinCodegenSupport {
UnsafeProjection.create(streamedBoundKeys)
@transient protected[this] lazy val boundCondition = if
(condition.isDefined) {
- if (joinType == FullOuter && buildSide == BuildLeft) {
- // Put join left side before right side. This is to be consistent with
- // `ShuffledHashJoinExec.fullOuterJoin`.
+ if ((joinType == FullOuter || joinType == LeftOuter) && buildSide ==
BuildLeft) {
+ // Put join left side before right side. This is to be consistent with
ShuffledHashJoinExec.
Review Comment:
I wonder whether this whole `boundCondition` block can be simplified to
```
Predicate.create(condition.get, left.output ++ right.output).eval _
```
since the left side is always on the left and the right side is always on
the right.
(**Edit**: the proposed simplification in ⬆️ is not correct (the bound
condition input schema is not necessarily the join's output schema); see later
comments for discussion of an alternative simplification).
Further down in this file, we have
https://github.com/apache/spark/blob/aa81f424aee2db9d85f1ab6720a519b9552cc7a7/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L104-L107
showing that `buildPlan` and `streamPlan` are just re-mappings of `left` and
`right` depending on the build side.
----
Several years ago, it looks like we used to do something similar to my
proposal:
https://github.com/apache/spark/blob/0340b3d279de6be4903673bbf3e6a1a2653de6c0/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L119-L123
but we switched to the current use of `buildPlan` and `streamPlan` in a
refactoring in https://github.com/apache/spark/pull/12102 (I'm not fully clear
on why).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]