JoshRosen commented on code in PR #43938:
URL: https://github.com/apache/spark/pull/43938#discussion_r1402752963


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala:
##########
@@ -138,9 +138,8 @@ trait HashJoin extends JoinCodegenSupport {
     UnsafeProjection.create(streamedBoundKeys)
 
   @transient protected[this] lazy val boundCondition = if 
(condition.isDefined) {
-    if (joinType == FullOuter && buildSide == BuildLeft) {
-      // Put join left side before right side. This is to be consistent with
-      // `ShuffledHashJoinExec.fullOuterJoin`.
+    if ((joinType == FullOuter || joinType == LeftOuter) && buildSide == 
BuildLeft) {
+      // Put join left side before right side. This is to be consistent with 
ShuffledHashJoinExec.

Review Comment:
   I wonder whether this whole `boundCondition` block can be simplified to
   
   ```
   Predicate.create(condition.get, left.output ++ right.output).eval _
   ```
   
   since the left side is always on the left and the right side is always on 
the right.
   
   (**Edit**: the proposed simplification in ⬆️  is not correct (the bound 
condition input schema is not necessarily the join's output schema); see later 
comments for discussion of an alternative simplification).
   
   Further down in this file, we have
   
   
https://github.com/apache/spark/blob/aa81f424aee2db9d85f1ab6720a519b9552cc7a7/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L104-L107
   
   showing that `buildPlan` and `streamPlan` are just re-mappings of `left` and 
`right` depending on the build side.
   
   ----
   
   Several years ago, it looks like we used to do something similar to my 
proposal:
   
   
https://github.com/apache/spark/blob/0340b3d279de6be4903673bbf3e6a1a2653de6c0/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L119-L123
   
   but we switched to the current use of `buildPlan` and `streamPlan` in a 
refactoring in https://github.com/apache/spark/pull/12102 (I'm not fully clear 
on why).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to