wankunde commented on code in PR #40523:
URL: https://github.com/apache/spark/pull/40523#discussion_r1159906763


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala:
##########
@@ -365,9 +365,21 @@ case class ShuffledHashJoinExec(
        """.stripMargin
     val streamedKeyAnyNull = s"${streamedKeyExprCode.value}.anyNull()"
 
+    // Evaluate the variables from the stream side and used in the condition but do not clear the
+    // code as they may be used in the following function.

Review Comment:
   
   > I don't quite understand why it's only a bug for full outer join. Inner join invokes `getJoinCondition` as well.
   
   I'm sorry, SMJ inner join doesn't invoke `getJoinCondition`?
   Also, inner join evaluates the variables before generating code for the condition expression:
   ```
         val (streamedBefore, streamedAfter) = splitVarsByCondition(streamedOutput, streamedVars)
         val (bufferedBefore, bufferedAfter) = splitVarsByCondition(bufferedOutput, bufferedVars)
   ```
   The parent operator consumes the join result in the same method, so those
   variables don't need to be evaluated again, whereas full outer join evaluates
   those variables again in the method `smj_consumeFullOuterJoinRow_0()`.
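
   To illustrate the difference, here is a toy sketch. It is not Spark's codegen API: `Var`, `VarEvalSketch`, `evaluateVars`, and `evaluateVarsKeepCode` are hypothetical names made up for this example. It only models the idea that "evaluating" a variable emits its declaration code, and that clearing the code afterwards stops a later, separate generated method from declaring the variable again.

   ```scala
   // Toy model only: these names are hypothetical, not Spark's ExprCode/CodegenSupport.
   final case class Var(name: String, var code: String)

   object VarEvalSketch {
     // Emit the pending declaration code once, then clear it so that later
     // callers in the SAME generated method do not declare the variable twice.
     def evaluateVars(vars: Seq[Var]): String = {
       val emitted = vars.map(_.code).filter(_.nonEmpty).mkString("\n")
       vars.foreach(_.code = "")
       emitted
     }

     // Emit the pending declaration code but keep it, so that a SEPARATE
     // generated method can still emit its own declarations later.
     def evaluateVarsKeepCode(vars: Seq[Var]): String =
       vars.map(_.code).filter(_.nonEmpty).mkString("\n")
   }
   ```

   In this model, calling `evaluateVars` for the condition and then again for the consuming code yields an empty string the second time. That is fine when both pieces end up in one method (the inner join case described above), but it would leave a separate method without the declarations it needs, which is why the keep-code variant is used in that situation.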



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

