wankunde commented on code in PR #40523:
URL: https://github.com/apache/spark/pull/40523#discussion_r1159906763
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala:
##########
@@ -365,9 +365,21 @@ case class ShuffledHashJoinExec(
""".stripMargin
val streamedKeyAnyNull = s"${streamedKeyExprCode.value}.anyNull()"
+ // Evaluate the variables from the stream side and used in the condition but do not clear the
+ // code as they may be used in the following function.
Review Comment:
> I don't quite understand why it's only a bug for full outer join. Inner join invokes `getJoinCondition` as well.
Sorry, but SMJ inner join doesn't invoke `getJoinCondition`, does it?
Also, inner join evaluates the variables before generating the code for the
condition expression:
```
val (streamedBefore, streamedAfter) =
splitVarsByCondition(streamedOutput, streamedVars)
val (bufferedBefore, bufferedAfter) =
splitVarsByCondition(bufferedOutput, bufferedVars)
```
The parent operator consumes the join result in the same method, so
those variables don't need to be evaluated again, whereas full outer join
evaluates those variables again in the method `smj_consumeFullOuterJoinRow_0()`.
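
To illustrate the difference, here is a minimal sketch using a toy model, not Spark's real classes. The `ExprCode` below is a simplified, hypothetical stand-in (plain strings instead of code blocks), and `evaluateAndClear` / `evaluateNoClear` are names made up for this example. It assumes that evaluating variables normally emits their pending code once and then clears it, so a second generated method that needs the same variables gets no declaration unless the code is kept.

```scala
// Toy stand-in for a codegen variable: `code` declares and assigns `value`.
// Hypothetical model for illustration only, not Spark's actual ExprCode.
final case class ExprCode(var code: String, value: String)

object EvalSketch {
  // Emit the pending code once, then clear it so the same variable is not
  // declared twice inside one generated method.
  def evaluateAndClear(vars: Seq[ExprCode]): String = {
    val emitted = vars.map(_.code).filter(_.nonEmpty).mkString("\n")
    vars.foreach(_.code = "")
    emitted
  }

  // Emit the pending code but keep it, so a later, separate generated
  // method can still declare the same variables.
  def evaluateNoClear(vars: Seq[ExprCode]): String =
    vars.map(_.code).filter(_.nonEmpty).mkString("\n")

  def main(args: Array[String]): Unit = {
    // Case 1: code is cleared after the condition check, so the separate
    // consume method has nothing left to declare `k` with.
    val cleared = Seq(ExprCode("int k = row.getInt(0);", "k"))
    val condition1 = evaluateAndClear(cleared)
    val consume1 = evaluateAndClear(cleared)
    println(s"cleared: condition=[$condition1] consume=[$consume1]")

    // Case 2: code is kept after the condition check, so the separate
    // consume method can evaluate `k` again.
    val kept = Seq(ExprCode("int k = row.getInt(0);", "k"))
    val condition2 = evaluateNoClear(kept)
    val consume2 = evaluateAndClear(kept)
    println(s"kept: condition=[$condition2] consume=[$consume2]")
  }
}
```

In the inner join case, the condition check and the consumer live in the same generated method, so clearing the code is harmless. In the full outer join case the consumer is a separate method, which is why the code must not be cleared there.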
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.