[GitHub] [spark] wankunde commented on a diff in pull request #40523: [SPARK-42897][SQL] Avoid evaluate more than once for the variables from the left side in the FullOuter SMJ condition

via GitHub Tue, 04 Apr 2023 21:56:52 -0700


wankunde commented on code in PR #40523:
URL: https://github.com/apache/spark/pull/40523#discussion_r1158021949



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala:
##########
@@ -1036,8 +1036,17 @@ case class SortMergeJoinExec(
     val rightResultVars = genOneSideJoinVars(
       ctx, rightOutputRow, right, setDefaultValue = true)
     val resultVars = leftResultVars ++ rightResultVars
-    val (_, conditionCheck, _) =
-      getJoinCondition(ctx, leftResultVars, left, right, Some(rightOutputRow))
+    // Evaluate the variables on the left and used in the condition but do not clear the code as

Review Comment:
   Thanks @cloud-fan for your comment.

   1. `ShuffledHashJoinExec` has the same issue, so it is fixed at the same
time.
   2. We cannot use `splitVarsByCondition` and `evaluateVariables` here, because
`evaluateVariables` clears each variable's code. When the same variables are
read later in the `consumeFullOuterJoinRow` method, they would be undefined.
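
The second point can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes: `Var` is a hypothetical stand-in for `ExprCode`, `generateMethodBody` is a hypothetical stand-in for a consumer that emits a separate generated method, and the `evaluateVariables` here only models the clearing behavior described above.

```scala
object EvaluateVariablesSketch {
  // Hypothetical stand-in for Spark's ExprCode: `code` is the pending Java
  // snippet that declares the variable, `value` is the variable's name.
  final case class Var(var code: String, value: String)

  // Models the behavior described in the comment: return the pending code
  // and clear it, so the declaration is not emitted a second time.
  def evaluateVariables(vars: Seq[Var]): String = {
    val pending = vars.filter(_.code.nonEmpty).map(_.code).mkString("\n")
    vars.foreach(_.code = "")
    pending
  }

  // Hypothetical consumer that generates the body of a separate method:
  // it inlines any pending declarations, then references each variable by name.
  def generateMethodBody(vars: Seq[Var]): String = {
    val declarations = vars.filter(_.code.nonEmpty).map(_.code)
    val uses = vars.map(v => s"use(${v.value});")
    (declarations ++ uses).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val leftVar = Var("int leftKey = row.getInt(0);", "leftKey")

    // The declaration is emitted for the join condition and then cleared.
    println(evaluateVariables(Seq(leftVar)))

    // The later method body now only references `leftKey`; the declaration
    // is gone, so the name is undefined inside that method.
    println(generateMethodBody(Seq(leftVar)))
  }
}
```

In this model, calling `generateMethodBody` before `evaluateVariables` would still include the declaration, which is why the order of evaluation and clearing matters.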
   


