Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19937#discussion_r156004320
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
---
@@ -513,26 +513,28 @@ case class SortMergeJoinExec(
* the variables should be declared separately from accessing the
columns, we can't use the
* codegen of BoundReference here.
*/
- private def createLeftVars(ctx: CodegenContext, leftRow: String):
Seq[ExprCode] = {
+ private def createLeftVars(ctx: CodegenContext, leftRow: String):
(Seq[ExprCode], Seq[String]) = {
ctx.INPUT_ROW = leftRow
left.output.zipWithIndex.map { case (a, i) =>
val value = ctx.freshName("value")
val valueCode = ctx.getValue(leftRow, a.dataType, i.toString)
- // declare it as class member, so we can access the column before or
in the loop.
- ctx.addMutableState(ctx.javaType(a.dataType), value)
if (a.nullable) {
val isNull = ctx.freshName("isNull")
- ctx.addMutableState(ctx.JAVA_BOOLEAN, isNull)
val code =
s"""
|$isNull = $leftRow.isNullAt($i);
|$value = $isNull ? ${ctx.defaultValue(a.dataType)} :
($valueCode);
""".stripMargin
- ExprCode(code, isNull, value)
+ (ExprCode(code, isNull, value),
--- End diff --
Why separate variable declaration and code to two parts? Previously it is
because it uses global variables. Now since we use local variables, I think we
can simply do:
```scala
val code =
s"""
|boolean $isNull = $leftRow.isNullAt($i);
|${ctx.javaType(a.dataType)} $value = $isNull ?
${ctx.defaultValue(a.dataType)} : ($valueCode);
""".stripMargin
```
Don't we?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]