maropu commented on a change in pull request #29975:
URL: https://github.com/apache/spark/pull/29975#discussion_r502984076
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1739,8 +1747,11 @@ object CodeGenerator extends Logging {
def getLocalInputVariableValues(
Review comment:
Could you describe the second element of the returned tuple in the
code comment above?
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1739,8 +1747,11 @@ object CodeGenerator extends Logging {
def getLocalInputVariableValues(
ctx: CodegenContext,
expr: Expression,
- subExprs: Map[Expression, SubExprEliminationState] = Map.empty):
Set[VariableValue] = {
+ subExprs: Map[Expression, SubExprEliminationState] = Map.empty):
+ (Set[VariableValue], Set[ExprCode]) = {
Review comment:
nit:
```
subExprs: Map[Expression, SubExprEliminationState] = Map.empty)
: (Set[VariableValue], Set[ExprCode]) = {
val argSet = mutable.Set[VariableValue]()
```
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1055,10 +1060,13 @@ class CodegenContext extends Logging {
}
}
- val codes = if (commonExprVals.map(_.code.length).sum >
SQLConf.get.methodSplitThreshold) {
- val inputVarsForAllFuncs = commonExprs.map { expr =>
- getLocalInputVariableValues(this, expr.head).toSeq
- }
+ val (inputVarsForAllFuncs, exprCodesNeedEvaluate) = commonExprs.map { expr
=>
Review comment:
> ProjectExec doesn't require all its child's outputs to be evaluated in
> advance. Instead it only early evaluates the outputs used more than twice
> (deferring evaluation). So we need to extract these variables used by
> subexpressions and evaluate them before subexpressions

Could you leave some comments here?
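To make the quoted explanation concrete, here is a minimal standalone sketch of the idea. It assumes a simplified model: `Var`, `collectForEarlyEval`, and the string-based `code` field are hypothetical stand-ins, not Spark's `ExprCode` or `CodegenContext` API. An input variable whose `code` is still non-empty has not been evaluated yet, so if a subexpression uses it, its code is pulled out to be emitted first.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
object DeferredEvalSketch {
  // `code` is empty once the variable has already been evaluated.
  final case class Var(name: String, var code: String)

  // Returns snapshots of the inputs that subexpressions use and that still
  // carry unevaluated code, then clears that code on the originals so it is
  // not emitted a second time later on.
  def collectForEarlyEval(inputs: Seq[Var], usedBySubExprs: Set[String]): Seq[Var] =
    inputs
      .filter(v => usedBySubExprs.contains(v.name) && v.code.nonEmpty)
      .map { v =>
        val snapshot = v.copy() // keep the code before clearing the original
        v.code = ""
        snapshot
      }

  def main(args: Array[String]): Unit = {
    val a = Var("a", "int a = row.getInt(0);")
    val b = Var("b", "")                       // already evaluated
    val c = Var("c", "int c = row.getInt(2);") // not used by any subexpression
    val early = collectForEarlyEval(Seq(a, b, c), Set("a", "b"))
    // Only the deferred code for `a` has to run before the subexpression functions.
    early.foreach(v => println(v.code))
  }
}
```

In this model, `c` keeps its deferred code because no subexpression reads it, which mirrors the deferral behaviour described in the quote.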
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
case ref: BoundReference if ctx.currentVars != null &&
ctx.currentVars(ref.ordinal) != null =>
- val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
- collectLocalVariable(value)
- collectLocalVariable(isNull)
+ val exprCode = ctx.currentVars(ref.ordinal)
+ // If the referred variable is not evaluated yet.
+ if (exprCode.code != EmptyBlock) {
+ exprCodesNeedEvaluate += exprCode.copy()
+ exprCode.code = EmptyBlock
+ }
+ collectLocalVariable(exprCode.value)
+ collectLocalVariable(exprCode.isNull)
case e =>
stack.pushAll(e.children)
}
}
- argSet.toSet
+ (argSet.toSet, exprCodesNeedEvaluate.toSet)
Review comment:
`Set` instead of `Seq` here?
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
case ref: BoundReference if ctx.currentVars != null &&
ctx.currentVars(ref.ordinal) != null =>
- val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
- collectLocalVariable(value)
- collectLocalVariable(isNull)
+ val exprCode = ctx.currentVars(ref.ordinal)
+ // If the referred variable is not evaluated yet.
+ if (exprCode.code != EmptyBlock) {
+ exprCodesNeedEvaluate += exprCode.copy()
Review comment:
Do we need this copy? An unnecessary copy can happen if
`exprCodesNeedEvaluate` already contains the same entry.
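The interaction between the copy and the guard can be explored with a small standalone model. This is only a sketch under stated assumptions: `Code` and `visit` are hypothetical names, a plain string stands in for Spark's `Block`, and an empty string stands in for `EmptyBlock`. It mimics the shape of the diff above (snapshot, then clear the original) and does not claim to reproduce Spark's actual behaviour.

```scala
import scala.collection.mutable

// Hypothetical simplified model; not Spark's ExprCode.
object CopyBeforeClearSketch {
  final case class Code(isNull: String, value: String, var code: String)

  // Mirrors the shape of the diff: snapshot the entry, then clear the original.
  def visit(ref: Code, pending: mutable.Set[Code]): Unit =
    if (ref.code.nonEmpty) {
      pending += ref.copy() // the snapshot keeps the code that still has to run
      ref.code = ""         // the original is now treated as already evaluated
    }

  def main(args: Array[String]): Unit = {
    val pending = mutable.Set[Code]()
    val ref = Code("isNull_0", "value_0", "int value_0 = row.getInt(0);")
    visit(ref, pending) // first reference: snapshot taken, original cleared
    visit(ref, pending) // second reference: guard fails, no extra copy
    println(pending.size)
  }
}
```

In this model, storing `ref` itself instead of `ref.copy()` would lose the code once it is cleared, and a second visit of the same reference never reaches the copy because the guard already fails.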
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -90,8 +90,13 @@ case class SubExprEliminationState(isNull: ExprValue, value:
ExprValue)
* @param codes Strings representing the codes that evaluate common
subexpressions.
* @param states Foreach expression that is participating in subexpression
elimination,
* the state to use.
+ * @param exprCodesNeedEvaluate Some expression codes that need to be evaluate
before
+ * calling common subexpressions.
*/
-case class SubExprCodes(codes: Seq[String], states: Map[Expression,
SubExprEliminationState])
+case class SubExprCodes(
+ codes: Seq[String],
+ states: Map[Expression, SubExprEliminationState],
+ exprCodesNeedEvaluate: Seq[ExprCode])
Review comment:
just a suggestion: `exprCodesNeedEvaluate` -> `exprCodesForEarlyEvals`?
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -90,8 +90,13 @@ case class SubExprEliminationState(isNull: ExprValue, value:
ExprValue)
* @param codes Strings representing the codes that evaluate common
subexpressions.
* @param states Foreach expression that is participating in subexpression
elimination,
* the state to use.
+ * @param exprCodesNeedEvaluate Some expression codes that need to be evaluate
before
Review comment:
nit: `evaluate` -> `evaluated`