[GitHub] [spark] maropu commented on a change in pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

GitBox Sun, 11 Oct 2020 17:25:36 -0700


maropu commented on a change in pull request #29975:
URL: https://github.com/apache/spark/pull/29975#discussion_r502984076




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1739,8 +1747,11 @@ object CodeGenerator extends Logging {
   def getLocalInputVariableValues(

Review comment:
       Could you describe the second element of the returned tuple in the 
code comment above?

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1739,8 +1747,11 @@ object CodeGenerator extends Logging {
   def getLocalInputVariableValues(
       ctx: CodegenContext,
       expr: Expression,
-      subExprs: Map[Expression, SubExprEliminationState] = Map.empty): Set[VariableValue] = {
+      subExprs: Map[Expression, SubExprEliminationState] = Map.empty):
+        (Set[VariableValue], Set[ExprCode]) = {

Review comment:
       nit:
   ```
         subExprs: Map[Expression, SubExprEliminationState] = Map.empty)
       : (Set[VariableValue], Set[ExprCode]) = {
       val argSet = mutable.Set[VariableValue]()
   ```

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1055,10 +1060,13 @@ class CodegenContext extends Logging {
       }
     }
 
-    val codes = if (commonExprVals.map(_.code.length).sum > SQLConf.get.methodSplitThreshold) {
-      val inputVarsForAllFuncs = commonExprs.map { expr =>
-        getLocalInputVariableValues(this, expr.head).toSeq
-      }
+    val (inputVarsForAllFuncs, exprCodesNeedEvaluate) = commonExprs.map { expr =>

Review comment:
       > ProjectExec doesn't require all its child's outputs to be evaluated in advance. Instead it only early evaluates the outputs used more than twice (deferring evaluation). So we need to extract these variables used by subexpressions and evaluate them before subexpressions
   
   Could you leave a code comment here that explains this?
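
To illustrate the quoted explanation, here is a minimal sketch of the idea, not the actual Spark implementation. `SimpleExprCode`, `DeferredEvalSketch`, and `collectInputsToEvaluate` are hypothetical names, and generated code is modeled as plain strings (an empty string stands in for `EmptyBlock`). It shows input variables whose evaluation was deferred being collected, and marked as evaluated, before the subexpression code that reads them is emitted.

```scala
// Hypothetical, simplified stand-in for Spark's ExprCode (not the real class).
// An empty `code` string plays the role of EmptyBlock: "already evaluated".
final case class SimpleExprCode(var code: String, isNull: String, value: String)

object DeferredEvalSketch {
  // Returns snapshots of the referenced input variables that still carry
  // unevaluated code, and clears the code on the originals so that later
  // references do not emit the same evaluation again.
  def collectInputsToEvaluate(
      currentVars: Seq[SimpleExprCode],
      referencedOrdinals: Seq[Int]): Seq[SimpleExprCode] = {
    referencedOrdinals.distinct.flatMap { ordinal =>
      val exprCode = currentVars(ordinal)
      if (exprCode.code.nonEmpty) {
        val snapshot = exprCode.copy() // keep the code before clearing it
        exprCode.code = ""
        Some(snapshot)
      } else {
        None
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val vars = Seq(
      SimpleExprCode("int a = row.getInt(0);", "false", "a"), // deferred
      SimpleExprCode("", "false", "b"))                       // already evaluated
    val needed = collectInputsToEvaluate(vars, Seq(0, 1, 0))
    // Input evaluation must come first, then the subexpression call.
    val generated = needed.map(_.code) :+ "subExpr_0(a, b);"
    generated.foreach(println)
  }
}
```

In this model, only the deferred variable is returned for early evaluation, and the caller is responsible for placing its code ahead of the subexpression function calls.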

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
 
         case ref: BoundReference if ctx.currentVars != null &&
             ctx.currentVars(ref.ordinal) != null =>
-          val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-          collectLocalVariable(value)
-          collectLocalVariable(isNull)
+          val exprCode = ctx.currentVars(ref.ordinal)
+          // If the referred variable is not evaluated yet.
+          if (exprCode.code != EmptyBlock) {
+            exprCodesNeedEvaluate += exprCode.copy()
+            exprCode.code = EmptyBlock
+          }
+          collectLocalVariable(exprCode.value)
+          collectLocalVariable(exprCode.isNull)
 
         case e =>
           stack.pushAll(e.children)
       }
     }
 
-    argSet.toSet
+    (argSet.toSet, exprCodesNeedEvaluate.toSet)

Review comment:
       `Set` instead of `Seq` here?

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
 
         case ref: BoundReference if ctx.currentVars != null &&
             ctx.currentVars(ref.ordinal) != null =>
-          val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-          collectLocalVariable(value)
-          collectLocalVariable(isNull)
+          val exprCode = ctx.currentVars(ref.ordinal)
+          // If the referred variable is not evaluated yet.
+          if (exprCode.code != EmptyBlock) {
+            exprCodesNeedEvaluate += exprCode.copy()

Review comment:
       Do we need this copy? Can an unnecessary copy happen if 
`exprCodesNeedEvaluate` already has the same entry?
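
For context on this question, a small sketch under simplifying assumptions: `SimpleExprCode` and `CopySketch` are hypothetical stand-ins (not Spark classes), and an empty string models `EmptyBlock`. It suggests that, because the hunk clears `code` on the original right after recording it, storing the reference without a copy would leave the recorded entry with no code. In this model the guard is false on a second visit, so the copy is made at most once per variable.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Spark's ExprCode (not the real class).
final case class SimpleExprCode(var code: String, isNull: String, value: String)

object CopySketch {
  // Records the variable by reference, then clears its code.
  def recordByReference(
      v: SimpleExprCode,
      out: mutable.ArrayBuffer[SimpleExprCode]): Unit = {
    if (v.code.nonEmpty) {
      out += v
      v.code = "" // also clears the entry just stored, since it is the same object
    }
  }

  // Records a snapshot first, then clears the code on the original only.
  def recordByCopy(
      v: SimpleExprCode,
      out: mutable.ArrayBuffer[SimpleExprCode]): Unit = {
    if (v.code.nonEmpty) {
      out += v.copy()
      v.code = ""
    }
  }

  def main(args: Array[String]): Unit = {
    val byRef = mutable.ArrayBuffer.empty[SimpleExprCode]
    recordByReference(SimpleExprCode("int a = 1;", "false", "a"), byRef)
    assert(byRef.head.code.isEmpty) // the recorded entry lost its code

    val byCopy = mutable.ArrayBuffer.empty[SimpleExprCode]
    val b = SimpleExprCode("int b = 2;", "false", "b")
    recordByCopy(b, byCopy)
    recordByCopy(b, byCopy) // second visit: guard is false, nothing is copied
    assert(byCopy.map(_.code).toList == List("int b = 2;"))
  }
}
```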

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -90,8 +90,13 @@ case class SubExprEliminationState(isNull: ExprValue, value: ExprValue)
  * @param codes Strings representing the codes that evaluate common subexpressions.
  * @param states Foreach expression that is participating in subexpression elimination,
  *               the state to use.
+ * @param exprCodesNeedEvaluate Some expression codes that need to be evaluate before
+ *                              calling common subexpressions.
  */
-case class SubExprCodes(codes: Seq[String], states: Map[Expression, SubExprEliminationState])
+case class SubExprCodes(
+  codes: Seq[String],
+  states: Map[Expression, SubExprEliminationState],
+  exprCodesNeedEvaluate: Seq[ExprCode])

Review comment:
       just a suggestion: `exprCodesNeedEvaluate` -> `exprCodesForEarlyEvals`? 

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##########
@@ -90,8 +90,13 @@ case class SubExprEliminationState(isNull: ExprValue, value: ExprValue)
  * @param codes Strings representing the codes that evaluate common subexpressions.
  * @param states Foreach expression that is participating in subexpression elimination,
  *               the state to use.
+ * @param exprCodesNeedEvaluate Some expression codes that need to be evaluate before

Review comment:
       nit: `evaluate` -> `evaluated`




