rednaxelafx commented on a change in pull request #23731: [SPARK-26572][SQL]
fix aggregate codegen result evaluation
URL: https://github.com/apache/spark/pull/23731#discussion_r256137885
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
##########
@@ -290,6 +290,18 @@ trait CodegenSupport extends SparkPlan {
evaluateVars.toString()
}
+ /**
+ * Returns source code to evaluate the variables for non-deterministic expressions, and clear the
+ * code of evaluated variables, to prevent them to be evaluated twice.
+ */
+ protected def evaluateNondeterministicVariables(
Review comment:
Nitpick on naming: "variables" are never non-deterministic; only expressions
can have the property of being deterministic or not. Two options:
- I'd prefer naming this utility function `evaluateNondeterministicResults`
to emphasize that it should (mostly) be used on the results of an output
projection list.
- But the existing utility function `evaluateRequiredVariables` uses the
"variable" notion, so keeping consistency with it is fine too.
I'm fine either way.
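For concreteness, here's a minimal sketch of how such a helper could be shaped if it
sits in `CodegenSupport` next to `evaluateRequiredVariables` and simply delegates to it.
The parameter names, the `expressions` argument, and the delegation are my assumptions
for illustration; only the first line of the signature is visible in the hunk above:
```scala
// Sketch only: assumes this lives inside trait CodegenSupport in
// WholeStageCodegenExec.scala, where Attribute, AttributeSet, NamedExpression,
// ExprCode and evaluateRequiredVariables are already in scope.
protected def evaluateNondeterministicResults(  // or ...Variables, for consistency
    attributes: Seq[Attribute],
    variables: Seq[ExprCode],
    expressions: Seq[NamedExpression]): String = {
  // Only attributes produced by non-deterministic expressions need to be
  // force-evaluated (and have their generated code cleared) up front.
  val nondeterministicAttrs = expressions.filterNot(_.deterministic).map(_.toAttribute)
  evaluateRequiredVariables(attributes, variables, AttributeSet(nondeterministicAttrs))
}
```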
Also, historically Spark SQL's WSCG has used variable names like `eval` for
values of the `ExprCode` type, e.g. `evals: Seq[ExprCode]`. Not sure why it
started that way, but you can see that naming pattern throughout the WSCG
codebase. Again, your new utility function follows the same naming as
`evaluateRequiredVariables`, so that's fine. Local consistency is good enough.
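A tiny illustration of the `eval`/`evals` convention I mean, e.g. when a plan node
binds and code-generates its child's output. The surrounding node, `child`, and
`ctx` are assumed context for the example, not code from this PR:
```scala
// Illustration of the naming convention only: ExprCode values are usually
// called `ev` (single) or `evals` (plural) in WSCG code. Assumes a physical
// plan node with `child: SparkPlan` and `ctx: CodegenContext` in scope.
val evals: Seq[ExprCode] = child.output.zipWithIndex.map { case (attr, i) =>
  BoundReference(i, attr.dataType, attr.nullable).genCode(ctx)
}
```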