Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/18931#discussion_r163747141
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
---
@@ -156,13 +162,96 @@ trait CodegenSupport extends SparkPlan {
ctx.INPUT_ROW = null
ctx.freshNamePrefix = parent.variablePrefix
val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs)
+
+ // Under certain conditions, we can put the logic to consume the rows of this operator into
+ // another function. So we can prevent a generated function too long to be optimized by JIT.
+ // The conditions:
+ // 1. The config "spark.sql.codegen.splitConsumeFuncByOperator" is enabled.
+ // 2. `inputVars` are all materialized. That is guaranteed to be true if the parent plan uses
+ // all variables in output (see `requireAllOutput`).
+ // 3. The number of output variables must less than maximum number of parameters in Java method
+ // declaration.
+ val requireAllOutput = output.forall(parent.usedInputs.contains(_))
+ val consumeFunc =
+ if (SQLConf.get.wholeStageSplitConsumeFuncByOperator && requireAllOutput &&
--- End diff --
super nit:
```
val confEnabled = SQLConf.get.wholeStageSplitConsumeFuncByOperator
if (confEnabled && ...)
```
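For illustration, a minimal self-contained sketch of how the three conditions from the diff comment might read once the config lookup is hoisted into a local `val`. Everything here is a hypothetical stand-in rather than Spark's actual API: `Conf`, `shouldSplit`, and the `maxParams` argument are assumptions made so the snippet compiles on its own, and column names are plain strings instead of attributes.

```scala
// Hypothetical stand-ins for the names in the diff; this is not Spark code.
object SplitConsumeSketch {
  // Assumed container for the single flag the diff reads from SQLConf.
  final case class Conf(wholeStageSplitConsumeFuncByOperator: Boolean)

  // Returns true when all three conditions listed in the diff comment hold.
  // `maxParams` is an assumed upper bound supplied by the caller.
  def shouldSplit(
      conf: Conf,
      output: Seq[String],
      usedInputs: Set[String],
      maxParams: Int): Boolean = {
    // Condition 1: read the config once into a named local.
    val confEnabled = conf.wholeStageSplitConsumeFuncByOperator
    // Condition 2: the parent uses every output column.
    val requireAllOutput = output.forall(usedInputs.contains)
    // Condition 3: the output fits within the assumed parameter limit.
    val fitsParamLimit = output.length < maxParams

    confEnabled && requireAllOutput && fitsParamLimit
  }
}
```

Naming each condition keeps the `if` on one short line and lets each local document itself.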
---