Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18931#discussion_r163315858

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -149,13 +149,100 @@ trait CodegenSupport extends SparkPlan {
         ctx.freshNamePrefix = parent.variablePrefix
         val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs)
    +
    +    // Under certain conditions, we can put the logic to consume the rows of this operator into
    +    // another function. So we can prevent a generated function too long to be optimized by JIT.
    +    // The conditions:
    +    // 1. The parent uses all variables in output. we can't defer variable evaluation when consume
    +    //    in another function.
    +    // 2. The output variables are not empty. If it's empty, we don't bother to do that.
    +    // 3. We don't use row variable. The construction of row uses deferred variable evaluation. We
    +    //    can't do it.
    +    // 4. The number of output variables must less than maximum number of parameters in Java method
    +    //    declaration.
    --- End diff --

    My only concern is that if we have a bunch of simple operators, we will create a lot of small methods here. Maybe that's fine, as the optimizer would prevent such cases.
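For readers following the thread, the four conditions listed in the quoted comment can be read as one predicate that decides whether the consume logic is moved into its own function. The block below is a minimal sketch of that reading, not the code in the pull request: `SplitConsumeSketch`, `OutputVar`, `paramSlots`, and `canSplitConsume` are hypothetical names, and the operator is modelled with plain case classes rather than Spark's `SparkPlan` and codegen types. The only outside fact it relies on is the JVM rule that a method's parameters may occupy at most 255 slots, where `long` and `double` take two slots and `this` takes one for an instance method.

```scala
// Hypothetical, simplified model of the decision described in the diff comment.
// None of these names come from Spark; they exist only for illustration.
object SplitConsumeSketch {

  // One output variable of an operator, with the Java type the generated code would use.
  final case class OutputVar(name: String, javaType: String)

  // JVM limit on parameter slots for a single method declaration.
  val MaxParamSlots: Int = 255

  // Slots needed to pass all variables to an instance method:
  // one for `this`, two for each long/double, one for everything else.
  def paramSlots(vars: Seq[OutputVar]): Int =
    1 + vars.map { v =>
      if (v.javaType == "long" || v.javaType == "double") 2 else 1
    }.sum

  // True when all four conditions from the comment hold.
  def canSplitConsume(
      output: Seq[OutputVar],
      usedByParent: Set[String],
      needsRowVariable: Boolean): Boolean = {
    // 1. The parent uses every output variable, so nothing needs deferred evaluation.
    val parentUsesAll = output.forall(v => usedByParent.contains(v.name))
    // 2. There is at least one output variable to pass.
    val nonEmpty = output.nonEmpty
    // 3. No row variable is required (building the row relies on deferred evaluation).
    val noRowVariable = !needsRowVariable
    // 4. The variables fit into a single Java method's parameter list.
    val fitsParamLimit = paramSlots(output) <= MaxParamSlots

    parentUsesAll && nonEmpty && noRowVariable && fitsParamLimit
  }
}
```

Under this sketch, an operator whose parent ignores one of its output columns, or whose output would need more than 255 parameter slots, keeps its consume logic inline. The concern raised in the comment is the opposite case: many simple operators that each satisfy all four conditions and so each produce a small separate method.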