[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

cloud-fan Wed, 24 Jan 2018 02:52:22 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18931#discussion_r163512972
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -156,13 +156,94 @@ trait CodegenSupport extends SparkPlan {
         ctx.INPUT_ROW = null
         ctx.freshNamePrefix = parent.variablePrefix
         val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs)
    +
    +    // Under certain conditions, we can put the logic to consume the rows of this operator into
    +    // another function. So we can prevent a generated function too long to be optimized by JIT.
    +    // The conditions:
    +    // 1. The config "SQLConf.DECOUPLE_OPERATOR_CONSUME_FUNCTIONS" is enabled.
    +    // 2. The parent uses all variables in output. we can't defer variable evaluation when consume
    +    //    in another function.
    +    // 3. The output variables are not empty. If it's empty, we don't bother to do that.
    +    // 4. We don't use row variable. The construction of row uses deferred variable evaluation. We
    --- End diff --
    
    I think what we actually need is for all of `inputVars` to be materialized, which is guaranteed by `requireAllOutput` and `outputVars != null`.
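
To make the suggestion concrete, here is a minimal, self-contained sketch of how the check could collapse to those two conditions. It is not Spark's actual code: `ConsumeDecoupling`, `Attr`, `canDecoupleConsume` and `confEnabled` are hypothetical names, and plain strings stand in for the generated variables. Only `requireAllOutput`, `outputVars` and the numbered conditions come from the diff and the comment above.

```scala
// Hypothetical, simplified model of the check; NOT Spark's CodegenSupport code.
object ConsumeDecoupling {
  // Stand-in for an output attribute (Spark itself uses Attribute / AttributeSet).
  final case class Attr(name: String)

  // Condition 2 in the diff: the parent uses every variable in the output.
  def requireAllOutput(output: Seq[Attr], usedInputs: Set[Attr]): Boolean =
    output.forall(usedInputs.contains)

  // Assumed shape of the decision: the consume logic can move into a separate
  // function only when the config is on, the output is non-empty, and all
  // input variables are materialized (requireAllOutput plus non-null outputVars).
  def canDecoupleConsume(
      confEnabled: Boolean,
      output: Seq[Attr],
      usedInputs: Set[Attr],
      outputVars: Seq[String]): Boolean =
    confEnabled &&
      output.nonEmpty &&
      requireAllOutput(output, usedInputs) &&
      outputVars != null
}
```

Under this reading, a null `outputVars` (the row-variable path) or a parent that skips some output attribute would both fall back to the inlined consume code.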



