[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

cloud-fan Thu, 21 Dec 2017 22:37:03 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19683#discussion_r158436309
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala ---
    @@ -59,13 +61,21 @@ case class GenerateExec(
         generator: Generator,
         join: Boolean,
         outer: Boolean,
    +    omitGeneratorChild: Boolean,
         generatorOutput: Seq[Attribute],
         child: SparkPlan)
       extends UnaryExecNode with CodegenSupport {
     
    +  private def projectedChildOutput = generator match {
    +    case g: UnaryExpression if omitGeneratorChild =>
    --- End diff --
    
    Why limit this to `UnaryExpression`? Suppose we add an array concat function
    in the future: for `explode(array_concat(col1, col2))`, we should be able to
    omit both `col1` and `col2`.
    
    I'd like to add an `omitGeneratorReferences` parameter instead; then this
    can be simplified to:
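    ```
    // When omitGeneratorReferences is set, drop every child column the generator
    // reads (e.g. both col1 and col2 in explode(array_concat(col1, col2))).
    private def requiredChildOutput = if (omitGeneratorReferences) {
      val generatorReferences = generator.references
      child.output.filterNot(generatorReferences.contains)
    } else {
      child.output
    }
    ```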
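A minimal, Spark-free sketch of why the `references`-based version generalizes beyond unary inputs. `Expr`, `Col`, `ArrayConcat` and `Explode` are hypothetical stand-ins for Catalyst expressions, and `references` is modeled as a plain `Set[String]` rather than an `AttributeSet`; only the shape of the filtering mirrors the suggestion above.

```scala
// Hypothetical, simplified expression tree (not Catalyst's API).
sealed trait Expr { def references: Set[String] }

// A column reference: it references exactly itself.
case class Col(name: String) extends Expr {
  def references: Set[String] = Set(name)
}

// Hypothetical n-ary function, like the array_concat mentioned in the review.
case class ArrayConcat(children: Expr*) extends Expr {
  def references: Set[String] = children.flatMap(_.references).toSet
}

// The generator: its references are those of its input expression.
case class Explode(child: Expr) extends Expr {
  def references: Set[String] = child.references
}

// Mirrors the suggested requiredChildOutput: when the flag is set, drop every
// child column the generator reads, however many inputs the generator has.
def requiredChildOutput(
    childOutput: Seq[String],
    generator: Expr,
    omitGeneratorReferences: Boolean): Seq[String] =
  if (omitGeneratorReferences) {
    val generatorReferences = generator.references
    childOutput.filterNot(generatorReferences.contains)
  } else {
    childOutput
  }

val gen = Explode(ArrayConcat(Col("col1"), Col("col2")))
println(requiredChildOutput(Seq("id", "col1", "col2"), gen, omitGeneratorReferences = true)) // List(id)
```

Because the check only asks "which columns does the generator reference?", it needs no special case per expression arity: a unary `explode(col1)` and the n-ary `explode(array_concat(col1, col2))` are handled by the same code path.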



