[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

dilipbiswal Fri, 27 Apr 2018 01:41:57 -0700

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/21049
  
    @henryr Since SubqueryAlias is used as a correlation name, mostly for 
resolving attributes, in my understanding it's not safe to apply this 
optimization. I will borrow @gatorsmile's example here. Please note that the 
alias is specified after the sort. Below is the plan after this optimization, 
which removes sorts under SubqueryAlias->child. 
    
    ```scala
    scala> Seq((1, 2, "1"), (3, 4, "3")).toDF("int", "int2", "str_sort").orderBy('int.asc).as('df1)
    res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [int: int, int2: int ... 1 more field]
    
    scala> res0.explain(true)
    == Parsed Logical Plan ==
    SubqueryAlias df1
    +- AnalysisBarrier
          +- Sort [int#7 ASC NULLS FIRST], true
             +- Project [_1#3 AS int#7, _2#4 AS int2#8, _3#5 AS str_sort#9]
                +- LocalRelation [_1#3, _2#4, _3#5]
    
    == Analyzed Logical Plan ==
    int: int, int2: int, str_sort: string
    SubqueryAlias df1
    +- Sort [int#7 ASC NULLS FIRST], true
       +- Project [_1#3 AS int#7, _2#4 AS int2#8, _3#5 AS str_sort#9]
          +- LocalRelation [_1#3, _2#4, _3#5]
    
    == Optimized Logical Plan ==
    LocalRelation [int#7, int2#8, str_sort#9]
    
    == Physical Plan ==
    LocalTableScan [int#7, int2#8, str_sort#9]
    ```
    In this case we should not be removing the top-level sort from the user's 
query, right?
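
To make the concern concrete, here is a minimal sketch of the two behaviours on a toy plan tree. The `Plan`, `Relation`, `Sort` and `SubqueryAlias` case classes and the `naive` and `conservative` functions are hypothetical stand-ins written for illustration; they are not Catalyst's classes and not the rule in this PR. The toy model has no Limit or other order-sensitive operators, so "a Sort below another Sort is redundant" holds only under that assumption.

```scala
// Toy model only: hypothetical stand-ins, NOT Catalyst's LogicalPlan classes.
object SortUnderAliasSketch {
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Sort(order: String, child: Plan) extends Plan
  case class SubqueryAlias(alias: String, child: Plan) extends Plan

  // Naive rule: drop any Sort that sits directly under a SubqueryAlias.
  // This also drops a sort the user asked for when the alias is outermost.
  def naive(plan: Plan): Plan = plan match {
    case SubqueryAlias(a, Sort(_, c)) => SubqueryAlias(a, naive(c))
    case SubqueryAlias(a, c)          => SubqueryAlias(a, naive(c))
    case Sort(o, c)                   => Sort(o, naive(c))
    case r: Relation                  => r
  }

  // Conservative rule: drop a Sort only when an enclosing Sort
  // reorders its output anyway (valid in this toy model, which has
  // no Limit or other order-sensitive operators in between).
  def conservative(plan: Plan, underSort: Boolean = false): Plan = plan match {
    case Sort(_, c) if underSort => conservative(c, underSort = true)
    case Sort(o, c)              => Sort(o, conservative(c, underSort = true))
    case SubqueryAlias(a, c)     => SubqueryAlias(a, conservative(c, underSort))
    case r: Relation             => r
  }

  // Shape of the query above: the alias is applied after the sort.
  val userQuery: Plan = SubqueryAlias("df1", Sort("int ASC", Relation("local")))

  // Shape where the inner sort really is redundant: an outer sort overrides it.
  val nested: Plan = Sort("int2 ASC", userQuery)
}
```

On `userQuery`, `naive` returns `SubqueryAlias("df1", Relation("local"))`, which loses the user's ordering, while `conservative` returns the plan unchanged. On `nested`, `conservative` keeps only the outer sort.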
    
    cc @gatorsmile for his opinion.


