Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/21049
@henryr
> Is there any reason to actually use an alias at the root of a plan like this (outside of composing with other plans, where this optimization would apply)?

I can't think of a reason :-). It's just that the API allows users to do that.
How about this query?
```scala
scala> spark.sql("with abcd as (select * from t1 order by t1.c1) select * from abcd").explain(true)
18/04/29 23:28:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
== Parsed Logical Plan ==
CTE [abcd]
:  +- 'SubqueryAlias abcd
:     +- 'Sort ['t1.c1 ASC NULLS FIRST], true
:        +- 'Project [*]
:           +- 'UnresolvedRelation `t1`
+- 'Project [*]
   +- 'UnresolvedRelation `abcd`

== Analyzed Logical Plan ==
c1: int, c2: int, c3: int
Project [c1#7, c2#8, c3#9]
+- SubqueryAlias abcd
   +- Sort [c1#7 ASC NULLS FIRST], true
      +- Project [c1#7, c2#8, c3#9]
         +- SubqueryAlias t1
            +- HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]

== Optimized Logical Plan ==
HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]

== Physical Plan ==
HiveTableScan [c1#7, c2#8, c3#9], HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]
```
IMHO, it's probably better to correctly detect the real subqueries before applying this optimization, so we can be fully sure about it.
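As for the API allowing an alias at the root: a minimal spark-shell sketch of that case (the Dataset `df` and the alias name `aliased` here are just for illustration, assuming a running session):

```scala
// Sketch: Dataset.as(...) wraps the plan in a SubqueryAlias at the root,
// with no outer plan composed on top of it.
scala> val df = spark.range(3).orderBy($"id").as("aliased")

// The analyzed plan should show SubqueryAlias `aliased` as the root node,
// sitting directly above the Sort.
scala> df.queryExecution.analyzed
```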
cc @gatorsmile