[GitHub] spark pull request: [SPARK-15495][SQL][WIP] Improve the explain ou...

clockfly Fri, 27 May 2016 14:41:37 -0700

GitHub user clockfly opened a pull request:

    https://github.com/apache/spark/pull/13363


    [SPARK-15495][SQL][WIP] Improve the explain output for Aggregation operator

    ## What changes were proposed in this pull request?
    
    This PR improves the explain output of the Aggregation operator.
    
    **Before change:**
    
    ```
    scala> spark.sql("select count(a), count(c), b from df1 group by b").explain()
    == Physical Plan ==
    *TungstenAggregate(key=[b#8], functions=[(count(1),mode=Final,isDistinct=false),(count(1),mode=Final,isDistinct=false)], output=[count(a)#63L,count(c)#64L,b#8])
    +- Exchange hashpartitioning(b#8, 200), None
       +- *TungstenAggregate(key=[b#8], functions=[(count(1),mode=Partial,isDistinct=false),(count(1),mode=Partial,isDistinct=false)], output=[b#8,count#67L,count#68L])
          +- LocalTableScan [b#8], [[2]]
    ```
    
    **After change:**
    
    ```
    scala> spark.sql("select count(a), count(c), b from df1 group by b").explain()
    == Physical Plan ==
    *Aggregate(key=[b#8], functions=[count(1)(Final),count(1)(Final)])
    +- Exchange hashpartitioning(b#8, 200)
       +- *Aggregate(key=[b#8], functions=[count(1)(Partial),count(1)(Partial)])
          +- LocalTableScan [b#8: int], [[2]]
    ```
        
    **With explain(extended = true), the "output" field is also shown:**
    
    ```
    scala> spark.sql("select count(a), count(c), b from df1 group by b").explain(true)
    ...
    == Physical Plan ==
    *Aggregate(key=[b#8], functions=[count(1)(Final),count(1)(Final)], output=[count(a)#54L,count(c)#55L,b#8])
    +- Exchange HashPartitioning 200
       +- *Aggregate(key=[b#8], functions=[count(1)(Partial),count(1)(Partial)], output=[b#8,count#58L,count#59L])
          +- LocalTableScan [AttributeReference b, IntegerType, {}], [[2]]
    ```    
        
    ## How was this patch tested?
    
    Manual test and existing UT.
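
    A minimal sketch of how the manual check could be reproduced. The definition of `df1` is not given in this PR, so the single-row data below (integer columns `a`, `b`, `c` with `b = 2`) and the application name are assumptions made only for illustration; attribute IDs such as `#8` will differ between runs.

    ```scala
    import org.apache.spark.sql.SparkSession

    // In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explain-aggregate-demo") // hypothetical name
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for df1: one row, three integer columns.
    val df1 = Seq((1, 2, 3)).toDF("a", "b", "c")
    df1.createOrReplaceTempView("df1")

    val query = spark.sql("select count(a), count(c), b from df1 group by b")

    query.explain()     // physical plan only
    query.explain(true) // extended output, including the other plan stages
    ```

    Comparing the two printed plans on builds with and without this patch should show the difference described above.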

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clockfly/spark verbose3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13363
    
----
commit 782332f3c11fb9f1560453c10343b1d756b8fb2e
Author: Sean Zhong <[email protected]>
Date:   2016-05-27T18:54:36Z

    Improve the aggregate output.

commit 0f53c89373dbeae0b491b1f4a4bbaaefb103f273
Author: Sean Zhong <[email protected]>
Date:   2016-05-27T19:56:22Z

    Improve the aggregate output.

----


