GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/20613

    SPARK-23368 Avoid unnecessary Exchange or Sort after projection

    ## What changes were proposed in this pull request?
    
    1. Add "project" methods for both Partitioning and Ordering 
(Seq[SortOrder]), which returns what should be the partitioning or ordering 
after the projection, i.e., substitutes the projected expressions with aliases 
specified in the project.
    
    2. In ProjectExec, XXXAggregateExec, make outputPartitioning and 
outputOrdering return "child.outputPartitioning.project(projectList)" or 
"SortOrder.projectOrderings(child.outputOrdering, projectList)" instead of 
"child.outputPartitioning" or "child.outputOrdering".
    
    ## How was this patch tested?
    
    1. Add 2 tests in SQLQuerySuite to verify that the unnecessary Exchange 
and/or Sort have been eliminated from the execution plan.
    2. Add 1 unit test in DistributionSuite.
    
    ## Note
    Note that there could be some variation in the implementation of 
"Partitioning.project(projectList)", depending on whether or not we'd choose to 
retain the original partitioning together with the projected partitioning, and 
how far we'd go to include all possible valid partitionings.
    Since it is usually impossible to refer to the original expression once it 
is projected with an alias, unless it appears elsewhere in the projection list 
(without alias or with a different alias). This would lead to the 
implementation being more complex and generating a cartesian product of all 
substituted/unsubstituted expressions if 1) a partitioning contains more than 
one expression; 2) a partitioning contains expressions that consist of more 
than one projected expression.
    That said, I consider "an expression being projected twice in one 
projection" to be a rare case which does not make much sense, so the current 
implementation should be good enough.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-23368

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20613
    
----
commit 91b3cd0ab4aa16781866fb635b571c805ae4359b
Author: maryannxue <maryann.xue@...>
Date:   2018-02-14T21:56:33Z

    SPARK-23368 Avoid unnecessary Exchange or Sort after projection

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to