GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/11460

    [SPARK-13609] [SQL] Support Column Pruning for MapPartitions

    #### What changes were proposed in this pull request?
    
    This PR prunes unnecessary columns when the operator is
`MapPartitions`. The solution is to add an extra `Project` on top of the child
node, so that only the columns the operator needs are passed to it.
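
As a rough illustration of the idea, here is a minimal sketch on a toy plan
model. It does not use Spark's Catalyst classes: `Plan`, `Relation`, the
simplified `Project` and `MapPartitions`, the `inputColumns` field, and
`pruneMapPartitions` are all hypothetical names invented for this example,
and the actual rule in the patch may differ.

```scala
// Toy logical plan; a stand-in for illustration, NOT Spark's Catalyst API.
sealed trait Plan { def output: Seq[String] }

// Leaf node that produces a fixed set of columns.
case class Relation(output: Seq[String]) extends Plan

// Keeps only the listed columns of its child.
case class Project(columns: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = columns
}

// Simplified MapPartitions: the function reads `inputColumns` (hypothetical
// field) and emits `output`, whatever else the child happens to produce.
case class MapPartitions(
    inputColumns: Seq[String],
    output: Seq[String],
    child: Plan) extends Plan

// If the child emits columns the function never reads, put a Project
// between MapPartitions and the child so those columns are dropped early.
def pruneMapPartitions(plan: Plan): Plan = plan match {
  case mp @ MapPartitions(needed, _, child)
      if child.output.exists(c => !needed.contains(c)) =>
    mp.copy(child = Project(child.output.filter(c => needed.contains(c)), child))
  case other => other
}

// Usage: the function only reads column "a", so "b" and "c" are pruned.
val plan = MapPartitions(Seq("a"), Seq("result"), Relation(Seq("a", "b", "c")))
val pruned = pruneMapPartitions(plan)
// pruned == MapPartitions(Seq("a"), Seq("result"),
//             Project(Seq("a"), Relation(Seq("a", "b", "c"))))
```

In this sketch the guard makes the rule idempotent: once the `Project` is in
place, the child no longer emits unused columns and the plan is left alone.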
    
    Pruning also looks doable for the other two operators, `AppendColumns`
and `MapGroups`, but it needs more discussion. The main reason is that the
current implementation of the `inputPlan` of `groupBy` is based on the child
of `AppendColumns`, which might be a bug. I will therefore submit a separate
PR for these two operators.
    
    #### How was this patch tested?
    
    Added a test case in `ColumnPruningSuite` to verify the rule. Added
another test case in `DatasetSuite.scala` to verify the data.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark datasetPruningNew

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11460.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11460
    
----
commit bd5567856933617f522cf5d01d63566b56fc5142
Author: gatorsmile <[email protected]>
Date:   2016-03-02T03:52:07Z

    dataset column pruning

----


