[GitHub] spark pull request #20806: [SPARK-23661][SQL] Implement treeAggregate on Dat...

viirya Mon, 12 Mar 2018 21:42:52 -0700

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/20806


    [SPARK-23661][SQL] Implement treeAggregate on Dataset API

    ## What changes were proposed in this pull request?
    
    Many MLlib algorithms have not yet migrated their internal computation
    from RDD to DataFrame. `treeAggregate` is one of the obstacles we need to
    address before that migration can be completed.
    
    This patch provides `treeAggregate` on the Dataset API. For now, this
    should be a private API used by the ML component.
    
    The tree aggregation approach mirrors RDD's `treeAggregate`.
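
For readers unfamiliar with the idea, the following is a minimal, Spark-free sketch of tree aggregation in plain Python. It is not the code in this patch and not Spark's implementation: the function name `tree_aggregate`, its parameters, and the grouping rule are assumptions made for illustration. Partitions are modeled as plain lists, `seq_op` folds elements into a partial result inside a partition, and `comb_op` merges partial results level by level so that no single step has to combine every partition at once.

```python
import math
from functools import reduce


def tree_aggregate(partitions, zero, seq_op, comb_op, depth=2):
    """Illustrative tree aggregation over a list of partitions (hypothetical helper).

    `zero` should be immutable (or never mutated by the ops), because the
    same object is used as the starting value for every partition.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")

    # Step 1: fold each partition locally into one partial result.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    if not partials:
        return zero

    # Step 2: choose a fan-in so that roughly `depth` levels are needed.
    scale = max(math.ceil(len(partials) ** (1.0 / depth)), 2)

    # Step 3: merge partial results in groups until few enough remain.
    while len(partials) > scale:
        n_groups = math.ceil(len(partials) / scale)
        groups = [[] for _ in range(n_groups)]
        for i, partial in enumerate(partials):
            groups[i % n_groups].append(partial)
        partials = [reduce(comb_op, group) for group in groups]

    # Step 4: combine the last few partial results in a single step.
    return reduce(comb_op, partials)


# Usage: compute (sum, count) over five partitions.
parts = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
total, count = tree_aggregate(
    parts,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
```

In a distributed setting the intermediate merges would run on executors rather than on the driver, which is the usual motivation for preferring tree aggregation over a flat aggregate when partial results are large.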
     
    ## How was this patch tested?
    
    Added unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 treeAggregate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20806.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20806
    
----
commit a254d1501c0119b4881c0443f28c263f0c9dec0e
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-03-12T08:41:20Z

    Implement treeAggregate on Dataset API.

----


