GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/5626

    [SPARK-6113] [ml] Tree ensembles for Pipelines API

    This is a continuation of [https://github.com/apache/spark/pull/5530]
    (which was for Decision Trees), but for ensembles: Random Forests and
    Gradient-Boosted Trees.  Please refer to the JIRA
    [https://issues.apache.org/jira/browse/SPARK-6113], the design doc linked
    from the JIRA, and the previous PR linked above for design discussions.
    
    This PR follows the example set by the previous PR for Decision Trees.  It
    includes a few cleanups to Decision Trees.
    
    Note: One issue will be addressed in a separate PR: Ensembles' component
    Models have no parent or fittingParamMap.  I plan to submit a separate PR
    which changes those values in Model to be Options.  It does not matter much
    which of the two PRs gets merged first.
    
    CC: @mengxr @manishamde @codedeft @chouqin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark dt-api-ensembles

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5626
    
----
commit ee1a10b3b8c0c48ca24bebb10377806e0049bdf0
Author: Joseph K. Bradley <[email protected]>
Date:   2015-04-19T02:21:39Z

    Added files from old PR and did some initial updates.

commit d045ebd7ac557f5ed2887aa4991a1ff21b283e67
Author: Joseph K. Bradley <[email protected]>
Date:   2015-04-20T03:42:28Z

    some more updates, but far from done

commit c0f30c198dc66c427c8ee558e68ee89a7d44fd3d
Author: Joseph K. Bradley <[email protected]>
Date:   2015-04-21T19:36:43Z

    Added random forests and test suites to spark.ml.  Not tested yet.  Need to
    add example as well

commit ea3d901a65ac3d1a2e93ecf089dfbd3c7a91cae9
Author: Joseph K. Bradley <[email protected]>
Date:   2015-04-22T04:54:12Z

    Added GBT to spark.ml, with tests and examples

----


