GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/5626
[SPARK-6113] [ml] Tree ensembles for Pipelines API
This is a continuation of [https://github.com/apache/spark/pull/5530]
(which was for Decision Trees), but for ensembles: Random Forests and
Gradient-Boosted Trees. Please refer to the JIRA
[https://issues.apache.org/jira/browse/SPARK-6113], the design doc linked from
the JIRA, and the previous PR linked above for design discussions.
This PR follows the example set by the previous PR for Decision Trees. It
includes a few cleanups to Decision Trees.
Note: One issue will be addressed in a separate PR: the component Models of
an ensemble have no parent or fittingParamMap. I plan to submit a separate PR
which changes those values in Model to Options. It does not matter much which
PR gets merged first.
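To illustrate the issue noted above, here is a minimal sketch of a model whose
parent and fittingParamMap are optional. It is written in Python rather than
the Scala used by Spark, and it is not the code from this PR: the classes
Estimator, TreeModel and EnsembleModel, the field names, and the helper
describe_origin are all hypothetical stand-ins. Python's Optional/None plays
the role of Scala's Option/None here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Estimator:
    """Hypothetical stand-in for an estimator that fits models."""
    name: str


@dataclass
class TreeModel:
    """Hypothetical tree model whose provenance fields may be absent."""
    prediction: float
    # None means "no value", like Scala's None; a set value is like Some(...).
    parent: Optional[Estimator] = None
    fitting_param_map: Optional[dict] = None


@dataclass
class EnsembleModel:
    """Hypothetical ensemble: it has a parent, but its component trees do not."""
    trees: List[TreeModel]
    parent: Optional[Estimator] = None
    fitting_param_map: Optional[dict] = None

    def predict(self) -> float:
        # Average the component predictions (assumed regression-style vote).
        return sum(t.prediction for t in self.trees) / len(self.trees)


def describe_origin(model) -> str:
    """Report where a model came from, handling the missing parent explicitly."""
    if model.parent is None:
        return "no parent (component of an ensemble)"
    return f"fitted by {model.parent.name}"


# The ensemble records its estimator; the trees inside it carry no parent.
forest = EnsembleModel(
    trees=[TreeModel(1.0), TreeModel(3.0)],
    parent=Estimator("RandomForestRegressor"),
    fitting_param_map={"numTrees": 2},
)
```

With optional fields, callers have to handle the missing case explicitly (as
describe_origin does) instead of assuming every Model was produced directly
by an Estimator.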
CC: @mengxr @manishamde @codedeft @chouqin
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark dt-api-ensembles
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5626.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5626
----
commit ee1a10b3b8c0c48ca24bebb10377806e0049bdf0
Author: Joseph K. Bradley <[email protected]>
Date: 2015-04-19T02:21:39Z
Added files from old PR and did some initial updates.
commit d045ebd7ac557f5ed2887aa4991a1ff21b283e67
Author: Joseph K. Bradley <[email protected]>
Date: 2015-04-20T03:42:28Z
some more updates, but far from done
commit c0f30c198dc66c427c8ee558e68ee89a7d44fd3d
Author: Joseph K. Bradley <[email protected]>
Date: 2015-04-21T19:36:43Z
Added random forests and test suites to spark.ml. Not tested yet. Need to
add example as well
commit ea3d901a65ac3d1a2e93ecf089dfbd3c7a91cae9
Author: Joseph K. Bradley <[email protected]>
Date: 2015-04-22T04:54:12Z
Added GBT to spark.ml, with tests and examples
----