GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/3374
[SPARK-4486][MLLIB] Improve GradientBoosting APIs and doc
The gradient boosting APIs are currently inconsistent: they are framed as a
general boosting meta-algorithm, but the implementation is tied to trees.
This is partly due to the delay of SPARK-1856. For the 1.2 release, we
should make the APIs consistent.
1. WeightedEnsembleModel -> private[tree] TreeEnsembleModel
1. GradientBoosting -> GradientBoostedTrees
1. Add RandomForestModel and GradientBoostedTreesModel and hide
CombiningStrategy
1. Slightly refactored TreeEnsembleModel
1. Remove `trainClassifier` and `trainRegressor` from
`GradientBoostedTrees` because they are the same as `train`
1. Rename class `train` method to `run` because it hides the static methods
with the same name in Java. Deprecated `DecisionTree.train` class method.
1. Simplify BoostingStrategy and make sure the input strategy is not
modified. Users should put algo and numClasses in treeStrategy. We create
ensembleStrategy inside boosting.
1. Fix a bug in GradientBoostedTreesSuite with AbsoluteError
1. doc updates
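
The "input strategy is not modified" item can be illustrated with a minimal, self-contained sketch. All types below (`Algo`, `TreeStrategy`, `BoostingStrategy`, `Boosting.ensembleStrategyFor`) are simplified, hypothetical stand-ins and not the MLlib classes. The choice to switch the copy to regression is an assumption made only for illustration.

```scala
// Hypothetical, simplified stand-ins; these are NOT the MLlib classes.
object Algo extends Enumeration {
  val Classification, Regression = Value
}

// Mutable on purpose, to mirror a configuration object with settable fields.
class TreeStrategy(var algo: Algo.Value, var numClasses: Int, var maxDepth: Int) {
  // Returns an independent copy so callers can change it safely.
  def copy(): TreeStrategy = new TreeStrategy(algo, numClasses, maxDepth)
}

// Users put algo and numClasses in treeStrategy.
class BoostingStrategy(val treeStrategy: TreeStrategy, var numIterations: Int)

object Boosting {
  // Builds the strategy used inside boosting from a copy,
  // leaving the caller's BoostingStrategy untouched.
  def ensembleStrategyFor(input: BoostingStrategy): TreeStrategy = {
    val ensembleStrategy = input.treeStrategy.copy()
    // Assumption for illustration: each boosting round fits a regression tree.
    ensembleStrategy.algo = Algo.Regression
    ensembleStrategy
  }
}
```

With this shape, a caller that passes a classification strategy still sees `Algo.Classification` on its own object afterwards, while the internal copy carries the changed setting.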
@manishamde @jkbradley
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-4486
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3374.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3374
----
commit 19030a5edf8acc90010d2430fcf5c46d4389d86a
Author: Xiangrui Meng <[email protected]>
Date: 2014-11-19T21:17:01Z
update boosting public APIs
commit 751da4e16a1fea86398abdb37ecb33b2b8f723a8
Author: Xiangrui Meng <[email protected]>
Date: 2014-11-19T22:09:11Z
rename class method train -> run
commit ea4c467474ff488d4f4367edeb008cf2c042fc64
Author: Xiangrui Meng <[email protected]>
Date: 2014-11-19T22:25:51Z
fix unit tests
commit 4aae3b761c5e98d19d6a5bf6b8a425f4bb4d2ebc
Author: Xiangrui Meng <[email protected]>
Date: 2014-11-19T23:25:16Z
add RandomForestModel and GradientBoostedTreesModel, hide CombiningStrategy
----