GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/12050
[SPARK-12382][ML] Remove mllib GBT implementation and wrap ml
## What changes were proposed in this pull request?
This patch removes the implementation of gradient boosted trees in
mllib/tree/GradientBoostedTrees.scala and changes mllib GBTs to call the
implementation in spark.ML.
Primary changes:
* Removed the `boost` method from mllib GradientBoostedTrees.scala
* Created a new test suite, GradientBoostedTreesSuite, in ML, which contains
the mllib unit tests that were specific to GBT internals
Other changes:
* Added an `updatePrediction` method to the GradientBoostedTrees package. This
method provides one consistent way to build predictions from boosted models.
Several methods hard-coded the prediction as
sum_{i=1}^{numTrees} (treePrediction_i * treeWeight_i). Calling this
function ensures that tests which check accuracy use the same prediction
method that the algorithm uses during training.
* Added to GradientBoostedTrees the public methods that were previously used
only in testing: `computeError` (previously part of the `Loss` trait) and
`evaluateEachIteration`. These are used in the new spark.ML unit tests. They
are also kept in mllib so as not to break the API.
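To illustrate why a single prediction helper matters, here is a minimal, self-contained sketch in plain Scala (no Spark). It is not the code in this patch: the `Tree` trait, the method signatures, and the use of squared error as the loss are all hypothetical assumptions chosen for illustration. The idea is that the ensemble prediction, the overall error (akin to `computeError`), and the per-iteration errors (akin to `evaluateEachIteration`) are all built from the same `updatePrediction` step, so tests and training cannot drift apart.

```scala
// Hypothetical stand-in for a trained regression tree (not a Spark class).
trait Tree {
  def predict(features: Array[Double]): Double
}

object GbtPredictionSketch {

  // Add one tree's weighted contribution to a running prediction.
  // Hypothetical signature modeled on the idea of `updatePrediction`.
  def updatePrediction(features: Array[Double], prediction: Double,
                       tree: Tree, weight: Double): Double =
    prediction + weight * tree.predict(features)

  // Ensemble prediction: sum over trees of treeWeight_i * treePrediction_i,
  // computed only through updatePrediction so there is a single code path.
  def predict(features: Array[Double], trees: Seq[Tree], weights: Seq[Double]): Double =
    trees.zip(weights).foldLeft(0.0) { case (acc, (tree, w)) =>
      updatePrediction(features, acc, tree, w)
    }

  // Squared error, used here as an example loss (assumption).
  def squaredError(prediction: Double, label: Double): Double = {
    val err = label - prediction
    err * err
  }

  // Mean loss of the full ensemble over (label, features) pairs (akin to `computeError`).
  def computeError(data: Seq[(Double, Array[Double])], trees: Seq[Tree],
                   weights: Seq[Double]): Double =
    data.map { case (label, features) =>
      squaredError(predict(features, trees, weights), label)
    }.sum / data.size

  // Mean loss after 1, 2, ..., numTrees trees (akin to `evaluateEachIteration`).
  // Running predictions are carried forward, so each stage evaluates one tree per point.
  def evaluateEachIteration(data: Seq[(Double, Array[Double])], trees: Seq[Tree],
                            weights: Seq[Double]): Array[Double] = {
    val initial: Seq[Double] = Seq.fill(data.size)(0.0)
    // scanLeft yields the predictions after each tree; .tail drops the all-zero start.
    val stages = trees.zip(weights).scanLeft(initial) { case (preds, (tree, w)) =>
      data.zip(preds).map { case ((_, features), p) => updatePrediction(features, p, tree, w) }
    }.tail
    stages.map { preds =>
      data.zip(preds).map { case ((label, _), p) => squaredError(p, label) }.sum / data.size
    }.toArray
  }
}
```

Because `computeError` and the last entry of `evaluateEachIteration` go through the same helper, they agree by construction, which is the kind of consistency the `updatePrediction` change aims for.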
## How was this patch tested?
Existing unit tests that compare ML and MLlib ensure that the behavior of
mllib GBTs has not changed. Only a single unit test was moved to ML, which
verifies that `runWithValidation` performs as expected.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sethah/spark SPARK-12382
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12050.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12050
----
commit 44118c276da77fe0b147ca849a1cefb2bdecab14
Author: sethah <[email protected]>
Date: 2016-03-28T20:42:20Z
initial commit to remove mllib GBT implementation
commit f695c3ef79e34284c6801e36b89559557fd600db
Author: sethah <[email protected]>
Date: 2016-03-29T03:47:27Z
storing changes, incomplete
commit f1aa44424fafe7e494d69688ad0c50fe83fd5bf8
Author: sethah <[email protected]>
Date: 2016-03-29T21:42:08Z
remove mllib GBT
commit 69e7038131aa5582eee7e1eac0cb579b24ebc68b
Author: sethah <[email protected]>
Date: 2016-03-29T22:23:52Z
cleaning up
----