GitHub user n-triple-a opened a pull request:
https://github.com/apache/spark/pull/13851
[SPARK-9478] [ml] Add class weights to Random Forest
## What changes were proposed in this pull request?
This PR adds class weight support to Random Forest (and also to
Decision Tree). Class weights are useful for handling unbalanced data in
classification problems.
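To illustrate the idea, here is a minimal sketch in plain Python of how
per-class weights could change a Gini impurity calculation and the label
predicted at a leaf. It is not the code in this PR (which is Scala and is not
reproduced in this message). The function names `weighted_gini` and
`weighted_prediction` are hypothetical, and so is the assumption that weights
are applied by scaling class counts.

```python
from typing import Sequence


def weighted_gini(counts: Sequence[float], class_weights: Sequence[float]) -> float:
    """Gini impurity computed on class counts scaled by per-class weights.

    Assumption for illustration: each class count is multiplied by its weight
    before the class proportions are computed.
    """
    if len(counts) != len(class_weights):
        raise ValueError("counts and class_weights must have the same length")
    weighted = [c * w for c, w in zip(counts, class_weights)]
    total = sum(weighted)
    if total == 0:
        return 0.0  # an empty node is treated as pure
    return 1.0 - sum((x / total) ** 2 for x in weighted)


def weighted_prediction(counts: Sequence[float], class_weights: Sequence[float]) -> int:
    """Index of the class with the largest weighted count."""
    weighted = [c * w for c, w in zip(counts, class_weights)]
    return max(range(len(weighted)), key=weighted.__getitem__)


# A node holding 90 samples of class 0 and 10 samples of class 1.
counts = [90, 10]
unweighted = weighted_gini(counts, [1.0, 1.0])  # proportions 0.9 / 0.1
reweighted = weighted_gini(counts, [1.0, 9.0])  # weighted counts 90 / 90
```

With equal weights the node looks nearly pure (impurity about 0.18) and the
majority class is predicted. With the minority class weighted 9x, the same
node has the maximum two-class impurity of 0.5, so a split that isolates the
minority class becomes more attractive.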
## How was this patch tested?
Added a unit test to `DecisionTreeClassifierSuite`. Manual tests were also
run locally on an unbalanced dataset.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/n-triple-a/spark weightedRandomForest
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13851.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13851
----
commit bc7e824cbc7e8995dc9c04df8ece9e7a45fea168
Author: Yuewei Na <[email protected]>
Date: 2016-06-07T00:55:54Z
Modify impurity implementations. NOTE: a further modification is
needed (getCalculator & fromString methods)
commit df3b4e7831995aa4a53b914732152a515629a057
Author: Yuewei Na <[email protected]>
Date: 2016-06-07T21:42:53Z
save changes, but compile error exists
commit 7bcabdac3d54ed9c682de5493da89f26e8a8e55a
Author: Yuewei Na <[email protected]>
Date: 2016-06-07T22:37:31Z
simple testSuites and run properly
commit 61c48588f9bae554e23c1436936c5653c36ae217
Author: Yuewei Na <[email protected]>
Date: 2016-06-07T22:57:57Z
add unbalanced data test case, modify label prediction process s.t. the
predictions are correct with Impurity=WeightedGini
commit aeb08563113f14f950ae9956dbe5104978a90196
Author: Yuewei Na <[email protected]>
Date: 2016-06-07T23:18:45Z
Make Decision Tree predictions correct without changing the base class
ProbClassifier, but changing the definition of Impurity and DecTreClassifier
commit 4dc3e325caf96e7aae1fec7d0f5db290d0bd7195
Author: Yuewei Na <[email protected]>
Date: 2016-06-10T01:36:19Z
1. put classweight def in the right area 2. change interfaces of
getOldStrategy 3. allow classweights to be passed when reconstructing the tree,
including json read/write
commit ad55f6df874aed3eb74ccacb896dd846a9cc9544
Author: Yuewei Na <[email protected]>
Date: 2016-06-15T00:07:19Z
add SetClassWeights to RandomForestClassifier
commit c17067fca7449c8e5b2b6326ef3b56087f737c6e
Author: Yuewei Na <[email protected]>
Date: 2016-06-15T19:16:32Z
change code style such that requirements are met
commit 17635d99574c2fbb7bd684899c465c0289edb5ad
Author: Yuewei Na <[email protected]>
Date: 2016-06-15T23:57:02Z
random forest with class weights runs properly
commit 9d52c1f4973e3ef8770cbe0d61b7b1ae67041f68
Author: Yuewei Na <[email protected]>
Date: 2016-06-16T20:58:49Z
minor changes
commit bf1acfdb7293ee63b31438972c25de683c852e7d
Author: Yuewei Na <[email protected]>
Date: 2016-06-17T00:27:41Z
add Strategy new param doc
commit 2baf814cfec0cac97390e7b64f9064e0bc378d7a
Author: Yuewei Na <[email protected]>
Date: 2016-06-17T18:00:49Z
move classW def to DeTrParam class
commit fe3819c3434d41c20e4625d81ca3da8977bdc67e
Author: Yuewei Na <[email protected]>
Date: 2016-06-17T18:26:40Z
remove @BeanProperty
commit fd2eee567deb3308e6184f4ecb20f681f9fa9353
Author: Yuewei Na <[email protected]>
Date: 2016-06-17T19:10:42Z
change getOldImpu interfaces
commit 455c47e274e1dff50268a6c07b2f0a67c32ac24c
Author: Yuewei Na <[email protected]>
Date: 2016-06-20T17:22:19Z
first version that passes all run_test, by adding redundant constructor and
reverting getOldStrategy to old versions
commit 9c99973476c9143535a913761ceced3ab1d73541
Author: Yuewei Na <[email protected]>
Date: 2016-06-21T18:36:40Z
first version that passes all tests
commit f53a2ccf001ec7db54ebff010fa321e3c116d9a5
Author: Yuewei Na <[email protected]>
Date: 2016-06-21T21:21:27Z
minor modifications for code styling
commit ec5f6df4a5c11053f4a57201a09b631a5f8b81cf
Author: Yuewei Na <[email protected]>
Date: 2016-06-22T17:42:10Z
Minor code style modifications
----