GitHub user n-triple-a opened a pull request:

    https://github.com/apache/spark/pull/13851

    [SPARK-9478] [ml] Add class weights to Random Forest

    ## What changes were proposed in this pull request?
    
    This PR adds class-weight support to Random Forest (and also to Decision
    Tree). This is useful for handling unbalanced data in classification
    problems, where up-weighting rare classes keeps them from being dominated
    by the majority class.
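A minimal Scala sketch of how class weights can enter the impurity calculation. It assumes (this is not taken from the patch) that the weighted Gini impurity mentioned in the commits is 1 - sum_k p_k^2 with p_k = w_k * n_k / sum_j w_j * n_j, where n_k is the count of class k at a node and w_k is its class weight. The object and method names are hypothetical.

```scala
// Hypothetical sketch (not the PR's code): class-weighted Gini impurity.
// Assumption: p_k = w(k) * n_k / sum_j w(j) * n_j, then impurity = 1 - sum_k p_k^2.
object WeightedGiniSketch {

  /** Weighted Gini impurity computed from per-class counts and per-class weights. */
  def weightedGini(counts: Array[Double], classWeights: Array[Double]): Double = {
    require(counts.length == classWeights.length, "one weight per class is required")
    // Scale each class count by its class weight.
    val weighted = counts.zip(classWeights).map { case (n, w) => n * w }
    val total = weighted.sum
    // An empty node is treated as pure.
    if (total == 0.0) 0.0
    else 1.0 - weighted.map { c => val p = c / total; p * p }.sum
  }

  def main(args: Array[String]): Unit = {
    val counts = Array(90.0, 10.0) // unbalanced node: 90 of class 0, 10 of class 1
    println(weightedGini(counts, Array(1.0, 1.0))) // ordinary (unweighted) Gini
    println(weightedGini(counts, Array(1.0, 9.0))) // minority class up-weighted
  }
}
```

With unit weights the node above has the ordinary Gini value of about 0.18. Up-weighting the minority class by 9 makes the weighted counts equal (90 vs. 90), so the same node scores the two-class maximum of 0.5, and splits that isolate the minority class become more attractive.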
    
    
    ## How was this patch tested?
    
    Added a unit test to `DecisionTreeClassifierSuite`. Manual tests were also
    run locally on an unbalanced dataset.
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/n-triple-a/spark weightedRandomForest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13851
    
----
commit bc7e824cbc7e8995dc9c04df8ece9e7a45fea168
Author: Yuewei Na <[email protected]>
Date:   2016-06-07T00:55:54Z

    Modify impurity implementations, NOTICE: a further modification is
    needed (getCalculator & fromString method)

commit df3b4e7831995aa4a53b914732152a515629a057
Author: Yuewei Na <[email protected]>
Date:   2016-06-07T21:42:53Z

    save changes, but compile error exists

commit 7bcabdac3d54ed9c682de5493da89f26e8a8e55a
Author: Yuewei Na <[email protected]>
Date:   2016-06-07T22:37:31Z

    simple testSuites and run properly

commit 61c48588f9bae554e23c1436936c5653c36ae217
Author: Yuewei Na <[email protected]>
Date:   2016-06-07T22:57:57Z

    add unbalanced data test case, modify label prediction process s.t. the
    predictions are correct with Impurity=WeightedGini

commit aeb08563113f14f950ae9956dbe5104978a90196
Author: Yuewei Na <[email protected]>
Date:   2016-06-07T23:18:45Z

    Make Decision Tree predictions correct without changing the base class
    ProbClassifier, but changing the definition of Impurity and DecTreClassifier

commit 4dc3e325caf96e7aae1fec7d0f5db290d0bd7195
Author: Yuewei Na <[email protected]>
Date:   2016-06-10T01:36:19Z

    1.put classweight def to the right area 2.change interfaces of
    getOldStrategy 3.make classweights can be passed when reconstructing the
    tree, including json read/write

commit ad55f6df874aed3eb74ccacb896dd846a9cc9544
Author: Yuewei Na <[email protected]>
Date:   2016-06-15T00:07:19Z

    add SetClassWeights to RandomForestClassifier

commit c17067fca7449c8e5b2b6326ef3b56087f737c6e
Author: Yuewei Na <[email protected]>
Date:   2016-06-15T19:16:32Z

    change code style such that requirements are met

commit 17635d99574c2fbb7bd684899c465c0289edb5ad
Author: Yuewei Na <[email protected]>
Date:   2016-06-15T23:57:02Z

    random forest with class weights runs properly

commit 9d52c1f4973e3ef8770cbe0d61b7b1ae67041f68
Author: Yuewei Na <[email protected]>
Date:   2016-06-16T20:58:49Z

    minor changes

commit bf1acfdb7293ee63b31438972c25de683c852e7d
Author: Yuewei Na <[email protected]>
Date:   2016-06-17T00:27:41Z

    add Strategy new param doc

commit 2baf814cfec0cac97390e7b64f9064e0bc378d7a
Author: Yuewei Na <[email protected]>
Date:   2016-06-17T18:00:49Z

    move classW def to DeTrParam class

commit fe3819c3434d41c20e4625d81ca3da8977bdc67e
Author: Yuewei Na <[email protected]>
Date:   2016-06-17T18:26:40Z

    remove @BeanProperty

commit fd2eee567deb3308e6184f4ecb20f681f9fa9353
Author: Yuewei Na <[email protected]>
Date:   2016-06-17T19:10:42Z

    change getOldImpu interfaces

commit 455c47e274e1dff50268a6c07b2f0a67c32ac24c
Author: Yuewei Na <[email protected]>
Date:   2016-06-20T17:22:19Z

    first version that pass all run_test, by adding redundant constructor and
    reverting getOldStrategy to old versions

commit 9c99973476c9143535a913761ceced3ab1d73541
Author: Yuewei Na <[email protected]>
Date:   2016-06-21T18:36:40Z

    first version that pass all tests

commit f53a2ccf001ec7db54ebff010fa321e3c116d9a5
Author: Yuewei Na <[email protected]>
Date:   2016-06-21T21:21:27Z

    minor modifications for code styling

commit ec5f6df4a5c11053f4a57201a09b631a5f8b81cf
Author: Yuewei Na <[email protected]>
Date:   2016-06-22T17:42:10Z

    Minor code style modifications

----
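Two of the commits above mention changing the label prediction so that it is consistent with Impurity=WeightedGini. One plausible reading, sketched here as an assumption rather than the patch's actual logic, is that a leaf predicts the class with the largest weighted count. The object and method names are hypothetical.

```scala
// Hypothetical sketch (not the PR's code): pick a leaf's label from
// class-weighted counts rather than raw counts.
object WeightedPredictionSketch {

  /** Index of the class whose weighted count (count * weight) is largest. */
  def predict(counts: Array[Double], classWeights: Array[Double]): Int = {
    require(counts.nonEmpty && counts.length == classWeights.length,
      "counts must be non-empty, with one weight per class")
    // Scale each class count by its class weight, then take the argmax.
    val weighted = counts.zip(classWeights).map { case (n, w) => n * w }
    weighted.indices.maxBy(i => weighted(i))
  }

  def main(args: Array[String]): Unit = {
    val counts = Array(90.0, 10.0) // leaf dominated by class 0
    println(predict(counts, Array(1.0, 1.0)))  // raw majority vote
    println(predict(counts, Array(1.0, 10.0))) // minority class up-weighted
  }
}
```

With unit weights the leaf predicts class 0; with a weight of 10 on class 1 the weighted counts become 90 vs. 100, so the same leaf predicts class 1.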

