GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/1740

    [SPARK-2197] [mllib] Java DecisionTree bug fix and ease of use

    Bug fix: Previously, passing an RDD created in Java to DecisionTree.train() caused errors because of the fake class tag attached to Java-created RDDs.
    * Fix: DecisionTree now uses the new RDD.retag() method so that RDDs created in Java can be passed in.
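The fake class tag problem can be illustrated by analogy in plain Java, where a Class<T> token plays roughly the role of Scala's ClassTag[T]. This is a sketch under that assumption, not Spark code: ClassTagAnalogy and its methods are hypothetical names. When the element type is known only as Object (the "fake" tag), an array built from it is an Object[], and a later cast to a more specific array type fails at runtime. Supplying the real element class, which is conceptually what retagging does, produces a correctly typed array.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

// Analogy only: a Class<T> token stands in for a Scala ClassTag[T].
class ClassTagAnalogy {

    // Builds a T[] from a list, using the runtime class token to choose the
    // array's element type (similar to how Scala uses a ClassTag to allocate Array[T]).
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(List<? extends T> items, Class<T> tag) {
        T[] out = (T[]) Array.newInstance(tag, items.size());
        for (int i = 0; i < items.size(); i++) {
            out[i] = items.get(i);
        }
        return out;
    }

    // "Fake" tag: the element type is only known as Object, so the result
    // is an Object[] even though every element is a String.
    static Object[] withFakeTag(List<String> items) {
        return toArray(items, Object.class);
    }

    // "Retagged": the caller supplies the real element class.
    static String[] withRealTag(List<String> items) {
        return toArray(items, String.class);
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("a", "b");

        Object[] fake = withFakeTag(labels);
        System.out.println(fake.getClass().getSimpleName()); // Object[]
        try {
            String[] asStrings = (String[]) fake; // compiles, but fails at runtime
            System.out.println(asStrings.length);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException with fake tag");
        }

        String[] real = withRealTag(labels);
        System.out.println(real.getClass().getSimpleName()); // String[]
    }
}
```

The point of the analogy is that the failure surfaces far from where the tag was lost, which is why fixing the tag once at the API boundary is simpler than patching each internal use.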
    
    Other improvements to make Decision Trees easier to use from Java:
    * Impurity classes: Added instance() methods to help with the Java interface.
    * Strategy: Added a Java-friendly constructor.
    ** Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is currently only one option. I suspect we will redo the API before the other options are included.
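The two Java-facing additions above can be sketched in plain Java. Everything here is hypothetical: SketchImpurity, SketchGini, SketchQuantileStrategy and SketchStrategy are illustrative stand-ins, not Spark's classes, and their fields and constructor parameters are assumptions. SketchGini shows a singleton exposed through a static instance() accessor (roughly what a Scala object needs so that Java callers do not have to reach for the compiler-generated MODULE$ field), and SketchStrategy shows a convenience constructor that leaves out a single-option setting and fills in a default.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not Spark's actual API.
interface SketchImpurity {
    // Returns the impurity of a node given per-class counts.
    double calculate(double[] classCounts);
}

// Singleton exposed to Java callers through a static instance() accessor.
final class SketchGini implements SketchImpurity {
    private static final SketchGini INSTANCE = new SketchGini();

    private SketchGini() {}

    static SketchGini instance() {
        return INSTANCE;
    }

    // Gini impurity: 1 - sum over classes of p_i^2.
    @Override
    public double calculate(double[] classCounts) {
        double total = 0.0;
        for (double c : classCounts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0;
        }
        double sumSq = 0.0;
        for (double c : classCounts) {
            double p = c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }
}

// Only one option exists, mirroring the note in the PR description.
enum SketchQuantileStrategy { SORT }

final class SketchStrategy {
    final String algo;
    final SketchImpurity impurity;
    final int maxDepth;
    final int maxBins;
    final SketchQuantileStrategy quantileStrategy;
    final Map<Integer, Integer> categoricalFeaturesInfo; // feature index -> arity

    // Full constructor: every setting is passed explicitly.
    SketchStrategy(String algo, SketchImpurity impurity, int maxDepth, int maxBins,
                   SketchQuantileStrategy quantileStrategy,
                   Map<Integer, Integer> categoricalFeaturesInfo) {
        this.algo = algo;
        this.impurity = impurity;
        this.maxDepth = maxDepth;
        this.maxBins = maxBins;
        this.quantileStrategy = quantileStrategy;
        this.categoricalFeaturesInfo =
            Collections.unmodifiableMap(new HashMap<>(categoricalFeaturesInfo));
    }

    // Java-friendly constructor: omits the quantile strategy and defaults it
    // to the only available option.
    SketchStrategy(String algo, SketchImpurity impurity, int maxDepth, int maxBins,
                   Map<Integer, Integer> categoricalFeaturesInfo) {
        this(algo, impurity, maxDepth, maxBins, SketchQuantileStrategy.SORT,
             categoricalFeaturesInfo);
    }
}
```

With this shape, a Java caller could write new SketchStrategy("Classification", SketchGini.instance(), 5, 32, new HashMap<>()) without knowing the quantile setting exists; if more options are added later, only the full constructor needs to expose them.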
    
    CC: @mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark dt-java-new

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1740
    
----
commit 225822fe38762596b8c917a867e5cdbb2d9b4b55
Author: Joseph K. Bradley <[email protected]>
Date:   2014-08-01T21:50:42Z

    Bug: In DecisionTree, the method sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). That upper bound applies to unordered categorical features, not ordered ones; for ordered features, the upper bound should be the arity (i.e., the number of categories) of the feature.
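Comparing the two bounds for small arities k shows why an arity-2 test catches the bug: the unordered bound 2^(k-1) - 1 evaluates to 0, 1, 3, 7 for k = 1, 2, 3, 4, compared with the correct bound k. It is too small for k = 1 and 2, coincidentally equal for k = 3, and too large for k >= 4. The sketch below is a simplified, hypothetical model (one bin per category, each bin identified by its category value), not Spark's implementation; OrderedCategoricalBinSearch and its methods are illustrative names.

```java
// Hypothetical, simplified model: an ordered categorical feature has one bin
// per category, and each bin is identified by its category value.
class OrderedCategoricalBinSearch {

    // Bound that applies to unordered categorical features: 2^(arity - 1) - 1.
    static int unorderedBound(int arity) {
        return (1 << (arity - 1)) - 1;
    }

    // Scans bins 0 .. upperBound - 1 and returns the index of the bin whose
    // category equals featureValue, or -1 if no scanned bin matches.
    // The length check only keeps this toy from throwing when the bound overshoots.
    static int sequentialBinSearch(double[] binCategories, double featureValue, int upperBound) {
        for (int binIndex = 0; binIndex < upperBound && binIndex < binCategories.length; binIndex++) {
            if (binCategories[binIndex] == featureValue) {
                return binIndex;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int arity = 2;
        double[] binCategories = {0.0, 1.0}; // categories 0 and 1, one bin each
        // Buggy bound: 2^(2 - 1) - 1 = 1, so only bin 0 is scanned.
        System.out.println(sequentialBinSearch(binCategories, 1.0, unorderedBound(arity))); // -1
        // Fixed bound: the arity, so bins 0 and 1 are scanned.
        System.out.println(sequentialBinSearch(binCategories, 1.0, arity)); // 1
    }
}
```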
    
    Added new test to DecisionTreeSuite to catch this: "regression stump with categorical variables of arity 2"
    
    Bug fix: Changed the upper bound as described above.
    
    Also: Small improvements to coding style in DecisionTree.

commit f1a8283c5cb6a497a9ac60c8ce1859dbe9a051b0
Author: Joseph K. Bradley <[email protected]>
Date:   2014-08-01T22:56:09Z

    Added old JavaDecisionTreeSuite, to be updated later

commit 13a585e5b818735dfc6aa481547fc201ddfc1798
Author: Joseph K. Bradley <[email protected]>
Date:   2014-08-02T00:18:12Z

    Merge remote-tracking branch 'upstream/master' into dt-java

commit 320853f464ca8658d7e28a9f39f288da33c88b23
Author: Joseph K. Bradley <[email protected]>
Date:   2014-08-02T00:40:53Z

    Added JavaDecisionTreeSuite, partly written

commit d78ada636f490db6fb1e4a9f75af7f492c07f222
Author: Joseph K. Bradley <[email protected]>
Date:   2014-08-02T06:32:49Z

    Merge remote-tracking branch 'upstream/master' into dt-java

commit f7b5ca1ed464de5d7d20f4c006621afa8d8b9e56
Author: Joseph K. Bradley <[email protected]>
Date:   2014-08-02T19:56:47Z

    Improvements to make it easier to run DecisionTree from Java.
    * DecisionTree: Used new RDD.retag() method to allow passing RDDs from Java.
    * impurity classes: Added instance() methods to help with Java interface.
    * Strategy: Added Java-friendly constructor
    ** Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is currently only one option. I suspect we will redo the API before the other options are included.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---