GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/1720

    [SPARK-2796] [mllib] DecisionTree bug fix: ordered categorical features

    Bug: In DecisionTree, the method
    sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins
    from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). This upper bound is
    the bound for unordered categorical features, not ordered ones. The upper bound
    should be the arity (i.e., max value) of the feature.
    
    Added a new test to DecisionTreeSuite to catch this: "regression stump with
    categorical variables of arity 2"
    
    Bug fix: Modified upper bound discussed above.
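
To make the two bounds concrete, here is a minimal standalone sketch (not the
actual DecisionTree code). The object and method names are hypothetical; only
the expression math.pow(2, arity - 1) - 1 and the use of the arity as the
corrected bound come from the description above. For arity 2 the old
expression evaluates to 1 while the arity is 2, so the two bounds disagree on
the case the new test exercises.

```scala
// Hypothetical helper, for illustration only; not part of MLlib.
object BinBoundSketch {
  // Bound quoted in the bug report: the one intended for unordered
  // categorical features.
  def unorderedBound(arity: Int): Int =
    math.pow(2, arity - 1).toInt - 1

  // Bound the fix switches to for ordered categorical features:
  // the arity of the feature itself.
  def orderedBound(arity: Int): Int = arity

  def main(args: Array[String]): Unit = {
    // Print both bounds side by side for a few small arities.
    for (arity <- 2 to 5) {
      println(
        s"arity=$arity unorderedBound=${unorderedBound(arity)} " +
          s"orderedBound=${orderedBound(arity)}")
    }
  }
}
```

In this sketch the two expressions happen to agree at arity 3 and diverge for
every other arity in the printed range.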
    
    Also: Small improvements to coding style in DecisionTree.
    
    CC @mengxr @manishamde

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark decisiontree-bugfix2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1720
    
----
commit 225822fe38762596b8c917a867e5cdbb2d9b4b55
Author: Joseph K. Bradley <[email protected]>
Date:   2014-08-01T21:50:42Z

    Bug: In DecisionTree, the method
    sequentialBinSearchForOrderedCategoricalFeatureInClassification() indexed bins
    from 0 to (math.pow(2, featureCategories.toInt - 1) - 1). This upper bound is
    the bound for unordered categorical features, not ordered ones. The upper bound
    should be the arity (i.e., max value) of the feature.
    
    Added a new test to DecisionTreeSuite to catch this: "regression stump with
    categorical variables of arity 2"
    
    Bug fix: Modified upper bound discussed above.
    
    Also: Small improvements to coding style in DecisionTree.

----


