GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/1673
Decision tree bug fixes
1) Fixed inconsistent aggregate (agg) indexing for unordered features.
2) Fixed gain calculations for edge cases.
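For readers unfamiliar with the gain calculation, here is a minimal illustrative sketch in Python. It is not the MLlib code touched by this pull request. The description does not say which edge cases were fixed, so the ones guarded here (an empty child and an empty node) are assumptions. The names `gini` and `information_gain` are hypothetical, and returning 0.0 for a degenerate split is a choice made only for this sketch.

```python
def gini(counts):
    """Gini impurity of a node, given its per-class label counts."""
    total = sum(counts)
    if total == 0:
        # Assumed edge case: an empty node is treated as pure.
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def information_gain(left_counts, right_counts):
    """Impurity reduction obtained by splitting a node into two children."""
    left_n = sum(left_counts)
    right_n = sum(right_counts)
    total = left_n + right_n
    if left_n == 0 or right_n == 0:
        # Assumed edge case: a split that sends every instance to one side
        # separates nothing, so this sketch reports zero gain.
        return 0.0
    parent_counts = [l + r for l, r in zip(left_counts, right_counts)]
    weighted_children = (left_n / total) * gini(left_counts) \
        + (right_n / total) * gini(right_counts)
    return gini(parent_counts) - weighted_children
```

Without the guard for an empty child, the weights would still be well defined, but a naive impurity function that divides by the child's count would fail on the empty side.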
Other updates, to help with tests:
* Updated DecisionTreeRunner to print more info.
* Added utility functions to DecisionTreeModel: toString, depth, and numNodes.
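To illustrate what utilities such as depth, numNodes, and toString typically compute on a binary tree, here is a small self-contained Python sketch. It does not reproduce the DecisionTreeModel API. The `Node` class and the functions `num_nodes`, `depth`, and `to_string` are hypothetical names, and the convention that a single leaf has depth 0 is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; a leaf has no children."""
    prediction: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def num_nodes(node: Optional[Node]) -> int:
    """Count all nodes (internal and leaf) in the subtree."""
    if node is None:
        return 0
    return 1 + num_nodes(node.left) + num_nodes(node.right)


def depth(node: Optional[Node]) -> int:
    """Number of edges on the longest root-to-leaf path (single leaf = 0)."""
    if node is None or node.is_leaf():
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def to_string(node: Optional[Node], indent: int = 0) -> str:
    """Render the subtree one node per line, indented by level."""
    if node is None:
        return ""
    kind = "Leaf" if node.is_leaf() else "Node"
    line = " " * indent + f"{kind}(prediction={node.prediction})\n"
    return line + to_string(node.left, indent + 2) + to_string(node.right, indent + 2)
```

Helpers like these make test assertions straightforward, for example checking that a tree trained with a given depth limit does not exceed that depth.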
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark decisiontree-bugfix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1673.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1673
----
commit 5f920a10b6114baa0744f55843969843b1f2babc
Author: Joseph K. Bradley <[email protected]>
Date: 2014-07-30T22:24:55Z
Demonstration of bug before submitting fix: Updated DecisionTreeSuite so
that 3 tests fail. Will describe bug in next commit.
----