GitHub user johnnywalleye opened a pull request:

    https://github.com/apache/spark/pull/1316

    fix bin offset in DecisionTree node aggregations

    Hi, this pull request fixes what I believe to be a bug in DecisionTree.scala.
    
    
    In the extractLeftRightNodeAggregates function, the first set of
    rightNodeAgg values for Regression is set on line 792 as follows:

    rightNodeAgg(featureIndex)(2 * (numBins - 2)) =
      binData(shift + (2 * (numBins - 1)))
    
    
    A loop then sets the rest of the values, as on line 809:

    rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) =
      binData(shift + (2 * (numBins - 2 - splitIndex))) +
      rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex))
    
    
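    But since splitIndex starts at 1, the first loop iteration reads binData
    at offset 2 * (numBins - 3), so the binData values at offset
    2 * (numBins - 2) are skipped and never reach any rightNodeAgg entry.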
    
    The changes here address this issue for both the Regression and
    Classification cases.
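
The skipped bin can be reproduced with a small standalone sketch. It is not the code from DecisionTree.scala: the names RightNodeAggSketch, rightAgg, and binOffset are hypothetical, shift is assumed to be 0 (a single feature), and only the first statistic at each even offset is modeled. The "fixed" offset numBins - 1 - splitIndex is one choice that avoids the skip and is an assumption, not necessarily the exact expression used in the patch.

```scala
object RightNodeAggSketch {
  // Right-node aggregates for a single feature (shift assumed to be 0).
  // Only the even offsets (first statistic of each bin) are filled in.
  // binOffset maps splitIndex to the bin whose data the loop adds.
  def rightAgg(binData: Array[Double], numBins: Int, binOffset: Int => Int): Array[Double] = {
    val agg = new Array[Double](2 * (numBins - 1))
    // Highest split: everything to its right is the last bin.
    agg(2 * (numBins - 2)) = binData(2 * (numBins - 1))
    var splitIndex = 1
    while (splitIndex < numBins - 1) {
      agg(2 * (numBins - 2 - splitIndex)) =
        binData(2 * binOffset(splitIndex)) + agg(2 * (numBins - 1 - splitIndex))
      splitIndex += 1
    }
    agg
  }

  def main(args: Array[String]): Unit = {
    val numBins = 4
    // Bins 0..3 hold 1, 10, 100, 1000 so each bin's contribution is visible.
    val binData = Array(1.0, 0.0, 10.0, 0.0, 100.0, 0.0, 1000.0, 0.0)
    // Offset as quoted from line 809: bin 2 (value 100) is never added.
    val buggy = rightAgg(binData, numBins, s => numBins - 2 - s)
    // Assumed corrected offset: each split adds the bin just to its right.
    val fixed = rightAgg(binData, numBins, s => numBins - 1 - s)
    println(s"buggy: ${buggy(0)}, ${buggy(2)}, ${buggy(4)}") // 1011.0, 1010.0, 1000.0
    println(s"fixed: ${fixed(0)}, ${fixed(2)}, ${fixed(4)}") // 1110.0, 1100.0, 1000.0
  }
}
```

With the quoted offset, the aggregate for the lowest split also picks up bin 0, which belongs to the left side, while bin 2 is missing from every entry. With the assumed corrected offset, each entry equals the sum of the bins to the right of its split.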

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/johnnywalleye/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1316.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1316
    
----
commit 73809dae4e02e1cf1e8a8c67fff1cf07376d8e15
Author: johnnywalleye <[email protected]>
Date:   2014-07-07T14:04:22Z

    fix bin offset in DecisionTree node aggregations

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
