GitHub user chouqin opened a pull request:

    https://github.com/apache/spark/pull/2180

    Dt predict

    In the current implementation, the prediction for a node is calculated 
together with the information gain stats for each possible split, even though 
the predicted value for a given node is the same no matter which split is chosen.
    To save computation, we can calculate the prediction once up front and then 
calculate the information gain stats for each split.
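
    As a rough illustration of the idea (not the code in this PR), the sketch 
below computes a node's prediction once from its label counts and only then 
scores each candidate split. The names `PredictFirstSketch`, `Counts`, 
`predict`, `gini`, `gain` and `bestSplit` are hypothetical, and Gini impurity 
on classification counts is assumed purely for the example.

```scala
object PredictFirstSketch {
  // Hypothetical stand-in for a node's statistics: one count per class label.
  type Counts = Array[Double]

  // The prediction depends only on the node's own label counts, never on a split.
  // Returns (majority label, fraction of instances carrying that label).
  def predict(counts: Counts): (Int, Double) = {
    val total = counts.sum
    val label = counts.indexOf(counts.max)
    (label, if (total == 0) 0.0 else counts(label) / total)
  }

  // Gini impurity, assumed here only for illustration.
  def gini(counts: Counts): Double = {
    val total = counts.sum
    if (total == 0) 0.0 else 1.0 - counts.map(c => (c / total) * (c / total)).sum
  }

  // Information gain of one candidate split, given the children's label counts.
  def gain(parent: Counts, left: Counts, right: Counts): Double = {
    val total = parent.sum
    gini(parent) - (left.sum / total) * gini(left) - (right.sum / total) * gini(right)
  }

  // Compute the prediction once, then score every split.
  // Returns the prediction and, if any split exists, (split index, gain) of the best one.
  def bestSplit(
      parent: Counts,
      splits: Seq[(Counts, Counts)]): ((Int, Double), Option[(Int, Double)]) = {
    val prediction = predict(parent)
    val gains = splits.zipWithIndex.map { case ((l, r), i) => (i, gain(parent, l, r)) }
    val best = if (gains.isEmpty) None else Some(gains.maxBy(_._2))
    (prediction, best)
  }
}
```

    The prediction is available even when `splits` is empty, which is the 
property the rest of this description relies on.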
    
    This is also necessary if we want to support a minimum instances per node 
parameter ([SPARK-2207](https://issues.apache.org/jira/browse/SPARK-2207)), 
because when no split satisfies the minimum instances requirement, we don't 
use the information gain of any split, so there must be another way to get the 
prediction value.
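
    A minimal sketch of that situation, under stated assumptions: the names 
`MinInstancesSketch`, `SplitStats`, `decide` and `minInstancesPerNode` are 
hypothetical and are not taken from this PR or from SPARK-2207. When every 
candidate split is rejected, the node still gets its prediction because it was 
computed independently of the splits.

```scala
object MinInstancesSketch {
  // Hypothetical summary of a candidate split: child sizes and a precomputed gain.
  case class SplitStats(leftCount: Long, rightCount: Long, gain: Double)

  // Majority-class prediction from the node's own label counts.
  def predict(labelCounts: Array[Double]): Int =
    labelCounts.indexOf(labelCounts.max)

  // Returns the node's prediction plus the best admissible split, if there is one.
  def decide(
      labelCounts: Array[Double],
      splits: Seq[SplitStats],
      minInstancesPerNode: Long): (Int, Option[SplitStats]) = {
    // Computed first, so it exists even if no split survives the filter below.
    val prediction = predict(labelCounts)
    // Keep only splits whose children both meet the minimum size.
    val valid = splits.filter { s =>
      s.leftCount >= minInstancesPerNode && s.rightCount >= minInstancesPerNode
    }
    val best = if (valid.isEmpty) None else Some(valid.maxBy(_.gain))
    (prediction, best)
  }
}
```

    With a threshold that rejects every split, `decide` returns `None` for the 
split and the node can be treated as a leaf that uses the prediction.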
    
    This PR also removes the unused function `nodeIndexToLevel`.
    
    CC: @mengxr @manishamde @jkbradley, do you think this is really necessary?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chouqin/spark dt-predict

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2180
    
----
commit 0552c7e798f5d62b74511372c0d38e08e50e6bac
Author: qiping.lqp <[email protected]>
Date:   2014-08-28T08:03:55Z

    separate calculation of predict of node from calculation of info gain of 
splits

commit c205eb8775a8dabfd567501972e2c9732c2fe80a
Author: qiping.lqp <[email protected]>
Date:   2014-08-28T08:05:20Z

    commit Predict.scala

commit d92b3d47666e1c907222605b873172ef4a2c770c
Author: qiping.lqp <[email protected]>
Date:   2014-08-28T08:19:59Z

    fix decision tree suite

----

