GitHub user chouqin opened a pull request:

    https://github.com/apache/spark/pull/2332

    [SPARK-2207][SPARK-3272] Add minimum information gain and minimum instances 
per node as training parameters for decision trees.

    These two parameters act as early-stopping rules for pre-pruning. If a 
split would leave the left or right child with fewer than `minInstancesPerNode` 
instances, or would yield less information gain than `minInfoGain`, the current 
node is not split on that candidate.
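The two rules above can be illustrated with a small, language-agnostic sketch. This is not the code from this PR or Spark's API. `SplitStats`, `is_valid_split` and `best_valid_split` are hypothetical names, and the sketch assumes each candidate split has already been summarized by its child instance counts and its information gain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SplitStats:
    """Hypothetical summary of one candidate split."""
    left_count: int    # instances routed to the left child
    right_count: int   # instances routed to the right child
    info_gain: float   # information gain of this split


def is_valid_split(stats: SplitStats,
                   min_instances_per_node: int = 1,
                   min_info_gain: float = 0.0) -> bool:
    # Rule 1: each child must receive at least min_instances_per_node instances.
    if (stats.left_count < min_instances_per_node
            or stats.right_count < min_instances_per_node):
        return False
    # Rule 2: the split must gain at least min_info_gain.
    return stats.info_gain >= min_info_gain


def best_valid_split(candidates: List[SplitStats],
                     min_instances_per_node: int = 1,
                     min_info_gain: float = 0.0) -> Optional[SplitStats]:
    valid = [s for s in candidates
             if is_valid_split(s, min_instances_per_node, min_info_gain)]
    if not valid:
        return None  # no candidate passes both rules: the node becomes a leaf
    return max(valid, key=lambda s: s.info_gain)
```

For example, with `min_instances_per_node=5` and `min_info_gain=0.05`, a split sending only 1 instance to one child is rejected even if its gain is the highest, and a split with gain 0.01 is rejected as well. When every candidate is rejected, the node stops growing.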
    
    When no possible split satisfies these requirements, there are no 
useful information gain stats, but we still need to calculate the predict value 
for the current node. So I separated the calculation of the predict value from the 
calculation of information gain. This also saves computation when the number of 
possible splits is large, since the predict value no longer has to be computed 
alongside each candidate split's gain. Please see 
[SPARK-3272](https://issues.apache.org/jira/browse/SPARK-3272) for more details.
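The separation described above can be sketched as follows. This is a hypothetical Python illustration, not the PR's Scala code. It assumes binary splits for classification scored by Gini impurity, with each node and child summarized by per-class label counts. `node_predict`, `info_gain` and `process_node` are made-up names. The predict value comes only from the node's own label counts, so it exists even when no split survives the pre-pruning rules.

```python
from typing import List, Optional, Tuple

Counts = List[int]  # per-class label counts


def gini(counts: Counts) -> float:
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def node_predict(counts: Counts) -> Tuple[int, float]:
    # Majority class and its probability, computed once from the node's own counts.
    total = sum(counts)
    label = max(range(len(counts)), key=lambda i: counts[i])
    return label, counts[label] / total


def info_gain(parent: Counts, left: Counts, right: Counts) -> float:
    total = sum(parent)
    return (gini(parent)
            - sum(left) / total * gini(left)
            - sum(right) / total * gini(right))


def process_node(parent: Counts,
                 splits: List[Tuple[Counts, Counts]],
                 min_instances_per_node: int = 1,
                 min_info_gain: float = 0.0):
    predict = node_predict(parent)  # independent of any candidate split
    best: Optional[Tuple[Counts, Counts, float]] = None
    for left, right in splits:
        if sum(left) < min_instances_per_node or sum(right) < min_instances_per_node:
            continue  # violates the minimum-instances rule
        gain = info_gain(parent, left, right)
        if gain < min_info_gain:
            continue  # violates the minimum-information-gain rule
        if best is None or gain > best[2]:
            best = (left, right, gain)
    return predict, best  # best is None when the node must become a leaf
```

With a parent of `[8, 2]` and strict enough thresholds, every candidate can be rejected. `process_node` then returns `(0, 0.8)` as the predict value together with `None` for the split, which is exactly the case the PR needs to handle.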
    
    CC: @mengxr @manishamde @jkbradley, please help me review this, thanks.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chouqin/spark dt-preprune

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2332.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2332
    
----
commit ac4237808090237fe4c562da8c88c55c330d451f
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T03:17:58Z

    add min info gain and min instances per node parameters in decision tree

commit ff34845c8e43f5b9755dd1fdf428be8b2284c68b
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T04:29:12Z

    separate calculation of predict of node from calculation of info gain

commit 987cbf4b177f29e232bf2ba2ca595ea7015694da
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T04:30:01Z

    fix bug

commit f195e830a94097e5d6d42f22c67c32ca8900d848
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T06:04:20Z

    fix style

commit 845c6fa58c00bfba426e56e71eb46a6f8c3f5985
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T06:05:37Z

    fix style

commit e72c7e4d0ad015fdf25ea2959bdbf524056e38ca
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T06:52:24Z

    add comments

commit 46b891fd7f30b9f2d439134931b35dab387fe2b1
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T08:09:34Z

    fix bug

commit cadd569cf64d6eb7b9c9979a5066a2f63f15fed9
Author: qiping.lqp <[email protected]>
Date:   2014-09-09T08:48:51Z

    add api docs

----
