GitHub user chouqin opened a pull request:

    https://github.com/apache/spark/pull/1941

    [SPARK-3022] [mllib] FindBinsForLevel in decision tree should call findBin 
only once for each feature

    `findBinsForLevel` is applied to every `LabeledPoint` to find bins for all 
nodes at a given level. Given a specific `LabeledPoint` and a specific feature, 
the bin that the point falls into is always the same. But in the current 
implementation, `findBin` is called on each (LabeledPoint, feature) pair for 
every node at every level, which wastes computation.
    
    In my implementation, `findBin` for each (LabeledPoint, feature) pair is 
executed only once, before level-wise training of the decision tree starts. 
Then, at each level, the resulting `feature2bin` array can be reused.
    
    What's more, `findBinsForLevel` now returns a smaller array: all the 
nodes for which this `LabeledPoint` is valid share the same `feature2bin` array, 
instead of each node having its own copy.
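
A minimal sketch of the idea described above, not the code in this pull request. `Point`, `BinPrecompute`, `featureToBin`, and the representation of splits as plain threshold arrays are hypothetical simplifications for illustration; the real MLlib code uses its own types. It only shows the bin of every feature being computed once per point, so the result can be looked up at each level.

```scala
// Hypothetical, simplified stand-in for a labeled training point.
final case class Point(label: Double, features: Array[Double])

object BinPrecompute {
  // Index of the bin containing `value`, given ascending thresholds for one
  // continuous feature. Bin i covers (thresholds(i - 1), thresholds(i)];
  // values above the last threshold fall into the final, open-ended bin.
  def findBin(value: Double, thresholds: Array[Double]): Int = {
    val idx = thresholds.indexWhere(value <= _)
    if (idx == -1) thresholds.length else idx
  }

  // Computed once per point, before level-wise training starts.
  // splits(f) holds the thresholds for feature f.
  def featureToBin(point: Point, splits: Array[Array[Double]]): Array[Int] =
    Array.tabulate(point.features.length) { f =>
      findBin(point.features(f), splits(f))
    }
}
```

Under these assumptions, each level of training would read `bins(f)` from the precomputed array for whichever node the point currently belongs to, rather than calling `findBin` again per node and per level.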
    
    CC: @mengxr @manishamde @jkbradley, please have a look at this, thanks.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chouqin/spark dt-findbins

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1941
    
----
commit 065a42c21c5810c2a09a125b39e8d56c38a18ebc
Author: qiping.lqp <[email protected]>
Date:   2014-08-14T06:06:43Z

    improve decision tree: findbins called only once for each feature

commit 45549f777eafe166bfb49e606ac12c3faec5f965
Author: qiping.lqp <[email protected]>
Date:   2014-08-14T06:49:44Z

    fix unit test for decision tree

commit 4a7bc0bab5a31712ebc8a5f37349a125d0d125b0
Author: qiping.lqp <[email protected]>
Date:   2014-08-14T06:58:02Z

    fix style: line length doesn't exceed 100.

commit 59b67819cd0a256beae6e2ca67cfebfb20b690d7
Author: qiping.lqp <[email protected]>
Date:   2014-08-14T07:36:24Z

    add comments

commit 4ef9e7fd53f6cefc998def2be2e08e0c22ec9e7a
Author: qiping.lqp <[email protected]>
Date:   2014-08-14T07:51:33Z

    fix indentation

commit f1a20e3b347fb3df77da9e51d50a7e0d54c0a75e
Author: qiping.lqp <[email protected]>
Date:   2014-08-14T07:57:57Z

    fix indentation too

----


