GitHub user manishamde opened a pull request:

    https://github.com/apache/spark/pull/475

    SPARK-1544 Add support for deep decision trees.

    etrain and I came up with a PR for arbitrarily deep decision trees at the cost 
of multiple passes over the data at deep tree levels. 
    
    To summarize:
    1) We take a parameter that indicates the amount of memory users want to 
reserve for computation on each worker (and 2x that at the driver).
    2) Using that information, we calculate two things - the maximum depth to 
which we train as usual (which is, implicitly, the maximum number of nodes we 
want to train in parallel), and the size of the groups we should use in the 
case where we exceed this depth.
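
As a rough illustration of the calculation in 2), here is a minimal sketch in Python. It is not the code in this PR: the function names, the `bytes_per_node` estimate, and the choice of a power-of-two group size are all assumptions made for the example. Levels are numbered from 0, so level `L` has `2**L` nodes.

```python
def plan_levels(max_memory_bytes, bytes_per_node):
    """Return (max_level_single_pass, group_size) for a memory budget.

    bytes_per_node is a hypothetical estimate of the per-node
    aggregate size (for example bins * features * stats * 8 bytes).
    """
    # How many nodes' aggregates fit in the budget (at least one).
    max_nodes = max(max_memory_bytes // bytes_per_node, 1)
    # Deepest level whose 2**level nodes all fit at once:
    # floor(log2(max_nodes)), computed with integer arithmetic.
    max_level_single_pass = max_nodes.bit_length() - 1
    # Assumption: groups are a power of two so they tile deeper levels evenly.
    group_size = 1 << max_level_single_pass
    return max_level_single_pass, group_size


def passes_at_level(level, group_size):
    """Number of passes over the data needed to train one tree level."""
    nodes_at_level = 1 << level
    # Integer ceiling division; shallow levels need a single pass.
    return max(-(-nodes_at_level // group_size), 1)


if __name__ == "__main__":
    # Hypothetical numbers: 128 MiB budget, 100 bins, 50 features,
    # 3 statistics per bin, 8 bytes per statistic.
    budget = 128 * 1024 * 1024
    per_node = 100 * 50 * 3 * 8
    max_level, group_size = plan_levels(budget, per_node)
    for level in (3, max_level, max_level + 2):
        print(level, passes_at_level(level, group_size))
```

Under these assumed numbers, every level up to `max_level` is trained in one pass as usual, and each level beyond it needs twice as many passes as the level before.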
    
    cc: @atalwalkar, @hirakendu, @mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishamde/spark deep_tree

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #475
    
----
commit 50b143a4385f209fbc1793f3e03134cab3ab9583
Author: Manish Amde <[email protected]>
Date:   2014-04-20T20:33:03Z

    adding support for very deep trees

commit abc5a23bf80d792a345d723b44bff3ee217cd5ac
Author: Evan Sparks <[email protected]>
Date:   2014-04-22T01:41:36Z

    Parameterizing max memory.

commit 2f6072c12a1466d783da258d4aa1bde789e7e875
Author: manishamde <[email protected]>
Date:   2014-04-22T03:43:47Z

    Merge pull request #5 from etrain/deep_tree
    
    Parameterizing max memory.

commit 2f1e093c5187a1ed532f9c19b25f8a2a6a46e27a
Author: Manish Amde <[email protected]>
Date:   2014-04-22T03:49:46Z

    minor: added doc for maxMemory parameter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
