GitHub user manishamde opened a pull request:
https://github.com/apache/spark/pull/475
SPARK-1544 Add support for deep decision trees.
etrain and I came up with a PR for arbitrarily deep decision trees, at the cost
of multiple passes over the data at deep tree levels.
To summarize:
1) We take a parameter that indicates the amount of memory users want to
reserve for computation on each worker (and 2x that at the driver).
2) Using that information, we calculate two things: the maximum depth to
which we train as usual (which is, implicitly, the maximum number of nodes we
want to train in parallel), and the size of the groups to use once the tree
grows deeper than this.
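Below is a minimal sketch of the calculation in steps 1 and 2. It is written in Python for illustration only; it is not the code in this PR, and every name in it is hypothetical. It assumes three things: the per-node split statistics cost numFeatures x numBins x statsPerBin 8-byte values, level d of the binary tree holds 2^d nodes, and the memory budget covers a single worker (the driver's doubled budget is not modeled).

```python
def plan_levels(max_memory_mb, num_features, num_bins, stats_per_bin,
                max_depth, bytes_per_value=8):
    """Hypothetical sketch: derive (max_depth_for_group, nodes_per_group)
    from a per-worker memory budget. Not the actual Spark implementation."""
    # Assumed cost of the aggregate statistics kept for one tree node.
    bytes_per_node = num_features * num_bins * stats_per_bin * bytes_per_value
    budget_bytes = max_memory_mb * 1024 * 1024
    # Number of nodes whose statistics fit in memory at once (at least one).
    max_nodes_per_group = max(budget_bytes // bytes_per_node, 1)
    # Deepest level d whose 2**d nodes all fit, i.e. floor(log2(max_nodes)).
    max_depth_for_group = min(max_nodes_per_group.bit_length() - 1, max_depth)
    return max_depth_for_group, 2 ** max_depth_for_group


def num_groups(level, max_depth_for_group):
    """Levels up to the threshold are trained in one pass over the data;
    deeper levels are split into 2**(level - threshold) groups, one pass each."""
    if level <= max_depth_for_group:
        return 1
    return 2 ** (level - max_depth_for_group)


depth_for_group, nodes_per_group = plan_levels(
    max_memory_mb=128, num_features=100, num_bins=32, stats_per_bin=2, max_depth=20)
print(depth_for_group, nodes_per_group)      # 11 2048
print(num_groups(14, depth_for_group))       # 8
```

With these made-up inputs, each node needs 51,200 bytes, so a 128 MB budget holds 2,621 nodes. Levels 0 through 11 (up to 2,048 nodes) therefore train as usual, and level 14 needs 8 passes over the data.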
cc: @atalwalkar, @hirakendu, @mengxr
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/manishamde/spark deep_tree
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/475.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #475
----
commit 50b143a4385f209fbc1793f3e03134cab3ab9583
Author: Manish Amde <[email protected]>
Date: 2014-04-20T20:33:03Z
adding support for very deep trees
commit abc5a23bf80d792a345d723b44bff3ee217cd5ac
Author: Evan Sparks <[email protected]>
Date: 2014-04-22T01:41:36Z
Parameterizing max memory.
commit 2f6072c12a1466d783da258d4aa1bde789e7e875
Author: manishamde <[email protected]>
Date: 2014-04-22T03:43:47Z
Merge pull request #5 from etrain/deep_tree
Parameterizing max memory.
commit 2f1e093c5187a1ed532f9c19b25f8a2a6a46e27a
Author: Manish Amde <[email protected]>
Date: 2014-04-22T03:49:46Z
minor: added doc for maxMemory parameter
----