GitHub user manishamde reopened a pull request:
https://github.com/apache/spark/pull/475
SPARK-1544 Add support for deep decision trees.
@etrain and I came up with a PR that supports arbitrarily deep decision trees at the
cost of multiple passes over the data at deep tree levels.
To summarize:
1) We take a parameter that specifies how much memory users want to
reserve for computation on each worker (and 2x that at the driver).
2) Using that information, we calculate two things: the maximum depth to
which we train as usual (which is, implicitly, the maximum number of nodes we
want to train in parallel), and the size of the groups we should use
once we exceed this depth.
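As a rough illustration of the calculation in 2), here is a minimal Python
sketch. It is not the code in this PR. The function names, the arguments, and
the cost model (one 8-byte double per feature, bin, and statistic for each
node) are assumptions made only for this example.

```python
BYTES_PER_DOUBLE = 8


def plan_levels(max_memory_mb, num_features, num_bins, stats_per_bin):
    """Return (max_nodes_per_group, max_level_single_pass).

    Assumed cost model (hypothetical, not taken from the PR): each tree node
    needs one double per (feature, bin, statistic) while it is being trained.
    """
    bytes_per_node = num_features * num_bins * stats_per_bin * BYTES_PER_DOUBLE
    budget_bytes = max_memory_mb * 1024 * 1024
    # Number of nodes whose aggregates fit in the memory budget (at least 1).
    max_nodes_per_group = max(budget_bytes // bytes_per_node, 1)
    # Level d (root = level 0) holds 2**d nodes, so the deepest level that
    # still fits in a single pass is floor(log2(max_nodes_per_group)).
    max_level_single_pass = max_nodes_per_group.bit_length() - 1
    return max_nodes_per_group, max_level_single_pass


def passes_at_level(level, max_nodes_per_group):
    """Number of passes over the data needed to train every node at `level`."""
    nodes_at_level = 2 ** level
    # Ceiling division: nodes are processed in groups of max_nodes_per_group.
    return -(-nodes_at_level // max_nodes_per_group)


if __name__ == "__main__":
    group_size, max_level = plan_levels(128, 100, 100, 3)
    print(group_size, max_level)
    print(passes_at_level(12, group_size))
```

Under these assumptions, levels up to max_level_single_pass are trained in one
pass each, and deeper levels need one pass per group of nodes.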
cc: @atalwalkar, @hirakendu, @mengxr
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/manishamde/spark deep_tree
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/475.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #475
----
commit 50b143a4385f209fbc1793f3e03134cab3ab9583
Author: Manish Amde <[email protected]>
Date: 2014-04-20T20:33:03Z
adding support for very deep trees
commit abc5a23bf80d792a345d723b44bff3ee217cd5ac
Author: Evan Sparks <[email protected]>
Date: 2014-04-22T01:41:36Z
Parameterizing max memory.
commit 2f6072c12a1466d783da258d4aa1bde789e7e875
Author: manishamde <[email protected]>
Date: 2014-04-22T03:43:47Z
Merge pull request #5 from etrain/deep_tree
Parameterizing max memory.
commit 2f1e093c5187a1ed532f9c19b25f8a2a6a46e27a
Author: Manish Amde <[email protected]>
Date: 2014-04-22T03:49:46Z
minor: added doc for maxMemory parameter
commit 02877721328a560f210a7906061108ce5dd4bbbe
Author: Evan Sparks <[email protected]>
Date: 2014-04-22T18:13:27Z
Fixing scalastyle issue.
commit fecf89a51d6efc9e2ff06e700338ea944a4dd580
Author: manishamde <[email protected]>
Date: 2014-04-22T18:15:57Z
Merge pull request #6 from etrain/deep_tree
Fixing scalastyle issue.
commit 719d0098bb08b50e523cec3e388115d5a206512b
Author: Manish Amde <[email protected]>
Date: 2014-04-24T00:04:05Z
updating user documentation
commit 9dbdabeeacc5fe5e0f1a729ce1ed8ab6ff399000
Author: Manish Amde <[email protected]>
Date: 2014-04-29T21:43:19Z
merge from master
commit 15171550fe83e42fcb707744c9035ed540fb78d1
Author: Manish Amde <[email protected]>
Date: 2014-04-29T21:45:34Z
updated documentation
----