Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1110#issuecomment-47683286
We benchmarked treeReduce in our random forest implementation, and since
the trees generated from each partition are fairly large (more than 100MB), we
found that
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1110#issuecomment-47686100
@dbtsai Thanks for testing it! I'm going to move `treeReduce` and
`treeAggregate` to `mllib.rdd.RDDFunctions`. For normal data processing, people
generally use more
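In Spark, `*Functions` classes such as `PairRDDFunctions` add extra methods to an existing type through an implicit conversion. Callers opt in by importing it, which keeps those methods out of the default API. If `mllib.rdd.RDDFunctions` follows the same pattern, a Spark-free analog might look like the sketch below. `TreeFunctionsSketch`, `SeqTreeFunctions`, `fromSeq`, and `TreeFunctionsDemo` are hypothetical names. The pairwise `treeReduce` here is a simplified stand-in for the real method.

```scala
// Hypothetical analog of the "XxxFunctions" enrichment pattern: the extra
// method lives in a wrapper class and is attached to Seq only when the
// implicit conversion below is imported.
object TreeFunctionsSketch {
  import scala.language.implicitConversions

  class SeqTreeFunctions[T](self: Seq[T]) {
    // Reduce pairwise in rounds (a binary tree) instead of strictly left to right.
    def treeReduce(f: (T, T) => T): T = {
      require(self.nonEmpty, "treeReduce on empty collection")
      var level: Seq[T] = self
      while (level.size > 1) {
        // Each adjacent pair collapses into one element; a trailing odd element passes through.
        level = level.grouped(2).map(_.reduce(f)).toVector
      }
      level.head
    }
  }

  // Importing TreeFunctionsSketch._ brings this conversion into scope.
  implicit def fromSeq[T](s: Seq[T]): SeqTreeFunctions[T] = new SeqTreeFunctions(s)
}

object TreeFunctionsDemo {
  import TreeFunctionsSketch._

  def main(args: Array[String]): Unit =
    println(Vector(1, 2, 3, 4, 5).treeReduce(_ + _))
}
```

Without the import, `treeReduce` is simply not visible on `Seq`. This matches the opt-in intent of keeping specialized operations out of the general-purpose API.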
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/1110
[WIP][SPARK-2174][MLLIB] treeReduce and treeAggregate
In `reduce` and `aggregate`, the driver node spends time linear in the
number of partitions. This becomes a bottleneck when there are many
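The driver bottleneck can be illustrated without Spark. The sketch below is a hypothetical, simplified model, not the code in this PR. Partitions are plain Scala collections. Per-partition results are merged in groups of `scale` over several rounds, and only the final handful of partials is merged on the "driver". The names `TreeAggregateSketch`, `flatAggregate`, and `treeAggregate` here, along with the `scale` parameter and the grouping strategy, are assumptions chosen for illustration. In a real cluster, the intermediate rounds would run on executors.

```scala
// Spark-free sketch: partitions are modeled as Seq[Seq[T]]; the Int returned
// alongside each result counts how many partial results the "driver" merges.
object TreeAggregateSketch {

  // Flat aggregate: every per-partition partial is merged on the driver,
  // so driver work grows linearly with the number of partitions.
  def flatAggregate[T, U](parts: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): (U, Int) = {
    val partials = parts.map(_.foldLeft(zero)(seqOp))
    (partials.foldLeft(zero)(combOp), partials.size)
  }

  // Tree aggregate: partials are merged in groups of `scale` over several
  // rounds (hypothetically on executors); only the last <= scale partials
  // reach the driver.
  def treeAggregate[T, U](parts: Seq[Seq[T]], zero: U, scale: Int)(
      seqOp: (U, T) => U, combOp: (U, U) => U): (U, Int) = {
    require(scale >= 2, "scale must be at least 2")
    var partials: Seq[U] = parts.map(_.foldLeft(zero)(seqOp))
    while (partials.size > scale) {
      // Each group of `scale` consecutive partials collapses into one.
      partials = partials.grouped(scale).map(_.foldLeft(zero)(combOp)).toVector
    }
    (partials.foldLeft(zero)(combOp), partials.size)
  }

  def main(args: Array[String]): Unit = {
    // 1000 hypothetical partitions holding the integers 1..10000.
    val parts = (1 to 10000).grouped(10).toVector
    val (flatSum, flatMerges) = flatAggregate(parts, 0L)(_ + _, _ + _)
    val (treeSum, treeMerges) = treeAggregate(parts, 0L, scale = 10)(_ + _, _ + _)
    println(s"flat: sum=$flatSum, driver merges=$flatMerges")
    println(s"tree: sum=$treeSum, driver merges=$treeMerges")
  }
}
```

With 1,000 partitions and `scale = 10`, both versions return the sum 50005000. However, the flat version merges 1,000 partial results on the driver, while the tree version merges only 10. The trade-off is extra rounds of distributed merging in exchange for less work on the driver.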
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1110#issuecomment-46380502
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1110#issuecomment-46380509
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1110#issuecomment-46382961
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1110#issuecomment-46382962
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15860/