[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-21 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19232 I'm gonna merge this as a non-trivial win. I think the benefit outweighs concerns. We can add to the change later. --- - To unsub

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19232 Sure, we all agree there is a mechanism for avoiding overhead. However, performance tests are very tricky things, 5% is not a huge improvement, and hard-coding the aggregation depth to `2` limits the

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-17 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19232 Yeah I wonder if this slows things down for smaller data sets, because of the extra levels and checks, but then again, when the aggregation is small, anything's similarly fast. The default depth is s

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19232 I'm not really aware of situations where it would be detrimental, since it has a mechanism for avoiding the intermediate stages when it doesn't make sense. However, one of the big advantages of `tree

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19232 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81779/ Test PASSed. ---

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19232 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19232 **[Test build #81779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81779/testReport)** for PR 19232 at commit [`0c3ed2d`](https://github.com/apache/spark/commit/0

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19232 @sethah I feel like we talked about something like this in BigDL. Is there really any downside to treeAggregate? --- - To unsubsc

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19232 **[Test build #81779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81779/testReport)** for PR 19232 at commit [`0c3ed2d`](https://github.com/apache/spark/commit/0c

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19232 ping @yanboliang --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: review