[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61067173 [Test build #22531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22531/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61068006 [Test build #22532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22532/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61068865 [Test build #22533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22533/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61069431 [Test build #22534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22534/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61069535 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61069532 [Test build #22534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22534/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-30 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61069736 @jkbradley @codedeft I think I have implemented all the suggestions on the PR except for 1) public API and 2) warning when using non-SquaredError loss functions. I

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread codedeft
Github user codedeft commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19563516 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread codedeft
Github user codedeft commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19563689 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19564695 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19564926 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread codedeft
Github user codedeft commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19567364 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread codedeft
Github user codedeft commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19567808 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19569087 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread codedeft
Github user codedeft commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19569497 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19569553 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread codedeft
Github user codedeft commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19570062 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19570259 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61000703 It's a good point about the sequential nature of boosting models being important when doing approximate predictions (using only some of the weak hypotheses); I could
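
The ordering point can be illustrated with a small sketch, written in plain Python rather than the PR's Scala. It assumes an additive ensemble whose weak hypotheses are plain callables with one weight each; `predict_prefix` and the toy stages are hypothetical names for illustration and are not part of MLlib. Because each boosting stage is fit to what the earlier stages left unexplained, an approximate prediction should use the first k stages in training order, not an arbitrary subset.

```python
from typing import Callable, List, Sequence

# A weak hypothesis maps a feature vector to a real-valued prediction.
WeakHypothesis = Callable[[Sequence[float]], float]


def predict_prefix(
    hypotheses: List[WeakHypothesis],
    weights: List[float],
    x: Sequence[float],
    k: int,
) -> float:
    """Sum the first k weighted weak hypotheses, in training order."""
    if not 0 <= k <= len(hypotheses):
        raise ValueError("k must be between 0 and the ensemble size")
    return sum(w * h(x) for h, w in zip(hypotheses[:k], weights[:k]))


# Toy ensemble (hypothetical): each later stage makes a smaller correction.
stages: List[WeakHypothesis] = [
    lambda x: 10.0,
    lambda x: 2.0 if x[0] > 0 else -2.0,
    lambda x: 0.5,
]
weights = [1.0, 1.0, 1.0]

full = predict_prefix(stages, weights, [1.0], 3)    # all three stages
approx = predict_prefix(stages, weights, [1.0], 2)  # first two stages only
```

In this toy case dropping the last stage changes the prediction only slightly, which is the property that makes prefix-based approximation attractive for boosted models.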

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19570610 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -26,7 +26,7 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19572600 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61024859 @jkbradley I originally used checkpointing instead of simply caching in memory. There are trade-offs going with one versus the other. I will study what @codedeft

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-29 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-61026909 Studying the trade-offs sounds great. I think it's OK if checkpointing is added later as an option. Thanks!

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495219 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495241 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostingSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495244 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495254 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala --- @@ -70,7 +71,8 @@ class Strategy ( val

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495228 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495261 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala --- @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495234 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495252 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed to

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495258 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -46,20 +47,63 @@ private[tree] object BaggedPoint { *

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495249 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495231 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60812919 @manishamde Added comments based on a quick pass looking mainly at the API. My main concern is the same as in my comment above about the verbosity of (a) the many

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19495237 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60814290 @jkbradley Your API suggestions sound reasonable. Let me work on simplifying the API. I had originally started with something similar to what you suggested so I will

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496069 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/SquaredError.scala --- @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496095 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -46,20 +47,63 @@ private[tree] object BaggedPoint { *

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496113 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala --- @@ -70,7 +71,8 @@ class Strategy ( val

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496210 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed to

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496224 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496241 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/impl/BaggedPointSuite.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496253 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/GradientBoostingSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496266 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496447 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/WeightedEnsembleModel.scala --- @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19496561 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19497878 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19497871 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19497886 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19497891 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19497888 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19498319 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60820308 @manishamde Thanks in advance for the API simplification! Also, I'm realizing that this code should be correct for SquaredError but might not be quite right

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19498899 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed to

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19498943 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60821283 @jkbradley Your understanding is correct. Sorry for not mentioning it explicitly on the JIRA/PR earlier. Yes, calculating median, etc. for terminal region
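
A sketch of why a median can matter for terminal regions, written in plain Python rather than the PR's Scala. It relies on a standard fact: the constant that minimizes a sum of absolute errors is the median, while the constant that minimizes a sum of squared errors is the mean. The data, the single-leaf setup, and all names below are assumptions made for illustration and are not taken from the PR. With absolute error, the pseudo-residuals are the signs of the residuals, so a leaf that simply averages them is not the loss-minimizing update.

```python
from statistics import mean, median


def absolute_loss(labels, preds):
    """Total absolute error between labels and predictions."""
    return sum(abs(y - p) for y, p in zip(labels, preds))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


# Hypothetical data: one terminal region holding five points, one an outlier.
labels = [1.0, 2.0, 3.0, 4.0, 100.0]
current = [0.0] * len(labels)  # current ensemble prediction for each point
residuals = [y - f for y, f in zip(labels, current)]

# Leaf value from averaging the absolute-error pseudo-residuals (the signs).
mean_of_signs = mean([sign(r) for r in residuals])
# Leaf value from a per-region median of the raw residuals.
median_residual = median(residuals)

loss_mean_leaf = absolute_loss(labels, [f + mean_of_signs for f in current])
loss_median_leaf = absolute_loss(labels, [f + median_residual for f in current])
```

On this toy region the median-based leaf value gives the lower absolute loss, which is consistent with the concern that squared-error-style leaf values may not be quite right for other losses.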

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60821510 Great, that sounds reasonable. I believe we could do it eventually: since the trees won't be too deep in many cases, the sufficient stats to pass around might be

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19499284 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed to

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19499327 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19499682 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/EnsembleCombiningStrategy.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed to

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19499795 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60824501 @manishamde Thinking more about the losses, I'm really not sure if absolute error and logistic loss will behave reasonably. Could we make those losses private[tree]

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60824862 @jkbradley Should we even support classification then?

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60824790 @jkbradley I agree. This needs more testing since it's a non-standard option.

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60837931 I think it's OK to leave classification support but make a note in the doc for SquaredError that it is meant for Regression. What do you think?

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19508317 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19510078 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-28 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60863725 By the way, checkpointing is not quite the right term; currently, the code persists but does not checkpoint the RDDs. I hope that the logic which @codedeft

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60561667 [Test build #22285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22285/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60568456 [Test build #22285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22285/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60568462 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60686705 @manishamde I'll make a pass now; thanks for the updates! A patch (SPARK-4022) was just merged which causes a few small conflicts. Could you please fix those?

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60690690 @jkbradley I fixed the merge conflicts. Thanks.

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60690948 [Test build #22313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22313/consoleFull) for PR 2607 at commit

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60696746 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60696743 [Test build #22313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22313/consoleFull) for PR 2607 at commit