[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-30 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23829546 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,137 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-30 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3951 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-72170235 LGTM. Merged into master. Thanks!!

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-72140407 [Test build #26364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26364/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-72146516 [Test build #26364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26364/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-72146525 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71800548 [Test build #26220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26220/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71809380 [Test build #26220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26220/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71809389 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23671034 --- Diff: python/pyspark/mllib/tests.py --- @@ -179,10 +179,27 @@ def test_classification(self): self.assertTrue(dt_model.predict(features[2]) =

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23671027 --- Diff: examples/src/main/python/mllib/gradient_boosted_trees.py --- @@ -0,0 +1,82 @@ +# +# Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23671080 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,132 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread kazk1018
Github user kazk1018 commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23741889 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,137 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23709333 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,137 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23709337 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,137 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23709413 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,137 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23710574 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,137 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23710555 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +381,137 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread cthom
Github user cthom commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71731512 Is there any way to maintain some kind of state about the model as it's being built? For GBT models, one usually sees a plot of the error vs. the number of trees in the model. If

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71734912 @cthom Validation on-the-fly during training would be great to have. Let's discuss it in a separate JIRA; I just created one:

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71767784 [Test build #26201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26201/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71788951 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71788946 [Test build #26208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26208/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71783971 [Test build #26208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26208/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread nightwolfzor
Github user nightwolfzor commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71779075 Any chance this one will make it into the 1.3 release? We'd really like to see this one!

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71772703 [Test build #26201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26201/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71772711 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71344973 @kazk1018 Thanks for the updates; sorry for the delayed response. Please ping me when updates are ready for review. The 2 other items which would be good

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500731 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -529,6 +530,35 @@ class PythonMLLibAPI extends Serializable {

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500733 --- Diff: python/pyspark/mllib/tree.py --- @@ -24,7 +24,41 @@ from pyspark.mllib.linalg import _convert_to_vector from pyspark.mllib.regression

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500735 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +387,129 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500737 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +387,129 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500734 --- Diff: python/pyspark/mllib/tree.py --- @@ -24,7 +24,41 @@ from pyspark.mllib.linalg import _convert_to_vector from pyspark.mllib.regression

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500732 --- Diff: python/pyspark/mllib/tree.py --- @@ -24,7 +24,41 @@ from pyspark.mllib.linalg import _convert_to_vector from pyspark.mllib.regression

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69995496 Taking a look now; will add comments soon!

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r22972683 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -21,6 +21,8 @@ import java.io.OutputStream import

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69998021 @kazk1018 Thanks for the PR! A few high-level items: * Will it reduce duplicate code to abstract the TreeEnsembleModel concept, as in Scala? Forests and boosting

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-70032835 [Test build #25587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25587/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-70034484 [Test build #25589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25589/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-70038578 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-70038575 [Test build #25589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25589/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-70038835 [Test build #25587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25587/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-70038840 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69440276 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69440274 [Test build #25357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25357/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69436744 [Test build #25357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25357/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69396014 @kazk1018 It looks like there are merge issues. Can you please fix these? Thanks!

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69307821 [Test build #25310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25310/consoleFull) for PR 3951 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69307827 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-69302626 [Test build #25310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25310/consoleFull) for PR 3951 at commit