[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-25 Thread staple
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-56903523 Great, thanks. My username is 'staple'; it looks like you already assigned it to me. --- If your project is set up for it, you can reply to this email and have your reply appea

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-56898211 LGTM. Merged into master. What's your username on JIRA? I'll assign the JIRA to you. Thanks!

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-25 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2347

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-56886264 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20819/consoleFull) for PR 2347 at commit [`bd49701`](https://github.com/a

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-56886274 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-56876846 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20819/consoleFull) for PR 2347 at commit [`bd49701`](https://github.com/ap

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-25 Thread staple
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-56876698 Hi, I addressed the recent review comments and merged.

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55758803 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20389/consoleFull) for PR 2347 at commit [`9bed1fd`](https://github.com/a

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55747882 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20389/consoleFull) for PR 2347 at commit [`9bed1fd`](https://github.com/ap

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-16 Thread staple
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55747592 Hi, per the discussion in https://github.com/apache/spark/pull/2362 the plan is to continue caching before deserialization from python rather than after, in order to minim

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55708415 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20372/consoleFull) for PR 2347 at commit [`03d0e2f`](https://github.com/a

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55702131 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20372/consoleFull) for PR 2347 at commit [`03d0e2f`](https://github.com/ap

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-15 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55701824 this is ok to test

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-15 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55685703 test this please

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55374976 @davies It is hard to tell whether we already have fast access to the input RDD. Force caching may cause problems, e.g., 1. kicking out some cached RDDs, 2. us

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-11 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55308192 Is it possible to cache the RDD automatically instead of showing a warning, if caching is always helpful?

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-11 Thread staple
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55288257 Hi, I made the requested comment changes. I also filed a separate PR for the caching changes: #2362

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-11 Thread staple
Github user staple commented on a diff in the pull request: https://github.com/apache/spark/pull/2347#discussion_r17430431 --- Diff: docs/mllib-linear-methods.md --- @@ -470,7 +471,7 @@ public class LinearRegression { } } ); -JavaRDD MSE = ne

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55181535 @staple For Python, I think caching on the JVM side is good. The only thing we need to take care of is that NaiveBayes and DecisionTree don't need caching.

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2347#discussion_r17388066 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -125,6 +133,11 @@ class KMeans private ( } val model = r

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2347#discussion_r17388049 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -117,6 +118,13 @@ class KMeans private ( * performance, because th

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2347#discussion_r17388072 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -256,6 +262,11 @@ class RowMatrix( logWarning(s"Req

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2347#discussion_r17388045 --- Diff: docs/mllib-linear-methods.md --- @@ -470,7 +471,7 @@ public class LinearRegression { } } ); -JavaRDD MSE = ne

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread staple
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55162937 Sure, I changed the warning message text as you suggested. Do you think the deserialization mapping in the python RDDs I described is ok (a lightweight operation)?

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2347#discussion_r17375458 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -117,6 +118,12 @@ class KMeans private ( * performance, because th

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55145287 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread staple
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2347#issuecomment-55138304 See above where I describe how, for python RDDs, the input data is automatically cached and then deserialized via a map to an uncached RDD, requiring deserialization of ev

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

2014-09-10 Thread staple
GitHub user staple opened a pull request: https://github.com/apache/spark/pull/2347 [SPARK-1484][MLLIB] Warn when running an iterative algorithm on uncached data. Add warnings to KMeans, GeneralizedLinearAlgorithm, and computeSVD when called with input data that is not cached. KMea
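
For readers skimming the thread, the kind of check the pull request describes (warning when an iterative algorithm receives input that is not cached) might look roughly like the sketch below. This is an illustration, not the code from the pull request: the object name `UncachedInputCheck`, the helper `uncachedWarning`, and the message wording are all hypothetical, and the snippet assumes `spark-core` is on the classpath. The only Spark calls used are `RDD.getStorageLevel` and `StorageLevel.NONE`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper, not taken from the pull request.
object UncachedInputCheck {
  // Returns a warning message when no storage level has been set on the RDD,
  // i.e. the data would be recomputed on every pass of an iterative algorithm.
  // The message text is illustrative wording only.
  def uncachedWarning(data: RDD[_]): Option[String] =
    if (data.getStorageLevel == StorageLevel.NONE) {
      Some("The input data is not directly cached, which may hurt performance " +
        "if its parent RDDs are also uncached.")
    } else {
      None
    }
}
```

A caller could log the result before starting its iterations, for example `UncachedInputCheck.uncachedWarning(points).foreach(msg => println(msg))`. Returning an `Option` rather than logging directly keeps the check easy to test; as mengxr notes in the thread, warning rather than force-caching avoids side effects such as evicting RDDs that are already cached.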