Github user staple commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-56903523
Great, thanks. My username is 'staple', looks like you already assigned to
me.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-56898211
LGTM. Merged into master. What's your username on JIRA? I'll assign the
JIRA to you. Thanks!
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2347
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-56886264
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20819/consoleFull) for PR 2347 at commit [`bd49701`](https://github.com/a
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-56886274
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-56876846
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20819/consoleFull) for PR 2347 at commit [`bd49701`](https://github.com/ap
Github user staple commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-56876698
Hi, I addressed the recent review comments and merged.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55758803
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20389/consoleFull) for PR 2347 at commit [`9bed1fd`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55747882
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20389/consoleFull) for PR 2347 at commit [`9bed1fd`](https://github.com/ap
Github user staple commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55747592
Hi, per the discussion in https://github.com/apache/spark/pull/2362 the
plan is to continue caching before deserialization from python rather than
after, in order to minim
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55708415
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20372/consoleFull) for PR 2347 at commit [`03d0e2f`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55702131
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20372/consoleFull) for PR 2347 at commit [`03d0e2f`](https://github.com/ap
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55701824
this is ok to test
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55685703
test this please
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55374976
@davies It is hard to tell whether we already have fast access to the input
RDD. Force caching may cause problems, e.g.,
1. kicking out some cached RDDs,
2. us
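This PR kept the warning rather than forcing a cache, but one hedged sketch of a middle ground is to persist only when the caller has not, and unpersist when training finishes, so a temporary copy cannot permanently evict the caller's own cached RDDs. This is a pure-Python stand-in with illustrative names (`FakeRDD`, `train`), not the actual Spark/MLlib API:

```python
class FakeRDD:
    """Illustrative stand-in; storage_level "NONE" means uncached."""
    def __init__(self):
        self.storage_level = "NONE"

    def persist(self):
        self.storage_level = "MEMORY_ONLY"

    def unpersist(self):
        self.storage_level = "NONE"

def train(rdd, iterations=10):
    # Cache only if the caller has not, and undo it afterwards so the
    # temporary copy cannot permanently evict the caller's cached RDDs.
    handle_persistence = rdd.storage_level == "NONE"
    if handle_persistence:
        rdd.persist()
    try:
        for _ in range(iterations):
            pass  # one pass over rdd per iteration would go here
        return "model"
    finally:
        if handle_persistence:
            rdd.unpersist()

rdd = FakeRDD()
assert train(rdd) == "model"
assert rdd.storage_level == "NONE"  # left exactly as we found it
```

An input the caller already persisted is left persisted, so the helper never second-guesses an explicit caching decision.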
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55308192
Is it possible to add the cache for the RDD automatically instead of showing
a warning, if caching is always helpful?
Github user staple commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55288257
Hi, I made the requested comment changes. I also filed a separate PR for
the caching changes: #2362
Github user staple commented on a diff in the pull request:
https://github.com/apache/spark/pull/2347#discussion_r17430431
--- Diff: docs/mllib-linear-methods.md ---
@@ -470,7 +471,7 @@ public class LinearRegression {
}
}
);
-JavaRDD MSE = ne
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55181535
@staple For Python, I think caching on the JVM side is good. The only thing
we need to take care of is that NaiveBayes and DecisionTree don't need
caching.
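The exception noted above can be handled by gating the warning per algorithm: only multi-pass algorithms benefit from cached input, so single-pass ones skip the check entirely. A minimal sketch with illustrative names (the `ITERATIVE` table and `should_warn` are assumptions, not the actual MLlib code):

```python
# Illustrative: iterative algorithms make many passes over the input,
# so uncached input is recomputed each pass; single-pass algorithms
# (e.g. NaiveBayes, DecisionTree) read it once and need no warning.
ITERATIVE = {
    "KMeans": True,
    "GeneralizedLinearAlgorithm": True,
    "NaiveBayes": False,
    "DecisionTree": False,
}

def should_warn(algorithm, is_cached):
    return ITERATIVE.get(algorithm, False) and not is_cached

assert should_warn("KMeans", is_cached=False)
assert not should_warn("NaiveBayes", is_cached=False)
assert not should_warn("KMeans", is_cached=True)
```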
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2347#discussion_r17388066
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -125,6 +133,11 @@ class KMeans private (
}
val model = r
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2347#discussion_r17388049
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -117,6 +118,13 @@ class KMeans private (
* performance, because th
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2347#discussion_r17388072
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
---
@@ -256,6 +262,11 @@ class RowMatrix(
logWarning(s"Req
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2347#discussion_r17388045
--- Diff: docs/mllib-linear-methods.md ---
@@ -470,7 +471,7 @@ public class LinearRegression {
}
}
);
-JavaRDD MSE = ne
Github user staple commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55162937
Sure, I changed the warning message text as you suggested.
Do you think the deserialization mapping in the python RDDs I described is
ok (a lightweight operation)?
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2347#discussion_r17375458
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -117,6 +118,12 @@ class KMeans private (
* performance, because th
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55145287
Can one of the admins verify this patch?
Github user staple commented on the pull request:
https://github.com/apache/spark/pull/2347#issuecomment-55138304
See above where I describe how, for python RDDs, the input data is
automatically cached and then deserialized via a map to an uncached RDD,
requiring deserialization of ev
GitHub user staple opened a pull request:
https://github.com/apache/spark/pull/2347
[SPARK-1484][MLLIB] Warn when running an iterative algorithm on uncached
data.
Add warnings to KMeans, GeneralizedLinearAlgorithm, and computeSVD when
called with input data that is not cached. KMea
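The check this PR adds can be sketched as follows. This is a minimal pure-Python stand-in for the Scala logic, not the actual Spark API; `FakeRDD`, its `storage_level` field, and `warn_if_uncached` are all illustrative names:

```python
import logging

logger = logging.getLogger("mllib-sketch")

class FakeRDD:
    """Illustrative stand-in for an RDD; "NONE" means uncached."""
    def __init__(self, data, storage_level="NONE"):
        self.data = data
        self.storage_level = storage_level

    def cache(self):
        self.storage_level = "MEMORY_ONLY"
        return self

def warn_if_uncached(rdd, algorithm):
    """Warn (and report True) when an iterative algorithm is handed
    input whose storage level indicates it is not cached."""
    if rdd.storage_level == "NONE":
        logger.warning(
            "The input data for %s is not cached, which may hurt "
            "performance because each iteration recomputes it.", algorithm)
        return True
    return False

assert warn_if_uncached(FakeRDD([1, 2, 3]), "KMeans") is True
assert warn_if_uncached(FakeRDD([1, 2, 3]).cache(), "KMeans") is False
```

The real check only inspects the storage level; it never forces a persist, which is why the auto-caching question comes up earlier in this thread.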