Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52385422
Jenkins, retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52385516
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18666/consoleFull)
for PR 1912 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52386383
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18666/consoleFull)
for PR 1912 at commit
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1912#discussion_r16327343
--- Diff: python/pyspark/broadcast.py ---
@@ -52,17 +50,38 @@ class Broadcast(object):
Access its value through C{.value}.
-
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52409331
I guess we don't necessarily want to expose `destroy()` to the end-user,
since it's private in the Scala APIs. I suppose we might still be leaking
broadcast variables
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52409343
Actually, I'm just going to merge this now and I'll add the docstring as
part of a subsequent documentation-improvement PR (I also want to edit some
Scala / Java docs,
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52409390
I've merged this into `master` and `branch-1.1`. Thanks!
---
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1912
---
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52277037
I have added `Broadcast.unpersist(blocking=False)`.
Because we keep a copy on disk, we can read it from there when the user wants
to access it on the driver, then we can keep the
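The idea davies describes, freeing the in-memory copy while keeping a disk copy that the driver can transparently re-read, can be sketched in plain Python. This is a hypothetical illustration of the mechanism only, not PySpark's actual implementation; the class and attribute names are invented:

```python
import os
import pickle
import tempfile


class DiskBackedBroadcast:
    """Hypothetical sketch: keep a pickled copy of the broadcast value on
    disk so unpersist() can free the in-memory copy while .value can
    still be re-read on demand."""

    def __init__(self, value):
        fd, self._path = tempfile.mkstemp(suffix=".broadcast")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
        self._value = value

    @property
    def value(self):
        # Reload from the disk copy if the in-memory copy was dropped.
        if self._value is None:
            with open(self._path, "rb") as f:
                self._value = pickle.load(f)
        return self._value

    def unpersist(self, blocking=False):
        # Drop only the in-memory copy; the disk copy stays readable.
        self._value = None
```

After `unpersist()`, reading `.value` reloads the object from disk instead of failing, which matches the behavior described in the comment above.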
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52377800
Jenkins, retest this please.
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52377931
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18647/consoleFull)
for PR 1912 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52379131
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18647/consoleFull)
for PR 1912 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52248668
@frol After fixing your local test, are you still noticing any broadcast
performance issues? If you still see any odd behavior, could you post a small
script or set
Github user frol commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52251712
@JoshRosen No, I'm not noticing any broadcast performance issues now.
PySpark works like a charm again. Thank you!
---
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1912#discussion_r16279926
--- Diff: python/pyspark/broadcast.py ---
@@ -52,17 +47,31 @@ class Broadcast(object):
Access its value through C{.value}.
-
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52273858
It occurs to me: what if we had `.value` retrieve and unpickle the value from
the JVM? Also, won't we still experience memory leaks in the JVM if we
iteratively create
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52276573
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18598/consoleFull)
for PR 1912 at commit
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/1912#discussion_r16159823
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
---
@@ -315,6 +315,15 @@ private[spark] object PythonRDD extends Logging {
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52015135
test this please
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52016367
QA tests have started for PR 1912. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18430/consoleFull
---
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52016718
I was talking to Jenkins when I said test this please, but thanks @davies
for adding tests too.
---
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52017380
LoL, I realized this just after pushing the commit :)
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52019343
QA results for PR 1912:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52076888
QA tests have started for PR 1912. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18447/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52076907
QA results for PR 1912:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user frol commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52079674
@davies I am about to test it again with CompressedSerializer. Am I right
that I don't need to change anything in my project, but just rebuild Spark?
---
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52094561
@frol , Yes, thanks again!
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52095557
QA tests have started for PR 1912. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18456/consoleFull
---
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1912#discussion_r16199114
--- Diff: python/pyspark/broadcast.py ---
@@ -19,18 +19,13 @@
from pyspark.context import SparkContext
sc = SparkContext('local', 'test')
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1912#discussion_r16199423
--- Diff: python/pyspark/rdd.py ---
@@ -1809,7 +1809,8 @@ def _jrdd(self):
self._jrdd_deserializer = NoOpSerializer()
command
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1912#discussion_r16199737
--- Diff: python/pyspark/context.py ---
@@ -562,17 +562,24 @@ def union(self, rdds):
rest = ListConverter().convert(rest,
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52099498
This looks good to me and I'm really glad to read the [JIRA
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/1912#discussion_r16200173
--- Diff: python/pyspark/context.py ---
@@ -562,17 +562,24 @@ def union(self, rdds):
rest = ListConverter().convert(rest,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52101740
QA results for PR 1912:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52103597
QA tests have started for PR 1912. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18460/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52110417
QA results for PR 1912:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): class
Github user frol commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52119630
@davies Compression improved things, but my tasks have heavy computations
inside, so it saved only 10 seconds on a 4.5-minute task and also about 10-20
seconds on a
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52121865
@frol, the big win of compression may be saving memory in the JVM. It's also
a win if it does not increase the runtime. In the future, we could try LZ4; it
may help a
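The trade-off discussed above (a smaller serialized payload at a modest CPU cost) can be illustrated with a minimal plain-Python sketch using zlib. This is only analogous in spirit to Spark's CompressedSerializer, not its actual code, and the function names are invented:

```python
import pickle
import zlib


def serialize_compressed(value, level=1):
    """Pickle then compress; a low compression level keeps CPU cost small."""
    return zlib.compress(pickle.dumps(value, pickle.HIGHEST_PROTOCOL), level)


def deserialize_compressed(blob):
    """Decompress then unpickle to recover the original value."""
    return pickle.loads(zlib.decompress(blob))


# A highly compressible stand-in for a large broadcast payload.
payload = [0.0] * 100000
raw = pickle.dumps(payload, pickle.HIGHEST_PROTOCOL)
blob = serialize_compressed(payload)
print(len(raw), len(blob))  # the compressed form is much smaller here
```

Whether compression pays off depends on how compressible the payload is; for data like the repeated values above, the size reduction is large, while for heavy-compute tasks (as frol reports) the runtime savings may be modest.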
Github user frol commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52122539
@davies I'm talking about memory in Python workers, and that is my issue. (I
figured out that my local test had a mistake, and after I fixed it, the local
test and Spark Python
GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/1912
[SPARK-1065] [PySpark] improve supporting for large broadcast
Passing large objects through Py4J is very slow (and costs a lot of memory),
so this passes broadcast objects via files (similar to parallelize()).
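The file-based approach in the PR summary can be sketched in plain Python. This is a simplified illustration of the idea, not PySpark's actual code, and the function names are invented:

```python
import os
import pickle
import tempfile


def dump_broadcast(value):
    """Driver side: write the pickled value to a file instead of
    pushing the raw bytes through a Py4J call."""
    fd, path = tempfile.mkstemp(suffix=".broadcast")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_broadcast(path):
    """Worker side: read the file back and unpickle the value."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Only the short file path crosses the Py4J boundary; the large payload moves through the filesystem, which is the same trick the summary compares to parallelize().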
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-51999152
QA results for PR 1912:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/1912#issuecomment-52001814
The failed tests were not related to this PR.
---