Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-58183559
We had a build against the Spark master on Oct 2, and when we ran our
application with data around 600GB, we got the following exception. Does this
PR fix this issue which
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-58201237
It could be fixed by https://github.com/apache/spark/pull/2624
It's strange that I cannot see this comment on PR #2030.
On Tue, Oct 7, 2014 at 6:28 AM,
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-58214186
I thought it was a closed issue, so I moved my comment to JIRA. I ran into
this issue in spark-shell, not the standalone application. Does SPARK-3762
apply in this
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52807817
@aarondav The table and graph in
https://github.com/apache/spark/pull/2030#issuecomment-52693339 compare pre-PR
to post-PR. Actually it breaks it down into three runs:
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52596008
cc @shivaram @mosharaf
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52596109
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18816/consoleFull)
for PR 2030 at commit
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16399340
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -27,41 +29,87 @@ import org.apache.spark.io.CompressionCodec
Github user shivaram commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16399641
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -27,41 +29,87 @@ import org.apache.spark.io.CompressionCodec
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52597491
Ran into some task failures when testing this commit on EC2 with the
SchedulerThroughputTest:
```
14/08/19 07:01:24 WARN scheduler.TaskSetManager: Lost
Github user shivaram commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16399782
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +137,30 @@ private[spark] class TorrentBroadcast[T:
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52598348
@rxin -- Nice work in reducing this to 2 RPCs. The patch looks good in
terms of maintaining the same functionality as before. I'll wait for the
Snappy fix and for
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52599404
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18816/consoleFull)
for PR 2030 at commit
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52600292
Ok I pushed a new version that should've addressed all the comments and
fixed the bug.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52600436
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18825/consoleFull)
for PR 2030 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52602155
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18827/consoleFull)
for PR 2030 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52604743
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18833/consoleFull)
for PR 2030 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52604677
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18825/consoleFull)
for PR 2030 at commit
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52606422
Benchmarked as of 0d8ed5b and the results aren't conclusively faster than
`master`; the good news is that we've narrowed the gap that I saw earlier
between `master`
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52606590
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18827/consoleFull)
for PR 2030 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52610271
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18833/consoleFull)
for PR 2030 at commit
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52663723
Testing again to make sure tests pass two times in a row.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52663705
Jenkins, retest this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52664118
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18851/consoleFull)
for PR 2030 at commit
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52664672
@JoshRosen Thanks for testing -- The perf results are a bit surprising
(especially that master branch became faster since the earlier one). I also
realized we do 3
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52665346
@shivaram I'm actually going to re-run these tests this morning after
restarting my cluster. I'll test before and after #2028 and after this commit.
I can also test
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52666365
I think just this patch vs. before #2028 vs. 1.0.2 should be fine. I just
wanted to make sure the performance regression is minimal due to broadcast --
so as long as we
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52670386
Thanks for looking into this. If you can't find a difference after
verifying the results, we should probably hold this change for 1.2.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52671331
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18851/consoleFull)
for PR 2030 at commit
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52673128
Jenkins, retest this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52673575
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18857/consoleFull)
for PR 2030 at commit
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16431857
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16431974
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16432043
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16432099
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16432102
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16432179
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16432379
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16432354
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52675858
I tested this with a local cluster (which might not be a representative
experiment), but this commit makes dummy short tasks finish in about 1/4 of the
time.
```
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16432796
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16433411
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-5260
@rxin I think we can only benefit from broadcasting an RDD when the closure is
big enough, such as more than 1M bytes. But in most cases, the closure should
be less than
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16433678
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user shivaram commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16433991
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T:
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52678684
Actually with this change my local cluster testing didn't see much
difference between torrent and http.
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16434120
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16434140
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16434184
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -18,50 +18,116 @@
package org.apache.spark.broadcast
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16434246
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16434413
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16434924
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52681776
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18857/consoleFull)
for PR 2030 at commit
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16435754
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16435835
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16437861
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2030#discussion_r16438557
--- Diff:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -109,99 +155,30 @@ private[spark] class TorrentBroadcast[T: ClassTag](
Github user mosharaf commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52689216
Looks good to me.
Re: one of the earlier comments about broadcasting small objects through
TorrentBroadcast. A not-so-intrusive way would be to piggyback data
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52693339
New results (testing against 82577339dd58b5811eab5d10667775e61e37ff51,
1f1819b20f887b487557c31e54b8bcd95b582dc6, 1.0.2, and this):
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52693773
Currently running more jobs from the spark-perf suite against
5bacb9dbfab4ae5c83eb1874bd6fd6ae87ff4ad6 (the latest commit) as a stress-test
to gain more confidence
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52694150
Thanks Josh -- Perf results look good. LGTM. Can you file JIRAs to track
the multi-threading things for 1.2 ?
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52695504
[SPARK-3115](https://issues.apache.org/jira/browse/SPARK-3115) is an
umbrella issue to improve task broadcast latency for small tasks; Reynold has
created a bunch of
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52715181
Here are some updated results that show performance for a wider range of
closure sizes (all in bytes):
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52734661
Ok, merging in master and branch-1.1.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2030