[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-12-15 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-67102974 What's the status on this PR / JIRA? As far as I know, it seems that TorrentBroadcast has been more stable lately, so if the only motivation here was stability then I

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-12-15 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-67107062 Close this now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-12-15 Thread davies
Github user davies closed the pull request at: https://github.com/apache/spark/pull/2933

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-11-09 Thread squito
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-62330965 I agree with @pwendell . It seems like the right thing to do is just fix Broadcast ... and if we can't, then wouldn't you also want to turn off Broadcast even for big

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-11-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-61396544 [Test build #22746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22746/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-61396546 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-11-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-61395215 [Test build #22746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22746/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60807456 [Test build #486 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/486/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60818616 [Test build #486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/486/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-27 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60553581 @JoshRosen I think we still have it (in tonight's tests): ``` [info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-27 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60553702 This is really strange; I thought that the unexpected exception type would have been addressed by https://github.com/apache/spark/pull/2932

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-27 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60553953 Can you point me to the commit that produced that stacktrace?

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-27 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60554225 @JoshRosen @pwendell The test branch (internal) did not have that commit.

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60525495 I find this a little bit hacky. If the broadcast implementation has bugs or performance issues, we should just fix them and it will stabilize over time like any other

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60529659 Broadcast (especially TorrentBroadcast) is designed for large objects; using it to send out small shared variables is like using a tank to shoot a mosquito; it's not a

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60530014 I don't see fundamentally why the broadcast mechanism can't be done as efficiently as task launching itself. Do you have a reproducible workload where this caused a

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60531516 The motivation is not about performance; it's about stability. We have been fighting failures during task deserialization for days; they cannot be

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60534648 "We have been fighting failures during task deserialization for days (failed in TorrentBroadcast)" I thought we had fixed this issue; can you point me

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60473661 [Test build #427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/427/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60474042 @JoshRosen #2846 fixes the scalastyle bug.

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60474692 **[Test build #22196 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22196/consoleFull)** for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60474693 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60491396 [Test build #450 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/450/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60494261 [Test build #450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/450/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/2933#discussion_r19377173 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -124,6 +123,10 @@ class DAGScheduler( /** If enabled, we may run

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2933#discussion_r19377630 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -124,6 +123,10 @@ class DAGScheduler( /** If enabled, we may run

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2933#discussion_r19377652 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -124,6 +123,10 @@ class DAGScheduler( /** If enabled, we may run

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60498227 [Test build #3 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/3/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-6041 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60499988 [Test build #3 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/3/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60500716 I've been thinking about this some more and I wonder about the motivation for this change: how much of a performance benefit does this buy us for typical workloads?

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-25 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60501858 @JoshRosen The motivation is not about performance; it's about stability. Sending tasks to executors is a critical part of Spark, so it should be as stable as possible.

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-24 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2933#discussion_r19369470 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -124,6 +123,10 @@ class DAGScheduler( /** If enabled, we may run

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-24 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2933#discussion_r19369647 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -69,6 +70,10 @@ private[spark] class Stage( var resultOfJob:

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60472421 [Test build #427 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/427/consoleFull) for PR 2933 at commit

[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60472463 [Test build #22196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22196/consoleFull) for PR 2933 at commit