[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-21 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15481 LGTM. Merging to master and 2.0. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature en

2016-10-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15481 LGTM now.

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67322/ Test PASSed.

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test PASSed.

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67322/consoleFull)** for PR 15481 at commit [`7bf3bf8`](https://github.com/apache/spark/commit/

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67322/consoleFull)** for PR 15481 at commit [`7bf3bf8`](https://github.com/apache/spark/commit/7

2016-10-20 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15481 retest this please

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67312/ Test FAILed.

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test FAILed.

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67312/consoleFull)** for PR 15481 at commit [`7bf3bf8`](https://github.com/apache/spark/commit/

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67312/consoleFull)** for PR 15481 at commit [`7bf3bf8`](https://github.com/apache/spark/commit/7

2016-10-20 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 `CoarseGrainedSchedulerBackend.removeExecutor` also uses ask, but that doesn't matter, right? It just sends the message once and logs the error on failure.
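
scwf's point can be sketched as a fire-and-forget request: the caller issues `ask`, gets a `Future` back, and attaches only a callback that logs a failure, so no thread ever waits for the reply. This is a toy Scala script under stated assumptions. `ask`, `handleRemove`, `logError`, and `sameThread` are hypothetical helpers, not Spark's RPC API, and the toy `ask` completes immediately instead of going over the network.

```scala
import java.util.concurrent.Executor
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

// Hypothetical toy state; not Spark's executorDataMap.
val executors = mutable.HashMap("exec-1" -> "host-a")
val errors = mutable.ListBuffer.empty[String]

// Stand-in logger that records messages so they can be inspected.
def logError(msg: String): Unit = errors += msg

// Runs callbacks on whichever thread completes the Future.
val sameThread: ExecutionContext = ExecutionContext.fromExecutor(new Executor {
  def execute(r: Runnable): Unit = r.run()
})

// Toy handler: fails for an unknown executor id.
def handleRemove(id: String): Boolean =
  if (executors.remove(id).isDefined) true
  else throw new NoSuchElementException(s"unknown executor $id")

// Toy ask: the reply is computed right away and wrapped in a Future.
def ask(id: String): Future[Boolean] = Future.fromTry(Try(handleRemove(id)))

// Send the request once; only log if it fails, never wait for the reply.
def removeExecutor(id: String): Unit =
  ask(id).onComplete {
    case Failure(t) => logError(s"Failed to remove $id: ${t.getMessage}")
    case Success(_) => ()
  }(sameThread)

removeExecutor("exec-1")
removeExecutor("exec-9")
println(errors.mkString("\n"))
```

Because the callback only logs, a failed reply costs one error line rather than a blocked caller. The trade-off is that the caller never learns whether the removal actually happened.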

2016-10-19 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15481 > It also means we can get rid of the RemoveExecutor pattern match from receive, right? Yep.

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 Ah! Apologies, I got confused. Yes, I agree, that is a better approach. It also means we can get rid of the RemoveExecutor pattern match from receive, right? As it stands now, that looks bu

2016-10-19 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15481 I meant `CoarseGrainedSchedulerBackend.removeExecutor`, not `DriverEndpoint.removeExecutor`. It's confusing that we have two methods with the same name :(

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing I think the issue is that case RemoveExecutor() is not identical to what exists in receiveAndReply. Any reason 'executorDataMap.get(executorId).foreach(_.executorEndpoint.send(StopExecut

2016-10-19 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15481 @mridulm `ask` is very cheap. It just puts the serialized message into a buffer. The current code has to duplicate the logic and, as @viirya pointed out, it misses `executorDataMap.get(executo
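
The claim that `ask` is cheap rests on it queuing the message and returning a `Future` immediately rather than waiting for the reply. A minimal sketch, assuming a single-threaded toy endpoint. `ToyEndpoint` and its `ask` signature are hypothetical and are not Spark's `RpcEndpointRef`.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical single-threaded endpoint: one thread drains queued messages in order.
class ToyEndpoint {
  private val loop = Executors.newSingleThreadExecutor()

  // ask: queue the handler and hand back a Future at once; the caller does not wait.
  def ask[T](handler: () => T): Future[T] = {
    val reply = Promise[T]()
    loop.execute(() => reply.complete(Try(handler())))
    reply.future
  }

  def shutdown(): Unit = {
    loop.shutdown()
    loop.awaitTermination(5, TimeUnit.SECONDS)
  }
}

val endpoint = new ToyEndpoint
val gate = new CountDownLatch(1)
// The handler is held back by the latch, yet ask() still returns immediately.
val reply = endpoint.ask(() => { gate.await(); 42 })
val completedBeforeRelease = reply.isCompleted
gate.countDown()
val answer = Await.result(reply, 5.seconds)
endpoint.shutdown()
println(s"completed before release: $completedBeforeRelease, answer: $answer")
```

The handler is still blocked on the latch when `ask` returns, so `completedBeforeRelease` is `false`. The caller decides separately whether and where to wait for the reply.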

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing To minimize the scope of the synchronized block. The way @scwf has it now, the synchronized block is limited to duplicating the keys and setting some state. The rest can happen outside the lock.

2016-10-19 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15481 Sorry that my comment was unclear. I meant: ```Scala protected def reset(): Unit = synchronized { numPendingExecutors = 0 executorsPendingToRemove.clear() // Rem
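
zsxwing's version keeps `reset()` synchronized and relies on `removeExecutor` being a non-blocking `ask`, so holding the lock while issuing requests cannot deadlock. A toy sketch of that idea. `ToyBackend`, `handleRemove`, and the single-thread executor standing in for the driver endpoint are hypothetical. Only the field names `numPendingExecutors` and `executorsPendingToRemove` are taken from the snippet above.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical toy model of the backend; field names mirror the thread, logic does not.
class ToyBackend {
  private val endpoint = Executors.newSingleThreadExecutor()
  private var numPendingExecutors = 3
  private val executorsPendingToRemove = mutable.HashSet("exec-2")
  private val executors = mutable.HashMap("exec-1" -> "host-a", "exec-2" -> "host-b")

  // Runs on the endpoint thread and, like the driver-side handler, needs the backend lock.
  private def handleRemove(id: String): Boolean = synchronized {
    executors.remove(id).isDefined
  }

  // Non-blocking ask: queue the request and return a Future immediately.
  private def removeExecutor(id: String): Future[Boolean] = {
    val reply = Promise[Boolean]()
    endpoint.execute(() => reply.complete(Try(handleRemove(id))))
    reply.future
  }

  // Safe to keep synchronized: nothing inside waits for the endpoint thread.
  def reset(): List[Future[Boolean]] = synchronized {
    numPendingExecutors = 0
    executorsPendingToRemove.clear()
    executors.keys.toList.map(id => removeExecutor(id))
  }

  def pending: Int = synchronized { numPendingExecutors }
  def remaining: Set[String] = synchronized { executors.keySet.toSet }

  def shutdown(): Unit = {
    endpoint.shutdown()
    endpoint.awaitTermination(5, TimeUnit.SECONDS)
  }
}

val backend = new ToyBackend
val replies = backend.reset()
// Waiting happens here, outside the lock, so the endpoint thread can acquire it.
val results = replies.map(f => Await.result(f, 5.seconds))
backend.shutdown()
println(s"results=$results remaining=${backend.remaining} pending=${backend.pending}")
```

The only waiting happens in the caller after `reset()` has returned and released the lock.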

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67173/ Test PASSed.

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test PASSed.

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67173/consoleFull)** for PR 15481 at commit [`7d86054`](https://github.com/apache/spark/commit/

2016-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15481 @mridulm I checked #9963 and it looks like we don't test `CoarseGrainedSchedulerBackend.reset`.

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 BTW, it was interesting that the earlier change did not trigger a test failure (the issue @viirya pointed out about needing to move RemoveExecutor to receive).
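
The gap mridulm mentions follows from how Spark routes RPC messages: a message delivered with `send` is matched against an endpoint's `receive`, while one delivered with `ask` is matched against `receiveAndReply`. Switching a call site from `askWithRetry` to `send` therefore also requires handling `RemoveExecutor` in `receive`. A toy sketch under that assumption. `ToyEndpoint` is hypothetical, and its `MatchError` for unhandled messages is invented for illustration rather than being Spark's exact behavior.

```scala
import scala.collection.mutable
import scala.util.Try

case class RemoveExecutor(executorId: String)

// Hypothetical toy endpoint: send() consults only `receive`,
// ask() consults only `receiveAndReply`.
class ToyEndpoint(
    receive: PartialFunction[Any, Unit],
    receiveAndReply: PartialFunction[Any, Any]) {

  // One-way delivery: an unmatched message becomes a failure.
  def send(msg: Any): Try[Unit] =
    Try(receive.applyOrElse(msg, (m: Any) => throw new MatchError(m)))

  // Request/reply delivery: the handler's result is the reply.
  def ask(msg: Any): Try[Any] =
    Try(receiveAndReply.applyOrElse(msg, (m: Any) => throw new MatchError(m)))
}

val removed = mutable.ListBuffer.empty[String]

// Handler exists only in receiveAndReply: matched for ask(), unmatched for send().
val askOnly = new ToyEndpoint(
  receive = PartialFunction.empty,
  receiveAndReply = { case RemoveExecutor(id) => removed += id; true })

val askResult = askOnly.ask(RemoveExecutor("exec-1"))
val sendResult = askOnly.send(RemoveExecutor("exec-2"))
println(s"ask: $askResult, send: $sendResult, removed: $removed")
```

Adding the same `case RemoveExecutor(id)` to `receive` would make `send` succeed too. Conversely, once every caller uses `ask`, a `RemoveExecutor` case in `receive` becomes dead code, which is the cleanup discussed on 2016-10-19.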

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 LGTM. @zsxwing, any comments?

2016-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67173/consoleFull)** for PR 15481 at commit [`7d86054`](https://github.com/apache/spark/commit/7

2016-10-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 Updated. Can you review again?

2016-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test FAILed.

2016-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67155/ Test FAILed.

2016-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67155/consoleFull)** for PR 15481 at commit [`af6072a`](https://github.com/apache/spark/commit/

2016-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67155/consoleFull)** for PR 15481 at commit [`af6072a`](https://github.com/apache/spark/commit/a

2016-10-18 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 OK, I will revert to the initial commit.

2016-10-18 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15481 It seems it could be changed to `send` instead.

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing Ah, then simply making it send() instead of askWithRetry() should do, no? That was actually in the initial PR; I was not sure if we wanted to change the behavior from askWithRetry to se
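
The deadlock, and why `send()` avoids it, can be reproduced in miniature: `reset()` holds the backend lock while waiting for a reply, but the handler that would produce the reply needs the same lock. A toy Scala script under stated assumptions. `DeadlockDemo`, `askBlocking` (a stand-in for `askWithRetry` without retries), and `handleRemove` are hypothetical, and a short timeout replaces the indefinite hang.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical toy backend, not Spark code. A single thread plays the driver endpoint.
class DeadlockDemo {
  private val endpoint = Executors.newSingleThreadExecutor()
  private val executors = mutable.HashMap("exec-1" -> "host-a", "exec-2" -> "host-b")

  // The message handler needs the backend lock, like the driver-side removal path.
  private def handleRemove(id: String): Unit = synchronized { executors.remove(id) }

  // send: queue the message and return immediately.
  private def send(id: String): Unit = endpoint.execute(() => handleRemove(id))

  // Blocking request/reply, standing in for askWithRetry (retries omitted).
  private def askBlocking(id: String, timeout: FiniteDuration): Unit = {
    val reply = Promise[Unit]()
    endpoint.execute(() => reply.complete(Try(handleRemove(id))))
    Await.result(reply.future, timeout)
  }

  // Holds the lock while waiting for a handler that needs the same lock.
  def resetWithAsk(timeout: FiniteDuration): Unit = synchronized {
    executors.keys.toList.foreach(id => askBlocking(id, timeout))
  }

  // Holds the lock only while queuing; the handlers run after it is released.
  def resetWithSend(): Unit = synchronized {
    executors.keys.toList.foreach(id => send(id))
  }

  def remaining: Set[String] = synchronized { executors.keySet.toSet }

  def shutdown(): Unit = {
    endpoint.shutdown()
    endpoint.awaitTermination(5, TimeUnit.SECONDS)
  }
}

val viaAsk = new DeadlockDemo
val askOutcome = Try(viaAsk.resetWithAsk(200.millis)) // fails with a TimeoutException
viaAsk.shutdown()

val viaSend = new DeadlockDemo
viaSend.resetWithSend()
viaSend.shutdown()
println(s"ask: $askOutcome, left=${viaAsk.remaining}; send: left=${viaSend.remaining}")
```

With `askBlocking`, the first request times out, and only after the lock is released does its handler run, so one executor is left behind. With `send`, the lock is held only while messages are queued, and both handlers run afterwards.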

2016-10-18 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15481 > reset is called in YarnSchedulerEndpoint, which ideally should not be a blocking action. I'm wondering if we can also fix this.

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @scwf I think the initial fix with a small change might be sufficient. What I meant was something like this: ``` protected def reset(): Unit = { val executors = synchroniz

2016-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test PASSed.

2016-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67105/ Test PASSed.

2016-10-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67105/consoleFull)** for PR 15481 at commit [`2997ccb`](https://github.com/apache/spark/commit/

2016-10-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67105/consoleFull)** for PR 15481 at commit [`2997ccb`](https://github.com/apache/spark/commit/2

2016-10-17 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 retest this please.

2016-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test FAILed.

2016-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67067/ Test FAILed.

2016-10-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67067/consoleFull)** for PR 15481 at commit [`2997ccb`](https://github.com/apache/spark/commit/

2016-10-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67067/consoleFull)** for PR 15481 at commit [`2997ccb`](https://github.com/apache/spark/commit/2

2016-10-17 Thread scwf
Github user scwf commented on the issue: https://github.com/apache/spark/pull/15481 retest this please.

2016-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67054/ Test FAILed.

2016-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test FAILed.

2016-10-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67054/consoleFull)** for PR 15481 at commit [`2997ccb`](https://github.com/apache/spark/commit/

2016-10-17 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15481 LGTM. Sorry for introducing the deadlock issue.

2016-10-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #67054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67054/consoleFull)** for PR 15481 at commit [`2997ccb`](https://github.com/apache/spark/commit/2

2016-10-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 It would be cleaner to simply copy executorDataMap.keys and work off that, to ensure there is no coupling between the actor thread and the invoker.
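
Copying the keys under the lock and then working off the copy means the lock is never held while the invoker waits on the endpoint thread. This is also the shape of the `val executors = synchronized` snippet from 2016-10-18. A minimal sketch, assuming a single-threaded toy endpoint and a blocking request/reply helper. `SnapshotBackend`, `askBlocking`, and `handleRemove` are hypothetical.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical toy backend, not Spark code.
class SnapshotBackend {
  private val endpoint = Executors.newSingleThreadExecutor()
  private val executors = mutable.HashMap("exec-1" -> "host-a", "exec-2" -> "host-b")

  // Handler on the endpoint thread; it needs the backend lock.
  private def handleRemove(id: String): Unit = synchronized { executors.remove(id) }

  // Blocking request/reply (stand-in for askWithRetry, retries omitted).
  private def askBlocking(id: String, timeout: FiniteDuration): Unit = {
    val reply = Promise[Unit]()
    endpoint.execute(() => reply.complete(Try(handleRemove(id))))
    Await.result(reply.future, timeout)
  }

  def reset(timeout: FiniteDuration): Unit = {
    // Copy the keys while holding the lock; the List is an independent snapshot.
    val ids = synchronized { executors.keys.toList }
    // Blocking calls run without the lock, so the handler can acquire it.
    ids.foreach(id => askBlocking(id, timeout))
  }

  def remaining: Set[String] = synchronized { executors.keySet.toSet }

  def shutdown(): Unit = {
    endpoint.shutdown()
    endpoint.awaitTermination(5, TimeUnit.SECONDS)
  }
}

val backend = new SnapshotBackend
backend.reset(2.seconds) // completes even though askBlocking waits for each reply
backend.shutdown()
println(s"remaining: ${backend.remaining}")
```

Unlike a fully synchronized `reset()`, this works even with a blocking request/reply call, because the lock is released before the first wait. The cost is that executors registered after the snapshot are not removed by this `reset()`.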

2016-10-14 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing or @andrewor14 might know best on this one.

2016-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66957/ Test PASSed.

2016-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15481 Merged build finished. Test PASSed.

2016-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #66957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66957/consoleFull)** for PR 15481 at commit [`3681fae`](https://github.com/apache/spark/commit/

2016-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15481 **[Test build #66957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66957/consoleFull)** for PR 15481 at commit [`3681fae`](https://github.com/apache/spark/commit/3