GitHub user angolon opened a pull request:

    https://github.com/apache/spark/pull/14710

    [SPARK-16533][CORE]

    ## What changes were proposed in this pull request?
    This pull request reverts the changes made as a part of #14605, which 
simply side-steps the deadlock issue. Instead, I propose the following approach:
    * Use `scheduleWithFixedDelay` when calling 
`ExecutorAllocationManager.schedule` for scheduling executor requests. The 
intent of this is that if invocations are delayed beyond the default schedule 
interval on account of lock contention, then we avoid a situation where calls 
to `schedule` are made back-to-back, potentially releasing and then immediately 
reacquiring these locks - further exacerbating contention.
    * Replace a number of calls to `askWithRetry` with `ask` inside of message 
handling code in `CoarseGrainedSchedulerBackend` and its ilk. This allows us to 
queue messages with the relevant endpoints, release whatever locks we might be 
holding, and then block whilst awaiting the response. This change is made at 
the cost of being able to retry should sending the message fail, as retrying 
outside of the lock could easily cause race conditions if other conflicting 
messages have been sent whilst awaiting a response. I believe this to be the 
lesser of two evils, as in many cases these RPC calls are to process-local 
components, and so failures are more likely to be deterministic, and timeouts 
are more likely to be caused by lock contention.
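
To illustrate the first point, here is a minimal, self-contained sketch using plain `java.util.concurrent`; it is not the patch itself. The class name `FixedDelayDemo`, the constants, and the simulated 300 ms stall (standing in for lock contention) are all assumptions made for the example. It measures the smallest gap between the end of one run and the start of the next under each scheduling mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; none of these names come from Spark.
class FixedDelayDemo {
    static final long INTERVAL_MS = 50;  // assumed schedule interval
    static final long STALL_MS = 300;    // assumed stall, standing in for lock contention
    static final int RUNS = 5;

    /** Smallest gap (ms) between the end of one run and the start of the next. */
    static long minGapMillis(boolean fixedDelay) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        List<long[]> runs = new ArrayList<>();  // {startNanos, endNanos} per run
        CountDownLatch done = new CountDownLatch(RUNS);

        Runnable schedule = () -> {
            if (runs.size() >= RUNS) {
                return;  // enough samples collected; never write again
            }
            long start = System.nanoTime();
            if (runs.isEmpty()) {
                try {
                    Thread.sleep(STALL_MS);  // only the first run is delayed
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            runs.add(new long[] {start, System.nanoTime()});
            done.countDown();
        };

        if (fixedDelay) {
            executor.scheduleWithFixedDelay(schedule, 0, INTERVAL_MS, TimeUnit.MILLISECONDS);
        } else {
            executor.scheduleAtFixedRate(schedule, 0, INTERVAL_MS, TimeUnit.MILLISECONDS);
        }
        done.await();
        executor.shutdownNow();

        long minGapNanos = Long.MAX_VALUE;
        for (int i = 1; i < RUNS; i++) {
            minGapNanos = Math.min(minGapNanos, runs.get(i)[0] - runs.get(i - 1)[1]);
        }
        return TimeUnit.NANOSECONDS.toMillis(minGapNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fixed rate,  min gap ms: " + minGapMillis(false));
        System.out.println("fixed delay, min gap ms: " + minGapMillis(true));
    }
}
```

With a fixed rate, the runs that became overdue during the stall fire back-to-back, so the smallest gap should be close to zero. With a fixed delay, every run starts at least one full interval after the previous one finished.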
    
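
The second point can be sketched in the same way. The example below does not use Spark's RPC classes: a single-threaded executor stands in for an endpoint's message loop, a `Future` stands in for the reply to `ask`, and every name (`AskOutsideLockDemo`, `requestedTotal`, the timeouts) is hypothetical. It contrasts blocking on the reply after releasing the lock with blocking while still holding a lock that the handler also needs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; it only mimics the shape of the locking problem.
class AskOutsideLockDemo {
    private final Object lock = new Object();
    private int requestedTotal = 0;  // guarded by lock
    // Single thread standing in for an endpoint's message loop (assumption).
    private final ExecutorService endpoint = Executors.newSingleThreadExecutor();

    /** Queues a message and returns immediately; the handler needs the same lock. */
    private Future<Boolean> ask(int total) {
        Callable<Boolean> message = () -> {
            synchronized (lock) {
                return requestedTotal == total;
            }
        };
        return endpoint.submit(message);
    }

    /** Queue the message under the lock, release the lock, then block on the reply. */
    boolean askThenBlockOutsideLock(int total, long timeoutMs) throws Exception {
        Future<Boolean> reply;
        synchronized (lock) {
            requestedTotal = total;
            reply = ask(total);
        }
        return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Block on the reply while still holding the lock; the handler cannot proceed. */
    boolean blockWhileHoldingLock(int total, long timeoutMs) throws Exception {
        synchronized (lock) {
            requestedTotal = total;
            try {
                return ask(total).get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return false;  // timed out because of lock contention, not a send failure
            }
        }
    }

    void shutdown() {
        endpoint.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AskOutsideLockDemo demo = new AskOutsideLockDemo();
        System.out.println("outside lock answered: " + demo.askThenBlockOutsideLock(3, 2000));
        System.out.println("inside lock answered:  " + demo.blockWhileHoldingLock(4, 200));
        demo.shutdown();
    }
}
```

In this sketch the second call can only time out, because the handler is waiting for the lock the caller holds. This is the kind of timeout the description attributes to lock contention rather than to a failed send.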
    ## How was this patch tested?
    Existing tests, and manual tests under yarn-client mode.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/angolon/spark SPARK-16533

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14710.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14710
    
----
commit cef69bf470199c63b6638933756b1d057dc890d1
Author: Angus Gerry <[email protected]>
Date:   2016-08-19T01:52:58Z

    Revert "[SPARK-17022][YARN] Handle potential deadlock in driver handling 
messages"
    
    This reverts commit ea0bf91b4a2ca3ef472906e50e31fd6268b6f53e.

commit 4970b3b0bcd834bbe5d5473a3065f04a48b12643
Author: Angus Gerry <[email protected]>
Date:   2016-08-09T04:45:29Z

    [SPARK-16533][CORE] Use scheduleWithFixedDelay when calling 
ExecutorAllocatorManager.schedule to ease contention on locks.

commit 920274a3ed0b8278d38d721587a24c9441fa5ff3
Author: Angus Gerry <[email protected]>
Date:   2016-08-04T06:27:56Z

    [SPARK-16533][CORE] Replace many calls to askWithRetry to plain old ask.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
