Tao Wang created SPARK-17022:
--------------------------------

             Summary: Potential deadlock in driver handling message
                 Key: SPARK-17022
                 URL: https://issues.apache.org/jira/browse/SPARK-17022
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 2.0.0, 1.6.1, 1.6.0, 1.5.2, 1.5.1, 1.5.0
            Reporter: Tao Wang
            Priority: Critical
Suppose t1 < t2 < t3.

At t1, a thread calls YarnSchedulerBackend.doRequestTotalExecutors from one of three methods: CoarseGrainedSchedulerBackend.killExecutors, CoarseGrainedSchedulerBackend.requestTotalExecutors, or CoarseGrainedSchedulerBackend.requestExecutors. All three hold the `CoarseGrainedSchedulerBackend` lock. YarnSchedulerBackend.doRequestTotalExecutors then sends a RequestExecutors message to `yarnSchedulerEndpoint` and blocks waiting for the reply.

At t2, another thread sends a RemoveExecutor message to `yarnSchedulerEndpoint`, and the endpoint receives it.

At t3, the RequestExecutors message sent at t1 arrives at the endpoint.

The endpoint therefore handles RemoveExecutor before RequestExecutors. While handling RemoveExecutor, it forwards the same message to `driverEndpoint` and waits for the reply. To handle that message, `driverEndpoint` must acquire the `CoarseGrainedSchedulerBackend` lock, which has been held since t1. The result is a deadlock.

We have hit this issue in our deployment: the driver was blocked and processed no messages until both pending messages timed out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
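The interleaving above can be sketched with plain Python threading. This is a minimal, hypothetical stand-in for the Spark components (the lock plays the role of the `CoarseGrainedSchedulerBackend` monitor; the two threads play the t1 caller and the endpoint), not actual Spark code. The point it shows: a lock held across a blocking wait leaves the other party no option but to time out.

```python
import threading

# Hypothetical stand-in for the CoarseGrainedSchedulerBackend monitor.
backend_lock = threading.Lock()
lock_held = threading.Event()
reply_ready = threading.Event()

def caller():
    # t1: a requestTotalExecutors-style caller holds the backend lock while
    # it blocks waiting for the RequestExecutors reply (never signaled here).
    with backend_lock:
        lock_held.set()
        reply_ready.wait(timeout=1.0)  # stands in for the blocking ask

t = threading.Thread(target=caller)
t.start()
lock_held.wait()

# t2/t3: the endpoint handles RemoveExecutor first and, via driverEndpoint,
# needs the same lock; it cannot acquire it until the caller's wait times out.
acquired = backend_lock.acquire(timeout=0.2)
print("endpoint acquired backend lock:", acquired)  # False: driver is stalled
reply_ready.set()
t.join()
```

In the real code the timeouts are the RPC ask timeouts, so the driver stays stuck for the full timeout window rather than forever; the underlying fix is to avoid holding the backend lock across a blocking remote ask.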