[ 
https://issues.apache.org/jira/browse/YARN-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6251:
------------------------------
    Attachment: YARN-6251.001.patch

Uploading fix.

The deadlock occurs because the {{completeContainer()}} method (used to flush 
resources of temporary containers created during the update) is called in the 
AM's allocate thread, which tries to grab the lock on the queue and the app. 
The Scheduler thread can contend for those same locks in the reverse order 
while handling a NODE_UPDATE at the same time.
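
To make the lock-order inversion concrete, here is a minimal standalone sketch. 
It does not use any YARN classes: {{queueLock}}, {{appLock}} and the thread 
names are hypothetical stand-ins, and which thread takes which lock first is an 
assumption made only for illustration. Timed {{tryLock()}} calls are used so 
the demo reports the stall instead of hanging the way plain locking would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins; these are not the real YARN queue or application locks.
class LockOrderDemo {
    static final ReentrantLock queueLock = new ReentrantLock();
    static final ReentrantLock appLock = new ReentrantLock();

    // Takes `first`, waits until the peer thread also holds its first lock,
    // then tries `second` with a timeout. Returns true if `second` was acquired.
    static boolean lockInOrder(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst,
                               CountDownLatch bothTried) throws InterruptedException {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean gotSecond = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            // Keep holding `first` until the peer has finished its attempt too.
            bothTried.countDown();
            bothTried.await();
            return gotSecond;
        } finally {
            first.unlock();
        }
    }

    // Runs two threads that take the same two locks in opposite orders.
    static boolean[] run() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // Assumed order for the allocate thread: app, then queue.
        Thread allocate = new Thread(() -> {
            try {
                acquired[0] = lockInOrder(appLock, queueLock, bothHoldFirst, bothTried);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "am-allocate");

        // Assumed order for the scheduler thread on a node update: queue, then app.
        Thread nodeUpdate = new Thread(() -> {
            try {
                acquired[1] = lockInOrder(queueLock, appLock, bothHoldFirst, bothTried);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "scheduler-node-update");

        allocate.start();
        nodeUpdate.start();
        allocate.join();
        nodeUpdate.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] acquired = run();
        System.out.println("allocate got second lock: " + acquired[0]);
        System.out.println("node update got second lock: " + acquired[1]);
    }
}
```

Neither thread obtains its second lock, because each one is waiting on a lock 
the other already holds. With untimed locking that wait would never end.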

The proposed solution: instead of calling {{completeContainer()}} directly, 
we send it as an event for the Scheduler to handle. This ensures that the 
Scheduler is the only entity that will hold the lock.
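
The idea can be sketched as follows. This is a simplified illustration rather 
than the code in the attached patch: {{ReleaseContainerEvent}}, {{Scheduler}}, 
the {{STOP}} sentinel and the container ids are all hypothetical names, and a 
plain {{BlockingQueue}} stands in for the real event dispatcher. The 
AM-side thread only enqueues events; the scheduler thread is the only one that 
ever takes the queue and app locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SchedulerEventDemo {

    // Hypothetical event type; not an actual YARN class.
    static final class ReleaseContainerEvent {
        final String containerId;

        ReleaseContainerEvent(String containerId) {
            this.containerId = containerId;
        }
    }

    // Hypothetical scheduler that drains events on its own thread.
    static final class Scheduler implements Runnable {
        // Sentinel used only by this demo to shut the loop down.
        static final ReleaseContainerEvent STOP = new ReleaseContainerEvent("");

        private final BlockingQueue<ReleaseContainerEvent> events =
                new LinkedBlockingQueue<>();
        private final Object queueLock = new Object();
        private final Object appLock = new Object();
        final List<String> completed = new ArrayList<>();

        // Safe to call from any thread: it enqueues and takes no scheduler lock.
        void handle(ReleaseContainerEvent event) {
            events.add(event);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    ReleaseContainerEvent event = events.take();
                    if (event == STOP) {
                        return;
                    }
                    completeContainer(event.containerId);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Only the scheduler thread reaches this method, so the two locks are
        // always taken by one thread and in one order.
        private void completeContainer(String containerId) {
            synchronized (queueLock) {
                synchronized (appLock) {
                    completed.add(containerId);
                }
            }
        }
    }

    // Starts the scheduler, posts two events from an AM-like thread, then stops.
    static List<String> run() throws InterruptedException {
        Scheduler scheduler = new Scheduler();
        Thread schedulerThread = new Thread(scheduler, "scheduler");
        schedulerThread.start();

        Thread allocate = new Thread(() -> {
            scheduler.handle(new ReleaseContainerEvent("container_tmp_1"));
            scheduler.handle(new ReleaseContainerEvent("container_tmp_2"));
        }, "am-allocate");
        allocate.start();
        allocate.join();

        scheduler.handle(Scheduler.STOP);
        schedulerThread.join();
        return scheduler.completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

The trade-off in this sketch is that the release becomes asynchronous: the 
allocate call returns before the temporary container's resources have actually 
been released.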

> Fix Scheduler locking issue introduced by YARN-6216
> ---------------------------------------------------
>
>                 Key: YARN-6251
>                 URL: https://issues.apache.org/jira/browse/YARN-6251
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>             Fix For: 3.0.0-alpha3
>
>         Attachments: YARN-6251.001.patch
>
>
> Opening to track a locking issue that was uncovered when running a custom SLS 
> AMSimulator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
