[ https://issues.apache.org/jira/browse/YARN-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun Suresh updated YARN-6251:
------------------------------
Attachment: YARN-6251.001.patch
Uploading a fix.
The deadlock occurs because the {{completeContainer()}} method (used to flush
the resources of temporary containers created during the update) is called in
the AM's allocate thread, which tries to grab the locks on the queue and the
app. The Scheduler thread can contend for the same locks in the reverse order
while handling a NODE_UPDATE at the same time.
The proposed solution: instead of calling {{completeContainer()}} directly, we
send it as an event for the Scheduler to handle. This ensures that the
Scheduler is the only entity that takes these locks.
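To illustrate the idea (this is not the code in the attached patch), here is a
minimal, self-contained Java sketch. Everything in it is hypothetical: the
class {{SchedulerEventSketch}}, its methods, and the two plain lock objects
standing in for the queue and app locks are assumptions made for the example,
not YARN APIs. The AM allocate thread only enqueues an event; a single
scheduler thread drains the queue and is the only thread that ever takes the
locks, so the acquisition order used by different handlers can no longer
produce a deadlock.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the idea; these are not the YARN classes.
class SchedulerEventSketch {
    // Stand-ins for the queue lock and the application lock.
    private final Object queueLock = new Object();
    private final Object appLock = new Object();

    // Marker event that tells the scheduler thread to exit its loop.
    private static final Runnable STOP = () -> { };

    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    private final AtomicInteger completed = new AtomicInteger();
    private final Thread schedulerThread = new Thread(this::runLoop, "scheduler");

    SchedulerEventSketch() {
        schedulerThread.setDaemon(true);
        schedulerThread.start();
    }

    // The only place where events run, hence the only thread taking the locks.
    private void runLoop() {
        try {
            for (Runnable event = events.take(); event != STOP; event = events.take()) {
                event.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called from the AM allocate thread: enqueue only, take no locks here.
    void releaseTemporaryContainer(String containerId) {
        events.add(() -> completeContainer(containerId));
    }

    // Stand-in for a NODE_UPDATE arriving at the scheduler.
    void nodeUpdate() {
        events.add(this::handleNodeUpdate);
    }

    // Runs on the scheduler thread: queue lock first, then app lock.
    private void completeContainer(String containerId) {
        synchronized (queueLock) {
            synchronized (appLock) {
                completed.incrementAndGet();
            }
        }
    }

    // Runs on the scheduler thread: app lock first, then queue lock.
    // The reverse order is harmless because no other thread takes these locks.
    private void handleNodeUpdate() {
        synchronized (appLock) {
            synchronized (queueLock) {
                // Scheduling work for the node would happen here.
            }
        }
    }

    // Stops the scheduler thread and returns the number of completed containers.
    int stopAndCount() throws InterruptedException {
        events.add(STOP);
        schedulerThread.join();
        return completed.get();
    }

    // Drives n container releases (AM thread) against n node updates (caller).
    static int simulate(int n) {
        SchedulerEventSketch sketch = new SchedulerEventSketch();
        Thread am = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                sketch.releaseTemporaryContainer("container_" + i);
            }
        }, "am-allocate");
        am.start();
        for (int i = 0; i < n; i++) {
            sketch.nodeUpdate();
        }
        try {
            am.join();
            return sketch.stopAndCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("completed containers: " + simulate(1000));
    }
}
```

If {{releaseTemporaryContainer()}} took the two locks directly on the AM thread
instead of enqueuing, the opposite acquisition order in {{handleNodeUpdate()}}
would recreate the kind of lock-order inversion described above.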
> Fix Scheduler locking issue introduced by YARN-6216
> ---------------------------------------------------
>
> Key: YARN-6251
> URL: https://issues.apache.org/jira/browse/YARN-6251
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6251.001.patch
>
>
> Opening to track a locking issue that was uncovered when running a custom SLS
> AMSimulator.