[ https://issues.apache.org/jira/browse/YARN-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun Suresh updated YARN-6251:
------------------------------
    Attachment: YARN-6251.001.patch

Uploading fix.

The deadlock occurs because the {{completeContainer()}} method (used to flush the resources of temporary containers created during the update) is called in the AM's allocate thread, which tries to grab the locks on the queue and the app. The Scheduler thread can contend for the same locks in the reverse order while handling a NODE_UPDATE at the same time.

The proposed solution: instead of calling {{completeContainer()}} directly, we send it as an event for the Scheduler to handle. This ensures that the Scheduler is the only entity that takes the lock.

> Fix Scheduler locking issue introduced by YARN-6216
> ---------------------------------------------------
>
>                 Key: YARN-6251
>                 URL: https://issues.apache.org/jira/browse/YARN-6251
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>             Fix For: 3.0.0-alpha3
>
>         Attachments: YARN-6251.001.patch
>
>
> Opening to track a locking issue that was uncovered when running a custom SLS
> AMSimulator.
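
For illustration only, the approach proposed in the comment above (hand the completion to the Scheduler as an event instead of calling {{completeContainer()}} from the allocate thread) might look roughly like the sketch below. This is not the contents of YARN-6251.001.patch and does not use the real YARN classes: the class {{SchedulerEventSketch}}, the method {{requestCompleteContainer()}}, the two {{ReentrantLock}} fields and the string container IDs are all hypothetical stand-ins.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names come from the YARN code base.
class SchedulerEventSketch {
    // Stand-ins for the queue lock and the app lock mentioned in the comment.
    private final ReentrantLock queueLock = new ReentrantLock();
    private final ReentrantLock appLock = new ReentrantLock();

    // Events handed from the allocate thread to the scheduler thread.
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final AtomicInteger completed = new AtomicInteger();

    // Called on the AM allocate thread: takes neither lock, only enqueues.
    void requestCompleteContainer(String containerId) {
        events.add(containerId);
    }

    // Called only on the scheduler thread, so the two locks are always
    // acquired by one thread and in one order (queue first, then app).
    private void completeContainer(String containerId) {
        queueLock.lock();
        try {
            appLock.lock();
            try {
                completed.incrementAndGet(); // placeholder for releasing resources
            } finally {
                appLock.unlock();
            }
        } finally {
            queueLock.unlock();
        }
    }

    // Scheduler loop: handle a fixed number of events, then return.
    void runScheduler(int expectedEvents) throws InterruptedException {
        for (int i = 0; i < expectedEvents; i++) {
            completeContainer(events.take());
        }
    }

    int completedCount() {
        return completed.get();
    }

    // Runs one scheduler thread and submits three events from the caller.
    static int demo() {
        SchedulerEventSketch sketch = new SchedulerEventSketch();
        Thread scheduler = new Thread(() -> {
            try {
                sketch.runScheduler(3);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        scheduler.start();
        sketch.requestCompleteContainer("container_1");
        sketch.requestCompleteContainer("container_2");
        sketch.requestCompleteContainer("container_3");
        try {
            scheduler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.completedCount();
    }

    public static void main(String[] args) {
        System.out.println("completed containers: " + demo());
    }
}
```

In this sketch the allocate-side call never holds a scheduler lock, so it cannot take part in a lock-order inversion with the thread that drains the queue. The cost is that completion becomes asynchronous with respect to the caller.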