[ https://issues.apache.org/jira/browse/YARN-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei Yan updated YARN-2608:
--------------------------
    Attachment: YARN-2608-3.patch

Updated the patch to fix the findbugs warnings.

> FairScheduler may hung due to two potential deadlocks
> -----------------------------------------------------
>
>                 Key: YARN-2608
>                 URL: https://issues.apache.org/jira/browse/YARN-2608
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>         Attachments: YARN-2608-1.patch, YARN-2608-2.patch, YARN-2608-3.patch
>
>
> Two potential deadlocks exist inside the FairScheduler.
> 1. AllocationFileLoaderService reloads the queue configuration, which calls
> FairScheduler.AllocationReloadListener.onReload(). That function first acquires
> *FairScheduler's lock*:
> {code}
> public void onReload(AllocationConfiguration queueInfo) {
>   synchronized (FairScheduler.this) {
>     ....
>   }
> }
> {code}
> and after that acquires the *QueueManager's queues lock*:
> {code}
> private FSQueue getQueue(String name, boolean create, FSQueueType queueType) {
>   name = ensureRootPrefix(name);
>   synchronized (queues) {
>     ....
>   }
> }
> {code}
> Another thread, running FairScheduler.assignToQueue, may also need to create a
> new queue when a new job is submitted. This thread takes the *QueueManager's
> queues lock* first, and then needs *FairScheduler's lock* because it calls
> FairScheduler.getClock() when creating a new FSLeafQueue. The two threads
> acquire the same two locks in opposite order, so a deadlock may happen here.
> 2. The AllocationFileLoaderService holds *AllocationFileLoaderService's lock*
> first, and then waits for *FairScheduler's lock*. Another thread (such as
> AdminService.refreshQueues) may call FairScheduler's reinitialize function,
> which holds *FairScheduler's lock* first, and then waits for
> *AllocationFileLoaderService's lock*. Deadlock may happen here.
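
Both scenarios described above have the same shape: two threads take the same pair of locks in opposite order. The sketch below shows one common remedy, a single consistent acquisition order, and is not necessarily what the attached patches do. `LockOrderingSketch`, `schedulerLock`, `reloadCount`, `queueCount` and `runBoth` are hypothetical names chosen for illustration; only `onReload`, `assignToQueue` and `queues` echo names from the description, and none of this is the actual Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheduler / queue-manager pair; not the Hadoop classes.
class LockOrderingSketch {
    // Plays the role of FairScheduler's lock.
    private final Object schedulerLock = new Object();
    // Plays the role of QueueManager's queues map, which is also used as a lock.
    private final Map<String, Long> queues = new HashMap<>();
    private int reloads = 0; // guarded by the queues lock

    // Reload path: scheduler lock first, then the queues lock.
    void onReload() {
        synchronized (schedulerLock) {
            synchronized (queues) {
                reloads++;
            }
        }
    }

    // Submission path: takes the locks in the SAME order as onReload().
    // The clock is read while only the scheduler lock is held, so the
    // queues lock is never held while waiting for the scheduler lock.
    void assignToQueue(String name) {
        synchronized (schedulerLock) {
            long now = System.currentTimeMillis();
            synchronized (queues) {
                queues.putIfAbsent(name, now);
            }
        }
    }

    int reloadCount() {
        synchronized (queues) {
            return reloads;
        }
    }

    int queueCount() {
        synchronized (queues) {
            return queues.size();
        }
    }

    // Runs both paths concurrently and reports whether both threads
    // finished within the timeout (they would not if they deadlocked).
    static boolean runBoth(int iterations, long timeoutMillis) {
        LockOrderingSketch s = new LockOrderingSketch();
        Thread reloader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                s.onReload();
            }
        });
        Thread submitter = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                s.assignToQueue("root.q" + i);
            }
        });
        // Daemon threads, so a stuck thread cannot keep the JVM alive.
        reloader.setDaemon(true);
        submitter.setDaemon(true);
        reloader.start();
        submitter.start();
        try {
            reloader.join(timeoutMillis);
            submitter.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reloader.isAlive() && !submitter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runBoth(10000, 10000L));
    }
}
```

Swapping the two `synchronized` blocks in `assignToQueue` would recreate the inverted order from scenario 1. An alternative to a fixed order is to avoid nesting altogether, for example by reading whatever is needed from the scheduler before the queues lock is taken.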