[ https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148665#comment-14148665 ]
zhihai xu commented on YARN-2594:
---------------------------------

The [ReentrantReadWriteLock|http://tutorials.jenkov.com/java-util-concurrent/readwritelock.html] locking rules are described as:
{code}
Read Lock: If no threads have locked the ReadWriteLock for writing, and no thread have requested a write lock (but not yet obtained it). Thus, multiple threads can lock the lock for reading.
Write Lock: If no threads are reading or writing. Thus, only one thread at a time can lock the lock for writing.
{code}
Based on the above information, the first three threads can cause a deadlock. The readLock is first acquired by thread#1, then thread#3 blocks waiting for the writeLock, and finally, when thread#2 tries to acquire the readLock, thread#2 also blocks because thread#3 requested the writeLock before thread#2. So this is not a bug in Java.

The following is the source code in ReentrantReadWriteLock.java:
{code}
static final class NonfairSync extends Sync {
    private static final long serialVersionUID = -8159625535654395037L;
    final boolean writerShouldBlock() {
        return false; // writers can always barge
    }
    final boolean readerShouldBlock() {
        /* As a heuristic to avoid indefinite writer starvation,
         * block if the thread that momentarily appears to be head
         * of queue, if one exists, is a waiting writer. This is
         * only a probabilistic effect since a new reader will not
         * block if there is a waiting writer behind other enabled
         * readers that have not yet drained from the queue.
         */
        return apparentlyFirstQueuedIsExclusive();
    }
}
{code}
readerShouldBlock() checks whether the thread that appears to be at the head of the wait queue is a waiting writer. If it is, a new reader that does not already hold the read lock blocks behind that writer.
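The ordering described above (thread#1 holds the readLock, thread#3 queues for the writeLock, thread#2 then asks for the readLock) can be reproduced in isolation. The following is only a minimal sketch, not code from the RM: the class name ReaderBehindWriterDemo, the method secondReaderAcquires(), and the timeout values are hypothetical and chosen for illustration. It uses the timed tryLock() so that the second reader gives up instead of hanging; the untimed tryLock() is documented to barge and would not show the effect.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of Hadoop or the JDK.
public class ReaderBehindWriterDemo {

    /**
     * Returns true if a second reader obtained the read lock while a first
     * reader held it and a writer was already queued, false if it timed out.
     */
    public static boolean secondReaderAcquires() {
        // The no-arg constructor selects the non-fair policy (NonfairSync).
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final AtomicBoolean acquired = new AtomicBoolean(false);

        // thread#3: requests the write lock and waits in the queue.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        }, "thread#3-writer");

        // thread#2: requests the read lock after the writer has queued.
        // The timed tryLock consults readerShouldBlock() like lock() does,
        // but returns false on timeout instead of blocking forever.
        Thread reader = new Thread(() -> {
            try {
                if (lock.readLock().tryLock(500, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "thread#2-reader");

        // thread#1 (the calling thread) takes the read lock first.
        lock.readLock().lock();
        try {
            writer.start();
            // Wait until the writer is queued for the lock.
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(10);
            }
            Thread.sleep(50); // small grace period so the writer is parked

            reader.start();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock(); // lets the queued writer proceed
        }

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired: " + secondReaderAcquires());
    }
}
```

With the writer queued at the head, the second reader is expected to time out even though only a read lock is held. In the RM case thread#2 uses a plain lock() call, so it would wait until thread#1 releases the readLock.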
> Potential deadlock in RM when querying ApplicationResourceUsageReport
> ---------------------------------------------------------------------
>
> Key: YARN-2594
> URL: https://issues.apache.org/jira/browse/YARN-2594
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Karam Singh
> Assignee: Wangda Tan
> Priority: Blocker
> Attachments: YARN-2594.patch
>
>
> ResourceManager sometimes becomes unresponsive:
> There was no exception in the ResourceManager log; it contains only the following type of messages:
> {code}
> 2014-09-19 19:13:45,241 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000
> 2014-09-19 19:30:26,312 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000
> 2014-09-19 19:47:07,351 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000
> 2014-09-19 20:03:48,460 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000
> 2014-09-19 20:20:29,542 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000
> 2014-09-19 20:37:10,635 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000
> 2014-09-19 20:53:51,722 INFO event.AsyncDispatcher (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)