William Montaz created YARN-11814:
-------------------------------------

             Summary: Deadlock when use yarn rmadmin -refreshQueues
                 Key: YARN-11814
                 URL: https://issues.apache.org/jira/browse/YARN-11814
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.3.6
            Reporter: William Montaz
This ticket revives https://issues.apache.org/jira/browse/YARN-9163. That ticket initially suspected a JDK issue, but the problem is a clear bug in the YARN code. We dug thoroughly into the ReentrantReadWriteLock code and its documented locking guarantees, and the behavior stays the same even with newer versions of Java.

I put a comment in YARN-9163 with an example of how the bug is triggered with simple Java code. Could you please reconsider the initial patch proposal?

The deadlock needs three threads. The main thread holds the read lock and waits for otherLock. TryRead holds otherLock and requests the read lock. TryWrite is queued for the write lock. Because a writer is already queued, TryRead's new read request waits behind it, and the writer in turn waits for the main thread to release its read lock, so none of the three can proceed. Here is the Java example showing how to create the deadlock:

{code:java}
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Main {
    ReentrantLock otherLock = new ReentrantLock();

    void log(String s) {
        System.out.printf("%s: %s%n", Thread.currentThread().getName(), s);
    }

    public static void main(String[] args) throws Exception {
        new Main().runTest();
    }

    public void runTest() throws Exception {
        ReentrantReadWriteLock rwlock = new ReentrantReadWriteLock(false);
        // obtain the read lock
        log("get readlock");
        rwlock.readLock().lock(); // should succeed: nobody holds the write lock yet
        // will take otherLock, then try to get the read lock 2 seconds later
        new Thread(this.new ReadLockThread(rwlock), "TryRead").start();
        // will try to get the write lock and be queued before TryRead's read request
        new Thread(this.new WriteLockThread(rwlock), "TryWrite").start();
        Thread.sleep(500); // give TryRead time to take otherLock first
        log("try to get other lock");
        // should not succeed: otherLock is held by TryRead, which is blocked behind the
        // queued TryWrite thread (even though TryWrite has not yet acquired the write lock)
        otherLock.lock();
        rwlock.readLock().unlock();
    }

    class WriteLockThread implements Runnable {
        private final ReentrantReadWriteLock rwlock;

        public WriteLockThread(ReentrantReadWriteLock rwlock) {
            this.rwlock = rwlock;
        }

        public void run() {
            log("try get writelock");
            // blocks: the read lock is already held by the main thread
            rwlock.writeLock().lock();
            try {
                log("can get writelock");
            } finally {
                rwlock.writeLock().unlock();
            }
        }
    }

    class ReadLockThread implements Runnable {
        private final ReentrantReadWriteLock rwlock;

        public ReadLockThread(ReentrantReadWriteLock rwlock) {
            this.rwlock = rwlock;
        }

        public void run() {
            log("try get other lock");
            otherLock.lock();
            try {
                log("try get read lock");
                // latency so that TryWrite is queued before this read request
                Thread.sleep(2000);
                rwlock.readLock().lock(); // blocks behind the queued writer
                try {
                    log("can get readlock");
                } finally {
                    log("unlock readlock");
                    rwlock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            } finally {
                otherLock.unlock();
            }
        }
    }
}
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
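The reproducer depends on one lock property, which can be isolated as follows. In non-fair mode, OpenJDK's ReentrantReadWriteLock makes a thread that holds no read lock wait when a writer is already queued, while a thread that already holds the read lock may re-acquire it reentrantly. The Javadoc describes non-fair ordering as "unspecified, subject to reentrancy constraints", and the writer-preference heuristic is, as far as we can tell, an implementation detail of OpenJDK rather than a documented guarantee. The following is a minimal sketch, not part of the ticket. The class name `ReaderBlockingDemo`, the `probe()` method, and the 500 ms timeouts are illustrative assumptions. The sketch uses the timed `tryLock`, which honors the queue, instead of the barging untimed `tryLock()`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReaderBlockingDemo {

    /** Hypothetical helper: returns {newReaderGotLock, reentrantReaderGotLock}. */
    public static boolean[] probe() throws InterruptedException {
        // Non-fair mode, as in the reproducer.
        ReentrantReadWriteLock rwlock = new ReentrantReadWriteLock(false);
        rwlock.readLock().lock(); // the calling thread holds the read lock

        // The writer cannot get the write lock while a read lock is held, so it is queued.
        Thread writer = new Thread(() -> {
            rwlock.writeLock().lock();
            rwlock.writeLock().unlock();
        }, "Writer");
        writer.start();
        while (!rwlock.hasQueuedThread(writer)) {
            Thread.sleep(10); // wait until the writer is waiting in the queue
        }
        Thread.sleep(100); // small margin so the queue links are fully published

        // A thread holding no read lock attempts a timed (queue-respecting) acquisition.
        boolean[] newReaderGotLock = new boolean[1];
        Thread newReader = new Thread(() -> {
            try {
                newReaderGotLock[0] = rwlock.readLock().tryLock(500, TimeUnit.MILLISECONDS);
                if (newReaderGotLock[0]) {
                    rwlock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "NewReader");
        newReader.start();
        newReader.join(); // join() also makes newReaderGotLock[0] visible here

        // The calling thread already holds the read lock, so this acquisition is reentrant.
        boolean reentrantGotLock = rwlock.readLock().tryLock(500, TimeUnit.MILLISECONDS);
        if (reentrantGotLock) {
            rwlock.readLock().unlock();
        }

        rwlock.readLock().unlock(); // release the original hold so the writer can finish
        writer.join();
        return new boolean[] {newReaderGotLock[0], reentrantGotLock};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = probe();
        System.out.println("new reader acquired read lock: " + r[0]);
        System.out.println("reentrant reader acquired read lock: " + r[1]);
    }
}
```

On OpenJDK we would expect this to print `false` for the new reader and `true` for the reentrant reader. This matches the TryRead thread in the reproducer: it holds no read lock, so it queues behind TryWrite, while the main thread, had it asked again, would not.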