[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253688#comment-14253688 ]
Rohith commented on YARN-2946:
------------------------------

I updated the patch with the following fixes:
# All token storage is now handled synchronously via the state machine.
# Removed unnecessary synchronization from the method. This ensures the first point.

For testing, I deployed the patch in a cluster integrated with JCarder and ran the same scenario as in my earlier comment to check for deadlock cycles. JCarder did not identify any deadlock cycles.

Kindly review the patch.

> DeadLocks in RMStateStore<->ZKRMStateStore
> ------------------------------------------
>
>                 Key: YARN-2946
>                 URL: https://issues.apache.org/jira/browse/YARN-2946
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Rohith
>            Assignee: Rohith
>            Priority: Blocker
>         Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch,
> 0002-YARN-2946.patch, RM_BeforeFix_Deadlock_cycle_1.png,
> RM_BeforeFix_Deadlock_cycle_2.png, TestYARN2946.java
>
>
> Found one deadlock in ZKRMStateStore.
> # Initially, zkClient is null because of a ZooKeeper Disconnected event.
> # While ZKRMStateStore#runWithCheck() is in wait(zkSessionTimeout), waiting
> for zkClient to re-establish the ZooKeeper connection via either a
> SyncConnected or an Expired event, it is highly possible that another thread
> obtains the lock on {{ZKRMStateStore.this}} from state machine transition
> events. This causes a deadlock in ZKRMStateStore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
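
The second point of the issue description rests on a general Java property: a thread in wait(timeout) gives up the object's monitor, so another thread can enter a synchronized block on the same object before the waiter resumes. Below is a minimal standalone sketch of just that property. It is not code from the patch or from ZKRMStateStore; the class name, the method name, and the local `store` object (a stand-in for {{ZKRMStateStore.this}}) are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

final class WaitReleasesMonitor {

    /**
     * Returns true if a second thread entered synchronized (store)
     * while the calling thread was parked in store.wait(...).
     */
    static boolean otherThreadEntersDuringWait(long timeoutMs) {
        // Hypothetical stand-in for the ZKRMStateStore.this monitor.
        final Object store = new Object();
        final CountDownLatch holderHasLock = new CountDownLatch(1);
        final AtomicBoolean entered = new AtomicBoolean(false);

        // Plays the role of a thread arriving from a state machine transition.
        Thread other = new Thread(() -> {
            try {
                holderHasLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (store) {
                entered.set(true);
                store.notifyAll();
            }
        });
        other.setDaemon(true);
        other.start();

        boolean sawEntry;
        synchronized (store) {
            // From here on the monitor is held, except while inside wait().
            holderHasLock.countDown();
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            try {
                long remainingMs = timeoutMs;
                while (!entered.get() && remainingMs > 0) {
                    store.wait(remainingMs); // releases the monitor while waiting
                    remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Read while still holding the monitor: if true, the other
            // thread can only have entered during wait().
            sawEntry = entered.get();
        }
        return sawEntry;
    }

    public static void main(String[] args) {
        System.out.println("entered during wait: " + otherThreadEntersDuringWait(5000));
    }
}
```

If the thread that gets in during the wait then blocks on a second lock already held by the waiting thread, the waiter cannot re-acquire the monitor and neither thread can proceed. That is presumably the kind of cycle JCarder reported before the fix.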