[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2946:
Attachment: 0003-YARN-2946.patch

DeadLocks in RMStateStore-ZKRMStateStore
Key: YARN-2946
URL: https://issues.apache.org/jira/browse/YARN-2946
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch, 0002-YARN-2946.patch, 0003-YARN-2946.patch, 0003-YARN-2946.patch, RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, TestYARN2946.java

Found one deadlock in ZKRMStateStore.
# Initially, zkClient is null because of a ZooKeeper Disconnected event.
# While ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout), waiting for zkClient to re-establish the ZooKeeper connection through either a SyncConnected or an Expired event, the monitor is released, so another thread handling a state machine transition event can easily acquire the lock on {{ZKRMStateStore.this}}. This causes a deadlock in ZKRMStateStore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
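The following is a minimal Java sketch of one way to avoid the wait-on-the-store's-monitor pattern described in this issue. It assumes a simplified model and is not the actual ZKRMStateStore code or any of the attached patches: the class ClientHolder and its methods onConnected, onDisconnected, and awaitClient are hypothetical. Instead of calling wait() on the store object itself, a thread that needs a client waits on a dedicated ReentrantLock/Condition pair, so the watcher thread that delivers the reconnect event needs only that dedicated lock (which awaitNanos releases while waiting) and never contends with state machine transitions for the store's monitor.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch, not the YARN-2946 patch: holds a ZooKeeper-style client
 * and lets callers wait for reconnection on a dedicated lock rather than on the
 * store object's own monitor.
 */
class ClientHolder<C> {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition connected = lock.newCondition();
  private C client; // null while the session is disconnected

  /** Watcher thread: a SyncConnected-style event delivered a usable client. */
  void onConnected(C newClient) {
    lock.lock();
    try {
      client = newClient;
      connected.signalAll(); // wake every waiter; each re-checks the condition
    } finally {
      lock.unlock();
    }
  }

  /** Watcher thread: a Disconnected-style event made the client unusable. */
  void onDisconnected() {
    lock.lock();
    try {
      client = null;
    } finally {
      lock.unlock();
    }
  }

  /**
   * Waits at most timeoutMs for a client. awaitNanos releases the dedicated lock
   * while blocked, and the loop guards against spurious wakeups. Returns null on
   * timeout so the caller can fail the operation instead of hanging.
   */
  C awaitClient(long timeoutMs) throws InterruptedException {
    long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    lock.lock();
    try {
      while (client == null) {
        if (remaining <= 0L) {
          return null; // timed out without a reconnect
        }
        remaining = connected.awaitNanos(remaining);
      }
      return client;
    } finally {
      lock.unlock();
    }
  }
}
```

The wait is bounded and returns null on timeout, so a caller can fail the store operation rather than block indefinitely. Note that a caller which still holds the store's monitor while calling awaitClient keeps other synchronized callers waiting for up to the timeout; this sketch removes the lock cycle, not that delay.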
[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2946:
Attachment: 0004-YARN-2946.patch
[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2946:
Attachment: 0003-YARN-2946.patch
[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2946:
Attachment: 0001-YARN-2946.patch
[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2946:
Summary: DeadLocks in RMStateStore-ZKRMStateStore (was: DeadLock's in RMStateStore-ZKRMStateStore)
[jira] [Updated] (YARN-2946) DeadLocks in RMStateStore-ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith updated YARN-2946:
Attachment: RM_BeforeFix_Deadlock_cycle_2.png, RM_BeforeFix_Deadlock_cycle_1.png