[
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258030#comment-14258030
]
Rohith commented on YARN-2946:
------------------------------
Looking at the last year, many deadlock issues have been reported. It would be
great if HadoopQA could integrate the JCarder tool, or any other tool, by
default to identify suspicious deadlock cycles. Can it be integrated into QA?
Any thoughts from the community?
> DeadLocks in RMStateStore<->ZKRMStateStore
> ------------------------------------------
>
> Key: YARN-2946
> URL: https://issues.apache.org/jira/browse/YARN-2946
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Blocker
> Attachments: 0001-YARN-2946.patch, 0001-YARN-2946.patch,
> 0002-YARN-2946.patch, 0003-YARN-2946.patch, 0003-YARN-2946.patch,
> 0004-YARN-2946.patch, RM_BeforeFix_Deadlock_cycle_1.png,
> RM_BeforeFix_Deadlock_cycle_2.png, TestYARN2946.java
>
>
> Found a deadlock in ZKRMStateStore.
> # Initially, zkClient is null because of a ZooKeeper Disconnected event.
> # While ZKRMStateStore#runWithCheck() is in wait(zkSessionTimeout), waiting for
> zkClient to re-establish the ZooKeeper connection via either a SyncConnected
> or an Expired event, it is quite possible for another thread to obtain the
> lock on {{ZKRMStateStore.this}} from state machine transition events. This
> causes a deadlock in ZKRMStateStore.
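
The mechanism described above can be sketched in isolation. This is not the
actual ZKRMStateStore code: the class name, the fields, and in particular
otherLock (a stand-in for whatever lock the waiting thread still holds, which
the description does not name) are assumptions made for illustration. The
sketch only shows that wait() releases the object's monitor, so a second
thread can take it while the first is still inside its synchronized block,
and that the second thread is then stuck if it needs a lock the first one
holds. tryLock with a timeout is used so that the demo terminates instead of
hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class WaitReleasesMonitorSketch {
    // Hypothetical second lock; the issue description does not name it.
    private final ReentrantLock otherLock = new ReentrantLock();
    // Stands in for ZKRMStateStore.this.
    private final Object store = new Object();
    // Stands in for "zkClient != null"; guarded by the store monitor.
    private boolean connected = false;
    // True while the waiter is inside its synchronized block; guarded by store.
    private boolean waiterInside = false;

    /**
     * Returns true if the second thread obtained the store monitor while the
     * waiter was in wait(), and could not obtain otherLock held by the waiter.
     */
    public boolean demonstrate() {
        CountDownLatch waiterReady = new CountDownLatch(1);
        boolean[] cycleObserved = {false};

        Thread waiter = new Thread(() -> {
            otherLock.lock();
            try {
                synchronized (store) {
                    waiterInside = true;
                    waiterReady.countDown();
                    while (!connected) {
                        try {
                            store.wait(1000); // releases the store monitor
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                    waiterInside = false;
                }
            } finally {
                otherLock.unlock();
            }
        });

        Thread second = new Thread(() -> {
            try {
                waiterReady.await();
                synchronized (store) { // succeeds only once the waiter is in wait()
                    boolean tookMonitorDuringWait = waiterInside;
                    // A plain lock() here would never return: the waiter holds
                    // otherLock and cannot leave wait() without the store monitor.
                    boolean gotOtherLock = otherLock.tryLock(100, TimeUnit.MILLISECONDS);
                    if (gotOtherLock) {
                        otherLock.unlock();
                    }
                    cycleObserved[0] = tookMonitorDuringWait && !gotOtherLock;
                    connected = true; // let the waiter finish so the demo ends
                    store.notifyAll();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        waiter.start();
        second.start();
        try {
            waiter.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return cycleObserved[0];
    }

    public static void main(String[] args) {
        System.out.println("cycle observed: "
            + new WaitReleasesMonitorSketch().demonstrate());
    }
}
```

If the second thread used a blocking lock() instead of tryLock, both threads
would wait on each other indefinitely, which is the kind of cycle a tool such
as JCarder is meant to flag.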
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)