[
https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karthik Kambatla updated YARN-1929:
-----------------------------------
Attachment: yarn-1929-1.patch
Here is a first-cut patch that removes unnecessary synchronization from
EmbeddedElectorService, AdminService and CompositeService.
Thinking about the best way to write a unit test for this to avoid regressions
in the future. We can may be override becomeActive to sleep for some time and
try to shut the RM down. If it doesn't shutdown within a particular amount of
time, fail the test? Any other ideas?
> DeadLock in RM when automatic failover is enabled.
> --------------------------------------------------
>
> Key: YARN-1929
> URL: https://issues.apache.org/jira/browse/YARN-1929
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Environment: Yarn HA cluster
> Reporter: Rohith
> Assignee: Karthik Kambatla
> Priority: Blocker
> Attachments: yarn-1929-1.patch
>
>
> Dead lock detected in RM when automatic failover is enabled.
> {noformat}
> Found one Java-level deadlock:
> =============================
> "Thread-2":
> waiting to lock monitor 0x00007fb514303cf0 (object 0x00000000ef153fd0, a
> org.apache.hadoop.ha.ActiveStandbyElector),
> which is held by "main-EventThread"
> "main-EventThread":
> waiting to lock monitor 0x00007fb514750a48 (object 0x00000000ef154020, a
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService),
> which is held by "Thread-2"
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)