[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942554#comment-13942554 ]
Arpit Gupta commented on YARN-1861:
-----------------------------------
Here is a snippet from the log:
{code}
2014-03-18 09:39:42,544 INFO zookeeper.ClientCnxn
(ClientCnxn.java:logStartConnect(966)) - Opening socket connection to server
h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181. Will not
attempt to authenticate using SASL (unknown error)
2014-03-18 09:39:42,545 INFO zookeeper.ClientCnxn
(ClientCnxn.java:primeConnection(849)) - Socket connection established to
h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181, initiating
session
2014-03-18 09:39:45,437 INFO zookeeper.ClientCnxn
(ClientCnxn.java:onConnected(1211)) - Session establishment complete on server
h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181, sessionid
= 0x144d394247b0005, negotiated timeout = 10000
2014-03-18 09:39:47,326 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with
state:Disconnected for path:null for Service
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,326 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(755)) - ZKRMStateStore Session
disconnected
2014-03-18 09:39:47,326 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with
state:SyncConnected for path:null for Service
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(745)) - ZKRMStateStore Session connected
2014-03-18 09:39:47,327 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(751)) - ZKRMStateStore Session restored
2014-03-18 09:39:47,327 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with
state:Disconnected for path:null for Service
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(755)) - ZKRMStateStore Session
disconnected
2014-03-18 09:39:47,327 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with
state:SyncConnected for path:null for Service
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(745)) - ZKRMStateStore Session connected
2014-03-18 09:39:47,327 INFO recovery.ZKRMStateStore
(ZKRMStateStore.java:processWatchEvent(751)) - ZKRMStateStore Session restored
2014-03-18 09:39:47,328 FATAL resourcemanager.ResourceManager
(ResourceManager.java:handle(652)) - Received a
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type
STATE_STORE_FENCED. Cause:
org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException:
RMStateStore has been fenced
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread.run(ZKRMStateStore.java:880)
2014-03-18 09:39:47,328 INFO resourcemanager.ResourceManager
(ResourceManager.java:handle(656)) - RMStateStore has been fenced
2014-03-18 09:39:47,328 INFO resourcemanager.ResourceManager
(ResourceManager.java:handle(660)) - Transitioning RM to Standby mode
2014-03-18 09:39:47,328 INFO resourcemanager.ResourceManager
(ResourceManager.java:transitionToStandby(872)) - Transitioning to standby state
{code}
> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
>
> In our HA tests, we noticed that the tests got stuck because both RMs went
> into standby state and neither became active.