[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942554#comment-13942554
 ] 

Arpit Gupta commented on YARN-1861:
-----------------------------------

Here is a snippet from the log:

{code}
2014-03-18 09:39:42,544 INFO  zookeeper.ClientCnxn 
(ClientCnxn.java:logStartConnect(966)) - Opening socket connection to server 
h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181. Will not 
attempt to authenticate using SASL (unknown error)
2014-03-18 09:39:42,545 INFO  zookeeper.ClientCnxn 
(ClientCnxn.java:primeConnection(849)) - Socket connection established to 
h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181, initiating 
session
2014-03-18 09:39:45,437 INFO  zookeeper.ClientCnxn 
(ClientCnxn.java:onConnected(1211)) - Session establishment complete on server 
h2-ha-suse-uns-1395117052-2.cs1cloud.internal/172.18.145.62:2181, 
sessionid = 0x144d394247b0005, negotiated timeout = 10000
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with 
state:Disconnected for path:null for Service 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(755)) - ZKRMStateStore Session 
disconnected
2014-03-18 09:39:47,326 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with 
state:SyncConnected for path:null for Service 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(745)) - ZKRMStateStore Session connected
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(751)) - ZKRMStateStore Session restored
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with 
state:Disconnected for path:null for Service 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(755)) - ZKRMStateStore Session 
disconnected
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(737)) - Watcher event type: None with 
state:SyncConnected for path:null for Service 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(745)) - ZKRMStateStore Session connected
2014-03-18 09:39:47,327 INFO  recovery.ZKRMStateStore 
(ZKRMStateStore.java:processWatchEvent(751)) - ZKRMStateStore Session restored
2014-03-18 09:39:47,328 FATAL resourcemanager.ResourceManager 
(ResourceManager.java:handle(652)) - Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_FENCED. Cause:
org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: 
RMStateStore has been fenced
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread.run(ZKRMStateStore.java:880)

2014-03-18 09:39:47,328 INFO  resourcemanager.ResourceManager 
(ResourceManager.java:handle(656)) - RMStateStore has been fenced
2014-03-18 09:39:47,328 INFO  resourcemanager.ResourceManager 
(ResourceManager.java:handle(660)) - Transitioning RM to Standby mode
2014-03-18 09:39:47,328 INFO  resourcemanager.ResourceManager 
(ResourceManager.java:transitionToStandby(872)) - Transitioning to standby state
{code}

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>
> In our HA tests we noticed that the tests got stuck because both RMs went 
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
