[ 
https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966961#comment-13966961
 ] 

Karthik Kambatla commented on YARN-1924:
----------------------------------------

I still haven't had a chance to take a close look at the patch here. Here is 
another trace we ran into. [~jianhe], [~zjshen] - can you check whether the 
patch here would fix this as well, or whether we should handle it differently?

{noformat}
2014-03-28 13:40:46,277 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:786)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:783)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:868)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:887)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:783)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:797)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:826)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:597)
{noformat} 
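
For anyone reading the trace without ZooKeeper context: in general, a versioned write is rejected with BadVersion when the version the caller expects does not match the znode's current version, and an expected version of -1 matches any version. The sketch below only illustrates that rule with an in-memory stand-in. It is not ZKRMStateStore or ZooKeeper client code, it is not a diagnosis of this trace, and the class name, the exception class, and the path are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a versioned store. Hypothetical; NOT the ZooKeeper client. */
class VersionedStoreSketch {

    /** Hypothetical exception playing the role of ZooKeeper's BadVersion error. */
    static class BadVersionException extends RuntimeException {
        BadVersionException(String message) {
            super(message);
        }
    }

    /** A stored value together with its version counter. */
    private static final class Node {
        byte[] data;
        int version;

        Node(byte[] data) {
            this.data = data;
            this.version = 0; // a freshly created node starts at version 0
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    /** Creates a node at version 0; fails if the path already exists. */
    void create(String path, byte[] data) {
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        nodes.put(path, new Node(data));
    }

    /**
     * Writes data only if expectedVersion matches the node's current version,
     * or if expectedVersion is -1 (match any). Returns the new version.
     */
    int setData(String path, byte[] data, int expectedVersion) {
        Node node = nodes.get(path);
        if (node == null) {
            throw new IllegalStateException("no node: " + path);
        }
        if (expectedVersion != -1 && expectedVersion != node.version) {
            throw new BadVersionException(
                "expected version " + expectedVersion + " but node is at " + node.version);
        }
        node.data = data;
        node.version++;
        return node.version;
    }

    public static void main(String[] args) {
        VersionedStoreSketch store = new VersionedStoreSketch();
        String path = "/example/attempt"; // made-up path, for illustration only

        store.create(path, new byte[] {1});
        store.setData(path, new byte[] {2}, 0); // succeeds, node moves to version 1

        try {
            store.setData(path, new byte[] {3}, 0); // stale version, rejected
        } catch (BadVersionException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        store.setData(path, new byte[] {3}, -1); // -1 skips the version check
    }
}
```

In this model, a writer whose expected version is stale fails no matter how many times it retries with the same version, while a write with -1 always goes through if the node exists.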

> STATE_STORE_OP_FAILED happens when ZKRMStateStore tries to update 
> app(attempt) before storing it
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1924
>                 URL: https://issues.apache.org/jira/browse/YARN-1924
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jian He
>            Priority: Critical
>             Fix For: 2.4.1
>
>         Attachments: YARN-1924.1.patch, YARN-1924.2.patch
>
>
> Noticed on an HA cluster: both RMs shut down with this error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
