[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965886#comment-13965886 ]
Arpit Gupta commented on YARN-1924:
-----------------------------------
Here is the stack trace.
{code}
cheduler from user hrt_qa in queue default
2014-04-10 09:19:35,907 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(659)) - appattempt_1397121188061_0004_000002 State change from SUBMITTED to SCHEDULED
2014-04-10 09:19:36,095 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(639)) - application_1397121188061_0004 State change from ACCEPTED to KILLING
2014-04-10 09:19:36,096 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(986)) - Updating application attempt appattempt_1397121188061_0004_000002 with final state: KILLED
2014-04-10 09:19:36,096 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(659)) - appattempt_1397121188061_0004_000002 State change from SCHEDULED to FINAL_SAVING
2014-04-10 09:19:36,103 ERROR recovery.RMStateStore (RMStateStore.java:handleStoreEvent(681)) - Error storing appAttempt: appattempt_1397121188061_0004_000002
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:834)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:930)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:949)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:845)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:862)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:604)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:662)
2014-04-10 09:19:36,107 FATAL resourcemanager.ResourceManager (ResourceManager.java:handle(657)) - Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:834)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:930)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:949)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:831)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:845)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:862)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:604)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:662)
2014-04-10 09:19:36,108 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
{code}
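Reading the trace: the attempt went SUBMITTED -> SCHEDULED -> FINAL_SAVING, and the FINAL_SAVING write in updateApplicationAttemptStateInternal reached ZooKeeper as a setData inside multi(), which failed with NoNode. Since ZooKeeper's setData fails when the target znode does not exist, this presumably means the attempt was killed before its state had ever been stored, and the error was then escalated to an RMFatalEvent that terminated the RM. Below is a minimal, stdlib-only sketch of that failure mode and of one possible update-or-create fallback. Everything in it (StateStoreSketch, its methods, the NoNodeException stand-in, the znode path) is hypothetical and is not the ZKRMStateStore or ZooKeeper API; the only ZooKeeper behaviour it mimics is that setData on a missing node fails.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stdlib-only model of the failure in the trace above.
// None of these names are real ZooKeeper or YARN APIs.
public class StateStoreSketch {

    // Stand-in for org.apache.zookeeper.KeeperException.NoNodeException.
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("KeeperErrorCode = NoNode for " + path);
        }
    }

    // Flat map of znode path -> data; parent znodes are not modelled.
    private final Map<String, byte[]> znodes = new HashMap<>();

    boolean exists(String path) {
        return znodes.containsKey(path);
    }

    byte[] get(String path) {
        return znodes.get(path);
    }

    void create(String path, byte[] data) {
        znodes.put(path, data);
    }

    // Mimics ZooKeeper setData: updating a missing znode fails with NoNode.
    void setData(String path, byte[] data) throws NoNodeException {
        if (!znodes.containsKey(path)) {
            throw new NoNodeException(path);
        }
        znodes.put(path, data);
    }

    // Strict update: a blind setData, like the call that failed in the trace.
    void updateStrict(String path, byte[] data) throws NoNodeException {
        setData(path, data);
    }

    // Hypothetical fallback: create the znode if the attempt was never stored.
    void updateOrCreate(String path, byte[] data) throws NoNodeException {
        if (exists(path)) {
            setData(path, data);
        } else {
            create(path, data);
        }
    }

    // Loosely models what the trace shows: a store error is not retried or
    // ignored but becomes a STATE_STORE_OP_FAILED fatal event (RM exits 1).
    static String handleStoreEvent(StateStoreSketch store, String path,
                                   byte[] data, boolean tolerant) {
        try {
            if (tolerant) {
                store.updateOrCreate(path, data);
            } else {
                store.updateStrict(path, data);
            }
            return "STORED";
        } catch (NoNodeException e) {
            return "STATE_STORE_OP_FAILED";
        }
    }

    public static void main(String[] args) {
        String path = "/rmstore/app_0004/attempt_000002"; // hypothetical path
        byte[] killed = "KILLED".getBytes(StandardCharsets.UTF_8);

        StateStoreSketch store = new StateStoreSketch();
        // Attempt killed while SCHEDULED: nothing was ever stored at `path`.
        System.out.println(handleStoreEvent(store, path, killed, false)); // STATE_STORE_OP_FAILED
        System.out.println(handleStoreEvent(store, path, killed, true));  // STORED
    }
}
```

Against a real ZooKeeper ensemble, an exists-then-create pair is not atomic (a concurrent create surfaces as NodeExists), and create itself fails with NoNode if the parent znode is missing, so an actual fix would have to handle both cases; the sketch ignores them.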
> RM shut down with RMFatalEvent of type STATE_STORE_OP_FAILED
> ------------------------------------------------------------
>
> Key: YARN-1924
> URL: https://issues.apache.org/jira/browse/YARN-1924
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Jian He
> Priority: Critical
>
> Noticed on an HA cluster: both RMs shut down with this error.
--
This message was sent by Atlassian JIRA
(v6.2#6252)