[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599001#comment-14599001
 ] 

zhihai xu commented on YARN-3798:
---------------------------------

[~ozawa], thanks for the document.
bq. When the delayed packet arrives at the first server, the old server detects 
that the session has moved, and closes the client connection.
I didn't see this happen in the logs. What the logs actually show is that the 
client connection to the ZK Follower is not closed until the session is 
closed. This may be a bug in the ZooKeeper server; I created ZOOKEEPER-2219 for 
this issue.
I think it would be better not to change the SessionMovedException handling 
until ZOOKEEPER-2219 is fixed, because we may introduce a regression in the 
SessionMovedException retry. Based on the logs, I think we can recover from 
SessionMovedException by closing the old session and creating a new session.
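To make the recovery idea concrete (close the old session, create a new one, 
then retry), here is a minimal sketch. It is not the ZKRMStateStore code and 
it does not use the ZooKeeper client: Session, SessionMovedRecovery and 
writeWithRecovery are hypothetical names, and SessionMovedException below is a 
local stand-in for KeeperException.SessionMovedException so that the example 
compiles on its own.

```java
import java.util.function.Supplier;

class SessionMovedRecovery {

    // Local stand-in for org.apache.zookeeper.KeeperException.SessionMovedException
    // (assumption: used only so this sketch needs no ZooKeeper jar).
    static class SessionMovedException extends Exception {}

    // Hypothetical, minimal view of a client session; not a ZooKeeper API.
    interface Session {
        String write(String data) throws SessionMovedException;
        void close();
    }

    private final Supplier<Session> factory; // creates a brand-new session
    private final int maxRetries;            // retries after the first attempt
    private Session session;

    SessionMovedRecovery(Supplier<Session> factory, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.factory = factory;
        this.maxRetries = maxRetries;
        this.session = factory.get();
    }

    // Runs the write; on SessionMovedException, closes the old session,
    // creates a new one, and retries up to maxRetries times.
    String writeWithRecovery(String data) throws SessionMovedException {
        SessionMovedException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return session.write(data);
            } catch (SessionMovedException e) {
                last = e;
                session.close();         // drop the session the server rejected
                session = factory.get(); // start over with a new session
            }
        }
        throw last; // every attempt failed with SessionMovedException
    }
}
```

In a real client the factory would presumably wrap the construction of a new 
ZooKeeper handle; how that fits into the existing retry logic is left open 
here.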
The following are the logs:
Logs from RM:
{code}
2015-03-16 09:46:04,009 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server c315yhk/?.?.?.66:2181, sessionid = 
0x14be28f50f4419d, negotiated timeout = 10000
2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 6670ms for sessionid 
0x14be28f50f4419d, closing socket connection and attempting reconnect
2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate 
using SASL (unknown error)
2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to c045dkh/?.?.?.67:2181, initiating session
2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 3336ms for sessionid 
0x14be28f50f4419d, closing socket connection and attempting reconnect

2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate 
using SASL (unknown error)
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to c470udy/?.?.?.65:2181, initiating session
2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server c470udy/?.?.?.65:2181, sessionid = 
0x14be28f50f4419d, negotiated timeout = 10000

2015-03-16 10:59:45,693 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = 
Session moved
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$500(ZKRMStateStore.java:75)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread.run(ZKRMStateStore.java:945)
2015-03-16 10:59:45,694 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
out ZK retries. Giving up!
2015-03-16 10:59:45,697 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = 
Session moved
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:868)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:885)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationStateInternal(ZKRMStateStore.java:578)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:627)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
2015-03-16 10:59:45,697 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
out ZK retries. Giving up!
2015-03-16 10:59:45,707 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = 
Session moved
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:868)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:885)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:621)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
2015-03-16 10:59:45,708 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
out ZK retries. Giving up!

2015-03-16 10:59:45,710 INFO org.apache.zookeeper.ZooKeeper: Session: 
0x14be28f50f4419d closed
{code}

logs from ZK Leader:
{code}
2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Established session 0x14be28f50f4419d with negotiated timeout 10000 for client 
/?.?.?.65:50271
2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: 
Exception causing close of session 0x14be28f50f4419d due to 
java.io.IOException: Broken pipe
2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed 
socket connection for client /?.?.?.65:50271 which had sessionid 
0x14be28f50f4419d
2015-03-16 10:59:45,693 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Got user-level KeeperException when processing sessionid:0x14be28f50f4419d 
type:multi cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting 
remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,695 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Got user-level KeeperException when processing sessionid:0x14be28f50f4419d 
type:multi cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting 
remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,700 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Got user-level KeeperException when processing sessionid:0x14be28f50f4419d 
type:multi cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting 
remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,710 INFO org.apache.zookeeper.server.PrepRequestProcessor: 
Processed session termination for sessionid: 0x14be28f50f4419d
{code}

logs from ZK Follower:
{code}
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: 
Accepted socket connection from /?.?.?.65:42777
2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: 
Revalidating client: 0x14be28f50f4419d
2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: 
Established session 0x14be28f50f4419d with negotiated timeout 10000 for client 
/?.?.?.65:42777
2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed 
socket connection for client /?.?.?.65:42777 which had sessionid 
0x14be28f50f4419d
{code}

> ZKRMStateStore shouldn't create new session without occurrance of 
> SESSIONEXPIED
> -------------------------------------------------------------------------------
>
>                 Key: YARN-3798
>                 URL: https://issues.apache.org/jira/browse/YARN-3798
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: Suse 11 Sp3
>            Reporter: Bibin A Chundatt
>            Assignee: Varun Saxena
>            Priority: Blocker
>         Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch
>
>
> RM going down with NoNode exception during create of znode for appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>       at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>       at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_000001
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>       at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>       at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,898 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating 
> info for app: application_1433764310492_7152
> 2015-06-09 10:09:44,898 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>       at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>       at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,920 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> {code}
> The ZK leader process went down at almost the same time.
> On startup of the ZK process, the znode for the application was available.
> *Current*
> RM goes down and the job fails
> *Expected*
> The submitted job can fail, but RM shutdown is not required



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
