[
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599001#comment-14599001
]
zhihai xu commented on YARN-3798:
---------------------------------
[~ozawa], thanks for the document.
bq. When the delayed packet arrives at the first server, the old server detects
that the session has moved, and closes the client connection.
Based on the logs, I didn't see this happen. What the logs actually show is
that the client connection to the ZK Follower is not closed until the session
is closed. This may be a bug in the ZooKeeper server; I created ZOOKEEPER-2219
for this issue.
I think it would be better not to change the SessionMovedException handling
until ZOOKEEPER-2219 is fixed, because we may introduce a regression in the
SessionMovedException retry. Based on the logs, I think we can recover from
SessionMovedException by closing the old session and creating a new session.
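The recovery idea above (close the old session, create a new one, then retry) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ZKRMStateStore code: `Session`, `Op`, `SessionMovedRecovery`, and the local `SessionMovedException` are hypothetical stand-ins so the example runs without the ZooKeeper client on the classpath.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: on "session moved", replace the session and retry. */
class SessionMovedRecovery {

    /** Stand-in for org.apache.zookeeper.KeeperException.SessionMovedException. */
    static class SessionMovedException extends Exception {}

    /** Stand-in for a ZooKeeper client handle; this is NOT the real client API. */
    interface Session {
        String multi(String op) throws SessionMovedException;
        void close();
    }

    /** An operation to run against whichever session is current. */
    interface Op<T> {
        T run(Session s) throws SessionMovedException;
    }

    private final Supplier<Session> factory;
    private Session session;
    int sessionsCreated = 0;

    SessionMovedRecovery(Supplier<Session> factory) {
        this.factory = factory;
        this.session = newSession();
    }

    private Session newSession() {
        sessionsCreated++;
        return factory.get();
    }

    /** Runs op; on SessionMovedException closes the old session and retries on a new one. */
    <T> T runWithRetries(Op<T> op, int maxRetries) throws SessionMovedException {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.run(session);
            } catch (SessionMovedException e) {
                if (attempt >= maxRetries) {
                    throw e;               // retries exhausted, give up
                }
                session.close();           // drop the session the server considers moved
                session = newSession();    // continue on a fresh session
            }
        }
    }

    /** Demo: the first session always fails with "session moved"; later ones succeed. */
    static String demo() {
        int[] created = {0};
        Supplier<Session> factory = () -> {
            final boolean stale = (created[0]++ == 0);
            return new Session() {
                public String multi(String op) throws SessionMovedException {
                    if (stale) {
                        throw new SessionMovedException();
                    }
                    return "ok:" + op;
                }
                public void close() {}
            };
        };
        SessionMovedRecovery recovery = new SessionMovedRecovery(factory);
        try {
            String result = recovery.runWithRetries(s -> s.multi("setData"), 3);
            return result + " sessions=" + recovery.sessionsCreated;
        } catch (SessionMovedException e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the real store the new session would have to be fully connected before the retry, and any ephemeral nodes or watches tied to the old session would be lost; the sketch ignores both.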
The logs are as follows:
logs from RM:
{code}
2015-03-16 09:46:04,009 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server c315yhk/?.?.?.66:2181, sessionid = 0x14be28f50f4419d, negotiated timeout = 10000
2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6670ms for sessionid 0x14be28f50f4419d, closing socket connection and attempting reconnect
2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate using SASL (unknown error)
2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to c045dkh/?.?.?.67:2181, initiating session
2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 3336ms for sessionid 0x14be28f50f4419d, closing socket connection and attempting reconnect
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate using SASL (unknown error)
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to c470udy/?.?.?.65:2181, initiating session
2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server c470udy/?.?.?.65:2181, sessionid = 0x14be28f50f4419d, negotiated timeout = 10000
2015-03-16 10:59:45,693 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = Session moved
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$500(ZKRMStateStore.java:75)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread.run(ZKRMStateStore.java:945)
2015-03-16 10:59:45,694 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
2015-03-16 10:59:45,697 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = Session moved
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:868)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:885)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationStateInternal(ZKRMStateStore.java:578)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:627)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
2015-03-16 10:59:45,697 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
2015-03-16 10:59:45,707 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = Session moved
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:131)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:868)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:885)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:621)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
2015-03-16 10:59:45,708 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
2015-03-16 10:59:45,710 INFO org.apache.zookeeper.ZooKeeper: Session: 0x14be28f50f4419d closed
{code}
logs from ZK Leader:
{code}
2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:50271
2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x14be28f50f4419d due to java.io.IOException: Broken pipe
2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /?.?.?.65:50271 which had sessionid 0x14be28f50f4419d
2015-03-16 10:59:45,693 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,695 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,700 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2015-03-16 10:59:45,710 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x14be28f50f4419d
{code}
logs from ZK Follower:
{code}
2015-03-16 10:59:44,673 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /?.?.?.65:42777
2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: Revalidating client: 0x14be28f50f4419d
2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:42777
2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /?.?.?.65:42777 which had sessionid 0x14be28f50f4419d
{code}
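As a rough cross-check of the observation that the Follower kept the client connection open until the session was closed, the timestamps in the Leader and Follower logs above can be compared directly. The sketch below only does the arithmetic; the class and method names are hypothetical, and it assumes the Leader and Follower clocks are reasonably synchronized, which the logs themselves do not prove.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: compare timestamps taken from the ZK Leader and Follower logs. */
class LogTimeline {
    // Timestamp layout used by the log lines, e.g. "2015-03-16 10:59:45,715".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed from one log timestamp to another. */
    static long millisBetween(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT),
                                LocalDateTime.parse(to, FMT)).toMillis();
    }

    public static void main(String[] args) {
        String followerEstablished = "2015-03-16 10:59:44,675"; // Follower: "Established session"
        String firstSessionMoved   = "2015-03-16 10:59:45,693"; // Leader: first "Session moved"
        String sessionTerminated   = "2015-03-16 10:59:45,710"; // Leader: "Processed session termination"
        String followerClosed      = "2015-03-16 10:59:45,715"; // Follower: "Closed socket connection"

        System.out.println("Follower connection open for "
                + millisBetween(followerEstablished, followerClosed) + " ms");
        System.out.println("Follower close after first 'Session moved': "
                + millisBetween(firstSessionMoved, followerClosed) + " ms");
        System.out.println("Follower close after session termination: "
                + millisBetween(sessionTerminated, followerClosed) + " ms");
    }
}
```

Under that clock assumption, the Follower's socket close comes after both the first "Session moved" error and the session termination on the Leader, which matches the order of events described in the comment.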
> ZKRMStateStore shouldn't create new session without occurrance of
> SESSIONEXPIED
> -------------------------------------------------------------------------------
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
> Reporter: Bibin A Chundatt
> Assignee: Varun Saxena
> Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch,
> YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch
>
>
> RM goes down with a NoNode exception while creating the znode for an appattempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_000001
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,898 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1433764310492_7152
> 2015-06-09 10:09:44,898 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,920 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> {code}
> The ZK leader process went down at almost the same time.
> On startup of the ZK process, the znode for the application was available.
> *Current*
> RM goes down and the job fails
> *Expected*
> The submitted job can fail, but RM shutdown is not required