shanyu zhao created YARN-8001:
---------------------------------
Summary: Newly created Yarn application ID lost after RM failover
Key: YARN-8001
URL: https://issues.apache.org/jira/browse/YARN-8001
Project: Hadoop YARN
Issue Type: Bug
Components: RM
Affects Versions: 2.9.0, 2.7.3
Reporter: shanyu zhao
I've seen a problem in Hadoop 2.7.3 where a newly submitted YARN application
was lost after an RM failover. It looks like, when handling an application
submission, the RM does not wait for the write to the state store to complete
(we are using the ZooKeeper-based state store) before it responds to the
client. The RM then failed over to another RM, and all of its write calls to
the state store failed. The new RM recovers its state from the state store, so
this app is lost.
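To make the suspected sequence concrete, here is a toy model in Java. It is
only a sketch of the race as I understand it, not the actual RM code: ToyRm,
ToyStateStore, and all of their methods are hypothetical names that do not
exist in Hadoop, and dropping the queued writes on failover is an assumption
standing in for the failed state-store writes.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Toy model only; none of these classes exist in Hadoop. */
public class LostAppSketch {

    /** Hypothetical stand-in for the shared (e.g. ZooKeeper-based) state store. */
    static class ToyStateStore {
        private final Set<String> persisted = new HashSet<>();

        void write(String appId) { persisted.add(appId); }

        Set<String> load() { return new HashSet<>(persisted); }
    }

    /** Hypothetical stand-in for one ResourceManager instance. */
    static class ToyRm {
        private final ToyStateStore store;
        private final Set<String> knownApps = new HashSet<>();
        private final Queue<String> pendingWrites = new ArrayDeque<>();
        private boolean active;

        ToyRm(ToyStateStore store) { this.store = store; }

        /** On becoming active, the RM knows only what the store holds. */
        void becomeActive() {
            knownApps.clear();
            knownApps.addAll(store.load());
            active = true;
        }

        /** Assumption: writes still queued at failover never reach the store. */
        void becomeStandby() {
            active = false;
            pendingWrites.clear();
            knownApps.clear();
        }

        /** Acknowledges the submission before the write is durable. */
        String submit(String appId) {
            if (!active) {
                throw new IllegalStateException("RM is not active");
            }
            knownApps.add(appId);
            pendingWrites.add(appId); // app sits in "NEW_SAVING" here
            return appId;             // client already has its answer
        }

        /** Drains queued writes; in the failing case this never runs in time. */
        void flushPendingWrites() {
            while (!pendingWrites.isEmpty()) {
                store.write(pendingWrites.remove());
            }
        }

        boolean knows(String appId) { return knownApps.contains(appId); }
    }

    public static void main(String[] args) {
        ToyStateStore store = new ToyStateStore();
        ToyRm rm1 = new ToyRm(store);
        ToyRm rm2 = new ToyRm(store);

        rm1.becomeActive();
        String id = rm1.submit("application_1519310222933_0160");
        rm1.becomeStandby(); // failover before the write is flushed
        rm2.becomeActive();  // recovers from the store only

        System.out.println("new active RM knows " + id + ": " + rm2.knows(id));
    }
}
```

In this model the new active RM reports that it does not know the application,
which matches the client-side symptom. If flushPendingWrites() is called before
becomeStandby(), the application survives the failover.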
The symptom is an error message on the client side claiming that a previously
submitted application ID does not exist:
2018-02-22 14:54:50,258 [JobControl] WARN
org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider -
Invocation returned exception on [rm1] :
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application
with id 'application_1519310222933_0160' doesn't exist in RM. Please check that
the job submission was successful.
This is a timeline excerpted from the resource manager logs:
2018-02-22 14:54:06.7685260 headnode1 Storing application with id application_1519310222933_0160
2018-02-22 14:54:06.7685660 headnode1 application_1519310222933_0160 State change from NEW to NEW_SAVING
2018-02-22 14:54:17.8924760 headnode1 Transitioning to standby state
2018-02-22 14:54:30.3951160 headnode0 Transitioning to active state