shanyu zhao created YARN-8001:
---------------------------------

             Summary: Newly created Yarn application ID lost after RM failover
                 Key: YARN-8001
                 URL: https://issues.apache.org/jira/browse/YARN-8001
             Project: Hadoop YARN
          Issue Type: Bug
          Components: RM
    Affects Versions: 2.9.0, 2.7.3
            Reporter: shanyu zhao


I’ve seen a problem in Hadoop 2.7.3 where a newly submitted YARN application 
was lost after an RM failover. It looks like, when handling an application 
submission, the RM does not write the application to the state store (we are 
using the ZooKeeper-based state store) before it responds to the client. The RM 
then failed over to another RM, and all of its write calls to the state store 
failed. The new RM recovered its state from the state store, so this app was lost. 
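
The suspected sequence can be illustrated with a toy model. This is only a sketch of the hypothesis in this report, not ResourceManager code: the classes StateStore and ToyRM, the persist_before_ack flag, and the fencing rule are all hypothetical names and simplifications introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical stand-in for the shared state store."""
    apps: set = field(default_factory=set)
    writable_by: str = "rm1"  # assumption: only the active RM may write

    def store(self, rm_name, app_id):
        if rm_name != self.writable_by:
            raise IOError(f"{rm_name} is no longer active; write rejected")
        self.apps.add(app_id)


class ToyRM:
    """Hypothetical, heavily simplified resource manager."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.known_apps = set()
        self.pending_writes = []

    def submit(self, app_id, persist_before_ack):
        self.known_apps.add(app_id)
        if persist_before_ack:
            self.store.store(self.name, app_id)
        else:
            self.pending_writes.append(app_id)  # write deferred past the ack
        return app_id  # the client now believes the submission succeeded

    def flush(self):
        """Attempt deferred writes; return the IDs that could not be stored."""
        failed = []
        for app_id in self.pending_writes:
            try:
                self.store.store(self.name, app_id)
            except IOError:
                failed.append(app_id)
        self.pending_writes = []
        return failed

    def recover(self):
        # A newly active RM only knows what reached the state store.
        self.known_apps = set(self.store.apps)


def app_survives_failover(persist_before_ack):
    store = StateStore()
    rm1, rm2 = ToyRM("rm1", store), ToyRM("rm2", store)
    app_id = rm1.submit("application_1519310222933_0160", persist_before_ack)
    store.writable_by = "rm2"  # failover happens before the deferred write
    rm1.flush()                # rm1's late write is rejected
    rm2.recover()
    return app_id in rm2.known_apps
```

In this model, app_survives_failover(False) returns False because the acknowledged application never reaches the store, while app_survives_failover(True) returns True.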

 

The symptom is an error message on the client side claiming that a previously 
submitted application ID does not exist:

2018-02-22 14:54:50,258 [JobControl] WARN  
org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider - 
Invocation returned exception on [rm1] : 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1519310222933_0160' doesn't exist in RM. Please check that 
the job submission was successful.

 

The following timeline is excerpted from the ResourceManager logs:

2018-02-22 14:54:06.7685260    headnode1    Storing application with id application_1519310222933_0160

2018-02-22 14:54:06.7685660    headnode1    application_1519310222933_0160 State change from NEW to NEW_SAVING

2018-02-22 14:54:17.8924760    headnode1    Transitioning to standby state

2018-02-22 14:54:30.3951160    headnode0    Transitioning to active state
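
To make the intervals in the timeline above easier to read, the gaps between events can be computed with a short script. This is a minimal sketch: the event labels and the helper names parse_ts and gaps are my own, and the timestamps are copied from the log excerpt (the seventh fractional digit is dropped because Python's %f accepts at most six).

```python
from datetime import datetime

# Timestamps copied from the RM log excerpt; labels are shorthand.
EVENTS = [
    ("2018-02-22 14:54:06.7685660", "headnode1: NEW -> NEW_SAVING"),
    ("2018-02-22 14:54:17.8924760", "headnode1: transitioning to standby"),
    ("2018-02-22 14:54:30.3951160", "headnode0: transitioning to active"),
]


def parse_ts(ts):
    # Keep "YYYY-MM-DD HH:MM:SS." plus six fractional digits (26 characters).
    return datetime.strptime(ts[:26], "%Y-%m-%d %H:%M:%S.%f")


def gaps(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    result = []
    for (ts_a, label_a), (ts_b, label_b) in zip(events, events[1:]):
        delta = parse_ts(ts_b) - parse_ts(ts_a)
        result.append((label_a, label_b, delta.total_seconds()))
    return result


if __name__ == "__main__":
    for start, end, seconds in gaps(EVENTS):
        print(f"{seconds:8.3f} s  {start}  =>  {end}")
```

With these timestamps, the application sat in NEW_SAVING for roughly 11.1 seconds before headnode1 went to standby, and headnode0 became active about 12.5 seconds after that.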



