[ https://issues.apache.org/jira/browse/YARN-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389881#comment-16389881 ]

shanyu zhao commented on YARN-8001:
-----------------------------------

The failure came from a Pig job. The client submitted the application
successfully, but a later query for the application's status failed. Who is
going to re-submit the application? Are you saying that when the YARN API is
used to get the application, it will automatically resubmit it?

> Newly created Yarn application ID lost after RM failover
> --------------------------------------------------------
>
>                 Key: YARN-8001
>                 URL: https://issues.apache.org/jira/browse/YARN-8001
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.7.3, 2.9.0
>            Reporter: shanyu zhao
>            Priority: Major
>
> I’ve seen a problem in Hadoop 2.7.3 where a newly submitted YARN
> application was lost after an RM failover. It looks like, when handling an
> application submission, the RM does not write the application to the state
> store (we are using the ZooKeeper-based state store) before it responds to
> the client. Later, the RM failed over to the other RM, and all of its write
> calls to the state store failed. The new RM recovers its state from the state
> store, so this application is lost.
>  
> The symptom is an error message on the client side claiming that a
> previously submitted application ID does not exist:
> 2018-02-22 14:54:50,258 [JobControl] WARN  org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider - Invocation returned exception on [rm1] : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1519310222933_0160' doesn't exist in RM. Please check that the job submission was successful.
>  
> This is a timeline excerpted from the resource manager logs:
> 2018-02-22 14:54:06.7685260    headnode1    Storing application with id application_1519310222933_0160
> 2018-02-22 14:54:06.7685660    headnode1    application_1519310222933_0160 State change from NEW to NEW_SAVING
> 2018-02-22 14:54:17.8924760    headnode1    Transitioning to standby state
> 2018-02-22 14:54:30.3951160    headnode0    Transitioning to active state
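The race described in the report can be modeled with a small sketch. This is a
hypothetical Python model, not YARN code: `StateStore`, `ResourceManager`,
`AppNotFound`, `submit`, `flush_one` and `simulate` are all made-up names. It
assumes (as the report suggests) that the RM acknowledges a submission while
the application is still in NEW_SAVING, that the store write happens
asynchronously, and that a fenced (standby) RM's pending writes are lost.

```python
from collections import deque


class AppNotFound(Exception):
    """Hypothetical stand-in for ApplicationNotFoundException."""


class StateStore:
    """Shared store that survives failover (models the ZooKeeper store)."""

    def __init__(self):
        self.apps = {}


class ResourceManager:
    """Hypothetical RM model: acknowledges a submission before persisting it."""

    def __init__(self, store):
        self.store = store
        self.apps = {}                 # in-memory application states
        self.pending_writes = deque()  # async store writes not yet applied
        self.active = False

    def become_active(self):
        # Recovery: in-memory state is rebuilt only from the shared store.
        self.active = True
        self.apps = dict(self.store.apps)

    def become_standby(self):
        # Assumption: once fenced, queued writes never reach the store.
        self.active = False
        self.pending_writes.clear()

    def submit(self, app_id):
        self.apps[app_id] = "NEW_SAVING"
        self.pending_writes.append(app_id)
        return app_id  # the client sees success before the store write

    def flush_one(self):
        # Models the asynchronous store thread applying one pending write.
        if self.active and self.pending_writes:
            app_id = self.pending_writes.popleft()
            self.store.apps[app_id] = "SUBMITTED"
            self.apps[app_id] = "SUBMITTED"

    def get_report(self, app_id):
        if app_id not in self.apps:
            raise AppNotFound(f"Application with id '{app_id}' doesn't exist in RM.")
        return self.apps[app_id]


def simulate(flush_before_failover):
    """Submit on rm1, fail over to rm2, then query the app from rm2.

    Returns the app state seen by rm2, or None if rm2 does not know the app.
    """
    store = StateStore()
    rm1, rm2 = ResourceManager(store), ResourceManager(store)
    rm1.become_active()
    app_id = rm1.submit("application_1519310222933_0160")
    if flush_before_failover:
        rm1.flush_one()
    rm1.become_standby()
    rm2.become_active()
    try:
        return rm2.get_report(app_id)
    except AppNotFound:
        return None
```

In this model `simulate(False)` returns `None` (the client holds an ID the new
RM has never heard of), while `simulate(True)` returns `"SUBMITTED"`, because
the write reached the store before failover.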



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
