[ https://issues.apache.org/jira/browse/YARN-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389881#comment-16389881 ]
shanyu zhao commented on YARN-8001:
-----------------------------------
The failing job is a Pig job. The client submitted the application
successfully, but its later query for the application's status failed. Who is
going to re-submit the application? Are you saying that using the YARN API to
get the application will automatically resubmit it?
> Newly created Yarn application ID lost after RM failover
> --------------------------------------------------------
>
> Key: YARN-8001
> URL: https://issues.apache.org/jira/browse/YARN-8001
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.7.3, 2.9.0
> Reporter: shanyu zhao
> Priority: Major
>
> I’ve seen a problem in Hadoop 2.7.3 where a newly submitted YARN application
> was lost after an RM failover. It looks like, when handling an application
> submission, the RM does not write the application to the state store (we are
> using the ZooKeeper-based state store) before it responds to the client. The
> RM then failed over to another RM, and all of its write calls to the state
> store failed. The new RM recovered its state from the state store, so this
> app was lost.
>
> The symptom is an error message on the client side claiming that a previously
> submitted application ID does not exist:
> 2018-02-22 14:54:50,258 [JobControl] WARN
> org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider -
> Invocation returned exception on [rm1] :
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application
> with id 'application_1519310222933_0160' doesn't exist in RM. Please check
> that the job submission was successful.
>
> This is a timeline excerpted from the resource manager logs:
> 2018-02-22 14:54:06.7685260 headnode1 Storing application with id
> application_1519310222933_0160
> 2018-02-22 14:54:06.7685660 headnode1
> application_1519310222933_0160 State change from NEW to NEW_SAVING
> 2018-02-22 14:54:17.8924760 headnode1 Transitioning to standby state
> 2018-02-22 14:54:30.3951160 headnode0 Transitioning to active state
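
The sequence in the description and timeline above can be reproduced with a small toy model. This is only a sketch under the reporter's hypothesis (the RM acknowledges a submission before the state-store write completes); it is not the actual ResourceManager code. The classes StateStore and ResourceManager, their methods, and the "fenced" set are hypothetical names invented for illustration, and the fencing step merely stands in for "all write calls to the state store failed".

```python
class StateStore:
    """Toy stand-in for the ZooKeeper-based state store (hypothetical)."""

    def __init__(self):
        self.apps = set()      # application IDs that were durably saved
        self.fenced = set()    # names of RMs whose writes are rejected

    def write(self, rm_name, app_id):
        if rm_name in self.fenced:
            raise OSError(f"{rm_name}: write of {app_id} rejected")
        self.apps.add(app_id)


class ResourceManager:
    """Toy RM that acknowledges a submission before persisting it."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.apps = {}         # in-memory view: app_id -> state
        self.pending = []      # submissions not yet written to the store

    def submit(self, app_id):
        # Assumption under test: the client gets its answer here,
        # while the app is still in NEW_SAVING and not yet durable.
        self.apps[app_id] = "NEW_SAVING"
        self.pending.append(app_id)
        return app_id

    def flush(self):
        # Deferred write to the state store; fails once this RM is fenced.
        for app_id in list(self.pending):
            self.store.write(self.name, app_id)
            self.apps[app_id] = "SUBMITTED"
            self.pending.remove(app_id)

    def recover(self):
        # A newly active RM only knows what reached the state store.
        self.apps = {app_id: "SUBMITTED" for app_id in self.store.apps}

    def get_report(self, app_id):
        if app_id not in self.apps:
            raise KeyError(f"Application with id '{app_id}' doesn't exist in RM")
        return self.apps[app_id]


def simulate():
    store = StateStore()
    headnode1 = ResourceManager("headnode1", store)
    headnode0 = ResourceManager("headnode0", store)

    app_id = headnode1.submit("application_1519310222933_0160")

    # headnode1 transitions to standby before its write completes.
    store.fenced.add("headnode1")
    try:
        headnode1.flush()
        write_failed = False
    except OSError:
        write_failed = True

    # headnode0 transitions to active and recovers from the store.
    headnode0.recover()
    try:
        headnode0.get_report(app_id)
        found = True
    except KeyError:
        found = False
    return write_failed, found


if __name__ == "__main__":
    print(simulate())
```

In this model the pending write fails and the newly active RM has no record of the application, which matches the client-side symptom. Whether the real RM behaves this way is exactly the question the issue raises.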