[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862348#comment-13862348
 ] 

Bikas Saha commented on YARN-1410:
----------------------------------

Clarification: The client can get a reject from the RM not only during
submitApplicationContext but also when it queries for the app status after
submitApplicationContext (to check whether the app was accepted). When the RM
rejects the app status query, the submitApplicationContext needs to be
retried.
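
To make the retry flow concrete, here is a minimal Python sketch. It is a
simulation only, not YARN code: FakeRM, RejectedError, and submit_with_retry
are hypothetical names, and the sketch assumes that an appId is a
(cluster timestamp, sequence number) pair and that an RM rejects any appId it
does not recognize.

```python
from dataclasses import dataclass, field


class RejectedError(Exception):
    """Raised when the (simulated) RM does not recognize an appId."""


@dataclass
class FakeRM:
    # Hypothetical stand-in for an RM; each instance has its own timestamp.
    cluster_timestamp: int
    apps: dict = field(default_factory=dict)
    next_seq: int = 1

    def new_app_id(self):
        app_id = (self.cluster_timestamp, self.next_seq)
        self.next_seq += 1
        return app_id

    def submit(self, app_id, context):
        # An appId created under a different cluster timestamp is rejected.
        if app_id[0] != self.cluster_timestamp:
            raise RejectedError(app_id)
        self.apps[app_id] = context

    def status(self, app_id):
        # A status query for an unknown appId is rejected as well.
        if app_id not in self.apps:
            raise RejectedError(app_id)
        return "ACCEPTED"


def submit_with_retry(get_active_rm, context, max_attempts=3):
    """Submit, then confirm; a reject at either step restarts the submission."""
    for _ in range(max_attempts):
        rm = get_active_rm()
        app_id = rm.new_app_id()
        try:
            rm.submit(app_id, context)
            # Failover may happen here, so resolve the active RM again.
            get_active_rm().status(app_id)
            return app_id
        except RejectedError:
            continue
    raise RuntimeError("app submission failed after retries")
```

If the RM fails over between the submit and the status query, the status query
is rejected and the loop resubmits with an appId from the new RM, so the caller
ends up with a different appId than the one first issued.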

To be clear, we are suggesting that the straight-line case (normal behavior) is
to not specify the appId in the context. YARNClient will add it and submit the
app. For older clients that specify the appId, we will replace the appId upon RM
failover if the RM does not recognize it. In that case, how do we notify the
user that the appId has changed and that they need to update all their usages
of the appId?

Have we considered the alternative of making the RM accept the appId in the
context? It can assume that the submission is being retried after failover from
a previous RM. What are the cons of this? If we can make this approach work,
then we don't need to deprecate anything, and it's probably cleaner for the user,
since changing the appId can lead to a poor user experience.
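
As a rough illustration of this alternative, the Python sketch below simulates
an RM that accepts whatever appId arrives in the context. AcceptingRM is a
hypothetical name, not a YARN class, and the (cluster timestamp, sequence
number) appId shape is an assumption made for the example.

```python
class AcceptingRM:
    # Hypothetical stand-in for an RM that trusts the appId in the context.
    def __init__(self, cluster_timestamp):
        self.cluster_timestamp = cluster_timestamp
        self.apps = {}

    def submit(self, app_id, context):
        # An unknown appId is assumed to come from a previous RM and is
        # accepted as-is; resubmitting a known appId leaves the first
        # submission in place.
        self.apps.setdefault(app_id, context)
        return app_id


old_app_id = (100, 1)  # issued by the RM that later failed
new_rm = AcceptingRM(cluster_timestamp=200)
kept_app_id = new_rm.submit(old_app_id, "ctx")
```

Here the client keeps the appId it was originally given, so nothing on the user
side has to be updated after failover.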

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410.1.patch
>
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed 
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create 
> app id) the new RM may reject the app submission resulting in unexpected 
> failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
