[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924713#comment-13924713
 ] 

Vinod Kumar Vavilapalli commented on YARN-1410:
-----------------------------------------------

Okay, that makes sense - we can't break existing apps because of this.

Restating for others who are listening: This patch isn't adding any more code 
than what is already present w.r.t. handling of appIDs. The scenario in the 
original description
bq. Since the new RM has a different notion of cluster timestamp (used to 
create app id) the new RM may reject the app submission resulting in unexpected 
failure on the client side.
clearly doesn't happen at present (before the patch) because we don't have 
AppID validations in RM. The solution to the validation, when we get to it, is 
to make the active and standby RMs recognize cluster-timestamps of (at least 
some of) their own past generations as well as those of others, maybe through 
state-store persistence.
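
To make that last idea concrete, here is a rough sketch of what such a check 
could look like. It is only an illustration under assumptions, not part of the 
patch: the class KnownClusterTimestamps and its methods are hypothetical names, 
the set of known timestamps is assumed to be loaded from some shared state 
store (not shown), and the appId is parsed from its usual string form 
application_<clusterTimestamp>_<sequence>.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names exist in YARN.
final class KnownClusterTimestamps {
    // Cluster timestamps of every RM generation this RM should recognize.
    // Assumption: in a real RM these would be persisted in, and reloaded
    // from, a state store shared by the active and standby RMs.
    private final Set<Long> known = new TreeSet<>();

    // Remember the cluster timestamp of one RM generation.
    void record(long clusterTimestamp) {
        known.add(clusterTimestamp);
    }

    // Returns true if appId has the form application_<timestamp>_<sequence>
    // and its timestamp belongs to a recorded generation.
    boolean recognizes(String appId) {
        String[] parts = appId.split("_");
        if (parts.length != 3 || !parts[0].equals("application")) {
            return false;
        }
        try {
            Long.parseLong(parts[2]); // sequence number must be numeric
            return known.contains(Long.parseLong(parts[1]));
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        KnownClusterTimestamps timestamps = new KnownClusterTimestamps();
        timestamps.record(1394064000000L); // generation of the previous RM
        timestamps.record(1394067600000L); // generation of the current RM

        // An appId handed out before the failover is still recognized.
        System.out.println(
            timestamps.recognizes("application_1394064000000_0001"));
        // An appId with an unknown cluster timestamp is not.
        System.out.println(
            timestamps.recognizes("application_1300000000000_0001"));
    }
}
```

With something along these lines, an appId obtained from the old RM would 
still pass validation on the new RM after failover, as long as the old RM's 
cluster timestamp had been recorded.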

The existing patch looks fine enough to me. Checking this in.

> Handle RM fails over after getApplicationID() and before submitApplication().
> -----------------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
> YARN-1410.10.patch, YARN-1410.10.patch, YARN-1410.2.patch, YARN-1410.2.patch, 
> YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, 
> YARN-1410.7.patch, YARN-1410.8.patch, YARN-1410.9.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed 
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create 
> app id) the new RM may reject the app submission resulting in unexpected 
> failure on the client side.
> The same may happen for other two-step client API operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
