[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914995#comment-13914995
 ] 

Vinod Kumar Vavilapalli commented on YARN-1410:
-----------------------------------------------

h4. Questions
When can appId be null in the submission-context?

h4. Documentation
Though it isn't an incompatible change, clients need extra code to handle 
fail-over. Let's make sure we document that clearly.

Can we document the appID-related methods in SubmitApplicationResponse.java, 
GetNewApplicationResponse.java, the ApplicationClientProtocol.getNewApplication(..) 
API and the ApplicationClientProtocol.submitApplication(..) API to clearly 
indicate what clients need to do when we return a new appID?
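
To make the documentation point concrete, here is a rough sketch of the 
client-side handling I have in mind. It is only an illustration: AppId, FakeRm 
and submit() are made-up stand-ins, not the real ApplicationClientProtocol 
types, and it assumes submitApplication hands back the appID the RM actually 
used.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverAwareClient {

    /** Hypothetical stand-in for an application ID: cluster timestamp plus sequence. */
    static final class AppId {
        final long clusterTimestamp;
        final int sequence;

        AppId(long clusterTimestamp, int sequence) {
            this.clusterTimestamp = clusterTimestamp;
            this.sequence = sequence;
        }

        @Override
        public String toString() {
            return "application_" + clusterTimestamp + "_" + sequence;
        }
    }

    /** Hypothetical in-memory RM; NOT the real ApplicationClientProtocol. */
    static final class FakeRm {
        private final long clusterTimestamp;
        private final AtomicInteger counter = new AtomicInteger();

        FakeRm(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        AppId getNewApplication() {
            return new AppId(clusterTimestamp, counter.incrementAndGet());
        }

        /** Keeps the proposed ID if this RM issued it; otherwise hands back a fresh one. */
        AppId submitApplication(AppId proposed) {
            boolean issuedHere = proposed != null
                    && proposed.clusterTimestamp == clusterTimestamp
                    && proposed.sequence >= 1
                    && proposed.sequence <= counter.get();
            return issuedHere ? proposed : getNewApplication();
        }
    }

    /** Submits with the cached ID and returns the ID the RM reports back. */
    static AppId submit(FakeRm activeRm, AppId cached) {
        // The returned ID is authoritative; the cached one may be stale after fail-over.
        return activeRm.submitApplication(cached);
    }

    public static void main(String[] args) {
        FakeRm rm1 = new FakeRm(1000L);
        FakeRm rm2 = new FakeRm(2000L); // becomes active after fail-over
        AppId cached = rm1.getNewApplication();
        AppId effective = submit(rm2, cached);
        System.out.println("cached=" + cached + " effective=" + effective);
    }
}
```

The point for the javadoc is the last step: a client should carry on with the 
ID returned from the submit call rather than the one it cached from 
getNewApplication.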

h4. Other changes needed
Does DistributedShell need any changes to reflect the potential change in appId 
after fail-over? If so, let's fix that here too. Please file a MapReduce ticket 
to fix MapReduce as well if needed. Fixes are needed in either case if anyone 
caches the appId from GetNewApplicationResponse.

h4. Beyond this JIRA
Orthogonal to this ticket, we need to make sure clients don't pass in invalid 
application-IDs as part of the submission-context. This can be validated by 
simply looking at our counter, and maybe also by caching recently used appIDs 
(at least within a single RM). I remember we had a JIRA for this somewhere. We 
also need throttles so that malicious clients can't exhaust appIDs.
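
A minimal sketch of the counter-plus-cache check, just to show the idea. 
AppIdValidator and its methods are hypothetical names, not existing RM code. 
The bounded cache only catches reuse within its window, and throttling is not 
covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AppIdValidator {
    private final long clusterTimestamp;
    private final Map<Integer, Boolean> recentlyUsed;
    private int lastIssued = 0;

    AppIdValidator(long clusterTimestamp, final int cacheSize) {
        this.clusterTimestamp = clusterTimestamp;
        // Insertion-ordered map that evicts the oldest entry once it exceeds cacheSize.
        this.recentlyUsed = new LinkedHashMap<Integer, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Hands out the next sequence number, as getNewApplication would. */
    synchronized int issue() {
        return ++lastIssued;
    }

    /** Accepts an ID only if this RM issued it and it was not recently used. */
    synchronized boolean accept(long timestamp, int sequence) {
        if (timestamp != clusterTimestamp) {
            return false; // issued by a different RM incarnation
        }
        if (sequence < 1 || sequence > lastIssued) {
            return false; // never handed out by our counter
        }
        if (recentlyUsed.containsKey(sequence)) {
            return false; // already submitted within the cache window
        }
        recentlyUsed.put(sequence, Boolean.TRUE);
        return true;
    }

    public static void main(String[] args) {
        AppIdValidator validator = new AppIdValidator(1000L, 100);
        int seq = validator.issue();
        System.out.println("first use accepted: " + validator.accept(1000L, seq));
        System.out.println("reuse accepted: " + validator.accept(1000L, seq));
        System.out.println("unissued accepted: " + validator.accept(1000L, seq + 5));
    }
}
```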

> Handle RM fails over after getApplicationID() and before submitApplication().
> -----------------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
> YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
> YARN-1410.5.patch, YARN-1410.6.patch, YARN-1410.7.patch, YARN-1410.8.patch, 
> YARN-1410.9.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed 
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create 
> app id) the new RM may reject the app submission resulting in unexpected 
> failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
