[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911156#comment-13911156
 ] 

Bikas Saha commented on YARN-1410:
----------------------------------

Sounds good.

Let's track 2) in a separate new jira. Xuan, can you please open one?

For 1), I believe the change would be limited to allowing the new RM to accept 
an unknown application id in submitApplication(), under the assumption that 
the previous RM generated the app id and then died either a) before the client 
even attempted to submit or b) before saving the app in the store, so the 
client is now retrying against the new RM.

We can remove the idempotent etc. annotations and keep the change limited to 
the initial proposal: 1) create a new API that accepts an 
app-submission-context in which the user does not supply the app id; 2) allow 
the RM to accept an app-submission-context that has an unknown app id. This is 
based on the comment:
https://issues.apache.org/jira/browse/YARN-1410?focusedCommentId=13864516&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13864516

The solution will be incomplete, since the old RM could have saved the state 
and the new RM would then find that the submission request conflicts with an 
existing app id. That's why we branched off into that discussion. For now, we 
can handle this in the following manner: 1) if the state of the existing app 
is NEW, just accept the submitApplication() (effectively emulating the 
RetryCache); 2) if the state of the app != NEW, fail the submitApplication(). 
Or we could choose to solve this in the new jira being created.
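
A minimal sketch of the handling described above, to make the three cases 
concrete. It is not the actual RM code: SubmissionSketch, AppState, Result and 
recover() are hypothetical names, app ids are plain strings, and an in-memory 
map stands in for the state store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; names and types are assumptions, not YARN classes.
class SubmissionSketch {
    enum AppState { NEW, SUBMITTED, RUNNING }
    enum Result { ACCEPTED, REJECTED }

    // Apps known to this RM, keyed by app id (stand-in for the state store).
    private final Map<String, AppState> apps = new HashMap<>();

    // Simulates an app recovered from state saved by the previous RM.
    void recover(String appId, AppState state) {
        apps.put(appId, state);
    }

    Result submitApplication(String appId) {
        AppState existing = apps.get(appId);
        if (existing == null) {
            // Unknown app id: assume the previous RM generated it and died
            // before the submission was saved, so accept it.
            apps.put(appId, AppState.NEW);
            return Result.ACCEPTED;
        }
        if (existing == AppState.NEW) {
            // Known app still in NEW: treat as a client retry and accept
            // (roughly what a RetryCache would give us).
            return Result.ACCEPTED;
        }
        // Known app that has already progressed past NEW: fail the submit.
        return Result.REJECTED;
    }
}
```

The sketch does not model the later state transitions of an accepted app; it 
only shows the accept/reject decision at submission time.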



> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
> YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, 
> YARN-1410.5.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed 
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create 
> app id) the new RM may reject the app submission resulting in unexpected 
> failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
