[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911156#comment-13911156 ]
Bikas Saha commented on YARN-1410:
----------------------------------
Sounds good.
Let's track 2) in a separate new JIRA. Xuan, can you please open one?
For 1), I believe the change would be limited to allowing the new RM to accept
an unknown application id in submitApplication(). This assumes that the
previous RM generated the app id and then died either (a) before the client
even attempted to submit, or (b) before saving the app in the store, so the
client is now retrying against the new RM.
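A minimal sketch of the difference, assuming a simplified model in which an app id is just (cluster timestamp, sequence number) and the RM keeps apps in an in-memory map. The names UnknownAppIdSketch, ResourceManager, submitStrict and submitLenient are hypothetical illustrations, not YARN's real ClientRMService code, and the "strict" check is an assumed stand-in for whatever makes the new RM reject the id today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model for illustration only; not YARN's real classes.
class UnknownAppIdSketch {

    // Stand-in for an application id: the cluster timestamp of the RM that
    // generated it plus a per-RM sequence number.
    record AppId(long clusterTimestamp, int id) {}

    static final class ResourceManager {
        final long clusterTimestamp;               // differs after a failover
        final Map<AppId, String> apps = new HashMap<>();

        ResourceManager(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        // Assumed strict behavior: reject any id this RM did not generate.
        boolean submitStrict(AppId appId) {
            if (appId.clusterTimestamp() != clusterTimestamp) {
                return false;
            }
            return apps.putIfAbsent(appId, "NEW") == null;
        }

        // Proposed behavior: accept an unknown id, assuming a previous RM
        // generated it before dying. Ids already known to this RM are
        // rejected here; conflict handling is a separate question.
        boolean submitLenient(AppId appId) {
            return apps.putIfAbsent(appId, "NEW") == null;
        }
    }
}
```

The only difference between the two methods is the timestamp check; dropping it is what lets a client holding an id from the failed RM complete its submission on the new one.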
We can remove the idempotent etc. annotations and keep the change limited to
the initial proposal: 1) create a new API that accepts an app-submission-context
in which the user does not supply the app id, and 2) allow the RM to accept an
app-submission-context that has an unknown app id. As pointed out in this
comment:
https://issues.apache.org/jira/browse/YARN-1410?focusedCommentId=13864516&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13864516
the solution will be incomplete, since the old RM could have saved the state
and the new RM would then see a conflicting app-submission request for an
existing app-id. That's why we branched off into that discussion. For now, we
handle this as follows: 1) if the existing app is in state NEW, just accept the
submitApplication() (effectively emulating the RetryCache); 2) if the app's
state is not NEW, fail the submitApplication(). Alternatively, we could solve
this in the new JIRA being created.
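The proposal and the interim conflict policy can be sketched together as below, again assuming a simplified in-memory RM. SubmitConflictSketch, AppState, Outcome, submitWithoutId and submit are hypothetical names; AppState is a reduced stand-in for the RM's real app states, and ACCEPTED_AS_RETRY only models "effectively emulating the RetryCache", not the actual org.apache.hadoop.ipc.RetryCache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model for illustration only; not YARN's real classes.
class SubmitConflictSketch {

    // Reduced stand-in for the RM's application states.
    enum AppState { NEW, SUBMITTED, RUNNING, FINISHED }

    enum Outcome { ACCEPTED, ACCEPTED_AS_RETRY, REJECTED }

    record AppId(long clusterTimestamp, int id) {}

    static final class ResourceManager {
        private final long clusterTimestamp;
        private int nextId = 1;
        final Map<AppId, AppState> apps = new HashMap<>();

        ResourceManager(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        // Proposal 1): single-step submission; the RM assigns the id itself,
        // so there is no window in which a client holds an id across failover.
        AppId submitWithoutId() {
            AppId appId = new AppId(clusterTimestamp, nextId++);
            apps.put(appId, AppState.NEW);
            return appId;
        }

        // Proposal 2) plus the interim conflict policy for a client-supplied id.
        Outcome submit(AppId appId) {
            AppState existing = apps.get(appId);
            if (existing == null) {
                // Unknown id: assume a previous RM generated it.
                apps.put(appId, AppState.NEW);
                return Outcome.ACCEPTED;
            }
            if (existing == AppState.NEW) {
                // Known but still NEW: treat the request as a retry.
                return Outcome.ACCEPTED_AS_RETRY;
            }
            // The app has already moved past NEW: fail the submission.
            return Outcome.REJECTED;
        }
    }
}
```

The sketch treats a duplicate in state NEW as a successful retry without comparing the two submission contexts; deciding whether they must match is part of what the comment defers to the new JIRA.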
> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
> Key: YARN-1410
> URL: https://issues.apache.org/jira/browse/YARN-1410
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Xuan Gong
> Attachments: YARN-1410-outline.patch, YARN-1410.1.patch,
> YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch,
> YARN-1410.5.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed
> over, and the client may then submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create
> the app id), the new RM may reject the app submission, resulting in an
> unexpected failure on the client side.
> The same may happen for other two-step client API operations.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)