[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862348#comment-13862348 ]

Bikas Saha commented on YARN-1410:
----------------------------------

Clarification: The client can get a reject from the RM not only during submitApplicationContext but also when it is querying for the app status after submitApplicationContext (to check whether the app was accepted). When the RM rejects the app status query, the submitApplicationContext needs to be retried.

To be clear, we are suggesting that the straight line case (normal behavior) is to not specify the appId in the context. YARNClient will add it and submit the app. For older clients that specify the appId, we will replace the appId upon RM failover if the RM does not recognize this appId. In that case, how do we notify the user that the appId has changed and that they need to update all their usages of the appId?

Have we considered the alternative of making the RM accept the appId in the context? It can assume that the submission is being retried after failover from a previous RM. What are the cons of this? If we can make this approach work then we don't need to deprecate anything, and it's probably cleaner for the user, since changing the appId can lead to a poor user experience.

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410.1.patch
>
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the user.
> The client may have obtained an appId from an RM, the RM may have failed
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create
> app id) the new RM may reject the app submission resulting in unexpected
> failure on the client side.
> The same may happen for other 2 step client API operations.
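
To make the two options in the comment concrete, here is a rough toy simulation of them. It is only a sketch and not YARN code: ToyRM, AppId, AppRejected, submit_with_retry and the accept_foreign_ids flag are all hypothetical names invented for illustration, and the real client and RM APIs differ. It assumes an appId is just a (cluster timestamp, sequence number) pair and that a strict RM rejects any appId carrying another RM's timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppId:
    """Hypothetical stand-in for an application id."""
    cluster_timestamp: int
    seq: int


class AppRejected(Exception):
    """Raised by the toy RM when it does not recognize an appId."""


class ToyRM:
    """Toy resource manager; not the real YARN RM."""

    def __init__(self, cluster_timestamp, accept_foreign_ids=False):
        self.cluster_timestamp = cluster_timestamp
        # Assumption: this flag models "make the RM accept the appId in the context".
        self.accept_foreign_ids = accept_foreign_ids
        self.apps = {}
        self._next_seq = 1

    def get_new_application(self):
        app_id = AppId(self.cluster_timestamp, self._next_seq)
        self._next_seq += 1
        return app_id

    def submit(self, app_id, context):
        foreign = app_id.cluster_timestamp != self.cluster_timestamp
        if foreign and not self.accept_foreign_ids:
            raise AppRejected(f"unknown appId {app_id}")
        # Storing by appId makes a retried submission idempotent.
        self.apps[app_id] = context

    def has_app(self, app_id):
        return app_id in self.apps


def submit_with_retry(get_active_rm, context, app_id=None, max_attempts=3):
    """Submit and return the appId the app was finally accepted under.

    get_active_rm is a callable returning whichever toy RM is currently
    active, so a failover can happen between any two calls.
    """
    for _ in range(max_attempts):
        rm = get_active_rm()
        if app_id is None:
            app_id = rm.get_new_application()
        try:
            rm.submit(app_id, context)
            # The status query may land on a different RM after failover;
            # if that RM does not know the app, loop and submit again.
            if get_active_rm().has_app(app_id):
                return app_id
        except AppRejected:
            # Option 1 from the comment: drop the stale appId and get a new one.
            app_id = None
    raise RuntimeError("app submission failed after retries")
```

With a strict ToyRM the caller gets back a different appId than the one it passed in, which is the notification problem raised above. With accept_foreign_ids=True the original appId survives the failover and the caller does not have to update anything.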