[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870094#comment-13870094
 ] 

Xuan Gong commented on YARN-1410:
---------------------------------

bq. A slightly orthogonal question - the AtMostOnce/Idempotent annotations are 
honored only in the failover case and not when the RM is restarted. Maybe we 
should fix up the RetryPolicy in RMProxy to use these annotations in 
shouldRetry. We should probably do this in a separate JIRA though.

Yes, RM restart does not use the AtMostOnce/Idempotent annotations. But I am 
not sure why we would need these annotations for RM restart. Currently, for RM 
restart, we use RetryPolicies.retryByException to build the RetryPolicy (we 
only handle ConnectionException and IOException), and we use 
RetryPolicies.retryUpToMaximumTimeWithFixedSleep as the policy for those 
exceptions, which gives RetryDecision.RETRY. We also use 
DefaultFailoverProxyProvider as the proxy provider. Isn't that enough to cover 
the RM restart case?
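
To make the restart-time behaviour concrete, here is a minimal, self-contained 
sketch of a policy that is chosen by exception type and retries until a time 
budget runs out. It is only a model: RestartRetrySketch, Policy, Decision and 
FAIL_FAST are hypothetical names, not the Hadoop classes. It also assumes that 
java.net.ConnectException stands in for the connection exception, that lookup 
is by exact exception class, and that unlisted exceptions fail immediately.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the restart-time retry logic.
// None of these types are the real Hadoop RetryPolicies classes.
final class RestartRetrySketch {
    enum Decision { FAIL, RETRY }

    interface Policy {
        Decision shouldRetry(Exception e, long elapsedMillis);
    }

    // Assumed default for exceptions that are not listed: fail immediately.
    static final Policy FAIL_FAST = (e, elapsedMillis) -> Decision.FAIL;

    // Keep retrying until the total time budget is spent, then fail.
    static Policy retryUpToMaximumTime(long maxMillis) {
        return (e, elapsedMillis) ->
            elapsedMillis < maxMillis ? Decision.RETRY : Decision.FAIL;
    }

    // Pick a policy by the exact exception class; fall back to the default.
    static Policy retryByException(
            Policy defaultPolicy, Map<Class<? extends Exception>, Policy> byType) {
        return (e, elapsedMillis) ->
            byType.getOrDefault(e.getClass(), defaultPolicy)
                  .shouldRetry(e, elapsedMillis);
    }

    // Wire the two together the way the comment above describes.
    static Policy restartPolicy(long maxWaitMillis) {
        Policy keepRetrying = retryUpToMaximumTime(maxWaitMillis);
        Map<Class<? extends Exception>, Policy> byType = new HashMap<>();
        byType.put(ConnectException.class, keepRetrying);
        byType.put(IOException.class, keepRetrying);
        return retryByException(FAIL_FAST, byType);
    }

    public static void main(String[] args) {
        Policy policy = restartPolicy(10_000);
        // Within the time budget: prints RETRY
        System.out.println(policy.shouldRetry(new ConnectException("RM down"), 2_000));
        // Budget exhausted: prints FAIL
        System.out.println(policy.shouldRetry(new ConnectException("RM down"), 12_000));
        // Exception type not listed: prints FAIL
        System.out.println(policy.shouldRetry(new IllegalStateException("bug"), 0));
    }
}
```

Note that nothing in this path ever produces a failover decision, and nothing 
looks at the method being invoked.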

Also, the AtMostOnce and Idempotent annotations are only consulted when the 
RetryDecision is FAILOVER_AND_RETRY. That is another reason why we do not use 
them in the RM restart case (for RM restart, the valid RetryDecision is 
RETRY).
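
A rough sketch of that distinction follows, again as a self-contained model 
rather than the actual Hadoop code. The annotations, the ExampleProtocol 
interface, its lookup/mutate methods and the decide helper are all hypothetical 
and exist only for illustration. The assumption is that a failover-capable 
policy replays a call on the other RM only if the method is marked safe to 
replay, while the restart path never reads the annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical model; these are not the org.apache.hadoop.io.retry types.
final class FailoverAnnotationSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Idempotent {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AtMostOnce {}

    enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

    // Illustrative protocol: one call marked replay-safe, one left unmarked.
    interface ExampleProtocol {
        @Idempotent
        String lookup();

        String mutate();
    }

    static boolean isIdempotentOrAtMostOnce(Method m) {
        return m.isAnnotationPresent(Idempotent.class)
            || m.isAnnotationPresent(AtMostOnce.class);
    }

    // Restart path: always RETRY, annotations are never read.
    // Failover path: replay on the other RM only if the method is marked safe.
    static Decision decide(Method m, boolean failoverPath) {
        if (!failoverPath) {
            return Decision.RETRY;
        }
        return isIdempotentOrAtMostOnce(m)
            ? Decision.FAILOVER_AND_RETRY
            : Decision.FAIL;
    }

    // Reflection helper that hides the checked exception.
    static Method method(String name) {
        try {
            return ExampleProtocol.class.getMethod(name);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("unknown method: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(method("lookup"), true));   // FAILOVER_AND_RETRY
        System.out.println(decide(method("mutate"), true));   // FAIL
        System.out.println(decide(method("mutate"), false));  // RETRY
    }
}
```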

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
> YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the user.
> The client may have obtained an appId from an RM, the RM may have failed 
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create 
> app id) the new RM may reject the app submission resulting in unexpected 
> failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
