[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972318#comment-13972318 ]

Tsuyoshi OZAWA commented on YARN-1879:
--------------------------------------

[~xgong], as you pointed out, RetryCache doesn't fully guarantee AtMostOnce
semantics. If the active RM fails, we cannot reconstruct the RetryCache,
because ZK doesn't hold enough information to rebuild it. However, RetryCache
does ensure AtMostOnce semantics in a narrower set of situations: transient
network failures, unstable communication between the client (e.g. the AM) and
the server (e.g. the RM), or client-side timer failures. Those are the failures
I targeted. HDFS-4942 describes the problem in detail:

{quote}
In current HA mechanism with FailoverProxyProvider and non HA setups with 
RetryProxy retry a request from the RPC layer. If the retried request has 
already been processed at the namenode, the subsequent attempts fail for 
non-idempotent operations such as create, append, delete, rename etc. This will 
cause application failures during HA failover, network issues etc.
{quote}
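To make the failure mode in the quote concrete, here is a minimal Java sketch. It assumes a toy in-memory server; `ToyServer`, `create` and `callWithLostReply` are hypothetical names for illustration, not HDFS or YARN APIs. The first attempt is processed by the server but its reply is lost, so the client's retry of the same non-idempotent create fails.

```java
import java.util.HashSet;
import java.util.Set;

public class RetryWithoutCache {
    // Hypothetical server exposing a non-idempotent create operation.
    public static class ToyServer {
        private final Set<String> paths = new HashSet<>();

        public boolean create(String path) {
            if (!paths.add(path)) {
                throw new IllegalStateException(path + " already exists");
            }
            return true;
        }
    }

    // Simulates an RPC-layer retry: the first attempt succeeds on the server
    // but its reply never reaches the client, so the same request is re-sent.
    // Returns the outcome the client observes for the retried attempt.
    public static String callWithLostReply(ToyServer server, String path) {
        server.create(path);          // processed by the server, reply lost
        try {
            server.create(path);      // retry of the same logical request
            return "ok";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints "failed: /a already exists" although the create did succeed.
        System.out.println(callWithLostReply(new ToyServer(), "/a"));
    }
}
```

The client sees a failure for an operation that actually succeeded, which is exactly the application-visible error the quote describes.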

ApplicationMasterProtocol also goes through RMProxy, so the same problem can
occur there.
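The idea behind RetryCache can be sketched as follows: remember the result of each completed non-idempotent call, keyed by the client ID and call ID that a retried request carries unchanged, and replay that result instead of re-executing the call. This is a minimal illustration under that assumption only; `ToyRetryCache` and `invoke` are hypothetical names, not the API of Hadoop's `org.apache.hadoop.ipc.RetryCache`, which also handles details such as entry expiry. Because the map lives only in the server's memory, it is empty on a newly active server, which matches the failover limitation described above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class ToyRetryCache<T> {
    // Results of completed calls, keyed by "clientId:callId". In-memory only:
    // nothing here is persisted, so it does not survive a failover.
    private final Map<String, T> completed = new ConcurrentHashMap<>();

    // Executes the operation at most once per (clientId, callId). A retry of
    // the same call receives the cached result instead of re-executing it.
    // If the operation throws, nothing is cached and a retry runs it again.
    public T invoke(String clientId, int callId, Supplier<T> operation) {
        return completed.computeIfAbsent(clientId + ":" + callId, key -> operation.get());
    }

    public static void main(String[] args) {
        Set<String> paths = new HashSet<>();
        ToyRetryCache<Boolean> cache = new ToyRetryCache<>();
        // Hypothetical non-idempotent operation: fails if the path exists.
        Supplier<Boolean> createA = () -> {
            if (!paths.add("/a")) {
                throw new IllegalStateException("/a already exists");
            }
            return true;
        };
        boolean first = cache.invoke("client-1", 7, createA);  // executes create
        boolean retry = cache.invoke("client-1", 7, createA);  // replayed from cache
        System.out.println(first + " " + retry + " " + paths.size());  // true true 1
    }
}
```

Keying on the call identity rather than on the operation's arguments is what lets a genuine second create of the same path (a new call ID) still fail, while a retry of the original call succeeds.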


> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> -------------------------------------------------------------------
>
>                 Key: YARN-1879
>                 URL: https://issues.apache.org/jira/browse/YARN-1879
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Tsuyoshi OZAWA
>            Priority: Critical
>         Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, 
> YARN-1879.4.patch, YARN-1879.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
