[
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972318#comment-13972318
]
Tsuyoshi OZAWA commented on YARN-1879:
--------------------------------------
[~xgong], as you pointed out, RetryCache doesn't fully guarantee AtMostOnce
semantics. If the active RM fails, we cannot reconstruct the RetryCache, because
ZK doesn't hold enough information to rebuild it. However, RetryCache does
ensure AtMostOnce in limited situations: temporary network failures or unstable
network communication between the client (e.g. AM) and the server (e.g. RM), or
client-side timer failures. Those are the failures I am targeting. HDFS-4942
describes the details:
{quote}
In current HA mechanism with FailoverProxyProvider and non HA setups with
RetryProxy retry a request from the RPC layer. If the retried request has
already been processed at the namenode, the subsequent attempts fail for
non-idempotent operations such as create, append, delete, rename etc. This will
cause application failures during HA failover, network issues etc.
{quote}
ApplicationMasterProtocol also uses RMProxy, so the same problem can occur.
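
To illustrate the point above, here is a minimal sketch of how a server-side
retry cache can give at-most-once behaviour for a retried request, and why that
guarantee is lost when the cache cannot be rebuilt after failover. This is not
Hadoop's RetryCache API: SimpleRetryCache, invokeAtMostOnce, and the
(clientId, callId) key format are hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryCacheSketch {

    /** Hypothetical stand-in for a server-side retry cache (not Hadoop's RetryCache). */
    public static final class SimpleRetryCache {
        // Responses of completed calls, keyed by "clientId#callId" (assumed key format).
        private final Map<String, Object> responses = new ConcurrentHashMap<>();

        /**
         * Runs the operation the first time a (clientId, callId) pair is seen;
         * a retry with the same pair gets the cached response instead.
         */
        @SuppressWarnings("unchecked")
        public <T> T invokeAtMostOnce(String clientId, int callId, Supplier<T> operation) {
            String key = clientId + "#" + callId;
            // computeIfAbsent applies the function at most once per key.
            return (T) responses.computeIfAbsent(key, k -> operation.get());
        }
    }

    public static void main(String[] args) {
        // Counts how often the non-idempotent operation really ran.
        AtomicInteger executions = new AtomicInteger();
        Supplier<Integer> nonIdempotentOp = executions::incrementAndGet;

        SimpleRetryCache activeRm = new SimpleRetryCache();
        Integer first = activeRm.invokeAtMostOnce("am-1", 7, nonIdempotentOp);
        // Retry of the same call, e.g. after a temporary network failure.
        Integer retry = activeRm.invokeAtMostOnce("am-1", 7, nonIdempotentOp);
        System.out.println(first + " " + retry + " " + executions.get()); // prints "1 1 1"

        // After failover the new active RM starts with an empty cache,
        // so the same retried call is executed a second time.
        SimpleRetryCache newActiveRm = new SimpleRetryCache();
        Integer afterFailover = newActiveRm.invokeAtMostOnce("am-1", 7, nonIdempotentOp);
        System.out.println(afterFailover + " " + executions.get()); // prints "2 2"
    }
}
```

In this sketch the retry against the same server instance is answered from the
cache, while the retry against a fresh instance runs the operation again. That
is the limitation described above: the cache only helps as long as the RM that
processed the original request is still the one serving the retry.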
> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> -------------------------------------------------------------------
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Jian He
> Assignee: Tsuyoshi OZAWA
> Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch,
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch,
> YARN-1879.4.patch, YARN-1879.5.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)