[
https://issues.apache.org/jira/browse/YARN-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115167#comment-16115167
]
Subru Krishnan commented on YARN-6955:
--------------------------------------
Thanks [~botong] for surfacing this issue. The patch looks mostly good (pending
the Yetus warnings fix), except that we should save the registration request
only if _this.amRegistrationRequest == null_.
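
To illustrate the suggestion, here is a minimal, self-contained sketch. It is
not the code from the patch: RegisterSketch, FakeResourceManager, Interceptor,
AlreadyRegisteredException and getSavedRequest are hypothetical stand-ins, and
the request is modelled as a plain String rather than the real YARN types. It
only shows the two behaviours discussed here: keep the first registration
request when a retry arrives, and report success to the AM even if the RM says
the application is already registered.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; none of these types are the real YARN classes. */
final class RegisterSketch {

    /** Stand-in for the RM's "application already registered" failure. */
    static class AlreadyRegisteredException extends Exception {
    }

    /** Stand-in for the RM: the first register call succeeds, later ones fail. */
    static class FakeResourceManager {
        private final AtomicInteger calls = new AtomicInteger();

        void register(String request) throws AlreadyRegisteredException {
            if (calls.incrementAndGet() > 1) {
                throw new AlreadyRegisteredException();
            }
        }
    }

    /** Stand-in for the interceptor sitting between the AM and the RM. */
    static class Interceptor {
        private final FakeResourceManager rm;
        private String amRegistrationRequest; // guarded by "this"

        Interceptor(FakeResourceManager rm) {
            this.rm = rm;
        }

        /** Returns true when the AM should see the registration as successful. */
        boolean registerApplicationMaster(String request) {
            synchronized (this) {
                // Save the request only the first time, so a retry from the AM
                // does not overwrite the one that is already outstanding.
                if (this.amRegistrationRequest == null) {
                    this.amRegistrationRequest = request;
                }
            }
            try {
                rm.register(request);
            } catch (AlreadyRegisteredException e) {
                // Another thread already registered this application:
                // swallow the exception and fall through to report success.
            }
            return true;
        }

        synchronized String getSavedRequest() {
            return amRegistrationRequest;
        }
    }
}
```

With this sketch, calling registerApplicationMaster twice on the same
Interceptor (the original call and the AM's retry) returns true both times, and
getSavedRequest() still returns the request from the first call.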
> Concurrent registerAM thread in Federation Interceptor
> ------------------------------------------------------
>
> Key: YARN-6955
> URL: https://issues.apache.org/jira/browse/YARN-6955
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Minor
> Attachments: YARN-6955.v1.patch
>
>
> The timeout between AM and AMRMProxy is shorter than the timeout + failover
> between FederationInterceptor (AMRMProxy) and RM. When the first register
> thread in FI is blocked because of an RM failover, the AM can time out and
> resend the register call, leading to two outstanding register calls inside FI.
> Eventually, when the RM comes back up, one thread registers successfully and
> the other thread gets an "application already registered" exception. FI should
> swallow the exception and return success to the AM in both threads.