[ https://issues.apache.org/jira/browse/YARN-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588003#comment-16588003 ]
Botong Huang commented on YARN-8673:
------------------------------------
Thanks [~giovanni.fumarola]!
> [AMRMProxy] More robust responseId resync after an YarnRM master slave switch
> -----------------------------------------------------------------------------
>
> Key: YARN-8673
> URL: https://issues.apache.org/jira/browse/YARN-8673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: amrmproxy
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Major
> Attachments: YARN-8673-branch-2.v2.patch, YARN-8673.v1.patch,
> YARN-8673.v2.patch
>
>
> After a master/slave switch of YarnRM, the new YarnRM throws an
> _ApplicationNotRegisteredException_. The AM then re-registers and resets its
> responseId to zero. _AMRMClientRelayer_ inside _FederationInterceptor_
> follows the same protocol and performs the re-register and responseId
> resync automatically. However, when an exception or a temporary network issue
> occurs in the allocate call after re-register, the resync logic can break.
> This patch makes the process more robust by parsing the expected responseId
> from the YarnRM exception message, so that whenever the responseId is out of
> sync, for whatever reason, we can automatically resync and move on.
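
The resync idea described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class name, both method names, and the exception message layout ("expect responseId to be N") are assumptions made for the example, and the real YarnRM wording may differ.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: shows how a client could recover
// the responseId the RM expects from the text of a rejection message.
public class ResponseIdResync {

    // ASSUMPTION: the RM message contains "expect responseId to be <int>".
    private static final Pattern EXPECTED_ID =
        Pattern.compile("expect responseId to be (-?\\d+)");

    // Returns the expected responseId if the message carries one.
    public static OptionalInt parseExpectedResponseId(String message) {
        if (message == null) {
            return OptionalInt.empty();
        }
        Matcher m = EXPECTED_ID.matcher(message);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        } catch (NumberFormatException e) {
            // Digits matched but the value does not fit in an int.
            return OptionalInt.empty();
        }
    }

    // Adopts the RM's expected id when it can be parsed; otherwise keeps
    // the local id so the caller can fall back to its normal error path.
    public static int resync(int localResponseId, String exceptionMessage) {
        return parseExpectedResponseId(exceptionMessage).orElse(localResponseId);
    }

    public static void main(String[] args) {
        String msg = "Invalid responseId in AllocateRequest,"
            + " expect responseId to be 7, but get 3";
        System.out.println(resync(3, msg));
    }
}
```

Falling back to the local responseId when nothing can be parsed keeps unrelated failures (for example a plain network error) from silently changing the client's state.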