[ https://issues.apache.org/jira/browse/YARN-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Botong Huang updated YARN-8673:
-------------------------------
    Attachment: YARN-8673.v1.patch

> [AMRMProxy] More robust responseId resync after an YarnRM master slave switch
> -----------------------------------------------------------------------------
>
>                 Key: YARN-8673
>                 URL: https://issues.apache.org/jira/browse/YARN-8673
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-8673.v1.patch
>
>
> After a master-slave switch of the YarnRM, the new YarnRM throws an
> _ApplicationNotRegisteredException_. The AM then re-registers and resets its
> responseId to zero. _AMRMClientRelayer_ inside _FederationInterceptor_
> follows the same protocol and performs the re-register and responseId
> resync automatically. However, if an exception or a temporary network issue
> occurs in the allocate call after the re-register, the resync logic can
> break. This patch makes the process more robust by parsing the expected
> responseId from the YarnRM exception message, so that whenever the
> responseId is out of sync, for whatever reason, the relayer can
> automatically resync and move on.
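A minimal Java sketch of the resync idea described above may help. It assumes the YarnRM rejects an out-of-sync allocate call with a message containing the phrase "expect responseId to be <N>". That wording is an assumption, and the real message may differ. The class ResponseIdResync and its methods are hypothetical illustrations. They are not the AMRMClientRelayer API or the code in YARN-8673.v1.patch.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the AMRMClientRelayer implementation.
// ASSUMPTION: an out-of-sync allocate is rejected with a message containing
// "expect responseId to be <N>"; the real YarnRM wording may differ.
public class ResponseIdResync {

  private static final Pattern EXPECTED_ID =
      Pattern.compile("expect responseId to be (-?\\d+)");

  // responseId the relayer will put on its next allocate request.
  private int nextResponseId = 0;

  // Returns the expected responseId found in the message, or empty if none.
  public static OptionalInt parseExpectedResponseId(String message) {
    if (message == null) {
      return OptionalInt.empty();
    }
    Matcher m = EXPECTED_ID.matcher(message);
    if (!m.find()) {
      return OptionalInt.empty();
    }
    try {
      return OptionalInt.of(Integer.parseInt(m.group(1)));
    } catch (NumberFormatException e) {
      return OptionalInt.empty(); // digits out of int range
    }
  }

  // After (re-)register, start over from zero, as the AM protocol does.
  public void onRegister() {
    nextResponseId = 0;
  }

  // After a successful allocate, continue from the id the RM returned.
  public void onAllocateSuccess(int responseIdFromRm) {
    nextResponseId = responseIdFromRm;
  }

  // After a failed allocate, resync from the error message if possible.
  // Returns true if the caller may retry with getNextResponseId().
  public boolean onAllocateFailure(Throwable error) {
    OptionalInt expected = parseExpectedResponseId(error.getMessage());
    expected.ifPresent(id -> nextResponseId = id);
    return expected.isPresent();
  }

  public int getNextResponseId() {
    return nextResponseId;
  }
}
```

In this sketch, some failures carry no expected id, such as a plain network timeout. For those, the state is left untouched, and the caller would fall back to its ordinary retry or re-register handling. Parsing the id out of the message lets the relayer recover from any desync without tracking exactly which earlier call was lost.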



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

