[ 
https://issues.apache.org/jira/browse/YARN-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-7630:
---------------------------------
    Summary: Fix AMRMToken rollover handling in AMRMProxy  (was: Fix AMRMToken 
handling in AMRMProxy)

> Fix AMRMToken rollover handling in AMRMProxy
> --------------------------------------------
>
>                 Key: YARN-7630
>                 URL: https://issues.apache.org/jira/browse/YARN-7630
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>         Attachments: YARN-7630.v1.patch, YARN-7630.v1.patch
>
>
> Symptom: after RM rolls over the master key for AMRMToken, whenever the RPC 
> connection from FederationInterceptor to RM breaks due to transient network 
> issue and reconnects, heartbeat to RM starts failing because of the “Invalid 
> AMRMToken” exception. Whenever it hits, it happens for both home RM and 
> secondary RMs. 
> Related facts: 
> 1. When RM issues a new AMRMToken, it always send with service name field as 
> empty string. RPC layer in AM side will set it properly before start using 
> it. 
> 2. UGI keeps all tokens using a map from serviceName->Token. Initially 
> AMRMClientUtils.createRMProxy() is used to load the first token and start the 
> RM connection. 
> 3. When RM renew the token, YarnServerSecurityUtils.updateAMRMToken() is used 
> to load it into UGI and replace the existing token (with the same serviceName 
> key). 
> Bug: 
> The bug is that 2-AMRMClientUtils.createRMProxy() and 
> 3-YarnServerSecurityUtils.updateAMRMToken() is not handling the sequence 
> consistently. We always need to load the token (with empty service name) into 
> UGI first before we set the serviceName, so that the previous AMRMToken will 
> be overridden. But 2 is doing it reversely. That’s why after RM rolls the 
> amrmToken, the UGI end up with two tokens. Whenever the RPC connection break 
> and reconnect, the wrong token could be picked and thus trigger the 
> exception. 
> Fix: 
> Should load the AMRMToken into UGI first and then update the service name 
> field for RPC



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to