[ 
https://issues.apache.org/jira/browse/YARN-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-7630:
-------------------------------
    Description: 
Symptom: after the RM rolls over the master key for the AMRMToken, whenever the
RPC connection from FederationInterceptor to the RM breaks because of a
transient network issue and reconnects, heartbeats to the RM start failing with
the “Invalid AMRMToken” exception. When this happens, it affects both the home
RM and the secondary RMs.

Related facts: 
1. When the RM issues a new AMRMToken, it always sends it with the service name
field set to an empty string. The RPC layer on the AM side sets the field
properly before it starts using the token.
2. UGI keeps all tokens in a map from serviceName to Token. Initially,
AMRMClientUtils.createRMProxy() is used to load the first token and start the
RM connection.
3. When the RM renews the token, YarnServerSecurityUtils.updateAMRMToken() is
used to load it into UGI and replace the existing token (the one stored under
the same serviceName key).

Bug: 
AMRMClientUtils.createRMProxy() (fact 2) and
YarnServerSecurityUtils.updateAMRMToken() (fact 3) do not handle the sequence
consistently. The token must always be loaded into UGI first, while its service
name is still empty, and only then have its serviceName set, so that the
previous AMRMToken is overridden. createRMProxy() does this in the reverse
order. That is why, after the RM rolls the AMRMToken, UGI ends up with two
tokens. Whenever the RPC connection breaks and reconnects, the wrong token can
be picked, which triggers the exception.

Fix: 
Load the AMRMToken into UGI first, and then update the service name field for
RPC.
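
Illustration (not the actual patch): the ordering problem can be reproduced
with a simplified model. Token, Credentials, buggyOrder, fixedOrder, and the
"rm-host:8030" service name below are hypothetical stand-ins, not Hadoop
classes or values. The only assumption carried over from the description is
that tokens are stored in a map keyed by the service name the token has at
the moment it is added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenMapSketch {
    /** Hypothetical stand-in for a token: mutable service name plus a key id. */
    public static final class Token {
        public String service = "";   // the RM issues tokens with an empty service name
        public final String keyId;
        public Token(String keyId) { this.keyId = keyId; }
    }

    /** Hypothetical stand-in for UGI: tokens keyed by service name at insertion time. */
    public static final class Credentials {
        public final Map<String, Token> tokens = new LinkedHashMap<>();
        public void addToken(Token t) { tokens.put(t.service, t); }
    }

    static final String SERVICE = "rm-host:8030";   // made-up RM address

    /** First token: service set before loading. Renewed token: loaded first. */
    public static Credentials buggyOrder(String firstKeyId, String renewedKeyId) {
        Credentials ugi = new Credentials();
        Token first = new Token(firstKeyId);
        first.service = SERVICE;      // reversed order: set service name first...
        ugi.addToken(first);          // ...so the token is stored under SERVICE
        Token renewed = new Token(renewedKeyId);
        ugi.addToken(renewed);        // stored under "", does not replace the first
        renewed.service = SERVICE;
        return ugi;
    }

    /** Both tokens: loaded while the service name is empty, then service set. */
    public static Credentials fixedOrder(String firstKeyId, String renewedKeyId) {
        Credentials ugi = new Credentials();
        Token first = new Token(firstKeyId);
        ugi.addToken(first);          // stored under ""
        first.service = SERVICE;
        Token renewed = new Token(renewedKeyId);
        ugi.addToken(renewed);        // same "" key, so the old token is overridden
        renewed.service = SERVICE;
        return ugi;
    }

    public static void main(String[] args) {
        System.out.println("buggy order: " + buggyOrder("key-1", "key-2").tokens.size() + " token(s)");
        System.out.println("fixed order: " + fixedOrder("key-1", "key-2").tokens.size() + " token(s)");
    }
}
```

In this model the reversed order leaves the stale token under the real service
name next to the renewed one, which matches the "two tokens" state described
under Bug. The consistent order leaves a single, renewed token.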


> Fix AMRMToken handling in AMRMProxy
> -----------------------------------
>
>                 Key: YARN-7630
>                 URL: https://issues.apache.org/jira/browse/YARN-7630
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor


