Botong Huang created YARN-7630:
----------------------------------
Summary: Fix AMRMToken handling in AMRMProxy
Key: YARN-7630
URL: https://issues.apache.org/jira/browse/YARN-7630
Project: Hadoop YARN
Issue Type: Bug
Reporter: Botong Huang
Assignee: Botong Huang
Priority: Minor
Symptom: after RM rolls over the master key for AMRMToken, whenever the RPC
connection from FederationInterceptor to RM breaks due to a transient network
issue and reconnects, heartbeats to RM start failing with the “Invalid
AMRMToken” exception. When this happens, it affects both the home RM and the
secondary RMs.
Related facts:
1. When RM issues a new AMRMToken, it always sends it with the service name
field set to an empty string. The RPC layer on the AM side sets the field
properly before it starts using the token.
2. UGI keeps all tokens using a map from serviceName->Token. Initially
AMRMClientUtils.createRMProxy() is used to load the first token and start the
RM connection.
3. When RM renews the token, YarnServerSecurityUtils.updateAMRMToken() is used
to load it into UGI and replace the existing token (with the same serviceName
key).
Bug:
The bug is that AMRMClientUtils.createRMProxy() (fact 2) and
YarnServerSecurityUtils.updateAMRMToken() (fact 3) do not handle the sequence
consistently. We always need to load the token (with empty service name) into
UGI first, before we set the serviceName, so that the previous AMRMToken will
be overridden. But createRMProxy() does it in the reverse order: it sets the
serviceName before loading the token. That’s why, after RM rolls the
AMRMToken, the UGI ends up with two tokens. Whenever the RPC connection breaks
and reconnects, the wrong token could be picked, which triggers the exception.
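
The ordering problem can be illustrated with a small standalone model. This is
only a sketch, not Hadoop code: FakeToken, FakeUgi, and the "rm-host:8030"
service name are hypothetical stand-ins, and the model assumes only what is
stated above, namely that tokens are stored in a map keyed by the service name
the token carries at the moment it is loaded.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenOrderDemo {
    /** Hypothetical stand-in for a token; only the mutable service name matters. */
    static final class FakeToken {
        final String id;
        String service = ""; // RM issues tokens with an empty service name
        FakeToken(String id) { this.id = id; }
    }

    /** Hypothetical stand-in for UGI: keyed by the service name at load time. */
    static final class FakeUgi {
        final Map<String, FakeToken> tokens = new HashMap<>();
        void addToken(FakeToken t) { tokens.put(t.service, t); }
    }

    /** Returns how many tokens the store holds after one token rollover. */
    static int tokensAfterRollover(boolean loadBeforeSettingService) {
        FakeUgi ugi = new FakeUgi();

        // Initial token: the order of the two steps is what differs.
        FakeToken first = new FakeToken("token-1");
        if (loadBeforeSettingService) {
            ugi.addToken(first);            // stored under ""
            first.service = "rm-host:8030";
        } else {
            first.service = "rm-host:8030";
            ugi.addToken(first);            // stored under "rm-host:8030"
        }

        // Rollover: the update path loads first, then sets the service name.
        FakeToken renewed = new FakeToken("token-2");
        ugi.addToken(renewed);              // stored under ""
        renewed.service = "rm-host:8030";

        return ugi.tokens.size();
    }

    public static void main(String[] args) {
        System.out.println("load, then set service: " + tokensAfterRollover(true));
        System.out.println("set service, then load: " + tokensAfterRollover(false));
    }
}
```

In this model the first ordering leaves one token in the store, because the
renewed token replaces the old one under the same empty key. The second
ordering leaves two, the stale token and the renewed one, which matches the
state described above.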