[
https://issues.apache.org/jira/browse/YARN-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827238#comment-15827238
]
Subru Krishnan commented on YARN-6093:
--------------------------------------
Thanks [~botong] for the patch. At a high level, it looks good, but it would
be great if we could test it end to end in a cluster, as this is a nuanced
issue. Also, the checkstyle issue seems fairly trivial to fix.
> Invalid AMRM token exception when using FederationRMFailoverProxyProvider at
> AMRMtoken renewal during a RM failover
> -------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-6093
> URL: https://issues.apache.org/jira/browse/YARN-6093
> Project: Hadoop YARN
> Issue Type: Bug
> Components: amrmproxy, federation
> Affects Versions: YARN-2915
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Minor
> Attachments: YARN-6093.v1.patch, YARN-6093-YARN-2915.v1.patch,
> YARN-6093-YARN-2915.v2.patch
>
>
> AMRMProxy uses an expired AMRMToken to talk to the RM, leading to the
> "Invalid AMRMToken" exception. The bug is triggered when both conditions are
> met:
> 1. The RM rolls its master key and renews the AMRMToken for a running AM.
> 2. The existing RPC connection between AMRMProxy and the RM drops, and
> AMRMProxy attempts to reconnect via failover in
> FederationRMFailoverProxyProvider.
> Here's what happens:
> In DefaultRequestInterceptor.init(), we create a proxy ugi, load it with the
> initial AMRMToken issued by the RM, and use it to initialize rmClient. Then,
> in FederationRMFailoverProxyProvider.init(), we save a full copy of the ugi
> tokens locally, create the actual RM proxy, and set up the RPC connection.
> Later, when the RM rolls its master key and issues a new AMRMToken,
> DefaultRequestInterceptor.updateAMRMToken() updates it in the proxy ugi.
> However, the new token is not used until the existing RPC connection
> between AMRMProxy and the RM drops for some other reason (say, the master RM
> crashes). When we try to reconnect, the service name of the new AMRMToken has
> not yet been set correctly in DefaultRequestInterceptor.updateAMRMToken(), so
> RPC finds no valid AMRMToken when setting up the new connection. We first hit
> a "Client cannot authenticate via:[TOKEN]" exception. This is expected.
> Next, FederationRMFailoverProxyProvider fails over; we reset the service
> token via ClientRMProxy.getRMAddress() and reconnect. In principle, this
> should have worked.
> However, since DefaultRequestInterceptor does not use the proxy user for later
> calls to rmClient, we are not running as the proxy user when
> FederationRMFailoverProxyProvider performs failover. Currently the code
> works around this by reloading the current ugi with all the tokens saved
> locally in originalTokens, in the method addOriginalTokens(). The problem is
> that the original AMRMToken loaded this way is no longer accepted by the RM,
> and thus we keep hitting the "Invalid AMRMToken" exception until the AM fails.
> The correct approach is to save the original ugi itself rather than the
> original tokens from the proxy ugi. Every time we perform failover and
> create a new RM proxy, we use the original ugi, which is always loaded with
> the up-to-date AMRMToken.
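
The difference between the two approaches described above can be sketched in
plain Java. This is only an illustration under simplifying assumptions, not
the code in the patch: Ugi, SnapshotProvider, and UgiReferenceProvider are
hypothetical stand-ins, and no real Hadoop classes or APIs are used. It shows
why a one-time copy of the tokens goes stale after a key roll, while a held
reference to the ugi does not.

```java
/** Simplified illustration only; none of these types are Hadoop classes. */
public class TokenRefreshSketch {

    /** Hypothetical stand-in for a ugi that holds the latest AMRM token. */
    static final class Ugi {
        private String amrmToken;

        Ugi(String initialToken) {
            this.amrmToken = initialToken;
        }

        synchronized void updateToken(String newToken) {
            this.amrmToken = newToken;
        }

        synchronized String currentToken() {
            return amrmToken;
        }
    }

    /** Problematic pattern: copy the token once at init, replay the copy on failover. */
    static final class SnapshotProvider {
        private final String originalToken;

        SnapshotProvider(Ugi ugi) {
            this.originalToken = ugi.currentToken();
        }

        String tokenForReconnect() {
            return originalToken;
        }
    }

    /** Proposed pattern: keep the ugi itself and read its token at failover time. */
    static final class UgiReferenceProvider {
        private final Ugi originalUgi;

        UgiReferenceProvider(Ugi ugi) {
            this.originalUgi = ugi;
        }

        String tokenForReconnect() {
            return originalUgi.currentToken();
        }
    }

    public static void main(String[] args) {
        Ugi ugi = new Ugi("amrm-token-v1");
        SnapshotProvider snapshot = new SnapshotProvider(ugi);
        UgiReferenceProvider byReference = new UgiReferenceProvider(ugi);

        // Simulates the RM rolling its master key and issuing a new token.
        ugi.updateToken("amrm-token-v2");

        System.out.println("snapshot reconnects with:  " + snapshot.tokenForReconnect());
        System.out.println("reference reconnects with: " + byReference.tokenForReconnect());
    }
}
```

In this sketch the snapshot provider still presents the first token after the
update, which corresponds to the token the RM would reject, while the
reference-holding provider presents the renewed one.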