[https://issues.apache.org/jira/browse/YARN-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867060#comment-15867060]
Botong Huang edited comment on YARN-6093 at 2/15/17 1:23 AM:
-------------------------------------------------------------
Talked to [~curino] offline: this is a randomized test, and it sometimes fails
because the threshold is too tight. The failure is unrelated to the patch here.
> Invalid AMRM token exception when using FederationRMFailoverProxyProvider at
> AMRMtoken renewal during a RM failover
> -------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-6093
> URL: https://issues.apache.org/jira/browse/YARN-6093
> Project: Hadoop YARN
> Issue Type: Bug
> Components: amrmproxy, federation
> Affects Versions: YARN-2915
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Minor
> Attachments:
> YARN-6093-08dc09581230ba595ce48fe7d3bc4eb2b6f98091.v4.patch,
> YARN-6093-git08dc09581230ba595ce48fe7d3bc4eb2b6f98091.v4.patch,
> YARN-6093.v1.patch, YARN-6093-YARN-2915.v1.patch,
> YARN-6093-YARN-2915.v2.patch, YARN-6093-YARN-2915.v3.patch,
> YARN-6093-YARN-2915.v4.patch
>
>
> AMRMProxy uses an expired AMRMToken to talk to the RM, leading to the
> "Invalid AMRMToken" exception. The bug is triggered when both of the
> following conditions are met:
> 1. The RM rolls its master key and renews the AMRMToken for a running AM.
> 2. The existing RPC connection between AMRMProxy and the RM drops, and
> AMRMProxy attempts to reconnect via failover in
> FederationRMFailoverProxyProvider.
> Here is what happens:
> In DefaultRequestInterceptor.init(), we create a proxy ugi, load it with the
> initial AMRMToken issued by the RM, and use it to initialize rmClient. Then,
> in FederationRMFailoverProxyProvider.init(), a full copy of the ugi's tokens
> is saved locally, an actual RM proxy is created, and the RPC connection is
> set up.
> Later, when the RM rolls its master key and issues a new AMRMToken,
> DefaultRequestInterceptor.updateAMRMToken() stores it in the proxy ugi.
> However, the new token is not used until the existing RPC connection between
> AMRMProxy and the RM drops for some other reason (say, the master RM
> crashes).
> When we try to reconnect, the RPC layer finds no valid AMRMToken for the new
> connection, because DefaultRequestInterceptor.updateAMRMToken() does not yet
> set the service name of the new AMRMToken correctly. We first hit a
> "Client cannot authenticate via:[TOKEN]" exception. This is expected.
> Next, FederationRMFailoverProxyProvider fails over: we reset the token's
> service via ClientRMProxy.getRMAddress() and reconnect. In principle, this
> should have worked.
> However, because DefaultRequestInterceptor does not use the proxy user for
> later calls to rmClient, we are not running as the proxy user when
> FederationRMFailoverProxyProvider performs the failover. The current code
> works around this by reloading the current ugi with all the tokens saved
> locally in originalTokens, in the method addOriginalTokens(). The problem is
> that the original AMRMToken reloaded this way is no longer accepted by the RM,
> so we keep hitting the "Invalid AMRMToken" exception until the AM fails.
> The correct fix is to save the original ugi itself, rather than a copy of its
> tokens. Every time we perform a failover and create a new RM proxy, we use
> the original ugi, which always holds the up-to-date AMRMToken.
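The difference between the current behavior (copying the tokens at init() time) and the proposed fix (keeping the ugi itself) can be illustrated with a toy model in Java. Everything below is hypothetical: FakeUgi, SnapshottingProvider, UgiHoldingProvider and the integer "key id" stand in for UserGroupInformation, the two ways FederationRMFailoverProxyProvider could keep credentials, and an AMRMToken. The RM is modeled as accepting only tokens signed with its current master key, which is a simplification. In real code the fixed provider would presumably create the RM proxy inside a doAs(...) call on the saved ugi; that detail is an assumption, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class AmrmFailoverSketch {
    // Hypothetical stand-in for UserGroupInformation: holds one AMRMToken,
    // represented only by the id of the RM master key that signed it.
    static final class FakeUgi {
        int amrmKeyId;
        FakeUgi(int amrmKeyId) { this.amrmKeyId = amrmKeyId; }
    }

    // Hypothetical failover strategy: returns the key id of the token
    // that would be used to reconnect to the RM.
    interface Provider { int failover(); }

    // Current pattern: copy the tokens at init() and reload the copy at failover.
    static final class SnapshottingProvider implements Provider {
        private final List<Integer> originalTokens = new ArrayList<>();
        SnapshottingProvider(FakeUgi ugi) { originalTokens.add(ugi.amrmKeyId); }
        public int failover() {
            // We are not the proxy user here, so a different ugi is
            // reloaded from the copy taken at init(), which is now stale.
            FakeUgi current = new FakeUgi(originalTokens.get(0));
            return current.amrmKeyId;
        }
    }

    // Proposed pattern: keep a reference to the original ugi itself.
    static final class UgiHoldingProvider implements Provider {
        private final FakeUgi originalUgi;
        UgiHoldingProvider(FakeUgi ugi) { this.originalUgi = ugi; }
        public int failover() {
            // Sees whatever updateAMRMToken() last stored in the ugi.
            return originalUgi.amrmKeyId;
        }
    }

    // Runs init -> master-key roll -> token update -> failover and reports
    // whether the RM (accepting only its current key) accepts the token.
    static boolean reconnectSucceeds(boolean holdUgiReference) {
        int rmKeyId = 1;
        FakeUgi proxyUgi = new FakeUgi(rmKeyId);      // initial AMRMToken
        Provider provider = holdUgiReference
            ? new UgiHoldingProvider(proxyUgi)
            : new SnapshottingProvider(proxyUgi);

        rmKeyId = 2;                                  // RM rolls its master key
        proxyUgi.amrmKeyId = rmKeyId;                 // updateAMRMToken()

        return provider.failover() == rmKeyId;        // RPC connection dropped
    }

    public static void main(String[] args) {
        System.out.println("snapshot copy: "
            + (reconnectSucceeds(false) ? "ok" : "Invalid AMRMToken"));
        System.out.println("ugi reference: "
            + (reconnectSucceeds(true) ? "ok" : "Invalid AMRMToken"));
    }
}
```

Running main() prints "Invalid AMRMToken" for the snapshot variant and "ok" for the reference variant. The point of the design choice is that the failover path shares state with DefaultRequestInterceptor.updateAMRMToken() instead of freezing the credentials at init() time.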
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)