[ 
https://issues.apache.org/jira/browse/YARN-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-6093:
-------------------------------
    Attachment:     (was: YARN-6093-YARN-2915.v4.patch)

> Invalid AMRM token exception when using FederationRMFailoverProxyProvider at 
> AMRMtoken renewal during a RM failover
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6093
>                 URL: https://issues.apache.org/jira/browse/YARN-6093
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: amrmproxy, federation
>    Affects Versions: YARN-2915
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>         Attachments: 
> YARN-6093-08dc09581230ba595ce48fe7d3bc4eb2b6f98091.v4.patch, 
> YARN-6093-git08dc09581230ba595ce48fe7d3bc4eb2b6f98091.v4.patch, 
> YARN-6093.v1.patch, YARN-6093-YARN-2915.v1.patch, 
> YARN-6093-YARN-2915.v2.patch, YARN-6093-YARN-2915.v3.patch, 
> YARN-6093-YARN-2915.v4.patch
>
>
> AMRMProxy uses expired AMRMToken to talk to RM, leading to the "Invalid 
> AMRMToken" exception. The bug is triggered when both conditions are met: 
> 1. RM rolls master key and renews AMRMToken for a running AM.
> 2. Existing RPC connection between AMRMProxy and RM drops and attempt to 
> reconnect via failover in FederationRMFailoverProxyProvider. 
> Here's what happened: 
> In DefaultRequestInterceptor.init(), we create a proxy ugi, load it with the 
> initial AMRMToken issued by RM, and used it for initiating rmClient. Then we 
> arrive at FederationRMFailoverProxyProvider.init(), a full copy of ugi tokens 
> are saved locally, create an actual RM proxy and setup the RPC connection. 
> Later when RM rolls master key and issues a new AMRMToken, 
> DefaultRequestInterceptor.updateAMRMToken() updates it into the proxy ugi. 
> However the new token is never used, until the existing RPC connection 
> between AMRMProxy and RM drops for other reasons (say master RM crashes). 
> When we try to reconnect, since the service name of the new AMRMToken is not 
> yet set correctly in DefaultRequestInterceptor.updateAMRMToken(), RPC found 
> no valid AMRMToken when trying to setup a new connection. We first hit a 
> "Client cannot authenticate via:[TOKEN]" exception. This is expected. 
> Next, FederationRMFailoverProxyProvider fails over, we reset the service 
> token via ClientRMProxy.getRMAddress() and reconnect. Supposedly this would 
> have worked. 
> However since DefaultRequestInterceptor does not use the proxy user for later 
> calls to rmClient, when performing failover in 
> FederationRMFailoverProxyProvider, we are not in the proxy user. Currently 
> the code solve the problem by reloading the current ugi with all tokens saved 
> locally in originalTokens in method addOriginalTokens(). The problem is that 
> the original AMRMToken loaded is no longer accepted by RM, and thus we keep 
> hitting the "Invalid AMRMToken" exception until AM fails. 
> The correct way is that rather than saving the original tokens in the proxy 
> ugi, we save the original ugi itself. Every time we perform failover and 
> create the new RM proxy, we use the original ugi, which is always loaded with 
> the up-to-date AMRMToken. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to