[ https://issues.apache.org/jira/browse/YARN-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877167#comment-15877167 ]
Jian He commented on YARN-6093:
-------------------------------

lgtm, thanks [~botong], [~subru]

> Invalid AMRM token exception when using FederationRMFailoverProxyProvider at AMRMtoken renewal during a RM failover
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6093
>                 URL: https://issues.apache.org/jira/browse/YARN-6093
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: amrmproxy, federation
>    Affects Versions: YARN-2915
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>         Attachments: YARN-6093-08dc09581230ba595ce48fe7d3bc4eb2b6f98091.v4.patch,
>                      YARN-6093-git08dc09581230ba595ce48fe7d3bc4eb2b6f98091.v4.patch,
>                      YARN-6093.v1.patch, YARN-6093-YARN-2915.v1.patch,
>                      YARN-6093-YARN-2915.v2.patch, YARN-6093-YARN-2915.v3.patch,
>                      YARN-6093-YARN-2915.v4.patch, YARN-6093-YARN-2915.v5.patch
>
>
> AMRMProxy uses an expired AMRMToken to talk to RM, leading to the "Invalid
> AMRMToken" exception. The bug is triggered when both conditions are met:
> 1. RM rolls the master key and renews the AMRMToken for a running AM.
> 2. The existing RPC connection between AMRMProxy and RM drops, and AMRMProxy
> attempts to reconnect via failover in FederationRMFailoverProxyProvider.
> Here's what happened:
> In DefaultRequestInterceptor.init(), we create a proxy ugi, load it with the
> initial AMRMToken issued by RM, and use it to initialize rmClient. Then we
> arrive at FederationRMFailoverProxyProvider.init(), where a full copy of the
> ugi tokens is saved locally, the actual RM proxy is created, and the RPC
> connection is set up.
> Later, when RM rolls the master key and issues a new AMRMToken,
> DefaultRequestInterceptor.updateAMRMToken() updates it in the proxy ugi.
> However, the new token is never used until the existing RPC connection
> between AMRMProxy and RM drops for other reasons (say the master RM crashes).
> When we try to reconnect, since the service name of the new AMRMToken is not
> yet set correctly in DefaultRequestInterceptor.updateAMRMToken(), RPC finds
> no valid AMRMToken when trying to set up a new connection. We first hit a
> "Client cannot authenticate via:[TOKEN]" exception. This is expected.
> Next, FederationRMFailoverProxyProvider fails over, we reset the service
> token via ClientRMProxy.getRMAddress(), and reconnect. Supposedly this would
> have worked.
> However, since DefaultRequestInterceptor does not use the proxy user for later
> calls to rmClient, we are not running as the proxy user when performing
> failover in FederationRMFailoverProxyProvider. Currently the code solves the
> problem by reloading the current ugi with all tokens saved locally in
> originalTokens, in the method addOriginalTokens(). The problem is that the
> original AMRMToken loaded this way is no longer accepted by RM, and thus we
> keep hitting the "Invalid AMRMToken" exception until the AM fails.
> The correct way is that, rather than saving the original tokens of the proxy
> ugi, we save the original ugi itself. Every time we perform failover and
> create the new RM proxy, we use the original ugi, which is always loaded with
> the up-to-date AMRMToken.
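
The difference between saving a copy of the tokens and saving the ugi itself can be illustrated with a small standalone sketch. This is not the actual patch and does not use Hadoop classes: FakeUgi, SnapshotProvider, LiveUgiProvider, the "AMRM_TOKEN" key, and the token strings are all hypothetical stand-ins, assumed here only to show why a snapshot taken at init time goes stale after a token renewal while a retained reference does not.

```java
import java.util.HashMap;
import java.util.Map;

public class UgiFailoverSketch {

    /** Hypothetical stand-in for a ugi: a mutable set of tokens keyed by kind. */
    static final class FakeUgi {
        private final Map<String, String> tokens = new HashMap<>();

        void addToken(String kind, String token) {
            tokens.put(kind, token); // a renewal overwrites the previous token
        }

        String getToken(String kind) {
            return tokens.get(kind);
        }

        Map<String, String> copyTokens() {
            return new HashMap<>(tokens);
        }
    }

    /** Behaviour described as buggy: tokens are copied once at init time. */
    static final class SnapshotProvider {
        private final Map<String, String> originalTokens;

        SnapshotProvider(FakeUgi ugi) {
            this.originalTokens = ugi.copyTokens();
        }

        String tokenUsedOnFailover() {
            // Replays the snapshot, so a later renewal is never seen.
            return originalTokens.get("AMRM_TOKEN");
        }
    }

    /** Behaviour described as the fix: keep a reference to the ugi itself. */
    static final class LiveUgiProvider {
        private final FakeUgi originalUgi;

        LiveUgiProvider(FakeUgi ugi) {
            this.originalUgi = ugi;
        }

        String tokenUsedOnFailover() {
            // Reads through the live ugi, so it sees the renewed token.
            return originalUgi.getToken("AMRM_TOKEN");
        }
    }

    public static void main(String[] args) {
        FakeUgi ugi = new FakeUgi();
        ugi.addToken("AMRM_TOKEN", "token-v1"); // initial token issued by RM

        SnapshotProvider snapshot = new SnapshotProvider(ugi);
        LiveUgiProvider live = new LiveUgiProvider(ugi);

        ugi.addToken("AMRM_TOKEN", "token-v2"); // RM rolls the master key

        System.out.println("snapshot uses: " + snapshot.tokenUsedOnFailover());
        System.out.println("live ugi uses: " + live.tokenUsedOnFailover());
    }
}
```

In this sketch the snapshot provider still presents token-v1 after the renewal, which corresponds to the token RM no longer accepts, while the provider holding the ugi reference presents token-v2.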