[
https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020032#comment-14020032
]
Xuan Gong commented on YARN-1779:
---------------------------------
Unfortunately, we do have a problem with the AMRMToken when the RMs fail over.
The token's service name is not set properly during failover, which causes an
authentication failure.
For example, say we have two RMs, rm1 and rm2, and rm2 is currently active. The
ApplicationMaster first creates an RPC connection to rm1 (in this process, it
sets the AMRMToken's service name to rm1's address) and saves rm1's proxy
object. Because rm1 is standby, the AM then fails over to rm2 and repeats the
same process, this time saving rm2's proxy object and resetting the AMRMToken's
service name to rm2's address. Everything works at this point. When failover
happens again, the AM fails over back to rm1. This time, however, it reuses the
saved rm1 proxy object directly and does *not* reset the service name. The
service name is therefore still rm2's address, which causes the authentication
failure when the AM tries to authenticate with rm1.
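
The sequence above can be illustrated with a small, self-contained simulation.
This is only a sketch under stated assumptions: it does not use the Hadoop
classes, and Token, Proxy, FailoverClient, the resetServiceOnReuse flag, and
the "rm1-address"/"rm2-address" strings are all hypothetical names invented for
illustration. It models only the token/service-name match, not whether an RM is
active or standby. The resetServiceOnReuse variant is shown purely for
contrast and is not claimed to be the fix adopted for this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class AmrmTokenFailoverSketch {

    /** Hypothetical stand-in for the AMRMToken; only the service name matters here. */
    static final class Token {
        String service;
    }

    /** Hypothetical stand-in for an RPC proxy bound to one RM address. */
    static final class Proxy {
        final String rmAddress;

        Proxy(String rmAddress) {
            this.rmAddress = rmAddress;
        }

        /** Authentication succeeds only if the token's service name matches this RM. */
        boolean authenticate(Token token) {
            return rmAddress.equals(token.service);
        }
    }

    /** Hypothetical client that caches one proxy per RM, as described in the comment. */
    static final class FailoverClient {
        private final Map<String, Proxy> savedProxies = new HashMap<>();
        private final Token token = new Token();
        private final boolean resetServiceOnReuse;

        FailoverClient(boolean resetServiceOnReuse) {
            this.resetServiceOnReuse = resetServiceOnReuse;
        }

        /** Connects to the given RM and reports whether the token authenticates. */
        boolean connect(String rmAddress) {
            Proxy proxy = savedProxies.get(rmAddress);
            if (proxy == null) {
                // First contact: create the proxy and set the token's service name.
                proxy = new Proxy(rmAddress);
                savedProxies.put(rmAddress, proxy);
                token.service = rmAddress;
            } else if (resetServiceOnReuse) {
                // Assumed variant: also reset the service name when a saved proxy is reused.
                token.service = rmAddress;
            }
            // Otherwise the saved proxy is reused and the service name is left stale.
            return proxy.authenticate(token);
        }
    }

    public static void main(String[] args) {
        FailoverClient client = new FailoverClient(false);
        System.out.println("first contact with rm1: " + client.connect("rm1-address"));
        System.out.println("failover to rm2:        " + client.connect("rm2-address"));
        System.out.println("failover back to rm1:   " + client.connect("rm1-address"));
    }
}
```

With resetServiceOnReuse set to false, the third connect call reuses the saved
rm1 proxy while the token still carries rm2's address, which is the mismatch
described above.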
> Handle AMRMTokens across RM failover
> ------------------------------------
>
> Key: YARN-1779
> URL: https://issues.apache.org/jira/browse/YARN-1779
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Karthik Kambatla
> Priority: Critical
> Labels: ha
>
> Verify whether AMRMTokens continue to work across RM failover. If not, we will
> have to do something along the lines of YARN-986.