[ https://issues.apache.org/jira/browse/YARN-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928312#comment-16928312 ]

Bibin A Chundatt commented on YARN-9823:
----------------------------------------

[~lichaojacobs] YARN-8434 should help you.

> NodeManager cannot get right ResourceTrack address in Federation mode
> ---------------------------------------------------------------------
>
>                 Key: YARN-9823
>                 URL: https://issues.apache.org/jira/browse/YARN-9823
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: federation, nodemanager
>    Affects Versions: 2.9.2
>         Environment: h2. Hadoop:
> Hadoop 2.9.2 (some line numbers may be inaccurate because we have merged some
> 3.0+ patches)
> Security with Kerberos
> configure from 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/Federation.html]
> h2. Java:
> Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
> Kerberos:
>  
>  
>            Reporter: qiwei huang
>            Priority: Major
>
> The NM keeps trying, indefinitely, to connect to the wrong RM's resource tracker port:
> {quote}{{INFO [main:RetryInvocationHandler@411] - java.net.ConnectException:
> Call From standby.rm.server/10.122.138.139 to standby.rm.server:8031
> failed on connection exception: java.net.ConnectException: Connection
> refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
> ResourceTrackerPBClientImpl.registerNodeManager over dev1 after 19 failover
> attempts. Trying to failover after sleeping for 40497ms.}}
> {quote}
>  
> After changing *yarn.client.failover-proxy-provider* to
> *org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider*,
> the NodeManager cannot find the right ResourceTracker address:
> {quote}{{getRMHAId:233, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfKeyForRMInstance:294, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfValueForRMInstance:302, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfValueForRMInstance:314, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getSocketAddr:3341, YarnConfiguration (org.apache.hadoop.yarn.conf)}}
> {{getRMAddress:77, ServerRMProxy (org.apache.hadoop.yarn.server.api)}}
> {{run:144, FederationRMFailoverProxyProvider$1 (org.apache.hadoop.yarn.server.federation.failover)}}
> {{doPrivileged:-1, AccessController (java.security)}}
> {{doAs:422, Subject (javax.security.auth)}}
> {{doAs:1893, UserGroupInformation (org.apache.hadoop.security)}}
> {{getProxyInternal:141, FederationRMFailoverProxyProvider (org.apache.hadoop.yarn.server.federation.failover)}}
> {{performFailover:192, FederationRMFailoverProxyProvider (org.apache.hadoop.yarn.server.federation.failover)}}
> {{failover:217, RetryInvocationHandler$ProxyDescriptor (org.apache.hadoop.io.retry)}}
> {{processRetryInfo:149, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
> {{processWaitTimeAndRetryInfo:142, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
> {{invokeOnce:107, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
> {{invoke:359, RetryInvocationHandler (org.apache.hadoop.io.retry)}}
> {{registerNodeManager:-1, $Proxy85 (com.sun.proxy)}}
> {{registerWithRM:378, NodeStatusUpdaterImpl (org.apache.hadoop.yarn.server.nodemanager)}}
> {{serviceStart:252, NodeStatusUpdaterImpl (org.apache.hadoop.yarn.server.nodemanager)}}
> {{start:194, AbstractService (org.apache.hadoop.service)}}
> {{serviceStart:121, CompositeService (org.apache.hadoop.service)}}
> {{start:194, AbstractService (org.apache.hadoop.service)}}
> {{initAndStartNodeManager:864, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
> {{main:931, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
> {quote}
> The provider tries to resolve the RM address to connect to in *getRMHAId* (HAUtil line 233),
> but it cannot find the right address, because this check can only return the ID of an RM
> whose address is local:
> {code:java}
> if (!s.isUnresolved() && NetUtils.isLocalAddress(s.getAddress())) {
>   currentRMId = rmId.trim();
>   found++;
> }
> {code}
> If the NM and an RM run on the same node and that RM is in standby state, the NM
> keeps sending the registration RPC to that standby RM indefinitely.
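To make the failure mode concrete, here is a small, self-contained Java model of the local-address selection quoted in the report. It is a hypothetical simplification, not the real HAUtil or FederationRMFailoverProxyProvider code: comparing host-name strings stands in for NetUtils.isLocalAddress, and the RM IDs rm1/rm2 and the host active.rm.server are made up for illustration. It shows that re-resolving the target through a local-address lookup on every failover attempt keeps returning the co-located standby RM, whereas a plain round-robin over the configured IDs (roughly what the non-federation failover provider does) eventually reaches the active RM.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of RM selection on a NodeManager.
 * Not the real Hadoop code; names and hosts are illustrative only.
 */
public class RmIdSelectionDemo {

    /** Mirrors the quoted loop: pick the RM ID whose configured host is the local host. */
    static String localRmId(Map<String, String> rmHosts, String localHost) {
        String currentRmId = null;
        int found = 0;
        for (Map.Entry<String, String> e : rmHosts.entrySet()) {
            // Stand-in for NetUtils.isLocalAddress(s.getAddress()).
            if (e.getValue().equals(localHost)) {
                currentRmId = e.getKey().trim();
                found++;
            }
        }
        if (found > 1) {
            throw new IllegalArgumentException("multiple RM addresses match the local host");
        }
        return currentRmId;
    }

    /** Failover that re-resolves the target via localRmId on every attempt. */
    static boolean failoverByLocalAddress(Map<String, String> rmHosts, String localHost,
                                          String activeRmId, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String target = localRmId(rmHosts, localHost); // always the same answer
            if (activeRmId.equals(target)) {
                return true;
            }
        }
        return false;
    }

    /** Failover that cycles through the configured RM IDs (hypothetical comparison). */
    static boolean failoverRoundRobin(String[] rmIds, String activeRmId, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String target = rmIds[attempt % rmIds.length];
            if (activeRmId.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> rmHosts = new LinkedHashMap<>();
        rmHosts.put("rm1", "standby.rm.server"); // co-located with the NM, in standby
        rmHosts.put("rm2", "active.rm.server");  // hypothetical active RM
        String localHost = "standby.rm.server";

        System.out.println(localRmId(rmHosts, localHost));                          // rm1
        System.out.println(failoverByLocalAddress(rmHosts, localHost, "rm2", 20));  // false
        System.out.println(failoverRoundRobin(
                rmHosts.keySet().toArray(new String[0]), "rm2", 20));               // true
    }
}
```

With the NM on the standby host, the local-address lookup never yields rm2, so no number of failover attempts helps; this matches the "after 19 failover attempts" loop in the log above.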



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
