qiwei huang created YARN-9823:
---------------------------------

             Summary: NodeManager cannot get right ResourceTrack address in 
Federation mode
                 Key: YARN-9823
                 URL: https://issues.apache.org/jira/browse/YARN-9823
             Project: Hadoop YARN
          Issue Type: Bug
          Components: federation, nodemanager
    Affects Versions: 2.9.2
         Environment: h2. Hadoop:

Hadoop 2.9.2 (some line numbers may be off because we have merged some 
3.0+ patches)

Security with Kerberos

Configured according to 
[https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/Federation.html]
h2. Java:

Java(TM) SE Runtime Environment (build 1.8.0_77-b03)

Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)

Kerberos:

 

 
            Reporter: qiwei huang


{{The NM retries indefinitely to connect to the wrong RM's resource tracker port:}}
{quote}{{INFO [main:RetryInvocationHandler@411] - java.net.ConnectException: 
Call From standby.rm.server/10.122.138.139 to }}{{standby.rm.server}}{{:8031 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while 
invoking ResourceTrackerPBClientImpl.registerNodeManager over dev1 after 19 
failover attempts. Trying to failover after sleeping for 40497ms.}}
{quote}
 

{{After changing *yarn.client.failover-proxy-provider* to 
*org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider*,
 the NodeManager cannot find the right ResourceTracker address:}}
{quote}{{getRMHAId:233, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfKeyForRMInstance:294, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfValueForRMInstance:302, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfValueForRMInstance:314, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getSocketAddr:3341, YarnConfiguration (org.apache.hadoop.yarn.conf)}}
{{getRMAddress:77, ServerRMProxy (org.apache.hadoop.yarn.server.api)}}
{{run:144, FederationRMFailoverProxyProvider$1 
(org.apache.hadoop.yarn.server.federation.failover)}}
{{doPrivileged:-1, AccessController (java.security)}}
{{doAs:422, Subject (javax.security.auth)}}
{{doAs:1893, UserGroupInformation (org.apache.hadoop.security)}}
{{getProxyInternal:141, FederationRMFailoverProxyProvider 
(org.apache.hadoop.yarn.server.federation.failover)}}
{{performFailover:192, FederationRMFailoverProxyProvider 
(org.apache.hadoop.yarn.server.federation.failover)}}
{{failover:217, RetryInvocationHandler$ProxyDescriptor 
(org.apache.hadoop.io.retry)}}
{{processRetryInfo:149, RetryInvocationHandler$Call 
(org.apache.hadoop.io.retry)}}
{{processWaitTimeAndRetryInfo:142, RetryInvocationHandler$Call 
(org.apache.hadoop.io.retry)}}
{{invokeOnce:107, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
{{invoke:359, RetryInvocationHandler (org.apache.hadoop.io.retry)}}
{{registerNodeManager:-1, $Proxy85 (com.sun.proxy)}}
{{registerWithRM:378, NodeStatusUpdaterImpl 
(org.apache.hadoop.yarn.server.nodemanager)}}
{{serviceStart:252, NodeStatusUpdaterImpl 
(org.apache.hadoop.yarn.server.nodemanager)}}
{{start:194, AbstractService (org.apache.hadoop.service)}}
{{serviceStart:121, CompositeService (org.apache.hadoop.service)}}
{{start:194, AbstractService (org.apache.hadoop.service)}}
{{initAndStartNodeManager:864, NodeManager 
(org.apache.hadoop.yarn.server.nodemanager)}}
{{main:931, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
{quote}
{{The provider tries to find the main RM address in }}*{{getRMHAId:233,}}* 
{{but it cannot find the right address because that method can only return the 
RM whose address is local:}}
{quote}{{if (!s.isUnresolved() && NetUtils.isLocalAddress(s.getAddress())) {}}
{{ currentRMId = rmId.trim();}}
{{ found++;}}
{{}}}
{quote}
{{If the NM and an RM are on the same node, and that RM is in standby 
state, the NM will call that standby RM over RPC indefinitely.}}
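
The selection behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the Hadoop source: the class name LocalRmIdSketch, the method pickLocalRmId, the RM ids rm1/rm2 and the host active.rm.server are all hypothetical, and the local-address check is replaced by a plain set lookup so the example runs without a cluster. It only assumes what the snippet above shows, namely that an RM id is chosen when its configured address is local to the calling process.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LocalRmIdSketch {

    /**
     * Hypothetical stand-in for the local-address match quoted above.
     * Returns the id of the single RM whose configured host is local,
     * or null if no RM (or more than one RM) matches.
     */
    public static String pickLocalRmId(Map<String, String> rmHostsById,
                                       Set<String> localHosts) {
        String currentRmId = null;
        int found = 0;
        for (Map.Entry<String, String> e : rmHostsById.entrySet()) {
            // Stand-in for: !s.isUnresolved() && NetUtils.isLocalAddress(s.getAddress())
            if (localHosts.contains(e.getValue())) {
                currentRmId = e.getKey().trim();
                found++;
            }
        }
        // Simplification for this sketch: treat an ambiguous match as "none".
        return found == 1 ? currentRmId : null;
    }

    public static void main(String[] args) {
        Map<String, String> rms = new LinkedHashMap<>();
        rms.put("rm1", "active.rm.server");   // hypothetical active RM host
        rms.put("rm2", "standby.rm.server");  // standby RM host, as in the log above

        // The NM runs on the same machine as the standby RM.
        Set<String> nmLocalHosts =
            new HashSet<>(Arrays.asList("standby.rm.server"));

        // The local match selects the co-located standby RM, never the active one.
        System.out.println(pickLocalRmId(rms, nmLocalHosts));
    }
}
```

Because the match depends only on which RM shares a host with the caller, a failover triggered from the NM resolves to the same co-located standby RM each time, which is consistent with the repeated connection attempts in the log above.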



