Greg Senia created HADOOP-15250:
-----------------------------------

             Summary: MultiHomed Server Network Cluster Network 
                 Key: HADOOP-15250
                 URL: https://issues.apache.org/jira/browse/HADOOP-15250
             Project: Hadoop Common
          Issue Type: Improvement
          Components: ipc, net
    Affects Versions: 3.0.0, 2.9.0, 2.7.3
            Reporter: Greg Senia


We run our Hadoop clusters with two networks attached to each node. The first is a server network, firewalled with firewalld, that allows inbound traffic only for SSH and services such as Knox, HiveServer2, the YARN RM/ATS HTTP endpoints and the MR History Server. The second is the cluster network on the second network interface; it uses jumbo frames, is open with no restrictions, and allows all cluster traffic to flow between nodes.

To resolve DNS within the Hadoop cluster we use DNS views via BIND: if the query originates from a node on the cluster network, we return the internal DNS record for the nodes. This all works fine with the multi-homing features added in Hadoop 2.x.

Some logic around the views:

a. The internal view is used by cluster machines when performing lookups, so hosts on the cluster network get answers from the internal view in DNS.
b. The external view is used by machines outside the local cluster when performing lookups, so hosts not on the cluster network get answers from the external view in DNS.

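BIND chooses a view by matching the querying client's source address against each view's `match-clients` list, in order. The Java below only mimics that selection rule so the internal/external split is concrete; it is not BIND code, and the cluster subnet `10.10.0.0/16` as well as the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical illustration of split-horizon view selection; not BIND code. */
final class DnsViewSketch {

  /** Parses an IP literal (no DNS lookup happens for literals). */
  static InetAddress ip(String literal) {
    try {
      return InetAddress.getByName(literal);
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("bad IP literal: " + literal, e);
    }
  }

  /** True if addr lies inside network/prefixLen; address families must match. */
  static boolean inSubnet(InetAddress addr, InetAddress network, int prefixLen) {
    byte[] a = addr.getAddress();
    byte[] n = network.getAddress();
    if (a.length != n.length) {
      return false; // IPv4 vs IPv6: never a match
    }
    int fullBytes = prefixLen / 8;
    int remBits = prefixLen % 8;
    for (int i = 0; i < fullBytes; i++) {
      if (a[i] != n[i]) {
        return false;
      }
    }
    if (remBits == 0) {
      return true;
    }
    // Compare only the leading remBits bits of the next byte
    int mask = (0xFF << (8 - remBits)) & 0xFF;
    return (a[fullBytes] & mask) == (n[fullBytes] & mask);
  }

  /** Clients on the (hypothetical) cluster subnet get the internal view. */
  static String viewFor(InetAddress client, InetAddress clusterNet, int prefixLen) {
    return inSubnet(client, clusterNet, prefixLen) ? "internal" : "external";
  }

  public static void main(String[] args) {
    InetAddress clusterNet = ip("10.10.0.0"); // assumption: cluster network
    for (String client : new String[] {"10.10.3.7", "192.168.1.5"}) {
      System.out.println(client + " -> " + viewFor(ip(client), clusterNet, 16));
    }
  }
}
```

The consequence that matters for this issue: a cluster node that looks up its own hostname is answered from the internal view, so it gets its cluster-network address back.
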

So this brings me to our problem. We created firewall rules allowing inbound traffic from each cluster's server network so that distcp could run between clusters. Almost immediately we noticed that when YARN attempted to talk to the remote cluster, it was binding outgoing traffic to the cluster network interface, which IS NOT routable. After researching the code, we found the following in NetUtils.java and Client.java.

In Client.java, when the caller has Kerberos credentials, the client takes the host name from the Kerberos principal, resolves it, and, if the result is a local address, binds the socket to that address. This is not valid on a multi-homed network with one routable interface and one non-routable interface. According to the java.net.Socket documentation, it is valid to call socket.bind(null), which lets the OS choose the local address, so the routing table (together with DNS resolution of the destination) sends the traffic out the correct interface. I will also attach the network traces and a test patch for the 2.7.x and 3.x code bases. I have this test fix, shown below, running in my Hadoop test cluster.

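The difference between the two bind styles can be seen with plain `java.net.Socket`. Per the `Socket.bind` Javadoc, passing `null` makes the system pick an ephemeral port and a valid local address; the concrete source address is then chosen by the OS when the connection is made. The sketch below uses only loopback so it runs anywhere, and the class and method names are hypothetical. On a multi-homed node the same `bind(null)` call would leave the choice of outgoing interface to the routing table instead of to the address resolved from the principal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo contrasting socket.bind(null) with binding to a fixed local address. */
final class BindNullDemo {

  /** bind(null): wildcard local address plus an OS-chosen ephemeral port. */
  static boolean bindNullGivesWildcardAndEphemeralPort() {
    try (Socket s = new Socket()) {
      s.bind(null);
      return s.isBound() && s.getLocalAddress().isAnyLocalAddress() && s.getLocalPort() > 0;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Explicit bind, as Client.java does today: the source address is pinned up front. */
  static InetAddress bindExplicit(InetAddress local) {
    try (Socket s = new Socket()) {
      s.bind(new InetSocketAddress(local, 0));
      return s.getLocalAddress();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** After bind(null) + connect, the OS has filled in a concrete source address. */
  static InetAddress sourceAddressAfterConnect() {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, loopback);
         Socket client = new Socket()) {
      client.bind(null);
      // The kernel completes the TCP handshake even before server.accept() is called
      client.connect(new InetSocketAddress(loopback, server.getLocalPort()), 2000);
      return client.getLocalAddress();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("bind(null) -> wildcard + ephemeral port: "
        + bindNullGivesWildcardAndEphemeralPort());
    System.out.println("explicit bind pins: " + bindExplicit(InetAddress.getLoopbackAddress()));
    System.out.println("source chosen by OS: " + sourceAddressAfterConnect());
  }
}
```
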
Client.java:

/*
 * Bind the socket to the host specified in the principal name of the
 * client, to ensure Server matching address of the client connection
 * to host name in principal passed.
 */
InetSocketAddress bindAddr = null;
if (ticket != null && ticket.hasKerberosCredentials()) {
  KerberosInfo krbInfo =
      remoteId.getProtocol().getAnnotation(KerberosInfo.class);
  if (krbInfo != null) {
    String principal = ticket.getUserName();
    String host = SecurityUtil.getHostFromPrincipal(principal);
    // If host name is a valid local address then bind socket to it
    InetAddress localAddr = NetUtils.getLocalInetAddress(host); // non-null when host resolves to a local interface
    if (localAddr != null) {
      this.socket.setReuseAddress(true);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Binding " + principal + " to " + localAddr);
      }
      bindAddr = new InetSocketAddress(localAddr, 0); // pins the source address to that interface
    }
  }
}

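Whether the explicit bind happens at all depends on `NetUtils.getLocalInetAddress(host)` returning non-null, which it does only when the principal's host resolves to an address owned by a local interface. If the principal's hostname resolves through the internal view to the node's cluster-network address, that check succeeds and the socket is pinned to the non-routable interface, which is consistent with the traffic observed above. A simplified stand-in for that check, assuming plain `InetAddress.getByName` for resolution (the class and method names are hypothetical, not Hadoop's):

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

/** Hypothetical stand-in for the "is this host one of my addresses?" check. */
final class LocalAddrSketch {

  /** Returns the resolved address only if a local interface owns it; otherwise null. */
  static InetAddress localInetAddress(String host) {
    if (host == null) {
      return null;
    }
    try {
      InetAddress addr = InetAddress.getByName(host);
      // getByInetAddress returns null when no local interface has this address
      return NetworkInterface.getByInetAddress(addr) != null ? addr : null;
    } catch (UnknownHostException | SocketException e) {
      return null; // unresolvable or unreadable: treat as "not local", so no explicit bind
    }
  }

  public static void main(String[] args) {
    String host = args.length > 0 ? args[0] : "localhost";
    InetAddress local = localInetAddress(host);
    System.out.println(host + " -> "
        + (local == null ? "not local, no explicit bind" : "local, would bind to " + local.getHostAddress()));
  }
}
```
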

So in my Hadoop 2.7.x cluster I made the following changes, and traffic now flows out the correct interfaces:

diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
index e1be271..c5b4a42 100644
--- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
+++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
@@ -305,6 +305,9 @@
   public static final String  IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_KEY = "ipc.client.fallback-to-simple-auth-allowed";
   public static final boolean IPC_CLIENT_FALLBACK_TO_SIMPLE_AUTH_ALLOWED_DEFAULT = false;
 
+  public static final String  IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY = "ipc.client.nobind.local.addr";
+  public static final boolean IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT = false;
+
   public static final String IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_KEY =
     "ipc.client.connect.max.retries.on.sasl";
   public static final int    IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SASL_DEFAULT = 5;
diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
index a6f4eb6..7bfddb7 100644
--- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
+++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
@@ -129,7 +129,9 @@ public static void setCallIdAndRetryCount(int cid, int rc) {
 
   private final int connectionTimeout;
 
+
   private final boolean fallbackAllowed;
+  private final boolean noBindLocalAddr;
   private final byte[] clientId;
   
   final static int CONNECTION_CONTEXT_CALL_ID = -3;
@@ -642,7 +644,11 @@ private synchronized void setupConnection() throws IOException {
               InetAddress localAddr = NetUtils.getLocalInetAddress(host);
               if (localAddr != null) {
                 this.socket.setReuseAddress(true);
-                this.socket.bind(new InetSocketAddress(localAddr, 0));
+                if (noBindLocalAddr) {
+                  this.socket.bind(null);
+                } else {
+                  this.socket.bind(new InetSocketAddress(localAddr, 0));
+                }
               }
             }
           }

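The diff declares the `noBindLocalAddr` field but does not show the constructor hunk that assigns it. Below is a sketch of the intended decision, assuming the flag is read from `ipc.client.nobind.local.addr` with the default `false` from the `CommonConfigurationKeys` hunk. In Hadoop this lookup would normally go through `Configuration.getBoolean(key, default)`; here `java.util.Properties` stands in so the example runs on its own, and the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.Properties;

/** Hypothetical sketch of how the patch's flag would drive the bind decision. */
final class NoBindConfigSketch {
  // Key and default as declared in the patch's CommonConfigurationKeys hunk
  static final String IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY = "ipc.client.nobind.local.addr";
  static final boolean IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT = false;

  /** Properties stands in for Hadoop's Configuration (assumption). */
  static boolean readNoBind(Properties conf) {
    String v = conf.getProperty(IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY);
    return v == null ? IPC_CLIENT_NO_BIND_LOCAL_ADDR_DEFAULT : Boolean.parseBoolean(v.trim());
  }

  /**
   * Mirrors the patched branch: null means "call socket.bind(null)";
   * otherwise bind to the principal's local address with an ephemeral port.
   */
  static InetSocketAddress chooseBindAddress(boolean noBindLocalAddr, InetAddress localAddr) {
    return noBindLocalAddr ? null : new InetSocketAddress(localAddr, 0);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(IPC_CLIENT_NO_BIND_LOCAL_ADDR_KEY, "true");
    InetAddress local = InetAddress.getLoopbackAddress();
    System.out.println("bind target: " + chooseBindAddress(readNoBind(conf), local));
  }
}
```

With the key left unset, the default keeps today's behavior (bind to the principal's local address); setting it to `true` on the client routes the connection through `socket.bind(null)`.
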


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)