[ 
https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680525#comment-16680525
 ] 

Keqiu Hu commented on YARN-8983:
--------------------------------

[~eyang] I took some time to read up on the docker overlay network. It seems to 
require swarm, and users have to set it up themselves. For my current testing, 
I'm using the host network option, so the container piggybacks on the host's 
*/etc/hosts*; alternatively, simply using *NM_HOST* suffices for a DNS lookup.
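
To make the *NM_HOST* fallback concrete, here is a minimal sketch. It assumes the task runs with the host network and that the NM_HOST environment variable set by YARN is visible inside the container. The class name ContainerHostname and the HostLookup interface are hypothetical names used only for illustration; they are not part of YARN or TonY.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

final class ContainerHostname {

    /** Hypothetical helper interface so the lookup may throw the checked exception. */
    interface HostLookup {
        String lookup() throws UnknownHostException;
    }

    /**
     * Returns the name produced by the lookup. If the lookup fails with
     * UnknownHostException, falls back to the NM_HOST entry of the given
     * environment (assumption: YARN exports it into the container).
     */
    static String resolve(HostLookup lookup, Map<String, String> env) {
        try {
            return lookup.lookup();
        } catch (UnknownHostException e) {
            String nmHost = env.get("NM_HOST");
            if (nmHost == null || nmHost.isEmpty()) {
                throw new IllegalStateException(
                        "Local hostname does not resolve and NM_HOST is not set", e);
            }
            return nmHost;
        }
    }

    public static void main(String[] args) {
        // Same call as in the issue description, wrapped with the fallback.
        String host = resolve(
                () -> InetAddress.getLocalHost().getHostName(), System.getenv());
        System.out.println(host);
    }
}
```

Passing the lookup and the environment in as parameters keeps the fallback logic testable without a real container; with a bridge or overlay network, NM_HOST would name the node rather than the container, so this only makes sense for the host network case.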

I haven't tested communication between different docker containers inside a 
Hadoop cluster myself, but I think an overlay network like the following is not 
supported out of the box by YARN:

                    YARN RM
                  /         \
             Node1           Node2
               |               |
    docker1:100 <- RPC calls -> docker2:200
               \               /
                overlay network

 

And IIUC, we don't have network isolation and always use host as the network 
driver, so communication looks like this:

                    YARN RM
                  /         \
      Node1:100 <- RPC calls -> Node2:200
          |                         |
     docker1:100               docker2:200

 

> YARN container with docker: hostname entry not in /etc/hosts
> ------------------------------------------------------------
>
>                 Key: YARN-8983
>                 URL: https://issues.apache.org/jira/browse/YARN-8983
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Keqiu Hu
>            Priority: Critical
>              Labels: Docker
>
> I'm experimenting with using Hadoop 2.9.1 to launch applications in docker 
> containers. Inside the container task, we try to get the hostname of the 
> container using
> {code:java}
> InetAddress.getLocalHost().getHostName(){code}
> This works fine with LXC; however, it throws the following exception when I 
> enable the docker container runtime using: 
> {code:java}
> YARN_CONTAINER_RUNTIME_TYPE=docker 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
> {code}
> The exception:
>  
> {noformat}
> java.net.UnknownHostException: ctr-1541488751855-0023-01-000003: ctr-1541488751855-0023-01-000003: Temporary failure in name resolution
>  at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
>  at com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
>  at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109)
> Caused by: java.net.UnknownHostException: ctr-1541488751855-0023-01-000003: Temporary failure in name resolution
>  at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>  at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>  at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>  at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
>  ... 2 more
> {noformat}
>  
> I did some research online, and this seems to be caused by a missing entry 
> for the hostname in /etc/hosts. I took a look at /etc/hosts inside the 
> container, and the entry is indeed missing: 
> {noformat}
> pi@pi-aw:~/docker/$ docker ps
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a second container_1541488751855_0028_01_000001
> 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours blissful_turing
> pi@pi-aw:~/docker/$ de 71e3e9df8bc6
> groups: cannot find name for group ID 1000
> groups: cannot find name for group ID 116
> groups: cannot find name for group ID 126
> To run a command as administrator (user "root"), use "sudo <command>".
> See "man sudo_root" for details.
> pi@ctr-1541488751855-0028-01-000001:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_000001$ cat /etc/hosts
> 127.0.0.1 localhost
> 192.168.0.14 pi-aw
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> pi@ctr-1541488751855-0028-01-000001:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_000001$
> {noformat}
> If I launch the image without YARN, I do see the entry in /etc/hosts:
> {noformat}
> pi@61f173f95631:~$ cat /etc/hosts
> 127.0.0.1 localhost
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 172.17.0.3 61f173f95631
> {noformat}
> Here is my container-executor.cfg
> {code:java}
> min.user.id=100
> yarn.nodemanager.linux-container-executor.group=hadoop
> [docker]
> module.enabled=true
> docker.binary=/usr/bin/docker
> docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
> docker.allowed.networks=bridge,host,none
> docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/
> {code}
> Since I'm using an older version of Hadoop (2.9.1), let me know if this is 
> something already fixed in a later version :) 



