Hello, I have built a Spark on HDFS/YARN cluster running in Docker containers (managed by Kubernetes).
* Spark on YARN versions:
  - Spark 1.6.0
  - Hadoop 2.6.0 (CDH 5.6.0)
  - Oracle Java 1.8.0_74

There is one HDFS/YARN master and one HDFS/YARN worker, each in its own container.

The spark-yarn-master container has the following hostnames and IP addresses:

  hostname: spark-yarn-master-1-sxegt (pod name)
  IP addr.: 172.17.0.11
  hostname: spark-yarn-master (alias DNS name)
  IP addr.: 172.30.242.57 (alias IP addr.)

bash-4.2$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.17.0.11     spark-yarn-master-1-sxegt
bash-4.2$
bash-4.2$ ip -4 addr show dev eth0
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link-netnsid 0
    inet 172.17.0.11/16 scope global eth0
       valid_lft forever preferred_lft forever
bash-4.2$
bash-4.2$ hostname -f
spark-yarn-master-1-sxegt
bash-4.2$
bash-4.2$ curl -v spark-yarn-master:8020
* About to connect() to spark-yarn-master port 8020 (#0)
*   Trying 172.30.242.57...
* Connected to spark-yarn-master (172.30.242.57) port 8020 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: spark-yarn-master:8020
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-type: text/plain
* no chunk, no close, no size. Assume close to signal end
<
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
* Closing connection 0
bash-4.2$

The spark-yarn-worker container has the following hostnames and IP addresses:

  hostname: spark-yarn-worker-1-pshqi (pod name)
  IP addr.: 172.17.0.12
  hostname: spark-yarn-worker (alias DNS name)
  IP addr.: 172.30.1.53 (alias IP addr.)

bash-4.2$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.17.0.12     spark-yarn-worker-1-pshqi
bash-4.2$
bash-4.2$ ip -4 addr show dev eth0
52: eth0@if53: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link-netnsid 0
    inet 172.17.0.12/16 scope global eth0
       valid_lft forever preferred_lft forever
bash-4.2$
bash-4.2$ hostname -f
spark-yarn-worker-1-pshqi
bash-4.2$
bash-4.2$ curl -v spark-yarn-worker:8040
* About to connect() to spark-yarn-worker port 8040 (#0)
*   Trying 172.30.1.53...
* Connected to spark-yarn-worker (172.30.1.53) port 8040 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: spark-yarn-worker:8040
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-type: text/plain
* no chunk, no close, no size. Assume close to signal end
<
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
* Closing connection 0
bash-4.2$

The HDFS/YARN master and worker nodes can reach each other by alias DNS name. (The "Hadoop IPC port" message from curl is expected here; it just confirms that the TCP connection to the IPC port succeeded.)

On the master, to the worker (alias DNS name):

bash-4.2$ hostname -f ; curl spark-yarn-worker:8040
spark-yarn-master-1-sxegt
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
bash-4.2$

On the worker, to the master (alias DNS name):

bash-4.2$ hostname -f ; curl spark-yarn-master:8020
spark-yarn-worker-1-pshqi
It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon.
bash-4.2$

However, they cannot reach each other by (pod) hostname.
On the worker, to the master (hostname):

bash-4.2$ hostname -f ; curl spark-yarn-master-1-sxegt:8020
spark-yarn-worker-1-pshqi
curl: (6) Could not resolve host: spark-yarn-master-1-sxegt; Name or service not known
bash-4.2$

On the master, to the worker (hostname):

bash-4.2$ hostname -f ; curl spark-yarn-worker-1-pshqi:8040
spark-yarn-master-1-sxegt
curl: (6) Could not resolve host: spark-yarn-worker-1-pshqi; Name or service not known
bash-4.2$

So I want HDFS/YARN to use the alias DNS names instead of the pod hostnames. But the YARN NodeManager always registers with the pod hostname, even with yarn.nodemanager.hostname set to the alias DNS name.

HDFS/YARN worker log:

16/03/07 10:04:38 INFO datanode.DataNode: Configured hostname is spark-yarn-worker
:
16/03/07 10:04:42 INFO security.NMContainerTokenSecretManager: Updating node address : spark-yarn-worker-1-pshqi:39352
:
16/03/07 10:04:42 INFO containermanager.ContainerManagerImpl: ContainerManager started at spark-yarn-worker-1-pshqi/172.17.0.12:39352
:
16/03/07 10:04:44 INFO nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as spark-yarn-worker-1-pshqi:39352 with total resource of <memory:8192, vCores:8>
:
16/03/07 10:04:44 INFO common.Storage: Lock on /var/lib/hadoop-hdfs/cache/hdfs/dfs/data/in_use.lock acquired by nodename 16@spark-yarn-worker-1-pshqi
:

HDFS/YARN master log:

16/03/07 10:04:44 INFO resourcemanager.ResourceTrackerService: NodeManager from node spark-yarn-worker-1-pshqi(cmPort: 39352 httpPort: 8042) registered with capability: <memory:8192, vCores:8>, assigned nodeId spark-yarn-worker-1-pshqi:39352
:
16/03/07 10:04:45 INFO rmnode.RMNodeImpl: spark-yarn-worker-1-pshqi:39352 Node Transitioned from NEW to RUNNING
16/03/07 10:04:45 INFO fair.FairScheduler: Added node spark-yarn-worker-1-pshqi:39352 cluster capacity: <memory:8192, vCores:8>

When I submit an application, the master connects to the worker, but it uses the worker's "unresolved" pod hostname rather than the alias DNS name, and it throws java.net.UnknownHostException.
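For reference, here is the relevant part of my yarn-site.xml as a sketch. The yarn.nodemanager.hostname value is what I actually set; the explicit yarn.nodemanager.address entry is only an idea I have not verified (by default it is ${yarn.nodemanager.hostname}:0, and the port below is illustrative):

```xml
<!-- yarn-site.xml (sketch; spark-yarn-worker is the alias DNS name in my setup) -->
<property>
  <name>yarn.nodemanager.hostname</name>
  <value>spark-yarn-worker</value>
</property>

<!-- Unverified idea: pin the NodeManager RPC address explicitly instead of
     leaving the default ${yarn.nodemanager.hostname}:0. Port is illustrative. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>spark-yarn-worker:8041</value>
</property>
```

Even with yarn.nodemanager.hostname set this way, the registration below still uses the pod hostname.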
16/03/07 11:58:51 INFO scheduler.SchedulerNode: Assigned container container_1457312681433_0001_01_000001 of capacity <memory:1024, vCores:1> on host spark-yarn-worker-1-pshqi:39352, which has 1 containers, <memory:1024, vCores:1> used and <memory:7168, vCores:7> available after allocation
16/03/07 11:58:51 ERROR scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1457312681433_0001_01_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: spark-yarn-worker-1-pshqi

Can I configure the YARN NodeManager to use an arbitrary hostname? Unfortunately, I have no way to modify the containers' hostname, /etc/hosts, or DNS.

Regards,
dai
--
HIGUCHI Daisuke <d-higu...@creationline.com>
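P.S. For anyone reproducing this, here is the quick check I use inside each pod to see which names resolve. getent follows the same NSS lookup path (/etc/hosts, then DNS) that most daemons rely on; the names are the ones from my setup above:

```shell
#!/bin/sh
# Report whether a name resolves via the normal NSS path (/etc/hosts, then DNS).
check() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "resolvable:   $1"
  else
    echo "unresolvable: $1"
  fi
}

check localhost                   # sanity check: always in /etc/hosts
check spark-yarn-master           # alias DNS name (a Kubernetes service in my cluster)
check spark-yarn-worker-1-pshqi   # peer pod hostname: run from the master pod, this
                                  # fails, because it exists only in the worker's
                                  # own /etc/hosts
```

In my cluster the alias names resolve from every pod, while the pod hostnames resolve only inside their own pod, which matches the curl results above.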