Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

Chris Mawata Sun, 15 Dec 2013 12:59:12 -0800

You might have better luck with an alternative approach to avoiding IPv6, which is to add the following to your hadoop-env.sh:


HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"


Chris



On 12/14/2013 11:38 PM, Jeff Stuckman wrote:

Hello,
I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming jobs with Hadoop 2.2.0. I am having problems running tasks on an NM that is on a different host than the RM, and I believe this is happening because the NM host's dfs.client.local.interfaces property is not having any effect.
I have two hosts set up as follows:

Host A (1.2.3.4):
  NameNode
  DataNode
  ResourceManager
  Job History Server

Host B (5.6.7.8):
  DataNode
  NodeManager
On each host, hdfs-site.xml was edited to change dfs.client.local.interfaces from an interface name ("eth0") to the IPv4 address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to prevent the HDFS client from binding to the IPv6 side of the interface (it swaps between the IPv4 and IPv6 addresses, due to the random bind IP selection in the DFS client), which was causing other problems.
However, I am observing that the Yarn container on the NM appears to inherit the property from the copy of hdfs-site.xml on the RM, rather than reading it from the local configuration file. In other words, setting the dfs.client.local.interfaces property in Host A's configuration file causes the Yarn containers on Host B to use the same value of the property. This causes the map task to fail, as the container cannot establish a TCP connection to HDFS. However, on Host B, other commands that access HDFS (such as "hadoop fs") do work, as they respect the local value of the property.
To illustrate with an example, I start a streaming job from the command line on Host A:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper /home/hadoop/toRecords.pl -reducer /bin/cat
The NodeManager on Host B notes that there was an error starting the container:
13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1387067177654_0002_01_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
On Host B, I open userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog and find the following messages (note the DEBUG-level messages, which I manually enabled for the DFS client):
2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
<cut>
2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: newInfo = LocatedBlocks{
  fileLength=537
  underConstruction=false
  blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}]
  lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}
  isLastBlockComplete=true}
2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Connecting to datanode 5.6.7.8:50010
2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interface /1.2.3.4:0
2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient: Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and continue. java.net.BindException: Cannot assign requested address
Note the failure to bind to 1.2.3.4, as the IP for Host B's local interface is actually 5.6.7.8.
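To illustrate the failure mode in isolation, here is a minimal Java sketch (not DFSClient code; the class and method names are hypothetical) that tries to bind an unconnected socket to a given local address on port 0. On a host that does not own the address, the bind is expected to fail with a java.net.BindException, as in the log above.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Hadoop.
public class LocalBindCheck {

    /** Returns true if a socket can be bound to the given IP literal on an ephemeral port. */
    static boolean canBindTo(String ipLiteral) throws IOException {
        try (Socket socket = new Socket()) {
            // Port 0 asks the OS for any free port, like the ":0" in the log lines.
            socket.bind(new InetSocketAddress(InetAddress.getByName(ipLiteral), 0));
            return true;
        } catch (BindException e) {
            // Typically raised when the address is not assigned to a local interface.
            System.out.println("bind to " + ipLiteral + " failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the address to test as the first argument; defaults to loopback.
        String candidate = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(candidate + " bindable: " + canBindTo(candidate));
    }
}
```

Running this on Host B with 1.2.3.4 and then 5.6.7.8 as the argument would be one way to confirm which address the host can actually bind to.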
Note that when running other HDFS commands on Host B, Host B's setting for dfs.client.local.interfaces is respected. On Host B:
hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8] with addresses [/5.6.7.8:0]
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2013-12-14 00:40 hdfs://hosta/linesin
drwxr-xr-x - hadoop supergroup 0 2013-12-14 02:01 hdfs://hosta/system
drwx------ - hadoop supergroup 0 2013-12-14 10:31 hdfs://hosta/tmp
If I change dfs.client.local.interfaces on Host A to eth0 (without touching the setting on Host B), the syslog mentioned above instead shows the following:
2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [eth0] with addresses [/<some IP6 address>:0, /5.6.7.8:0]
The job then sometimes completes successfully, but both Host A and Host B then randomly alternate between the IPv4 and IPv6 sides of their eth0 interfaces, which causes other issues. In other words, changing the dfs.client.local.interfaces setting on Host A to a named adapter caused the Yarn container on Host B to bind to an identically named adapter.
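As a rough way to see why a named adapter yields both address families, the following Java sketch (hypothetical class and method names, not the DFSClient implementation) lists every address on each local interface and shows which of them are IPv4. It assumes nothing about Hadoop and uses only java.net.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop.
public class Ipv4AddressLister {

    /** Keeps only the IPv4 entries from a list of addresses. */
    static List<InetAddress> ipv4Only(List<InetAddress> addresses) {
        List<InetAddress> result = new ArrayList<>();
        for (InetAddress address : addresses) {
            if (address instanceof Inet4Address) {
                result.add(address);
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            System.out.println("no network interfaces found");
            return;
        }
        // Print every address per interface next to the IPv4-only subset.
        for (NetworkInterface nic : Collections.list(nics)) {
            List<InetAddress> all = Collections.list(nic.getInetAddresses());
            System.out.println(nic.getName() + " all=" + all + " ipv4=" + ipv4Only(all));
        }
    }
}
```

On a dual-stack host, an interface such as eth0 would typically show both an IPv6 and an IPv4 entry in the "all" list, which matches the two addresses in the DEBUG line above.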
Any ideas on how I can reconfigure the cluster so every container will try to bind to its own interface? I successfully worked around this issue by doing a custom build of HDFS which hardcodes my IP address in the DFSClient, but I am looking for a better long-term solution.
Thanks,

Jeff
