Jeff, the DFSClient doesn't use a Configuration copied from the RM. Did you put hostnames or IP addresses in conf/slaves? If hostnames, can you check /etc/hosts? Are there any conflicts there? and y
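A quick way to look for such a conflict is to check whether one hostname maps to more than one address. This is only a sketch: "nodeb" and the sample entries below are placeholders, and on a real node you would point HOSTS_FILE at the actual /etc/hosts instead of letting the demo data be generated.

```shell
#!/bin/sh
# Sketch of an /etc/hosts conflict check. "nodeb" and the demo entries are
# placeholders; set HOSTS_FILE to the real /etc/hosts on your node.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
if [ ! -s "$HOSTS_FILE" ]; then
  # Demo data: the same hostname mapped to two different addresses.
  printf '127.0.0.1 localhost\n1.2.3.4 nodeb\n5.6.7.8 nodeb\n' > "$HOSTS_FILE"
fi
host="nodeb"
# Collect every distinct address the hostname maps to (skipping comments).
ips=$(awk -v h="$host" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) print $1 }' "$HOSTS_FILE" | sort -u)
n=$(printf '%s\n' "$ips" | grep -c .)
if [ "$n" -gt 1 ]; then
  echo "CONFLICT: $host maps to $n different addresses"
else
  echo "OK: $host has at most one mapping"
fi
```

With the demo data this prints the CONFLICT line; a clean /etc/hosts should print OK.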
On Mon, Dec 16, 2013 at 5:01 AM, Jeff Stuckman <stuck...@umd.edu> wrote:

> Thanks for the response. I have the preferIPv4Stack option in
> hadoop-env.sh; however, this was not preventing the mapreduce container
> from enumerating the IPv6 address of the interface.
>
> Jeff
>
> *From:* Chris Mawata [mailto:chris.maw...@gmail.com]
> *Sent:* Sunday, December 15, 2013 3:58 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Site-specific dfs.client.local.interfaces setting not
> respected for Yarn MR container
>
> You might have better luck with an alternative approach to avoiding IPv6,
> which is to add the following to your hadoop-env.sh:
>
> HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
>
> Chris
>
> On 12/14/2013 11:38 PM, Jeff Stuckman wrote:
>
> Hello,
>
> I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming
> jobs with Hadoop 2.2.0. I am having problems with running tasks on a NM
> which is on a different host than the RM, and I believe that this is
> happening because the NM host's dfs.client.local.interfaces property is
> not having any effect.
>
> I have two hosts set up as follows:
>
> Host A (1.2.3.4):
> NameNode
> DataNode
> ResourceManager
> Job History Server
>
> Host B (5.6.7.8):
> DataNode
> NodeManager
>
> On each host, hdfs-site.xml was edited to change
> dfs.client.local.interfaces from an interface name ("eth0") to the IPv4
> address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to
> prevent the HDFS client from randomly binding to the IPv6 side of the
> interface (it randomly alternates between the IPv4 and IPv6 addresses,
> due to the random bind-IP selection in the DFS client), which was causing
> other problems.
>
> However, I am observing that the Yarn container on the NM appears to
> inherit the property from the copy of hdfs-site.xml on the RM, rather
> than reading it from the local configuration file.
> In other words, setting the dfs.client.local.interfaces property in
> Host A's configuration file causes the Yarn containers on Host B to use
> the same value of the property. This causes the map task to fail, as the
> container cannot establish a TCP connection to HDFS. However, on Host B,
> other commands that access HDFS (such as "hadoop fs") do work, as they
> respect the local value of the property.
>
> To illustrate with an example, I start a streaming job from the command
> line on Host A:
>
> hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
> -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper
> /home/hadoop/toRecords.pl -reducer /bin/cat
>
> The NodeManager on Host B notes that there was an error starting the
> container:
>
> 13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception
> from container-launch with container ID:
> container_1387067177654_0002_01_000001 and exit code: 1
> org.apache.hadoop.util.Shell$ExitCodeException:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>     at org.apache.hadoop.util.Shell.run(Shell.java:379)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> On Host B, I open
> userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog
> and find the following
> messages (note the DEBUG-level messages, which I manually enabled for the
> DFS client):
>
> 2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
> <cut>
> 2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> newInfo = LocatedBlocks{
>   fileLength=537
>   underConstruction=false
>   blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
>     getBlockSize()=537; corrupt=false; offset=0;
>     locs=[5.6.7.8:50010, 1.2.3.4:50010]}]
>   lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
>     getBlockSize()=537; corrupt=false; offset=0;
>     locs=[5.6.7.8:50010, 1.2.3.4:50010]}
>   isLastBlockComplete=true}
> 2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Connecting to datanode 5.6.7.8:50010
> 2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interface /1.2.3.4:0
> 2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient:
> Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and
> continue. java.net.BindException: Cannot assign requested address
>
> Note the failure to bind to 1.2.3.4, as the IP for Host B's local
> interface is actually 5.6.7.8.
>
> Note that when running other HDFS commands on Host B, Host B's setting
> for dfs.client.local.interfaces is respected.
> On Host B:
>
> hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
> 13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8]
> with addresses [/5.6.7.8:0]
> Found 3 items
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system
> drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp
>
> If I change dfs.client.local.interfaces on Host A to eth0 (without
> touching the setting on Host B), the syslog mentioned above instead shows
> the following:
>
> 2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [eth0] with addresses [/<some IPv6 address>:0,
> /5.6.7.8:0]
>
> The job then sometimes completes successfully, but both Host A and Host B
> will then randomly alternate between the IPv4 and IPv6 sides of their
> eth0 interfaces, which causes other issues. In other words, changing the
> dfs.client.local.interfaces setting on Host A to a named adapter caused
> the Yarn container on Host B to bind to an identically named adapter.
>
> Any ideas on how I can reconfigure the cluster so every container will
> try to bind to its own interface? I successfully worked around this issue
> by doing a custom build of HDFS which hardcodes my IP address in the
> DFSClient, but I am looking for a better long-term solution.
>
> Thanks,
> Jeff
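For completeness, the IPv4-only workaround Chris suggests amounts to one line appended to hadoop-env.sh on every node. This is a sketch: your HADOOP_OPTS may already carry other flags, and the echo is only there to show the resulting value.

```shell
#!/bin/sh
# hadoop-env.sh fragment (sketch): prefer the IPv4 stack so the JVM-level
# socket code never binds to the IPv6 side of an interface. Appends to any
# options already present in HADOOP_OPTS.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
echo "$HADOOP_OPTS"
```

As the thread shows, this flag alone did not stop the container from enumerating the IPv6 address here, so it complements rather than replaces a correct per-node dfs.client.local.interfaces setting.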