Jeff, the DFSClient doesn't use a Configuration copied from the RM. Did you put hostnames or IP addresses in conf/slaves? If hostnames, can you check /etc/hosts? Are there any conflicts there? and y
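A quick way to look for such a conflict is to check whether one hostname maps to more than one address. This is only a sketch: "nodeb" and the sample entries below are placeholders, and on a real node you would point HOSTS_FILE at the actual /etc/hosts instead of letting the demo data be generated.

```shell
#!/bin/sh
# Sketch of an /etc/hosts conflict check. "nodeb" and the demo entries are
# placeholders; set HOSTS_FILE to the real /etc/hosts on your node.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
if [ ! -s "$HOSTS_FILE" ]; then
  # Demo data: the same hostname mapped to two different addresses.
  printf '127.0.0.1 localhost\n1.2.3.4 nodeb\n5.6.7.8 nodeb\n' > "$HOSTS_FILE"
fi
host="nodeb"
# Collect every distinct address the hostname maps to (skipping comments).
ips=$(awk -v h="$host" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) print $1 }' "$HOSTS_FILE" | sort -u)
n=$(printf '%s\n' "$ips" | grep -c .)
if [ "$n" -gt 1 ]; then
  echo "CONFLICT: $host maps to $n different addresses"
else
  echo "OK: $host has at most one mapping"
fi
```

With the demo data this prints the CONFLICT line; a clean /etc/hosts should print OK.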
On Mon, Dec 16, 2013 at 5:01 AM, Jeff Stuckman <stuck...@umd.edu> wrote:

> Thanks for the response. I have the preferIPv4Stack option in
> hadoop-env.sh; however, this was not preventing the mapreduce container
> from enumerating the IPv6 address of the interface.
>
> Jeff
>
> *From:* Chris Mawata [mailto:chris.maw...@gmail.com]
> *Sent:* Sunday, December 15, 2013 3:58 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Site-specific dfs.client.local.interfaces setting not
> respected for Yarn MR container
>
> You might have better luck with an alternative approach to avoiding IPv6,
> which is to add the following to your hadoop-env.sh:
>
> HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
>
> Chris
>
> On 12/14/2013 11:38 PM, Jeff Stuckman wrote:
>
> Hello,
>
> I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming
> jobs with Hadoop 2.2.0. I am having problems with running tasks on a NM
> which is on a different host than the RM, and I believe that this is
> happening because the NM host's dfs.client.local.interfaces property is
> not having any effect.
>
> I have two hosts set up as follows:
>
> Host A (1.2.3.4):
> NameNode
> DataNode
> ResourceManager
> Job History Server
>
> Host B (5.6.7.8):
> DataNode
> NodeManager
>
> On each host, hdfs-site.xml was edited to change
> dfs.client.local.interfaces from an interface name ("eth0") to the IPv4
> address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to
> prevent the HDFS client from randomly binding to the IPv6 side of the
> interface (it randomly alternates between the IPv4 and IPv6 addresses,
> due to the random bind-IP selection in the DFS client), which was causing
> other problems.
>
> However, I am observing that the Yarn container on the NM appears to
> inherit the property from the copy of hdfs-site.xml on the RM, rather
> than reading it from the local configuration file.
> In other words, setting the dfs.client.local.interfaces property in
> Host A's configuration file causes the Yarn containers on Host B to use
> the same value of the property. This causes the map task to fail, as the
> container cannot establish a TCP connection to HDFS. However, on Host B,
> other commands that access HDFS (such as "hadoop fs") do work, as they
> respect the local value of the property.
>
> To illustrate with an example, I start a streaming job from the command
> line on Host A:
>
> hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
> -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper
> /home/hadoop/toRecords.pl -reducer /bin/cat
>
> The NodeManager on Host B notes that there was an error starting the
> container:
>
> 13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception
> from container-launch with container ID:
> container_1387067177654_0002_01_000001 and exit code: 1
> org.apache.hadoop.util.Shell$ExitCodeException:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>     at org.apache.hadoop.util.Shell.run(Shell.java:379)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> On Host B, I open
> userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog
> and find the following
> messages (note the DEBUG-level messages, which I manually enabled for the
> DFS client):
>
> 2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
> <cut>
> 2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> newInfo = LocatedBlocks{
>   fileLength=537
>   underConstruction=false
>   blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
>     getBlockSize()=537; corrupt=false; offset=0;
>     locs=[5.6.7.8:50010, 1.2.3.4:50010]}]
>   lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
>     getBlockSize()=537; corrupt=false; offset=0;
>     locs=[5.6.7.8:50010, 1.2.3.4:50010]}
>   isLastBlockComplete=true}
> 2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Connecting to datanode 5.6.7.8:50010
> 2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interface /1.2.3.4:0
> 2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient:
> Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and
> continue. java.net.BindException: Cannot assign requested address
>
> Note the failure to bind to 1.2.3.4, as the IP for Host B's local
> interface is actually 5.6.7.8.
>
> Note that when running other HDFS commands on Host B, Host B's setting
> for dfs.client.local.interfaces is respected.
> On Host B:
>
> hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
> 13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8]
> with addresses [/5.6.7.8:0]
> Found 3 items
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system
> drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp
>
> If I change dfs.client.local.interfaces on Host A to eth0 (without
> touching the setting on Host B), the syslog mentioned above instead shows
> the following:
>
> 2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [eth0] with addresses [/<some IPv6 address>:0,
> /5.6.7.8:0]
>
> The job then sometimes completes successfully, but both Host A and Host B
> will then randomly alternate between the IPv4 and IPv6 sides of their
> eth0 interfaces, which causes other issues. In other words, changing the
> dfs.client.local.interfaces setting on Host A to a named adapter caused
> the Yarn container on Host B to bind to an identically named adapter.
>
> Any ideas on how I can reconfigure the cluster so every container will
> try to bind to its own interface? I successfully worked around this issue
> by doing a custom build of HDFS which hardcodes my IP address in the
> DFSClient, but I am looking for a better long-term solution.
>
> Thanks,
> Jeff
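For completeness, the IPv4-only workaround Chris suggests amounts to one line appended to hadoop-env.sh on every node. This is a sketch: your HADOOP_OPTS may already carry other flags, and the echo is only there to show the resulting value.

```shell
#!/bin/sh
# hadoop-env.sh fragment (sketch): prefer the IPv4 stack so the JVM-level
# socket code never binds to the IPv6 side of an interface. Appends to any
# options already present in HADOOP_OPTS.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
echo "$HADOOP_OPTS"
```

As the thread shows, this flag alone did not stop the container from enumerating the IPv6 address here, so it complements rather than replaces a correct per-node dfs.client.local.interfaces setting.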