Re: 3.1.0 MR work won't distribute after dual-homing NameNode

Jeff Hubbs Wed, 13 Jun 2018 10:59:31 -0700

Gour -

Thank you; I'll certainly look into that. On Monday I performed an experiment where I reduced the cluster down to two nodes, putting all the daemons that were unique to msba02a onto msba02b and reconstructing HDFS as appropriate. This way, no active machine was dual-homed; they ran as they had before I changed the network except for having static IPs and name resolution via host table. When I did this and ran the wordcount mapreduce job, I observed the same behavior: everything ran on just one core of msba02b until the output file (with all the found words and their number of instances - it's a 770-MiB file) was written back out to HDFS.

I'm about to start part of the way over with a fresh binary distribution of 3.1.0 and see what happens. I thought I would also look into the systems' name resolution priority and make /etc/hosts come first.
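
One way to check that resolution is to compare what the system resolver returns for each node name against the host table. The following is a minimal sketch in Python, assuming the host-table entries quoted later in this thread; `parse_hosts` and `find_mismatches` are hypothetical helper names for illustration, not part of Hadoop.

```python
import socket

# Host-table entries as given in the original message quoted below.
HOSTS_TABLE = """\
127.0.0.1 localhost
1.0.0.1 msba02a
1.0.0.10 msba02b
1.0.0.20 msba02c
"""


def parse_hosts(text):
    """Map each host name in an /etc/hosts-style table to its IP address."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments, then split the line on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first address listed for a given name.
            table.setdefault(name, ip)
    return table


def find_mismatches(expected, resolve=socket.gethostbyname):
    """Return {name: (expected_ip, actual)} for names that resolve differently."""
    mismatches = {}
    for name, ip in expected.items():
        try:
            actual = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            actual = "unresolved: %s" % exc
        if actual != ip:
            mismatches[name] = (ip, actual)
    return mismatches
```

Calling `find_mismatches(parse_hosts(HOSTS_TABLE))` on each node would list any name for which the resolver disagrees with the host table; an empty result suggests the host table is taking effect for those names on that node.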



On 6/13/18 11:02 AM, Gour Saha wrote:

Looks like the YARN/MR multihoming doc patch never got committed and hence is not available in the site documentation. You can look into the doc patch in https://issues.apache.org/jira/browse/YARN-2384 (maybe use an online markdown tool to view it better) and see if you followed the configuration mentioned there. Another comprehensive multihoming document which might help you is here <https://hortonworks.com/blog/multihoming-on-hadoop-yarn-clusters/>.
-Gour

*From: *Jeff Hubbs <[email protected]>
*Date: *Tuesday, June 5, 2018 at 2:57 PM
*To: *"[email protected]" <[email protected]>
*Subject: *3.1.0 MR work won't distribute after dual-homing NameNode

Hi -
I have a three-node Hadoop 3.1.0 cluster on which the daemons are distributed like so:
Daemons on msba02a...
20112 NameNode
20240 DataNode
24101 JobHistoryServer
20918 WebAppProxyServer
20743 NodeManager
20476 SecondaryNameNode

Daemons on msba02b...
22547 DataNode
22734 ResourceManager
23007 NodeManager

Daemons on msba02c...
10005 NodeManager
9818 DataNode
All three nodes run Gentoo Linux and have either one or two volumes devoted to HDFS; HDFS reports a size of 5.7TiB.
Previously, HDFS and MapReduce (testing with the archetypical "wordcount" job on a 5.8GiB XML file) worked fine in an environment where all three machines are on the same office LAN and get their IP addresses from DHCP; dynamic DNS creates network host names based on the machines' host names as reported by the machines' DHCP clients. FQDNs were used for all intra- and inter-machine references in the Hadoop configuration files.
Since then, I've changed things so that msba02a now has a second NIC that connects to an independent LAN along with the other two machines using their built-in NICs like before; msba02b and msba02c reach the Internet by going through NAT on msba02a. /etc/hosts on all three machines has been populated with the static IPs I gave them like so:
    127.0.0.1 localhost
    1.0.0.1 msba02a
    1.0.0.10 msba02b
    1.0.0.20 msba02c
So now if I shell into msba02a and run the wordcount job with the test XML file sitting in HDFS with replication set to 3, the job *does* run and gives me the expected output file...but the workload doesn't distribute to all cores on all nodes like before; it all executes on msba02a. In fact, it doesn't even run on all cores on msba02a; it seems to light up just one core at any given moment. The job used to run on the cluster in 1m48s; now it takes 5m56s (a ratio I can't understand; these are all four-core, eight-thread machines so I'd expect a ratio of close to 24:1, not 3:1). The only time the other two nodes light up at all is near the end of the job when the output file (770MiB) is written out to HDFS.
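
The arithmetic behind those two ratios can be spelled out as a short Python sketch. The 24:1 figure assumes perfect scaling across three nodes with eight hardware threads each, which is an idealization rather than a measurement.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds wall-clock time to seconds."""
    return minutes * 60 + seconds


before = to_seconds(1, 48)  # distributed run: 1m48s
after = to_seconds(5, 56)   # current run: 5m56s
observed_ratio = after / before

nodes = 3
threads_per_node = 8
# Naive expectation: one busy thread versus every thread on every node,
# assuming the job scales perfectly (an idealization).
naive_ratio = nodes * threads_per_node

print("observed slowdown: %.2fx" % observed_ratio)
print("naive expectation: %dx" % naive_ratio)
```

The observed slowdown works out to roughly 3.3x against a naive expectation of 24x.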
I've gone through https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html and set the values shown there to 1.0.0.1 in hdfs-site.xml on msba02a in hopes of getting the daemons to bind to the cluster-facing NIC instead of the outward-facing NIC, but it seems to me like HDFS is working exactly like it's supposed to. Note that the ResourceManager daemon runs on msba02b and therefore doesn't need to be bound to a particular NIC; it still uses that machine's only NIC like before except now its IP address is static and is resolved via its local /etc/hosts.
The only errors showing up in the daemon logs of any nodes seem to be ones like "org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted" in hadoop-yarn-resourcemanager-msba02b.log and hadoop-mapred-historyserver-msba02a.log.
As for the hadoop run output, previously when everything was working things would get to the point where it would print out a series of lines like
    map 0% reduce 0%
and that line would repeat with the "map" percentage climbing first and then the "reduce" percentage would climb until both numbers reached 100% and the job would wrap up soon afterward. Now, it intersperses those lines with other output and it skips around, like this:
    *2018-06-05 17:45:34,338 INFO mapreduce.Job:  map 100% reduce 0%*
    2018-06-05 17:45:36,295 INFO mapred.MapTask: Finished spill 0
    2018-06-05 17:45:36,295 INFO mapred.MapTask: (RESET) equator
    61480136 kv 15370028(61480112) kvi 13480948(53923792)
    2018-06-05 17:45:36,882 INFO mapred.MapTask: Spilling map output
    2018-06-05 17:45:36,882 INFO mapred.MapTask: bufstart = 61480136;
    bufend = 10372007; bufvoid = 104857566
    2018-06-05 17:45:36,882 INFO mapred.MapTask: kvstart =
    15370028(61480112); kvend = 7835876(31343504); length =
    7534153/6553600
    2018-06-05 17:45:36,882 INFO mapred.MapTask: (EQUATOR) 17997991
    kvi 4499492(17997968)
    2018-06-05 17:45:38,774 INFO mapred.MapTask: Finished spill 1
    2018-06-05 17:45:38,774 INFO mapred.MapTask: (RESET) equator
    17997991 kv 4499492(17997968) kvi 2642780(10571120)
    2018-06-05 17:45:38,910 INFO mapred.LocalJobRunner:
    2018-06-05 17:45:38,910 INFO mapred.MapTask: Starting flush of map
    output
    2018-06-05 17:45:38,910 INFO mapred.MapTask: Spilling map output
    2018-06-05 17:45:38,911 INFO mapred.MapTask: bufstart = 17997991;
    bufend = 40956853; bufvoid = 104857600
    2018-06-05 17:45:38,911 INFO mapred.MapTask: kvstart =
    4499492(17997968); kvend = 1327036(5308144); length = 3172457/6553600
    *2018-06-05 17:45:39,340 INFO mapreduce.Job:  map 4% reduce 0%*
    2018-06-05 17:45:39,684 INFO mapred.MapTask: Finished spill 2
    2018-06-05 17:45:39,788 INFO mapred.Merger: Merging 3 sorted segments
    2018-06-05 17:45:39,788 INFO mapred.Merger: Down to the last
    merge-pass, with 3 segments left of total size: 34645401 bytes
    2018-06-05 17:45:40,251 INFO mapred.Task:
    Task:attempt_local1155504279_0001_m_000002_0 is done. And is in
    the process of committing
    2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: map > sort
    2018-06-05 17:45:40,253 INFO mapred.Task: Task
    'attempt_local1155504279_0001_m_000002_0' done.
    2018-06-05 17:45:40,253 INFO mapred.Task: Final Counters for
    attempt_local1155504279_0001_m_000002_0: Counters: 23
        File System Counters
            FILE: Number of bytes read=106419805
            FILE: Number of bytes written=202253153
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=410006948
            HDFS: Number of bytes written=0
            HDFS: Number of read operations=9
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=1
        Map-Reduce Framework
            Map input records=2653033
            Map output records=4553651
            Map output bytes=130562451
            Map output materialized bytes=31060160
            Input split bytes=95
            Combine input records=5425504
            Combine output records=1618222
            Spilled Records=1618222
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=114
            Total committed heap usage (bytes)=1301807104
        File Input Format Counters
            Bytes Read=134348800
    2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: Finishing
    task: attempt_local1155504279_0001_m_000002_0
    2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: Starting task:
    attempt_local1155504279_0001_m_000003_0
    2018-06-05 17:45:40,254 INFO output.FileOutputCommitter: File
    Output Committer Algorithm version is 2
    2018-06-05 17:45:40,254 INFO output.FileOutputCommitter:
    FileOutputCommitter skip cleanup _temporary folders under output
    directory:false, ignore cleanup failures: false
    2018-06-05 17:45:40,254 INFO mapred.Task:  Using
    ResourceCalculatorProcessTree : [ ]
    2018-06-05 17:45:40,255 INFO mapred.MapTask: Processing split:
    hdfs://msba02a:9000/allcat.xml:268435456+134217728
    2018-06-05 17:45:40,265 INFO mapred.MapTask: (EQUATOR) 0 kvi
    26214396(104857584)
    2018-06-05 17:45:40,266 INFO mapred.MapTask:
    mapreduce.task.io.sort.mb: 100
    2018-06-05 17:45:40,266 INFO mapred.MapTask: soft limit at 83886080
    2018-06-05 17:45:40,266 INFO mapred.MapTask: bufstart = 0; bufvoid
    = 104857600
    2018-06-05 17:45:40,266 INFO mapred.MapTask: kvstart = 26214396;
    length = 6553600
    2018-06-05 17:45:40,266 INFO mapred.MapTask: Map output collector
    class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    *2018-06-05 17:45:40,341 INFO mapreduce.Job:  map 100% reduce 0%*
    2018-06-05 17:45:41,079 INFO mapred.MapTask: Spilling map output
    2018-06-05 17:45:41,079 INFO mapred.MapTask: bufstart = 0; bufend
    = 53799451; bufvoid = 104857600
    2018-06-05 17:45:41,079 INFO mapred.MapTask: kvstart =
    26214396(104857584); kvend = 18692744(74770976); length =
    7521653/6553600
    2018-06-05 17:45:41,079 INFO mapred.MapTask: (EQUATOR) 61425451
    kvi 15356356(61425424)
    2018-06-05 17:45:43,110 INFO mapred.MapTask: Finished spill 0
    2018-06-05 17:45:43,110 INFO mapred.MapTask: (RESET) equator
    61425451 kv 15356356(61425424) kvi 13514352(54057408)
    2018-06-05 17:45:43,687 INFO mapred.MapTask: Spilling map output
    2018-06-05 17:45:43,687 INFO mapred.MapTask: bufstart = 61425451;
    bufend = 10294846; bufvoid = 104857586
    2018-06-05 17:45:43,687 INFO mapred.MapTask: kvstart =
    15356356(61425424); kvend = 7816592(31266368); length =
    7539765/6553600
    2018-06-05 17:45:43,687 INFO mapred.MapTask: (EQUATOR) 17920846
    kvi 4480204(17920816)
    2018-06-05 17:45:46,275 INFO mapred.MapTask: Finished spill 1
    2018-06-05 17:45:46,275 INFO mapred.MapTask: (RESET) equator
    17920846 kv 4480204(17920816) kvi 2573716(10294864)
    2018-06-05 17:45:46,423 INFO mapred.LocalJobRunner:
    2018-06-05 17:45:46,423 INFO mapred.MapTask: Starting flush of map
    output
    2018-06-05 17:45:46,423 INFO mapred.MapTask: Spilling map output
    2018-06-05 17:45:46,423 INFO mapred.MapTask: bufstart = 17920846;
    bufend = 41420321; bufvoid = 104857600
    2018-06-05 17:45:46,423 INFO mapred.MapTask: kvstart =
    4480204(17920816); kvend = 1126824(4507296); length = 3353381/6553600
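
The excerpt above can be checked mechanically for which runner executed the tasks. The following is a minimal sketch in Python, assuming that the `mapred.LocalJobRunner` logger name and the `attempt_local...` attempt IDs (both visible in the excerpt) mark tasks run in-process by the local job runner rather than in YARN containers. `summarize_runner` is a hypothetical helper name, and the sample lines are unwrapped copies of lines from the excerpt.

```python
import re

# Logger name that appears in the excerpt above.
LOCAL_RUNNER = re.compile(r"\bmapred\.LocalJobRunner\b")
# Task attempt IDs; group 1 is set when the ID carries the "local" prefix.
ATTEMPT_ID = re.compile(r"\battempt_(local)?\d+_\d+_[mr]_\d+_\d+\b")

SAMPLE = """\
2018-06-05 17:45:40,251 INFO mapred.Task: Task:attempt_local1155504279_0001_m_000002_0 is done. And is in the process of committing
2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: map > sort
2018-06-05 17:45:40,253 INFO mapred.LocalJobRunner: Starting task: attempt_local1155504279_0001_m_000003_0
"""


def summarize_runner(log_text):
    """Count LocalJobRunner lines and distinct local vs. other attempt IDs."""
    local_lines = sum(
        1 for line in log_text.splitlines() if LOCAL_RUNNER.search(line)
    )
    local_attempts, other_attempts = set(), set()
    for match in ATTEMPT_ID.finditer(log_text):
        target = local_attempts if match.group(1) else other_attempts
        target.add(match.group(0))
    return {
        "local_runner_lines": local_lines,
        "local_attempts": len(local_attempts),
        "other_attempts": len(other_attempts),
    }


print(summarize_runner(SAMPLE))
```

If every attempt in a full job log is counted as local, that would be consistent with the single-core behavior described above. One setting that may be worth checking in that case is `mapreduce.framework.name` in mapred-site.xml; that property is not mentioned anywhere in this thread, so treat it as an assumption to verify rather than a confirmed cause.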
Any hints as to why work isn't distributing? It seems to me like this kind of network configuration for Hadoop clusters would be more the norm than one where all nodes are on a network with everything else in an environment (in our situation one driver for having cluster traffic isolated is because the data files used may contain NDA-bound data that shouldn't travel the office LAN unencrypted).
Thanks!
