I don't know why it is running on localhost. I have commented it out.
==================================================================
*slave1:* Hostname: pc321

hduser@pc321:/etc$ vi hosts
#127.0.0.1 localhost loghost localhost.myslice.ch-geni-net.emulab.net
155.98.39.28 pc228
155.98.39.121 pc321
155.98.39.27 dn3.myslice.ch-geni-net.emulab.net
========================================================================
*slave2:* Hostname: dn3.myslice.ch-geni-net.emulab.net

hduser@dn3:/etc$ vi hosts
#127.0.0.1 localhost loghost localhost.myslice.ch-geni-net.emulab.net
155.98.39.28 pc228
155.98.39.121 pc321
155.98.39.27 dn3.myslice.ch-geni-net.emulab.net
========================================================================
*Master:* Hostname: pc228

hduser@pc228:/etc$ vi hosts
#127.0.0.1 localhost loghost localhost.myslice.ch-geni-net.emulab.net
155.98.39.28 pc228
155.98.39.121 pc321
#155.98.39.19 slave2
155.98.39.27 dn3.myslice.ch-geni-net.emulab.net
============================================================================

I have replaced localhost with pc228 in core-site.xml and mapred-site.xml, and set the replication factor to 3. I can ssh to pc321 and dn3.myslice.ch-geni-net.emulab.net from the master.

hduser@pc228:/usr/local/hadoop/conf$ more slaves
pc228
pc321
dn3.myslice.ch-geni-net.emulab.net
hduser@pc228:/usr/local/hadoop/conf$ more masters
pc228
hduser@pc228:/usr/local/hadoop/conf$

Am I doing anything wrong here?


On Wed, Jan 1, 2014 at 4:54 PM, Hardik Pandya <[email protected]> wrote:

> do you have your hostnames properly configured in etc/hosts? have you tried
> 192.168.?.? instead of localhost 127.0.0.1
>
>
> On Wed, Jan 1, 2014 at 11:33 AM, navaz <[email protected]> wrote:
>
>> Thanks. But I wonder why map succeeds 100%. How does it resolve the hostname?
>>
>> Now reduce reaches 100%, but it is bailing out slave2 and slave3 (though
>> mapping succeeded on these nodes).
>>
>> Does it look up the hostname only for reduce?
>>
>>
>> 14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
>> 14/01/01 09:09:39 INFO mapred.JobClient: map 0% reduce 0%
>> 14/01/01 09:10:00 INFO mapred.JobClient: map 33% reduce 0%
>> 14/01/01 09:10:01 INFO mapred.JobClient: map 66% reduce 0%
>> 14/01/01 09:10:05 INFO mapred.JobClient: map 100% reduce 0%
>> 14/01/01 09:10:14 INFO mapred.JobClient: map 100% reduce 22%
>> 14/01/01 09:17:32 INFO mapred.JobClient: map 100% reduce 0%
>> 14/01/01 09:17:35 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_0, Status : FAILED
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> 14/01/01 09:17:46 INFO mapred.JobClient: map 100% reduce 11%
>> 14/01/01 09:17:50 INFO mapred.JobClient: map 100% reduce 22%
>> 14/01/01 09:25:06 INFO mapred.JobClient: map 100% reduce 0%
>> 14/01/01 09:25:10 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_1, Status : FAILED
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> 14/01/01 09:25:34 INFO mapred.JobClient: map 100% reduce 100%
>> 14/01/01 09:25:42 INFO mapred.JobClient: Job complete: job_201401010908_0001
>> 14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29
>>
>>
>> Job Tracker logs:
>> 2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000002_0' has completed task_201401010908_0001_m_000002 successfully.
>> 2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000001_0' has completed task_201401010908_0001_m_000001 successfully.
>> 2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> 2014-01-01 09:17:30,528 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
>> 2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_0' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:44663'
>> 2014-01-01 09:17:35,130 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
>> 2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
>> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
>> 2014-01-01 09:25:05,494 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
>> 2014-01-01 09:25:10,087 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
>> 2014-01-01 09:25:10,109 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_2' to tip task_201401010908_0001_r_000000, for tracker 'tracker_master:localhost/127.0.0.1:57156'
>> 2014-01-01 09:25:33,340 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_r_000000_2' has completed task_201401010908_0001_r_000000 successfully.
>> 2014-01-01 09:25:33,462 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201401010908_0001_m_000003_0' to tip task_201401010908_0001_m_000003, for tracker 'tracker_master:localhost/127.0.0.1:57156'
>> 2014-01-01 09:25:42,304 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000003_0' has completed task_201401010908_0001_m_000003 successfully.
>>
>>
>> On Tue, Dec 31, 2013 at 4:56 PM, Hardik Pandya <[email protected]> wrote:
>>
>>> as expected, it is failing during shuffle
>>>
>>> it seems like hdfs could not resolve the DNS name for slave nodes
>>>
>>> have you configured your slaves' host names correctly?
>>>
>>> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>>> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>>
>>>
>>> On Tue, Dec 31, 2013 at 4:42 PM, navaz <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> My hdfs-site is configured for 4 nodes (one master and 3 slaves):
>>>>
>>>> <property>
>>>>   <name>dfs.replication</name>
>>>>   <value>4</value>
>>>> </property>
>>>>
>>>> Running start-dfs.sh and stop-mapred.sh doesn't solve the problem.
>>>>
>>>> Also tried to run the program after formatting the namenode (master),
>>>> which also fails.
>>>>
>>>> My jobtracker logs on the master (name node) are given below.
>>>>
>>>> 2013-12-31 14:27:35,534 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004: nMaps=3 nReduces=1 max=-1
>>>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Job job_201312311107_0004 added successfully for user 'hduser' to queue 'default'
>>>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.AuditLogger: USER=hduser IP=155.98.39.28 OPERATION=SUBMIT_JOB TARGET=job_201312311107_0004 RESULT=SUCCESS
>>>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201312311107_0004
>>>> 2013-12-31 14:27:35,595 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201312311107_0004
>>>> 2013-12-31 14:27:35,785 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /app/hadoop/tmp/mapred/system/job_201312311107_0004/jobToken
>>>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201312311107_0004 = 3671523. Number of splits = 3
>>>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/master
>>>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave2
>>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave1
>>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave3
>>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/master
>>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave1
>>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave3
>>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave2
>>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/master
>>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave1
>>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave2
>>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave3
>>>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004 LOCALITY_WAIT_FACTOR=1.0
>>>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201312311107_0004 initialized successfully with 3 map tasks and 1 reduce tasks.
>>>> 2013-12-31 14:27:35,913 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201312311107_0004_m_000004_0' to tip task_201312311107_0004_m_000004, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>>> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000004_0' has completed task_201312311107_0004_m_000004 successfully.
>>>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000000_0' to tip task_201312311107_0004_m_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000000
>>>> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000001_0' to tip task_201312311107_0004_m_000001, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>>>> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000001
>>>> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000002_0' to tip task_201312311107_0004_m_000002, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>>>> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000002
>>>> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000002_0' has completed task_201312311107_0004_m_000002 successfully.
>>>> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0004_r_000000_0' to tip task_201312311107_0004_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>>>> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000000_0' has completed task_201312311107_0004_m_000000 successfully.
>>>> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000001_0' has completed task_201312311107_0004_m_000001 successfully.
>>>> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>>> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>>> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>>>> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>>> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>>> hduser@pc228:/usr/local/hadoop/logs$
>>>>
>>>>
>>>> I am referring to the document below to configure the hadoop cluster.
>>>>
>>>> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>>>>
>>>> Did I miss something? Please guide.
>>>>
>>>> Thanks
>>>> Navaz
>>>>
>>>>
>>>> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya <[email protected]> wrote:
>>>>
>>>>> what does your job log say? is your hdfs-site configured properly to
>>>>> find 3 data nodes? this could very well be getting stuck in the shuffle phase
>>>>>
>>>>> last thing to try: does stop-all and start-all help? even worse, try
>>>>> formatting the namenode
>>>>>
>>>>>
>>>>> On Tue, Dec 31, 2013 at 11:40 AM, navaz <[email protected]> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am running a Hadoop cluster with 1 name node and 3 data nodes.
>>>>>>
>>>>>> My HDFS looks like this:
>>>>>>
>>>>>> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
>>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>>
>>>>>> Found 7 items
>>>>>> -rw-r--r--   4 hduser supergroup   343691 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg132.txt
>>>>>> -rw-r--r--   4 hduser supergroup   594933 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg1661.txt
>>>>>> -rw-r--r--   4 hduser supergroup  1945886 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg19699.txt
>>>>>> -rw-r--r--   4 hduser supergroup   674570 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg20417.txt
>>>>>> -rw-r--r--   4 hduser supergroup  1573150 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg4300.txt
>>>>>> -rw-r--r--   4 hduser supergroup  1423803 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg5000.txt
>>>>>> -rw-r--r--   4 hduser supergroup   393968 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg972.txt
>>>>>> hduser@nm:/usr/local/hadoop$
>>>>>>
>>>>>> When I start the mapreduce wordcount program it gives 100% mapping, and
>>>>>> reduce hangs at 14%.
>>>>>>
>>>>>> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
>>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>>
>>>>>> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process : 7
>>>>>> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
>>>>>> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
>>>>>> 13/12/31 09:31:09 INFO mapred.JobClient: map 0% reduce 0%
>>>>>> 13/12/31 09:31:29 INFO mapred.JobClient: map 14% reduce 0%
>>>>>> 13/12/31 09:31:34 INFO mapred.JobClient: map 32% reduce 0%
>>>>>> 13/12/31 09:31:35 INFO mapred.JobClient: map 75% reduce 0%
>>>>>> 13/12/31 09:31:36 INFO mapred.JobClient: map 90% reduce 0%
>>>>>> 13/12/31 09:31:37 INFO mapred.JobClient: map 99% reduce 0%
>>>>>> 13/12/31 09:31:38 INFO mapred.JobClient: map 100% reduce 0%
>>>>>> 13/12/31 09:31:43 INFO mapred.JobClient: map 100% reduce 14%
>>>>>>
>>>>>> <HANGS HERE>
>>>>>>
>>>>>> Could you please help me in resolving this issue.
>>>>>>
>>>>>> Thanks & Regards
>>>>>> *Abdul Navaz*
>>>>>>
>>>>>
>>>>
>>>> --
>>>> *Abdul Navaz*
>>>> *Masters in Network Communications*
>>>> *University of Houston*
>>>> *Houston, TX - 77204-4020*
>>>> *Ph - 281-685-0388*
>>>> *[email protected]*
>>>>
>>>
>>
>> --
>> *Abdul Navaz*
>> *Masters in Network Communications*
>> *University of Houston*
>> *Houston, TX - 77204-4020*
>> *Ph - 281-685-0388*
>> *[email protected]*
>>
>

--
*Abdul Navaz*
*Masters in Network Communications*
*University of Houston*
*Houston, TX - 77204-4020*
*Ph - 281-685-0388*
*[email protected]*
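[Editor's note] The `tracker_slaveX:localhost/127.0.0.1` entries in the logs above are the telling symptom: each TaskTracker is advertising itself as localhost, so reducers cannot fetch map output from other nodes and the shuffle fails with MAX_FAILED_UNIQUE_FETCHES. A minimal shell sketch for checking name resolution on each node follows; the hostnames and IPs are the ones quoted in this thread and are assumptions to be adjusted for any other cluster.

```shell
# Run on every node (master and each slave).
# A tracker registering as 'tracker_slaveX:localhost/127.0.0.1' usually
# means the node's own hostname resolves to the loopback address.

hostname -f        # should print this node's FQDN, e.g. pc321

# The node's hostname must map to its LAN IP, not 127.0.0.1:
getent hosts "$(hostname)"

# Every node must be able to resolve every other node; print any misses:
for h in pc228 pc321 dn3.myslice.ch-geni-net.emulab.net; do
  getent hosts "$h" || echo "UNRESOLVED: $h"
done
```

If any node prints `127.0.0.1` for its own hostname, removing (or commenting) the line that maps the hostname to loopback in /etc/hosts and restarting the daemons is the usual remedy, which matches what the hosts files quoted above already attempt.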
