Thanks. But I wonder why map succeeds 100%. How does it resolve the hostname? Now reduce reaches 100%, but only after bailing out on slave2 and slave3 (even though mapping succeeded on these nodes).
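One thing I notice in the JobTracker log below is that every tracker shows up as localhost/127.0.0.1, and I am not sure whether that matters for the shuffle. Here is a rough sketch for pulling those names out of a log, assuming the `tracker_<host>:<name>/<ip>:<port>` shape seen in the entries below. The regex and the helper name `loopback_trackers` are my own illustration, not part of Hadoop.

```python
import ipaddress
import re

# Assumed shape of a tracker name in the JobTracker log, e.g.
# 'tracker_slave3:localhost/127.0.0.1:44663'. A stray space after '/'
# (left over from mail line wrapping) is tolerated.
TRACKER_RE = re.compile(r"'tracker_([^:']+):([^/']+)/\s*([0-9.]+):(\d+)'")


def loopback_trackers(log_text):
    """Return the sorted hosts whose tracker registered with a loopback address."""
    hosts = set()
    for host, _name, addr, _port in TRACKER_RE.findall(log_text):
        try:
            if ipaddress.ip_address(addr).is_loopback:
                hosts.add(host)
        except ValueError:
            # Not a valid IP address (for example a truncated log line); skip it.
            continue
    return sorted(hosts)


# One entry copied from the JobTracker log in this mail.
sample = (
    "Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_1' to tip "
    "task_201401010908_0001_r_000000, for tracker "
    "'tracker_slave2:localhost/ 127.0.0.1:51438'"
)
print(loopback_trackers(sample))
```

Any host it lists registered its tracker with a loopback address, which may be worth comparing against /etc/hosts on that node.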
Does it look up the hostname only for reduce?

14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
14/01/01 09:09:39 INFO mapred.JobClient: map 0% reduce 0%
14/01/01 09:10:00 INFO mapred.JobClient: map 33% reduce 0%
14/01/01 09:10:01 INFO mapred.JobClient: map 66% reduce 0%
14/01/01 09:10:05 INFO mapred.JobClient: map 100% reduce 0%
14/01/01 09:10:14 INFO mapred.JobClient: map 100% reduce 22%
14/01/01 09:17:32 INFO mapred.JobClient: map 100% reduce 0%
14/01/01 09:17:35 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
14/01/01 09:17:46 INFO mapred.JobClient: map 100% reduce 11%
14/01/01 09:17:50 INFO mapred.JobClient: map 100% reduce 22%
14/01/01 09:25:06 INFO mapred.JobClient: map 100% reduce 0%
14/01/01 09:25:10 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
14/01/01 09:25:34 INFO mapred.JobClient: map 100% reduce 100%
14/01/01 09:25:42 INFO mapred.JobClient: Job complete: job_201401010908_0001
14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29

Job Tracker logs:

2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000002_0' has completed task_201401010908_0001_m_000002 successfully.
2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000001_0' has completed task_201401010908_0001_m_000001 successfully.
2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
2014-01-01 09:17:30,528 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_0' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:44663'
2014-01-01 09:17:35,130 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
2014-01-01 09:25:05,494 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
2014-01-01 09:25:10,087 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
2014-01-01 09:25:10,109 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_2' to tip task_201401010908_0001_r_000000, for tracker 'tracker_master:localhost/127.0.0.1:57156'
2014-01-01 09:25:33,340 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_r_000000_2' has completed task_201401010908_0001_r_000000 successfully.
2014-01-01 09:25:33,462 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201401010908_0001_m_000003_0' to tip task_201401010908_0001_m_000003, for tracker 'tracker_master:localhost/127.0.0.1:57156'
2014-01-01 09:25:42,304 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000003_0' has completed task_201401010908_0001_m_000003 successfully.

On Tue, Dec 31, 2013 at 4:56 PM, Hardik Pandya <[email protected]> wrote:

> as expected, it's failing during shuffle
>
> it seems like hdfs could not resolve the DNS name for slave nodes
>
> have you configured your slaves' host names correctly?
>
> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>
> On Tue, Dec 31, 2013 at 4:42 PM, navaz <[email protected]> wrote:
>
>> Hi
>>
>> My hdfs-site is configured for 4 nodes (one master and 3 slaves).
>>
>> <property>
>> <name>dfs.replication</name>
>> <value>4</value>
>>
>> start-dfs.sh and stop-mapred.sh don't solve the problem.
>>
>> I also tried to run the program after formatting the namenode (master), which also fails.
>>
>> My jobtracker logs on the master (name node) are given below.
>>
>> 2013-12-31 14:27:35,534 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004: nMaps=3 nReduces=1 max=-1
>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Job job_201312311107_0004 added successfully for user 'hduser' to queue 'default'
>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.AuditLogger: USER=hduser IP=155.98.39.28 OPERATION=SUBMIT_JOB TARGET=job_201312311107_0004 RESULT=SUCCESS
>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201312311107_0004
>> 2013-12-31 14:27:35,595 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201312311107_0004
>> 2013-12-31 14:27:35,785 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /app/hadoop/tmp/mapred/system/job_201312311107_0004/jobToken
>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201312311107_0004 = 3671523. Number of splits = 3
>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/master
>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave2
>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave1
>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave3
>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/master
>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave1
>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave3
>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave2
>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/master
>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave1
>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave2
>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave3
>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004 LOCALITY_WAIT_FACTOR=1.0
>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201312311107_0004 initialized successfully with 3 map tasks and 1 reduce tasks.
>> 2013-12-31 14:27:35,913 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201312311107_0004_m_000004_0' to tip task_201312311107_0004_m_000004, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000004_0' has completed task_201312311107_0004_m_000004 successfully.
>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000000_0' to tip task_201312311107_0004_m_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000000
>> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000001_0' to tip task_201312311107_0004_m_000001, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000001
>> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000002_0' to tip task_201312311107_0004_m_000002, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000002
>> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000002_0' has completed task_201312311107_0004_m_000002 successfully.
>> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0004_r_000000_0' to tip task_201312311107_0004_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000000_0' has completed task_201312311107_0004_m_000000 successfully.
>> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000001_0' has completed task_201312311107_0004_m_000001 successfully.
>> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>> hduser@pc228:/usr/local/hadoop/logs$
>>
>> I am referring to the document below to configure the hadoop cluster.
>>
>> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>>
>> Did I miss something? Please guide.
>>
>> Thanks
>> Navaz
>>
>> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya <[email protected]> wrote:
>>
>>> what does your job log say? is your hdfs-site configured properly to find 3 data nodes? this could very well be getting stuck in shuffle phase
>>>
>>> last thing to try: does stop-all and start-all help? even worse, try formatting namenode
>>>
>>> On Tue, Dec 31, 2013 at 11:40 AM, navaz <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> I am running a Hadoop cluster with 1 name node and 3 data nodes.
>>>>
>>>> My HDFS looks like this.
>>>>
>>>> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
>>>> Warning: $HADOOP_HOME is deprecated.
>>>>
>>>> Found 7 items
>>>> -rw-r--r-- 4 hduser supergroup 343691 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg132.txt
>>>> -rw-r--r-- 4 hduser supergroup 594933 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg1661.txt
>>>> -rw-r--r-- 4 hduser supergroup 1945886 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg19699.txt
>>>> -rw-r--r-- 4 hduser supergroup 674570 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg20417.txt
>>>> -rw-r--r-- 4 hduser supergroup 1573150 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg4300.txt
>>>> -rw-r--r-- 4 hduser supergroup 1423803 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg5000.txt
>>>> -rw-r--r-- 4 hduser supergroup 393968 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg972.txt
>>>> hduser@nm:/usr/local/hadoop$
>>>>
>>>> When I start the mapreduce wordcount program, map reaches 100% and reduce hangs at 14%.
>>>>
>>>> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
>>>> Warning: $HADOOP_HOME is deprecated.
>>>>
>>>> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process : 7
>>>> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>>> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
>>>> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
>>>> 13/12/31 09:31:09 INFO mapred.JobClient: map 0% reduce 0%
>>>> 13/12/31 09:31:29 INFO mapred.JobClient: map 14% reduce 0%
>>>> 13/12/31 09:31:34 INFO mapred.JobClient: map 32% reduce 0%
>>>> 13/12/31 09:31:35 INFO mapred.JobClient: map 75% reduce 0%
>>>> 13/12/31 09:31:36 INFO mapred.JobClient: map 90% reduce 0%
>>>> 13/12/31 09:31:37 INFO mapred.JobClient: map 99% reduce 0%
>>>> 13/12/31 09:31:38 INFO mapred.JobClient: map 100% reduce 0%
>>>> 13/12/31 09:31:43 INFO mapred.JobClient: map 100% reduce 14%
>>>>
>>>> <HANGS HERE>
>>>>
>>>> Could you please help me in resolving this issue.
>>>>
>>>> Thanks & Regards
>>>> *Abdul Navaz*
>>>>
>>>
>>
>> --
>> *Abdul Navaz*
>> *Masters in Network Communications*
>> *University of Houston*
>> *Houston, TX - 77204-4020*
>> *Ph - 281-685-0388*
>> *[email protected]* <[email protected]>
>>
>

--
*Abdul Navaz*
*Masters in Network Communications*
*University of Houston*
*Houston, TX - 77204-4020*
*Ph - 281-685-0388*
*[email protected]* <[email protected]>
