I changed mapred.reduce.tasks from -1 to "AutoReduce", and Hadoop created 450 map tasks but only 1 reduce task. That single reduce task seems to run on only one slave (I have two slaves).
But when the reduce reached 66%, the same error was reported again: "Task attempt_201301150318_0001_r_000000_0 failed to report status for 601 seconds. Killing!"

2013/1/14 yaotian <[email protected]>

> How to judge which counter would work?
>
>
> 2013/1/11 <[email protected]>
>
>> Hi
>>
>> To add on to Harsh's comments.
>>
>> You need not change the task timeout.
>>
>> In your map/reduce code, you can increment a counter or report status
>> at intervals so that there is communication from the task and
>> hence it won't hit the task timeout.
>>
>> Every map and reduce task runs in its own JVM, limited by the JVM size. If
>> you try to hold too much data in memory, it can go beyond the JVM size
>> and cause OOM errors.
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> From: yaotian <[email protected]>
>> Date: Fri, 11 Jan 2013 14:35:07 +0800
>> To: <[email protected]>
>> ReplyTo: [email protected]
>> Subject: Re: I am running MapReduce on a 30G data on 1master/2 slave,
>> but failed.
>>
>> See inline.
>>
>>
>> 2013/1/11 Harsh J <[email protected]>
>>
>>> If the per-record processing time is very high, you will need to
>>> periodically report a status. Without a status change report from the task
>>> to the tracker, it will be killed away as a dead task after a default
>>> timeout of 10 minutes (600s).
>>>
>> =====================> Do you mean to increase the report time:
>> "mapred.task.timeout"?
>>
>>
>>> Also, beware of holding too much memory in a reduce JVM - you're still
>>> limited there. Best to let the framework do the sort or secondary sort.
>>>
>> =======================> You mean use the default value? This is my
>> value.
>> mapred.job.reduce.memory.mb: -1
>>
>>>
>>>
>>> On Fri, Jan 11, 2013 at 10:58 AM, yaotian <[email protected]> wrote:
>>>
>>>> Yes, you are right. The data is GPS trace related to corresponding uid.
>>>> The reduce is doing this: Sort user to get this kind of result: uid, gps1,
>>>> gps2, gps3........
>>>> Yes, the gps data is big because this is 30G data.
>>>>
>>>> How to solve this?
>>>>
>>>>
>>>>
>>>> 2013/1/11 Mahesh Balija <[email protected]>
>>>>
>>>>> Hi,
>>>>>
>>>>> 2 reducers completed successfully and 1498 have been killed. I assume
>>>>> that you have data issues. (Either the data is huge or there are
>>>>> some issues with the data you are trying to process.)
>>>>> One possibility could be that you have many values associated with
>>>>> a single key, which can cause this kind of issue depending on the
>>>>> operation you do in your reducer.
>>>>> Can you put some logs in your reducer and try to trace out
>>>>> what is happening?
>>>>>
>>>>> Best,
>>>>> Mahesh Balija,
>>>>> Calsoft Labs.
>>>>>
>>>>>
>>>>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian <[email protected]> wrote:
>>>>>
>>>>>> I have 1 hadoop master where the name node is located and 2 slaves
>>>>>> where the datanodes are located.
>>>>>>
>>>>>> If I choose a small data set like 200M, it can be done.
>>>>>>
>>>>>> But if I run 30G of data, Map is done, but the reduce reports errors.
>>>>>> Any suggestion?
>>>>>>
>>>>>>
>>>>>> This is the information.
>>>>>>
>>>>>> Black-listed TaskTrackers: 1
>>>>>> ------------------------------
>>>>>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>>>>>> map     100.00%     450        0        0        450       0       0 / 1
>>>>>> reduce  100.00%     1500       0        0        2         1498    12 / 3
>>>>>>
>>>>>>
>>>>>> Task / Complete / Status / Start Time / Finish Time / Errors / Counters
>>>>>>
>>>>>> task_201301090834_0041_r_000001
>>>>>> 0.00%
>>>>>> 10-Jan-2013 04:18:54
>>>>>> 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
>>>>>> Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>>>>> Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>>>>> Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>>>>> Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>>>>> Counters: 0
>>>>>>
>>>>>> task_201301090834_0041_r_000002
>>>>>> 0.00%
>>>>>> 10-Jan-2013 04:18:54
>>>>>> 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
>>>>>> Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>>>>> Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>>>>> Counters: 0
>>>>>>
>>>>>> task_201301090834_0041_r_000003
>>>>>> 0.00%
>>>>>> 10-Jan-2013 04:18:57
>>>>>> 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
>>>>>> Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>>>>> Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>>>>> Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>>>>> Counters: 0
>>>>>>
>>>>>> task_201301090834_0041_r_000005
>>>>>> 0.00%
>>>>>> 10-Jan-2013 06:11:07
>>>>>> 10-Jan-2013 06:46:38 (35mins, 31sec)
>>>>>> Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!
>>>>>> Counters: 0
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>
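
A minimal sketch of the advice quoted above: report status at intervals from inside the reduce loop, and write each value out as it arrives instead of holding all of a uid's GPS points in memory. It is plain Java so that it runs without Hadoop on the classpath. The names ProgressSketch, Heartbeat, reduceOneKey and reportEvery are hypothetical and are not Hadoop API. In a real reducer the two Heartbeat calls would presumably correspond to reporter.setStatus(...) and reporter.progress() in the old mapred API (or context.setStatus(...) and context.progress() in the new one), and the PrintWriter stands in for the job's output.

```java
import java.io.PrintWriter;
import java.util.Iterator;

public class ProgressSketch {

    /** Hypothetical stand-in for the status/progress calls a Hadoop task makes. */
    public interface Heartbeat {
        void setStatus(String message);
        void progress();
    }

    /**
     * Streams one key's values straight to the output and pings the
     * heartbeat every reportEvery values. Returns the number of values seen.
     */
    public static long reduceOneKey(String uid, Iterator<String> gpsPoints,
                                    PrintWriter out, Heartbeat heartbeat,
                                    int reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        long seen = 0;
        out.print(uid);
        while (gpsPoints.hasNext()) {
            // Write each point immediately; nothing is collected in a list.
            out.print(',');
            out.print(gpsPoints.next());
            seen++;
            if (seen % reportEvery == 0) {
                // Tell the framework the task is still alive.
                heartbeat.setStatus("uid " + uid + ": " + seen + " points written");
                heartbeat.progress();
            }
        }
        out.print('\n');
        return seen;
    }
}
```

The sketch does no sorting of its own. It assumes the ordering of the GPS points is left to the framework (sort or secondary sort), as Harsh suggested, so the reducer only has to stream them through. How often to report is a judgment call: any interval that is reliably much shorter than the 600-second timeout should do.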
