See inline.
2013/1/11 Harsh J <[email protected]>

> If the per-record processing time is very high, you will need to
> periodically report a status. Without a status change report from the
> task to the tracker, it will be killed as a dead task after a default
> timeout of 10 minutes (600s).

Do you mean I should increase the report time, "mapred.task.timeout"?

> Also, beware of holding too much memory in a reduce JVM - you're still
> limited there. Best to let the framework do the sort or secondary sort.

Do you mean I should use the default value? This is my current setting:
mapred.job.reduce.memory.mb = -1

> On Fri, Jan 11, 2013 at 10:58 AM, yaotian <[email protected]> wrote:
>
>> Yes, you are right. The data is GPS traces keyed by uid. The reducer
>> sorts each user's records to produce: uid, gps1, gps2, gps3, ...
>> Yes, the GPS data is big - the input is 30G.
>>
>> How can I solve this?
>>
>> 2013/1/11 Mahesh Balija <[email protected]>
>>
>>> Hi,
>>>
>>> 2 reducers completed successfully and 1498 were killed. I assume you
>>> have a data issue (either the data is huge, or there is some problem
>>> with the data you are trying to process).
>>> One possibility is that many values are associated with a single key,
>>> which can cause this kind of issue depending on what your reducer does
>>> with them.
>>> Can you put some logging in your reducer and try to trace what is
>>> happening?
>>>
>>> Best,
>>> Mahesh Balija,
>>> Calsoft Labs.
>>>
>>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian <[email protected]> wrote:
>>>
>>>> I have 1 Hadoop master, where the namenode runs, and 2 slaves, where
>>>> the datanodes run.
>>>>
>>>> If I choose a small input like 200M, the job completes. But when I run
>>>> the 30G data set, the map phase finishes and the reduce phase reports
>>>> errors. Any suggestion?
>>>>
>>>> This is the information:
>>>>
>>>> Black-listed TaskTrackers: 1
>>>>
>>>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>>>> map     100.00%     450        0        0        450       0       0 / 1
>>>> reduce  100.00%     1500       0        0        2         1498    12 / 3
>>>>
>>>> task_201301090834_0041_r_000001  0.00%  10-Jan-2013 04:18:54 - 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
>>>>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>>>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>>>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>>>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>>>
>>>> task_201301090834_0041_r_000002  0.00%  10-Jan-2013 04:18:54 - 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
>>>>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>>>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>>>
>>>> task_201301090834_0041_r_000003  0.00%  10-Jan-2013 04:18:57 - 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
>>>>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>>>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>>>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>>>
>>>> task_201301090834_0041_r_000005  0.00%  10-Jan-2013 06:11:07 - 10-Jan-2013 06:46:38 (35mins, 31sec)
>>>>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!

-- 
Harsh J
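The pattern Harsh J recommends - pinging the framework periodically during slow per-key processing so the task is not killed as unresponsive - can be sketched outside Hadoop. `Progressable` below is a hypothetical stand-in for the Hadoop context object (whose `progress()` call does the real reporting inside a job); the counter-and-interval structure is the part that carries over:

```java
import java.util.Iterator;

// Minimal sketch of periodic progress reporting during a long reduce loop.
// `Progressable` is a stand-in for the Hadoop reducer context, which is not
// available outside a running job; in a real reducer you would call the
// context's progress() (and optionally set a status string) the same way.
class ProgressSketch {

    interface Progressable {          // stand-in for the framework callback
        void progress();
    }

    // Processes all values for one key, reporting progress every
    // `interval` records; returns the number of records processed.
    static long reduceOneKey(Iterator<String> values, int interval,
                             Progressable ctx) {
        long processed = 0;
        while (values.hasNext()) {
            values.next();            // expensive per-record work goes here
            processed++;
            if (processed % interval == 0) {
                ctx.progress();       // keeps the tracker from timing us out
            }
        }
        return processed;
    }
}
```

With a key that carries millions of GPS points, an interval of a few thousand records keeps the reporting overhead negligible while staying far inside the 600-second window the log messages show.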

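If the timeout itself must be raised rather than (or in addition to) reporting progress, `mapred.task.timeout` is specified in milliseconds; the 600-second kills in the log correspond to its default of 600000. A sketch for mapred-site.xml, with 30 minutes as an example value:

```xml
<!-- mapred-site.xml: raise the task liveness timeout from the default
     600000 ms (10 minutes) to 30 minutes. The value shown is an example,
     not a recommendation; prefer fixing progress reporting first. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
</property>
```

The same property can be set per job on the job configuration, which avoids changing cluster-wide behavior for one slow reducer.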