How do I judge which counter would work?
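As Bejoy describes below, the specific counter does not appear to matter: incrementing any user-defined counter (or calling progress()) sends a status update from the task to the TaskTracker, which is what prevents the timeout. A minimal sketch of that idea, assuming the newer org.apache.hadoop.mapreduce API and Text keys/values; the class name GpsConcatReducer, the counter group/name, and the reporting interval are illustrative, not taken from the original job:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: class name, types and the counter group/name are illustrative.
public class GpsConcatReducer extends Reducer<Text, Text, Text, Text> {

    // How many values to process between liveness reports (illustrative).
    private static final int REPORT_INTERVAL = 10000;

    @Override
    protected void reduce(Text uid, Iterable<Text> gpsPoints, Context context)
            throws IOException, InterruptedException {
        StringBuilder line = new StringBuilder();
        long count = 0;

        for (Text gps : gpsPoints) {
            if (line.length() > 0) {
                line.append(',');
            }
            // Caution: buffering every point for a very hot uid is exactly the
            // kind of in-memory accumulation that can exceed the task JVM size
            // and cause the OOM errors Bejoy warns about below.
            line.append(gps.toString());

            if (++count % REPORT_INTERVAL == 0) {
                // Either call below sends a status update to the TaskTracker,
                // resetting the mapred.task.timeout (600s by default) clock.
                context.getCounter("GpsJob", "GPS_POINTS_PROCESSED")
                       .increment(REPORT_INTERVAL);
                context.progress();
            }
        }
        context.write(uid, new Text(line.toString()));
    }
}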
2013/1/11 <[email protected]>

> Hi
>
> To add on to Harsh's comments.
>
> You do not need to change the task timeout.
>
> In your map/reduce code you can increment a counter or report status at
> intervals, so that there is communication from the task and hence no task
> timeout.
>
> Every map and reduce task runs in its own JVM, limited by the JVM size. If
> you try to hold too much data in memory it can exceed the JVM size and
> cause OOM errors.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> From: yaotian <[email protected]>
> Date: Fri, 11 Jan 2013 14:35:07 +0800
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: Re: I am running MapReduce on 30G of data on 1 master / 2 slaves,
> but failed.
>
> See inline.
>
>
> 2013/1/11 Harsh J <[email protected]>
>
>> If the per-record processing time is very high, you will need to
>> periodically report a status. Without a status change report from the
>> task to the tracker, it will be killed away as a dead task after a
>> default timeout of 10 minutes (600s).
>>
> ===================== Do you mean to increase the report time,
> "mapred.task.timeout"?
>
>> Also, beware of holding too much memory in a reduce JVM - you're still
>> limited there. Best to let the framework do the sort or secondary sort.
>>
> ===================== You mean use the default value? This is my value:
> mapred.job.reduce.memory.mb = -1
>
>>
>> On Fri, Jan 11, 2013 at 10:58 AM, yaotian <[email protected]> wrote:
>>
>>> Yes, you are right. The data is GPS traces keyed by the corresponding
>>> uid. The reduce sorts per user to get results of the form: uid, gps1,
>>> gps2, gps3, ...
>>> Yes, the GPS data is big because this is 30G of data.
>>>
>>> How to solve this?
>>>
>>>
>>> 2013/1/11 Mahesh Balija <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> 2 reducers completed successfully and 1498 were killed. I assume
>>>> that you have data issues (either the data is huge or there is some
>>>> issue with the data you are trying to process).
>>>> One possibility is that you have many values associated with a
>>>> single key, which can cause this kind of issue depending on what you
>>>> do in your reducer.
>>>> Can you put some logging in your reducer and try to trace what
>>>> is happening?
>>>>
>>>> Best,
>>>> Mahesh Balija,
>>>> Calsoft Labs.
>>>>
>>>>
>>>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian <[email protected]> wrote:
>>>>
>>>>> I have 1 Hadoop master, where the namenode runs, and 2 slaves, where
>>>>> the datanodes run.
>>>>>
>>>>> If I choose a small dataset like 200M, the job completes.
>>>>>
>>>>> But if I run the 30G dataset, the map phase finishes and the reduce
>>>>> phase reports errors. Any suggestion?
>>>>>
>>>>>
>>>>> This is the information.
>>>>>
>>>>> Black-listed TaskTrackers: 1
>>>>>
>>>>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>>>>> map     100.00%     450        0        0        450       0       0 / 1
>>>>> reduce  100.00%     1500       0        0        2         1498    12 / 3
>>>>>
>>>>> task_201301090834_0041_r_000001  0.00%  10-Jan-2013 04:18:54 - 06:46:38 (2hrs, 27mins, 44sec)
>>>>>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>>>>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>>>>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>>>>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>>>>
>>>>> task_201301090834_0041_r_000002  0.00%  10-Jan-2013 04:18:54 - 06:46:38 (2hrs, 27mins, 43sec)
>>>>>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>>>>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>>>>
>>>>> task_201301090834_0041_r_000003  0.00%  10-Jan-2013 04:18:57 - 06:46:38 (2hrs, 27mins, 41sec)
>>>>>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>>>>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>>>>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>>>>
>>>>> task_201301090834_0041_r_000005  0.00%  10-Jan-2013 06:11:07 - 06:46:38 (35mins, 31sec)
>>>>>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!
>>
>> --
>> Harsh J
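If the timeout genuinely has to be raised rather than worked around (Bejoy notes above that it normally should not be), mapred.task.timeout is set in milliseconds on the job configuration; the default of 600000 (10 minutes) is what produces the "failed to report status for 600 seconds" kills shown above. A minimal sketch under Hadoop 1.x-era assumptions; the class name, the 30-minute value, and the job name are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TimeoutConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // mapred.task.timeout is in milliseconds (default 600000 = 10 minutes).
        conf.setLong("mapred.task.timeout", 30 * 60 * 1000L); // illustrative: 30 minutes
        Job job = new Job(conf, "gps-by-uid"); // hypothetical job name
        // ... set mapper, reducer, input and output paths as usual ...
    }
}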
