Re: M/R job optimization

Rahul Bhattacharjee Sun, 05 May 2013 09:47:29 -0700

I do not think a skewed reducer is the problem here, since Han mentioned
that he has to wait 5 minutes after the job shows 100% map and 100% reduce.
It may have something to do with the output committer: it is worth looking
at what FileOutputCommitter is doing during those 5 minutes. Why does
committing the job take so long?
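
For orientation, here is a rough local-filesystem model of the kind of work a committer of this style does: each task attempt writes under a `_temporary` directory, committing a task renames its files into the final output directory, and job cleanup removes `_temporary`. It is a Python sketch under those assumptions, not Hadoop's code; the function names and the `attempt_0` id are made up for illustration.

```python
import os
import shutil
import tempfile


def commit_task(output_dir, attempt_id):
    """Rename every file the attempt wrote under _temporary into output_dir."""
    attempt_dir = os.path.join(output_dir, "_temporary", attempt_id)
    moved = []
    for name in sorted(os.listdir(attempt_dir)):
        os.rename(os.path.join(attempt_dir, name),
                  os.path.join(output_dir, name))
        moved.append(name)
    os.rmdir(attempt_dir)
    return moved


def cleanup_job(output_dir):
    """Remove the scratch directory once all tasks have been committed."""
    shutil.rmtree(os.path.join(output_dir, "_temporary"), ignore_errors=True)


def demo():
    """Write one hypothetical part file, commit it, and list the result."""
    out = tempfile.mkdtemp()
    attempt_dir = os.path.join(out, "_temporary", "attempt_0")
    os.makedirs(attempt_dir)
    with open(os.path.join(attempt_dir, "part-00000"), "w") as f:
        f.write("data\n")
    moved = commit_task(out, "attempt_0")
    cleanup_job(out)
    return moved, sorted(os.listdir(out))
```

If the real commit consists mostly of renames and a directory delete like these, it should normally be cheap, which is why a 5 minute commit looks suspicious and worth checking in the logs.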


Thanks,
Rahul


On Mon, Apr 29, 2013 at 9:29 PM, Ted Xu <[email protected]> wrote:

> Hi Han,
>
> I think your point is valid. In fact, you can change the progress reporting
> logic by manually calling the Reporter API, but by default it is quite
> straightforward. Reducer progress is divided into 3 phases, namely the copy
> phase, the merge/sort phase and the reduce phase, each worth ~33%. In your
> case it looks like your program is stuck in the reduce phase. To better track
> down the cause, you can check the task log, as Ted Dunning suggested before.
>
>
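
To make the three-phase split concrete, here is a small Python model (not Hadoop code; the function names are made up). It assumes the three phases are weighted equally, as described above, and that the reduce-phase share follows how much of the reducer's input has been read rather than how much of the user's computation has finished. The second point is an assumption of this sketch, but it would explain a reducer with a single group showing 100% while the heavy function is still running.

```python
def reducer_progress(copy_frac, sort_frac, reduce_frac):
    """Overall reducer progress as the mean of three equally weighted phases."""
    fractions = (copy_frac, sort_frac, reduce_frac)
    if any(not 0.0 <= f <= 1.0 for f in fractions):
        raise ValueError("each phase fraction must be between 0 and 1")
    return sum(fractions) / 3.0


def reduce_phase_fraction(records_read, total_records):
    """Share of the reducer's input consumed so far (assumed progress basis)."""
    if total_records <= 0:
        return 1.0  # an empty input counts as fully read
    return min(records_read / total_records, 1.0)


# Copy and sort are done; the reducer has read all values of its single group
# but the expensive function has not returned yet.
reduce_frac = reduce_phase_fraction(records_read=1000, total_records=1000)
overall = reducer_progress(1.0, 1.0, reduce_frac)
```

Under these assumptions `overall` is already 1.0 even though no output has been produced, so the displayed percentage says little about time spent inside a long-running call.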
> On Mon, Apr 29, 2013 at 11:17 PM, Han JU <[email protected]> wrote:
>
>> Thanks Ted and .. Ted ..
>> I've been looking at the progress when the job is executing.
>> In fact, I think it's not a skewed partition problem. I've looked at the
>> mapper output files: all are the same size, and each reducer takes a
>> single group.
>> What I want to know is how the Hadoop M/R framework calculates the
>> progress percentage.
>> For example, my reducer:
>>
>> reducer(...) {
>>   call_of_another_func() // lots of complicated calculations
>> }
>>
>> Will the percentage reflect the calculation inside the function call?
>> I ask because I observed that in the job, all reducers reached 100% fairly
>> quickly and then got stuck there. During this time, the datanodes seem to be
>> working.
>>
>> Thanks.
>>
>>
>> 2013/4/26 Ted Dunning <[email protected]>
>>
>>> Have you checked the logs?
>>>
>>> Is there a task that is taking a long time?  What is that task doing?
>>>
>>> There are two basic possibilities:
>>>
>>> a) you have a skewed join like the other Ted mentioned.  In this case,
>>> the straggler will be seen to be working on data.
>>>
>>> b) you have a hung process.  This can be more difficult to diagnose, but
>>> indicates that there is a problem with your cluster.
>>>
>>>
>>>
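
One way to apply check (a) to numbers copied from the job's task list is to compare each task's running time with the median. The sketch below is plain Python with invented task ids and durations, and the factor of 3 is an arbitrary assumption, not a Hadoop setting.

```python
from statistics import median


def find_stragglers(durations_sec, factor=3.0):
    """Return ids of tasks that ran longer than factor * median duration."""
    if not durations_sec:
        return []
    typical = median(durations_sec.values())
    return sorted(task for task, secs in durations_sec.items()
                  if secs > factor * typical)


# Hypothetical per-task durations in seconds, for illustration only.
sample = {"r_000000": 85, "r_000001": 90, "r_000002": 88, "r_000003": 390}
stragglers = find_stragglers(sample)
```

A flagged task that is still reading data points to case (a); a flagged task that is doing nothing points to case (b).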
>>> On Fri, Apr 26, 2013 at 2:21 AM, Han JU <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've implemented an algorithm with Hadoop as a series of 4 jobs. My
>>>> question is that in one of the jobs, the map and reduce tasks show 100%
>>>> finished in about 1m 30s, but I have to wait another 5m for the job to finish.
>>>> This job writes about 720 MB of compressed data to HDFS with replication
>>>> factor 1, in sequence file format. I've tried copying this data to HDFS,
>>>> and it takes less than 20 seconds. What happens during these 5 extra minutes?
>>>>
>>>> Any idea on how to optimize this part?
>>>>
>>>> Thanks.
>>>>
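
As a quick sanity check on the figures in the message above, the arithmetic below compares the implied throughput of the two cases. It is plain Python using only the quoted numbers, treating "less than 20 seconds" as 20 seconds and the extra 5 minutes as 300 seconds; both roundings are assumptions.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average throughput in MB/s for writing size_mb in the given time."""
    return size_mb / seconds


direct_copy = throughput_mb_per_s(720, 20)   # direct copy to HDFS
job_tail = throughput_mb_per_s(720, 300)     # if the whole 5 minutes were spent writing
slowdown = direct_copy / job_tail            # how much slower the job tail would be
```

The direct copy works out to 36 MB/s against 2.4 MB/s for the job tail, a gap of roughly 15x, which suggests the extra time is not raw HDFS write bandwidth.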
>>>> --
>>>> *JU Han*
>>>>
>>>> UTC   -  Université de Technologie de Compiègne
>>>> *     **GI06 - Fouille de Données et Décisionnel*
>>>>
>>>> +33 0619608888
>>>>
>>>
>>>
>>
>>
>> --
>> *JU Han*
>>
>> Software Engineer Intern @ KXEN Inc.
>> UTC   -  Université de Technologie de Compiègne
>> *     **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>
>
>
> --
> Regards,
> Ted Xu
>
