I do not think a skewed reducer is the problem here, since Han mentioned that he has to wait 5 minutes after the job shows 100% map and 100% reduce. It may have something to do with the output committer: it is worth looking at what FileOutputCommitter is doing during those 5 minutes, and why committing the job takes so long.
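For reference, the reduce percentage in question is, going by Ted Xu's description quoted below, just an average over three reducer phases (copy, merge/sort, reduce, roughly a third each). Here is a minimal sketch of that weighting. The class and method names are made up for illustration and this is not Hadoop's actual code; the point is only that, as far as I understand, the figure covers these three phases and nothing that happens after them, such as the commit.

```java
// Illustrative model only: hypothetical names, not taken from Hadoop.
class ReducerProgressSketch {

    // Keep a phase fraction inside the range [0.0, 1.0].
    static double clamp(double fraction) {
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    // Each argument is the completed fraction of one phase.
    // The three phases are weighted equally (about 33% each), as described in the thread.
    static double overallProgress(double copy, double sort, double reduce) {
        return (clamp(copy) + clamp(sort) + clamp(reduce)) / 3.0;
    }

    public static void main(String[] args) {
        // Copy and merge/sort finished, reduce phase not started yet.
        System.out.printf("%.0f%%%n", 100 * overallProgress(1.0, 1.0, 0.0));
        // All three phases finished: the counter reads 100%,
        // but this model says nothing about work done after the reduce phase.
        System.out.printf("%.0f%%%n", 100 * overallProgress(1.0, 1.0, 1.0));
    }
}
```

So a reducer sitting at 100% only tells us that the three phases are done; the remaining 5 minutes would have to be found in the task and job logs.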
Thanks,
Rahul

On Mon, Apr 29, 2013 at 9:29 PM, Ted Xu <[email protected]> wrote:

> Hi Han,
>
> I think your point is valid. In fact you can change the progress report
> logic by manually calling the Reporter API, but by default it is quite
> straight forward. Reducer progress is divided into 3 phases, namely copy
> phase, merge/sort phase and reduce phase, each with ~33%. In your case it
> looks your program is stucked in reduce phase. To better track the cause,
> you can check the task log, as Ted Dunning suggested before.
>
>
> On Mon, Apr 29, 2013 at 11:17 PM, Han JU <[email protected]> wrote:
>
>> Thanks Ted and .. Ted ..
>> I've been looking at the progress when the job is executing.
>> In fact, I think it's not a skewed partition problem. I've looked at the
>> mapper output files, all are of the same size and the reducer each takes a
>> single group.
>> What I want to know is that how hadoop M/R framework calculate the
>> progress percentage.
>> For example, my reducer:
>>
>> reducer(...) {
>>   call_of_another_func() // lots of complicated calculations
>> }
>>
>> Will the percentage reflect the calculation inside the function call?
>> Because I observed that in the job, all reducer reached 100% fairly
>> quickly, then they stucked there. In this time, the datanodes seem to be
>> working.
>>
>> Thanks.
>>
>>
>> 2013/4/26 Ted Dunning <[email protected]>
>>
>>> Have you checked the logs?
>>>
>>> Is there a task that is taking a long time? What is that task doing?
>>>
>>> There are two basic possibilities:
>>>
>>> a) you have a skewed join like the other Ted mentioned. In this case,
>>> the straggler will be seen to be working on data.
>>>
>>> b) you have a hung process. This can be more difficult to diagnose, but
>>> indicates that there is a problem with your cluster.
>>>
>>>
>>>
>>> On Fri, Apr 26, 2013 at 2:21 AM, Han JU <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've implemented an algorithm with Hadoop, it's a series of 4 jobs. My
>>>> question is that in one of the jobs, map and reduce tasks show 100% finished
>>>> in about 1m 30s, but I have to wait another 5m for this job to finish.
>>>> This job writes about 720mb compressed data to HDFS with replication
>>>> factor 1, in sequence file format. I've tried copying these data to hdfs,
>>>> it takes only < 20 seconds. What happened during this 5 more minutes?
>>>>
>>>> Any idea on how to optimize this part?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> *JU Han*
>>>>
>>>> UTC - Université de Technologie de Compiègne
>>>> * **GI06 - Fouille de Données et Décisionnel*
>>>>
>>>> +33 0619608888
>>>>
>>>
>>>
>>
>>
>> --
>> *JU Han*
>>
>> Software Engineer Intern @ KXEN Inc.
>> UTC - Université de Technologie de Compiègne
>> * **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>
>
>
> --
> Regards,
> Ted Xu
>
