Hi,

Thanks for your replies. I will try the recommended suggestions and report back.

Abhi,

In the JobTracker Web UI -> Job Tracker History, go to the specific job. Go to Reduce Task List. Open the first reduce task attempt. There you can see the start time. It is the time when the shuffle (part of the reduce phase) actually starts.

Then go back to JobTracker Main Page -> Job Tracker History -> Same Job. Click on "Analyse This Job". Scroll down to the portion where you can see the "Last Shuffle Finish Time".

Calculate the difference between the two times. That is your job's Total Shuffle Time.

Thanks,
Gaurav Dasgupta

On Wed, Aug 29, 2012 at 12:57 AM, abhiTowson cal <[email protected]> wrote:
> Hi Gaurav,
>
> Can you tell me how you calculated the total shuffle time? Apart from
> combiners and compression, you can also use some shuffle-sort
> parameters that might increase the performance. I am not sure exactly
> which parameters to tweak. Please share if you come across some other
> techniques; I am very much interested.
>
> Regards,
> Abhi
>
> On Tue, Aug 28, 2012 at 3:16 AM, Gaurav Dasgupta <[email protected]> wrote:
> > Hi,
> >
> > I have run some large and small jobs and calculated the Total Shuffle Time
> > for the jobs. I can see that the Total Shuffle Time is almost half the total
> > time taken by the full job to complete.
> >
> > My question is: how can we decrease the Total Shuffle Time? And
> > if we do so, what will be the effect on the job?
> >
> > Thanks,
> > Gaurav Dasgupta
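
For reference, the calculation described in the reply above (the start time of the first reduce task attempt subtracted from the "Last Shuffle Finish Time") can be sketched in Python. This is only a minimal sketch: the timestamp format, the function names, and the sample values are assumptions for illustration and do not come from a real job, so adjust them to match what your JobTracker pages actually display.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout (e.g. "28-Aug-2012 10:05:30"); change this to
# match the format shown in your JobTracker history pages.
TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def total_shuffle_time(first_reduce_start: str, last_shuffle_finish: str) -> timedelta:
    """Return the gap between the first reduce attempt's start time and
    the Last Shuffle Finish Time, both copied by hand from the web UI."""
    start = datetime.strptime(first_reduce_start, TIME_FORMAT)
    finish = datetime.strptime(last_shuffle_finish, TIME_FORMAT)
    if finish < start:
        raise ValueError("last shuffle finish is earlier than first reduce start")
    return finish - start


def shuffle_fraction(shuffle: timedelta, job_total: timedelta) -> float:
    """Return the share of the whole job's run time spent in the shuffle."""
    return shuffle / job_total


# Hypothetical values, not taken from a real job.
shuffle = total_shuffle_time("28-Aug-2012 10:05:30", "28-Aug-2012 10:17:45")
fraction = shuffle_fraction(shuffle, timedelta(minutes=25))
print(f"Total Shuffle Time: {shuffle}, fraction of job: {fraction:.2f}")
```

With these made-up inputs the shuffle accounts for roughly half of the job's run time, which is the kind of ratio reported in the original question.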
