Re: Can I reduce shuffling time?

Austin Chungath Wed, 14 Mar 2012 02:15:12 -0700

Hi Prashant,

The reduce shuffle bytes total around 7.5 GB, and the shuffle takes around
25 to 30 minutes to finish.
No, the job is not CPU intensive, but it contains many GROUP BY and JOIN
operations.
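
A rough sanity check on the figures quoted in this thread (7.5 GB shuffled,
16 reducers, about 3800 mappers, 25 to 30 minutes). This is only a
back-of-the-envelope sketch: it assumes 1 GB = 1024 MB, that the shuffle
data is spread evenly across reducers, and that every map task produces
output for every reducer. None of that is confirmed by the job itself.

```python
# Back-of-the-envelope shuffle arithmetic using the figures from this thread.
# Assumptions (not measured): 1 GB = 1024 MB, shuffle data is evenly spread
# across reducers, and every map task emits output for every reducer.

SHUFFLE_GB = 7.5   # reduce shuffle bytes reported for the job
REDUCERS = 16      # reducers specified for the job
MAPPERS = 3800     # approximate number of map tasks
MB_PER_GB = 1024


def aggregate_mb_per_s(shuffle_gb, minutes):
    """Cluster-wide shuffle throughput in MB/s for a given duration."""
    return shuffle_gb * MB_PER_GB / (minutes * 60)


# Data each reducer has to pull, assuming an even spread.
per_reducer_mb = SHUFFLE_GB * MB_PER_GB / REDUCERS

# Average size of one map-output segment fetched by one reducer.
per_segment_kb = per_reducer_mb * 1024 / MAPPERS

# Total number of map-output fetches across the whole job.
total_fetches = MAPPERS * REDUCERS

if __name__ == "__main__":
    print(f"per reducer:   {per_reducer_mb:.0f} MB")
    print(f"per segment:   {per_segment_kb:.0f} KB")
    print(f"total fetches: {total_fetches}")
    for minutes in (25, 30):
        rate = aggregate_mb_per_s(SHUFFLE_GB, minutes)
        print(f"{minutes} min -> {rate:.1f} MB/s aggregate")
```

Under those assumptions the aggregate rate comes out to only a few MB/s,
spread over tens of thousands of small fetches, so the time may be going
into the number of fetches rather than the volume of data.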


For more details about the job, see this screenshot of the job details
page:
http://imageshack.us/f/69/jobscreenshot.jpg/
(P.S. I am using Pig from trunk, on Hadoop 0.20.205.)

Thanks,
Austin

On Tue, Mar 13, 2012 at 9:04 PM, Prashant Kommireddi <[email protected]> wrote:

> What is the number of reduce shuffle bytes for this job? Also, is this
> job CPU intensive on reducers or is it simple aggregation?
>
> Sent from my iPhone
>
> On Mar 13, 2012, at 5:25 AM, Austin Chungath <[email protected]> wrote:
>
> > Hi,
> > I am running a Pig query on around 500 GB of input data.
> > The current block size is 128 MB and split size is the default 128 MB.
> > I have also specified 16 reducers and around 3800 mappers are running.
> >
> > Now I observe that shuffling is taking a long time to complete execution,
> > approximately 25 mins per job.
> >
> > Can anyone suggest how I can bring down the shuffling time? Is there any
> > property that I can tweak to improve performance?
> >
> > Thanks & Regards,
> > Austin
>
