Hi,
I am running a Pig query on around 500 GB of input data.
The current block size is 128 MB and the split size is the default 128 MB.
I have specified 16 reducers, and around 3800 mappers are running.
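
For reference, here is a rough back-of-the-envelope check of these numbers, as a sketch only. It assumes binary units (1 GB = 1024 MB), one map task per split, and that every reducer fetches one partition from each map output. The function names are illustrative and do not come from any Hadoop or Pig API.

```python
import math

# Figures taken from the post; binary units (1 GB = 1024 MB) are an assumption.
INPUT_GB = 500
SPLIT_MB = 128
REDUCERS = 16
OBSERVED_MAPPERS = 3800


def estimate_map_tasks(input_gb, split_mb):
    """Rough estimate: one map task per full or partial split."""
    return math.ceil(input_gb * 1024 / split_mb)


def shuffle_fetches(mappers, reducers):
    """Each reducer fetches one partition from every map output."""
    return mappers * reducers


expected_mappers = estimate_map_tasks(INPUT_GB, SPLIT_MB)
total_fetches = shuffle_fetches(OBSERVED_MAPPERS, REDUCERS)

print("estimated map tasks:", expected_mappers)
print("map-output segments fetched during shuffle:", total_fetches)
print("segments fetched per reducer:", OBSERVED_MAPPERS)
```

Under these assumptions the estimate is about 4000 map tasks, in line with the roughly 3800 observed, and each of the 16 reducers has to pull a segment from every one of those mappers.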

Now I observe that the shuffle phase is taking a long time to complete,
approximately 25 minutes per job.

Can anyone suggest how I can bring down the shuffling time? Is there any
property that I can tweak to improve performance?

Thanks & Regards,
Austin
