Hi, I am running a Pig query on around 500 GB of input data. The block size is 128 MB and the split size is the default 128 MB. I have set the number of reducers to 16, and around 3800 mappers are running.
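To put the numbers above in perspective, here is a rough Python sketch of the shuffle arithmetic. It assumes 1 GB = 1024 MB, that the map output is about the same size as the input, and that keys are spread evenly across the reducers. None of these is stated in the post, so treat the per-reducer figure as illustrative only.

```python
import math

# Figures from the post.
INPUT_GB = 500
SPLIT_MB = 128
OBSERVED_MAPPERS = 3800
REDUCERS = 16


def expected_mappers(input_gb: float, split_mb: float) -> int:
    """Approximate map-task count: one task per full split.
    Ignores file boundaries and non-splittable compression."""
    return math.ceil(input_gb * 1024 / split_mb)


def shuffle_fetches(mappers: int, reducers: int) -> int:
    """Each reducer pulls its partition from every map task's output."""
    return mappers * reducers


def per_reducer_gb(map_output_gb: float, reducers: int) -> float:
    """Data each reducer must shuffle, ASSUMING an even key distribution
    and map output roughly equal to the input size."""
    return map_output_gb / reducers


print(expected_mappers(INPUT_GB, SPLIT_MB))          # 4000
print(shuffle_fetches(OBSERVED_MAPPERS, REDUCERS))   # 60800
print(per_reducer_gb(INPUT_GB, REDUCERS))            # 31.25
```

Under these assumptions, each of the 16 reducers would fetch from all ~3800 map outputs and receive on the order of 30 GB, which gives a sense of how much data the shuffle phase has to move per reducer.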
I observe that the shuffle phase takes a long time to complete, approximately 25 minutes per job. Can anyone suggest how I can reduce the shuffle time? Is there a property I can tweak to improve performance?
Thanks & Regards,
Austin
