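Unfortunately, no. SET values are script-wide: they apply to every job the script launches, not to individual statements. Could you add an ORDER BY before storing your output and give it a smaller PARALLEL value? That forces a reduce phase, and since each reducer writes one output file, you end up with fewer, larger files. Of course, it will add extra MR jobs.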
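A minimal sketch of that suggestion, assuming tab-delimited input. The paths, the schema, the SPLIT conditions and the PARALLEL value of 10 are all hypothetical; adjust them for your data.

```pig
-- Hypothetical paths, schema and numbers, for illustration only.
-- SET is script-wide, so this value applies to the LOAD and to every
-- other job in the script; keep it small so that many mappers run.
SET pig.maxCombinedSplitSize 134217728;  -- 128 MB, e.g. one HDFS block

raw = LOAD '/data/input' USING PigStorage('\t')
      AS (id:chararray, type:chararray, value:long);

SPLIT raw INTO clicks IF type == 'click', views IF type == 'view';

-- ORDER BY forces a reduce phase; each reducer writes one part file,
-- so PARALLEL caps the number of output files per branch.
clicks_sorted = ORDER clicks BY id PARALLEL 10;
views_sorted  = ORDER views  BY id PARALLEL 10;

STORE clicks_sorted INTO '/data/output/clicks' USING PigStorage('\t');
STORE views_sorted  INTO '/data/output/views'  USING PigStorage('\t');
```

The trade-off is the extra work: Pig runs a sampling job before each ORDER BY in addition to the sort itself, so each stored branch costs more than the original map-only SPLIT.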
On Sat, Nov 30, 2013 at 9:20 AM, Something Something <[email protected]> wrote:

> Is there a way in Pig to change this configuration
> (pig.maxCombinedSplitSize) at different steps inside the *same* Pig script?
>
> For example, when I am LOADing the data I want this value to be low so that
> we use the block size effectively & many mappers get triggered. (Otherwise,
> the job takes too long).
>
> But later when I SPLIT my output, I want split size to be large so we don't
> create 4000 small output files. (SPLIT is a mapper only task).
>
> Is there a way to accomplish this?
