Re: Changing pig.maxCombinedSplitSize dynamically in single run

Something Something Mon, 02 Dec 2013 00:01:03 -0800

Adding ORDER BY is what I have done.  Basically, I'm ordering by the same field
that I am splitting by.  This field is the same on all rows, so essentially
there's nothing to order!  But this seems kludgy, which is why I asked.
Thanks.



On Sun, Dec 1, 2013 at 8:31 PM, Cheolsoo Park <[email protected]> wrote:

> Unfortunately, no. The settings are script-wide. Can you add an ORDER BY
> before storing your output and set its PARALLEL to a smaller number? That
> will force a reduce phase and combine the small files. Of course, it will
> add extra MR jobs.
>
>
> On Sat, Nov 30, 2013 at 9:20 AM, Something Something <
> [email protected]> wrote:
>
> > Is there a way in Pig to change this configuration
> > (pig.maxCombinedSplitSize) at different steps inside the *same* Pig
> > script?
> >
> > For example, when I am LOADing the data I want this value to be low so
> > that we use the block size effectively and many mappers get triggered.
> > (Otherwise, the job takes too long.)
> >
> > But later, when I SPLIT my output, I want the split size to be large so
> > we don't create 4000 small output files. (SPLIT is a map-only task.)
> >
> > Is there a way to accomplish this?
> >
>
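
For reference, here is a minimal Pig Latin sketch of the ORDER BY ... PARALLEL workaround discussed above. It assumes a tab-delimited input at a hypothetical path '/data/input' with hypothetical fields category and value, and an arbitrary 64 MB split size and PARALLEL 10; adjust all of these to your data. pig.maxCombinedSplitSize still applies to the whole script. The ORDER step is what caps the number of output files for each STORE, at the cost of extra MapReduce jobs.

```pig
-- Hypothetical paths, schema, and numbers; adjust to your data.
-- pig.maxCombinedSplitSize (in bytes) applies to the whole script,
-- so set it for the LOAD side: small splits mean many mappers.
SET pig.maxCombinedSplitSize 67108864;  -- 64 MB, an assumed value

raw = LOAD '/data/input' USING PigStorage('\t')
      AS (category:chararray, value:long);

-- SPLIT followed directly by STORE would be map-only,
-- producing one output file per mapper.
SPLIT raw INTO part_a IF category == 'a', part_b IF category == 'b';

-- ORDER BY forces a reduce phase. Each reducer writes one part file,
-- so PARALLEL 10 gives roughly 10 files per STORE.
sorted_a = ORDER part_a BY value PARALLEL 10;
sorted_b = ORDER part_b BY value PARALLEL 10;

STORE sorted_a INTO '/data/output/a' USING PigStorage('\t');
STORE sorted_b INTO '/data/output/b' USING PigStorage('\t');
```

Pig's ORDER BY typically runs a sampling job before the sort job. That accounts for the extra MR jobs mentioned in the reply.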
