Re: Nb of reduce tasks when GROUPing

Jonathan Coveney Sun, 19 May 2013 15:39:15 -0700

Also, look into the TOP UDF instead of using LIMIT. It can potentially
be a lot faster and is cleaner, IMHO.
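
A rough sketch of what that could look like, combined with the PARALLEL
clause suggested in the quoted reply. The relation name `data`, the reducer
count of 10, and the use of column 0 as the comparison column are
assumptions for illustration, not taken from the original script:

```pig
-- Sketch only: `data`, the column index and the reducer count are placeholders.
-- PARALLEL sets the number of reducers for this GROUP explicitly.
grp = GROUP data BY key PARALLEL 10;

deduped = FOREACH grp {
    -- TOP(n, column, bag): keep the n tuples with the highest value in the
    -- given 0-based column; with n = 1 this leaves one tuple per key.
    one = TOP(1, 0, data);
    GENERATE FLATTEN(one);
};
```

Note that TOP picks the tuple with the largest value in the chosen column,
whereas LIMIT 1 keeps an arbitrary tuple, so the surviving record per key
may differ between the two approaches.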



2013/5/19 Norbert Burger <norbert.bur...@gmail.com>

> Take a look at the PARALLEL clause:
>
> http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
>
> On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <vincent.ba...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I use this query to remove duplicate entries from a set of input files
> > (I cannot use DISTINCT since some fields can be different):
> >
> > grp = GROUP alias BY key;
> > alias = FOREACH grp {
> >   record = LIMIT alias 1;
> >   GENERATE FLATTEN(record) AS ... ;
> > }
> >
> > It appears that this query always runs with a single reducer (I set the
> > default number of reducers to 0 to let Pig decide), whatever the size of
> > my input data.
> >
> > Is this normal behavior? How can I reduce the query time by using
> > several reducers?
> >
> > Thanks a lot for your help.
> >
> >
> >
>
