Re: Nb of reduce tasks when GROUPing

Vincent Barat Tue, 21 May 2013 09:27:47 -0700

OK, I got it: http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/TOP.html

Anyway, there is no such thing as a "long" field in my use case: I just want to pick 1 tuple out of the set (I consider them all equivalent).


On 20/05/13 00:38, Jonathan Coveney wrote:

Also, look into the TOP udf instead of doing the limit. It can potentially
be a lot faster and is cleaner, IMHO.
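As a rough sketch of what the TOP suggestion could look like: the input path, the relation name `records`, and the schema (including a numeric `ts` column to order on) are all assumptions for illustration and are not taken from the original script. TOP takes the number of tuples to keep, the 0-based index of the column to compare, and the bag.

```pig
-- Hypothetical input: path and schema are assumptions for illustration only.
records = LOAD 'input' AS (key:chararray, ts:long, payload:chararray);
grp = GROUP records BY key;
-- TOP(n, column, bag): keep the 1 tuple per group with the largest value
-- in column 1 (ts, counted from 0).
deduped = FOREACH grp GENERATE FLATTEN(TOP(1, 1, records));
```

Note that this needs a comparable column to rank on, so it only fits if such a field exists in the data.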


2013/5/19 Norbert Burger <norbert.bur...@gmail.com>

Take a look at the PARALLEL clause:

http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
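A minimal sketch of the PARALLEL clause applied to a GROUP-then-LIMIT deduplication like the one in the question below. The input path, the relation name `records`, the schema, and the value 20 are assumptions chosen for illustration.

```pig
-- Hypothetical input: path and schema are assumptions for illustration only.
records = LOAD 'input' AS (key:chararray, payload:chararray);
-- PARALLEL requests a number of reduce tasks for this operator; 20 is arbitrary.
grp = GROUP records BY key PARALLEL 20;
deduped = FOREACH grp {
    -- keep a single tuple per key
    one = LIMIT records 1;
    GENERATE FLATTEN(one);
}
```

A script-wide alternative is `SET default_parallel 20;` at the top of the script, which applies to every reduce-side operator that has no PARALLEL clause of its own.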

On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <vincent.ba...@gmail.com> wrote:

Hi,

I use this request to remove duplicated entries from a set of input files
(I cannot use DISTINCT since some fields can be different)

grp = GROUP alias BY key;
alias = FOREACH grp {
   record = LIMIT  alias 1;
   GENERATE FLATTEN(record) AS ... ;
}

It appears that this request always generates 1 reducer (I use 0 as the
default number of reducers to let Pig decide), whatever the size of my
input data.

Is this normal behavior? How can I improve my request time by using
several reducers?

Thanks a lot for your help.
