As Jonathan mentioned, TOP should obviate this particular use case. For future cases, though, the parameters pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max might be useful:
https://issues.apache.org/jira/browse/PIG-1249

Norbert

On Tue, May 21, 2013 at 9:23 AM, Vincent Barat <vincent.ba...@gmail.com> wrote:

> Thanks for your reply.
>
> My goal is actually to AVOID using PARALLEL, to let PIG guess a good
> number of reducers by itself.
> Usually it works well for me, so I don't understand why it does not
> in this case.
>
> On 19/05/13 15:37, Norbert Burger wrote:
>
>> Take a look at the PARALLEL clause:
>>
>> http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
>>
>> On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <vincent.ba...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I use this request to remove duplicated entries from a set of input files
>>> (I cannot use DISTINCT since some fields can be different):
>>>
>>> grp = GROUP alias BY key;
>>> alias = FOREACH grp {
>>>     record = LIMIT alias 1;
>>>     GENERATE FLATTEN(record) AS ... ;
>>> }
>>>
>>> It appears that this request always generates 1 reducer (I use 0 as the
>>> default number of reducers to let PIG decide), whatever the size of my
>>> input data.
>>>
>>> Is this normal behavior? How can I improve my request time by using
>>> several reducers?
>>>
>>> Thanks a lot for your help.
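
For reference, here is a minimal sketch of how the two parameters named at the top of this reply could be set in a script like the one quoted above. The input path, output path, schema, relation names and numeric values are all assumptions chosen for illustration. As far as I understand PIG-1249, when no PARALLEL clause or default_parallel is given, Pig estimates the reducer count as roughly the total input size divided by pig.exec.reducers.bytes.per.reducer, capped at pig.exec.reducers.max.

```pig
-- Sketch only: the values below are illustrative assumptions, not recommendations.
-- Aim for roughly 256 MB of input per reducer instead of the (much larger) default.
SET pig.exec.reducers.bytes.per.reducer 268435456;
-- Upper bound on the number of reducers Pig may choose when estimating.
SET pig.exec.reducers.max 200;

-- Hypothetical input path and schema, only so the script is complete.
records = LOAD 'input' AS (key:chararray, value:chararray);

-- Same de-duplication pattern as in the quoted message. There is no PARALLEL
-- clause, so the reducer count is left to Pig's estimate.
grp = GROUP records BY key;
dedup = FOREACH grp {
    record = LIMIT records 1;
    GENERATE FLATTEN(record);
}

-- Hypothetical output path.
STORE dedup INTO 'output';
```

Lowering bytes.per.reducer asks for more reducers for the same input size. If the estimate still comes out as 1, the total input is probably smaller than the per-reducer threshold.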