Attached is my sample code (this InputFormat generates 1 reducer task for
every 5 mapper tasks):
import java.io.IOException;

import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class MyInputFormat extends TextInputFormat {
    @Override
    public InputSplit[] getSplits(JobConf job, int numSplits)
            throws IOException {
        InputSplit[] splits = super.getSplits(job, numSplits);
        // One reducer for every 5 map splits, with a minimum of 1.
        int reducerNum = splits.length / 5;
        if (reducerNum == 0) {
            reducerNum = 1;
        }
        job.setNumReduceTasks(reducerNum);
        return splits;
    }
}
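The split-to-reducer arithmetic above can be sketched on its own, without a
Hadoop dependency (the class and method names here are hypothetical, just to
illustrate the rule):

```java
// Sketch of the reducer-count rule used in the InputFormat above:
// one reducer for every 5 map splits, with a floor of 1.
public class ReducerCount {
    static int reducersFor(int numSplits) {
        int reducerNum = numSplits / 5;
        return reducerNum == 0 ? 1 : reducerNum;
    }

    public static void main(String[] args) {
        System.out.println(reducersFor(3));   // fewer than 5 splits -> 1
        System.out.println(reducersFor(12));  // 12 / 5 -> 2
    }
}
```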
Once Pig integrates the InputFormat into LoadFunc (PIG-966), it will be
possible to change the number of reducer tasks dynamically.
Jeff Zhang
On Fri, Nov 27, 2009 at 3:38 PM, Jeff Zhang <[email protected]> wrote:
> I got a suggestion from Owen O'Malley that we can control the reducer number
> in the InputFormat, and I have tried it; it works.
>
>
> Jeff Zhang
>
>
> On Sat, Nov 14, 2009 at 1:23 AM, Alan Gates <[email protected]> wrote:
>
>>
>> On Nov 12, 2009, at 2:49 PM, Scott Carey wrote:
>>
>> Is it possible to have a script at least use the default configured
>>> Hadoop value? Or is there a way to do that already?
>>>
>>
>> If the user doesn't specify a parallelism, Pig doesn't set a value in
>> JobConf for the number of reduces, which means the job will pick up the
>> default for the cluster. Unless cluster administrators change it, that
>> default is 1.
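>> (The cluster default Alan refers to is the mapred.reduce.tasks property.
>> An administrator would raise it with an entry like the following in the
>> cluster configuration; the value 8 here is purely illustrative:
>>
>>   <property>
>>     <name>mapred.reduce.tasks</name>
>>     <value>8</value>
>>   </property>
>> )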
>>
>>
>>> Alan.
>>
>
>