The number of maps depends on the number of input splits. mapred.map.tasks is only a hint to the InputFormat, which may or may not honor it. With Pig, you can try the pig.maxCombinedSplitSize property to control the number of maps based on input size. For example, a 1 GB split size can be specified as -Dpig.maxCombinedSplitSize=1073741824
Regards,
Rohini

On Fri, Feb 1, 2013 at 5:07 PM, Mohit Anchlia <[email protected]> wrote:

> Sorry, my question was around mapred.map.tasks; I mistakenly specified the wrong
> parameter. In Pig I am setting mapred.map.tasks to 200 but more
> tasks are being executed.
>
> On Fri, Feb 1, 2013 at 5:04 PM, Alan Gates <[email protected]> wrote:
>
> > Setting mapred.reduce.tasks won't work, as Pig overrides it. See
> > http://pig.apache.org/docs/r0.10.0/perf.html#parallel for info on how to
> > set the number of reducers in Pig.
> >
> > Alan.
> >
> > On Feb 1, 2013, at 4:53 PM, Mohit Anchlia wrote:
> >
> > > Just a slightly different problem: I tried SET mapred.reduce.tasks
> > > 200 in Pig but still more tasks were launched for that job. Is there any
> > > other way to set the parameter?
> > >
> > > On Fri, Feb 1, 2013 at 3:15 PM, Harsha <[email protected]> wrote:
> > >
> > > > It's the total number of reducers, not active reducers.
> > > > If you specify a lower number, each reducer gets more data to process.
> > > > --
> > > > Harsha
> > > >
> > > > On Friday, February 1, 2013 at 2:54 PM, Mohit Anchlia wrote:
> > > >
> > > > > Thanks! Is there a downside to reducing the number of reducers? I am trying
> > > > > to alleviate high CPU.
> > > > >
> > > > > With low reducers using the PARALLEL clause, does it mean that more data is
> > > > > processed by each reducer, or does it mean how many reducers can be active
> > > > > at one time?
> > > > >
> > > > > On Fri, Feb 1, 2013 at 2:44 PM, Harsha <[email protected]> wrote:
> > > > >
> > > > > > Mohit,
> > > > > > you can use the PARALLEL clause to specify reduce tasks. More info here:
> > > > > > http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features
> > > > > >
> > > > > > --
> > > > > > Harsha
> > > > > >
> > > > > > On Friday, February 1, 2013 at 2:42 PM, Mohit Anchlia wrote:
> > > > > >
> > > > > > > Is there a way to specify the max number of reduce tasks that a job
> > > > > > > should span in a Pig script, without having to restart the cluster?
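Putting the advice in this thread together, a minimal Pig Latin sketch might look like the following. The input path, alias names, and schema are hypothetical, for illustration only; the properties (pig.maxCombinedSplitSize, default_parallel) and the PARALLEL clause are the ones discussed above.

```
-- Cap the number of maps by combining small input splits, up to 1 GB each
-- (value is in bytes, as in Rohini's example).
SET pig.maxCombinedSplitSize 1073741824;

-- Default reducer count for any operator without an explicit PARALLEL clause
-- (use this instead of mapred.reduce.tasks, which Pig overrides).
SET default_parallel 200;

-- Hypothetical input and schema, for illustration only
logs    = LOAD '/data/logs' AS (user:chararray, bytes:long);

-- PARALLEL on a specific operator overrides default_parallel for that job;
-- fewer reducers means more data per reducer, not fewer active at a time.
grouped = GROUP logs BY user PARALLEL 50;
totals  = FOREACH grouped GENERATE group, SUM(logs.bytes);

STORE totals INTO '/data/totals';
```

The split-size property can also be passed on the command line instead, e.g. `pig -Dpig.maxCombinedSplitSize=1073741824 script.pig`.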
