I have followed a suggestion on the given link and set mapred.min.split.size to 134217728.
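For reference, here is a sketch of the corresponding mapred-site.xml entry (assuming the Hadoop 1.x property name used in this thread; 134217728 bytes = 128 MiB):

<configuration>
  <property>
    <name>mapred.min.split.size</name>
    <value>134217728</value>
  </property>
</configuration>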
With the above mapred.min.split.size, I get mapred.map.tasks = 121 (previously it was 242). Thanks for all the replies!

Shing

________________________________
From: Romedius Weiss <[email protected]>
To: [email protected]
Sent: Wednesday, October 3, 2012 5:00 AM
Subject: Re: How to lower the total number of map tasks

Hi!

According to the article @YDN*, "The on-node parallelism is controlled by the mapred.tasktracker.map.tasks.maximum parameter."
[http://developer.yahoo.com/hadoop/tutorial/module4.html]

Also, I think it is better to set the min size instead of the max size, so the algorithm tries to slice the file into chunks of a certain minimal size.

Have you tried writing a custom InputFormat? That might be another, more drastic solution.

Cheers,
R

Quoting Shing Hing Man <[email protected]>:

> I only have one big input file.
>
> Shing
>
> ________________________________
> From: Bejoy KS <[email protected]>
> To: [email protected]; Shing Hing Man <[email protected]>
> Sent: Tuesday, October 2, 2012 6:46 PM
> Subject: Re: How to lower the total number of map tasks
>
> Hi Shing
>
> Is your input a single file or a set of small files? If the latter, you need to use
> CombineFileInputFormat.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ________________________________
>
> From: Shing Hing Man <[email protected]>
> Date: Tue, 2 Oct 2012 10:38:59 -0700 (PDT)
> To: [email protected]<[email protected]>
> ReplyTo: [email protected]
> Subject: Re: How to lower the total number of map tasks
>
> I have tried
>
> Configuration.setInt("mapred.max.split.size", 134217728);
>
> and setting mapred.max.split.size in mapred-site.xml (dfs.block.size is
> left unchanged at 67108864).
>
> But in job.xml, I am still getting mapred.map.tasks = 242.
>
> Shing
>
> ________________________________
> From: Bejoy Ks <[email protected]>
> To: [email protected]; Shing Hing Man <[email protected]>
> Sent: Tuesday, October 2, 2012 6:03 PM
> Subject: Re: How to lower the total number of map tasks
>
> Sorry for the typo, the property name is mapred.max.split.size.
>
> Also, just for changing the number of map tasks, you don't need to modify the
> HDFS block size.
>
> On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks <[email protected]> wrote:
>
>> Hi
>>
>> You need to alter the value of mapred.max.split.size to a value larger than
>> your block size to get a smaller number of map tasks than the default.
>>
>> On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man <[email protected]> wrote:
>>
>>> I am running Hadoop 1.0.3 in pseudo-distributed mode.
>>> When I submit a map/reduce job to process a file of about 16 GB, I have
>>> the following in job.xml:
>>>
>>> mapred.map.tasks = 242
>>> mapred.min.split.size = 0
>>> dfs.block.size = 67108864
>>>
>>> I would like to reduce mapred.map.tasks to see if it improves performance.
>>> I have tried doubling the size of dfs.block.size, but
>>> mapred.map.tasks remains unchanged.
>>> Is there a way to reduce mapred.map.tasks?
>>>
>>> Thanks in advance for any assistance!
>>> Shing
>>>
>>
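[Editor's note] The numbers in the thread are consistent with the split-size rule that Hadoop 1.x FileInputFormat applies, splitSize = max(minSplitSize, min(maxSplitSize, blockSize)). A minimal sketch in plain Java (no Hadoop dependency; the input size is inferred from the 242 default map tasks, as an assumption, since 242 x 64 MiB is roughly the 16 GB the poster mentions):

```java
// Sketch of the Hadoop 1.x FileInputFormat split-size rule:
//   splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
// The numbers below reproduce the 242 -> 121 drop reported in the thread.
public class SplitMath {
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 67108864L;          // dfs.block.size = 64 MiB, as in the thread
        long fileSize  = 242L * blockSize;   // assumed input size, inferred from 242 default maps

        // Defaults: mapred.min.split.size = 0 -> split size = block size -> 242 maps
        long defaultMaps = fileSize / splitSize(0L, Long.MAX_VALUE, blockSize);

        // mapred.min.split.size = 134217728 (128 MiB) -> split size = 128 MiB -> 121 maps
        long fewerMaps = fileSize / splitSize(134217728L, Long.MAX_VALUE, blockSize);

        System.out.println(defaultMaps + " -> " + fewerMaps); // prints "242 -> 121"
    }
}
```

This also shows why raising mapred.max.split.size alone had no effect for Shing: with min = 0, the split size is capped by the block size, so only raising the minimum above the block size changes the split count.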
