I don't think Mahout does anything special when it comes to splitting. Block size isn't quite the issue here -- do you have 1,555 input files? You can check the value of mapred.min.split.size too.
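For what it's worth, here is a rough sketch of how the map task count comes about, assuming the job reads its input through a FileInputFormat-style input format (I have not checked that this is what the LDA job does). As far as I understand it, the split size is max(minSize, min(maxSize, blockSize)), each file is cut into pieces of that size, and the last piece may run up to 10% over. The function names and the file sizes below are made up for illustration; they are not your data and not Hadoop's API.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger than split_size


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size rule: max(minSize, min(maxSize, blockSize)).

    min_size plays the role of mapred.min.split.size.
    """
    return max(min_size, min(max_size, block_size))


def splits_for_file(file_len, split_size):
    """Count the splits (and so map tasks) for one splittable, non-empty file.

    Empty files are not modelled in this sketch.
    """
    splits = 0
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the final split
    return splits


def total_map_tasks(file_lengths, block_size, min_size=1):
    """One map task per split, summed over all input files."""
    split_size = compute_split_size(block_size, min_size)
    return sum(splits_for_file(n, split_size) for n in file_lengths)


if __name__ == "__main__":
    # Hypothetical input: 3 files of 300 MB each on 128 MB blocks.
    files = [300 * MB] * 3
    print(total_map_tasks(files, 128 * MB))                     # default minimum split size
    print(total_map_tasks(files, 128 * MB, min_size=512 * MB))  # raised minimum split size
```

The point is that the number of map tasks follows the number of splits, not the number of files: a file larger than one block yields several splits. Raising the minimum split size above the block size gives fewer, larger splits, at some cost in data locality.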

On Mon, Dec 5, 2011 at 9:20 PM, Mohammed Al khooja <[email protected]> wrote:

> Hi Bejoy,
>
> Thanks for your input. The only problem is that I don't have access to the
> code and I need to work around the configurations. Since I have 123
> initial input splits, is it possible that Mahout is doing its own
> splitting? I checked the block sizes and they do match (128 MB), so I can't
> see why I end up having 1555 map tasks.
>
> Thanks anyway for the code hint. I think I will need it in the near
> future!
>
> Mohamad
>
> On Mon, Dec 5, 2011 at 3:18 PM, bejoy ks <[email protected]> wrote:
>
> > Hi
> > In Hadoop, mapred.map.tasks won't do the job of controlling the number
> > of mappers. There would be one map task created against each input split.
> > To reduce the number of map tasks you can use CombineFileInputFormat if
> > that works for you in the implementation. It actually assigns multiple
> > input splits to one mapper, but with a slight loss of data locality.
> >
> > Hope it helps!
> >
> > Regards
> > Bejoy K S
> >
> > -----Original Message-----
> > From: Mohammed Al khooja <[email protected]>
> > Date: Mon, 5 Dec 2011 20:04:25
> > To: <[email protected]>
> > Subject: Too many mappers - Mahout lda
> >
> > Hi,
> >
> > I'm running lda on 123 file-parts (block size is 128 MB). However, Mahout
> > is creating 1555 mappers. I tried setting mapred.map.tasks in
> > mapred-site.xml but I guess Mahout overrides it.
> >
> > Does this have anything to do with block size or split size?
> >
> > Thanks.
> >
> > --
> >
> > M.khouja
> >
> >
> > --
> > M.khouja
>
