Hi Bejoy,

Thanks for your input. The only problem is that I don't have access to the code, so I need to work around this through configuration. Since I have 123 initial input splits, is it possible that Mahout is doing its own splitting? I checked the block sizes and they do match (128 MB), so I can't see why I end up with 1555 map tasks.
Thanks anyway for the code hint. I think I will need it in the near future!

Mohamad

On Mon, Dec 5, 2011 at 3:18 PM, bejoy ks <[email protected]> wrote:

> Hi
> In Hadoop, mapred.map.tasks won't do the job of controlling the number
> of mappers. One map task is created for each input split.
> To reduce the number of map tasks you can use CombineFileInputFormat if
> that works for you in the implementation. It actually assigns multiple
> input splits to one mapper, but with a slight loss of data locality.
>
> Hope it Helps!..
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Mohammed Al khooja <[email protected]>
> Date: Mon, 5 Dec 2011 20:04:25
> To: <[email protected]>
> Subject: Too many mappers - Mahout lda
>
> Hi,
>
> I'm running lda on 123 file-parts (Block size is 128 MB). However, mahout
> is creating 1555 mappers. I tried setting mapred.map.tasks in
> mapred-site.xml but I guess Mahout overrides it.
>
> Does this have anything to do with block size or split size?
>
> Thanks.
>
> --
>
> M.khouja

--
M.khouja
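The "one map task per input split" rule quoted above can be made concrete with a small sketch. It mirrors the split arithmetic used by Hadoop's stock FileInputFormat in the new mapreduce API: split size is max(minSize, min(maxSize, blockSize)), and a file is cut into splits of that size with a 10% tolerance on the last one. Whether Mahout's lda job relies on this stock logic is an assumption, as is the 1600 MB file size, which is purely illustrative and not taken from the thread. The class and method names are hypothetical.

```java
// Sketch only: reproduces the split arithmetic of Hadoop's FileInputFormat
// (new mapreduce API) without depending on Hadoop. Assumes splittable input;
// a non-splittable file (e.g. gzip) would yield one split regardless of size.
final class SplitEstimate {
    // Hadoop lets the final split be up to 10% larger than the split size.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between the configured min and max.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) produced for a single file.
    static long countSplits(long fileLength, long splitSize) {
        long splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;
        long fileLength = 1600 * mb; // hypothetical size of one file part

        // Defaults: minimum split size 1 byte, no effective maximum.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // Raising the minimum split size above the block size gives larger splits.
        long largerSplit = computeSplitSize(blockSize, 512 * mb, Long.MAX_VALUE);

        System.out.println("splits at default size: " + countSplits(fileLength, defaultSplit));
        System.out.println("splits with 512 MB min: " + countSplits(fileLength, largerSplit));
    }
}
```

With these illustrative numbers, one 1600 MB file yields 13 splits at the default split size and 4 splits once the minimum split size is raised to 512 MB. So a file part larger than one block produces several map tasks by itself, and in Hadoop releases of that era the minimum split size could, as far as I know, be raised through the mapred.min.split.size property without touching the job code, at the cost of some data locality.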
