I don't think Mahout does anything special with splitting. Block size isn't
quite the issue here -- do you have 1,555 input files?
You can also check the value of mapred.min.split.size.
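
For a rough feel of where the map count comes from, here is a minimal sketch in plain Python (not Hadoop code) of the split arithmetic as I understand it for the old-API FileInputFormat: split size = max(min split size, min(goal size, block size)), counted per file. The function names and the 1 GiB file sizes are made up for illustration, and it assumes splittable files that all share one block size.

```python
# Sketch of old-API FileInputFormat split counting (an approximation,
# not the actual Hadoop source). All names here are hypothetical.

SPLIT_SLOP = 1.1  # the final chunk of a file may be up to 10% over split size


def compute_split_size(goal_size, min_size, block_size):
    """Split size = max(min split size, min(goal size, block size))."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_sizes, block_size, min_split_size=1, requested_maps=1):
    """Estimate the number of input splits (and so map tasks).

    file_sizes     -- sizes in bytes of the (splittable) input files
    block_size     -- HDFS block size in bytes, assumed equal for all files
    min_split_size -- stands in for mapred.min.split.size
    requested_maps -- stands in for mapred.map.tasks (only a hint)
    """
    total = sum(file_sizes)
    goal_size = total // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_split_size, block_size)

    splits = 0
    for size in file_sizes:
        remaining = size
        # Carve off full-size splits while noticeably more than one remains.
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        # Whatever is left (if anything) becomes the last split of the file.
        if remaining > 0:
            splits += 1
    return splits


if __name__ == "__main__":
    MB = 1024 * 1024
    files = [1024 * MB] * 123  # hypothetical: 123 part files of 1 GiB each
    print(count_splits(files, 128 * MB))                              # 984
    print(count_splits(files, 128 * MB, min_split_size=1024 * MB))    # 123
```

The point is that splits are counted per block-sized chunk, not per file, so 123 large part files can easily turn into well over a thousand map tasks, and raising mapred.min.split.size is one way to bring the count back down without touching the code.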

On Mon, Dec 5, 2011 at 9:20 PM, Mohammed Al khooja <[email protected]> wrote:

> Hi Bejoy,
>
> Thanks for your input.  The only problem is that I don't have access to the
> code, so I need to work through configuration alone.  Since I have 123
> initial input splits, is it possible that Mahout is doing its own
> splitting?  I checked the block sizes and they do match (128 MB), so I can't
> see why I end up with 1555 map tasks.
>
> Thanks anyway for the code hint.  I think I will need it in the near
> future!
>
> Mohamad
>
> On Mon, Dec 5, 2011 at 3:18 PM, bejoy ks <[email protected]> wrote:
>
> > Hi
> >    In Hadoop, mapred.map.tasks won't control the number of mappers; it is
> > only a hint. One map task is created for each input split. To reduce the
> > number of map tasks you can use CombineFileInputFormat, if that works for
> > your implementation. It assigns multiple input splits to one mapper, at
> > the cost of a slight loss of data locality.
> >
> > Hope it helps!
> >
> > Regards
> > Bejoy K S
> >
> > -----Original Message-----
> > From: Mohammed Al khooja <[email protected]>
> > Date: Mon, 5 Dec 2011 20:04:25
> > To: <[email protected]>
> > Subject: Too many mappers - Mahout lda
> >
> > Hi,
> >
> > I'm running LDA on 123 file-parts (block size is 128 MB).  However, Mahout
> > is creating 1555 mappers. I tried setting mapred.map.tasks in
> > mapred-site.xml, but I guess Mahout overrides it.
> >
> > Does this have anything to do with block size or split size?
> >
> > Thanks.
> >
> > --
> >
> > M.khouja
> >
>
>
>
> --
>
> M.khouja
>
