I have 123 input files and mapred.min.split.size is zero.
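For context, Hadoop's FileInputFormat derives the split size roughly as max(minSplitSize, min(maxSplitSize, blockSize)), and each file is split independently. A minimal sketch of that arithmetic in Python, with hypothetical sizes (the real values come from the job configuration):

```python
import math

def split_size(min_split, max_split, block_size):
    # FileInputFormat: splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_split, min(max_split, block_size))

def num_splits(file_size, size):
    # Each file is split on its own; a file smaller than one split
    # still yields one split (hence at least one mapper per file).
    return max(1, math.ceil(file_size / size))

MB = 1024 * 1024
block = 128 * MB

# With min = 0 and a very large max, the split size equals the block size:
size = split_size(0, 2**63 - 1, block)

# 123 files of exactly one block each -> 123 splits -> 123 mappers:
total = sum(num_splits(block, size) for _ in range(123))
print(total)  # 123

# A much smaller effective max split size would explain extra mappers:
small = split_size(0, 10 * MB, block)
print(num_splits(block, small))  # ceil(128 / 10) = 13 splits per file
```

So if 123 one-block files are producing 1555 map tasks, something in the job (not the block size) is shrinking the effective split size or re-splitting the input.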

Thanks

On Mon, Dec 5, 2011 at 4:24 PM, Sean Owen <[email protected]> wrote:

> I don't think it does anything special. Block size isn't quite the issue
> here -- do you have 1,555 input files?
> You can check the value of mapred.min.split.size too.
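If you have access to the cluster configuration files, one way to check the property is to read it straight out of mapred-site.xml. A minimal sketch in Python; the config path is an assumption, adjust for your install:

```python
import xml.etree.ElementTree as ET

def get_property(conf_path, name):
    # Hadoop config files are <configuration><property><name>/<value> XML.
    root = ET.parse(conf_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None  # unset here means Hadoop falls back to its default

# Hypothetical path; commonly /etc/hadoop/conf or $HADOOP_HOME/conf:
# print(get_property("/etc/hadoop/conf/mapred-site.xml",
#                    "mapred.min.split.size"))
```

Note that a job can still override this value at submission time, so the site file only tells you the cluster-wide setting.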
>
> On Mon, Dec 5, 2011 at 9:20 PM, Mohammed Al khooja <[email protected]
> >wrote:
>
> > Hi Bejoy,
> >
> > Thanks for your input.  The only problem is that I don't have access to
> > the code and I need to work around it with configuration.  Since I have
> > 123 initial input splits, is it possible that Mahout is doing its own
> > splitting?  I checked the block sizes and they do match (128 MB), so I
> > can't see why I end up with 1555 map tasks.
> >
> > Thanks anyway for the code hint.  I think I will need it in the near
> > future!
> >
> > Mohamad
> >
> > On Mon, Dec 5, 2011 at 3:18 PM, bejoy ks <[email protected]> wrote:
> >
> > > Hi
> > >    In Hadoop, mapred.map.tasks won't control the number of mappers.
> > > One map task is created for each input split. To reduce the number of
> > > map tasks you can use CombineFileInputFormat, if that works for your
> > > implementation. It assigns multiple input splits to one mapper, but
> > > with a slight loss of data locality.
> > >
> > > Hope that helps!
> > >
> > > Regards
> > > Bejoy K S
> > >
> > > -----Original Message-----
> > > From: Mohammed Al khooja <[email protected]>
> > > Date: Mon, 5 Dec 2011 20:04:25
> > > To: <[email protected]>
> > > Subject: Too many mappers - Mahout lda
> > >
> > > Hi,
> > >
> > > I'm running LDA on 123 file parts (block size is 128 MB).  However,
> > > Mahout is creating 1555 mappers.  I tried setting mapred.map.tasks in
> > > mapred-site.xml, but I guess Mahout overrides it.
> > >
> > > Does this have anything to do with block size or split size?
> > >
> > > Thanks.
> > >
> > > --
> > >
> > > M.khouja
> > >
> >
> >
> >
> > --
> >
> > M.khouja
> >
>



-- 

M.khouja
