Khooja
To use CombineFileInputFormat you don't necessarily need to change any code.
Specify the input format in the hadoop jar command with the -D option, before
any other arguments such as the input/output paths.
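
A rough sketch of the argument ordering, in case it helps. It only assembles
the command line and does not run Hadoop. Everything named here is an
assumption for illustration: the jar, the driver class, the paths and the
input format class are hypothetical, and mapred.input.format.class is the
old-API property name, which I am assuming the job reads. The -D option is
only picked up if the driver goes through ToolRunner/GenericOptionsParser and
does not set its input format explicitly in code. Also note that
CombineFileInputFormat itself is abstract, so the value would have to be a
concrete subclass.

```python
def build_hadoop_command(jar, main_class, generic_options, job_args):
    """Return the argv for `hadoop jar`, with -D options ahead of the job args."""
    command = ["hadoop", "jar", jar, main_class]
    # Generic options come first, right after the main class.
    for name, value in generic_options.items():
        command += ["-D", f"{name}={value}"]
    # Job-specific arguments (input/output paths and so on) go last.
    command += list(job_args)
    return command


command = build_hadoop_command(
    jar="my-job.jar",                    # hypothetical jar name
    main_class="com.example.MyDriver",   # hypothetical driver class
    generic_options={
        # Assumed old-API property; the class name is a hypothetical
        # concrete subclass of CombineFileInputFormat.
        "mapred.input.format.class": "com.example.MyCombineInputFormat",
    },
    job_args=["/input/path", "/output/path"],  # hypothetical paths
)
print(" ".join(command))
```

If the options land after the input/output paths, the driver will most likely
treat them as ordinary arguments rather than configuration.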
Regards
Bejoy K S
-----Original Message-----
From: bejoy ks <[email protected]>
Date: Mon, 5 Dec 2011 21:35:03
To: <[email protected]>; <[email protected]>
Subject: Re: Too many mappers - Mahout lda
Hi
Could you post your values of mapred.min.split.size and
mapred.max.split.size? Even if your HDFS block size is 64MB, if
mapred.max.split.size is smaller than that, it can create more map tasks. When
you say 123 input splits, is that the number of input files or the number of
HDFS blocks? If there are 123 larger files, each spanning multiple blocks, then
the number of map tasks would definitely be more than the number of input
files. Please check all of the above.
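
To make the arithmetic concrete, here is a small sketch of how the split size
is derived from these settings. It assumes the rule used by the new-API
FileInputFormat, max(minSize, min(maxSize, blockSize)), and it simplifies the
count to one split per full or partial split-sized chunk (Hadoop also allows a
small slack on the last split, which is ignored here). The file sizes are made
up for illustration and are not taken from your job.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Split size rule assumed here: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_sizes, block_size, min_size=1, max_size=2**63 - 1):
    """Approximate number of map tasks: splits never span two files."""
    split_size = compute_split_size(block_size, min_size, max_size)
    # Integer ceiling division per file; empty files contribute no split.
    return sum((size + split_size - 1) // split_size for size in file_sizes if size > 0)


# Hypothetical input: three files of 1024 MB each on 128 MB blocks.
files = [1024 * MB] * 3

print(count_splits(files, 128 * MB))                      # 24
print(count_splits(files, 128 * MB, max_size=64 * MB))    # 48
print(count_splits(files, 128 * MB, min_size=256 * MB))   # 12
```

So a maximum split size below the block size multiplies the number of map
tasks, while a minimum split size above the block size reduces it.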
Regards
Bejoy K S
-----Original Message-----
From: Sean Owen <[email protected]>
Date: Mon, 5 Dec 2011 21:24:48
To: <[email protected]>
Subject: Re: Too many mappers - Mahout lda
I don't think it does anything special. Block size isn't quite the issue
here -- do you have 1,555 input files?
You can check the value of mapred.min.split.size too.
On Mon, Dec 5, 2011 at 9:20 PM, Mohammed Al khooja <[email protected]> wrote:
> Hi Bejoy,
>
> Thanks for your input. The only problem is that I don't have access to the
> code and I need to work around the configurations. Since I have 123
> initial input splits, is it possible that Mahout is doing its own
> splitting? I checked the block sizes and they do match (128MB), so I can't
> see why I end up having 1555 map tasks.
>
> Thanks anyways for the code hint. I think I will need it in the near
> future!
>
> Mohamad
>
> On Mon, Dec 5, 2011 at 3:18 PM, bejoy ks <[email protected]> wrote:
>
> > Hi
> > In Hadoop, mapred.map.tasks is only a hint and won't control the number
> > of mappers. One map task is created for each input split.
> > To reduce the number of map tasks you can use CombineFileInputFormat if
> > that works for you in the implementation. It assigns multiple
> > input splits to one mapper, but with a slight loss of data locality.
> >
> > Hope it helps!
> >
> > Regards
> > Bejoy K S
> >
> > -----Original Message-----
> > From: Mohammed Al khooja <[email protected]>
> > Date: Mon, 5 Dec 2011 20:04:25
> > To: <[email protected]>
> > Subject: Too many mappers - Mahout lda
> >
> > Hi,
> >
> > I'm running lda on 123 file-parts (Block size is 128 MB). However, mahout
> > is creating 1555 mappers. I tried setting mapred.map.tasks in
> > mapred-site.xml but I guess Mahout overrides it.
> >
> > Does this have anything to do with block size or split size?
> >
> > Thanks.
> >
> > --
> >
> > M.khouja
> >
>
>
>
> --
>
> M.khouja
>