Hi Bejoy,

Thanks for your input.  The only problem is that I don't have access to the
code, so I need to work around this through configuration.  Since I have 123
initial input splits, is it possible that Mahout is doing its own
splitting?  I checked the block sizes and they do match (128 MB), so I can't
see why I end up with 1555 map tasks.
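
For reference, a rough sketch of the arithmetic involved. The split-size rule
is the one FileInputFormat uses in the new (org.apache.hadoop.mapreduce) API:
max(minSize, min(maxSize, blockSize)). Everything else is an assumption made
for illustration: that each of the 123 parts is about one 128 MB block, and
that some maximum split size is in effect (the 16 MB value is hypothetical
and is not meant to reproduce the 1555 figure). The class and method names
other than the rule itself are made up.

```java
class SplitEstimate {
    // Same rule as FileInputFormat.computeSplitSize in the new API.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Simplified split count for one file (ceiling division).
    // Ignores the small slop Hadoop allows on the last split of a file.
    static long splitsForFile(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;
        long fileLength = 128 * mb; // assumption: each part is about one block
        int files = 123;

        // No cap: the split size equals the block size, one split per part.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // Hypothetical 16 MB cap: every part is cut into 8 splits.
        long cappedSplit = computeSplitSize(blockSize, 1L, 16 * mb);

        System.out.println(files * splitsForFile(fileLength, defaultSplit)); // 123
        System.out.println(files * splitsForFile(fileLength, cappedSplit));  // 984
    }
}
```

If something like this is happening, the number of map tasks follows the
effective split size rather than mapred.map.tasks, so the minimum and maximum
split size settings (mapred.min.split.size and mapred.max.split.size in this
Hadoop generation) would be the ones to check.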

Thanks anyway for the code hint.  I think I will need it in the near
future!

Mohamad

On Mon, Dec 5, 2011 at 3:18 PM, bejoy ks <[email protected]> wrote:

> Hi
>    In Hadoop, setting mapred.map.tasks won't control the number of
> mappers. One map task is created for each input split. To reduce the
> number of map tasks you can use CombineFileInputFormat, if that works
> for your implementation. It assigns multiple input splits to one
> mapper, but with a slight loss of data locality.
>
> Hope it helps!
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Mohammed Al khooja <[email protected]>
> Date: Mon, 5 Dec 2011 20:04:25
> To: <[email protected]>
> Subject: Too many mappers - Mahout lda
>
> Hi,
>
> I'm running LDA on 123 file parts (block size is 128 MB).  However, Mahout
> is creating 1555 mappers. I tried setting mapred.map.tasks in
> mapred-site.xml, but I guess Mahout overrides it.
>
> Does this have anything to do with block size or split size?
>
> Thanks.
>
> --
>
> M.khouja
>



-- 

M.khouja
