I haven't used the CLI in ages but I believe there's an env variable like "MAHOUT_OPTS" where you can set flags like -Dmapred.map.tasks=20.
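Something like this might work (untested; I'm assuming bin/mahout still reads
MAHOUT_OPTS and passes it along to the JVM, and that mapred.reduce.tasks is
the analogous property for reducers):

  # ask Hadoop for 20 mappers and 20 reducers (a hint, not a hard guarantee)
  export MAHOUT_OPTS="-Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20"
  mahout fpg -i /user/<user>/stopword_filtered/search_terms.txt \
    -o stopword_filtered/patterns \
    -g 5000 \
    -k 20 \
    -method mapreduce \
    -regex '[\ ]' \
    -s 120

You may also be able to pass the -D flags directly on the mahout command line,
before the job's own options, since the drivers generally run through Hadoop's
ToolRunner, but I haven't verified that for fpg.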
On Tue, Jan 24, 2012 at 2:23 PM, Tim R. Havens <[email protected]> wrote:
> Sean Owen <srowen <at> gmail.com> writes:
> > ...snip...
> > You can of course force it to use more mappers, and that's probably a good
> > idea here. -Dmapred.map.tasks=20 perhaps. More mappers means more overhead
> > of spinning up mappers to process less data, and Hadoop's guess indicates
> > that it thinks it's not efficient to use 20 workers. If you know that those
> > other 18 are otherwise idle, my guess is you'd benefit from just making it
> > use 20.
> ...
>
> How can I accomplish this when doing something like this from command line?
>
> Is it possible to force the map tasks and reduce tasks to a higher number
> in this example? I've been running a few jobs like this with 'fpg' but
> I haven't been able to find solid doc's on how to increase the number of
> map/reducers for the jobs. Currently this will run on about 8-9M rows
> of input on our cluster, but it never uses more than 2 map 2 reduce per
> job.
>
> mahout fpg -i /user/<user>/stopword_filtered/search_terms.txt \
>   -o stopword_filtered/patterns \
>   -g 5000 \
>   -k 20 \
>   -method mapreduce \
>   -regex '[\ ]' \
>   -s 120
