Sean Owen <srowen <at> gmail.com> writes:
...snip...
> You can of course force it to use more mappers, and that's probably a good
> idea here. -Dmapred.map.tasks=20 perhaps. More mappers means more overhead
> of spinning up mappers to process less data, and Hadoop's guess indicates
> that it thinks it's not efficient to use 20 workers. If you know that those
> other 18 are otherwise idle, my guess is you'd benefit from just making it
> use 20.
...
How can I accomplish this when running something like the following from the command line?
Is it possible to force a higher number of map and reduce tasks in this
example? I've been running a few jobs like this with 'fpg', but I haven't
been able to find solid docs on how to increase the number of mappers and
reducers for these jobs. Currently this runs over about 8-9M rows of input
on our cluster, but it never uses more than 2 map and 2 reduce tasks per
job.
mahout fpg -i /user/<user>/stopword_filtered/search_terms.txt \
-o stopword_filtered/patterns \
-g 5000 \
-k 20 \
-method mapreduce \
-regex '[\ ]' \
-s 120
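
In case it helps clarify what I'm after, this is roughly what I was planning to
try next (untested; it assumes the fpg driver is launched through Hadoop's
ToolRunner, so that generic -D options given before the job arguments are
picked up by GenericOptionsParser):

mahout fpg -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20 \
-i /user/<user>/stopword_filtered/search_terms.txt \
-o stopword_filtered/patterns \
-g 5000 \
-k 20 \
-method mapreduce \
-regex '[\ ]' \
-s 120

My understanding is that mapred.reduce.tasks actually sets the reducer count,
while mapred.map.tasks is only a hint since the number of mappers is mostly
driven by the input splits, but I haven't been able to confirm whether fpg
honors these options at all, which is why I'm asking.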