Sean Owen <srowen <at> gmail.com> writes:
...snip...
> You can of course force it to use more mappers, and that's probably a good
> idea here. -Dmapred.map.tasks=20 perhaps. More mappers means more overhead
> of spinning up mappers to process less data, and Hadoop's guess indicates
> that it thinks it's not efficient to use 20 workers. If you know that those
> other 18 are otherwise idle, my guess is you'd benefit from just making it
> use 20.
...
How can I accomplish this when running something like the following from the command line?
Is it possible to force a higher number of map and reduce tasks in this
example? I've been running a few jobs like this with 'fpg', but I haven't
been able to find solid docs on how to increase the number of mappers and
reducers for these jobs. Currently this runs over about 8-9M rows of input
on our cluster, but it never uses more than 2 map and 2 reduce tasks per
job.
mahout fpg -i /user/<user>/stopword_filtered/search_terms.txt \
-o stopword_filtered/patterns \
-g 5000 \
-k 20 \
-method mapreduce \
-regex '[\ ]' \
-s 120
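
In case it helps clarify what I'm after, this is roughly what I was planning to
try next (untested; it assumes the fpg driver is launched through Hadoop's
ToolRunner, so that generic -D options given before the job arguments are
picked up by GenericOptionsParser):

mahout fpg -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20 \
-i /user/<user>/stopword_filtered/search_terms.txt \
-o stopword_filtered/patterns \
-g 5000 \
-k 20 \
-method mapreduce \
-regex '[\ ]' \
-s 120

My understanding is that mapred.reduce.tasks actually sets the reducer count,
while mapred.map.tasks is only a hint since the number of mappers is mostly
driven by the input splits, but I haven't been able to confirm whether fpg
honors these options at all, which is why I'm asking.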