I haven't used the CLI in ages but I believe there's an env variable like "MAHOUT_OPTS" where you can set flags like -Dmapred.map.tasks=20.
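Something like this might work (untested; I'm assuming bin/mahout still reads
MAHOUT_OPTS and passes it along to the JVM, and that mapred.reduce.tasks is
the analogous property for reducers):

  # ask Hadoop for 20 mappers and 20 reducers (a hint, not a hard guarantee)
  export MAHOUT_OPTS="-Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20"
  mahout fpg -i /user/<user>/stopword_filtered/search_terms.txt \
    -o stopword_filtered/patterns \
    -g 5000 \
    -k 20 \
    -method mapreduce \
    -regex '[\ ]' \
    -s 120

You may also be able to pass the -D flags directly on the mahout command line,
before the job's own options, since the drivers generally run through Hadoop's
ToolRunner, but I haven't verified that for fpg.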
On Tue, Jan 24, 2012 at 2:23 PM, Tim R. Havens <[email protected]> wrote:
> Sean Owen <srowen <at> gmail.com> writes:
> > ...snip...
> > You can of course force it to use more mappers, and that's probably a good
> > idea here. -Dmapred.map.tasks=20 perhaps. More mappers means more overhead
> > of spinning up mappers to process less data, and Hadoop's guess indicates
> > that it thinks it's not efficient to use 20 workers. If you know that those
> > other 18 are otherwise idle, my guess is you'd benefit from just making it
> > use 20.
> ...
>
> How can I accomplish this when doing something like this from command line?
>
> Is it possible to force the map tasks and reduce tasks to a higher number
> in this example? I've been running a few jobs like this with 'fpg' but
> I haven't been able to find solid doc's on how to increase the number of
> map/reducers for the jobs. Currently this will run on about 8-9M rows
> of input on our cluster, but it never uses more than 2 map 2 reduce per
> job.
>
> mahout fpg -i /user/<user>/stopword_filtered/search_terms.txt \
>   -o stopword_filtered/patterns \
>   -g 5000 \
>   -k 20 \
>   -method mapreduce \
>   -regex '[\ ]' \
>   -s 120
