I tend to let the cluster decide these things based on the input size and splits. But yes, if you're not getting enough CPU utilization, you can try running more mappers. If you're I/O bound, it won't necessarily help; if you're not, it should increase throughput.
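
To make "input size and splits" concrete, here is a rough sketch of how the old-API FileInputFormat turns input size, block size and the mapred.map.tasks hint into a number of map tasks. It models a single splittable file only. The function name, the 64 MB block size and the 1100 MB file size are assumptions for illustration, not values taken from your cluster.

```python
def estimate_map_tasks(total_bytes, block_bytes, requested_maps=1, min_split_bytes=1):
    """Approximate the number of splits (map tasks) for one splittable file.

    Hypothetical helper that mirrors the old-API rule:
    split size = max(min size, min(total / requested maps, block size)).
    """
    goal_size = total_bytes // max(requested_maps, 1)
    split_size = max(min_split_bytes, min(goal_size, block_bytes))

    splits = 0
    remaining = total_bytes
    # Hadoop lets the final split run up to 10% over the split size.
    while remaining / split_size > 1.1:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024
input_size = 1100 * MB   # assumed size, roughly the 1.1GB dataset
block_size = 64 * MB     # assumed HDFS block size

default_maps = estimate_map_tasks(input_size, block_size)
small_hint = estimate_map_tasks(input_size, block_size, requested_maps=4)
large_hint = estimate_map_tasks(input_size, block_size, requested_maps=40)
print(default_maps, small_hint, large_hint)
```

Under these assumptions the value is only a hint: asking for fewer maps than the block-based split count changes nothing, while asking for more shrinks the splits and raises the number of map tasks.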

On Fri, Nov 26, 2010 at 10:45 PM, rmx <[email protected]> wrote:
>
> Hi,
>
> I read on Mahout in Action that I should set -Dmapred.map.tasks=X where X
> would be the number of cores of my cluster.
> I have been running experiments on amazon EC2 m.large instances.
> I have been using kmeans over 1.1GB dataset.
> I never set up that flag. I noticed that on a 10 machine cluster the maximum
> cpu usage is 60%.
>
> Am I proceeding right? Shall I setup the flag? how?
>
> thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/is-it-necessary-set-mapred-map-tasks-running-mahout-on-a-cluster-tp1975103p1975103.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
