On Wed, May 16, 2012 at 3:00 AM, Chandra Mohan, Ananda Vel Murugan <[email protected]> wrote:
> * What is the difference between running a mahout job locally and
> in Hadoop?
Mostly the difference is whether the algorithm supports running on MapReduce or not. Usually it is one or the other (although MapReduce-based solutions can in some cases, not all, be run in Hadoop local mode, and technically that would still be "running in Hadoop").

>
> * I wrote a simple mahout job to do K-means clustering using my
> data. I packaged it as jar and tried running it. It worked and did the
> clustering in a Hadoop single node cluster. I am planning to move this
> job to a multi node cluster. Should I execute mahout command from job
> tracker node only? Or can I execute it from any node in cluster and be
> assured that it uses all the nodes in the cluster. How mahout works in a
> multi node cluster?
>

You can execute the command line (it's called the "driver" in Hadoop lingo) from any node that has network connectivity to the MapReduce cluster (i.e. you don't have to choose any particular node, or even be within the cluster), but you should do it only once: the driver merely submits the job, and the cluster itself distributes the tasks across its nodes.

-d
