Hadoop just about always makes things slower, in terms of total resources needed. It adds a lot of overhead; such is the price of parallelism. My rule of thumb is that Hadoop-based algorithms will, all else equal, take 4x more CPU hours. But of course Hadoop lets you distribute that work across many machines.
However, I doubt that's the whole explanation here. What did you do in 2 minutes? I can't believe even the Mahout non-distributed recommender would build its model and make recs for *all* users in that time. Really? Remember that RecommenderJob computes recommendations for all users. The non-distributed recommender doesn't do anything like that until you ask it to.

On Thu, May 10, 2012 at 10:29 AM, <[email protected]> wrote:
> Hi, I am studying Mahout with the Mahout in Action book.
> I downloaded the Wikipedia links database and tried to run
> recommendations on it using Mahout and Hadoop.
> I used the following command:
>
> hadoop jar mahout-core-0.6-job.jar \
>   org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
>   -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
>   --usersFile input/users.txt --booleanData true -s SIMILARITY_LOGLIKELIHOOD
>
> The command took about 4 hours to execute on my MacBook Pro. On the
> same machine, the recommendation without Hadoop required about
> 2 minutes. So why is Mahout+Hadoop so slow?
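To make the "on demand vs. all users" point concrete, here is a toy sketch. It is not Mahout code: the class `LazyVsBatchRecs`, its methods, and the tiny data set are hypothetical, and raw co-occurrence counts stand in for log-likelihood similarity only to keep it short. It counts how many item-item similarity evaluations are needed to recommend for one user versus for every user, which is roughly the difference between calling a non-distributed recommender for a single user and running RecommenderJob over the whole input.

```java
import java.util.*;

// Toy illustration only (not Mahout): item-based recommendations over
// boolean user -> items data, scored by raw co-occurrence counts.
class LazyVsBatchRecs {

    // Hypothetical in-memory data: user ID -> set of item IDs the user has.
    private final Map<Long, Set<Long>> userItems;
    // Number of item-pair similarity evaluations performed so far.
    private long similarityCalls = 0;

    LazyVsBatchRecs(Map<Long, Set<Long>> userItems) {
        this.userItems = userItems;
    }

    // Co-occurrence similarity: how many users have both items.
    // A stand-in for log-likelihood, chosen only for brevity.
    private int similarity(long a, long b) {
        similarityCalls++;
        int count = 0;
        for (Set<Long> items : userItems.values()) {
            if (items.contains(a) && items.contains(b)) count++;
        }
        return count;
    }

    // On demand: work is done only for the one user who is asked about.
    List<Long> recommend(long userId, int howMany) {
        Set<Long> owned = userItems.getOrDefault(userId, Set.of());
        Set<Long> allItems = new TreeSet<>();
        for (Set<Long> items : userItems.values()) allItems.addAll(items);
        Map<Long, Integer> scores = new HashMap<>();
        for (long candidate : allItems) {
            if (owned.contains(candidate)) continue;
            int score = 0;
            for (long item : owned) score += similarity(candidate, item);
            if (score > 0) scores.put(candidate, score);
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int cmp = Integer.compare(scores.get(y), scores.get(x)); // higher score first
            return cmp != 0 ? cmp : Long.compare(x, y);              // tie-break by item ID
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    // Batch: recommendations for every user, as a whole-input job produces.
    Map<Long, List<Long>> recommendAll(int howMany) {
        Map<Long, List<Long>> all = new TreeMap<>();
        for (long userId : userItems.keySet()) all.put(userId, recommend(userId, howMany));
        return all;
    }

    long similarityCalls() {
        return similarityCalls;
    }

    public static void main(String[] args) {
        // Hypothetical toy data standing in for a real links data set.
        Map<Long, Set<Long>> data = Map.of(
                1L, Set.of(101L, 102L),
                2L, Set.of(101L, 103L),
                3L, Set.of(102L, 103L, 104L),
                4L, Set.of(101L, 104L));

        LazyVsBatchRecs lazy = new LazyVsBatchRecs(data);
        System.out.println("User 1 only: " + lazy.recommend(1L, 2)
                + ", similarity calls = " + lazy.similarityCalls());

        LazyVsBatchRecs batch = new LazyVsBatchRecs(data);
        System.out.println("All users: " + batch.recommendAll(2)
                + ", similarity calls = " + batch.similarityCalls());
    }
}
```

On this data, recommending for user 1 alone takes 4 similarity evaluations, while covering all four users takes 15. The gap grows with the number of users, so timing a single on-demand request against a job that produces recommendations for everyone is not a like-for-like comparison.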
