Hadoop just about always makes things slower, in terms of total
resources needed. It adds a lot of overhead; such is the price of
parallelism. My rule of thumb is that Hadoop-based algorithms will,
all else equal, take about 4x the CPU hours. But of course Hadoop lets
you distribute that work across many machines.
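
To put rough numbers on that rule of thumb, here is a minimal sketch. It assumes a fixed 4x overhead and perfectly linear scaling across equal nodes, which real clusters won't give you; the class and method names are made up for illustration and are not part of Hadoop or Mahout.

```java
public class HadoopOverhead {
    // Estimated wall-clock hours for a job that needs `localCpuHours` on one
    // machine, when run on `nodes` equal workers with a fixed overhead
    // multiplier. Assumes ideal linear scaling (an optimistic simplification).
    static double wallClockHours(double localCpuHours, double overheadFactor, int nodes) {
        if (nodes < 1) {
            throw new IllegalArgumentException("nodes must be >= 1");
        }
        return localCpuHours * overheadFactor / nodes;
    }

    public static void main(String[] args) {
        double local = 1.0;     // hypothetical: 1 CPU hour when non-distributed
        double overhead = 4.0;  // the 4x rule of thumb
        for (int nodes : new int[] {1, 4, 16}) {
            System.out.printf("%d node(s): %.2f hours%n",
                    nodes, wallClockHours(local, overhead, nodes));
        }
    }
}
```

Under these assumptions a single machine pays the full 4x penalty, and you only break even with the non-distributed run at about 4 nodes.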

However, I doubt that's the whole explanation here. What exactly did
you run in 2 minutes? I can't believe even the Mahout non-distributed
recommender would build its model and make recommendations for *all*
users in that time. Really? Remember that RecommenderJob is computing
all recommendations for all users. The non-distributed recommender
doesn't do anything like that until you ask it.
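
The difference in work can be sketched with a toy example. This is not the Mahout API; `recommendFor` is a hypothetical stand-in for the expensive per-user step, and the counter just shows how often it runs in each style.

```java
import java.util.ArrayList;
import java.util.List;

public class OnDemandVsBatch {
    // Counts how many times the expensive per-user step has run.
    static int scoringCalls = 0;

    // Hypothetical stand-in for computing one user's recommendations.
    static List<Long> recommendFor(long userId) {
        scoringCalls++;
        List<Long> items = new ArrayList<>();
        items.add(userId * 10); // placeholder item ID, not a real recommendation
        return items;
    }

    // Batch style: compute recommendations for every user up front.
    static void recommendForAll(long numUsers) {
        for (long userId = 1; userId <= numUsers; userId++) {
            recommendFor(userId);
        }
    }

    public static void main(String[] args) {
        recommendFor(42); // on-demand style: only the user you ask about
        System.out.println("on demand: " + scoringCalls + " call(s)");

        scoringCalls = 0;
        recommendForAll(1000); // batch style: all users, asked for or not
        System.out.println("batch: " + scoringCalls + " call(s)");
    }
}
```

If the 2-minute run only asked for a handful of users, it did a small fraction of the work the batch job does, so the two timings aren't measuring the same thing.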

On Thu, May 10, 2012 at 10:29 AM,  <[email protected]> wrote:
> Hi, I am studying Mahout with the Mahout in Action book.
> I have downloaded the Wikipedia links database and tried to run
> recommendations on it using Mahout and Hadoop.
> I used the following command:
> hadoop jar mahout-core-0.6-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
> input/users.txt --booleanData true -s SIMILARITY_LOGLIKELIHOOD
>
> The command took about 4 hours to execute on my MacBook Pro. On the
> same machine, the recommendation without Hadoop required about
> 2 minutes. So why is Mahout + Hadoop so slow?
