On 10.05.12 12:31, "Sean Owen" <[email protected]> wrote:
>Hadoop just about always makes things slower, in terms of total
>resources needed. It adds a lot of overhead; such is the price of
>parallelism. My rule of thumb is that Hadoop-based algorithms will,
>all else equal, take 4x more CPU hours. But of course Hadoop lets you
>distribute.
>
>However, I doubt that's the total explanation here. What did you do in
>2 minutes? I can't believe even the Mahout non-distributed recommender
>would build its model and make recs for *all* users in that time.
>Really?

Nope. I just need a recommendation for one user, but I need it quickly.
As input for Hadoop I used two files: input.txt (the Wikipedia links
database) and users.txt (a file with the IDs of the users for whom I need
recommendations). So do I need a parameter to specify that I want a
recommendation for only one user?

>Remember that RecommenderJob is computing all recommendations
>for all users. The non-distributed recommender doesn't do anything
>like that until you ask it.
>
>On Thu, May 10, 2012 at 10:29 AM, <[email protected]> wrote:
>> Hi, I am studying Mahout using the Mahout in Action book.
>> I have downloaded the Wikipedia links database and tried to run a
>> recommendation on it using Mahout and Hadoop.
>> I used the following command:
>> hadoop jar mahout-core-0.6-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output
>> --usersFile input/users.txt --booleanData true
>> -s SIMILARITY_LOGLIKELIHOOD
>>
>> The command took about 4 hours to run on my MacBook Pro. On the same
>> MacBook, the recommendation without Hadoop took about 2 minutes.
>> So why is Mahout + Hadoop so slow?
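
To make the quoted point concrete (an in-memory recommender does no work for a user until it is asked), here is a toy sketch in plain Java. It is not Mahout's implementation: the class OneUserRecommender and its methods are hypothetical names made up for illustration, and it scores items with simple co-occurrence counts on boolean data rather than log-likelihood similarity.

```java
import java.util.*;

// Toy in-memory recommender for boolean (linked / not linked) data.
// Hypothetical illustration only; this is NOT Mahout's code or API.
class OneUserRecommender {
    // userId -> set of item IDs that user is associated with
    private final Map<Long, Set<Long>> itemsByUser = new HashMap<>();

    void addPreference(long userId, long itemId) {
        itemsByUser.computeIfAbsent(userId, k -> new HashSet<>()).add(itemId);
    }

    // Scores candidate items for ONE user, only when called.
    // An unseen item's score is the sum of its co-occurrence counts
    // with the items the user already has.
    List<Long> recommend(long userId, int howMany) {
        Set<Long> seen = itemsByUser.getOrDefault(userId, Collections.emptySet());
        Map<Long, Integer> scores = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
            if (e.getKey() == userId) {
                continue; // skip the target user
            }
            Set<Long> other = e.getValue();
            int overlap = 0;
            for (Long item : seen) {
                if (other.contains(item)) {
                    overlap++;
                }
            }
            if (overlap == 0) {
                continue; // this user shares nothing with the target user
            }
            for (Long item : other) {
                if (!seen.contains(item)) {
                    scores.merge(item, overlap, Integer::sum);
                }
            }
        }
        // Highest score first; ties broken by smaller item ID.
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        OneUserRecommender r = new OneUserRecommender();
        r.addPreference(1L, 10L);
        r.addPreference(1L, 11L);
        r.addPreference(2L, 10L);
        r.addPreference(2L, 11L);
        r.addPreference(2L, 12L);
        r.addPreference(3L, 10L);
        r.addPreference(3L, 13L);
        // Work is done for user 1 only; no other user's list is computed.
        System.out.println(r.recommend(1L, 2));
    }
}
```

The cost of one recommend() call here is a single pass over the data for one user, whereas a batch job in the style of RecommenderJob, as described in the quoted reply, computes recommendations for every user it is given.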
