If you need something for "one user" and "quickly", then you don't want to use Hadoop. :) It is extremely inefficient to use Hadoop this way, since running RecommenderJob recomputes every single item-item similarity only to produce one user's recommendations. At best, use Hadoop to precompute the similarities, then use them to recommend in real time.
This is really the right way, I think, to use Hadoop with recommendations. You use Hadoop to compute some underlying model offline and periodically. Then you load that model into a server that can quickly make any recommendation from it at run time. This is the architecture I'm building in the (Mahout-based) Myrrix recommender engine (myrrix.com).

Sean

On Thu, May 10, 2012 at 10:53 AM, Maksim Areshkau <[email protected]> wrote:
> On 10.05.12 12:31, "Sean Owen" <[email protected]> wrote:
>
>>Hadoop just about always makes things slower, in terms of total
>>resources needed. It adds a lot of overhead; such is the price of
>>parallelism. My rule of thumb is that Hadoop-based algorithms will,
>>all else equal, take 4x more CPU hours. But of course Hadoop lets you
>>distribute.
>>
>>However, I doubt that's the total explanation here. What did you do in
>>2 minutes? I can't believe even the Mahout non-distributed recommender
>>would build its model and make recs for *all* users in that time.
>>Really?
> Nope. I just need a recommendation for one user, but I need this
> recommendation quickly.
> As input for Hadoop I used two files: input.txt (the Wikipedia links
> database) and users.txt (a file with the ID of the user I need a
> recommendation for).
> So do I need a parameter to specify that I only need a recommendation
> for one user?
>> Remember that RecommenderJob is computing all recommendations
>>for all users. The non-distributed recommender doesn't do anything
>>like that until you ask it.
>>
>>On Thu, May 10, 2012 at 10:29 AM, <[email protected]> wrote:
>>> Hi, I am studying Mahout with the Mahout in Action book.
>>> I have downloaded the Wikipedia links database and tried to compute
>>> recommendations for it using Mahout and Hadoop.
>>> I used the following command:
>>> hadoop jar mahout-core-0.6-job.jar
>>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output
>>> --usersFile input/users.txt --booleanData true -s SIMILARITY_LOGLIKELIHOOD
>>>
>>> The command took about 4 hours to run on my MacBook Pro. On the
>>> same laptop, the recommendation without Hadoop took about
>>> 2 minutes. So why is Mahout+Hadoop so slow?
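To make the precompute-then-serve split described at the top of this reply concrete, here is a minimal single-machine Python sketch. It is not Mahout's API and not Myrrix's implementation; precompute_similarities and recommend are hypothetical names chosen for illustration. The offline function plays the role of the periodic batch (for example Hadoop) job: it counts item co-occurrences in boolean (user, item) data and turns them into log-likelihood similarities. The log-likelihood ratio follows Dunning's G^2 formula, and the 1 - 1/(1 + LLR) mapping is, as far as I know, what Mahout's LogLikelihoodSimilarity uses, but treat both as assumptions. The online function answers one user's request by summing precomputed similarities, which is a simplification of how Mahout's item-based recommender actually scores items.

```python
import math
from collections import defaultdict
from itertools import combinations


def _x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized entropy term used in Dunning's G^2 (log-likelihood) statistic.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: users with both items, k12: only item B, k21: only item A, k22: neither.
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def precompute_similarities(prefs):
    """OFFLINE step (hypothetical; stands in for the periodic batch job):
    item-item similarities from boolean (user_id, item_id) pairs.
    Returns (similarity table, items per user)."""
    users_by_item = defaultdict(set)
    items_by_user = defaultdict(set)
    for user, item in prefs:
        users_by_item[item].add(user)
        items_by_user[user].add(item)
    num_users = len(items_by_user)

    # Count co-occurrences: for each user, every pair of items they touched.
    cooccur = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            cooccur[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in cooccur.items():
        n_a, n_b = len(users_by_item[a]), len(users_by_item[b])
        llr = log_likelihood_ratio(both, n_b - both, n_a - both,
                                   num_users - n_a - n_b + both)
        sim = 1.0 - 1.0 / (1.0 + llr)  # map LLR into [0, 1)
        sims[a][b] = sims[b][a] = sim
    return dict(sims), dict(items_by_user)


def recommend(user, items_by_user, sims, how_many=3):
    """ONLINE step (hypothetical): score unseen items for ONE user by summing
    precomputed similarities to the items that user already has.
    No similarity is recomputed here."""
    seen = items_by_user.get(user, set())
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim
    # Highest score first; ties broken by item ID so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


if __name__ == "__main__":
    # Toy boolean data (made up): (user_id, item_id) pairs.
    prefs = [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12),
             (3, 11), (3, 12), (4, 13)]
    sims, items_by_user = precompute_similarities(prefs)  # run periodically
    print(recommend(1, items_by_user, sims))              # run per request
```

Run as a script, this prints [12]: item 12 is the only item user 1 hasn't seen that co-occurs with user 1's items. The expensive part, counting co-occurrences across every user, happens once in precompute_similarities, while recommend only reads the table rows for one user's items. That is the split between a periodic offline job and a fast run-time server, as opposed to rerunning the whole batch job to get one user's recommendations.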
