Instead of pre-processing, you can put a CachingItemSimilarity on top of your ItemSimilarity. At least it will remember what it has already computed, and you don't have to pre-compute everything, most of which is wasted.
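To make the caching idea concrete, here is a minimal sketch of what such a wrapper does conceptually. It is not Mahout's code: `PairSimilarity`, `CachingPairSimilarity` and the toy similarity function are hypothetical names made up for illustration, and the sketch assumes the similarity is symmetric and is used from a single thread. In Mahout itself the wiring should be roughly `new CachingItemSimilarity(new LogLikelihoodSimilarity(model), model)`, but check the javadoc for your version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an item-item similarity; not Mahout's interface. */
interface PairSimilarity {
    double similarity(long itemA, long itemB);
}

/** Remembers each pair's similarity the first time it is requested. */
final class CachingPairSimilarity implements PairSimilarity {
    private final PairSimilarity delegate;
    // Unbounded and not thread-safe; a real cache would cap its size.
    private final Map<String, Double> cache = new HashMap<>();

    CachingPairSimilarity(PairSimilarity delegate) {
        this.delegate = delegate;
    }

    @Override
    public double similarity(long itemA, long itemB) {
        // Order the ids so (a, b) and (b, a) share one cache entry.
        // This assumes the underlying similarity is symmetric.
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        String key = lo + ":" + hi;
        return cache.computeIfAbsent(key, k -> delegate.similarity(lo, hi));
    }

    /** Number of distinct pairs computed so far. */
    int cachedPairs() {
        return cache.size();
    }
}

final class CachingSimilarityDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Toy similarity that counts how often it is really evaluated.
        PairSimilarity expensive = (a, b) -> {
            calls.incrementAndGet();
            return 1.0 / (1 + Math.abs(a - b));
        };
        CachingPairSimilarity cached = new CachingPairSimilarity(expensive);

        cached.similarity(1, 2);
        cached.similarity(2, 1); // served from the cache
        cached.similarity(1, 3);

        System.out.println("underlying computations: " + calls.get());
    }
}
```

Only the pairs that recommendations actually ask for get computed, which is the point of caching instead of pre-computing all pairs.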
You can also look at the different CandidateItemsStrategy implementations. You can use one to make the recommender consider fewer item-item pairs. But for MapReduce, you want to look at org.apache.mahout.cf.taste.hadoop.item. There's a job there that will compute all-pairs item-item similarity.

Sean

On Tue, Jul 12, 2011 at 4:32 PM, Abmar Barros <[email protected]> wrote:
> Hi all,
>
> I am new to Mahout and I am putting up a Recommender for buddycloud
> (http://buddycloud.com/) as a part of my GSoC project
> (https://github.com/buddycloud/channel-directory).
> In the testing snapshot, I got ~100k users, ~20k items and ~230k boolean
> taste preferences.
> At first I tried a UserBasedRecommender, with an all-in-memory DataModel
> (read from dump file, created a GenericDataModel). The recommendations
> performed great, almost real time. However, I thought this strategy
> wouldn't scale once the number of users and items increases, and then the
> service could run out of memory.
>
> Then I tried a PostgreSQLBooleanPrefJDBCDataModel, and, as expected, the
> performance dropped drastically. After reading the blog post at
> http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/,
> I decided to try an ItemBasedRecommender, using a preprocessed
> ItemSimilarity table. I am trying to not use MapReduce at first, thus I
> tried to compute the LogLikelihood similarity for every pair of items. This
> took too long, and then I gave up.
>
> Finally, my questions are: Am I doing things right? What is the best way to
> compute item similarity offline without MapReduce?
>
> Thanks in advance!
> Abmar
>
> --
> Abmar Barros
> MSc candidate on Computer Science at Federal University of Campina Grande -
> www.ufcg.edu.br
> OurGrid Team Member - www.ourgrid.org
> Paraíba - Brazil
