Re: ItemSimilarity pre-processing

Sean Owen Tue, 12 Jul 2011 09:23:14 -0700

Instead of pre-processing, you can put a CachingItemSimilarity on top of
your ItemSimilarity. It will at least remember what it has already computed,
and you don't have to pre-compute every pair, most of which would never be
used.
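
A minimal plain-Java sketch of the caching idea, assuming a symmetric
similarity measure. The ItemSim interface and CachingSim class are
hypothetical stand-ins, not Mahout's ItemSimilarity or CachingItemSimilarity
API. Each pair is computed on first request and then served from a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an item-item similarity measure
// (not Mahout's ItemSimilarity interface).
interface ItemSim {
  double similarity(long itemA, long itemB);
}

// Memoizing wrapper: each unordered pair is computed at most once, on demand.
final class CachingSim implements ItemSim {
  private final ItemSim delegate;
  private final Map<String, Double> cache = new HashMap<>();

  CachingSim(ItemSim delegate) {
    this.delegate = delegate;
  }

  @Override
  public double similarity(long itemA, long itemB) {
    // Normalize the order so (a, b) and (b, a) share one entry;
    // this assumes the wrapped measure is symmetric.
    final long lo = Math.min(itemA, itemB);
    final long hi = Math.max(itemA, itemB);
    return cache.computeIfAbsent(lo + ":" + hi, key -> delegate.similarity(lo, hi));
  }

  // Number of distinct pairs computed so far.
  int cachedPairs() {
    return cache.size();
  }
}
```

This sketch's cache is unbounded; a long-running service with tens of
thousands of items would want to cap its size.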


You can also look at the different CandidateItemsStrategy implementations.
You can use one to make the recommender consider fewer item-item pairs.
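
To illustrate what restricting candidates buys you, here is a plain-Java
sketch of one possible rule (an assumption, not Mahout's code): only items
preferred by users who share at least one item with the target user get
scored, instead of every item in the catalogue. CandidateItems and its
methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating one candidate-selection rule.
final class CandidateItems {
  // Inverts user -> items into item -> users.
  static Map<Long, Set<Long>> usersByItem(Map<Long, Set<Long>> itemsByUser) {
    Map<Long, Set<Long>> index = new HashMap<>();
    for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
      for (Long item : entry.getValue()) {
        index.computeIfAbsent(item, k -> new HashSet<>()).add(entry.getKey());
      }
    }
    return index;
  }

  // Items preferred by any user who shares at least one item with userID,
  // minus the items userID already has. Only these need to be scored.
  static Set<Long> candidates(long userID,
                              Map<Long, Set<Long>> itemsByUser,
                              Map<Long, Set<Long>> usersByItem) {
    Set<Long> own = itemsByUser.getOrDefault(userID, Collections.emptySet());
    Set<Long> result = new HashSet<>();
    for (Long item : own) {
      for (Long other : usersByItem.getOrDefault(item, Collections.emptySet())) {
        result.addAll(itemsByUser.get(other));
      }
    }
    result.removeAll(own);
    return result;
  }
}
```

For example, if user 1 has items {10, 11} and user 2 has {11, 12}, only item
12 is a candidate for user 1; items no overlapping user touched are never
compared.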

But for MapReduce, you want to look at
org.apache.mahout.cf.taste.hadoop.item. There's a job there that will
compute all-pairs item-item similarity.
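
A rough single-machine sketch of why "all pairs" need not mean every pair.
For the ~20k items in the question, a naive loop visits
20,000 x 19,999 / 2 = 199,990,000 unordered pairs. Walking each user's
history and counting only pairs that actually co-occur avoids that. This
illustrates the general co-occurrence idea under the assumption of boolean
preferences held in memory; it is not the Hadoop job's code, and
CooccurrenceCounter and the "a:b" string key are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts users who prefer both items of each pair.
final class CooccurrenceCounter {
  // Key is "a:b" with a < b. Pairs that never co-occur are never touched,
  // so the work depends on per-user history sizes, not on items squared.
  static Map<String, Integer> countPairs(Map<Long, Set<Long>> itemsByUser) {
    Map<String, Integer> counts = new HashMap<>();
    for (Set<Long> items : itemsByUser.values()) {
      List<Long> sorted = new ArrayList<>(items);
      Collections.sort(sorted);
      for (int i = 0; i < sorted.size(); i++) {
        for (int j = i + 1; j < sorted.size(); j++) {
          counts.merge(sorted.get(i) + ":" + sorted.get(j), 1, Integer::sum);
        }
      }
    }
    return counts;
  }
}
```

With ~230k preferences over ~100k users (about 2.3 per user on average), the
per-user loops are usually short, though a few very active users can still
dominate the cost. The counts are the "users who prefer both items" figures
that a log-likelihood similarity takes as input.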

Sean

On Tue, Jul 12, 2011 at 4:32 PM, Abmar Barros wrote:

> Hi all,
>
> I am new to Mahout, and I am building a recommender for buddycloud
> (http://buddycloud.com/) as part of my GSoC project
> (https://github.com/buddycloud/channel-directory).
> In the testing snapshot, I have ~100k users, ~20k items and ~230k boolean
> Taste preferences.
> At first I tried a UserBasedRecommender with an all-in-memory DataModel
> (read from a dump file into a GenericDataModel). The recommendations
> performed great, almost in real time. However, I thought this strategy
> wouldn't scale: as the number of users and items grows, the service could
> run out of memory.
>
> Then I tried a PostgreSQLBooleanPrefJDBCDataModel and, as expected,
> performance dropped drastically. After reading the blog post at
> http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/
> I decided to try an ItemBasedRecommender backed by a precomputed
> ItemSimilarity table. I am trying not to use MapReduce at first, so I
> tried to compute the LogLikelihood similarity for every pair of items.
> This took too long, and I gave up.
>
> Finally, my questions are: Am I doing things right? What is the best way to
> compute item similarity offline without MapReduce?
>
> Thanks in advance!
> Abmar
>
> --
> Abmar Barros
> MSc candidate in Computer Science at the Federal University of Campina Grande -
> www.ufcg.edu.br
> OurGrid Team Member - www.ourgrid.org
> Paraíba - Brazil
>
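
On computing the log-likelihood similarity offline: as far as I know,
Mahout's LogLikelihoodSimilarity is built on Dunning's log-likelihood ratio
over a 2x2 table of user counts. The plain-Java sketch below computes that
statistic for boolean preferences and maps it into [0, 1) with
1 - 1/(1 + LLR). The class and method names are hypothetical, and the
mapping is an assumption rather than a copy of Mahout's source.

```java
// Hypothetical sketch of a log-likelihood similarity for boolean prefs.
final class LogLikelihoodSketch {
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized entropy: N*log(N) - sum(k*log(k)), N = sum of the counts.
  private static double entropy(long... counts) {
    long sum = 0;
    double parts = 0.0;
    for (long k : counts) {
      sum += k;
      parts += xLogX(k);
    }
    return xLogX(sum) - parts;
  }

  // Dunning's G^2 statistic for a 2x2 contingency table.
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    // Clamp tiny negative values caused by floating-point rounding.
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }

  // Similarity of items A and B from user counts.
  static double similarity(long usersWithBoth, long usersWithA,
                           long usersWithB, long numUsers) {
    long k11 = usersWithBoth;                                       // both
    long k12 = usersWithB - usersWithBoth;                          // B only
    long k21 = usersWithA - usersWithBoth;                          // A only
    long k22 = numUsers - usersWithA - usersWithB + usersWithBoth;  // neither
    return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
  }
}
```

Each pair costs only a few logarithms once the counts are known, so the
expensive part of an all-pairs run is gathering usersWithBoth. One common
shortcut is to score only pairs where that count is greater than zero.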
