Hi Stefano,

AFAIK the chapter about distributed recommenders in Mahout in Action has not yet been updated to the latest version of RecommenderJob; maybe that's the source of your confusion.

I'll try to give a brief explanation of the similarity computation. Feel free to ask more questions if anything remains unclear.

RecommenderJob starts ItemSimilarityJob, which creates an item x user matrix from the preference data and uses RowSimilarityJob to compute the pairwise similarities of the rows of this matrix (the items). So the best place to start is RowSimilarityJob.
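To make the item x user matrix concrete, here is a minimal sketch in plain Python (not Mahout code). The preference triples and the helper name build_item_user_matrix are made up for illustration; the real job does this with MapReduce over the preference files.

```python
from collections import defaultdict

# Hypothetical preference data: (user_id, item_id, preference_value).
preferences = [
    (1, "A", 5.0),
    (1, "B", 3.0),
    (2, "A", 4.0),
    (2, "C", 2.0),
    (3, "B", 1.0),
    (3, "C", 5.0),
]


def build_item_user_matrix(prefs):
    """Return {item_id: {user_id: value}}, i.e. one sparse row per item."""
    rows = defaultdict(dict)
    for user_id, item_id, value in prefs:
        rows[item_id][user_id] = value
    return dict(rows)


item_rows = build_item_user_matrix(preferences)

for item_id in sorted(item_rows):
    print(item_id, item_rows[item_id])
```

Each row holds the preferences of all users for one item, so comparing two rows means comparing two items.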

RowSimilarityJob uses an implementation of DistributedVectorSimilarity to compute the similarities in two phases. In the first phase, each item vector is shown to the similarity implementation, which can compute a "weight" for it. In the second phase, for every pair of rows that has at least one cooccurrence, the method similarity(...) is called with the previously computed weights and a list of all cooccurring values. This generic approach lets us plug in different implementations of DistributedVectorSimilarity, so we can support a wide range of similarity functions.
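The two phases can be sketched roughly as follows. This is plain Python on a single machine, not the Mahout API: the class names, method signatures and sample rows are assumptions chosen for illustration, and the real job finds cooccurring pairs with MapReduce instead of looping over all pairs of rows. The point is only that cosine and plain cooccurrence counting are two interchangeable measures behind the same two-phase interface.

```python
import math
from itertools import combinations

# Hypothetical item rows: {item_id: {user_id: preference_value}}.
rows = {
    "A": {1: 5.0, 2: 4.0},
    "B": {1: 3.0, 3: 1.0},
    "C": {2: 2.0, 3: 5.0},
    "D": {4: 1.0},  # shares no user with any other item
}


class CosineSimilarity:
    """Illustrative measure: weight = vector length, similarity = cosine."""

    def weight(self, row):
        return math.sqrt(sum(v * v for v in row.values()))

    def similarity(self, cooccurrences, weight_a, weight_b):
        dot = sum(a * b for a, b in cooccurrences)
        return dot / (weight_a * weight_b)


class CooccurrenceSimilarity:
    """Illustrative measure: similarity = number of cooccurring users."""

    def weight(self, row):
        return 0.0  # this measure does not need a per-row weight

    def similarity(self, cooccurrences, weight_a, weight_b):
        return float(len(cooccurrences))


def row_similarities(rows, measure):
    # Phase 1: show each row to the measure once and keep its weight.
    weights = {key: measure.weight(row) for key, row in rows.items()}

    # Phase 2: for each pair with at least one cooccurrence, pass the
    # weights and the cooccurring values to the measure.
    result = {}
    for a, b in combinations(sorted(rows), 2):
        shared = rows[a].keys() & rows[b].keys()
        if not shared:
            continue
        cooccurrences = [(rows[a][u], rows[b][u]) for u in sorted(shared)]
        result[(a, b)] = measure.similarity(cooccurrences, weights[a], weights[b])
    return result


cosine = row_similarities(rows, CosineSimilarity())
cooccurrence = row_similarities(rows, CooccurrenceSimilarity())
```

Swapping the measure object changes the similarity function without touching the pairing logic, which is the idea behind choosing a similarity class for the job.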

A simplified version of this algorithm is also explained in the slides of a talk I gave at the Hadoop Get Together; maybe that's helpful too: http://www.slideshare.net/sscdotopen/mahoutcf

--sebastian



On 18.01.2011 11:12, Stefano Bellasio wrote:
Hi guys, I'm trying to understand how RecommenderJob works. Until now I thought it was
necessary to choose a particular similarity class, like Euclidean Distance and so on, so that my
algorithm could compute all similarities for each pair of items and produce recommendations. After
reading "Distributing a Recommender" in Mahout in Action, I now have some doubts about the
relationship between similarities like Euclidean, LogLike and Cosine and the co-occurrence matrix.
As I see it, in RecommenderJob I can also specify "Co-occurrence" as a similarity class, so what is
the correct way to compute similarities, and how does this work with the other similarity classes
and the co-occurrence matrix/similarity? Thank you very much for your further explanations :)
