Hi Stefano,
AFAIK the chapter about distributed recommenders in Mahout in Action has
not yet been updated to the latest version of RecommenderJob, so maybe
that's the source of your confusion.
I'll try to give a brief explanation of the similarity computation. Feel
free to ask more questions if anything remains unclear.
RecommenderJob starts ItemSimilarityJob which creates an item x user
matrix from the preference data and uses RowSimilarityJob to compute the
pairwise similarities of the rows of this matrix (the items). So the
best place to start is to look at RowSimilarityJob.
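To illustrate only the data layout, here is a minimal sketch in plain Java
of turning preference triples into an item x user matrix. The class and
method names are made up for this example and are not part of Mahout; the
real job does this as a distributed MapReduce step, not in memory.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT Mahout code: it only illustrates the matrix layout.
class ItemUserMatrixSketch {

  /** One preference: a user expressed a value for an item. */
  static final class Preference {
    final long userId;
    final long itemId;
    final double value;

    Preference(long userId, long itemId, double value) {
      this.userId = userId;
      this.itemId = itemId;
      this.value = value;
    }
  }

  /**
   * Builds a sparse item x user matrix: one row per item, and within each
   * row one entry per user who has a preference for that item.
   */
  static Map<Long, Map<Long, Double>> toItemRows(List<Preference> prefs) {
    Map<Long, Map<Long, Double>> rows = new TreeMap<>();
    for (Preference p : prefs) {
      rows.computeIfAbsent(p.itemId, id -> new TreeMap<>())
          .put(p.userId, p.value);
    }
    return rows;
  }
}
```

Each row of the result is one item vector, and pairs of these rows are what
the similarity computation then compares.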
RowSimilarityJob uses an implementation of DistributedVectorSimilarity
to compute the similarities in two phases. In the first phase, each
item vector is shown to the similarity implementation, which can compute
a "weight" for it. In the second phase, for all pairs of rows that have
at least one cooccurrence, the method similarity(...) is called with the
previously computed weights and a list of all cooccurring values. This
generic approach allows us to use different implementations of
DistributedVectorSimilarity so we can support a wide range of similarity
functions.
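To make the two phases concrete, here is a minimal single-machine sketch in
plain Java. The interface below is a simplified stand-in that I made up for
illustration; its method shapes are assumptions and not the actual
signatures of DistributedVectorSimilarity. It shows cosine similarity and a
plain cooccurrence count as two implementations of the same contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified contract (NOT Mahout's real interface).
interface TwoPhaseSimilarity {
  /** Phase 1: sees a single row and may compute a weight for it. */
  double weight(Map<Long, Double> row);

  /** Phase 2: sees the cooccurring value pairs plus both row weights. */
  double similarity(List<double[]> cooccurrences, double weightA, double weightB);
}

// Cosine: the weight is the row's Euclidean norm, phase 2 sums the products.
class CosineSketch implements TwoPhaseSimilarity {
  public double weight(Map<Long, Double> row) {
    double sumOfSquares = 0.0;
    for (double v : row.values()) {
      sumOfSquares += v * v;
    }
    return Math.sqrt(sumOfSquares);
  }

  public double similarity(List<double[]> cooccurrences, double weightA, double weightB) {
    double dot = 0.0;
    for (double[] pair : cooccurrences) {
      dot += pair[0] * pair[1];
    }
    return dot / (weightA * weightB);
  }
}

// Cooccurrence count: needs no weight, phase 2 just counts the shared columns.
class CooccurrenceCountSketch implements TwoPhaseSimilarity {
  public double weight(Map<Long, Double> row) {
    return Double.NaN; // unused by this measure
  }

  public double similarity(List<double[]> cooccurrences, double weightA, double weightB) {
    return cooccurrences.size();
  }
}

// In-memory driver for one pair of rows; the real job distributes this work.
class TwoPhaseDriverSketch {
  static double compare(TwoPhaseSimilarity sim,
                        Map<Long, Double> rowA, Map<Long, Double> rowB) {
    // Phase 1: one weight per row, computed without looking at the other row.
    double weightA = sim.weight(rowA);
    double weightB = sim.weight(rowB);

    // Collect the value pairs for columns (users) present in both rows.
    List<double[]> cooccurrences = new ArrayList<>();
    for (Map.Entry<Long, Double> e : rowA.entrySet()) {
      Double other = rowB.get(e.getKey());
      if (other != null) {
        cooccurrences.add(new double[] {e.getValue(), other});
      }
    }
    if (cooccurrences.isEmpty()) {
      return Double.NaN; // pairs without any cooccurrence are never compared
    }

    // Phase 2: combine the weights with the cooccurring values.
    return sim.similarity(cooccurrences, weightA, weightB);
  }
}
```

In this sketch "Co-occurrence" is just one more similarity measure that
plugs into the same two-phase scheme, one that ignores the weights and only
counts how many users two items share.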
A simplified version of this algorithm is also explained in the slides
of a talk I gave at the Hadoop Get Together; maybe that's helpful too:
http://www.slideshare.net/sscdotopen/mahoutcf
--sebastian
On 18.01.2011 11:12, Stefano Bellasio wrote:
Hi guys, I'm trying to understand how RecommenderJob works. Until now I thought it was
necessary to choose a particular similarity class, like Euclidean Distance and so on, so that my algorithm
could compute all similarities for each pair of items and produce recommendations. After reading
"Distributing a Recommender" in Mahout in Action, I now have some doubts about the relationship
between similarities like Euclidean, LogLike, Cosine and the co-occurrence matrix. As I see in
RecommenderJob, I can also specify "Co-occurrence" as a similarity class, so what's the
correct way to compute similarities, and how does this work with the other similarity classes and
the co-occurrence matrix/similarity? Thank you very much for your further explanations :)