Re: Understanding similaraties computation in RecommenderJob

Sebastian Schelter Tue, 18 Jan 2011 02:35:42 -0800

Hi Stefano,

AFAIK the chapter about distributed recommenders in Mahout in Action has not yet been updated to the latest version of RecommenderJob; maybe that's the source of your confusion.

I'll try to give a brief explanation of the similarity computation; feel free to ask more questions if anything remains unclear.

RecommenderJob starts ItemSimilarityJob, which creates an item x user matrix from the preference data and uses RowSimilarityJob to compute the pairwise similarities of the rows of this matrix (the items). So the best place to start is RowSimilarityJob.
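To make the item x user matrix concrete, here is a minimal sketch in plain Python. It is not Mahout code: the function name item_user_matrix and the (user, item, value) triple layout are assumptions chosen for illustration, and the real job does this as a distributed step over the preference data.

```python
from collections import defaultdict


def item_user_matrix(preferences):
    """Build a sparse item x user matrix from (user, item, value) triples.

    Each row is keyed by item and maps user -> preference value.
    Hypothetical helper for illustration, not part of Mahout.
    """
    rows = defaultdict(dict)
    for user, item, value in preferences:
        rows[item][user] = value
    return dict(rows)


# Toy preference data (made up for this example).
prefs = [
    ("u1", "A", 5.0),
    ("u1", "B", 3.0),
    ("u2", "A", 4.0),
    ("u2", "C", 1.0),
]
matrix = item_user_matrix(prefs)
```

Each row of matrix is one item vector, and these rows are what the pairwise similarity computation compares.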

RowSimilarityJob uses an implementation of DistributedVectorSimilarity to compute the similarities in two phases. In the first phase, each item vector is shown to the similarity implementation, which can compute a "weight" for it. In the second phase, for every pair of rows that has at least one cooccurrence, the method similarity(...) is called with the previously computed weights and a list of all cooccurring values. This generic approach allows us to plug in different implementations of DistributedVectorSimilarity, so we can support a wide range of similarity functions.
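The two phases can be sketched in plain Python as follows. This is only an in-memory illustration under stated assumptions, not the actual Mahout implementation or its Java interface: the class names CosineSketch and CooccurrenceCountSketch, the method name weight, and the argument order of similarity are all made up here (only the existence of a similarity(...) method comes from the description above).

```python
import math
from collections import defaultdict
from itertools import combinations


class CosineSketch:
    """Hypothetical cosine measure: weight = vector length."""

    def weight(self, row):
        return math.sqrt(sum(v * v for v in row.values()))

    def similarity(self, cooccurrences, weight_a, weight_b):
        dot = sum(a * b for a, b in cooccurrences)
        return dot / (weight_a * weight_b)


class CooccurrenceCountSketch:
    """Hypothetical cooccurrence measure: ignores weights, counts overlaps."""

    def weight(self, row):
        return 0.0  # not needed by this measure

    def similarity(self, cooccurrences, weight_a, weight_b):
        return len(cooccurrences)


def row_similarities(rows, measure):
    """rows maps item -> {user: value}; returns {(item_a, item_b): similarity}."""
    # Phase 1: show every row to the measure and remember its weight.
    weights = {item: measure.weight(row) for item, row in rows.items()}

    # Regroup by column (user) to find the rows that cooccur.
    columns = defaultdict(list)
    for item, row in rows.items():
        for user, value in row.items():
            columns[user].append((item, value))

    cooccurrences = defaultdict(list)
    for entries in columns.values():
        for (item_a, val_a), (item_b, val_b) in combinations(sorted(entries), 2):
            cooccurrences[(item_a, item_b)].append((val_a, val_b))

    # Phase 2: only pairs with at least one cooccurrence are compared.
    return {
        (a, b): measure.similarity(values, weights[a], weights[b])
        for (a, b), values in cooccurrences.items()
    }


# Toy item x user matrix (made up for this example).
rows = {"A": {"u1": 1.0, "u2": 1.0}, "B": {"u1": 1.0}, "C": {"u3": 2.0}}
cosine = row_similarities(rows, CosineSketch())
counts = row_similarities(rows, CooccurrenceCountSketch())
```

In this toy data only A and B share a user, so they are the only pair that reaches the second phase; C is never compared with anything. Swapping the measure object changes the similarity function without touching the pairing logic, which is the point of the generic design.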

A simplified version of this algorithm is also explained in the slides of a talk I gave at the Hadoop Get Together; maybe that's helpful too: http://www.slideshare.net/sscdotopen/mahoutcf


--sebastian



On 18.01.2011 11:12, Stefano Bellasio wrote:

Hi guys, I'm trying to understand how RecommenderJob works. Until now I thought it was
necessary to choose a particular similarity class, such as Euclidean distance, so that my algorithm
could compute the similarities for each pair of items and produce recommendations. After reading
"Distributing a Recommender" in Mahout in Action, I now have some doubts about the relationship
between similarities like Euclidean, LogLike, Cosine and the co-occurrence matrix. As I see in
RecommenderJob, I can also specify "Co-occurrence" as a similarity class, so what's the
correct way to compute similarities, and how does this work with the other similarity classes and
the co-occurrence matrix/similarity? Thank you very much for your further explanations :)
