Getting this running with co-occurrence rather than using a similarity 
calculation on user rows finally forced me to understand what is going on in 
the base recommender. And the answer implies further work.

[B'B] is usually not calculated in the standard item-based recommender. 
Instead, the matrix that comes out of RowSimilarityJob, run on the purchases 
input matrix (rows = user), is used. This can be a co-occurrence matrix, but in 
my case it is a log-likelihood similarity matrix (substitute your favorite 
similarity measure). 
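
For reference, here is a minimal sketch of the plain co-occurrence products 
[B'B] and [B'A], assuming small dense 0/1 arrays with one row per user and one 
column per item. The class and method names are made up for illustration and 
are not Mahout APIs; a real job would work on sparse, distributed matrices.

```java
// Hypothetical helper, not part of Mahout: plain co-occurrence counts.
final class CoOccurrence {

    /**
     * Computes B'A for two user-by-item matrices that share the same users (rows).
     * Entry [i][j] is the sum over users of b[u][i] * a[u][j]; for 0/1 input this
     * is the number of users with item i in b and item j in a.
     */
    static int[][] cross(int[][] b, int[][] a) {
        if (b.length != a.length) {
            throw new IllegalArgumentException("b and a must have the same number of user rows");
        }
        int itemsB = b.length == 0 ? 0 : b[0].length;
        int itemsA = a.length == 0 ? 0 : a[0].length;
        int[][] result = new int[itemsB][itemsA];
        for (int u = 0; u < b.length; u++) {
            for (int i = 0; i < itemsB; i++) {
                if (b[u][i] == 0) {
                    continue; // this user contributes nothing for item i
                }
                for (int j = 0; j < itemsA; j++) {
                    result[i][j] += b[u][i] * a[u][j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] b = {{1, 0}, {1, 1}, {0, 1}};          // 3 users, 2 items
        int[][] a = {{1, 1, 0}, {0, 1, 0}, {0, 0, 1}}; // same 3 users, 3 items
        System.out.println(java.util.Arrays.deepToString(cross(b, b))); // [B'B]
        System.out.println(java.util.Arrays.deepToString(cross(b, a))); // [B'A]
    }
}
```

Calling cross(b, b) gives the self co-occurrence [B'B], and cross(b, a) gives 
the cross co-occurrence [B'A]; both are raw counts with no similarity weighting.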

RowSimilarity works if the rows of one matrix are identical to the columns of 
the other, as in [B'B]. However, when calculating the "similarity" version of 
the co-occurrence matrix corresponding to [B'A], you need to look at the 
similarity of a row in B with all rows in A. This will give us the analog of 
the "similarity" matrix used in the standard recommender. 

All is clear if I have this right. So a better generalization of the algo would 
use the similarity of rows in B to all rows in A. Renaming [B'A] to S_ba for 
clarity, S_ba would be the similarity matrix calculated from cross comparisons 
of rows/users.
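
A minimal sketch of that cross comparison, assuming 0/1 arrays whose rows are 
the vectors being compared and whose columns index the shared dimension. The 
class and method names (CrossSimilarity, crossLlr) are hypothetical and not 
part of Mahout. The log-likelihood ratio uses the entropy-based formulation for 
a 2x2 contingency table, and mapping it to 1 - 1/(1 + LLR) is just one assumed 
way to squash the score into [0, 1).

```java
// Hypothetical sketch, not a Mahout job: LLR similarity of every row of b
// against every row of a.
final class CrossSimilarity {

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized Shannon entropy of a set of counts. */
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /** Log-likelihood ratio for a 2x2 contingency table. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0; // guard against round-off producing a negative value
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    /** Returns s where s[i][j] compares row i of b with row j of a. */
    static double[][] crossLlr(int[][] b, int[][] a) {
        double[][] s = new double[b.length][a.length];
        for (int i = 0; i < b.length; i++) {
            for (int j = 0; j < a.length; j++) {
                if (b[i].length != a[j].length) {
                    throw new IllegalArgumentException("rows must have the same length");
                }
                long both = 0, onlyB = 0, onlyA = 0;
                for (int c = 0; c < b[i].length; c++) {
                    boolean inB = b[i][c] != 0;
                    boolean inA = a[j][c] != 0;
                    if (inB && inA) both++;
                    else if (inB) onlyB++;
                    else if (inA) onlyA++;
                }
                long neither = b[i].length - both - onlyB - onlyA;
                double llr = logLikelihoodRatio(both, onlyB, onlyA, neither);
                s[i][j] = 1.0 - 1.0 / (1.0 + llr); // assumed squashing into [0, 1)
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int[][] b = {{1, 1, 0, 0}, {1, 0, 1, 0}};
        int[][] a = {{1, 1, 0, 0}, {0, 0, 1, 1}, {1, 1, 1, 1}};
        System.out.println(java.util.Arrays.deepToString(crossLlr(b, a)));
    }
}
```

Entries at or near 0 could then be dropped to sparsify S_ba, though where to 
put that cutoff is a separate choice this sketch does not make.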

This is fundamentally a new Mahout job type AFAIK. It's an important question 
to me because when we looked at similarity measures, log-likelihood gave us 
considerably better scores in the standard recommender. Also, looking at the 
values in our [B'A] product, I suspect it is not sparsified enough; more 
sparsification would be a desired side effect of using similarity instead of 
co-occurrence. The values are also not normalized in the same way as in the 
general recommender, so they can't be linearly combined with it.

Do I have to create a SimilarityJob( matrixB, matrixA, similarityType ) to get 
this or have I missed something already in Mahout?


On Apr 8, 2013, at 2:31 PM, Ted Dunning <[email protected]> wrote:

> So calculating [B'A] seems like TransposeJob and MultiplyJob and does seem
> to work. You lose the ability to substitute different RowSimilarityJob
> measures. I assume this creates something like the co-occurrence similarity
> measure. But oh, well. Maybe I'll look at that later.
> 

Yes.  Exactly.
