Getting this running with co-occurrence, rather than using a similarity calculation on user rows, finally forced me to understand what is going on in the base recommender. And the answer implies further work.
[B'B] is usually not calculated in the standard item-based recommender. Instead it uses the matrix that comes out of RowSimilarityJob run over the purchase input matrix (rows = users). This can be a co-occurrence matrix, but in my case it is actually a log-likelihood similarity matrix (substitute your favorite similarity measure). RowSimilarityJob works when the rows of one matrix are identical to the columns of the other. However, when calculating the "similarity" version of the co-occurrence matrix corresponding to [B'A], you need to look at the similarity of a row in B with all rows in A. That gives the analogue of the similarity matrix used in the standard recommender. All is clear, if I have this right.

So a better generalization of the algorithm would use the similarity of rows in B to all rows in A. Renaming [B'A] to S_ba for clarity, S_ba would be the similarity matrix calculated from cross comparisons of rows/users. AFAIK this is fundamentally a new Mahout job type.

This matters to me because when we compared similarity measures, log-likelihood gave us considerably better scores in the standard recommender. Also, looking at the values in our [B'A] product, I suspect it is not sparsified enough; sparsification would be a desired side effect of using similarity instead of co-occurrence. Finally, the values are not normalized the same way as in the standard recommender, so they can't be linearly combined with it.

Do I have to create a SimilarityJob(matrixB, matrixA, similarityType) to get this, or have I missed something already in Mahout?

On Apr 8, 2013, at 2:31 PM, Ted Dunning wrote:

> So calculating [B'A] seems like TransposeJob and MultiplyJob and does seem
> to work. You lose the ability to substitute different RowSimilarityJob
> measures. I assume this creates something like the co-occurrence similarity
> measure. But oh, well. Maybe I'll look at that later.
>
> Yes. Exactly.
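For reference, here is a minimal in-memory sketch of what the TransposeJob + MultiplyJob route to [B'A] computes. It assumes small dense 0/1 user x item matrices (rows = users), with A as the primary action and B as the secondary one; the toy data and the function names are hypothetical illustrations, not Mahout APIs. Each entry is a raw count of users who did both things, with no similarity weighting, normalization, or pruning.

```python
# Toy 0/1 user x item matrices (rows = users); hypothetical example data.
# A: primary action (e.g. purchases), B: secondary action (e.g. views).
A = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]
B = [
    [1, 1],
    [1, 0],
    [0, 1],
    [1, 0],
]


def transpose(m):
    """Transpose a dense list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Dense product x * y, where x is n x k and y is k x m."""
    y_cols = transpose(y)
    return [[sum(p * q for p, q in zip(row, col)) for col in y_cols] for row in x]


def cross_cooccurrence(b, a):
    """[B'A]: entry (i, j) counts the users who did B-item i and A-item j."""
    return matmul(transpose(b), a)


BtA = cross_cooccurrence(B, A)
print(BtA)  # -> [[3, 1, 2], [1, 1, 1]]
```

On this toy data all six entries are non-zero, since every B/A item pair shares at least one user.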
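Here is a rough sketch of what a hypothetical SimilarityJob(matrixB, matrixA, similarityType) might compute for the log-likelihood case. For each B item i and A item j it counts users in a 2x2 contingency table (both, B only, A only, neither) and scores the table with Dunning's log-likelihood ratio, using the usual entropy formulation (as far as I know, the same one Mahout's LogLikelihood class uses). Dropping scores at or below a threshold is an assumed way to get the sparsification; the threshold value, toy data, and function names are all hypothetical, and this is in-memory Python, not a Mahout job.

```python
import math

# Toy 0/1 user x item matrices (rows = users); hypothetical example data.
# A: primary action (e.g. purchases), B: secondary action (e.g. views).
A = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]
B = [
    [1, 1],
    [1, 0],
    [0, 1],
    [1, 0],
]


def x_log_x(x):
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy (N * H) of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def cross_llr(b, a, threshold=0.0):
    """Hypothetical stand-in for SimilarityJob(matrixB, matrixA, LLR).

    Scores every (B item i, A item j) pair by comparing them across all
    users and keeps only scores above `threshold`, as a sparse dict
    {(i, j): score}.
    """
    n_users = len(a)
    s_ba = {}
    for i in range(len(b[0])):
        for j in range(len(a[0])):
            both = sum(1 for u in range(n_users) if b[u][i] and a[u][j])
            b_only = sum(1 for u in range(n_users) if b[u][i] and not a[u][j])
            a_only = sum(1 for u in range(n_users) if a[u][j] and not b[u][i])
            neither = n_users - both - b_only - a_only
            score = llr(both, b_only, a_only, neither)
            if score > threshold:
                s_ba[(i, j)] = score
    return s_ba


S_ba = cross_llr(B, A, threshold=2.0)
print(S_ba)
```

With threshold=2.0 this keeps a single pair, although all six B/A item pairs co-occur in at least one user's row. Plain LLR is two-sided, so it also scores pairs that co-occur less often than chance; a real job would probably want to filter those out.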
