In RecommenderJob, RowSimilarityJob outputs a matrix of item similarities. Conceptually you can think of this as A'A, but in the code it is a sparsified matrix that holds only the top-k similarities for item j in the j-th row. Because each row keeps its own top-k entries, the result is not symmetric.
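To make the asymmetry concrete, here is a minimal plain-Java sketch on toy data. It is not Mahout code: the class and method names are made up for illustration, the similarity is a plain cooccurrence count, and dropping the diagonal (self-similarity) is an assumption of the sketch. It computes a dense A'A for a small users-by-items matrix and then keeps only the k largest entries per row.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Mahout.
class TopKSimilaritySketch {

    // Dense cooccurrence counts S = A'A, where A is users x items.
    static double[][] cooccurrence(double[][] a) {
        int items = a[0].length;
        double[][] s = new double[items][items];
        for (double[] user : a) {
            for (int i = 0; i < items; i++) {
                for (int j = 0; j < items; j++) {
                    s[i][j] += user[i] * user[j];
                }
            }
        }
        return s;
    }

    // Keep the k largest off-diagonal entries of each row and zero the rest.
    static double[][] topKPerRow(double[][] s, int k) {
        int n = s.length;
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) {
            final int row = i;
            Integer[] cols = new Integer[n];
            for (int j = 0; j < n; j++) {
                cols[j] = j;
            }
            // Stable sort by descending similarity; ties keep the lower index first.
            Arrays.sort(cols, (x, y) -> Double.compare(s[row][y], s[row][x]));
            int kept = 0;
            for (int c : cols) {
                if (c == row) {
                    continue; // assumption: self-similarity is excluded
                }
                if (kept == k) {
                    break;
                }
                out[row][c] = s[row][c];
                kept++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows are users, columns are items 0..2.
        double[][] a = { {1, 1, 0}, {0, 1, 1}, {0, 1, 1} };
        double[][] top1 = topKPerRow(cooccurrence(a), 1);
        for (double[] row : top1) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With k = 1, row 0 keeps item 1 as its best match, but row 1 keeps item 2 instead of item 0. Entry (0,1) survives while entry (1,0) is dropped, even though the full A'A is symmetric.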
I don't think you need to run RowSimilarityJob on B'A. Rather, you would need an equivalent of RowSimilarityJob to compute B'A. I guess you could extend the MatrixMultiplicationJob to use the similarity measures from RowSimilarityJob instead of standard dot products.

I really like the idea of such a cross recommender.

On 03.04.2013 08:33, Ted Dunning wrote:
> Sebastian,
>
> What about the assumption that the matrix is symmetric?
>
> A'A is symmetric, but B'A is not.
>
>
> On Wed, Apr 3, 2013 at 12:08 AM, Sebastian Schelter <[email protected]> wrote:
>
>> RowSimilarityJob computes the top-k similar rows to each row of the
>> input matrix. You can think of it as computing A'A and sparsifying the
>> result afterwards. Furthermore, it lets you plug in a similarity
>> measure of your choice.
>>
>> If you want to have a cooccurrence matrix, you can use
>> o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity
>> as similarity measure.
>>
>>
>> On 02.04.2013 23:43, Pat Ferrel wrote:
>>> Taking an idea from Ted, I'm working on a cross recommender starting
>>> from Mahout's m/r implementation of an item-based recommender. We have
>>> purchases and views for items by user. It is straightforward to create
>>> a recommender on purchases but using views as a predictor of purchases
>>> does not work so well--giving us lower precision scores. This is, no
>>> doubt, because the events have a lot of noise, views that do not lead
>>> to purchases.
>>>
>>> To help solve this Ted suggests we think of a recommender in two parts:
>>>
>>> [B'B]h_p = r_p  <== standard item-based recommender using purchases
>>> [B'A]h_v = r_v  <== cross-recommender using views and purchases
>>> r = r_p + r_v   <== linear combination of the two parts is the full
>>>                     recommendation vector
>>>
>>> These both make recommendations for purchases but method 2 makes cross
>>> recommendations based on views. [B'A] is the co-occurrence matrix of
>>> views with purchases.
>>>
>>> From RecommenderJob the 'similarity matrix' is created by:
>>>
>>> //calculate the co-occurrence matrix
>>> ToolRunner.run(getConf(), new RowSimilarityJob(), new String[]{
>>>   "--input", new Path(prepPath, PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
>>>   "--output", similarityMatrixPath.toString(),
>>>   "--similarityClassname", similarityClassname,
>>>   …
>>>
>>> What is the role of RowSimilarityJob here and how does it lead to a
>>> co-occurrence matrix? I understand that in the general recommender the
>>> co-occurrence matrix is symmetric so columns = rows. Is the
>>> co-occurrence matrix actually calculated anywhere in the standard
>>> recommender?
>>>
>>> The output of PreparePreferenceMatrixJob is a DistributedRowMatrix. As
>>> a first cut it seems I can do the cross recommender part of the work by:
>>>
>>> //calculate the 'cross' co-occurrence matrix
>>> B = PreparePreferenceMatrixJob using user purchase prefs
>>> A = PreparePreferenceMatrixJob using user view prefs
>>> // note that users and items must be the same for A and B, their
>>> // ids must map to the same things
>>> B' = TransposeJob on B
>>> [B'A] = MatrixMultJob on B', A
>>> [B'A]h_v by using the partial multiply process in the standard Recommender
>>> extract the needed recs
>>>
>>> Questions:
>>> * I need to get item similarities perhaps even more importantly than
>>>   user history based recs. I use the [B'A] columns for this, right?
>>>   Shouldn't I run RowSimilarityJob on [B'A]'?
>>> * There are assumptions in some code that the co-occurrence matrix is
>>>   symmetric and so rows = columns. This is not true of the 'cross'
>>>   co-occurrence matrix. Are there places I need to account for this?
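
The two-part recommender quoted above, r = [B'B]h_p + [B'A]h_v, can be sketched in plain Java on toy data. This is only an in-memory illustration with dense arrays and plain dot products. It does not use Mahout's jobs or any similarity measure, and every class, method, and variable name in it is hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Mahout.
class CrossRecommenderSketch {

    // Computes X'Y for two matrices whose rows are the same users.
    static double[][] transposeTimes(double[][] x, double[][] y) {
        int rows = x[0].length;
        int cols = y[0].length;
        double[][] out = new double[rows][cols];
        for (int u = 0; u < x.length; u++) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    out[i][j] += x[u][i] * y[u][j];
                }
            }
        }
        return out;
    }

    // Matrix-vector product m * v.
    static double[] times(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    // r = [B'B]h_p + [B'A]h_v, with B = purchases and A = views.
    static double[] recommend(double[][] b, double[][] a, double[] hp, double[] hv) {
        double[] rp = times(transposeTimes(b, b), hp); // purchase-based part
        double[] rv = times(transposeTimes(b, a), hv); // cross part from views
        double[] r = new double[rp.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = rp[i] + rv[i];
        }
        return r;
    }

    public static void main(String[] args) {
        // Rows are the same three users in both matrices; columns are items 0..2.
        double[][] purchases = { {1, 0, 0}, {0, 1, 0}, {0, 1, 1} }; // B
        double[][] views     = { {1, 1, 1}, {1, 0, 0}, {0, 1, 0} }; // A
        double[] hp = {0, 1, 0}; // this user purchased item 1
        double[] hv = {1, 0, 0}; // this user viewed item 0
        System.out.println(Arrays.toString(recommend(purchases, views, hp, hv)));
        // prints [1.0, 3.0, 1.0]
    }
}
```

On this data B'A has entry (0,2) = 1 but entry (2,0) = 0, so it is not symmetric even before any top-k sparsification. Code that assumes rows = columns would need to treat the cross matrix separately.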
