Sebastian,

What about the assumption that the matrix is symmetric?
A'A is symmetric, but B'A is not.

On Wed, Apr 3, 2013 at 12:08 AM, Sebastian Schelter <[email protected]> wrote:

> RowSimilarityJob computes the top-k similar rows to each row of the
> input matrix. You can think of it as computing A'A and sparsifying the
> result afterwards. Furthermore, it allows you to plug in a similarity
> measure of your choice.
>
> If you want a co-occurrence matrix, you can use
>
> o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity
>
> as the similarity measure.
>
>
> On 02.04.2013 23:43, Pat Ferrel wrote:
> > Taking an idea from Ted, I'm working on a cross recommender starting
> > from Mahout's m/r implementation of an item-based recommender. We have
> > purchases and views for items by user. It is straightforward to create
> > a recommender on purchases, but using views as a predictor of purchases
> > does not work so well; it gives us lower precision scores. This is, no
> > doubt, because the events have a lot of noise: views that do not lead
> > to purchases.
> >
> > To help solve this, Ted suggests we think of a recommender in two parts:
> >
> > [B'B]h_p = r_p  <== standard item-based recommender using purchases
> > [B'A]h_v = r_v  <== cross-recommender using views and purchases
> > r = r_p + r_v   <== linear combination of the two parts is the full
> >                     recommendation vector
> >
> > Both make recommendations for purchases, but the second makes cross
> > recommendations based on views. [B'A] is the co-occurrence matrix of
> > views with purchases.
> >
> > From RecommenderJob, the 'similarity matrix' is created by:
> >
> > //calculate the co-occurrence matrix
> > ToolRunner.run(getConf(), new RowSimilarityJob(), new String[]{
> >     "--input", new Path(prepPath,
> >         PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
> >     "--output", similarityMatrixPath.toString(),
> >     "--similarityClassname", similarityClassname,
> >     …
> >
> > What is the role of RowSimilarityJob here, and how does it lead to a
> > co-occurrence matrix?
> > I understand that in the general recommender the co-occurrence matrix
> > is symmetric, so columns = rows. Is the co-occurrence matrix actually
> > calculated anywhere in the standard recommender?
> >
> > The output of PreparePreferenceMatrixJob is a DistributedRowMatrix. As
> > a first cut, it seems I can do the cross-recommender part of the work
> > by:
> >
> > //calculate the 'cross' co-occurrence matrix
> > B = PreparePreferenceMatrixJob using user purchase prefs
> > A = PreparePreferenceMatrixJob using user view prefs
> > // note that users and items must be the same for A and B; their ids
> > // must map to the same things
> > B' = TransposeJob on B
> > [B'A] = MatrixMultJob on B', A
> > [B'A]h_v by using the partial multiply process in the standard
> >     Recommender
> > extract the needed recs
> >
> > Questions:
> >
> > * I need item similarities, perhaps even more than user-history-based
> >   recs. I use the [B'A] columns for this, right? Shouldn't I run
> >   RowSimilarityJob on [B'A]'?
> > * There are assumptions in some code that the co-occurrence matrix is
> >   symmetric, so rows = columns. This is not true of the 'cross'
> >   co-occurrence matrix. Are there places I need to account for this?
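To make the symmetry point concrete, here is a small in-memory sketch in plain Java. It is a hypothetical illustration, not Mahout code, and it does not claim to mirror how RowSimilarityJob or the matrix-multiplication job work internally. It assumes tiny dense 0/1 user-by-item matrices whose rows are the same users in the same order in B (purchases) and A (views); the class name, helper names, and toy data are all made up. It builds X'Y as a sum of per-user outer products, checks that B'B is symmetric while B'A is not, and forms r = [B'B]h_p + [B'A]h_v for one hypothetical user.

```java
import java.util.Arrays;

/**
 * Hypothetical toy illustration of r = [B'B]h_p + [B'A]h_v; not part of Mahout.
 * Matrices are dense, rows = users, columns = items.
 */
final class CrossCooccurrenceSketch {

    /** Returns X'Y. For 0/1 matrices, entry (i, j) counts users with item i in X and item j in Y. */
    static double[][] transposeTimes(double[][] x, double[][] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("X and Y must have the same user rows");
        }
        int m = x[0].length;
        int n = y[0].length;
        double[][] result = new double[m][n];
        // Accumulate one outer product x_u' * y_u per user row u.
        for (int u = 0; u < x.length; u++) {
            for (int i = 0; i < m; i++) {
                if (x[u][i] == 0.0) {
                    continue; // this user has no interaction with item i in X
                }
                for (int j = 0; j < n; j++) {
                    result[i][j] += x[u][i] * y[u][j];
                }
            }
        }
        return result;
    }

    /** Returns the matrix-vector product M * h. */
    static double[] times(double[][] m, double[] h) {
        double[] r = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < h.length; j++) {
                r[i] += m[i][j] * h[j];
            }
        }
        return r;
    }

    /** Returns the element-wise sum of two vectors of equal length. */
    static double[] plus(double[] p, double[] q) {
        double[] s = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            s[i] = p[i] + q[i];
        }
        return s;
    }

    /** True if the matrix is square and equal to its transpose. */
    static boolean isSymmetric(double[][] m) {
        for (int i = 0; i < m.length; i++) {
            if (m[i].length != m.length) {
                return false;
            }
            for (int j = 0; j < i; j++) {
                if (m[i][j] != m[j][i]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy data: 3 users; B = purchases, A = views, over 3 item ids.
        double[][] b = {{1, 0, 0}, {1, 1, 0}, {0, 0, 1}};
        double[][] a = {{1, 1, 0}, {0, 1, 1}, {0, 0, 1}};

        double[][] btb = transposeTimes(b, b); // purchase-purchase co-occurrence
        double[][] bta = transposeTimes(b, a); // purchase-view cross co-occurrence

        System.out.println(isSymmetric(btb)); // true
        System.out.println(isSymmetric(bta)); // false
        System.out.println(Arrays.deepToString(bta));
        // [[1.0, 2.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]

        // One user who purchased item 1 and viewed item 0.
        double[] hp = {0, 1, 0};
        double[] hv = {1, 0, 0};
        double[] r = plus(times(btb, hp), times(bta, hv));
        System.out.println(Arrays.toString(r)); // [2.0, 1.0, 0.0]
    }
}
```

In this orientation, a single view of item j selects column j of [B'A], so it is the columns that carry "purchases that co-occur with this view" information; the rows of [B'A] answer the reverse question.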

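Sebastian's description of RowSimilarityJob (compute similarities, then keep only the top-k entries per row) and Pat's question about running it on [B'A]' can also be sketched in memory. This is a hypothetical plain-Java illustration, not Mahout's RowSimilarityJob: it uses raw co-occurrence counts in place of a pluggable similarity measure, a made-up toy [B'A] with purchase items as rows and view items as columns, and invented method names. Transposing first makes each row of [B'A]' belong to one view item, so keeping the top-k entries of each row keeps, for each view item, the k purchase items it co-occurs with most.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical toy helpers; not Mahout code. */
final class TopKSketch {

    /** Returns the transpose of a dense matrix. */
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    /**
     * Keeps the k largest non-zero entries of each row and zeroes the rest.
     * Ties are broken by the lower column index.
     */
    static double[][] topKPerRow(double[][] m, int k) {
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            double[] row = m[i];
            List<Integer> cols = new ArrayList<>();
            for (int j = 0; j < row.length; j++) {
                if (row[j] != 0.0) {
                    cols.add(j);
                }
            }
            // Sort candidate columns by descending value, then ascending index.
            cols.sort(Comparator.comparingDouble((Integer j) -> -row[j])
                    .thenComparingInt(j -> j));
            out[i] = new double[row.length];
            for (int n = 0; n < Math.min(k, cols.size()); n++) {
                int j = cols.get(n);
                out[i][j] = row[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy [B'A]: rows = purchase items, columns = view items (made-up counts).
        double[][] bta = {{1, 2, 1}, {0, 1, 1}, {0, 0, 1}};
        // Rows of [B'A]' are view items; keep each view item's top purchase item.
        double[][] top1 = topKPerRow(transpose(bta), 1);
        System.out.println(Arrays.deepToString(top1));
        // [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    }
}
```

Because the sparsification here is row-wise, transposing first is what turns "top-k per column of [B'A]" into "top-k per row", the shape a row-oriented job works with.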