Taking an idea from Ted, I'm working on a cross-recommender starting from 
Mahout's MapReduce implementation of an item-based recommender. We have 
purchases and views for items by user. It is straightforward to create a 
recommender on purchases, but using views as a predictor of purchases does not 
work as well and gives us lower precision scores. This is, no doubt, because 
the view events are noisy: many views do not lead to purchases.

To help solve this Ted suggests we think of a recommender in two parts:

[B'B]h_p = r_p  <== standard item-based recommender using purchases
[B'A]h_v = r_v  <== cross-recommender using views and purchases
r = r_p + r_v   <== linear combination of the two parts is the full recommendation vector

These both make recommendations for purchases, but the second makes 
cross-recommendations based on views. [B'A] is the co-occurrence matrix of 
views with purchases. 
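
To make the shapes explicit, here is how I read the three lines above. This is 
my assumption rather than something the formulas spell out: B and A are 
user-by-item matrices (u users, n items) for purchases and views, and h_p and 
h_v are one user's purchase and view history vectors over the same n items.

```latex
\begin{aligned}
B &\in \mathbb{R}^{u \times n}, \qquad A \in \mathbb{R}^{u \times n} \\
r_p &= (B^{\top} B)\, h_p, \qquad B^{\top} B \in \mathbb{R}^{n \times n} \\
r_v &= (B^{\top} A)\, h_v, \qquad B^{\top} A \in \mathbb{R}^{n \times n} \\
r &= r_p + r_v \\
(B^{\top} A)_{ij} &= \sum_{k=1}^{u} B_{ki} A_{kj}
\end{aligned}
```

With 0/1 preferences, entry (i, j) of [B'A] counts the users who purchased 
item i and viewed item j, so in general (B'A)_ij != (B'A)_ji, whereas [B'B] is 
always symmetric.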

From RecommenderJob the 'similarity matrix' is created by:

      // calculate the co-occurrence matrix
      ToolRunner.run(getConf(), new RowSimilarityJob(), new String[]{
          "--input", new Path(prepPath, PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
          "--output", similarityMatrixPath.toString(),
          "--similarityClassname", similarityClassname,
      …

What is the role of RowSimilarityJob here and how does it lead to a 
co-occurrence matrix? I understand that in the general recommender the 
co-occurrence matrix is symmetric so columns = rows. Is the co-occurrence 
matrix actually calculated anywhere in the standard recommender?

The output of PreparePreferenceMatrixJob is a DistributedRowMatrix. As a first 
cut it seems I can do the cross recommender part of the work by:

      // calculate the 'cross' co-occurrence matrix
      B = PreparePreferenceMatrixJob using user purchase prefs
      A = PreparePreferenceMatrixJob using user view prefs
      // note that users and items must be the same for A and B; their ids
      // must map to the same things
      B' = TransposeJob on B
      [B'A] = MatrixMultJob on B', A
      [B'A]h_v by using the partial multiply process in the standard Recommender
      extract the needed recs
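
To check the arithmetic of these steps independently of Hadoop, here is a 
minimal in-memory sketch. It uses plain dense arrays in place of 
DistributedRowMatrix and does not call any Mahout job; the class name, the 
method names and the toy data are all hypothetical and only illustrate the 
matrix algebra.

```java
import java.util.Arrays;

// In-memory illustration only: dense arrays stand in for DistributedRowMatrix.
public class CrossCooccurrenceSketch {

    // Returns the transpose of a dense matrix (the role TransposeJob plays above).
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    // Dense matrix product x * y (the role of the matrix multiply step above).
    static double[][] multiply(double[][] x, double[][] y) {
        double[][] out = new double[x.length][y[0].length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < y[0].length; j++) {
                for (int k = 0; k < y.length; k++) {
                    out[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return out;
    }

    // Matrix-vector product m * v.
    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    // r = [B'B]h_p + [B'A]h_v, where b and a are user-by-item matrices.
    static double[] recommend(double[][] b, double[][] a, double[] hp, double[] hv) {
        double[][] bt = transpose(b);
        double[] rp = multiply(multiply(bt, b), hp); // standard part
        double[] rv = multiply(multiply(bt, a), hv); // cross part
        double[] r = new double[rp.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = rp[i] + rv[i];
        }
        return r;
    }

    public static void main(String[] args) {
        // Toy data (invented): 3 users x 3 items, rows are users.
        double[][] b = {{1, 0, 0}, {0, 1, 0}, {1, 1, 0}}; // purchases
        double[][] a = {{1, 1, 0}, {0, 1, 1}, {1, 1, 1}}; // views
        double[] hp = {1, 0, 0}; // one user's purchase history
        double[] hv = {1, 1, 0}; // the same user's view history
        System.out.println(Arrays.toString(recommend(b, a, hp, hv)));
    }
}
```

On this toy data [B'A] already comes out asymmetric, which is the case the 
symmetric-matrix assumptions in the existing code would not cover.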

Questions:
 *  I need item similarities, perhaps even more than I need recs based on user 
history. I use the [B'A] columns for this, right? Shouldn't I run 
RowSimilarityJob on [B'A]'?
 *  There are assumptions in some code that the co-occurrence matrix is 
symmetric and so rows = columns. This is not true of the 'cross' co-occurrence 
matrix. Are there places I need to account for this?
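
To make the rows-versus-columns question concrete, here is a small sketch of 
how I would read an in-memory [B'A], assuming rows index purchased items and 
columns index viewed items (that orientation follows from B' times A with 
user-by-item inputs, but it is my assumption). The class and method names are 
hypothetical and nothing here is Mahout API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: reads a dense cross co-occurrence matrix [B'A].
public class CrossSimilaritySketch {

    // For one viewed item (a column of [B'A]), returns the purchased items
    // (row indices) with a non-zero count, highest count first.
    static List<Integer> purchasedItemsForViewedItem(double[][] cross, int viewedItem) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < cross.length; i++) {
            if (cross[i][viewedItem] > 0) {
                rows.add(i);
            }
        }
        rows.sort((x, y) -> Double.compare(cross[y][viewedItem], cross[x][viewedItem]));
        return rows;
    }

    // True only if the matrix is square and equal to its transpose.
    static boolean isSymmetric(double[][] m) {
        if (m.length != m[0].length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < i; j++) {
                if (m[i][j] != m[j][i]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy [B'A] (invented values): rows = purchased items, columns = viewed items.
        double[][] cross = {{2, 2, 1}, {1, 2, 2}, {0, 0, 0}};
        System.out.println(purchasedItemsForViewedItem(cross, 2));
        System.out.println(isSymmetric(cross));
    }
}
```

A check like isSymmetric is one way to find out whether a given matrix would 
violate the rows = columns assumption before handing it to code that relies on 
it.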
