The asymmetry of [B'A], and the fact that it is calculated from two
models, lead me to a rather heavy-handed approach, at least for a first cut.

Let me know if this seems right:

 //calculate the 'cross' co-occurrence matrix
     B = PreparePreferenceMatrixJob using user purchase prefs
     A = PreparePreferenceMatrixJob using user view prefs
     // note that users and items *must* be the same for A and B: their ids
     // must map to the same things, and this may be a challenge.
     B' = TransposeJob on B
     [B'A] = MatrixMultJob on B', A
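
To make the shapes concrete, here is a minimal sketch of the same computation in plain Java. It is not Mahout code: the dense int arrays, the toy data, and the class and method names are all assumptions for illustration. Rows are users and columns are items, in the same order for both matrices.

```java
import java.util.Arrays;

// Hypothetical illustration only: dense arrays stand in for DistributedRowMatrix.
final class CrossCooccurrenceSketch {

    /** Returns B'A, where b (purchases) and a (views) are user-by-item matrices. */
    static int[][] crossCooccurrence(int[][] b, int[][] a) {
        if (b.length != a.length) {
            throw new IllegalArgumentException("B and A must cover the same users (rows)");
        }
        int[][] result = new int[b[0].length][a[0].length];
        for (int u = 0; u < b.length; u++) {            // for every user
            for (int i = 0; i < b[u].length; i++) {     // every purchased item i
                if (b[u][i] == 0) {
                    continue;
                }
                for (int j = 0; j < a[u].length; j++) { // every viewed item j
                    result[i][j] += b[u][i] * a[u][j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] purchases = {{1, 0, 0}, {0, 1, 0}, {1, 0, 1}}; // B (toy data)
        int[][] views     = {{1, 1, 0}, {0, 1, 1}, {1, 0, 1}}; // A (toy data)
        int[][] ba = crossCooccurrence(purchases, views);
        // prints [[2, 1, 1], [0, 1, 1], [1, 0, 1]]
        System.out.println(Arrays.deepToString(ba));
    }
}
```

In this toy result entry (0,1) is 1 while entry (1,0) is 0, so [B'A] is not symmetric even on tiny data.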

Now, in the standard recommender we get the magic from RowSimilarityJob and the
'partial' multiplies. I haven't teased the partial multiplies apart, but I
suspect that since they rely on the output of RowSimilarityJob I'll
need to rework this. Please correct me if I'm wrong. Once I have [B'A] I need
to compute:
    [B'A] * H_v, where H_v is the original user history vectors in A based on
users' views. I think they need to be column vectors, so H_v = A' and
    [B'A]A' = DistributedRowMatrix of recommendations by user.
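
A small sketch of that multiply, again in plain Java with made-up data and hypothetical names rather than Mahout's partial-multiply code. Each user's view history h_v (a row of A) is multiplied by a toy [B'A]. If I have the shapes right, [B'A] is items x items and A' is items x users, so in [B'A]A' each column belongs to a user; the sketch returns the transposed layout, one row per user.

```java
import java.util.Arrays;

// Hypothetical illustration only, not the Mahout partial-multiply code.
final class CrossRecommendSketch {

    /** Returns m * v for a dense matrix m and a column vector v. */
    static int[] multiply(int[][] m, int[] v) {
        int[] out = new int[m.length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < v.length; c++) {
                out[r] += m[r][c] * v[c];
            }
        }
        return out;
    }

    /** Returns one recommendation row per user: row u is [B'A] * h_v for user u. */
    static int[][] recommendAll(int[][] crossCooccurrence, int[][] views) {
        int[][] recs = new int[views.length][];
        for (int u = 0; u < views.length; u++) {
            recs[u] = multiply(crossCooccurrence, views[u]);
        }
        return recs;
    }

    public static void main(String[] args) {
        int[][] ba    = {{2, 1, 1}, {0, 1, 1}, {1, 0, 1}}; // toy [B'A]
        int[][] views = {{1, 1, 0}, {0, 1, 1}, {1, 0, 1}}; // toy A, users x items
        // prints [[3, 1, 1], [2, 2, 1], [3, 1, 2]]
        System.out.println(Arrays.deepToString(recommendAll(ba, views)));
    }
}
```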

I'm most interested in item similarity, so I think RowSimilarityJob needs to
be run on [B'A], but it is the columns I need to compare (???), so
    [B'A]' = rows of items whose values are the views that lead to (co-occur
with) purchases
    RowSimilarityJob on [B'A]' will calculate pairwise similarity of items and
so will create a matrix of item similarities. Here I suppose I can apply any of
the similarity classes.
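
Conceptually, what I have in mind looks like the sketch below: transpose a toy [B'A] and compare its rows pairwise. Cosine similarity stands in for whichever similarity class would be used, and the class and method names are hypothetical; none of this is the RowSimilarityJob implementation.

```java
// Hypothetical illustration only: cosine is a stand-in similarity measure.
final class ColumnSimilaritySketch {

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }

    /** Cosine similarity of two vectors; returns 0 if either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return (nx == 0 || ny == 0) ? 0 : dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    /** Pairwise similarity of the rows of m. */
    static double[][] rowSimilarity(double[][] m) {
        double[][] sim = new double[m.length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                sim[i][j] = cosine(m[i], m[j]);
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        double[][] ba = {{2, 1, 1}, {0, 1, 1}, {1, 0, 1}}; // toy [B'A]
        // Rows of [B'A]' are the columns of [B'A].
        double[][] sim = rowSimilarity(transpose(ba));
        System.out.println(java.util.Arrays.deepToString(sim));
    }
}
```

Unlike [B'A] itself, the full pairwise similarity matrix computed this way is symmetric, because each entry compares two rows of the same matrix.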

Question:
*  Have I got the item similarity part right? Do I need to compare the columns
of [B'A]?



On Apr 3, 2013, at 1:21 AM, Sebastian Schelter <[email protected]> wrote:

RowSimilarityJob outputs a matrix of item similarities in
RecommenderJob. While you can think of this conceptually as A'A, in the
code it is a sparsified matrix which holds the top-k similarities for
item j in the j-th row. This means it is not symmetric.
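
A minimal sketch of why keeping only the top-k entries per row breaks symmetry. This is plain Java on a toy matrix with hypothetical names, not the RowSimilarityJob code.

```java
import java.util.Arrays;

// Hypothetical illustration only: per-row top-k on a dense, symmetric toy matrix.
final class TopKSparsifySketch {

    /** Keeps the k largest entries of every row and zeroes the rest. */
    static double[][] keepTopKPerRow(double[][] m, int k) {
        double[][] out = new double[m.length][];
        for (int r = 0; r < m.length; r++) {
            double[] row = m[r];
            Integer[] cols = new Integer[row.length];
            for (int c = 0; c < cols.length; c++) {
                cols[c] = c;
            }
            // Sort column indices by descending value (stable, so ties keep column order).
            Arrays.sort(cols, (x, y) -> Double.compare(row[y], row[x]));
            out[r] = new double[row.length];
            for (int n = 0; n < Math.min(k, cols.length); n++) {
                out[r][cols[n]] = row[cols[n]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] symmetric = {{0, 5, 4}, {5, 0, 1}, {4, 1, 0}};
        // prints [[0.0, 5.0, 0.0], [5.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
        System.out.println(Arrays.deepToString(keepTopKPerRow(symmetric, 1)));
    }
}
```

With k = 1, row 2 keeps its entry for item 0, but row 0 does not keep its entry for item 2, so the sparsified matrix is no longer symmetric.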

I don't think you need to run RowSimilarityJob on B'A, I think you would
need an equivalent of RowSimilarityJob to compute B'A. I guess you could
extend the MatrixMultiplicationJob to use the similarity measures from
RowSimilarityJob instead of standard dot products.
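
A rough sketch of that idea in plain Java, not based on the actual job code. Entry (i, j) of the result is a pluggable measure applied to column i of B and column j of A. With a plain dot product this is exactly B'A; swapping in cosine gives a similarity-weighted cross matrix. All names here are hypothetical.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration only: a "multiply" with a pluggable similarity measure.
final class SimilarityMultiplySketch {

    static double[] column(double[][] m, int c) {
        double[] col = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            col[r] = m[r][c];
        }
        return col;
    }

    static double dot(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Cosine similarity; returns 0 if either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        double nx = dot(x, x);
        double ny = dot(y, y);
        return (nx == 0 || ny == 0) ? 0 : dot(x, y) / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    /** Entry (i, j) is measure(column i of b, column j of a). */
    static double[][] crossSimilarity(double[][] b, double[][] a,
                                      ToDoubleBiFunction<double[], double[]> measure) {
        double[][] out = new double[b[0].length][a[0].length];
        for (int i = 0; i < out.length; i++) {
            for (int j = 0; j < out[i].length; j++) {
                out[i][j] = measure.applyAsDouble(column(b, i), column(a, j));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] b = {{1, 0, 0}, {0, 1, 0}, {1, 0, 1}}; // toy purchases, users x items
        double[][] a = {{1, 1, 0}, {0, 1, 1}, {1, 0, 1}}; // toy views, users x items
        double[][] counts = crossSimilarity(b, a, SimilarityMultiplySketch::dot);
        double[][] cosines = crossSimilarity(b, a, SimilarityMultiplySketch::cosine);
        System.out.println(java.util.Arrays.deepToString(counts));
        System.out.println(java.util.Arrays.deepToString(cosines));
    }
}
```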

I really like the idea of such a cross recommender.

On 03.04.2013 08:33, Ted Dunning wrote:
> Sebastian,
> 
> What about the assumption that the matrix is symmetric?
> 
> A'A is symmetric, but B'A is not.
> 
> 
> On Wed, Apr 3, 2013 at 12:08 AM, Sebastian Schelter <[email protected]
>> wrote:
> 
>> RowSimilarityJob computes the top-k similar rows to each row of the
>> input matrix. You can think of it as computing A'A and sparsifying the
>> result afterwards. Furthermore, it allows you to plug in a similarity
>> measure of your choice.
>> 
>> If you want to have a cooccurrence matrix, you can use
>> 
>> o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity
>> as similarity measure.
>> 
>> 
>> On 02.04.2013 23:43, Pat Ferrel wrote:
>>> Taking an idea from Ted, I'm working on a cross recommender starting
>> from mahout's m/r implementation of an item-based recommender. We have
>> purchases and views for items by user. It is straightforward to create a
>> recommender on purchases but using views as a predictor of purchases does
>> not work so well--giving us lower precision scores. This is, no doubt,
>> because the events have a lot of noise, views that do not lead to purchases.
>>> 
>>> To help solve this Ted suggests we think of a recommender in two parts:
>>> 
>>> [B'B]h_p = r_p  <== standard item-based recommender using purchases
>>> [B'A]h_v = r_v  <== cross-recommender using views and purchases
>>> r = r_p + r_v   <== linear combination of the two parts is the full
>> recommendation vector
>>> 
>>> These both make recommendations for purchases but method 2 makes cross
>> recommendations based on views. [B'A] is the co-occurrence matrix of views
>> with purchases.
>>> 
>>> From RecommenderJob the 'similarity matrix' is created by:
>>> 
>>>  //calculate the co-occurrence matrix
>>>      ToolRunner.run(getConf(), new RowSimilarityJob(), new String[]{
>>>          "--input", new Path(prepPath,
>> PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
>>>          "--output", similarityMatrixPath.toString(),
>>>          "--similarityClassname", similarityClassname,
>>>      …
>>> 
>>> What is the role of RowSimilarityJob here and how does it lead to a
>> co-occurrence matrix? I understand that in the general recommender the
>> co-occurrence matrix is symmetric so columns = rows. Is the co-occurrence
>> matrix actually calculated anywhere in the standard recommender?
>>> 
>>> The output of PreparePreferenceMatrixJob is a DistributedRowMatrix. As a
>> first cut it seems I can do the cross recommender part of the work by:
>>> 
>>>  //calculate the 'cross' co-occurrence matrix
>>>      B = PreparePreferenceMatrixJob using user purchase prefs
>>>      A = PreparePreferenceMatrixJob using user view prefs
>>>      // note that users and items must be the same for A and B, their
>> ids must map to the same things
>>>      B' = TransposeJob on B
>>>      [B'A] = MatrixMultJob on B', A
>>>      [B'A]h_v by using the partial multiply process in the standard
>> Recommender
>>>      extract the needed recs
>>> 
>>> Questions:
>>> *  I need to get item similarities perhaps even more importantly than
>> user history based recs. I use the [B'A] columns for this, right? Shouldn't
>> I run RowSimilarityJob on [B'A]'?
>>> *  There are assumptions in some code that the co-occurrence matrix is
>> symmetric and so rows = columns. This is not true of the 'cross'
>> co-occurrence matrix. Are there places I need to account for this?
>>> 
>> 
>> 
> 

