One issue with the method below is that the two source matrices will not have 
values for every user or item (row or column). I know the entire user and 
item ID space from a previous step, so I know the number of rows and columns, 
including the empty ones. Put another way, the actual matrix (with its empty 
rows or columns) may be larger than the number of rows in the 
DistributedRowMatrix or the number of unique item IDs. However, all IDs in one 
matrix will match the IDs of the other matrix. 

AFAICT this should be OK for the TransposeJob and MatrixMultJob, but I haven't 
tested it. I will need to pass in the size of the matrices as the size of the 
user and item space, correct?
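
To make the dimension question concrete, here is a minimal sketch in plain 
Java. It is not Mahout code: SparseRows and its methods are hypothetical names 
made up for illustration. It only shows the idea that the declared size comes 
from the full user and item ID space, so a transpose keeps those dimensions 
even when some rows or columns hold no data.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse matrix for illustration; NOT a Mahout class. */
final class SparseRows {
    final int numRows; // size of the full user ID space, including users with no data
    final int numCols; // size of the full item ID space, including items with no data
    final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

    SparseRows(int numRows, int numCols) {
        this.numRows = numRows;
        this.numCols = numCols;
    }

    void set(int row, int col, double value) {
        if (row < 0 || row >= numRows || col < 0 || col >= numCols) {
            throw new IndexOutOfBoundsException("id outside the declared ID space");
        }
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
    }

    /** Cells that were never set read as 0.0. */
    double get(int row, int col) {
        Map<Integer, Double> r = rows.get(row);
        return r == null ? 0.0 : r.getOrDefault(col, 0.0);
    }

    /** Transpose swaps the declared dimensions; empty rows and columns stay empty. */
    SparseRows transpose() {
        SparseRows t = new SparseRows(numCols, numRows);
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                t.set(cell.getKey(), row.getKey(), cell.getValue());
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // 4 users and 3 items in the ID space, but only users 0 and 2 have data.
        SparseRows b = new SparseRows(4, 3);
        b.set(0, 1, 1.0);
        b.set(2, 0, 1.0);
        SparseRows bt = b.transpose();
        System.out.println(bt.numRows + " x " + bt.numCols
            + ", stored rows: " + bt.rows.size());
    }
}
```

Whether the real jobs behave the same way with the sizes passed in is still 
the untested part.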


On Apr 3, 2013, at 9:15 AM, Pat Ferrel <[email protected]> wrote:

The non-symmetry of [B'A] and the fact that it is calculated from two 
models lead me to a rather heavy-handed approach, at least for a first cut. 

Let me know if this seems right:

//calculate the 'cross' co-occurrence matrix
    B = PreparePreferenceMatrixJob using user purchase prefs
    A = PreparePreferenceMatrixJob using user view prefs
    // note that users and items *must* be the same for A and B; their ids must 
    // map to the same things, and this may be a challenge.
    B' = TransposeJob on B
    [B'A] = MatrixMultJob on B', A
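
As a small in-memory illustration of what the steps above compute, the sketch 
below forms B'A on dense arrays. It is plain Java, not the Mahout jobs; the 
class and method names are hypothetical, and it assumes non-empty matrices 
whose user rows are in the same order in B and A.

```java
/** Plain-Java illustration of B'A on small dense arrays; NOT Mahout's jobs. */
final class CrossCooccurrence {

    /** Returns B'A, where b and a are (users x items) with identical user rows. */
    static double[][] transposeTimes(double[][] b, double[][] a) {
        if (b.length != a.length) {
            throw new IllegalArgumentException("B and A must have the same user rows");
        }
        int itemsB = b[0].length;
        int itemsA = a[0].length;
        double[][] result = new double[itemsB][itemsA];
        for (int u = 0; u < b.length; u++) {
            for (int i = 0; i < itemsB; i++) {
                if (b[u][i] == 0.0) {
                    continue; // skip items this user did not purchase
                }
                for (int j = 0; j < itemsA; j++) {
                    result[i][j] += b[u][i] * a[u][j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] b = {{1, 0, 0}, {0, 1, 0}, {0, 0, 0}}; // purchases
        double[][] a = {{1, 1, 0}, {0, 1, 1}, {1, 0, 1}}; // views
        double[][] ba = transposeTimes(b, a);
        // Rows are purchased items, columns are viewed items.
        System.out.println(java.util.Arrays.deepToString(ba));
    }
}
```

In this example entry (0,1) is 1 while entry (1,0) is 0, which is the 
non-symmetry mentioned above.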

Now in the standard recommender we get the magic with RowSimilarityJob and 
'partial' multiplies. I haven't teased the partial multiplies apart, but I 
suspect that since they rely on the output from RowSimilarityJob I'll 
need to rework this--please correct me if I'm wrong. Once I have [B'A] I need 
to compute:
   [B'A] * H_v, where H_v is the original user history vectors in A based on 
user's views. I think they need to be column vectors, so H_v = A' and 
   [B'A]A' = DistributedRowMatrix of recommendations by user.

I'm most interested in item similarity, so I think [B'A] needs 
RowSimilarityJob run on it, but it is the columns I need to compare (???), so 
   [B'A]' = rows of items with values that are views that lead to (co-occur 
with) purchases
   RowSimilarityJob on [B'A]' will calculate pairwise similarity of items and 
so will create a matrix of item similarities. Here I suppose I can apply any of 
the similarity classes.
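
To show what "compare the columns by transposing first" means, here is a 
minimal plain-Java sketch. It is not RowSimilarityJob: the names are 
hypothetical, cosine is just a stand-in for whichever similarity class gets 
used, and there is no top-k sparsification.

```java
/** Plain-Java illustration of comparing columns by transposing first; NOT Mahout code. */
final class ColumnSimilarity {

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[0].length; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }

    /** Cosine similarity; defined here as 0 when either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            nx += x[k] * x[k];
            ny += y[k] * y[k];
        }
        if (nx == 0 || ny == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    /** Pairwise similarity between all rows of m. */
    static double[][] rowSimilarities(double[][] m) {
        double[][] s = new double[m.length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                s[i][j] = cosine(m[i], m[j]);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // A small stand-in for [B'A]; its columns are what we want to compare.
        double[][] crossCooc = {{1, 1, 0}, {0, 1, 1}, {0, 0, 0}};
        double[][] sims = rowSimilarities(transpose(crossCooc));
        System.out.println(java.util.Arrays.deepToString(sims));
    }
}
```

Running row similarity on the transpose compares the original columns, and 
the result is symmetric even though the input was not.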

Question:
*  Have I got the item similarity part right? Do I need to compare the columns 
of [B'A]?



On Apr 3, 2013, at 1:21 AM, Sebastian Schelter <[email protected]> wrote:

RowSimilarityJob outputs a matrix of item similarities in
RecommenderJob. While you can think of this conceptually as A'A, in the
code it is a sparsified matrix which holds the top-k similarities for
item j in the j-th row. This means it is not symmetric.

I don't think you need to run RowSimilarityJob on B'A. I think you would
need an equivalent of RowSimilarityJob to compute B'A. I guess you could
extend the MatrixMultiplicationJob to use the similarity measures from
RowSimilarityJob instead of standard dot products.
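
A rough sketch of that idea in plain Java, with hypothetical names and no 
Mahout code: the multiply takes the measure as a parameter, so plugging in a 
dot product gives ordinary B'A and plugging in something else gives a 
cross-similarity matrix. Cosine is only an example measure here.

```java
import java.util.function.ToDoubleBiFunction;

/** Plain-Java illustration of a multiply with a pluggable measure; NOT Mahout code. */
final class SimilarityMultiply {

    /** result[i][j] = measure(column i of B, column j of A). */
    static double[][] crossSimilarity(double[][] b, double[][] a,
                                      ToDoubleBiFunction<double[], double[]> measure) {
        double[][] bt = transpose(b);
        double[][] at = transpose(a);
        double[][] out = new double[bt.length][at.length];
        for (int i = 0; i < bt.length; i++) {
            for (int j = 0; j < at.length; j++) {
                out[i][j] = measure.applyAsDouble(bt[i], at[j]);
            }
        }
        return out;
    }

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[0].length; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }

    /** Standard dot product; with this measure the result is exactly B'A. */
    static double dot(double[] x, double[] y) {
        double sum = 0;
        for (int k = 0; k < x.length; k++) {
            sum += x[k] * y[k];
        }
        return sum;
    }

    /** Example alternative measure; returns 0 when either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        double norms = Math.sqrt(dot(x, x)) * Math.sqrt(dot(y, y));
        return norms == 0 ? 0.0 : dot(x, y) / norms;
    }

    public static void main(String[] args) {
        double[][] b = {{1, 0, 0}, {0, 1, 0}, {0, 0, 0}}; // purchases
        double[][] a = {{1, 1, 0}, {0, 1, 1}, {1, 0, 1}}; // views
        double[][] counts = crossSimilarity(b, a, SimilarityMultiply::dot);
        double[][] cosines = crossSimilarity(b, a, SimilarityMultiply::cosine);
        System.out.println(java.util.Arrays.deepToString(counts));
        System.out.println(java.util.Arrays.deepToString(cosines));
    }
}
```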

I really like the idea of such a cross recommender.

On 03.04.2013 08:33, Ted Dunning wrote:
> Sebastian,
> 
> What about the assumption that the matrix is symmetric?
> 
> A'A is symmetric, but B'A is not.
> 
> 
> On Wed, Apr 3, 2013 at 12:08 AM, Sebastian Schelter <[email protected]> wrote:
> 
>> RowSimilarityJob computes the top-k similar rows to each row of the
>> input matrix. You can think of it as computing A'A and sparsifying the
>> result afterwards. Furthermore, it lets you plug in a similarity measure
>> of your choice.
>> 
>> If you want to have a cooccurrence matrix, you can use
>> 
>> o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity
>> as similarity measure.
>> 
>> 
>> On 02.04.2013 23:43, Pat Ferrel wrote:
>>> Taking an idea from Ted, I'm working on a cross recommender starting
>>> from Mahout's m/r implementation of an item-based recommender. We have
>>> purchases and views for items by user. It is straightforward to create a
>>> recommender on purchases, but using views as a predictor of purchases does
>>> not work so well--giving us lower precision scores. This is, no doubt,
>>> because the events have a lot of noise: views that do not lead to purchases.
>>> 
>>> To help solve this Ted suggests we think of a recommender in two parts:
>>> 
>>> [B'B]h_p = r_p  <== standard item-based recommender using purchases
>>> [B'A]h_v = r_v  <== cross-recommender using views and purchases
>>> r = r_p + r_v   <== linear combination of the two parts is the full
>>>                     recommendation vector
>>> 
>>> These both make recommendations for purchases but method 2 makes cross
>>> recommendations based on views. [B'A] is the co-occurrence matrix of views
>>> with purchases.
>>> 
>>> From RecommenderJob the 'similarity matrix' is created by:
>>> 
>>> //calculate the co-occurrence matrix
>>>     ToolRunner.run(getConf(), new RowSimilarityJob(), new String[]{
>>>         "--input", new Path(prepPath, PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
>>>         "--output", similarityMatrixPath.toString(),
>>>         "--similarityClassname", similarityClassname,
>>>     …
>>> 
>>> What is the role of RowSimilarityJob here and how does it lead to a
>>> co-occurrence matrix? I understand that in the general recommender the
>>> co-occurrence matrix is symmetric so columns = rows. Is the co-occurrence
>>> matrix actually calculated anywhere in the standard recommender?
>>> 
>>> The output of PreparePreferenceMatrixJob is a DistributedRowMatrix. As a
>>> first cut it seems I can do the cross recommender part of the work by:
>>> 
>>> //calculate the 'cross' co-occurrence matrix
>>>     B = PreparePreferenceMatrixJob using user purchase prefs
>>>     A = PreparePreferenceMatrixJob using user view prefs
>>>     // note that users and items must be the same for A and B, their
>>>     // ids must map to the same things
>>>     B' = TransposeJob on B
>>>     [B'A] = MatrixMultJob on B', A
>>>     [B'A]h_v by using the partial multiply process in the standard
>>>     Recommender
>>>     extract the needed recs
>>> 
>>> Questions:
>>> *  I need to get item similarities perhaps even more importantly than
>>>    user history based recs. I use the [B'A] columns for this, right?
>>>    Shouldn't I run RowSimilarityJob on [B'A]'?
>>> *  There are assumptions in some code that the co-occurrence matrix is
>>>    symmetric and so rows = columns. This is not true of the 'cross'
>>>    co-occurrence matrix. Are there places I need to account for this?
>>> 
>> 
>> 
> 


