Apologies for thrashing. I'm definitely doing some mental looping, but look at 
the cross-similarities on the "Template" sheet of the Excel file. The rows of 
[B'A] intuitively look best.

Specifically, there was a user who viewed the Surface and Nexus; the columns 
do not account for that, but the rows do.

Going from rows to columns is the trivial addition of a transpose, so I'm going 
to go ahead with rows for now. This affects only the 
cross_action_similar_items field, and so only the cross-recommender part of the 
whole.

On Aug 2, 2013, at 8:00 AM, Pat Ferrel <[email protected]> wrote:

I put some thought into this (actually I slept on it) and I think the answer is 
in the math.

-- A = matrix of action2 by user, used for cross-action recommendations, for 
instance action2 = views.
-- B = matrix of action1 by user; these are the primary recommender's actions, 
for instance action1 = purchases.
-- H_a1 = all user history of action1 in column vectors. This may be all 
action1's recorded and so may = B' or it may have truncated history to get more 
recent activity in recs.
-- H_a2 = all user history of action2 in column vectors. This may be all 
action2's recorded and so may = A' or it may have truncated history to get more 
recent activity in recs.
-- [B'B]H_a1 = R_a1, recommendations from action1. Recommendations are for 
action1.
-- [B'A]H_a2 = R_a2, recommendations calculated from action2 where there was 
also an action1. Recommendations are for action1.
-- R_a1 + R_a2 = R, assumes a non-weighted linear combination; ideally they are 
weighted to optimize results.
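
A minimal sketch of the algebra above, in plain Python. The toy data is 
hypothetical, and it assumes rows of A and B are users and columns are items, so 
that B'B and B'A come out item by item and H_a1 = B', H_a2 = A' hold one user 
per column. The helper names are made up for illustration.

```python
# Hypothetical toy data. Assumption: rows are users, columns are items.
ITEMS = ["iphone", "ipad", "nexus"]

B = [  # action1 = purchases
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
A = [  # action2 = views
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product x * y."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


BtB = matmul(transpose(B), B)  # [B'B], item x item, symmetric
BtA = matmul(transpose(B), A)  # [B'A], item x item, not symmetric in general

H_a1 = transpose(B)  # full action1 history, one user per column
H_a2 = transpose(A)  # full action2 history, one user per column

R_a1 = matmul(BtB, H_a1)  # recommendations from action1
R_a2 = matmul(BtA, H_a2)  # cross-action recommendations from action2
R = add(R_a1, R_a2)       # non-weighted linear combination

for item, scores in zip(ITEMS, R):
    print(item, scores)  # one score per user for this item
```

Each column of R is one user's recommendation scores over the action1 items. 
A weighted combination would replace add() with a weighted sum.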

The query on [B'A] will be column vectors from H_a2. Each is a user's history 
of action2 on A items. That is, if there were different items in A than in B, 
the query would be composed of those items and run against the field that 
contains those items. This brings up a bunch of other questions, but for now we 
do not have separate items.

It illustrates the fact that the query is user history of action2 so the items 
(though they have the same ID space in this case) should be from A or there 
would be no hits.

Therefore we need the columns of [B'A] and of [B'B]. [B'B] is symmetric, so its 
rows are the same as its columns.
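
To make the rows-versus-columns difference concrete, here is a small sketch 
using the [B'A] values quoted from the Jul 30 message further down this thread. 
The helper names are hypothetical; only the matrix comes from the thread.

```python
# [B'A] as quoted in the Jul 30 message in this thread.
ITEMS = ["iphone", "ipad", "nexus", "galaxy", "surface"]
BtA = [
    [2, 2, 2, 1, 0],
    [2, 2, 2, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
]


def row(m, item):
    """Row of m for the given item, keyed by item name."""
    return dict(zip(ITEMS, m[ITEMS.index(item)]))


def column(m, item):
    """Column of m for the given item, keyed by item name."""
    j = ITEMS.index(item)
    return dict(zip(ITEMS, [r[j] for r in m]))


def is_symmetric(m):
    """True if m equals its own transpose."""
    return m == [list(c) for c in zip(*m)]


print(is_symmetric(BtA))
print(row(BtA, "nexus"))
print(column(BtA, "nexus"))
```

Because [B'A] is not symmetric, the row and the column for the same item can 
differ, so the choice matters here in a way it does not for [B'B].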

The confusion may come from the fact that Ted's mental model does not have the 
same items for both A and B. So the document ID cannot = item ID since the docs 
contain items from both item ID spaces. In which case I don't know why they 
would be in the same doc at all but that is another discussion. This model does 
not allow us to fetch a doc by ID.

But in our case, since we have the same IDs in A and B, we can put them in a 
doc of ID=item ID. The similar_items field can contain items from the rows of 
B's similarity matrix, [B'B], since they are the same as the columns; the 
cross_action_similar_items field will contain columns from [B'A].
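
A rough sketch of that doc layout, with each doc as a plain Python dict rather 
than anything Solr-specific. The matrices are hypothetical toy values, and 
storing each field as a space-separated list of the item IDs with nonzero 
entries is an assumption for illustration. The cross_from switch is also made 
up; it is there only because going between rows and columns is a transpose.

```python
# Hypothetical item-by-item matrices; same item ID space for both actions.
ITEMS = ["iphone", "ipad", "nexus"]
BtB = [[1, 1, 0], [1, 2, 0], [0, 0, 1]]  # symmetric
BtA = [[1, 1, 1], [1, 2, 2], [1, 0, 1]]  # not symmetric


def nonzero_items(weights):
    """Space-separated IDs of items whose weight is nonzero (assumed format)."""
    return " ".join(item for item, w in zip(ITEMS, weights) if w > 0)


def build_docs(btb, bta, cross_from="columns"):
    """Build one doc per item, with doc ID = item ID."""
    docs = []
    for i, item in enumerate(ITEMS):
        if cross_from == "columns":
            cross = [r[i] for r in bta]  # column i of [B'A]
        else:
            cross = bta[i]               # row i of [B'A]
        docs.append({
            "id": item,
            # Rows of [B'B] equal its columns, so either works here.
            "similar_items": nonzero_items(btb[i]),
            "cross_action_similar_items": nonzero_items(cross),
        })
    return docs


docs_by_id = {d["id"]: d for d in build_docs(BtB, BtA)}
print(docs_by_id["ipad"])
```

Since the doc ID is the item ID, a doc can be fetched directly by ID, and the 
same fields can still be searched with a user's history as the query.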

This may just be mental looping--sleep only works about 50% of the time for me--
so maybe someone else can check this reasoning. Have a look at the data here 
https://github.com/pferrel/solr-recommender/blob/master/src/test/resources/Recommender%20Math.xlsx


On Aug 1, 2013, at 6:00 PM, Pat Ferrel <[email protected]> wrote:

Yes, storing the similar_items in one field and cross_action_similar_items in 
another, all on the same doc identified by item ID. Agree that there may be 
other fields.

Storing the rows of [B'B] is OK because it's symmetric. However, we did talk 
about the [B'A] case, and I thought we agreed to store the rows there too 
because they were from B's items. This was the discussion about having 
different items for cross actions. The excerpt below is Ted responding to my 
question. So do we want the columns of [B'A]? It's only a transpose away.


> On Tue, Jul 30, 2013 at 11:11 AM, Pat Ferrel <[email protected]> wrote:
> [B'A] =
>       iphone  ipad    nexus   galaxy  surface
> iphone  2       2       2       1       0
> ipad    2       2       2       1       0
> nexus   1       1       1       1       0
> galaxy  1       1       1       1       0
> surface 0       0       0       0       1
> 
> The rows are what we want from [B'A] since the row items are from B, right?
> 
> Yes.
> 
> It is easier to understand if you have different kinds of items as well as 
> different actions.  For instance, suppose that you have user x query terms 
> (A) and user x device (B).  B'A is then device x term so that there is a row 
> per device and the row contains terms.  This is good when searching for 
> devices using terms.


Talking about getting the actual doc field values, which will include the 
similar_items field and other metadata. The actual IDs in the similar_items 
field work well for anonymous/no-history recs, but maybe there is a second query 
or fetch that I'm missing? I assumed that a fetch of the doc and its fields 
by item ID was as fast a way to do this as possible. If there is some way to 
get the same result faster by doing a query, I'm all for it.

Can do tomorrow at 2.

