I put some thought into this (actually I slept on it) and I think the answer is in the math.
-- A = matrix of action2 by user, used for cross-action recommendations; for instance, action2 = views.
-- B = matrix of action1 by user; these are the primary recommender's actions, for instance action1 = purchases.
-- H_a1 = all user history of action1 in column vectors. This may be all recorded action1s, and so may = B', or it may be truncated history to get more recent activity into recs.
-- H_a2 = all user history of action2 in column vectors. This may be all recorded action2s, and so may = A', or it may be truncated history to get more recent activity into recs.
-- [B'B]H_a1 = R_a1, recommendations from action1. The recommendations are for action1.
-- [B'A]H_a2 = R_a2, recommendations calculated from action2 where there was also an action1. The recommendations are for action1.
-- R_a1 + R_a2 = R. This assumes a non-weighted linear combination; ideally the two are weighted to optimize results.

The query on [B'A] will be column vectors from H_a2. Each is a user's history of action2 on A's items. That is, if there were different items in A than in B, the query would be made up of those items and run against the field that contains them. This brings up a bunch of other questions, but for now we do not have separate items. It illustrates that the query is the user's history of action2, so the items (though they share the same ID space in this case) must come from A or there would be no hits. Therefore we need the columns of [B'A], and of [B'B]. [B'B] is symmetric, so its rows are the same as its columns.

The confusion may come from the fact that Ted's mental model does not have the same items for both A and B. In that model the document ID cannot be the item ID, since the docs contain items from both item ID spaces. In that case I don't know why they would be in the same doc at all, but that is another discussion. That model does not allow us to fetch a doc by ID.
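To make the definitions above concrete, here is a small pure-Python sketch of R = [B'B]H_a1 + [B'A]H_a2. It assumes A and B are stored user x item (one row per user), that both actions share one item ID space, and that the full history is used, so H_a1 = B' and H_a2 = A'. The 3x3 data is invented purely for illustration.

```python
# Sketch of R = [B'B]H_a1 + [B'A]H_a2 on tiny, invented data.
# Assumption: A and B are user x item (rows = users, columns = items),
# and both actions use the same item ID space, as in this thread.


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product x * y."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


# Hypothetical data: 3 users x 3 items.
B = [[1, 1, 0],  # action1, e.g. purchases
     [0, 1, 1],
     [1, 0, 0]]
A = [[1, 1, 1],  # action2, e.g. views
     [0, 1, 1],
     [1, 1, 0]]

H_a1 = transpose(B)  # full history, so H_a1 = B' (one column per user)
H_a2 = transpose(A)  # full history, so H_a2 = A'

BtB = matmul(transpose(B), B)  # item x item, always symmetric
BtA = matmul(transpose(B), A)  # B items x A items, generally not symmetric

R_a1 = matmul(BtB, H_a1)  # recs for action1 from action1 history
R_a2 = matmul(BtA, H_a2)  # recs for action1 from action2 history
R = add(R_a1, R_a2)       # unweighted combination

# Column j of R holds user j's scores for every action1 item.
user0_scores = [row[0] for row in R]
print(user0_scores)  # [8, 8, 3]
```

[B'B] comes out symmetric here, while [B'A] does not, which is why the rows-versus-columns question only arises for the cross-action matrix.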
But in our case, since we have the same IDs in A and B, we can put them in a doc with ID = item ID. The similar_items field can contain the rows of the B similarity matrix [B'B], since they are the same as its columns, and the cross_action_similar_items field will contain the columns of [B'A].

This may just be mental looping; sleeping on it only works about 50% of the time for me, so maybe someone else can check this reasoning. Have a look at the data here: https://github.com/pferrel/solr-recommender/blob/master/src/test/resources/Recommender%20Math.xlsx

On Aug 1, 2013, at 6:00 PM, Pat Ferrel wrote:

Yes, storing the similar_items in a field and cross_action_similar_items in another field, all on the same doc ID'd by item ID. Agreed that there may be other fields.

Storing the rows of [B'B] is OK because it's symmetric. However, we did talk about the [B'A] case, and I thought we agreed to store the rows there too because they were from B's items. This was the discussion about having different items for cross actions. The excerpt below is Ted responding to my question. So do we want the columns of [B'A]? It's only a transpose away.

> On Tue, Jul 30, 2013 at 11:11 AM, Pat Ferrel wrote:
> [B'A] =
>            iphone    ipad   nexus  galaxy surface
> iphone          2       2       2       1       0
> ipad            2       2       2       1       0
> nexus           1       1       1       1       0
> galaxy          1       1       1       1       0
> surface         0       0       0       0       1
>
> The rows are what we want from [B'A] since the row items are from B, right?
>
> Yes.
>
> It is easier to understand if you have different kinds of items as well as
> different actions. For instance, suppose that you have user x query terms
> (A) and user x device (B). B'A is then device x term, so that there is a row
> per device and the row contains terms. This is good when searching for
> devices using terms.

I'm talking about getting the actual doc field values, which will include the similar_items field and other metadata.
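Since the question is whether the cross_action_similar_items field should hold rows or columns of [B'A], here is a minimal sketch that builds one doc per item ID from the [B'A] example matrix in the thread, both ways. It is not the solr-recommender code: the dict-based docs and the helpers build_docs and field_from are hypothetical, and each field maps an item ID to its nonzero weight.

```python
# Minimal sketch, not the solr-recommender implementation.
# Docs are plain dicts keyed by item ID; build_docs and field_from are
# hypothetical helpers. A field maps item ID -> nonzero weight.

ITEMS = ["iphone", "ipad", "nexus", "galaxy", "surface"]

# [B'A] from the thread: rows are B items, columns are A items, in ITEMS order.
BTA = [
    [2, 2, 2, 1, 0],
    [2, 2, 2, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def field_from(vector):
    """Keep only the items with a nonzero weight in this row or column."""
    return {ITEMS[i]: w for i, w in enumerate(vector) if w != 0}


def build_docs(matrix, use_columns):
    """One doc per item ID, filled from that item's row (or column) of matrix."""
    vectors = transpose(matrix) if use_columns else matrix
    return {
        item: {"id": item, "cross_action_similar_items": field_from(vec)}
        for item, vec in zip(ITEMS, vectors)
    }


row_docs = build_docs(BTA, use_columns=False)
col_docs = build_docs(BTA, use_columns=True)

print(row_docs["iphone"]["cross_action_similar_items"])
# {'iphone': 2, 'ipad': 2, 'nexus': 2, 'galaxy': 1}
print(col_docs["iphone"]["cross_action_similar_items"])
# {'iphone': 2, 'ipad': 2, 'nexus': 1, 'galaxy': 1}
```

In this particular example the nonzero pattern of [B'A] happens to be symmetric, so an IDs-only field would be identical either way and only the weights differ (nexus gets 2 in the iphone row but 1 in the iphone column). With a non-symmetric pattern the ID lists themselves would differ. For [B'B], which is symmetric, both layouts produce the same docs.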
The actual IDs in the similar_items field work well for anonymous/no-history recs, but maybe there is a second query or fetch that I'm missing? I assumed that fetching the doc and its fields by item ID was the fastest way to do this. If there is a faster way to get the same result with a query, I'm all for it.
Can do tomorrow at 2.
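A toy model of the two access paths being discussed, assuming an in-memory dict stands in for the index: anonymous recs are a single lookup by item ID, while history-based recs have to score every doc against the user's history. The overlap count used for scoring is a simplification chosen for illustration, not Solr's relevance formula; the docs and helper names are hypothetical, and nothing here measures real query latency.

```python
# Toy stand-in for the index: hypothetical docs keyed by item ID.
DOCS = {
    "iphone": {"similar_items": ["ipad", "nexus"]},
    "ipad": {"similar_items": ["iphone", "surface"]},
    "nexus": {"similar_items": ["galaxy", "iphone"]},
    "galaxy": {"similar_items": ["nexus"]},
    "surface": {"similar_items": ["ipad"]},
}


def anonymous_recs(item_id):
    """No user history: fetch one doc by ID and return its stored field."""
    return DOCS[item_id]["similar_items"]


def history_recs(history):
    """User history: score every doc by how many history items its field holds."""
    wanted = set(history)
    scores = {}
    for item_id, doc in DOCS.items():
        overlap = len(wanted.intersection(doc["similar_items"]))
        if overlap:
            scores[item_id] = overlap
    # Highest score first; ties broken by item ID so the order is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


print(anonymous_recs("iphone"))  # ['ipad', 'nexus']
print(history_recs(["ipad"]))    # ['iphone', 'surface']
```

The lookup touches one doc, while the scored query visits all of them; a real search engine uses an inverted index rather than a full scan, so this only shows the difference in what each path computes.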
