This first-cut project explicitly assumes a unified user and item space. This works well for many action pairs but not for others. I did it this way originally in order to use multiple actions for ecom recs. Views alone were not very predictive of purchases and needed the cross-recommender treatment. We did this using Mahout matrix math, so the issue of what to write to Solr did not come up. It worked fine, but now we need an online method that can use preferences generated in real time, that is, ones not in the batch training data.

The math still works for multiple item spaces but users must be in common. More generally the rank and ID space currently associated with users must be the same. Feel free to create examples if you want. Ted has some ideas for using multiple item spaces in presos that are on Slideshare I think.

On Aug 2, 2013, at 10:13 AM, B Lyon <[email protected]> wrote:

I think the sheet is very helpful. I was wondering about having at least one of the examples be where the actions deal with completely different things to maybe make it easier for newbies like me to grok the main points: purchases of items of type blah and views of videos, say. I think the input file has the same setup etc. I don't get the issue/questions that come up when we do have separate items. And I thought Ted mentioned at one point that the weighting of recommendation vectors might not be necessary based on some kind of solr magic, but I have no idea what that is.

Btw, I was already thinking of doing something for my own clarification/edification that is similar to your spreadsheet, but would be a web page where a mouseover on one piece highlights the other pieces that generated it, e.g. the way the links in this pagerank explorer highlight the relevant portions of the google matrix (https://googledrive.com/host/0B2GQktu-wcTiaWw5OFVqT1k3bDA/). There are lots of other different pieces here of course, but it would show connections soup-to-nuts as much as possible.

On Friday, August 2, 2013, Pat Ferrel wrote:

> I put some thought into this (actually I slept on it) and I think the
> answer is in the math.
>
> -- A = matrix of action2 by user, used for cross-action recommendations,
>    for instance action2 = views.
> -- B = matrix of action1 by user, these are the primary recommender's
>    actions, for instance action1 = purchases.
> -- H_a1 = all user history of action1 in column vectors. This may be all
>    action1's recorded and so may = B' or it may have truncated history
>    to get more recent activity in recs.
> -- H_a2 = all user history of action2 in column vectors. This may be all
>    action2's recorded and so may = A' or it may have truncated history
>    to get more recent activity in recs.
> -- [B'B]H_a1 = R_a1, recommendations from action1. Recommendations are
>    for action1.
> -- [B'A]H_a2 = R_a2, recommendations calculated from action2 where there
>    was also an action1. Recommendations are for action1.
> -- R_a1 + R_a2 = R, assumes a non-weighted linear combination; ideally
>    they are weighted to optimize results.
>
> The query on [B'A] will be column vectors from H_a2. Each is a user's
> history of action2 on A items. That is, if there were different items in
> A than B then the query would be composed of those items and run against
> the field that contains those items. This brings up a bunch of other
> questions but for now we do not have separate items.
>
> It illustrates the fact that the query is user history of action2, so
> the items (though they have the same ID space in this case) should be
> from A or there would be no hits.
>
> Therefore we need the columns of [B'A] and [B'B]. [B'B] is symmetric so
> rows are the same as columns.
>
> The confusion may come from the fact that Ted's mental model does not
> have the same items for both A and B. So the document ID cannot = item
> ID since the docs contain items from both item ID spaces. In which case
> I don't know why they would be in the same doc at all, but that is
> another discussion. This model does not allow us to fetch a doc by ID.
>
> But in our case, since we have the same IDs in A and B, we can put them
> in a doc of ID = item ID. The similar_items field can contain items from
> the rows of B's similarity matrix, since they are the same as the
> columns, and the cross_action_similar_items field will contain columns
> from [B'A].
>
> This may just be mental looping--sleep only works about 50% of the time
> for me so maybe someone else can check this reasoning.
> Have a look at the data here:
> https://github.com/pferrel/solr-recommender/blob/master/src/test/resources/Recommender%20Math.xlsx
>
> On Aug 1, 2013, at 6:00 PM, Pat Ferrel <[email protected]> wrote:
>
> Yes, storing the similar_items in a field, cross_action_similar_items in
> another field, all on the same doc IDed by item ID. Agree that there may
> be other fields.
>
> Storing the rows of [B'B] is ok because it's symmetric. However we did
> talk about the [B'A] case and I thought we agreed to store the rows
> there too because they were from B's items. This was the discussion
> about having different items for cross actions. The excerpt below is Ted
> responding to my question. So do we want the columns of [B'A]? It's only
> a transpose away.
>
>> On Tue, Jul 30, 2013 at 11:11 AM, Pat Ferrel <[email protected]> wrote:
>> [B'A] =
>>            iphone  ipad  nexus  galaxy  surface
>> iphone     2       2     2      1       0
>> ipad       2       2     2      1       0
>> nexus      1       1     1      1       0
>> galaxy     1       1     1      1       0
>> surface    0       0     0      0       1
>>
>> The rows are what we want from [B'A] since the row items are from B,
>> right?
>>
>> Yes.
>>
>> It is easier to understand if you have different kinds of items as well
>> as different actions. For instance, suppose that you have user x query
>> terms (A) and user x device (B). B'A is then device x term so that
>> there is a row per device and the row contains terms. This is good when
>> searching for devices using terms.
>
> Talking about getting the actual doc field values, which will include
> the similar_items field and other metadata. The actual IDs in the
> similar_items field work well for anonymous/no-history recs but maybe
> there is a second query or fetch that I'm missing? I assumed that a
> fetch of the doc and its fields by item ID was as fast a way to do this
> as possible. If there is some way to get the same result faster by doing
> a query, I'm all for it.
>
> Can do tomorrow at 2.

--
BF Lyon
http://www.nowherenearithaca.com
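
For anyone who wants to check the reasoning by hand, here is a minimal Python sketch of the math in Pat's Aug 2 message, using the case B Lyon asked about: purchases and views over different item spaces with users in common. The users, devices, videos, and every matrix value are made up for illustration and are not taken from the spreadsheet. Rows of B and A are users, following Ted's "user x device" description, so B'A comes out as device x video. Raw co-occurrence counts stand in for whatever similarity weighting a real job would apply.

```python
# Toy data (hypothetical): 4 users, 3 devices (action1 = purchase),
# 2 videos (action2 = view). Users are the shared dimension.
devices = ["iphone", "ipad", "nexus"]
videos = ["video_a", "video_b"]

# B: user x device purchases, one row per user.
B = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
]

# A: user x video views, same users in the same row order.
A = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two list-of-rows matrices."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols]
            for row in x]


def matvec(m, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


Bt = transpose(B)
BtB = matmul(Bt, B)  # device x device, symmetric
BtA = matmul(Bt, A)  # device x video, one row per device

# One user's history: bought an iphone, viewed video_b.
h_a1 = [1, 0, 0]  # over devices (B's item space)
h_a2 = [0, 1]     # over videos (A's item space)

r_a1 = matvec(BtB, h_a1)  # [B'B]h_a1, scores over devices
r_a2 = matvec(BtA, h_a2)  # [B'A]h_a2, also scores over devices
r = [x + y for x, y in zip(r_a1, r_a2)]  # unweighted sum R_a1 + R_a2

for name, score in zip(devices, r):
    print(name, score)
```

Both r_a1 and r_a2 are indexed by device even though h_a2 is indexed by video, which is why the query built from action2 history has to use A's item IDs while the results are B's items.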
