Yes, your only issue there, which I think you touched on, is that you have to put the current cart (which hasn't been purchased) into the model in order to get an answer out of a recommender. I think we've talked about a recommend-to-anonymous function in the context of another system; that is exactly what you need here.
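In Mahout's Taste API that is what PlusAnonymousUserDataModel is for. A minimal, untested sketch only (the class and method names are just for illustration, it assumes the data model already holds the cart x item data, and log-likelihood is just one plausible similarity choice):

import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.impl.model.PlusAnonymousUserDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class AnonymousCartRecommender {

  // Treat the current cart as a temporary, anonymous "user" and recommend for it.
  public static List<RecommendedItem> recommendForCart(DataModel cartsByItems,
                                                       long[] cartItemIds,
                                                       int howMany) throws TasteException {
    PlusAnonymousUserDataModel plusModel = new PlusAnonymousUserDataModel(cartsByItems);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(plusModel, new LogLikelihoodSimilarity(plusModel));

    // The anonymous user's preferences are just the items currently in the cart.
    PreferenceArray tempPrefs = new GenericUserPreferenceArray(cartItemIds.length);
    for (int i = 0; i < cartItemIds.length; i++) {
      tempPrefs.setUserID(i, PlusAnonymousUserDataModel.TEMP_USER_ID);
      tempPrefs.setItemID(i, cartItemIds[i]);
      tempPrefs.setValue(i, 1.0f); // "item is in the cart"; there is no rating here
    }

    // PlusAnonymousUserDataModel holds one temp user at a time, so serialize access.
    synchronized (plusModel) {
      plusModel.setTempPrefs(tempPrefs);
      try {
        return recommender.recommend(PlusAnonymousUserDataModel.TEMP_USER_ID, howMany);
      } finally {
        plusModel.clearTempPrefs();
      }
    }
  }
}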
Yes, all you have to do then is reproduce the recommender computation, though I understand you were hoping to avoid rewriting it. It's really just a loop, so not much work to reproduce: 100K items x a few items in a cart is a few hundred thousand similarity computations. That isn't trivial, but it isn't going to take whole seconds, I think. (I've sketched that loop below the quoted thread.)

Yes, this gets much faster if you can precompute item-item similarity, but computing all NxN pairs is going to take a long time when N = 100,000. So yes, something like clustering is the nice way to scale that: the clusters greatly limit the number of candidates to consider, because you can round every inter-cluster similarity down to 0.

By this point I imagine it's about as much work to whip up a frequent itemset implementation, or to crib one and adapt it. This is in Mahout, and it's probably the right tool for the job. (A naive co-occurrence version of that idea is sketched at the very bottom too.)

On Thu, Feb 14, 2013 at 8:19 PM, Pat Ferrel <[email protected]> wrote:

> I'm creating a matrix of cart IDs and item IDs, so carts x items in cart.
> The 'preference' then is (cartID, itemID). This will create the correct
> matrix, I think.
>
> For any cart ID I would get a ranked list of recommended items that was
> calculated from other carts. This seems like what is needed in a
> shopping-cart recommender, so doing this should give a "recommend to this
> collection of items", right?
>
> The only issue is finding the best cart to get the recs from. I would be
> doing a pairwise similarity comparison of N carts against the current cart
> contents, and the result would have to come back in a very short amount of
> time, on the order of the time to get recs for 3M users and 100K items.
>
> Not sure what N is yet, but the number of items is the same as in the
> purchase matrix. So finding the best cart to get recs from will be N
> similarity comparisons, worst case. Each cart is likely to have only a few
> items in it, and I imagine this speeds up the similarity calculation.
>
> I guess I'll try it as described and optimize for speed if the precision
> is good compared to the apriori algo.
>
> On Feb 14, 2013, at 10:57 AM, Sean Owen <[email protected]> wrote:
>
> I don't think it's necessarily slow; this is how item-based recommenders
> work. The only thing stopping you from using Mahout directly is that I
> don't think there's an easy way to say "recommend to this collection of
> items". But that's what is happening inside when you recommend for a user.
>
> You can just roll your own version of it. Yes, you are computing
> similarity of k carted items against all N items, but is N so large?
> Hundreds of thousands of products? This is still likely pretty fast, even
> if the similarity is computed over millions of carts. Some smart
> precomputation and caching goes a long way too.
>
> On Thu, Feb 14, 2013 at 7:10 PM, Pat Ferrel <[email protected]> wrote:
>
> > Yes, one time-tested way to do this is the "apriori" algo, which looks
> > at frequent item sets and creates rules.
> >
> > I was looking for a shortcut using a recommender, which would be super
> > easy to try. The rule builder is a little harder to implement, but we
> > can also test precision on that and compare the two.
> >
> > The recommender method below should be reasonable AFAICT, except for
> > the method(s) of retrieving recs, which seem likely to be slow.
> >
> > On Feb 14, 2013, at 9:45 AM, Sean Owen <[email protected]> wrote:
> >
> > This sounds like a job for frequent item set mining, which is kind of a
> > special case of the ideas you've mentioned here. Given N items in a
> > cart, which next item most frequently occurs in a purchased cart?
> > On Thu, Feb 14, 2013 at 6:30 PM, Pat Ferrel <[email protected]> wrote:
> >
> >> I thought you might say that, but we don't have the add-to-cart action.
> >> We have to calculate cart purchases by matching cart IDs or session
> >> IDs, so we only have cart purchases with items.
> >>
> >> If we had both the add-to-cart and the purchase, we could use your
> >> cross-action method for getting recs by training only on those two
> >> actions.
> >>
> >> Still, without the add-to-cart, the method below should work, right?
> >> The main problem being finding a similar cart in the training set
> >> quickly. Are there other problems?
> >>
> >> On Feb 14, 2013, at 9:19 AM, Ted Dunning <[email protected]> wrote:
> >>
> >> I think that this is an excellent use case for cross recommendation
> >> from cart contents (items) to cart purchases (items). The cross aspect
> >> is that the recommendation is from two different kinds of actions, not
> >> two kinds of things. The first action is insertion into a cart and the
> >> second is purchase of an item.
> >>
> >> On Thu, Feb 14, 2013 at 9:53 AM, Pat Ferrel <[email protected]> wrote:
> >>
> >>> There are several methods for recommending things given a shopping
> >>> cart's contents. At the risk of using the same tool for every problem,
> >>> I was thinking about a recommender's use here.
> >>>
> >>> I'd do something like train on shopping cart purchases, so row =
> >>> cartID, column = itemID. Given cart contents, I could find the most
> >>> similar cart in the training set by using a similarity measure, then
> >>> get recs for this closest-matched cart.
> >>>
> >>> The search for similar carts may be slow if I have to check for
> >>> pairwise similarity, so I could cluster, find the best cluster, then
> >>> search it for the best cart. I could create a decision tree on all
> >>> trained carts and walk as far as I can down the tree to find the cart
> >>> with the most cooccurrences. There may be other cooccurrence-based
> >>> methods in Mahout??? With the ID of the cart I can then get recs from
> >>> the training set. I could also fold the new cart contents into the
> >>> training set and ask for recs based on it (this seems like it would
> >>> take a long time to compute). This last would also pollute the trained
> >>> matrix with partial carts over time.
> >>>
> >>> This seems like another place where Lucene might help, but are there
> >>> other Mahout methods to look at before I dive into Lucene?
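Here is the loop I mentioned above, sketched minimally and untested: score each candidate item by its summed similarity to the items in the cart, then take the top few. The ItemSimilarity interface and all the names here are stand-ins for whatever precomputed or cached item-item similarity you end up with:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CartScoringSketch {

  // Stand-in for a precomputed or cached item-item similarity lookup.
  interface ItemSimilarity {
    double similarity(long itemA, long itemB);
  }

  static final class Scored {
    final long itemId;
    final double score;
    Scored(long itemId, double score) {
      this.itemId = itemId;
      this.score = score;
    }
  }

  // |items| x |cart| similarity lookups: ~100K x a few, per the numbers above.
  static List<Scored> recommend(long[] allItemIds, long[] cartItemIds,
                                ItemSimilarity sim, int howMany) {
    List<Scored> scored = new ArrayList<>(allItemIds.length);
    for (long candidate : allItemIds) {
      if (contains(cartItemIds, candidate)) {
        continue; // don't recommend what's already in the cart
      }
      double total = 0.0;
      for (long inCart : cartItemIds) {
        total += sim.similarity(candidate, inCart);
      }
      scored.add(new Scored(candidate, total));
    }
    scored.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
    return scored.subList(0, Math.min(howMany, scored.size()));
  }

  private static boolean contains(long[] ids, long id) {
    for (long x : ids) {
      if (x == id) {
        return true;
      }
    }
    return false;
  }
}

With precomputed similarities this is a few hundred thousand lookups, which is consistent with the sub-second guess above.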

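And the frequent itemset idea in its most naive form, again just an untested sketch: count how often each pair of items appears together in purchased carts, then rank candidates by their total co-occurrence with the current cart. Real frequent itemset mining (Mahout's FPGrowth, for instance) is considerably smarter than this pairwise counting:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CooccurrenceSketch {

  // item -> (other item -> number of purchased carts containing both)
  private final Map<Long, Map<Long, Integer>> cooccurrence = new HashMap<>();

  // Tally pairwise co-occurrence over historical purchased carts.
  public void train(Iterable<long[]> purchasedCarts) {
    for (long[] cart : purchasedCarts) {
      for (int i = 0; i < cart.length; i++) {
        for (int j = 0; j < cart.length; j++) {
          if (i != j) {
            cooccurrence.computeIfAbsent(cart[i], k -> new HashMap<>())
                        .merge(cart[j], 1, Integer::sum);
          }
        }
      }
    }
  }

  // Rank candidates by how often they co-occurred with the current cart's items.
  public List<Long> recommend(long[] cartItemIds, int howMany) {
    Map<Long, Integer> scores = new HashMap<>();
    for (long inCart : cartItemIds) {
      Map<Long, Integer> counts = cooccurrence.get(inCart);
      if (counts != null) {
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
          scores.merge(e.getKey(), e.getValue(), Integer::sum);
        }
      }
    }
    for (long inCart : cartItemIds) {
      scores.remove(inCart); // already in the cart
    }
    List<Long> ranked = new ArrayList<>(scores.keySet());
    ranked.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
    return ranked.subList(0, Math.min(howMany, ranked.size()));
  }
}

The per-cart tallying is quadratic in cart size, but that stays cheap for the same reason noted above: carts only hold a few items.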