Ted, thanks for the video - enjoyable and insightful. Gustavo, a good read, and a reminder of how far I have to go. More maths later - fun!
Pat, I need to read more and take my time understanding how cut-offs in LLR-derived co-occurrence can be exploited in practice. I accept that useful real-time model updates are an edge case, but I may have to face edge cases.

I mentioned the framework I'm putting together - I didn't mention that we're a SaaS business. The product will serve multiple use cases. The cold-start capabilities of the multi-modal approach are appealing. I can see content recommendations filling the gap while we build the user-item model - this won't work for all product types, of course. There are clients whose 'products' are fairly short-lived, where the initial burst of user-item interactions would definitely be useful. I take your point that small increment sets might not impact the model in most cases.

My take-out from the responses so far is that the real-time question can wait until phase n of the project without sacrificing much value. I'm looking forward to learning what is possible - I see what you are saying about the mutable vectors.

The great thing now for me is that I can do an end-to-end proof of concept mostly by doing framework plumbing. Maybe I'll look into doing multiple cross-cooccurrence indicators in one pass via the ItemSimilarityDriver, but once we get the basics functioning, we'll probably be looking to engage a Ted or a Pat if we can afford them :D

There is one final challenge for today that I have not figured out, though. Let's say I have a new client (client #2), who sells shoes. Let's say I have an existing client (client #1), for whom we have captured a million user-view/purchase interactions. How can I recommend to client #2 based on the model built from client #1? The items in their respective inventories are similar by content, but not identical. So I need to map the content similarities across the product data sets and then, via that mapping, apply pseudo-collaborative filtering to client #2's customers.

Thoughts?

Many thanks for your time once again.
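P.S. To make the last question concrete, here is a rough sketch of the idea in Python. Everything in it is an assumption for illustration only: the item ids and descriptions are made up, token-overlap (Jaccard) similarity is just a stand-in for a real content similarity, the 0.3 threshold is arbitrary, and the helper names are hypothetical. The indicator lists stand in for whatever the item similarity job would produce for client #1; none of this is Mahout API.

```python
from collections import Counter

def tokens(text):
    # Crude content representation: the set of lower-cased words.
    return set(text.lower().split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def map_items(source_items, target_items, threshold=0.3):
    """Map each source item to its most content-similar target item,
    keeping only matches at or above the (arbitrary) threshold."""
    mapping = {}
    for s_id, s_text in source_items.items():
        best_id, best_sim = None, 0.0
        for t_id, t_text in target_items.items():
            sim = jaccard(tokens(s_text), tokens(t_text))
            if sim > best_sim:
                best_id, best_sim = t_id, sim
        if best_id is not None and best_sim >= threshold:
            mapping[s_id] = best_id
    return mapping

def recommend(history, c2_to_c1, c1_to_c2, c1_indicators, top_n=3):
    """Translate a client #2 history into client #1 items, look up client #1's
    co-occurrence indicators, and translate the hits back to client #2 items."""
    scores = Counter()
    for c2_item in history:
        c1_item = c2_to_c1.get(c2_item)
        if c1_item is None:
            continue  # no content match, so nothing to borrow for this item
        for similar_c1 in c1_indicators.get(c1_item, []):
            candidate = c1_to_c2.get(similar_c1)
            if candidate is not None and candidate not in history:
                scores[candidate] += 1
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical catalogues and indicators (client #1 is the established one).
client1_items = {
    "a1": "red leather running shoe",
    "a2": "white cotton sports sock",
    "a3": "black leather office shoe",
}
client2_items = {
    "b1": "red running shoe mesh",
    "b2": "white sports sock pack",
    "b3": "brown suede boot",
}
c1_indicators = {"a1": ["a2"], "a2": ["a1"], "a3": []}

c2_to_c1 = map_items(client2_items, client1_items)
c1_to_c2 = map_items(client1_items, client2_items)
recs = recommend(["b1"], c2_to_c1, c1_to_c2, c1_indicators)
```

Items with no content match (the boot here) simply get no borrowed recommendations, which is where I'd expect plain content recommendations to take over until client #2 has its own interaction data.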
On 22 June 2015 at 01:32, Pat Ferrel <[email protected]> wrote:

> Actually Mahout’s item and row similarity calculate the cooccurrence and
> cross-cooccurrence matrices; a search engine performs the knn calc to
> return an ordered list of recs. The search query is the user history; the
> search engine calculates the most similar items from the cooccurrence and
> cross-cooccurrence matrices by keeping them in different fields. This means
> there is only one query across several matrices. Solr and Elasticsearch are
> well known for speed and scalability in serving these queries.
>
> In a hypothetical incremental model we might use the search engine as
> matrix storage, since an incremental update to the matrix would be indexed
> in realtime by the engine. The update method Ted mentions is relatively
> simple and only requires that the cooccurrence matrices be mutable and two
> mutable vectors be kept in memory (item/column and user/row interaction
> counts).
>
> On Jun 19, 2015, at 6:47 PM, Gustavo Frederico <
> [email protected]> wrote:
>
> James,
>
> From my days at the university I remember reinforcement learning (
> https://en.wikipedia.org/wiki/Reinforcement_learning )
> I suspect reinforcement learning is interesting to explore in the problem
> of e-commerce recommendation. My academic stuff is really rusty, but it's
> one of the few models that represent well the synchronous/asynchronous
> problem that we see in e-commerce systems...
> The models I'm seeing with Mahout + Solr (by MapR et al.) have Solr do
> the work to calculate the co-occurrence indicators. So the fact Solr is
> indexing this 'from scratch' during offline learning 'throws the whole
> model into the garbage soon' and doesn't leave room for the
> optimization/reward step of reinforcement learning. I don't know if someone
> could go on the theoretical side and tell us if perhaps there's a 'mapping'
> between the reinforcement learning model and the traditional off-line
> training + on-line testing. Maybe there's a way to shorten the Solr
> indexing cycle, but I'm not sure how to 'inject' the reward in the model...
> just some thoughts...
>
> cheers
>
> Gustavo
>
> On Fri, Jun 19, 2015 at 5:35 AM, James Donnelly <[email protected]>
> wrote:
>
> > Hi,
> >
> > First of all, a big thanks to Ted and Pat, and all the authors and
> > developers around Mahout.
> >
> > I'm putting together an eCommerce recommendation framework, and have a
> > couple of questions from using the latest tools in Mahout 1.0.
> >
> > I've seen it hinted by Pat that real-time updates (incremental learning)
> > are made possible with the latest Mahout tools here:
> >
> > http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> >
> > But once I have gone through the first phase of data processing, I'm not
> > clear on the basic direction for maintaining the generated data, e.g. with
> > added products and incremental user behaviour data.
> >
> > The only way I can see is to update my input data, then re-run the entire
> > process of generating the similarity matrices using the itemSimilarity and
> > rowSimilarity jobs. Is there a better way?
> >
> > James
