James,

This isn't an answer to your last question ...
You have an excellent summary there. The only thing that you may have missed is that using cooccurrence/search-based recommendations allows you to improve results precisely because it gets you out of the business of tweaking algorithms and into the business of determining which data works better for your particular situation.

Algorithmic tweaks have very, very limited upside. Getting better data has >100% potential for improvement. It is very important to get your recs team out of the first, low-value activity of tweaking algorithms and into the second, high-value activity of collecting and evaluating data.

On Mon, Jun 22, 2015 at 9:00 AM, James Donnelly <[email protected]> wrote:

> Ted, thanks for the video - enjoyable and insightful.
>
> Gustavo, a good read, and a reminder of how far I have to go. More maths later - fun!
>
> Pat, I need to read more and take my time understanding how cut-offs in LLR-derived co-occurrence can be exploited in practice. I accept that useful real-time model updates are an edge case, but I may have to face edge cases.
>
> I mentioned the framework I'm putting together - I didn't mention that we're a SaaS business. The product will serve multiple use cases.
>
> The cold-start capabilities of the multi-modal approach are appealing. I can see content recommendations filling the gap while we build the user-item model - this won't work for all product types, of course.
>
> There are clients whose 'products' are fairly short-lived, where the initial burst of user-item interactions would definitely be useful. I take your point that small incremental batches might not impact the model in most cases.
>
> My takeaway from the responses so far is that the real-time question can wait until phase n of the project without sacrificing much value. I'm looking forward to learning what is possible - I see what you are saying about the mutable vectors.
>
> The great thing now for me is that I can do an end-to-end proof of concept mostly by doing framework plumbing. Maybe I'll look into doing multiple cross-cooccurrence indicators in one pass via the ItemSimilarityDriver, but once we get the basics functioning, we'll probably be looking to engage a Ted or a Pat if we can afford them :D
>
> There is one final challenge for today that I have not figured out, though. Let's say I have a new client (client #2), who sells shoes. Let's say I have an existing client (client #1), for whom we have captured a million user view/purchase interactions. How can I recommend to client #2 based on the model built from client #1?
>
> The items in their respective inventories are similar by content, but not identical. So I need to map the content similarities across the product data sets and then, via that mapping, apply pseudo-collaborative filtering to client #2's customers.
>
> Thoughts?
>
> Many thanks for your time once again.
>
> On 22 June 2015 at 01:32, Pat Ferrel <[email protected]> wrote:
>
> > Actually, Mahout's itemSimilarity and rowSimilarity calculate the cooccurrence and cross-cooccurrence matrices; a search engine performs the kNN calculation to return an ordered list of recs. The search query is the user's history; the search engine finds the most similar items from the cooccurrence matrix and the cross-cooccurrence matrices by keeping them in different fields. This means there is only one query across several matrices. Solr and Elasticsearch are well known for speed and scalability in serving these queries.
> >
> > In a hypothetical incremental model we might use the search engine as matrix storage, since an incremental update to the matrix would be indexed in real time by the engine. The update method Ted mentions is relatively simple and only requires that the cooccurrence matrices be mutable and two mutable vectors be kept in memory (item/column and user/row interaction counts).
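
For what it's worth, the bookkeeping Pat describes (mutable cooccurrence counts plus the item/column and user/row interaction counts held in memory) can be sketched in plain Scala. Everything below is hypothetical and is not an existing Mahout API; the point is only that a new interaction touches a handful of cells and that LLR can be recomputed on demand from the running counts:

import scala.collection.mutable

// Hypothetical sketch of incremental cooccurrence bookkeeping: mutable
// cooccurrence counts plus item (column) and user (row) interaction counts
// kept in memory, with LLR recomputed on demand from those counts.
class IncrementalCooccurrenceSketch {

  // Distinct (user, item) interactions seen so far, grouped per user.
  private val userHistory = mutable.Map.empty[String, mutable.Set[String]]

  // Item/column and user/row interaction counts (the two mutable vectors).
  private val itemCounts = mutable.Map.empty[String, Long].withDefaultValue(0L)
  private val userCounts = mutable.Map.empty[String, Long].withDefaultValue(0L)

  // Mutable cooccurrence counts: users who interacted with both items.
  private val cooccurrence = mutable.Map.empty[(String, String), Long].withDefaultValue(0L)

  // Record one new interaction, touching only the affected cells.
  def observe(user: String, item: String): Unit = {
    val history = userHistory.getOrElseUpdate(user, mutable.Set.empty[String])
    if (history.add(item)) {                 // ignore repeat user/item pairs
      itemCounts(item) += 1
      userCounts(user) += 1
      for (other <- history if other != item) {
        cooccurrence((item, other)) += 1
        cooccurrence((other, item)) += 1
      }
    }
  }

  // LLR score for an item pair, recomputed from the current counts.
  def llrScore(a: String, b: String): Double = {
    val numUsers = userHistory.size.toLong
    val k11 = cooccurrence((a, b))           // users with both a and b
    val k12 = itemCounts(a) - k11            // users with a but not b
    val k21 = itemCounts(b) - k11            // users with b but not a
    val k22 = numUsers - k11 - k12 - k21     // users with neither
    logLikelihoodRatio(k11, k12, k21, k22)
  }

  // Standard Dunning 2x2 log-likelihood ratio (the same test Mahout uses).
  private def xLogX(x: Long): Double = if (x == 0L) 0.0 else x * math.log(x.toDouble)
  private def entropy(counts: Long*): Double = xLogX(counts.sum) - counts.map(xLogX).sum
  private def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy    = entropy(k11 + k12, k21 + k22)
    val columnEntropy = entropy(k11 + k21, k12 + k22)
    val matrixEntropy = entropy(k11, k12, k21, k22)
    if (rowEntropy + columnEntropy < matrixEntropy) 0.0   // guard against rounding error
    else 2.0 * (rowEntropy + columnEntropy - matrixEntropy)
  }
}

In practice you would also want to cap very active users (one use for the user/row counts) and push the re-scored rows back into the index, which is where the "search engine as matrix storage" idea comes in.
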
> > On Jun 19, 2015, at 6:47 PM, Gustavo Frederico <[email protected]> wrote:
> >
> > James,
> >
> > From my days at the university I remember reinforcement learning (https://en.wikipedia.org/wiki/Reinforcement_learning). I suspect reinforcement learning is interesting to explore for the problem of e-commerce recommendation. My academic knowledge is really rusty, but it's one of the few models that represent well the synchronous/asynchronous problem that we see in e-commerce systems...
> >
> > The models I'm seeing with Mahout + Solr (by MapR et al.) have Solr do the work of calculating the co-occurrence indicators. So the fact that Solr indexes this 'from scratch' during offline learning 'throws the whole model into the garbage soon' and doesn't leave room for the optimization/reward step of reinforcement learning. I don't know if someone could weigh in on the theoretical side and tell us whether there's a 'mapping' between the reinforcement learning model and the traditional offline training + online testing. Maybe there's a way to shorten the Solr indexing cycle, but I'm not sure how to 'inject' the reward into the model... just some thoughts...
> >
> > cheers
> >
> > Gustavo
> >
> > On Fri, Jun 19, 2015 at 5:35 AM, James Donnelly <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > First of all, a big thanks to Ted and Pat, and all the authors and developers around Mahout.
> > >
> > > I'm putting together an eCommerce recommendation framework, and have a couple of questions from using the latest tools in Mahout 1.0.
> > >
> > > I've seen it hinted by Pat that real-time updates (incremental learning) are made possible with the latest Mahout tools here:
> > >
> > > http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> > >
> > > But once I have gone through the first phase of data processing, I'm not clear on the basic direction for maintaining the generated data, e.g. with added products and incremental user behaviour data.
> > >
> > > The only way I can see is to update my input data and then re-run the entire process of generating the similarity matrices using the itemSimilarity and rowSimilarity jobs. Is there a better way?
> > >
> > > James
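
One more note, on the serving side. The single query across several matrices that Pat describes, with the cooccurrence and cross-cooccurrence indicators kept in different fields of the same item documents, would look roughly like the sketch below at query time. The field names ("purchase" and "view"), the index layout, and the Scala wrapper are all made up for illustration; the real names come from however the indicator matrices were indexed.

// Hypothetical sketch only: field names, index layout, and this wrapper are
// invented; they stand in for whatever the indexing step actually produced.
object CooccurrenceQuerySketch {

  // A user's recent history, split by the action type that produced it.
  case class UserHistory(purchases: Seq[String], views: Seq[String])

  // One bool/should query: the "purchase" field holds the cooccurrence
  // indicators, the "view" field holds the cross-cooccurrence indicators,
  // and each is matched against the corresponding slice of the history.
  def buildQuery(h: UserHistory): String = {
    val purchases = h.purchases.mkString(" ")
    val views     = h.views.mkString(" ")
    s"""{ "query": { "bool": { "should": [
       |  { "match": { "purchase": "$purchases" } },
       |  { "match": { "view": "$views" } }
       |] } } }""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    val history = UserHistory(purchases = Seq("item-42"), views = Seq("item-7", "item-99"))
    // In a real system this body would be POSTed to the engine's search
    // endpoint (e.g. an Elasticsearch _search URL); here we only print it.
    println(buildQuery(history))
  }
}

The useful property is that one bool/should query ranks items against every kind of interaction in the user's history at once, which is what makes the cross-cooccurrence data helpful when the purchase history is still thin.
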
