Thank you Ted,

Though I did not get all the points, I understand that streaming records won't be worth the hassle as far as recommendations are concerned.
Meanwhile, you rang a bell when you talked about Elastic Search. I might have an idea how to use that, but that would be content based, and I need something collaborative for my use case...

Thank you once again, that was a very valuable response from you.

Regards

2014/1/11 Ted Dunning <[email protected]>

>
> Slope one is one of the few algorithms that performs so uniformly poorly
> that it is being removed from Mahout. I wouldn't recommend it for any
> applications.
>
> In general, recommendation isn't particularly well suited for on-line
> operation since there is part of the computation that substantially
> benefits from a large scale computation and part that really must be
> interactive, but is best suited to a service oriented architecture rather
> than a record streaming architecture.
>
> There are two general branches of recommendation implementations. One is
> based on matrix factorization methods. The other is based on cooccurrence
> analysis and is focused on sparsification of the cooccurrence matrix to
> produce an indicator matrix. Lately, I strongly recommend that almost all
> small or new projects use the cooccurrence based methods since they can be
> deployed using search engines. More on that in a bit.
>
> For the large scale computation, matrix factorization techniques have some
> on-line approximations, but the benefit in terms of accuracy and cost in
> terms of resources of simply redoing the entire computation at roughly one
> day tempo generally makes doing just that preferable. Especially with
> multi-modal recommendations, there is little cost for cold-starts. The
> interactive response part could easily be implemented in Storm, but there
> is little benefit over an in-memory implementation based on well
> established servers such as netty since there is no significant sense of
> function composition in this part of the computation.
>
> For more discrete algorithms for cooccurrence analysis such as the
> indicator-matrix techniques, you could definitely accumulate the
> cooccurrence statistics in an on-line fashion, but I am not sure I see
> significant benefit. One of the key performance features in such
> algorithms is adaptive down-sampling. On-line variants of that would not
> easily be able to have flexibility in the choice of sampling since they
> would almost inevitably be biased towards early samples. It could be done,
> but the off-line approaches are a really excellent match for map-reduce and
> performance is pretty good.
>
> The interactive component for the indicator matrix form of recommendation
> algorithms is almost identical to what a text search engine already does.
> This makes it very desirable to simply deploy the recommendation model as
> a search index. Operationally, this has massive benefits since much of the
> necessary business logic is already provided by the capabilities of common
> search engines such as Solr or Elastic Search. Moreover, the very
> considerable operational history of these solutions makes it desirable to
> use them without any additional coding.
>
> The research literature suggests that discrete algorithms may produce
> slightly inferior results to the matrix factorization results. Whether
> this is true or not at realistic scale is not at all clear, since all of
> the published research is done at relatively small scale.
>
> My own experience is that the discrete implementations massively
> out-perform the matrix factorization approaches in practical settings
> simply because they are so simple to implement that they free up resources
> to go about the work of finding more and better data for the engine to make
> use of. Finding better interaction data can result in improvements of
> 200-500% while tweaking algorithms rarely results in improvements of more
> than 10% even at small scales.
> Diverting development resources away from
> the high value work is generally disastrously bad for performance unless
> you have a really enormous team.
>
> For other systems which are very much like recommendations such as ad
> targeting, it is often worthwhile to build models specifically for each ad
> that make use of content, context and user characteristics as well as
> interactions of these. For that work, on-line algorithms are very much
> worthwhile. The AdPredictor system paper [1] is a great intro to that
> area. I have also collected other related references to do with the
> general field of Bayesian approaches to the multi-armed bandit [2], [3].
> You can also use these bandits for adaptation of ranking, though I doubt
> that parallelism is useful for this since the computation is so simple [4].
>
> To summarize,
>
> - yes, you can implement algorithms like slope-one in an online framework,
>   but I wouldn't recommend wasting your time
>
> - yes, you can implement approximations of matrix factorization in on-line
>   form.
>
> - no, that probably isn't worthwhile
>
> - yes, you can build on-line versions of cooccurrence counters and
>   analyzers
>
> - no, that probably isn't worthwhile
>
> - no, it probably isn't a great idea to use Storm for the interactive
>   computation of recommendations.
>
> - yes, there are other situations where on-line update of recommendation
>   models is worthwhile. Storm might play well there.
>
> Here are the links:
>
> [1] http://research.microsoft.com/apps/pubs/default.aspx?id=122779
> [2] http://tdunning.blogspot.com/2012/02/bayesian-bandits.html
> [3] http://tdunning.blogspot.com/2012/10/references-for-on-line-algorithms.html
> [4] http://tdunning.blogspot.com/2013/04/learning-to-rank-in-very-bayesian-way.html
>
>
> On Sat, Jan 11, 2014 at 5:19 AM, Rafik NACCACHE <[email protected]> wrote:
>
>> Thanks Klausen,
>>
>> Is it not possible to use Slope One?
>> As far as I know, it is adapted for
>> online recommending stuff,
>>
>> Any thoughts about that?
>>
>> Thanks,
>>
>> Regards
>>
>>
>> 2014/1/11 Klausen Schaefersinho <[email protected]>
>>
>>> Hi,
>>>
>>> to the best of my knowledge there is no publicly available Recommender
>>> Engine for Storm. You could try to integrate some Java based RS systems
>>> like Taste (also used in Hadoop Mahout). However the classic (user-item and
>>> item-item) algorithms do not work well in a streaming architecture as you
>>> would have to update some distance matrix for every event you observe. This
>>> might be very expensive as the matrices might get quite big and you have to
>>> share a constant state over the entire set of worker bolts.
>>>
>>> Cheers,
>>>
>>> Klausen
>>>
>>>
>>> On Sat, Jan 11, 2014 at 11:05 AM, Rafik NACCACHE <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> It is probably not the best place to ask it,
>>>>
>>>> But does anyone mind sharing pointers to any recommender systems
>>>> implemented on top of Storm?
>>>>
>>>> Sure there is trident-ML, but I did not see any collaborative filtering
>>>> methods...
>>>>
>>>> Thank you for your advice guys,
>>>>
>>>> Regards
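
For anyone reading this thread later, the cooccurrence-plus-search-index approach described in the quoted reply can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not Mahout's or any search engine's actual implementation. The thread does not say which test is used to sparsify the cooccurrence matrix; the log-likelihood ratio (LLR) used here is an assumption, as are the `threshold` and `top_k` values and the hypothetical function names `build_indicators` and `recommend`. The `recommend` function only imitates in memory what a query against an indexed indicator field might do.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table (assumed test)."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def build_indicators(histories, threshold=1.0, top_k=10):
    """Off-line step: count cooccurrences, keep only anomalous pairs.

    histories maps each user to the set of items they interacted with.
    Returns a dict mapping each item to its list of indicator items.
    """
    n_users = len(histories)
    item_count = Counter()
    pair_count = Counter()
    for items in histories.values():
        item_count.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    scored = defaultdict(list)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11          # users with a but not b
        k21 = item_count[b] - k11          # users with b but not a
        k22 = n_users - k11 - k12 - k21    # users with neither
        score = llr(k11, k12, k21, k22)
        if score >= threshold:             # sparsification (assumed cutoff)
            scored[a].append((score, b))
            scored[b].append((score, a))
    return {
        item: [other for _, other in sorted(pairs, reverse=True)[:top_k]]
        for item, pairs in scored.items()
    }


def recommend(indicators, recent_items, n=5):
    """Interactive step: rank items by how many indicators match the history."""
    recent = set(recent_items)
    hits = Counter()
    for item, terms in indicators.items():
        if item in recent:
            continue  # do not recommend what the user already has
        overlap = len(recent.intersection(terms))
        if overlap:
            hits[item] = overlap
    return [item for item, _ in hits.most_common(n)]
```

In this sketch the expensive part (`build_indicators`) would run as a periodic batch job, while `recommend` stands in for the search query that serves requests. An item that every user has touched gets an LLR of zero against everything and so never becomes an indicator, which is the point of the sparsification step.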
