Slope One is one of the few algorithms that perform so uniformly poorly that it is being removed from Mahout. I wouldn't recommend it for any application.

In general, recommendation isn't particularly well suited to on-line operation. Part of the computation benefits substantially from large-scale batch processing, and part really must be interactive, but the interactive part is better suited to a service-oriented architecture than to a record-streaming architecture.

There are two general branches of recommendation implementations. One is based on matrix factorization methods. The other is based on cooccurrence analysis and is focused on sparsification of the cooccurrence matrix to produce an indicator matrix. Lately, I strongly recommend that almost all small or new projects use the cooccurrence-based methods since they can be deployed using search engines. More on that in a bit.

For the large-scale computation, matrix factorization techniques have some on-line approximations, but the accuracy benefit and resource cost of simply redoing the entire computation about once a day generally make doing just that preferable. Especially with multi-modal recommendations, there is little cost for cold-starts.

The interactive response part could easily be implemented in Storm, but there is little benefit over an in-memory implementation based on well-established servers such as Netty, since there is no significant sense of function composition in this part of the computation.

For more discrete algorithms for cooccurrence analysis, such as the indicator-matrix techniques, you could definitely accumulate the cooccurrence statistics in an on-line fashion, but I am not sure I see significant benefit. One of the key performance features in such algorithms is adaptive down-sampling. On-line variants of that would have little flexibility in the choice of samples, since they would almost inevitably be biased towards early samples. It could be done, but the off-line approaches are a really excellent match for map-reduce and performance is pretty good.

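As a rough illustration of the off-line cooccurrence step described above, here is a minimal Python sketch. It assumes the sparsification is done with a log-likelihood ratio (LLR) test on each item pair, which the text above does not spell out; the function names, the `threshold` and `top_k` parameters, and the input layout (a dict from user to a set of items) are all hypothetical. Adaptive down-sampling is left out for brevity.

```python
import math
from collections import Counter
from itertools import combinations


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio score for a 2x2 contingency table:
    # k11 = users with both items, k12 = only the first item,
    # k21 = only the second item, k22 = neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def indicators(histories, threshold=3.0, top_k=50):
    # histories: dict mapping user -> set of items (assumed layout).
    # Returns a dict mapping item -> list of indicator items, i.e. the
    # sparsified cooccurrence matrix, strongest associations first.
    n_users = len(histories)
    item_counts = Counter()
    pair_counts = Counter()
    for items in histories.values():
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    scored = {}
    for (a, b), k11 in pair_counts.items():
        k12 = item_counts[a] - k11
        k21 = item_counts[b] - k11
        k22 = n_users - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        if score >= threshold:
            scored.setdefault(a, []).append((score, b))
            scored.setdefault(b, []).append((score, a))

    return {
        item: [other for _, other in sorted(pairs, reverse=True)[:top_k]]
        for item, pairs in scored.items()
    }
```

Each item's indicator list could then be stored as a field on that item's document in a search engine, with the user's recent history used as the query. A production version would also cap the number of interactions kept per user and per item, which is where the down-sampling comes in.
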
The interactive component for the indicator matrix form of recommendation algorithms is almost identical to what a text search engine already does. This makes it very desirable to simply deploy the recommendation model as a search index. Operationally, this has massive benefits since much of the necessary business logic is already provided by the capabilities of common search engines such as Solr or Elasticsearch. Moreover, the very considerable operational history of these solutions makes it desirable to use them without any additional coding.

The research literature suggests that discrete algorithms may produce slightly inferior results to matrix factorization. Whether this is true or not at realistic scale is not at all clear, since all of the published research is done at relatively small scale. My own experience is that the discrete implementations massively out-perform the matrix factorization approaches in practical settings simply because they are so simple to implement that they free up resources to go about the work of finding more and better data for the engine to make use of. Finding better interaction data can result in improvements of 200-500%, while tweaking algorithms rarely results in improvements of more than 10%, even at small scales. Diverting development resources away from the high-value work is generally disastrously bad for performance unless you have a really enormous team.

For other systems which are very much like recommendations, such as ad targeting, it is often worthwhile to build models specifically for each ad that make use of content, context and user characteristics as well as interactions of these. For that work, on-line algorithms are very much worthwhile. The AdPredictor system paper [1] is a great intro to that area. I have also collected other related references to do with the general field of Bayesian approaches to the multi-armed bandit [2], [3].

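For readers who have not met Bayesian bandits, here is a minimal Python sketch of one common variant, Thompson sampling with Beta priors over binary rewards (for example click / no click). Whether this matches the exact formulation in the referenced posts is an assumption; the class name, the `simulate` helper and the reward rates are hypothetical and only for illustration.

```python
import random


class BetaBandit:
    """Thompson sampling for binary rewards with a Beta prior per arm."""

    def __init__(self, n_arms, prior_a=1.0, prior_b=1.0, seed=None):
        # One Beta(a, b) posterior per arm, updated on-line.
        self.a = [prior_a] * n_arms
        self.b = [prior_b] * n_arms
        self.rng = random.Random(seed)

    def choose(self):
        # Draw one sample from each arm's posterior and play the best draw.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.a, self.b)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # A success increments a, a failure increments b.
        if reward:
            self.a[arm] += 1
        else:
            self.b[arm] += 1


def simulate(true_rates, rounds=5000, seed=42):
    # Simulated environment: each arm pays off with a fixed probability.
    rng = random.Random(seed)
    bandit = BetaBandit(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, rng.random() < true_rates[arm])
    return pulls
```

Calling `simulate([0.02, 0.05, 0.20])` returns how often each arm was played; with rates this far apart, most pulls should end up on the last arm. The per-event update is just two counter increments, which is why this kind of model is a natural fit for on-line updating.
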
You can also use these bandits for adaptation of ranking, though I doubt that parallelism is useful for this since the computation is so simple [4].

To summarize,

- yes, you can implement algorithms like slope-one in an online framework, but I wouldn't recommend wasting your time

- yes, you can implement approximations of matrix factorization in on-line form.

- no, that probably isn't worthwhile

- yes, you can build on-line versions of cooccurrence counters and analyzers

- no, that probably isn't worthwhile

- no, it probably isn't a great idea to use Storm for the interactive computation of recommendations.

- yes, there are other situations where on-line update of recommendation models is worthwhile. Storm might play well there.

Here are the links:

[1] http://research.microsoft.com/apps/pubs/default.aspx?id=122779

[2] http://tdunning.blogspot.com/2012/02/bayesian-bandits.html

[3] http://tdunning.blogspot.com/2012/10/references-for-on-line-algorithms.html

[4] http://tdunning.blogspot.com/2013/04/learning-to-rank-in-very-bayesian-way.html

On Sat, Jan 11, 2014 at 5:19 AM, Rafik NACCACHE <[email protected]> wrote:

> Thanks Klausen,
>
> It is not possible to use slope One ? As far as I know, it is adapted for
> online recommending stuff,
>
> Any Thoughts about that ?
>
> Thanks,
>
> Regards
>
>
> 2014/1/11 Klausen Schaefersinho <[email protected]>
>
>> Hi,
>>
>> to the best of my knowledge there is no publicly available Recommender
>> Engine for Storm. You could try to integrate some java based RS systems
>> like taste (also used in Hadoop Mahout). However the classic (user-item and
>> item-item) algorithms do not work well in a streaming architecture as you
>> would have to update some distance matrix for every event you observe. This
>> might be very expensive as the matrices might get quite big and you have to
>> share a constant state over the entire set of worker bolts.
>>
>> Cheers,
>>
>> Klausen
>>
>>
>> On Sat, Jan 11, 2014 at 11:05 AM, Rafik NACCACHE <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> It is not probably the best place to ask it,
>>>
>>> But does anyone mind sharing pointers to any recommender systems
>>> implemented on top of storm ?
>>>
>>> Sure there is trident-ML, but I did not see any collaborative filtering
>>> methods...
>>>
>>> Thank you for your advice guys,
>>>
>>> Regards
