Re: Recommender Engines on top of Storm

Rafik NACCACHE Sat, 11 Jan 2014 12:31:29 -0800

Thank you Ted,

Though I did not follow every point, I understand that streaming records is
not worth the hassle as far as recommendations are concerned.


Meanwhile, you rang a bell when you mentioned Elastic Search. I might
have an idea how to use that, but that would be content-based, and I need
something collaborative for my use case...

Thank you once again, that was a very valuable response from you.

Regards


2014/1/11 Ted Dunning <[email protected]>

>
> Slope one is one of the few algorithms that performs so uniformly poorly
> that it is being removed from Mahout.  I wouldn't recommend it for any
> applications.
>
> In general, recommendation isn't particularly well suited to on-line
> operation.  One part of the computation benefits substantially from being
> done at large scale, and the other part really must be interactive but is
> best suited to a service-oriented architecture rather than a
> record-streaming architecture.
>
> There are two general branches of recommendation implementations.  One is
> based on matrix factorization methods.  The other is based on cooccurrence
> analysis and is focused on sparsification of the cooccurrence matrix to
> produce an indicator matrix.  Lately, I strongly recommend that almost all
> small or new projects use the cooccurrence based methods since they can be
> deployed using search engines.  More on that in a bit.
>
> For the large scale computation, matrix factorization techniques have some
> on-line approximations, but simply redoing the entire computation roughly
> once a day is generally preferable, given its benefit in accuracy and its
> cost in resources.  Especially with multi-modal recommendations, there is
> little cost for cold-starts.  The interactive response part could easily
> be implemented in Storm, but there is little benefit over an in-memory
> implementation based on well-established servers such as Netty, since
> there is no significant sense of function composition in this part of the
> computation.
>
> For more discrete algorithms for cooccurrence analysis such as the
> indicator-matrix techniques, you could definitely accumulate the
> cooccurrence statistics in an on-line fashion, but I am not sure I see
> significant benefit.  One of the key performance features in such
> algorithms is adaptive down-sampling.  On-line variants of that would not
> easily be able to have flexibility in the choice of sampling since they
> would almost inevitably be biased towards early samples.  It could be done,
> but the off-line approaches are a really excellent match for map-reduce and
> performance is pretty good.
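As a rough illustration of the off-line cooccurrence approach described
above, here is a minimal Python sketch. It is not Mahout's implementation.
The names (llr, build_indicators), the fixed per-user cap standing in for
adaptive down-sampling, and the choice of a log-likelihood ratio score to
sparsify the cooccurrence matrix are all assumptions made for illustration;
the message does not specify them.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy term used by the log-likelihood ratio test.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 table of user counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def build_indicators(histories, max_per_user=50, top_k=2, min_llr=1.0, seed=0):
    """Batch job: count cooccurrences, then keep only the strongest pairs.

    histories maps a user id to the list of items that user interacted with.
    max_per_user is a simple cap (an assumed stand-in for adaptive
    down-sampling): very active users contribute only a random subset.
    """
    rng = random.Random(seed)
    item_users = Counter()   # item -> number of users who touched it
    pair_users = Counter()   # (item_a, item_b) -> users who touched both
    n_users = 0
    for items in histories.values():
        items = sorted(set(items))
        if len(items) > max_per_user:
            items = sorted(rng.sample(items, max_per_user))
        n_users += 1
        item_users.update(items)
        pair_users.update(combinations(items, 2))

    scored = defaultdict(list)
    for (a, b), k11 in pair_users.items():
        k12 = item_users[a] - k11          # users with a but not b
        k21 = item_users[b] - k11          # users with b but not a
        k22 = n_users - k11 - k12 - k21    # users with neither
        score = llr(k11, k12, k21, k22)
        if score >= min_llr:
            scored[a].append((score, b))
            scored[b].append((score, a))

    # Sparsify: each item keeps only its top_k highest-scoring indicators.
    return {item: [other for _, other in sorted(rows, reverse=True)[:top_k]]
            for item, rows in scored.items()}
```

Because the whole history is available at once, the sampling here is not
biased towards early events, which is the flexibility an on-line variant
would find hard to match.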
>
> The interactive component for the indicator matrix form of recommendation
> algorithms is almost identical to what a text search engine already does.
> This makes it very desirable to simply deploy the recommendation model as
> a search index.  Operationally, this has massive benefits since much of the
> necessary business logic is already provided by the capabilities of common
> search engines such as Solr or Elastic Search.  Moreover, the very
> considerable operational history of these solutions makes it desirable to
> use them without any additional coding.
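To make the analogy with text search concrete, here is a small in-memory
Python sketch of the interactive step. It assumes each item is a "document"
whose only field is its list of indicator items, and that the user's recent
history is the "query". The recommend function and the sample data are
hypothetical, and the plain overlap count stands in for whatever relevance
scoring a real search engine such as Solr or Elastic Search would apply.

```python
from collections import Counter

# Hypothetical indicator index: item -> items whose presence in a user's
# history suggests recommending that item.
INDICATORS = {
    "tent": ["stove", "boots"],
    "stove": ["tent"],
    "novel": ["bookmark"],
}


def recommend(indicators, history, n=3):
    """Rank items by how many of their indicators appear in the history."""
    seen = set(history)
    scores = Counter()
    for item, indicator_items in indicators.items():
        if item in seen:
            continue  # skip items the user has already interacted with
        overlap = seen.intersection(indicator_items)
        if overlap:
            scores[item] = len(overlap)
    # Highest overlap first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]
```

For example, recommend(INDICATORS, ["boots", "stove"]) returns ["tent"]:
both of the tent's indicators match the history, the stove is already
known to the user, and the novel matches nothing.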
>
> The research literature suggests that discrete algorithms may produce
> results slightly inferior to those of matrix factorization.  Whether
> this is true or not at realistic scale is not at all clear, since all of
> the published research is done at relatively small scale.
>
> My own experience is that the discrete implementations massively
> out-perform the matrix factorization approaches in practical settings
> simply because they are so simple to implement that they free up resources
> to go about the work of finding more and better data for the engine to make
> use of.  Finding better interaction data can result in improvements of
> 200-500% while tweaking algorithms rarely results in improvements of more
> than 10% even at small scales.  Diverting development resources away from
> the high value work is generally disastrously bad for performance unless
> you have a really enormous team.
>
> For other systems which are very much like recommendations such as ad
> targeting, it is often worthwhile to build models specifically for each ad that
> make use of content, context and user characteristics as well as
> interactions of these.  For that work, on-line algorithms are very much
> worthwhile.  The AdPredictor system paper [1] is a great intro to that
> area.  I have also collected other related references to do with the
> general field of Bayesian approaches to the multi-armed bandit [2], [3].
> You can also use these bandits for adaptation of ranking, though I doubt
> that parallelism is useful for this since the computation is so simple [4].
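For readers unfamiliar with Bayesian bandits, the following Python sketch
shows one common form, Thompson sampling with Beta posteriors over click
rates. It is an assumed, simplified illustration and not the AdPredictor
model or the code from the linked posts; the BetaBandit class, the simulate
helper, and the click rates used with it are hypothetical.

```python
import random


class BetaBandit:
    """Thompson sampling for arms with binary (click / no click) rewards."""

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms

    def choose(self):
        # Draw one sample per arm from its Beta(wins + 1, losses + 1)
        # posterior (uniform prior) and play the arm with the largest draw.
        samples = [self.rng.betavariate(w + 1, l + 1)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # On-line update: one counter increment per observed event.
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(true_rates, rounds=2000, seed=1):
    """Run the bandit against simulated arms; return pulls per arm."""
    rng = random.Random(seed)
    bandit = BetaBandit(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, rng.random() < true_rates[arm])
    return pulls
```

Calling simulate([0.05, 0.5]) should send most of the 2000 pulls to the
second arm. The per-event work is a couple of counter updates, which is
consistent with the remark that the computation is very simple.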
>
> To summarize,
>
> - yes, you can implement algorithms like slope-one in an online framework,
> but I wouldn't recommend wasting your time
>
> - yes, you can implement approximations of matrix factorization in on-line
> form.
>
> - no, that probably isn't worthwhile
>
> - yes, you can build on-line versions of cooccurrence counters and
> analyzers
>
> - no, that probably isn't worthwhile
>
> - no, it probably isn't a great idea to use Storm for the interactive
> computation of recommendations.
>
> - yes, there are other situations where on-line update of recommendation
> models is worthwhile.  Storm might play well there.
>
> Here are the links:
>
> [1] http://research.microsoft.com/apps/pubs/default.aspx?id=122779
> [2] http://tdunning.blogspot.com/2012/02/bayesian-bandits.html
> [3] http://tdunning.blogspot.com/2012/10/references-for-on-line-algorithms.html
> [4] http://tdunning.blogspot.com/2013/04/learning-to-rank-in-very-bayesian-way.html
>
>
>
> On Sat, Jan 11, 2014 at 5:19 AM, Rafik NACCACHE
> <[email protected]> wrote:
>
>> Thanks Klausen,
>>
>> Is it not possible to use Slope One? As far as I know, it is suited to
>> online recommendation.
>>
>> Any thoughts about that?
>>
>> Thanks,
>>
>> Regards
>>
>>
>> 2014/1/11 Klausen Schaefersinho <[email protected]>
>>
>>> Hi,
>>>
>>> to the best of my knowledge, there is no publicly available recommender
>>> engine for Storm. You could try to integrate a Java-based recommender
>>> system such as Taste (also used in Apache Mahout). However, the classic
>>> (user-item and item-item) algorithms do not work well in a streaming
>>> architecture, as you would have to update some distance matrix for every
>>> event you observe. This might be very expensive, as the matrices can get
>>> quite big and you have to share a constant state over the entire set of
>>> worker bolts.
>>>
>>> Cheers,
>>>
>>> Klausen
>>>
>>>
>>> On Sat, Jan 11, 2014 at 11:05 AM, Rafik NACCACHE <
>>> [email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> This is probably not the best place to ask,
>>>>
>>>> but does anyone mind sharing pointers to any recommender systems
>>>> implemented on top of Storm?
>>>>
>>>> Sure, there is trident-ML, but I did not see any collaborative filtering
>>>> methods...
>>>>
>>>> Thank you for your advice guys,
>>>>
>>>> Regards
>>>>
>>>
>>>
>>
>
