Re: New Mahout Recommender Service

Pat Ferrel Wed, 10 Sep 2014 10:00:16 -0700

Firstly, I can’t imagine reimplementing anything in Solr or Elasticsearch. This
would integrate with them unmodified, just as Mahout integrates with databases.
In fact, I see them as similarity-based key-value DBs.

Partly I’m reacting to the fact that if mrlegacy were deprecated at some point
in the future, Mahout would not have a recommender. And yet, judging from Stack
Overflow and the archive of this list, the recommender is one of the most used
Mahout components.

Partly I’d like to properly implement, in one place, all of the innovations
we have been talking about around cooccurrence recommenders, namely:
* blending collaborative filtering, metadata, and content recommenders into one
* using multiple actions to recommend the primary action, making use of all
available user behavior
* injecting noise into the recs ordering to widen the exposure of items and as
a form of self-tuning (dithering)
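The dithering bullet is not spelled out above. One commonly described formulation re-sorts an already ranked list by a synthetic score log(rank) + N(0, log ε), where ε > 1 controls how far items can move. The following is a minimal Java sketch assuming that formulation; the class `Dither`, its `dither` method, and the `epsilon` parameter are hypothetical names, not Mahout APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Sketch of rank-based dithering (hypothetical helper, not a Mahout API).
// Each request re-sorts an already ranked list by log(rank) + N(0, log(epsilon)).
final class Dither {
    private Dither() {}

    /**
     * @param ranked  item IDs, best first
     * @param epsilon assumed tuning knob, must be > 1; larger values move items further
     * @param rng     randomness source; pass a seeded Random for repeatable output
     */
    static List<String> dither(List<String> ranked, double epsilon, Random rng) {
        if (epsilon <= 1.0) {
            throw new IllegalArgumentException("epsilon must be > 1");
        }
        double sigma = Math.log(epsilon); // standard deviation of the Gaussian noise
        int n = ranked.size();
        double[] synthetic = new double[n];
        List<Integer> order = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Ranks are 1-based, so the top item starts at log(1) = 0.
            synthetic[i] = Math.log(i + 1) + sigma * rng.nextGaussian();
            order.add(i);
        }
        // Lower synthetic score means shown earlier, as with the original rank.
        order.sort(Comparator.comparingDouble(i -> synthetic[i]));
        List<String> result = new ArrayList<>(n);
        for (int i : order) {
            result.add(ranked.get(i));
        }
        return result;
    }
}
```

Because the log compresses the gaps between deep ranks, the top few items stay fairly stable from request to request while items further down swap more often, which is the exposure-widening effect the bullet describes.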

This leaves a few features needed to support those above:
* ingesting and data prep to support the above features in near realtime (for
new-user recs and recency-of-action support)
* an API for these features that fits their use
* implementation as a server so that virtually any client can use it

As a first cut, it is something you can point at user interactions and then
pass a user ID to get back item IDs. At this point the only thing in Mahout that
does this is the in-memory recommender. To do this with new users and
preferences gathered in realtime, we need to add a new piece to the system:
streaming data prep, probably with Spark Streaming, though this is a discussion
all by itself.
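The "first cut" described above (point it at interactions, ask by user ID, get item IDs back) can be sketched on a single machine. This is a minimal Java sketch assuming a single action type and binary interactions; `CooccurrenceRecommender` and its methods are hypothetical, not Mahout's API. It judges each cooccurrence with the log-likelihood ratio test, the statistic Mahout's cooccurrence code uses, but then simply sums the LLR scores per candidate instead of indexing indicators in a search engine. That summing step is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-machine cooccurrence recommender; not Mahout's API.
// Assumes one action type and binary (seen / not seen) interactions.
final class CooccurrenceRecommender {
    private final Map<String, Set<String>> itemsByUser = new HashMap<>();
    private final Map<String, Set<String>> usersByItem = new HashMap<>();

    // Ingest one user-item interaction, e.g. a purchase.
    void addInteraction(String user, String item) {
        itemsByUser.computeIfAbsent(user, u -> new HashSet<>()).add(item);
        usersByItem.computeIfAbsent(item, i -> new HashSet<>()).add(user);
    }

    // Log-likelihood ratio (G^2) for a 2x2 contingency table, entropy formulation.
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: N log N - sum(k log k).
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long k : counts) {
            sumXLogX += xLogX(k);
            sum += k;
        }
        return xLogX(sum) - sumXLogX;
    }

    // Rank unseen items by the summed LLR of their cooccurrence with the user's history.
    List<String> recommend(String user, int howMany, double minLlr) {
        Set<String> history = itemsByUser.getOrDefault(user, Set.of());
        long totalUsers = itemsByUser.size();
        Map<String, Double> scores = new HashMap<>();
        for (String seen : history) {
            Set<String> seenUsers = usersByItem.get(seen);
            for (Map.Entry<String, Set<String>> cand : usersByItem.entrySet()) {
                if (history.contains(cand.getKey())) {
                    continue; // never recommend what the user already has
                }
                Set<String> candUsers = cand.getValue();
                long k11 = seenUsers.stream().filter(candUsers::contains).count(); // both items
                if (k11 == 0) {
                    continue;
                }
                long k12 = seenUsers.size() - k11;       // history item only
                long k21 = candUsers.size() - k11;       // candidate only
                long k22 = totalUsers - k11 - k12 - k21; // neither
                double score = llr(k11, k12, k21, k22);
                if (score >= minLlr) {
                    scores.merge(cand.getKey(), score, Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a)); // highest first
            return byScore != 0 ? byScore : a.compareTo(b);             // deterministic tie-break
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

Everything is recomputed per request, so this only suits small data, much like the in-memory recommender it is compared to. Serving new users would still depend on something like the streaming data prep mentioned above to keep the interaction maps current.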

This does something like what Solr and Elasticsearch do for Lucene. We take
several of Mahout’s algos and a search server, use them directly without
modification, and implement only what is missing to produce a modern
recommender with major innovations.
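The search-server approach can be sketched in miniature. Each item is indexed as a document whose indicator field lists the items that significantly cooccur with it, and the user's recent history becomes the query. In this minimal Java sketch, `IndicatorQuery` and its parameters are hypothetical, and a plain match count stands in for the engine's own relevance scoring. Solr or Elasticsearch would apply their own TF-IDF-style weighting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a search query against an "indicators" field.
// A real engine would score with its own relevance model; here a hit's score
// is just the number of history items found in its indicator field.
final class IndicatorQuery {
    private IndicatorQuery() {}

    static List<String> query(Map<String, Set<String>> indicatorsByItem,
                              Set<String> userHistory, int howMany) {
        Map<String, Integer> matches = new HashMap<>();
        for (Map.Entry<String, Set<String>> doc : indicatorsByItem.entrySet()) {
            if (userHistory.contains(doc.getKey())) {
                continue; // filter out items the user already has
            }
            int m = 0;
            for (String term : userHistory) {
                if (doc.getValue().contains(term)) {
                    m++;
                }
            }
            if (m > 0) {
                matches.put(doc.getKey(), m);
            }
        }
        List<String> hits = new ArrayList<>(matches.keySet());
        hits.sort((a, b) -> {
            int byMatches = Integer.compare(matches.get(b), matches.get(a)); // most matches first
            return byMatches != 0 ? byMatches : a.compareTo(b);
        });
        return new ArrayList<>(hits.subList(0, Math.min(howMany, hits.size())));
    }
}
```

In the split described above, the search server would handle this matching, ranking, and scaling, while Mahout's part would be computing the indicator sets.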

Is this something useful to users?

On Sep 9, 2014, at 5:38 PM, Ted Dunning <[email protected]> wrote:

What would the advantage be of a dedicated server over a search engine like
Solr or Elasticsearch?

It seems that you would be replicating much of the effort just to build a
server that does nearly the same thing.

On Tue, Sep 9, 2014 at 5:32 PM, Peng Zhang <[email protected]> wrote:

> Using this list to discuss is very convenient to stay tuned, so no
> objection.
> 
> Peng Zhang
> 
> --
> Sent from my iPhone
> 
>> On Sep 10, 2014, at 12:16 AM, Pat Ferrel <[email protected]> wrote:
>> 
>> No Jira yet. There are too many moving parts and we’d have to see if
> it’s appropriate for Mahout inclusion or as an “example” project. It would
> be great to include but we’ll have to see what others think as it takes
> better form. All components should be Apache license compatible though.
>> 
>> I’ll start a Github project. Does anyone object to using this list for
> discussion?
>> 
>> On Sep 9, 2014, at 8:46 AM, Saikat Kanjilal <[email protected]> wrote:
>> 
>> @Pat Any interest in using http://vertx.io instead of Play? I have heard
> some really good perf stats around it.
>> 
>> We should really start a Jira with a list of use cases, then back into a
> tech stack and outline the design in Jira. Thoughts?
>> 
>> Sent from my iPhone
>> 
>>> On Sep 9, 2014, at 8:44 AM, "Martin, Nick" <[email protected]> wrote:
>>> 
>>> Would absolutely love an ES integration.
>>> 
>>> -----Original Message-----
>>> From: Pat Ferrel [mailto:[email protected]]
>>> Sent: Tuesday, September 09, 2014 10:29 AM
>>> To: [email protected]
>>> Subject: New Mahout Recommender Service
>>> 
>>> Now that we have the basis of several significant improvements to
> Mahout's recommender, it seems like we need to go the last step and provide
> a service. Without this, the user is left to do a lot of integration, which
> makes the current next generation somewhat incomplete.
>>> 
>>> Using the Hadoop MapReduce code you can get all recs for all people
> using collaborative filtering data, or you can use the in-memory single
> machine recommender if you have a small dataset.
>>> 
>>> The next generation would require Solr or Elasticsearch, so why not go
> the extra step and provide a recommender API on top? At the very least it
> would give users a single-machine API they can call, analogous to the
> in-memory recommender of Mahout 0.9. But it would also be indefinitely
> scalable.
>>> 
>>> Is anyone interested in discussing this here?
>> 
>
