Hi Pat,

This would be very, very useful to me. I've been looking into this recently, seeking what you describe and have hit exactly the complexities you mention. Would love to keep up with any such effort and see if there's any way I can contribute.

Phil

On Wed, Sep 10, 2014 at 5:59 PM, Pat Ferrel <[email protected]> wrote:

> Firstly, I can’t imagine reimplementing anything in Solr or
> Elasticsearch; this would integrate with them unmodified, just as Mahout
> integrates with databases. In fact I see them as similarity-based
> key-value DBs.
>
> Partly I’m reacting to the fact that if mrlegacy were deprecated at some
> time in the future, Mahout would not have a recommender. And yet it is,
> from looking at Stack Overflow and the archive of this list, one of the
> most used Mahout components.
>
> Partly I’d like to properly implement in one place all of the innovations
> that we have been talking about around cooccurrence recommenders, namely:
> * blending collaborative filtering, metadata, and content recommenders
>   into one
> * using multiple actions to recommend the primary action, making use of
>   all available user behavior
> * noise injected into recs ordering to widen the exposure of items and as
>   a form of self-tuning (dithering)
>
> This leaves a couple of needed features to support those above:
> * ingestion and data prep to support the above features in near realtime
>   (for new user recs and recency of action support)
> * an API for these features that fits their use
> * implementation as a server to allow virtually any client to use it
>
> At first cut it is something you can point at user interactions and then
> pass a user ID to get back item IDs. At this point the only thing that
> does this in Mahout is the in-memory recommender. To do this with new
> users and realtime gathered preferences we need to add a new thing to the
> system, streaming data prep, probably with Spark Streaming, though this is
> a discussion all by itself.
>
> This does something like what Solr and Elasticsearch do for Lucene. We
> take several of Mahout’s algos and a search server directly without
> modification and implement only what is missing to produce a modern
> recommender with major innovations.
>
> Is this something useful to users?
>
>
> On Sep 9, 2014, at 5:38 PM, Ted Dunning <[email protected]> wrote:
>
> What would the advantage be of a dedicated server over a search engine
> like Solr or Elasticsearch?
>
> It seems that you would be replicating much of the effort just to build a
> server that does nearly the same thing.
>
>
>
> On Tue, Sep 9, 2014 at 5:32 PM, Peng Zhang <[email protected]> wrote:
>
> > Using this list to discuss is very convenient to stay tuned, so no
> > objection.
> >
> > Peng Zhang
> >
> > --
> > Sent from my iPhone
> >
> >> On Sep 10, 2014, at 12:16 AM, Pat Ferrel <[email protected]> wrote:
> >>
> >> No Jira yet. There are too many moving parts and we’d have to see if
> >> it’s appropriate for Mahout inclusion or as an “example” project. It
> >> would be great to include but we’ll have to see what others think as
> >> it takes better form. All components should be Apache license
> >> compatible though.
> >>
> >> I’ll start a GitHub project. Does anyone object to using this list for
> >> discussion?
> >>
> >> On Sep 9, 2014, at 8:46 AM, Saikat Kanjilal <[email protected]> wrote:
> >>
> >> @Pat Any interest in using http://vertx.io instead of Play? Have heard
> >> some really good perf stats around this.
> >>
> >> We should really start a Jira with a list of use cases and then back
> >> into a tech stack and outline the design in Jira, thoughts?
> >>
> >> Sent from my iPhone
> >>
> >>> On Sep 9, 2014, at 8:44 AM, "Martin, Nick" <[email protected]> wrote:
> >>>
> >>> Would absolutely love an ES integration.
> >>>
> >>> -----Original Message-----
> >>> From: Pat Ferrel [mailto:[email protected]]
> >>> Sent: Tuesday, September 09, 2014 10:29 AM
> >>> To: [email protected]
> >>> Subject: New Mahout Recommender Service
> >>>
> >>> Now that we have the basis of several significant improvements to
> >>> Mahout's recommender it seems like we need to go the last step and
> >>> provide a service.
> >>> Without this it is left to the user to do a lot of integration,
> >>> making the current next gen somewhat incomplete.
> >>>
> >>> Using the Hadoop MapReduce code you can get all recs for all people
> >>> using collaborative filtering data, or you can use the in-memory
> >>> single-machine recommender if you have a small dataset.
> >>>
> >>> The next generation would require Solr or Elasticsearch, so why not go
> >>> the extra step and provide a recommender API on top? At the very least
> >>> it would give users a single-machine API they can call, analogous to
> >>> the in-memory recommender of Mahout 0.9. But it would also be
> >>> indefinitely scalable.
> >>>
> >>> Is anyone interested in discussing this here?
> >>
> >
