Thanks, Sean. Your suggestions and the links you provided are just what I needed to start off with.
On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen <[email protected]> wrote:
> I think you're assuming that you will pre-compute recommendations and
> store them in Mongo. That's one way to go, with certain tradeoffs. You
> can precompute offline easily, and serve results at large scale
> easily, but you are forced to precompute everything -- lots of wasted
> effort, not completely up to date.
>
> The front-end part of the stack looks right.
>
> Spark would do the model building; you'd have to write a process to
> score recommendations and store the result. Mahout is the same thing,
> really.
>
> 500K items isn't all that large. Your requirements aren't driven just
> by items, though. Number of users and latent features matter too. It
> also matters how often you want to build the model. I'm guessing you
> would get away with a handful of modern machines for a problem this
> size.
>
> In a way, what you describe reminds me of Wibidata, since it built
> recommender-like solutions on top of data, with results published to a
> NoSQL store. You might glance at the related OSS project Kiji
> (http://kiji.org/) for ideas about how to manage the schema.
>
> You should have a look at things like Nick's architecture for
> Graphflow, though it's more concerned with computing recommendations
> on the fly, and describes a shift from an architecture originally
> built around something like a NoSQL store:
>
> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>
> This is also the kind of ground the oryx project is intended to cover,
> something I've worked on personally:
> https://github.com/OryxProject/oryx -- a layer on and around the
> core model building in Spark + Spark Streaming to provide a whole
> recommender (for example), down to the REST API.
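Sean's point that Spark only builds the model, leaving you to write a separate process that scores recommendations and stores the result, could look roughly like the minimal Python sketch below. It assumes a latent-factor model (the kind of user and item factor vectors ALS in Spark MLlib produces) is already available; the toy factor values, the ids, and the `user`/`recs` document fields are hypothetical, and no Spark or MongoDB client is used.

```python
import heapq
import json

# Toy latent factors standing in for the user/item factor matrices a
# model-building job would produce. All values and ids are made up.
USER_FACTORS = {
    "u1": [1.0, 0.0],
    "u2": [0.0, 1.0],
}
ITEM_FACTORS = {
    "i1": [0.9, 0.1],
    "i2": [0.2, 0.8],
    "i3": [0.5, 0.5],
}


def dot(a, b):
    """Predicted preference: dot product of a user and an item factor vector."""
    return sum(x * y for x, y in zip(a, b))


def score_user(user_vec, item_factors, n):
    """Return the n highest-scoring (item_id, score) pairs for one user."""
    scored = ((item_id, dot(user_vec, vec)) for item_id, vec in item_factors.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


def precompute_all(user_factors, item_factors, n):
    """Score every item for every user (the 'precompute everything' approach).

    Returns one document per user, in a shape that could be written to a
    document store such as MongoDB. Field names are hypothetical.
    """
    docs = []
    for user_id, user_vec in user_factors.items():
        top = score_user(user_vec, item_factors, n)
        docs.append({
            "user": user_id,
            "recs": [{"item": i, "score": round(s, 3)} for i, s in top],
        })
    return docs


if __name__ == "__main__":
    for doc in precompute_all(USER_FACTORS, ITEM_FACTORS, n=2):
        print(json.dumps(doc))
```

Because every item is scored for every user, the work grows with users × items × latent features whether or not anyone ever reads the result, which is the wasted effort Sean mentions. In practice this loop would presumably run as a distributed Spark job, with the documents bulk-written to the store.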
>
> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao
> <[email protected]> wrote:
> > Hi,
> >
> > Can anyone who has developed a recommendation engine suggest what the
> > possible software stack for such an application could be?
> >
> > I am basically new to recommendation engines; I just found out about Mahout
> > and Spark MLlib, which are available.
> > I am thinking of the software stack below.
> >
> > 1. The user is going to use an Android app.
> > 2. The Android app sends a REST API request to the app server to get
> > recommendations.
> > 3. Spark MLlib as the core recommendation engine.
> > 4. MongoDB as the database backend.
> >
> > I would like to know more about the cluster configuration (how many nodes,
> > etc.) part of Spark for calculating the recommendations for 500,000 items.
> > These items include products for day care, etc.
> >
> > Other software stack suggestions would also be very useful. It has to run
> > on multiple vendor machines.
> >
> > Please suggest.
> >
> > Thanks
> > shashi
>
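To get a feel for Sean's remark that the number of users and latent features matters as much as the 500,000 items, a back-of-envelope calculation helps. The sketch below assumes 1,000,000 users and 50 latent features (both hypothetical; only the item count comes from the thread) and 4-byte floats, and estimates the raw size of the factor matrices and the work needed to precompute every user-item score.

```python
# Only the item count comes from the thread; the rest are assumptions.
NUM_ITEMS = 500_000
NUM_USERS = 1_000_000   # hypothetical user count
RANK = 50               # hypothetical number of latent features


def factor_storage_bytes(num_users, num_items, rank, bytes_per_value=4):
    """Raw size of the user and item factor matrices (4-byte floats by default)."""
    return (num_users + num_items) * rank * bytes_per_value


def full_scoring_ops(num_users, num_items, rank):
    """Multiply-adds needed to score every user against every item."""
    return num_users * num_items * rank


if __name__ == "__main__":
    size = factor_storage_bytes(NUM_USERS, NUM_ITEMS, RANK)
    ops = full_scoring_ops(NUM_USERS, NUM_ITEMS, RANK)
    print(f"factor storage: {size / 1e6:.0f} MB")                # 300 MB
    print(f"multiply-adds to precompute everything: {ops:.2e}")  # 2.50e+13
```

Under these assumptions the factor matrices themselves are modest; it is the users × items × rank term that grows fastest as users are added or the rank is raised. The figures ignore JVM overhead, the raw interaction data, and how often the model is rebuilt, so they are only a starting point for sizing a cluster.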
