I only have about a million docs right now, so scaling is not a big issue. I'm looking to provide a quick implementation now and worry about scale later, when I get around to implementing a more robust recommender. I'm looking at a content-based approach because we are not tracking users or the items they view.

I was thinking of using MoreLikeThis, as Walter mentioned, but wanted some feedback on the nuances of a proper implementation: a similarity based on Euclidean distance, normalization of numerical field values, and collection-wide statistics such as the mean and variance of each field.

Thank you for the link, Otis; I will watch it right away.
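To make the question concrete, here is a rough sketch of the computation I have in mind. It is written in plain Python outside Lucene/Solr and assumes each doc is a dict of numeric fields. The field names (`price`, `rating`), the weights, and the helper names are hypothetical. MoreLikeThis itself builds its query from "interesting" terms weighted by tf-idf, not from distances between numeric values. So this sketch shows what a custom similarity would have to reproduce, not what MLT does out of the box. The per-field weights are what a training phase would adjust.

```python
import math
from typing import Dict, List, Tuple

Doc = Dict[str, float]  # hypothetical: one numeric value per field


def collection_stats(docs: Dict[str, Doc], fields: List[str]) -> Dict[str, Tuple[float, float]]:
    """Collection-wide mean and population standard deviation of each field."""
    stats = {}
    for f in fields:
        values = [d[f] for d in docs.values()]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats[f] = (mean, math.sqrt(var))
    return stats


def normalize(doc: Doc, stats: Dict[str, Tuple[float, float]]) -> Doc:
    """Z-score each field; a field with zero variance maps to 0.0."""
    return {f: ((doc[f] - m) / s if s > 0 else 0.0) for f, (m, s) in stats.items()}


def weighted_euclidean(a: Doc, b: Doc, weights: Dict[str, float]) -> float:
    """Euclidean distance where each squared field difference is scaled by its weight."""
    return math.sqrt(sum(w * (a[f] - b[f]) ** 2 for f, w in weights.items()))


def most_similar(item_id: str, docs: Dict[str, Doc], weights: Dict[str, float],
                 n: int = 10) -> List[Tuple[str, float]]:
    """Top-n docs closest to item_id, scored as 1 / (1 + distance)."""
    # Recomputed per call for brevity; in practice, compute once per index build.
    stats = collection_stats(docs, list(weights))
    normed = {doc_id: normalize(d, stats) for doc_id, d in docs.items()}
    target = normed[item_id]
    scored = [
        (doc_id, 1.0 / (1.0 + weighted_euclidean(target, vec, weights)))
        for doc_id, vec in normed.items()
        if doc_id != item_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    docs = {
        "a": {"price": 10.0, "rating": 4.0},
        "b": {"price": 12.0, "rating": 4.5},
        "c": {"price": 90.0, "rating": 2.0},
    }
    print(most_similar("a", docs, {"price": 1.0, "rating": 1.0}, n=2))
```

With these toy docs, "b" ranks ahead of "c" for item "a". With about a million docs, scoring every doc per request is a full scan. That is why precomputing neighbor lists offline (the k-nearest-neighbors job) keeps coming up as the alternative.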
On Fri, Jun 28, 2013 at 1:12 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> Hi,
>
> It doesn't have to be one or the other. In the past I've built a news recommender engine based on CF (Mahout) and combined it with a content similarity-based engine (it wasn't Solr/Lucene, but something custom that worked with ngrams; it may as well have been Lucene/Solr/ES). It worked well. If you haven't worked with Mahout before, I'd suggest the approach in that video, and going from there to Mahout only if it's limiting.
>
> See Ted's stuff on this topic, too:
> http://www.slideshare.net/tdunning/search-as-recommendation +
> http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
> (note: Mahout, Solr, Pig)
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
> > You could build a custom recommender in Mahout to accomplish this. Also, just out of curiosity, why the content-based approach as opposed to building a recommender based on co-occurrence? One other thing: what is your data size? Are you looking at a scale where you need something like Hadoop?
> >
> >> From: lcguerreroc...@gmail.com
> >> Date: Fri, 28 Jun 2013 13:02:00 -0500
> >> Subject: Re: Content based recommender using lucene/solr
> >> To: solr-user@lucene.apache.org
> >> CC: java-u...@lucene.apache.org
> >>
> >> Hey Saikat, thanks for your suggestion. I've looked into Mahout and other alternatives for computing k nearest neighbors. I would have to run a job, compute the k nearest neighbors, and track them in the index for retrieval. I wanted to see if this was something I could do with Lucene, using Lucene's scoring function and Solr's MoreLikeThis component. The job you specifically mention is for item-based recommendation, which would require me to track the different items users have viewed. I'm looking for a content-based approach where I would use a distance measure to establish how near (how similar) items are, and have some kind of training phase to adjust weights.
> >>
> >>
> >> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
> >>
> >> > Why not just use Mahout to do this? There is an item similarity algorithm in Mahout that does exactly this :)
> >> >
> >> > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
> >> >
> >> > You can use Mahout in distributed and non-distributed mode as well.
> >> >
> >> > > From: lcguerreroc...@gmail.com
> >> > > Date: Fri, 28 Jun 2013 12:16:57 -0500
> >> > > Subject: Content based recommender using lucene/solr
> >> > > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> >> > >
> >> > > Hi,
> >> > >
> >> > > I'm using Lucene and Solr right now in a production environment with an index of about a million docs. I'm working on a recommender that would list the n items most similar to the one the user is currently viewing.
> >> > >
> >> > > I've been thinking of using Solr/Lucene since I already have all docs available, and I want a quick version that can be deployed while we work on a more robust recommender. How about overriding the default similarity so that it scores documents based on the Euclidean distance of normalized item attributes, and then using a MoreLikeThis component to pass in the attributes of the item for which I want to generate recommendations? I know it has its issues, like recomputing scores/normalization/weight application at query time, which could make this idea unfeasible/impractical. I'm at a very preliminary stage right now with this and would love some suggestions from experienced users.
> >> > >
> >> > > thank you,
> >> > >
> >> > > Luis Guerrero
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Luis Carlos Guerrero Covo
> >> M.S. Computer Engineering
> >> (57) 3183542047
> >
>

--
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047