I only have about a million docs right now, so scaling is not a big issue.
I'm looking to provide a quick implementation and then worry about scale
when I get around to implementing a more robust recommender. I'm looking at
a content-based approach because we are not tracking users or the items they
view. I was thinking of using MoreLikeThis, as Walter mentioned, but
wanted some feedback on the nuances required for a proper implementation,
such as basing similarity on Euclidean distance, normalizing numerical
field values, and computing collection-wide stats like mean and variance.
Thank you for the link, Otis. I will watch it right away.
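
To make the normalization and distance idea concrete, here is a minimal
sketch in plain Python. It is not Lucene or Solr code, only an offline
illustration of the computation: collection-wide mean and standard deviation
per numeric field, z-score normalization, then a brute-force Euclidean
nearest-neighbor lookup. The function names and the "price" and "rooms"
fields are hypothetical, and the linear scan is an assumption that would
have to be replaced by something index-backed for a million docs.

```python
import math
from statistics import mean, pstdev


def fit_stats(items, fields):
    """Compute the collection-wide (mean, std dev) for each numeric field."""
    stats = {}
    for field in fields:
        values = [item[field] for item in items]
        stats[field] = (mean(values), pstdev(values))
    return stats


def normalize(item, stats):
    """Z-score each field; a constant field (std dev of 0) maps to 0.0."""
    return [
        (item[field] - mu) / sigma if sigma > 0 else 0.0
        for field, (mu, sigma) in stats.items()
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(target_id, items, fields, n=3):
    """Return the ids of the n items closest to target_id (brute force)."""
    stats = fit_stats(items, fields)
    vectors = {item["id"]: normalize(item, stats) for item in items}
    target = vectors[target_id]
    scored = sorted(
        (euclidean(target, vec), item_id)
        for item_id, vec in vectors.items()
        if item_id != target_id
    )
    return [item_id for _, item_id in scored[:n]]


# Hypothetical sample data, only to show the call shape.
items = [
    {"id": "a", "price": 100.0, "rooms": 2.0},
    {"id": "b", "price": 110.0, "rooms": 2.0},
    {"id": "c", "price": 300.0, "rooms": 5.0},
    {"id": "d", "price": 120.0, "rooms": 3.0},
]
print(most_similar("a", items, ["price", "rooms"], n=2))
```

Without the normalization step, the field with the largest raw range
(price in this sample) would dominate the distance, which is why the
collection-wide stats matter.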


On Fri, Jun 28, 2013 at 1:12 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> It doesn't have to be one or the other.  In the past I've built a news
> recommender engine based on CF (Mahout) and combined it with a
> content-similarity-based engine (it wasn't Solr/Lucene, but something custom
> that worked with ngrams, though it may as well have been Lucene/Solr/ES).  It
> worked well.  If you haven't worked with Mahout before, I'd suggest starting
> with the approach in that video and moving to Mahout only if it proves
> limiting.
>
> See Ted's stuff on this topic, too:
> http://www.slideshare.net/tdunning/search-as-recommendation +
> http://berlinbuzzwords.de/sessions/multi-modal-recommendation-algorithms
> (note: Mahout, Solr, Pig)
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jun 28, 2013 at 2:07 PM, Saikat Kanjilal <sxk1...@hotmail.com>
> wrote:
> > You could build a custom recommender in Mahout to accomplish this. Also,
> > just out of curiosity, why the content-based approach as opposed to building
> > a recommender based on co-occurrence? One other thing: what is your data
> > size? Are you looking at a scale where you need something like Hadoop?
> >
> >> From: lcguerreroc...@gmail.com
> >> Date: Fri, 28 Jun 2013 13:02:00 -0500
> >> Subject: Re: Content based recommender using lucene/solr
> >> To: solr-user@lucene.apache.org
> >> CC: java-u...@lucene.apache.org
> >>
> >> Hey Saikat, thanks for your suggestion. I've looked into Mahout and other
> >> alternatives for computing k nearest neighbors. I would have to run a job
> >> to compute the k nearest neighbors and track them in the index for
> >> retrieval. I wanted to see if this was something I could do with Lucene,
> >> using Lucene's scoring function and Solr's MoreLikeThis component. The job
> >> you specifically mention is for item-based recommendation, which would
> >> require me to track the different items users have viewed. I'm looking for
> >> a content-based approach where I would use a distance measure to establish
> >> how near (how similar) items are and have some kind of training phase to
> >> adjust weights.
> >>
> >>
> >> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
> >>
> >> > Why not just use Mahout to do this? There is an item similarity algorithm
> >> > in Mahout that does exactly this :)
> >> >
> >> >
> >> >
> >> > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
> >> >
> >> > You can use Mahout in distributed and non-distributed mode as well.
> >> >
> >> > > From: lcguerreroc...@gmail.com
> >> > > Date: Fri, 28 Jun 2013 12:16:57 -0500
> >> > > Subject: Content based recommender using lucene/solr
> >> > > To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
> >> > >
> >> > > Hi,
> >> > >
> >> > > I'm using Lucene and Solr right now in a production environment with an
> >> > > index of about a million docs. I'm working on a recommender that
> >> > > would basically list the n most similar items to the user based on the
> >> > > current item he is viewing.
> >> > >
> >> > > I've been thinking of using Solr/Lucene since I already have all docs
> >> > > available and I want a quick version that can be deployed while we work on
> >> > > a more robust recommender. How about overriding the default similarity so
> >> > > that it scores documents based on the Euclidean distance of normalized item
> >> > > attributes, and then using a MoreLikeThis component to pass in the
> >> > > attributes of the item for which I want to generate recommendations? I know
> >> > > it has its issues, like recomputing scores, normalization, and weight
> >> > > application at query time, which could make this idea impractical.
> >> > > I'm at a very preliminary stage right now with this and would love some
> >> > > suggestions from experienced users.
> >> > >
> >> > > thank you,
> >> > >
> >> > > Luis Guerrero
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Luis Carlos Guerrero Covo
> >> M.S. Computer Engineering
> >> (57) 3183542047
> >
>



-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047
