The SGD classification system can actually be used to learn these
combinations.
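
A rough sketch of what that could look like (the class names are from
org.apache.mahout.classifier.sgd and org.apache.mahout.math; the training
setup below is only an illustration, not a recipe):

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public class SgdBlender {

  // one input feature per underlying recommender
  private final OnlineLogisticRegression lr;

  public SgdBlender(int numRecommenders) {
    this.lr = new OnlineLogisticRegression(2, numRecommenders, new L1())
        .learningRate(0.01);
  }

  // scores[i] = estimated preference from recommender i for a (user, item)
  // pair, liked = whether the user actually went on to consume/like the item
  public void train(double[] scores, boolean liked) {
    lr.train(liked ? 1 : 0, asVector(scores));
  }

  // blended score to rank candidate items with at recommendation time
  public double blend(double[] scores) {
    return lr.classifyScalar(asVector(scores));
  }

  private static Vector asVector(double[] scores) {
    Vector v = new DenseVector(scores.length);
    for (int i = 0; i < scores.length; i++) {
      v.set(i, scores[i]);
    }
    return v;
  }
}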

On Fri, Dec 31, 2010 at 4:17 PM, Lance Norskog <[email protected]> wrote:

> Production recommendation systems use several algorithms and combine
> them with weights. This is called 'stacking'. You might wish to write
> a stacking version of Recommender.
>
> The stacker could transition from depending on one recommender to
> depending on another recommender. It could adjust weights based on the
> size of the data model or other things.
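>
> A minimal sketch of such a stacker's estimatePreference (the rest of the
> Recommender interface, and how the weights are chosen, is left out; all
> names here are made up):
>
>   // weights[] could be recomputed from dataModel.getNumUsers() /
>   // dataModel.getNumItems() so the stacker gradually shifts from one
>   // recommender to another as more data arrives
>   public float estimatePreference(long userID, long itemID) throws TasteException {
>     double sum = 0.0;
>     double weightSum = 0.0;
>     for (int i = 0; i < recommenders.size(); i++) {
>       float estimate = recommenders.get(i).estimatePreference(userID, itemID);
>       if (!Float.isNaN(estimate)) {
>         sum += weights[i] * estimate;
>         weightSum += weights[i];
>       }
>     }
>     return weightSum == 0.0 ? Float.NaN : (float) (sum / weightSum);
>   }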
>
> On Fri, Dec 31, 2010 at 12:22 AM, Sebastian Schelter <[email protected]>
> wrote:
> > There are two places in the code that make implementing content-based
> > recommendation with a custom ItemSimilarity very difficult. I ran into
> > these unknowingly some time ago.
> >
> > AFAIK, the main purpose of using a content-based strategy would be to
> > handle the "cold-start" problem where no ratings exist for a new item
> > and a CF based approach cannot make any predictions.
> >
> > This will unfortunately not work by only implementing a custom
> > ItemSimilarity, because before the ItemSimilarity implementation is
> > used, a set of candidate items has to be found in the DataModel. In our
> > default implementation, all items that co-occur with one of the user's
> > preferred items are selected. If we have an item that has not been rated
> > yet, we will run into a NoSuchItemException here.
> >
> > So a custom CandidateItemsStrategy will be necessary to make this work.
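> >
> > A rough sketch of such a strategy (the exact CandidateItemsStrategy
> > signature may differ between Mahout versions, and the "catalogue" set of
> > item IDs is something you would have to supply yourself):
> >
> > import java.util.Collection;
> >
> > import org.apache.mahout.cf.taste.common.Refreshable;
> > import org.apache.mahout.cf.taste.common.TasteException;
> > import org.apache.mahout.cf.taste.impl.common.FastIDSet;
> > import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
> > import org.apache.mahout.cf.taste.model.DataModel;
> > import org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy;
> >
> > public class CatalogueCandidateItemsStrategy implements CandidateItemsStrategy {
> >
> >   // all known item IDs, including brand-new items without any ratings
> >   private final FastIDSet catalogueItemIDs;
> >
> >   public CatalogueCandidateItemsStrategy(FastIDSet catalogueItemIDs) {
> >     this.catalogueItemIDs = catalogueItemIDs;
> >   }
> >
> >   public FastIDSet getCandidateItems(long[] preferredItemIDs, DataModel dataModel)
> >       throws TasteException {
> >     // start from the full catalogue so that unrated items become candidates too
> >     FastIDSet candidates = new FastIDSet(catalogueItemIDs.size());
> >     LongPrimitiveIterator it = catalogueItemIDs.iterator();
> >     while (it.hasNext()) {
> >       candidates.add(it.nextLong());
> >     }
> >     // don't recommend what the user already has preferences for
> >     for (long itemID : preferredItemIDs) {
> >       candidates.remove(itemID);
> >     }
> >     return candidates;
> >   }
> >
> >   public void refresh(Collection<Refreshable> alreadyRefreshed) {
> >     // nothing to refresh in this sketch
> >   }
> > }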
> >
> > The situation is even worse when the most-similar-items need to be
> > computed: in GenericItemBasedRecommender.doMostSimilarItems(...), only
> > co-occurring items are selected too, but we did not implement an
> > exchangeable strategy, so this behavior cannot be customized currently.
> >
> > I would suggest creating a construct similar to CandidateItemsStrategy
> > for most-similar-items too; any objections to that?
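> >
> > Such a construct could look roughly like this (name and signature are
> > only a proposal):
> >
> >   public interface MostSimilarItemsCandidateItemsStrategy {
> >     // candidate items to consider when looking for the items most
> >     // similar to the given itemIDs
> >     FastIDSet getCandidateItems(long[] itemIDs, DataModel dataModel)
> >         throws TasteException;
> >   }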
> >
> > --sebastian
> >
> >
> >
> > On 30.12.2010 21:54, Sean Owen wrote:
> >> You're on the right track. No, I don't think the IDRescorer hurts. On
> >> the contrary, it will save you from computing scores for movies that
> >> are not recommendable.
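> >>
> >> Something along these lines, assuming you keep the IDs of the movies
> >> currently showing in a FastIDSet (just a sketch):
> >>
> >>   public class ShowingNowRescorer implements IDRescorer {
> >>
> >>     private final FastIDSet showingNow; // the 10-20 movies now in cinemas
> >>
> >>     public ShowingNowRescorer(FastIDSet showingNow) {
> >>       this.showingNow = showingNow;
> >>     }
> >>
> >>     public boolean isFiltered(long id) {
> >>       return !showingNow.contains(id); // skip everything not currently showing
> >>     }
> >>
> >>     public double rescore(long id, double originalScore) {
> >>       return originalScore; // keep the estimated preference as-is
> >>     }
> >>   }
> >>
> >> and then recommender.recommend(userID, 10, new ShowingNowRescorer(showingNow)).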
> >>
> >> It's hard to say what the 'right' content-based similarity metric is,
> >> as it will depend a lot on what data you have as input. You don't have
> >> much side information to go on here; it's possible that being from the
> >> same genre (or by the same director, etc.) is of little or no
> >> predictive value no matter what you apply to this data. Still, it seems
> >> like you may need such a metric as a fall-back for the case of new
> >> movies where there is no rating-based metric available.
> >>
> >> You could hack up the code a little bit to do something like this: if
> >> too few similar items are found with the similarity metric, then
> >> compute similarities using the alternative content-based metric and
> >> proceed that way. It's a bit of a hack, and inelegant, but it may work
> >> well for you in practice.
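> >>
> >> For example, something like this inside a custom ItemSimilarity (the
> >> genre-based metric is a placeholder you'd have to provide):
> >>
> >>   public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
> >>     // try the rating-based similarity first ...
> >>     double ratingBased = pearsonSimilarity.itemSimilarity(itemID1, itemID2);
> >>     if (!Double.isNaN(ratingBased)) {
> >>       return ratingBased;
> >>     }
> >>     // ... and fall back to a content-based (e.g. genre) similarity otherwise
> >>     return genreSimilarity(itemID1, itemID2);
> >>   }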
> >>
> >> Slope-one isn't based on item-item similarity, so no, I don't think the
> >> notion of content-based similarity applies. It comes up in item-based
> >> recommenders only.
> >>
> >>
> >> On Thu, Dec 30, 2010 at 11:42 AM, Vasil Vangelovski
> >> <[email protected]> wrote:
> >>> Hi
> >>>
> >>> I started diving into Mahout a few days ago. I have a basic
> >>> understanding of the machine learning concepts behind it; however, I'm
> >>> not all too familiar with Mahout beyond the first 6 chapters of
> >>> "Mahout in Action".
> >>>
> >>> I'm looking to implement the following kind of recommendation engine
> >>> (it's not about movies, but it's easiest to explain in this manner):
> >>>
> >>> Let's say I have the MovieLens dataset, complete with ratings, genres,
> >>> etc. I'd want a recommender that would recommend only from a list of
> >>> movies that are showing in cinemas right now. That would be a list of
> >>> 10-20 movies out of the 5000 for which there are ratings in the dataset.
> >>>
> >>> Given that these are relatively new movies, there will be a relatively
> >>> low number of ratings for them. So I guess I'd have to rely on
> >>> content-based recommendation of some kind.
> >>>
> >>> The first question is: how would it affect performance if I use an
> >>> IDRescorer for the purpose of just displaying an ordered list of
> >>> recommendations in the set of available movies (by implementing
> >>> isFiltered, where only about 10 of the 5000 items would pass the
> >>> filter)?
> >>>
> >>> I know the simple way to implement content-based CF would be to
> >>> implement my own ItemSimilarity based on ratings + movie genre
> >>> information. However, in the case of the MovieLens dataset, if I
> >>> combine, say, Pearson correlation for ratings + Tanimoto coefficient
> >>> for genres (or whatever combination makes sense here), it degrades
> >>> performance (evaluation score) slightly for that dataset compared to
> >>> using Pearson alone. Should I ditch this method just for this reason?
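> >>>
> >>> (For illustration, the kind of combination I mean, with example
> >>> weights that would still need tuning:
> >>>
> >>>   double rating = pearson.itemSimilarity(itemID1, itemID2);   // PearsonCorrelationSimilarity over ratings
> >>>   double genre = tanimoto.itemSimilarity(itemID1, itemID2);   // TanimotoCoefficientSimilarity over genre flags
> >>>   double combined = 0.8 * rating + 0.2 * genre;
> >>>
> >>> inside itemSimilarity of a custom ItemSimilarity.)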
> >>>
> >>> Further, what would be other ways to incorporate content-based data in
> >>> order to improve a recommender for the described use case? Is there a
> >>> straightforward way to integrate content-based knowledge into a
> >>> slope-one recommender?
> >>>
> >>> Thanks
> >>>
> >
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>
