The SGD classification system can actually be used to learn these combinations.
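For instance, treat each recommender's estimate as a feature and learn the blend weights online. Here's a toy, hand-rolled sketch of that idea (the real thing could lean on org.apache.mahout.classifier.sgd instead; the held-out user/item/rating arrays are placeholders you'd assemble yourself, and NaN estimates aren't handled):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Toy "stacking" sketch: learn weights for blending two recommenders'
// estimates with plain SGD on squared error against held-out ratings.
public class StackedEstimator {

  private final Recommender ratingBased;
  private final Recommender contentBased;
  private double w1 = 0.5;
  private double w2 = 0.5;

  public StackedEstimator(Recommender ratingBased, Recommender contentBased) {
    this.ratingBased = ratingBased;
    this.contentBased = contentBased;
  }

  // One SGD pass over held-out (user, item, rating) triples.
  public void train(long[] users, long[] items, float[] ratings, double learningRate)
      throws TasteException {
    for (int i = 0; i < ratings.length; i++) {
      double s1 = ratingBased.estimatePreference(users[i], items[i]);
      double s2 = contentBased.estimatePreference(users[i], items[i]);
      double error = (w1 * s1 + w2 * s2) - ratings[i];  // prediction minus truth
      w1 -= learningRate * error * s1;                   // gradient of squared error
      w2 -= learningRate * error * s2;
    }
  }

  public double estimate(long userID, long itemID) throws TasteException {
    return w1 * ratingBased.estimatePreference(userID, itemID)
        + w2 * contentBased.estimatePreference(userID, itemID);
  }
}
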
On Fri, Dec 31, 2010 at 4:17 PM, Lance Norskog <[email protected]> wrote:
> Production recommendation systems use several algorithms and combine
> them with weights. This is called 'stacking'. You might wish to write
> a stacking version of Recommender.
>
> The stacker could transition from depending on one recommender to
> depending on another recommender. It could adjust weights based on the
> size of the data model or other things.
>
> On Fri, Dec 31, 2010 at 12:22 AM, Sebastian Schelter <[email protected]> wrote:
> > There are two places in the code that make implementing content-based
> > recommendation with a custom ItemSimilarity very difficult. I ran into
> > these unknowingly some time ago.
> >
> > AFAIK, the main purpose of using a content-based strategy would be to
> > handle the "cold-start" problem, where no ratings exist for a new item
> > and a CF-based approach cannot make any predictions.
> >
> > This will unfortunately not work by only implementing a custom
> > ItemSimilarity, because before the ItemSimilarity implementation is
> > used, a set of candidate items has to be found in the DataModel. In our
> > default implementation, all items that co-occur with one of the user's
> > preferred items are selected. If we have an item that has not been
> > rated yet, we will run into a NoSuchItemException here.
> >
> > So a custom CandidateItemsStrategy will be necessary to make this work.
> >
> > The situation is even worse when the most-similar items need to be
> > computed: in GenericItemBasedRecommender.doMostSimilarItems(...) only
> > co-occurring items are selected too, but we did not implement an
> > exchangeable strategy, so this behavior cannot be customized currently.
> >
> > I would suggest creating a construct similar to CandidateItemsStrategy
> > for most-similar-items too. Any objections to that?
> >
> > --sebastian
> >
> >
> >
> > On 30.12.2010 21:54, Sean Owen wrote:
> >> You're on the right track. No, I don't think the IDRescorer hurts. On
> >> the contrary, it will save you from computing scores for movies that
> >> are not recommendable.
> >>
> >> It's hard to say what the 'right' content-based similarity metric is,
> >> as it will depend a lot on what data you have as input. You don't have
> >> much side information to go on here; it's possible that being from the
> >> same genre (or by the same director, etc.) is of little or no
> >> predictive value no matter what you apply to this data. Still, it seems
> >> like you may need such a metric as a fall-back for the case of new
> >> movies where there is no rating-based metric available.
> >>
> >> You could hack up the code a little bit to do something like this: if
> >> too few similar items are found with the similarity metric, then
> >> compute similarities using the alternative content-based metric and
> >> proceed that way. It's a bit of a hack, and inelegant, but it may work
> >> well for you in practice.
> >>
> >> Slope-one isn't based on item-item similarity, so no, I don't think the
> >> notion of content-based similarity applies. It comes up in item-based
> >> recommenders only.
> >>
> >>
> >> On Thu, Dec 30, 2010 at 11:42 AM, Vasil Vangelovski
> >> <[email protected]> wrote:
> >>> Hi
> >>>
> >>> I started diving into Mahout a few days ago. I have a basic
> >>> understanding of the machine learning concepts behind it; however, I'm
> >>> not all too familiar with Mahout beyond the first 6 chapters of
> >>> "Mahout in Action".
> >>>
> >>> I'm looking to implement the following kind of recommendation engine
> >>> (it's not about movies, but it's easiest to explain in this manner):
> >>>
> >>> Let's say I have the MovieLens dataset, complete with ratings, genres,
> >>> etc. I'd want a recommender that would recommend only from a list of
> >>> movies that are showing in cinemas right now. That would be a list of
> >>> 10-20 movies out of 5000 for which there are ratings in the dataset.
> >>>
> >>> Given these are relatively new movies, there will be a relatively low
> >>> number of ratings for them. So I guess I'd have to rely on
> >>> content-based recommendation of some kind.
> >>>
> >>> The first question is how it would affect performance if I use an
> >>> IDRescorer just to display an ordered list of recommendations within
> >>> the set of available movies (by implementing isFiltered, where the
> >>> result would be false most of the time, 10/5000)?
> >>>
> >>> I know the simple way to implement content-based CF would be to
> >>> implement my own ItemSimilarity based on ratings + movie genre
> >>> information. However, in the case of the MovieLens dataset, if I
> >>> combine, say, Pearson correlation for ratings + Tanimoto coefficient
> >>> for genres (or whatever combination makes sense here), it degrades
> >>> performance (score) slightly for that dataset compared to using
> >>> Pearson alone. Should I ditch this method just for this reason?
> >>>
> >>> Further, what would be other ways to use content-based data to improve
> >>> a recommender for the described use case? Is there a straightforward
> >>> way to integrate content-based knowledge into a slope-one recommender?
> >>>
> >>> Thanks
> >>>
> >
> >
> >
>
> --
> Lance Norskog
> [email protected]
>
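
Two sketches below that might make the thread concrete. First, the combined rating + genre similarity discussed above: Pearson (or any rating-based ItemSimilarity) where it is defined, blended with or falling back to a Tanimoto-style overlap of genre sets. The genre map is a made-up side-data structure you'd build from the MovieLens item file yourself, and the exact ItemSimilarity method set varies a bit between Mahout versions, so treat this as an outline rather than drop-in code:

import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Blends a rating-based similarity (e.g. PearsonCorrelationSimilarity) with
// a genre-set overlap, and falls back to genres alone when ratings give
// nothing (NaN) or the item is unknown to the DataModel.
public class RatingPlusGenreSimilarity implements ItemSimilarity {

  private final ItemSimilarity ratingSimilarity;
  private final Map<Long, Set<String>> genresByItem; // made-up side data: itemID -> genres
  private final double ratingWeight;                 // e.g. 0.8

  public RatingPlusGenreSimilarity(ItemSimilarity ratingSimilarity,
                                   Map<Long, Set<String>> genresByItem,
                                   double ratingWeight) {
    this.ratingSimilarity = ratingSimilarity;
    this.genresByItem = genresByItem;
    this.ratingWeight = ratingWeight;
  }

  @Override
  public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
    double genreSim = genreSimilarity(itemID1, itemID2);
    double ratingSim;
    try {
      ratingSim = ratingSimilarity.itemSimilarity(itemID1, itemID2);
    } catch (TasteException te) {
      return genreSim;                   // e.g. new item unknown to the DataModel
    }
    if (Double.isNaN(ratingSim)) {
      return genreSim;                   // no co-ratings: genres only
    }
    return ratingWeight * ratingSim + (1.0 - ratingWeight) * genreSim;
  }

  @Override
  public double[] itemSimilarities(long itemID1, long[] itemID2s) throws TasteException {
    double[] result = new double[itemID2s.length];
    for (int i = 0; i < itemID2s.length; i++) {
      result[i] = itemSimilarity(itemID1, itemID2s[i]);
    }
    return result;
  }

  // Jaccard/Tanimoto overlap of the two items' genre sets, in [0,1].
  private double genreSimilarity(long itemID1, long itemID2) {
    Set<String> g1 = genresByItem.get(itemID1);
    Set<String> g2 = genresByItem.get(itemID2);
    if (g1 == null || g2 == null || g1.isEmpty() || g2.isEmpty()) {
      return 0.0;
    }
    Set<String> intersection = new HashSet<String>(g1);
    intersection.retainAll(g2);
    int union = g1.size() + g2.size() - intersection.size();
    return (double) intersection.size() / union;
  }

  @Override
  public void refresh(Collection<Refreshable> alreadyRefreshed) {
    ratingSimilarity.refresh(alreadyRefreshed);
  }
}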

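And second, the "only movies showing now" part as an IDRescorer wired into a GenericItemBasedRecommender. The ratings file path and the showingNow IDs are placeholders, and the combined similarity above could be plugged in instead of plain Pearson:

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class NowShowingExample {

  // Filters out everything except the handful of item IDs currently in cinemas.
  static class NowShowingRescorer implements IDRescorer {
    private final FastIDSet showingNow;

    NowShowingRescorer(FastIDSet showingNow) {
      this.showingNow = showingNow;
    }

    @Override
    public double rescore(long id, double originalScore) {
      return originalScore;              // leave scores unchanged
    }

    @Override
    public boolean isFiltered(long id) {
      return !showingNow.contains(id);   // drop anything not showing now
    }
  }

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));  // placeholder path

    // The combined rating+genre similarity sketched above could be plugged
    // in here instead of plain Pearson.
    GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(
        model, new PearsonCorrelationSimilarity(model));

    FastIDSet showingNow = new FastIDSet();
    showingNow.add(123L);                // placeholder IDs of current movies
    showingNow.add(456L);

    List<RecommendedItem> recs =
        recommender.recommend(1L, 10, new NowShowingRescorer(showingNow));
    for (RecommendedItem rec : recs) {
      System.out.println(rec.getItemID() + " " + rec.getValue());
    }
  }
}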