Hi

I started diving into Mahout a few days ago. I have a basic understanding of
the machine learning concepts behind it, but I'm not very familiar with
Mahout beyond the first 6 chapters of "Mahout in Action".

I'm looking to implement the following kind of recommendation engine (it's
not about movies, but it's easiest to explain this way):

Let's say I have the MovieLens dataset, complete with ratings, genres, etc.
I want a recommender that recommends only from the list of movies that are
showing in cinemas right now. That would be a list of 10-20 movies out of
the 5000 for which there are ratings in the dataset.

Given that these are relatively new movies, there will be relatively few
ratings for them, so I guess I'd have to rely on content-based
recommendation of some kind.

The first question is how it would affect performance if I use an IDRescorer
just to restrict the ordered list of recommendations to the set of available
movies (by implementing isFiltered so that only the 10-20 available movies
out of 5000 pass the filter and everything else is excluded)?
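
To make the question concrete, here is a minimal sketch of the filter I have
in mind. The class name NowShowingRescorer and the nowShowing set are
hypothetical. The two methods mirror the ones declared by Mahout's IDRescorer
(rescore and isFiltered), but the class is written as plain Java so it
compiles without Mahout; in a real project it would declare
"implements IDRescorer".

```java
import java.util.Set;

// Hypothetical allow-list filter. The two methods mirror Mahout's IDRescorer;
// with Mahout on the classpath this class would "implements IDRescorer".
class NowShowingRescorer {
    // IDs of the movies currently in cinemas (assumed to be supplied by the caller).
    private final Set<Long> nowShowing;

    NowShowingRescorer(Set<Long> nowShowing) {
        this.nowShowing = nowShowing;
    }

    // Leave the estimated preference unchanged; only filtering is needed here.
    public double rescore(long id, double originalScore) {
        return originalScore;
    }

    // true means "exclude this item", so every movie not in cinemas is filtered.
    public boolean isFiltered(long id) {
        return !nowShowing.contains(id);
    }
}
```

The rescorer would then be passed as the third argument to
recommender.recommend(userID, howMany, rescorer). Backing the lookup with a
HashSet keeps each isFiltered call at roughly constant cost.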

I know the simple way to implement content-based CF would be to implement my
own ItemSimilarity based on ratings + movie genre information. However, in
the case of the MovieLens dataset, if I combine, say, Pearson correlation for
ratings + Tanimoto coefficient for genres (or whatever combination makes
sense here), the evaluation score for that dataset gets slightly worse
compared to using Pearson alone. Should I ditch this method just for this
reason?
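
A minimal sketch of one such combination, in plain Java without Mahout
dependencies. The class name BlendedSimilarity, the ratingWeight parameter,
the rescaling of the Tanimoto value to [-1, 1], and the fallback to genres
alone when the rating similarity is NaN are all assumptions made for
illustration, not an established recipe. In a custom ItemSimilarity, the
blend would be computed inside itemSimilarity(itemID1, itemID2).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that blends a rating-based similarity with genre overlap.
class BlendedSimilarity {

    // Tanimoto (Jaccard) coefficient: |A ∩ B| / |A ∪ B|; 0 if both sets are empty.
    static double tanimoto(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // ratingSim is assumed to be in [-1, 1] (e.g. Pearson) or NaN when it
    // cannot be computed; genreTanimoto is in [0, 1]; ratingWeight is in [0, 1].
    static double blend(double ratingSim, double genreTanimoto, double ratingWeight) {
        // Rescale the genre overlap to [-1, 1] so both terms share one range.
        double genreSim = 2.0 * genreTanimoto - 1.0;
        if (Double.isNaN(ratingSim)) {
            // Too few co-ratings (typical for new movies): use genres alone.
            return genreSim;
        }
        return ratingWeight * ratingSim + (1.0 - ratingWeight) * genreSim;
    }
}
```

With a blend like this, the genre term matters most for the new movies, where
the rating similarity is missing or based on very few co-ratings. A score
averaged over the whole dataset is dominated by well-rated movies, so it may
not reflect how the blend behaves on the small now-showing subset.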

Further, what other ways are there to incorporate content-based data in order
to improve a recommender for the described use case? Is there a
straightforward way to integrate content-based knowledge into a slope-one
recommender?

Thanks
