You're on the right track. No, I don't think the IDRescorer hurts. On
the contrary it will save you from computing scores for movies that
are not recommendable.
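
For illustration, a minimal sketch of such a rescorer. The interface
below is a local stand-in that mirrors the two methods of Mahout's
IDRescorer so the snippet compiles on its own; in real code you would
implement the Mahout interface instead. The class name
NowShowingRescorer and the set of IDs are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in for Mahout's IDRescorer (assumption: same two methods),
// declared here only so the sketch compiles without Mahout.
interface IDRescorer {
    double rescore(long id, double originalScore);
    boolean isFiltered(long id);
}

// Hypothetical rescorer that only lets through movies showing right now.
final class NowShowingRescorer implements IDRescorer {
    private final Set<Long> nowShowing;

    NowShowingRescorer(Set<Long> nowShowing) {
        this.nowShowing = new HashSet<>(nowShowing); // defensive copy
    }

    @Override
    public double rescore(long id, double originalScore) {
        // Leave the ordering among allowed movies untouched.
        return originalScore;
    }

    @Override
    public boolean isFiltered(long id) {
        // true means "exclude this item from the recommendations".
        return !nowShowing.contains(id);
    }
}
```

Filtered items are skipped before any score is estimated for them,
which is where the saving comes from.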

It's hard to say what the 'right' content-based similarity metric is,
as it will depend a lot on what data you have as input. You don't have
much side information to go on here; it's possible that being from the
same genre (or by the same director, etc.) is of little or no
predictive value no matter what you apply to this data. Still, it seems
like you may need such a metric as a fall-back for the case of new
movies where there is no rating-based metric available.

You could hack up the code a little bit to do something like this: if
too few similar items are found with the similarity metric, then
compute similarities using the alternative content-based metric and
proceed that way. It's a bit of a hack, and inelegant, but it may work
well for you in practice.
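
Roughly, the fallback could look like the sketch below. It is plain
Java with no Mahout dependency: PairSimilarity, FallbackSimilarity and
minNeighbors are made-up names for illustration, and returning NaN for
"no similarity available" is an assumption of this sketch. Wiring it
into an actual item-based recommender is the part you would have to
hack in yourself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical similarity contract: NaN means "cannot be computed".
interface PairSimilarity {
    double between(long itemA, long itemB);
}

// Uses the rating-based metric unless it finds too few similar items,
// in which case it recomputes everything with the content-based metric.
final class FallbackSimilarity {
    private final PairSimilarity ratingBased;
    private final PairSimilarity contentBased;
    private final int minNeighbors;

    FallbackSimilarity(PairSimilarity ratingBased,
                       PairSimilarity contentBased,
                       int minNeighbors) {
        this.ratingBased = ratingBased;
        this.contentBased = contentBased;
        this.minNeighbors = minNeighbors;
    }

    Map<Long, Double> similarItems(long target, List<Long> candidates) {
        Map<Long, Double> found = score(ratingBased, target, candidates);
        if (found.size() >= minNeighbors) {
            return found;
        }
        // Too few neighbours from ratings alone: fall back to content.
        return score(contentBased, target, candidates);
    }

    private static Map<Long, Double> score(PairSimilarity metric,
                                           long target,
                                           List<Long> candidates) {
        Map<Long, Double> result = new LinkedHashMap<>();
        for (long candidate : candidates) {
            if (candidate == target) {
                continue; // an item is not its own neighbour
            }
            double s = metric.between(target, candidate);
            if (!Double.isNaN(s)) {
                result.put(candidate, s);
            }
        }
        return result;
    }
}
```

The threshold is something you would have to tune; a new movie with
almost no ratings always ends up on the content-based path.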

Slope-one isn't based on item-item similarity, so no, I don't think the
notion of content-based similarity applies. It comes up in item-based
recommenders only.


On Thu, Dec 30, 2010 at 11:42 AM, Vasil Vangelovski
<[email protected]> wrote:
> Hi
>
> I started diving into mahout a few days ago. I've a basic understanding of
> the machine learning concepts behind it, however I'm not all too familiar
> with mahout beyond the first 6 chapters of "Mahout in action".
>
> I'm looking to implement the following kind of a recommendation engine (it's
> not about movies but it's easiest to explain in this manner):
>
> Let's say I've the Movie Lens dataset. Complete with ratings, genres etc.
> I'd want a recommender that would recommend only from a list of movies that
> are showing in cinemas right now. That would be a list of 10-20 movies out
> of  5000 for which there are ratings in the dataset.
>
> Given these are relatively new movies there will be a relatively low number
> of ratings for them. So I guess I'd have to rely on content-based
> recommendation of some kind.
>
> The first question is how would it affect performance if I use IDRescorer
> for the purpose of just displaying an ordered list of recommendations in the
> set of available movies (by implementing isFiltered, where the result would
> be false most of the time 10/5000)?
>
> I know the simple way to implement content based CF would be to implement my
> own ItemSimilarity based on ratings + movie genre information. However in
> the case of the MovieLens dataset if I combine say pearson correlation for
> ratings + tanimoto coefficient for genres (or whatever combination makes
> sense here) it degrades performance (score) slightly for that dataset
> compared to using pearson alone. Should I ditch this method just because of
> this reason?
>
> Further what would be other ways to implement content-based data in order to
> improve a recommender for the described use case? Is there a straightforward
> way to integrate content-based knowledge into a slope-one recommender?
>
> Thanks
>
