This is too much code to ask people to debug in detail, but I get the
gist of it.


I am guessing that this is happening: the 2 War movies were rated 5.0,
and were only tagged War. This means that any other movie tagged only
War is estimated to be 5.0, given this similarity definition. And then
that's hard for anything else to beat.


You could say the problem is that a simple weighted average doesn't
account for the number of items that were averaged. An average of 5.0
over 1-2 items is far less meaningful than an average of 5.0 over 100
items. This isn't normally much of an issue when users have rated a
decent number of items, and when items have nonzero similarity to most
or all others. Here, most item-item pairs have no similarity.
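
To make that concrete, here is a minimal sketch of a similarity-weighted
estimate. It is not your code and not Mahout's implementation; the class
and method names are hypothetical, and every rating other than the two
5.0s is made up for illustration. It also assumes that items with zero
similarity simply contribute nothing.

```java
final class WeightedAverageSketch {

    /**
     * Similarity-weighted average of the user's ratings.
     * Returns NaN when no rated item has nonzero similarity
     * (an assumption made for this sketch).
     */
    static double estimate(double[] ratings, double[] similarities) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            weightedSum += similarities[i] * ratings[i];
            totalWeight += similarities[i];
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical user: two War-only movies rated 5.0, plus three
        // other movies whose genres do not overlap with the candidate.
        double[] ratings      = {5.0, 5.0, 3.0, 4.0, 2.0};
        // Similarity of each rated movie to a War-only candidate.
        double[] similarities = {1.0, 1.0, 0.0, 0.0, 0.0};

        // Only the two War movies carry weight, so the estimate is 5.0.
        System.out.println(estimate(ratings, similarities)); // prints 5.0
    }
}
```

The three non-War ratings never enter the average, so the estimate rests
on just two data points.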

You can change the estimation to account for "certainty" in some way.
For example, you could divide the estimate by the weighted standard
deviation of the series of ratings that was averaged to make the
estimate. The result is no longer an estimate of a rating, but it is
probably going to give much saner results.


It wouldn't really solve the problem by itself, but I would also
recommend you change the similarity to be a simple Jaccard coefficient
computed from genres: just the intersection size divided by the union
size. You're doing something like that already; this is just the
logical conclusion.
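
A rough sketch of that coefficient on genre sets follows. The class and
method names are hypothetical, the genre lookup and the wiring into your
custom ItemSimilarity are left out, and returning 0.0 when both movies
have no genres is my own assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class GenreJaccardSketch {

    /** Jaccard coefficient: |A intersect B| / |A union B|. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumption: no genres means no evidence of similarity
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<String> war = new HashSet<>(Arrays.asList("War"));
        Set<String> documentaryWar =
            new HashSet<>(Arrays.asList("Documentary", "War"));
        Set<String> kids =
            new HashSet<>(Arrays.asList("Animation", "Children's"));

        System.out.println(jaccard(war, war));            // prints 1.0
        System.out.println(jaccard(war, documentaryWar)); // prints 0.5
        System.out.println(jaccard(war, kids));           // prints 0.0
    }
}
```

Movies tagged only War still come out at 1.0 with each other, which is
why this change alone does not fix the problem.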



On Fri, Jul 13, 2012 at 2:01 PM, a a <[email protected]> wrote:
> Hello,
>
> I am trying to implement an item based recommender with a custom 
> ItemSimilarity.
> I've used the movielens data for the test and the item similarity uses the 
> movie genre to create the similarity value.
>
> I've followed the advice in the book and wrote a very simple app to see it in 
> action.
>
> When I run the code, the results that I get back do not make a lot of sense.
>
> For ex., below are the recommendations I get for user 1 is :
> 1450 : 5.0 ->'1450 Prisoner of the Mountains, 1996, War'
> 1289 : 5.0 ->'1289 Koyaanisqatsi, 1983, Documentary War'
> 760 : 5.0 ->'760 Stalingrad, 1993, War'
> 632 : 5.0 ->'632 Land and Freedom, 1995, War'
> 665 : 5.0 ->'665 Underground, 1995, War'
> Movies 
> Wacthed:53>Crime:2,Adventure:5,Action:5,War:2,Fantasy:3,Romance:6,Animation:39,Children's:20,Sci-Fi:3,Musical:14,Comedy:14,Thriller:3
>
> In the last line of the log we see that, the user has watched a lot of movies 
> with genre Animation,Children's,Musical yet the reocmmendations are all from 
> the genre War.
> I've repeated the test for many different users, and all the recommendations 
> that I got were out of line with the user history.
>
> Can anyone tell me what I'm doing wrong?
