Hi,

I have looked at the internals of Mahout's RecommenderIRStatsEvaluator code. I 
think that there are two important problems here.

As far as I understand it, the experimental protocol used in this code goes like this:

1. It takes away a certain percentage of the users as test users.
2. For each test user, it builds a training set consisting of the ratings given by all other users, plus those ratings of the test user that are below the relevanceThreshold.
3. It then builds a model, makes recommendations to the test user, and finds the intersection between this recommendation list and the items that the test user rated above the relevanceThreshold.
4. It then calculates precision and recall in the usual way.
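
For reference, a driver along these lines is what I have in mind; the file name and the at / relevanceThreshold / evaluationPercentage values (10, 4.0, 1.0) are placeholders standing in for my setup, and the MovieLens ratings are assumed to already be converted to Mahout's comma-separated format:

import java.io.File;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

DataModel model = new FileDataModel(new File("ml-1m-ratings.csv"));
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// builder is a user-based RecommenderBuilder (a sketch is further below);
// at = 10 recommendations per test user, relevanceThreshold = 4.0,
// evaluationPercentage = 1.0 (evaluate with all users)
IRStatistics stats = evaluator.evaluate(builder, null, model, null,
    10, 4.0, 1.0);
System.out.println("precision: " + stats.getPrecision());
System.out.println("recall:    " + stats.getRecall());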

Problems:
1. (mild) It builds a model for every test user, which can take a lot of time.

2. (severe) Only the ratings (of the test user) which are below the relevanceThreshold are put into the training set. This means that the algorithm only knows the preferences of the test user about the items which s/he doesn't like. This is not a good representation of the user's preferences.
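
To make the second point concrete, the per-test-user split behaves roughly like this (a schematic sketch of what I described above, not the actual Mahout source; the method and variable names are mine):

import java.util.List;
import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;

// Schematic only: how the test user's own ratings get divided.
static void splitTestUserPrefs(PreferenceArray testUserPrefs,
                               double relevanceThreshold,
                               List<Preference> trainingPrefs,
                               FastIDSet relevantItemIDs) {
  for (Preference pref : testUserPrefs) {
    if (pref.getValue() < relevanceThreshold) {
      trainingPrefs.add(pref);                // only "disliked" ratings are trained on
    } else {
      relevantItemIDs.add(pref.getItemID());  // "liked" ratings become the ground truth
    }
  }
}

In other words, the recommender never sees a single above-threshold preference from the test user, yet it is then scored on retrieving exactly those items.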

Moreover, when I ran this evaluator on the MovieLens 1M data, precision and recall turned out to be

precision: 0.011534185658699288
recall:    0.007905982905982885

and the run took about 13 minutes on my Intel Core i3. (I used user-based recommendation with k = 2.)
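
The builder I used was essentially the following (the similarity shown, PearsonCorrelationSimilarity, is just one choice; the part that matters for the numbers above is the k = 2 neighborhood):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

RecommenderBuilder builder = new RecommenderBuilder() {
  @Override
  public Recommender buildRecommender(DataModel model) throws TasteException {
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // user-based recommendation with k = 2 nearest neighbors
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);
    return new GenericUserBasedRecommender(model, neighborhood, similarity);
  }
};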


Although I know it is not OK to judge the performance of a recommendation algorithm by absolute precision and recall values, these numbers still seem too low to me, which might be a consequence of the second problem mentioned above.

Am I missing something?

Thanks
Ahmet
