Re: Problems with Mahout's RecommenderIRStatsEvaluator

Osman Başkaya Sun, 17 Feb 2013 03:57:37 -0800

I am sorry to extend the supervised/unsupervised discussion, which is not
the main question here, but I need to ask.

Sean, I don't understand your last answer. Let's assume our rating scale is
from 1 to 5. We can say that the movies a particular user rates as 5 are
relevant for him/her. 5 is just a number; we can use a *relevance
threshold* like you did, and we can follow the method described in Cremonesi
et al., Performance of Recommender Algorithms on Top-N Recommendation
Tasks <http://goo.gl/pejO7> (*2. Testing Methodology*, p. 2).

Are you saying that this task is unsupervised because no user can rate all of
the movies? For this reason, we can never be sure that our predicted top-N
list contains no relevant items, because it may include relevant movie(s)
that the user simply hasn't rated *yet*. This evaluation procedure misses
them.

In short, the following assumption can be problematic:

> We randomly select 1000 additional items unrated by
> user u. We may assume that most of them will not be
> of interest to user u.

Although larger values of N mostly overcome this problem, the evaluation
still does not seem fully supervised.
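
For concreteness, here is a minimal sketch of the testing procedure as I read
it from that section of the paper. It is not Mahout's
RecommenderIRStatsEvaluator; the function name, the `score` callable and the
`unrated_by_user` mapping are hypothetical names made up for illustration, and
I assume the caller has already excluded each held-out item from the user's
unrated list. Note that any sampled item the user would actually like, but has
not rated yet, is counted as irrelevant here, which is exactly the concern
above.

```python
import random


def top_n_recall_precision(test_items, unrated_by_user, score, n,
                           num_negatives=1000, seed=0):
    """Estimate recall(N) and precision(N) with sampled unrated items.

    test_items:      list of (user, item) pairs held out as relevant,
                     e.g. items the user rated 5.
    unrated_by_user: dict mapping user -> list of items the user has not rated
                     (assumed not to contain the held-out items).
    score:           hypothetical callable (user, item) -> predicted score.
    """
    rng = random.Random(seed)
    hits = 0
    for user, item in test_items:
        candidates = unrated_by_user[user]
        # Sample the "assumed irrelevant" items; cap at what is available.
        k = min(num_negatives, len(candidates))
        negatives = rng.sample(candidates, k)
        target = score(user, item)
        # Zero-based rank: how many sampled items outscore the held-out item.
        rank = sum(1 for other in negatives if score(user, other) > target)
        if rank < n:
            hits += 1  # the held-out item made it into the top-N list
    recall = hits / len(test_items)
    # Each test case has exactly one known relevant item, so precision
    # is recall divided by N.
    precision = recall / n
    return recall, precision
```

With a toy scorer, a call could look like
`top_n_recall_precision(pairs, unrated, lambda u, i: scores[i], n=10)`.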

On Sun, Feb 17, 2013 at 1:49 AM, Sean Owen <[email protected]> wrote:

> The very question at hand is how to label the data as "relevant" and "not
> relevant" results. The question exists because this is not given, which is
> why I would not call this a supervised problem. That may just be semantics,
> but the point I wanted to make is that the reasons choosing a random
> training set are correct for a supervised learning problem are not reasons
> to determine the labels randomly from among the given data. It is a good
> idea if you're doing, say, logistic regression. It's not the best way here.
> This also seems to reflect the difference between whatever you want to call
> this and your garden variety supervised learning problem.
>
> On Sat, Feb 16, 2013 at 11:15 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Sean
> >
> > I think it is still a supervised learning problem in that there is a
> > labelled training data set and an unlabeled test data set.
> >
> > Learning a ranking doesn't change the basic dichotomy between supervised
> > and unsupervised.  It just changes the desired figure of merit.
> >
>

-- 
Osman Başkaya
Koc University
MS Student | Computer Science and Engineering
