I am sorry to extend the unsupervised/supervised discussion, which is not the main question here, but I need to ask.
Sean, I don't understand your last answer. Let's assume our rating scale is from 1 to 5. We can say that the movies a particular user rates as 5 are relevant for him/her. 5 is just a number; we can use a *relevance threshold* as you did, and we can follow the method described in Cremonesi et al., Performance of Recommender Algorithms on Top-N Recommendation Tasks <http://goo.gl/pejO7> (*2. Testing Methodology, p. 2*).

Are you saying that this task is unsupervised because no user can rate all of the movies? Because of that, we cannot be sure that the other items in our predicted top-N list are irrelevant: the list may contain relevant movie(s) that the user simply has not rated *yet*. This evaluation procedure misses them. In short, the following assumption can be problematic:

> We randomly select 1000 additional items unrated by user u. We may
> assume that most of them will not be of interest to user u.

Although larger values of N mostly overcome this problem, it still does not seem totally supervised.

On Sun, Feb 17, 2013 at 1:49 AM, Sean Owen <[email protected]> wrote:

> The very question at hand is how to label the data as "relevant" and "not
> relevant" results. The question exists because this is not given, which is
> why I would not call this a supervised problem. That may just be semantics,
> but the point I wanted to make is that the reasons choosing a random
> training set are correct for a supervised learning problem are not reasons
> to determine the labels randomly from among the given data. It is a good
> idea if you're doing, say, logistic regression. It's not the best way here.
> This also seems to reflect the difference between whatever you want to call
> this and your garden variety supervised learning problem.
>
> On Sat, Feb 16, 2013 at 11:15 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Sean
> >
> > I think it is still a supervised learning problem in that there is a
> > labelled training data set and an unlabeled test data set.
> >
> > Learning a ranking doesn't change the basic dichotomy between supervised
> > and unsupervised. It just changes the desired figure of merit.

--
Osman Başkaya
Koc University
MS Student | Computer Science and Engineering
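For concreteness, here is a minimal Python sketch of the sampled protocol as paraphrased above: hold out an item the user rated 5, draw up to 1000 items the user has not rated, rank the held-out item against them by predicted score, and count a hit if it lands in the top N. The function names (`hit_at_n`, `recall_at_n`), the `score` callback, the tie handling, and the toy data are all hypothetical; this illustrates the assumption under discussion and is not the paper's code or Mahout's evaluator.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

# Hypothetical scoring callback: (user, item) -> predicted preference.
ScoreFn = Callable[[Hashable, Hashable], float]


def hit_at_n(score: ScoreFn, user: Hashable, test_item: Hashable,
             unrated_items: Sequence[Hashable], n: int,
             num_negatives: int = 1000,
             rng: Optional[random.Random] = None) -> bool:
    """One trial: is the held-out test item ranked within the top n?

    Samples up to num_negatives items the user has NOT rated and treats
    them all as non-relevant. That is the assumption being questioned:
    an unrated item the user would actually like still counts against
    the test item if the model scores it higher.
    """
    if rng is None:
        rng = random.Random()
    k = min(num_negatives, len(unrated_items))
    negatives = rng.sample(list(unrated_items), k)
    target = score(user, test_item)
    # Rank = 1 + number of sampled items scored strictly higher
    # (ties are resolved in the test item's favour).
    rank = 1 + sum(1 for j in negatives if score(user, j) > target)
    return rank <= n


def recall_at_n(score: ScoreFn,
                test_pairs: List[Tuple[Hashable, Hashable]],
                unrated_by_user: Dict[Hashable, Sequence[Hashable]],
                n: int, num_negatives: int = 1000, seed: int = 0) -> float:
    """Fraction of held-out (user, item) pairs that land in the top n."""
    rng = random.Random(seed)
    hits = sum(hit_at_n(score, u, i, unrated_by_user[u], n, num_negatives, rng)
               for u, i in test_pairs)
    return hits / len(test_pairs)


def score(user: Hashable, item: int) -> float:
    """Toy model (hypothetical): a lower item id means a higher score."""
    return -item


# Toy data (hypothetical): items each user has not rated, plus one
# held-out 5-rated item per user.
unrated = {"a": [3, 4, 5, 6, 7, 8, 9], "b": [0, 1, 2, 3, 8, 9]}
pairs = [("a", 2), ("b", 7)]

print(recall_at_n(score, pairs, unrated, n=5, num_negatives=6))  # 1.0
print(recall_at_n(score, pairs, unrated, n=4, num_negatives=6))  # 0.5
```

In the toy data, user "b" has not rated items 0 to 3, yet the model scores them above the held-out item 7, so item 7 drops to rank 5 and misses at N = 4. If "b" would in fact rate item 0 a 5, the protocol still treats it as non-relevant, which is exactly the case raised above.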
