If I may guess at the answer: yes, in theory it would be better to score output on the quality of its top recommendations rather than on the accuracy of its predicted ratings, which are just one means to that end. There are, of course, contexts where you have no ratings at all, so the winning technique here may not carry over to those scenarios.
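To make the contrast concrete, here is a minimal Python sketch of the two kinds of scoring. It assumes one user's ratings are a plain dict of item to rating. RMSE stands in for a typical rating-accuracy metric, since the contest's actual metric isn't stated in this thread. The names `hold_out_top_k`, `precision_at_k` and `rmse`, and the movie titles, are hypothetical and used only for illustration; they are not any contest's or library's API.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error of predicted ratings against held-out actual ratings.

    Both arguments map item -> rating; every item in `actual` must have a prediction.
    """
    if not actual:
        raise ValueError("need at least one held-out rating")
    squared = [(predicted[item] - rating) ** 2 for item, rating in actual.items()]
    return math.sqrt(sum(squared) / len(squared))


def hold_out_top_k(ratings, k):
    """Split one user's ratings into (training ratings, set of held-out top-k items).

    Ties are broken by insertion order, because sorted() is stable.
    """
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    held_out = set(ranked[:k])
    training = {item: r for item, r in ratings.items() if item not in held_out}
    return training, held_out


def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommendations that are in the relevant (held-out) set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical ratings for a single user on a 1-5 scale.
    ratings = {"Alien": 5, "Heat": 4, "Up": 2, "Fargo": 5, "Cars": 1}
    training, held_out = hold_out_top_k(ratings, k=2)  # held_out == {"Alien", "Fargo"}

    # A recommender trained on `training` might rank an unrated film first.
    recommendations = ["Breathless", "Fargo", "Alien"]
    print(precision_at_k(recommendations, held_out, k=2))  # 0.5: "Breathless" counts as a miss

    # Rating accuracy on the same held-out items.
    print(rmse({"Alien": 4.0, "Fargo": 4.0}, {"Alien": 5, "Fargo": 5}))  # 1.0
```

With k = 2 the held-out set is Alien and Fargo. Recommending Breathless first therefore scores a precision of 0.5: the evaluation counts it as a miss only because the user never rated it, not because it was a bad recommendation.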
Perhaps output would then be scored on what proportion of the top k recommendations match the user's real top k preferred items. The test would withhold each user's top k rated items and ask recommenders to predict them.

This has two problems that I can see, however. The small problem is that chopping off the top ratings makes the test data systematically different from real data; there's a lot more "information" in those top ratings than in any arbitrary k of them. The bigger problem is that the user's top k ratings are not necessarily the same as the best k recommendations! Let's say I've never seen the movie Breathless, but if I do, I'll find it's actually my favorite movie ever. A recommender would be right to make it a top recommendation. But an evaluation framework like the one this contest might use can't know that, so it would count that recommendation as "wrong".

Evaluating rating accuracy is, in comparison, at least unambiguous, and so it can form the basis of a competition. And to be fair, most people building production recommender systems would expect them to be able to estimate a rating in addition to making recommendations.

On Tue, Feb 15, 2011 at 11:19 AM, Chen_1st <[email protected]> wrote:
> Hi, Markus,
>
> I am curious why the competition still tries to predict the rating
> values, now that top k recommendation is more practical in real life
> applications, and it's illustrated by many papers that rating value
> prediction is not so useful for discovery of top k items.
>
> Best Regards.
>
> Chen
