Now that I've actually read the contest page (useful, that is), I see that Track 2 does something like what we are talking about. It holds out some highly rated items as "good" recommendations and asks the system to distinguish them from random other unrated items.
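That formulation can be sketched in a few lines. This is only an illustration of the idea, not the exact Track 2 protocol (the function name, the 1:1 positive/negative mix, and the toy scorer are all assumptions for the sketch):

```python
import random

def track2_style_score(score, user, held_out_positives, unrated_pool, rng):
    """Sketch of a Track 2-style test: mix the user's held-out highly
    rated items with an equal number of randomly drawn unrated items,
    rank all of them by predicted preference, and report the fraction
    of the top half that are the truly rated items."""
    negatives = rng.sample(unrated_pool, len(held_out_positives))
    candidates = list(held_out_positives) + negatives
    ranked = sorted(candidates, key=lambda item: score(user, item), reverse=True)
    top = set(ranked[:len(held_out_positives)])
    return len(top & set(held_out_positives)) / len(held_out_positives)

# Toy demo: an oracle that knows the user's true preferences scores 1.0,
# while a random scorer would hover around 0.5 on average.
true_likes = {1, 2, 3}
oracle = lambda user, item: 1.0 if item in true_likes else 0.0
rng = random.Random(42)
print(track2_style_score(oracle, "u1", [1, 2, 3], list(range(100, 200)), rng))  # 1.0
```

The point is that a recommender is only asked to separate known-good items from items drawn at random, which sidesteps the question of whether the held-out items are the *best* possible recommendations.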
This Track 2 formulation mostly avoids the theoretical issues with a precision/recall-style test for top-k recommendations. To answer your interesting question here -- no, of course you do not have ratings for all items from all users. Users tend to rate things they really like, and really don't like. If you were to take out the top k ratings (and that's not quite what Track 2 does), the resulting training data would be systematically unlike real ratings. It's not the same as removing k random items. The effect on a recommender is probably negative, but perhaps only marginally so.

If a user rates items 1 to 10, and we hold out 6 to 10 from the training data, you can create a precision@5 test by seeing how much of 6 to 10 is recommended to the user. The problem is that we don't know 6-10 are the best recommendations -- they're probably good recommendations, but not necessarily the best ones. If there were an item 11 that the user actually would like more, and the recommender recommended it, it would be penalized in this test. That's the problem. It's not a meaningless test, but it has certain drawbacks. The Track 2 test formulation doesn't have this particular issue.

On Tue, Feb 15, 2011 at 6:34 PM, Chen_1st <[email protected]> wrote:
> Hi, Sean,
>
> Sorry for my poor English.
>
>>> Hmm, not sure I understand. No, it's not true that real-life data
>>> regularly omits the user's top ratings. Why would that be?
>
> In real-life applications, it's impossible for users to provide ratings
> for all their favorite tracks, right? It's the same effect as omitting
> some top-rated tracks.
>
>>> How would you score the recommendations by holding out a random
>>> subset? That subset is definitely *not* representative of good
>>> recommendations -- you might be picking out things the user hates.
>
> Consider the example: the top favorite tracks of the user are
> complete_set = {1, 2, ..., 10}, and the user only provides ratings on
> randomly_selected_subset = {1, 2, ..., 5}; here we assume the user
> randomly selected 5 tracks from the complete_set and rated them. Let the
> recommender system predict the top 5 tracks for the user: if it can
> correctly hit 3 in randomly_selected_subset, it's with high probability
> better than hitting only 1.
>
> The above illustrates how to apply recall@5. Precision and NDCG are
> similar.
>
> 2011/2/16 Sean Owen <[email protected]>
>
>> Hmm, not sure I understand. No, it's not true that real-life data
>> regularly omits the user's top ratings. Why would that be?
>>
>> How would you score the recommendations by holding out a random
>> subset? That subset is definitely *not* representative of good
>> recommendations -- you might be picking out things the user hates.
>>
>> Precision / recall don't really make sense unless you think you're
>> holding out "good" recommendations and those would have to be top
>> rated items.
>>
>> Sean
>>
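The precision@5 test described in this thread, and the "item 11" objection to it, can be made concrete with a small sketch (the `precision_at_k` helper here is illustrative, not from any particular library):

```python
def precision_at_k(recommended, held_out, k=5):
    """precision@k against a held-out set: the fraction of the top-k
    recommendations that appear among the held-out items."""
    return len(set(recommended[:k]) & set(held_out)) / k

# The user rated items 1..10; we hold out 6..10 and train on 1..5.
held_out = [6, 7, 8, 9, 10]

# A recommender that returns exactly the held-out items scores 1.0 ...
print(precision_at_k([6, 7, 8, 9, 10], held_out))  # 1.0

# ... while one that swaps in item 11 -- which the user might actually
# prefer -- is penalized, even though its recommendations could be better.
print(precision_at_k([6, 7, 8, 9, 11], held_out))  # 0.8
```

The second call shows the drawback being discussed: the metric treats the held-out items as the only correct answers, so a genuinely better recommendation outside that set lowers the score.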
