MRR (Mean Reciprocal Rank) is a more realistic version of the same thing: the first item on the list counts as 1, the second as 1/2, the third as 1/3, and so on down to 1/5 for the fifth. This is meant to roughly match the probability of people clicking listings on the first page.
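A minimal sketch of that per-user score (illustrative only; the class and method names below are not Mahout API, and MRR is just the mean of this value over all test users):

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Reciprocal rank of the first relevant item within the top 5
// recommendations: 1, 1/2, 1/3, 1/4 or 1/5, else 0.
class ReciprocalRank {
  static double score(List<Long> recommended, Set<Long> relevant) {
    int cutoff = Math.min(5, recommended.size());
    for (int rank = 1; rank <= cutoff; rank++) {
      if (relevant.contains(recommended.get(rank - 1))) {
        return 1.0 / rank;
      }
    }
    return 0.0; // no relevant item in the top 5
  }

  public static void main(String[] args) {
    List<Long> recommended = Arrays.asList(7L, 42L, 13L);
    Set<Long> relevant = new HashSet<Long>(Arrays.asList(42L));
    System.out.println(score(recommended, relevant)); // 0.5: first hit at rank 2
  }
}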
On Thu, Aug 9, 2012 at 1:37 PM, Sean Owen <[email protected]> wrote:
> Evaluating precision @ 1 is evaluating the 1st recommendation, whether it's
> a good recommendation. It's like asking for the data point that a
> classifier would classify as most probably in a certain class. That's not
> the same as what a classifier is built to do, which is to decide whether
> any given item is in a class or not. Those are obviously quite related
> questions though.
>
> On Thu, Aug 9, 2012 at 9:20 PM, ziad kamel <[email protected]> wrote:
>
>> Thanks again.
>>
>> A quick question: in recommendation, if we measure precision @ 1,
>> how is that different from measuring precision in a classifier? Does
>> that mean a recommender becomes a classifier in this case?
>>
>> On Thu, Aug 9, 2012 at 12:18 PM, Sean Owen <[email protected]> wrote:
>> > Yes, this is a definite weakness of the precision test as applied to
>> > recommenders. It is somewhat flawed; it is easy to apply and has some use.
>> >
>> > Any item the user has interacted with is significant. The less-preferred 84
>> > still probably predict the most-preferred 16 to some extent. But you make a
>> > good point, the bottom of the list is of a different nature than the top,
>> > and that bias does harm the recommendations, making the test result less
>> > useful.
>> >
>> > This is not a big issue though if the precision@ number is quite small
>> > compared to the user pref list size.
>> >
>> > There's a stronger problem, that the user's pref list is not complete. A
>> > recommendation that's not in the list already may still be a good
>> > recommendation, in the abstract. But a precision test would count it as
>> > "wrong".
>> >
>> > nDCG is slightly better than precision but still has this fundamental
>> > problem.
>> >
>> > The "real" test is to make recommendations and then put them in front of
>> > users somehow and see how many are clicked or acted on. That's the best
>> > test but fairly impractical in most cases.
>> >
>> > On Thu, Aug 9, 2012 at 5:54 PM, ziad kamel <[email protected]> wrote:
>> >
>> >> I see, but we are removing the good recommendations and we are
>> >> assuming that the less preferred items of a user can predict his best
>> >> preferred. For example, take a user who has 100 books and preferred only
>> >> 16 of them, while the rest are books he has read. By removing the 16
>> >> we are left with 84 books that, it seems, won't be able to predict the
>> >> right set of 16?
>> >>
>> >> What are the recommended approaches to evaluate the results? I assume
>> >> the IR approach is one of them.
>> >>
>> >> Highly appreciate your help, Sean.
>> >>
>> >> On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <[email protected]> wrote:
>> >> > Yes, or else those items would not be eligible for recommendation. And it
>> >> > would be like giving students the answers to a test before the test.
>> >> >
>> >> > On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <[email protected]> wrote:
>> >> >
>> >> >> A related question please.
>> >> >>
>> >> >> Does Mahout remove the 16% good items before recommending and use the
>> >> >> 84% to predict the 16%?

--
Lance Norskog
[email protected]
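For reference, a sketch of the hold-out precision@k evaluation discussed in the thread, using Mahout's Taste IR stats evaluator; the data file name and the neighborhood size of 50 are placeholders, and the nDCG getter assumes a Mahout version that reports it:

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

class IREvalExample {
  public static void main(String[] args) throws Exception {
    // "prefs.csv" is a placeholder preference file: userID,itemID,value
    DataModel model = new FileDataModel(new File("prefs.csv"));

    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(50, similarity, dataModel);
        return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    // at = 5: judge the top 5 recommendations. For each test user the
    // evaluator holds out that user's most-preferred items (the "16") and
    // checks how many of them come back from a model trained on the rest.
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = evaluator.evaluate(builder, null, model, null, 5,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

    System.out.println("precision@5 = " + stats.getPrecision());
    System.out.println("recall@5    = " + stats.getRecall());
    System.out.println("nDCG        = " + stats.getNormalizedDiscountedCumulativeGain());
  }
}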
