Evaluating precision @ 1 means evaluating whether the 1st recommendation is a good one. It's like asking for the data point that a classifier would classify as most probably in a certain class. That's not the same as what a classifier is built to do, which is to decide whether any given item is in a class or not. Those are obviously quite related questions though.
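
For what it's worth, in Mahout this kind of test is run through RecommenderIRStatsEvaluator. A minimal sketch follows -- the data file name and the particular user-based recommender here are just placeholders, swap in your own setup:

    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.*;
    import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    DataModel model = new FileDataModel(new File("ratings.csv")); // placeholder file
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel model) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        return new GenericUserBasedRecommender(model, neighborhood, similarity);
      }
    };
    // at = 1 gives precision@1 / recall@1; CHOOSE_THRESHOLD lets the evaluator
    // pick the per-user relevance threshold from the user's own preference values
    IRStatistics stats = evaluator.evaluate(
        builder, null, model, null, 1,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println("precision@1 = " + stats.getPrecision());
    System.out.println("recall@1    = " + stats.getRecall());
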
On Thu, Aug 9, 2012 at 9:20 PM, ziad kamel <[email protected]> wrote:
> Thanks again.
>
> A quick question: in recommendation, if we measure precision @ 1,
> how is that different from measuring precision in a classifier? Does
> that mean a recommender becomes a classifier in this case?
>
> On Thu, Aug 9, 2012 at 12:18 PM, Sean Owen <[email protected]> wrote:
> > Yes, this is a definite weakness of the precision test as applied to
> > recommenders. It is somewhat flawed; it is easy to apply and has some use.
> >
> > Any item the user has interacted with is significant. The less-preferred
> > 84 still probably predict the most-preferred 16 to some extent. But you
> > make a good point, the bottom of the list is of a different nature than
> > the top, and that bias does harm the recommendations, making the test
> > result less useful.
> >
> > This is not a big issue though if the precision@ number is quite small
> > compared to the user pref list size.
> >
> > There's a stronger problem, that the user's pref list is not complete. A
> > recommendation that's not in the list already may still be a good
> > recommendation, in the abstract. But a precision test would count it as
> > "wrong".
> >
> > nDCG is slightly better than precision but still has this fundamental
> > problem.
> >
> > The "real" test is to make recommendations and then put them in front of
> > users somehow and see how many are clicked or acted on. That's the best
> > test but fairly impractical in most cases.
> >
> > On Thu, Aug 9, 2012 at 5:54 PM, ziad kamel <[email protected]> wrote:
> >
> >> I see, but we are removing the good recommendations and we are
> >> assuming that the items a user prefers less can predict the ones he
> >> prefers most. For example, take a user with 100 books who preferred
> >> only 16 of them, while the rest are just books he has read. By removing
> >> the 16 we are left with 84 books, and it seems they won't be able to
> >> predict the right set of 16?
> >>
> >> What are the recommended approaches to evaluate the results? I assume
> >> the IR approach is one of them.
> >>
> >> Highly appreciate your help, Sean.
> >>
> >> On Thu, Aug 9, 2012 at 11:45 AM, Sean Owen <[email protected]> wrote:
> >> > Yes, or else those items would not be eligible for recommendation. And it
> >> > would be like giving students the answers to a test before the test.
> >> >
> >> > On Thu, Aug 9, 2012 at 5:41 PM, ziad kamel <[email protected]> wrote:
> >> >
> >> >> A related question please.
> >> >>
> >> >> Does Mahout remove the 16% good items before recommending and use the
> >> >> 84% to predict the 16%?
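
(Footnote for the archive: the hold-out procedure discussed above boils down to the sketch below. This is not Mahout's actual implementation, just the idea -- per user, the most-preferred items are withheld, the recommender is built on the rest, and precision@N counts how many of the top-N recommendations land back in the withheld set.)

    import java.util.List;
    import java.util.Set;

    // precision@N for one user: fraction of the top-N recommended item IDs
    // that appear in that user's withheld "relevant" set
    static double precisionAtN(List<Long> topNRecommendedItemIds,
                               Set<Long> withheldPreferredItemIds,
                               int n) {
      int limit = Math.min(n, topNRecommendedItemIds.size());
      if (limit == 0) {
        return Double.NaN; // user skipped: nothing could be recommended
      }
      int hits = 0;
      for (int i = 0; i < limit; i++) {
        if (withheldPreferredItemIds.contains(topNRecommendedItemIds.get(i))) {
          hits++;
        }
      }
      return (double) hits / limit;
    }
    // The reported precision@N is then the average of this value over all
    // users that could be evaluated.
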
