Ted,

Thanks for the quick response. Perhaps I used the wrong terminology, but a recommender that uses binary data is nothing new. For example, a news web site might want to recommend stories based on your past viewing behavior: you either viewed an article or you didn't. Chapter 6 of Mahout in Action works with a Wikipedia snapshot where the data is simply whether a link exists or not, and recommendations are made on that binary dataset. The recommender itself is not generating a 1 or 0.
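To make it concrete, here is the sort of thing I have in mind. This is just a toy sketch with made-up names, not Mahout's actual classes: item-to-item Jaccard similarity is computed from the binary viewed/not-viewed data, and an unseen item is scored by summing its similarity to the items the user has already viewed.

import java.util.*;

// Illustrative only: a tiny item-based recommender over binary "viewed or
// not" data. Class and method names are invented for this example.
public class BinaryItemRecommender {
  // itemId -> set of userIds who viewed that item
  private final Map<Long, Set<Long>> viewsByItem;

  public BinaryItemRecommender(Map<Long, Set<Long>> viewsByItem) {
    this.viewsByItem = viewsByItem;
  }

  // Jaccard overlap of two items' viewer sets.
  private double jaccard(Set<Long> a, Set<Long> b) {
    Set<Long> inter = new HashSet<Long>(a);
    inter.retainAll(b);
    if (inter.isEmpty()) {
      return 0.0;
    }
    Set<Long> union = new HashSet<Long>(a);
    union.addAll(b);
    return (double) inter.size() / union.size();
  }

  // Score every item the user has not viewed by summing its similarity to
  // the items the user has viewed; higher score means recommend earlier.
  public Map<Long, Double> score(Set<Long> viewedByUser) {
    Map<Long, Double> scores = new HashMap<Long, Double>();
    for (Map.Entry<Long, Set<Long>> e : viewsByItem.entrySet()) {
      if (viewedByUser.contains(e.getKey())) {
        continue;
      }
      double s = 0.0;
      for (Long seen : viewedByUser) {
        Set<Long> seenViewers = viewsByItem.get(seen);
        if (seenViewers != null) {
          s += jaccard(e.getValue(), seenViewers);
        }
      }
      scores.put(e.getKey(), s);
    }
    return scores;
  }
}

The scores that come out are real-valued, which is what I meant by the recommender not generating a 1 or 0.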
Thanks again, I will probably go with precision; a rough sketch of how I would compute it on held-out data is at the bottom of this message, below the quoted thread. What do you think about coverage?

Peter

On Mon, Apr 25, 2011 at 5:50 PM, Ted Dunning <[email protected]> wrote:

> If the recommendation engine will only produce binary output scores and
> you have actual held-out user data, then you can still compute AUC. If you
> want to compute log-likelihood, you need to compute probabilities p_1 and
> p_2 that represent what the recommender *should* have said when it
> actually said 0 or 1. You can adapt these to give optimum log-likelihood
> on one held-out set and then get a real value for log-likelihood on
> another held-out set.
>
> Precision, recall, and false positive rate are also possibly useful.
>
> If the engine has an internal threshold knob, you can build ROC curves and
> estimate AUC by averaging.
>
> But the question remains: why would you use such a recommendation engine?
>
> On Mon, Apr 25, 2011 at 5:28 PM, Peter Harrington
> <[email protected]> wrote:
>
> > Does anyone have a suggestion for how to evaluate a recommendation
> > engine that uses a binary rating system?
> >
> > Usually the R scores (similarity score * rating of other items) are
> > normalized by dividing by the sum of all rated similarity scores. If I
> > do this for a binary scoring system I would get 1.0 for every item.
> >
> > Is there another normalization I can do to get a number between 0 and
> > 1.0? Should I just use precision and recall?
> >
> > Thanks for the help,
> > Peter Harrington
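P.S. Here is roughly how I plan to compute precision and recall at N on held-out data. The names are made up for illustration; I believe Mahout's GenericRecommenderIRStatsEvaluator does something along these lines, but this is just a sketch, not that class. For each user, hide some of their views, recommend from the rest, take the top N, and count how many of the hidden views come back.

import java.util.*;

// Illustrative only: average precision and recall at N for a binary
// recommender evaluated against held-out views.
public class PrecisionRecallAtN {
  // topN:    userId -> the N items recommended (with the held-out views
  //          removed from the training data)
  // heldOut: userId -> the views that were hidden from training
  // Returns { average precision, average recall } over users.
  public static double[] evaluate(Map<Long, List<Long>> topN,
                                  Map<Long, Set<Long>> heldOut,
                                  int n) {
    double precisionSum = 0.0;
    double recallSum = 0.0;
    int users = 0;
    for (Map.Entry<Long, Set<Long>> e : heldOut.entrySet()) {
      Set<Long> hidden = e.getValue();
      List<Long> recs = topN.get(e.getKey());
      if (hidden.isEmpty() || recs == null) {
        continue;
      }
      int hits = 0;
      for (Long item : recs) {
        if (hidden.contains(item)) {
          hits++;
        }
      }
      precisionSum += (double) hits / n;
      recallSum += (double) hits / hidden.size();
      users++;
    }
    return new double[] { precisionSum / users, recallSum / users };
  }
}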
