Ted,
Thanks for the quick response.  Perhaps I used the wrong terminology, but a
recommender that uses binary data is nothing new.  For example, a news web
site might want to recommend stories based on your past viewing behavior:
you either viewed an article or you didn't.  Chapter 6 of Mahout in Action
uses a Wikipedia snapshot where the only signal is whether a link exists or
not, and recommendations are made on that binary dataset.
The recommender itself is not generating a 1 or 0; the input data is binary.
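
For concreteness, here is a minimal sketch (Python, with made-up data) of
the kind of thing I mean: item-to-item co-occurrence over view/no-view
data, with no numeric ratings anywhere:

    from collections import defaultdict

    # Binary "ratings": each user maps to the set of articles they viewed.
    views = {
        "alice": {"a", "b", "c"},
        "bob":   {"a", "b"},
        "carol": {"b", "c", "d"},
    }

    # Count how often each pair of items is viewed by the same user.
    cooccur = defaultdict(int)
    for items in views.values():
        for i in items:
            for j in items:
                if i != j:
                    cooccur[(i, j)] += 1

    def recommend(user, k=2):
        """Score unseen items by co-occurrence with the user's views."""
        seen = views[user]
        scores = defaultdict(int)
        for (a, b), n in cooccur.items():
            if a in seen and b not in seen:
                scores[b] += n
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(recommend("bob"))  # ['c', 'd'] -- 'c' co-occurs with both views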

Thanks again; I will probably go with precision.  What do you think about
coverage?
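
By coverage I mean something like the fraction of the catalog that the
recommender ever surfaces.  Roughly (a sketch, names made up):

    def precision_at_k(recommended, held_out, k):
        """Fraction of the top-k recommendations the user actually viewed."""
        return sum(1 for item in recommended[:k] if item in held_out) / k

    def catalog_coverage(all_rec_lists, catalog):
        """Fraction of the catalog appearing in at least one rec list."""
        surfaced = set()
        for recs in all_rec_lists:
            surfaced.update(recs)
        return len(surfaced) / len(catalog)

    print(precision_at_k(["c", "d"], {"c"}, k=2))         # 0.5
    print(catalog_coverage([["c", "d"], ["a"]], "abcd"))  # 0.75
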
Peter

On Mon, Apr 25, 2011 at 5:50 PM, Ted Dunning <[email protected]> wrote:

> If the recommender will only produce binary output scores and you have
> actual held-out user data, then you can still compute AUC.  If you want to
> compute log-likelihood, you need to compute probabilities p_1 and p_2 that
> represent what the recommender *should* have said when it actually said 0
> or 1.  You can adapt these to give optimum log-likelihood on one held-out
> set and then get a real value for log-likelihood on another held-out set.
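>
> A minimal sketch of that calibration (assuming p_1 and p_2 are simply the
> empirical positive rates within each predicted class on the tuning set;
> the data here is made up):
>
>     import math
>
>     def calibrate(preds, truth):
>         """p_1, p_2: empirical rate of actual positives among the items
>         the recommender scored 0 and 1, respectively."""
>         def rate(label):
>             actual = [t for p, t in zip(preds, truth) if p == label]
>             return sum(actual) / len(actual) if actual else 0.5
>         return rate(0), rate(1)
>
>     def log_likelihood(preds, truth, p1, p2):
>         """Held-out log-likelihood under the calibrated probabilities."""
>         total = 0.0
>         for p, t in zip(preds, truth):
>             prob = p2 if p == 1 else p1
>             prob = min(max(prob, 1e-9), 1 - 1e-9)  # guard against log(0)
>             total += math.log(prob if t == 1 else 1.0 - prob)
>         return total
>
>     p1, p2 = calibrate([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])       # tuning set
>     print(log_likelihood([1, 0, 1, 0], [1, 0, 0, 0], p1, p2))  # eval set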
>
> Precision, recall, and false positive rate are also possibly useful.
>
> If the engine has an internal threshold knob, you can build ROC curves and
> estimate AUC using averaging.
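>
> A rough sketch of that sweep (the scores here stand in for whatever the
> internal knob exposes):
>
>     def roc_points(scores, truth):
>         """Sweep the threshold over all scores; return (fpr, tpr) pairs."""
>         pos = sum(truth)
>         neg = len(truth) - pos
>         points = [(0.0, 0.0)]
>         for thr in sorted(set(scores), reverse=True):
>             preds = [1 if s >= thr else 0 for s in scores]
>             tp = sum(p and t for p, t in zip(preds, truth))
>             fp = sum(p and not t for p, t in zip(preds, truth))
>             points.append((fp / neg, tp / pos))
>         return points
>
>     def auc(points):
>         """Trapezoidal area under the ROC curve."""
>         pts = sorted(points)
>         return sum((x2 - x1) * (y1 + y2) / 2
>                    for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
>
>     print(auc(roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])))  # 0.666...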
>
> But the question remains: why would you use such a recommendation engine?
>
> On Mon, Apr 25, 2011 at 5:28 PM, Peter Harrington <
> [email protected]> wrote:
>
> > Does anyone have a suggestion for how to evaluate a recommendation engine
> > that uses a binary rating system?
> > Usually the R scores (similarity score * rating of other items) are
> > normalized by dividing by the sum of all rated similarity scores.  If I
> > do this for a binary scoring system I would get 1.0 for every item.
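> >
> > To make that concrete (a toy example):
> >
> >     sims = {"b": 0.9, "c": 0.4}  # similarity of candidate to rated items
> >     ratings = {"b": 1, "c": 1}   # binary ratings are all 1
> >
> >     score = sum(sims[i] * ratings[i] for i in sims) / sum(sims.values())
> >     print(score)  # 1.0 -- a weighted average of all-ones is always 1.0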
> >
> > Is there another normalization I can do to get a number between 0 and
> > 1.0?
> > Should I just use precision and recall?
> >
> > Thanks for the help,
> > Peter Harrington
> >
>
