Peter (/Ted),

Yes, this is all answered in the framework already. You would never directly use the recommenders intended for data sets with ratings, as most don't make sense when all ratings are 1.0. You would use, for example, GenericBooleanPrefItemBasedRecommender, a variant on GenericItemBasedRecommender, which overloads the notion of "estimatePreference()" to still return a useful value.
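For example, something like this (an untested sketch; "links.csv" is a made-up file name, and FileDataModel reads "userID,itemID" lines with no rating column as boolean preferences):

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class BooleanPrefExample {
  public static void main(String[] args) throws Exception {
    // "userID,itemID" lines, no rating column -> boolean preferences
    DataModel model = new FileDataModel(new File("links.csv"));
    // LogLikelihoodSimilarity ignores preference values, so it suits boolean data
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
    Recommender recommender = new GenericBooleanPrefItemBasedRecommender(model, similarity);
    // estimatePreference() here is a sum of similarities to the user's items,
    // not a rating-weighted average, so ranking still works without ratings
    List<RecommendedItem> recs = recommender.recommend(123L, 10);
    for (RecommendedItem rec : recs) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}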
There is already GenericRecommenderIRStatsEvaluator, which runs precision, recall, F-score and NDCG stats on a recommender. These are meaningful even without ratings, though of course things like RMSE aren't anymore. (This is all in Mahout in Action too, yes. A quick sketch of wiring it up is below the quoted thread.)

The output of a recommender or similarity metric isn't a probability in general, so you can't apply AUC in all cases; that's why it isn't implemented generally. However, yes, for the case of LogLikelihoodSimilarity you could manage to put that together.

On Tue, Apr 26, 2011 at 1:50 AM, Ted Dunning <[email protected]> wrote:
> If the recommendation will only produce binary output scores and you have
> actual held out user data, then you can still compute AUC. If you want to
> compute log-likelihood, you need to compute probabilities p_1 and p_2 that
> represent what the recommender *should* have said when it actually said 0
> or 1. You can adapt these to give optimum log-likelihood on one held out
> set and then get a real value for log-likelihood on another held out set.
>
> Precision, recall, and false positive rate are also possibly useful.
>
> If the engine has an internal threshold knob, you can build ROC curves and
> estimate AUC using averaging.
>
> But the question remains, why would you use such a recommendation engine?
>
> On Mon, Apr 25, 2011 at 5:28 PM, Peter Harrington <
> [email protected]> wrote:
>
> > Does anyone have a suggestion for how to evaluate a recommendation
> > engine that uses a binary rating system?
> > Usually the R scores (similarity score * rating of other items) are
> > normalized by dividing by the sum of all rated similarity scores. If I
> > do this for a binary scoring system I would get 1.0 for every item.
> >
> > Is there another normalization I can do to get a number between 0 and
> > 1.0? Should I just use precision and recall?
> >
> > Thanks for the help,
> > Peter Harrington
> >
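For reference, wiring up the IR stats evaluation looks roughly like this (untested sketch; evaluating at 10, letting the framework choose the relevance threshold, over 100% of the users):

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class IRStatsExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("links.csv"));
    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new GenericBooleanPrefItemBasedRecommender(
            dataModel, new LogLikelihoodSimilarity(dataModel));
      }
    };
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    // at = 10 recommendations per user; CHOOSE_THRESHOLD lets the framework
    // pick the relevance cutoff per user; 1.0 = use all of the users
    IRStatistics stats = evaluator.evaluate(builder, null, model, null, 10,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println("Precision@10: " + stats.getPrecision());
    System.out.println("Recall@10:    " + stats.getRecall());
    System.out.println("F1@10:        " + stats.getF1Measure());
    System.out.println("NDCG@10:      " + stats.getNormalizedDiscountedCumulativeGain());
  }
}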
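And on Ted's AUC point: if you score a held-out set of items with estimatePreference() and know which of them the user actually interacted with, AUC is just the Mann-Whitney statistic over those scores. A toy, framework-independent illustration (all names and data here are made up):

public class AucSketch {
  // AUC as the Mann-Whitney statistic: the probability that a randomly
  // chosen relevant item is scored above a randomly chosen non-relevant one.
  // O(n^2) for clarity; a sort-based version is O(n log n).
  static double auc(double[] scores, boolean[] relevant) {
    long better = 0, ties = 0, pairs = 0;
    for (int i = 0; i < scores.length; i++) {
      if (!relevant[i]) continue;
      for (int j = 0; j < scores.length; j++) {
        if (relevant[j]) continue;
        pairs++;
        if (scores[i] > scores[j]) better++;
        else if (scores[i] == scores[j]) ties++;
      }
    }
    return (better + 0.5 * ties) / pairs;
  }

  public static void main(String[] args) {
    // Scores from estimatePreference() on held-out items; true means the
    // user actually interacted with that item in the held-out data
    double[] scores = {3.2, 1.1, 2.7, 0.4};
    boolean[] relevant = {true, false, true, false};
    System.out.println(auc(scores, relevant)); // 1.0: relevant items ranked first
  }
}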
