If you are comparing ranking systems against a gold standard of relevance, the accepted standard measure is AUC. You can define AUC most conveniently as the probability that the score of a randomly chosen known good example is higher than the score of a randomly chosen known bad example. This is equivalent to the Wilcoxon rank-sum test: AUC equals the Mann-Whitney U statistic divided by the number of (good, bad) pairs.
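To make the definition concrete, here is a minimal sketch that computes AUC two ways: directly as the fraction of (good, bad) pairs in which the good example scores higher, and through the Mann-Whitney rank-sum formula. The function names are hypothetical, and counting ties as one half is an assumption (it is the usual convention, and it keeps the two versions in agreement).

```python
from itertools import product


def auc_pairwise(good_scores, bad_scores):
    """AUC as P(score of random good > score of random bad); ties count 1/2."""
    if not good_scores or not bad_scores:
        raise ValueError("need at least one good and one bad example")
    total = 0.0
    for g, b in product(good_scores, bad_scores):
        if g > b:
            total += 1.0
        elif g == b:
            total += 0.5
    return total / (len(good_scores) * len(bad_scores))


def auc_rank_sum(good_scores, bad_scores):
    """Same quantity via the Mann-Whitney U statistic, using midranks for ties."""
    if not good_scores or not bad_scores:
        raise ValueError("need at least one good and one bad example")
    labeled = [(s, 1) for s in good_scores] + [(s, 0) for s in bad_scores]
    labeled.sort(key=lambda pair: pair[0])  # ascending by score
    ranks = [0.0] * len(labeled)
    i = 0
    while i < len(labeled):
        # Find the block of tied scores labeled[i..j].
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n_good, n_bad = len(good_scores), len(bad_scores)
    rank_sum_good = sum(r for r, (_, lab) in zip(ranks, labeled) if lab == 1)
    u = rank_sum_good - n_good * (n_good + 1) / 2  # Mann-Whitney U for "good"
    return u / (n_good * n_bad)


good = [0.9, 0.8, 0.4]
bad = [0.7, 0.3]
print(auc_pairwise(good, bad), auc_rank_sum(good, bad))  # both 5/6
```

With these scores, 5 of the 6 (good, bad) pairs are ordered correctly (only 0.4 vs 0.7 is wrong), so both functions return 5/6. The pairwise version costs O(n·m); the rank-sum version costs O((n + m) log(n + m)) because of the sort.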
AUC has the nice property that it can be computed in an on-line fashion and can be used as a meta-objective in a stochastic gradient descent learning algorithm. If you are simply comparing two ranking systems against each other, there are other scores with interesting properties, but that doesn't seem like what you would be doing in evaluating a recommender.

On Sat, Oct 16, 2010 at 6:46 PM, Lance Norskog <[email protected]> wrote:
> I have a recommender that I would like to evaluate. The Absolute
> evaluator doesn't work, because it compares preference values. The
> recommender and its datamodel operate in different numerical spaces
> and there is no way to normalize the two. So this leaves comparing the
> relative order of the recommendations from the DataModel v.s.
> Recommender. There is no order-comparing evaluator.
>
> What's a good strategy for this problem? Order comparison seems the
> right approach, but what are "intellectually defendable" formulae?
> This is what I've got:
>
> For each user, I get the item preferences from the DataModel. Then I
> get preferences for the same items from the recommender. These are
> stored in matching arrays.
>
> I've tried a couple of measurements:
> 1) Do a bubble sort of one prefs list against the other, counting the
> number of swaps needed to make the two match.
> 2) For each item in one prefs list, find its position in the other
> prefs list and save the distance.
>
> For both of these measures, I've tried various combinations of
> division and square roots to get a useful comparison score. Throwing
> in a square root allows one to accentuate nearer distances v.s.
> farther.
>
> Comments? Technical references?
>
> Thanks,
>
> --
> Lance Norskog
> [email protected]
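For the two-ranking comparison in the quoted message, both of the measures described already have standard names. Counting bubble-sort swaps between the two lists gives the Kendall tau distance (the number of item pairs the two lists order differently), and summing each item's position difference gives Spearman's footrule. Both depend only on order, so the DataModel and the recommender do not need to share a numerical scale. The sketch below assumes the two inputs are the "matching arrays" of preference values for the same items in the same order. The function names are hypothetical. Pairs tied in either list count as neither concordant nor discordant (tau-a). The footrule breaks ties by array index, an arbitrary assumption.

```python
def ranks_by_score(scores):
    """Position (0 = best) of each item when sorted by descending score.
    Ties are broken by original index because sorted() is stable."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    positions = [0] * len(scores)
    for pos, item in enumerate(order):
        positions[item] = pos
    return positions


def kendall_tau(prefs_a, prefs_b):
    """Return (discordant pairs, Kendall tau-a) for two score lists over the same items.
    Without ties, the discordant count equals the number of bubble-sort swaps."""
    n = len(prefs_a)
    if n != len(prefs_b) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (prefs_a[i] - prefs_a[j]) * (prefs_b[i] - prefs_b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # sign == 0: tied in at least one list, counted as neither
    tau_a = (concordant - discordant) / (n * (n - 1) / 2)
    return discordant, tau_a


def spearman_footrule(prefs_a, prefs_b):
    """Return (total position displacement, similarity scaled to [0, 1])."""
    n = len(prefs_a)
    if n != len(prefs_b) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    ra, rb = ranks_by_score(prefs_a), ranks_by_score(prefs_b)
    total = sum(abs(x - y) for x, y in zip(ra, rb))
    max_total = n * n // 2  # largest possible footrule for n items
    return total, 1 - total / max_total


datamodel = [5.0, 3.0, 4.0, 1.0]      # hypothetical preference values
recommender = [0.9, 0.2, 0.7, 0.4]    # same items, different numerical scale
print(kendall_tau(datamodel, recommender))        # 1 discordant pair, tau = 4/6
print(spearman_footrule(datamodel, recommender))  # (2, 0.75)
```

In this example, only items 1 and 3 are swapped between the two orderings. That gives one discordant pair (one bubble-sort swap) and a total displacement of 2. The O(n²) pair loop is fine for per-user lists. For long lists, the discordant count can be obtained in O(n log n) by counting inversions during a merge sort.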
