Different people watch different numbers of movies, and rate only some of what they watch. Their tastes may fall into one or a few clusters (clustering can also be by genre, by which day of the week the rating was made, and so on) or may be scattered across genres (Harry Potter & British comedy & European soft-core 70's porn). Evaluating the worth of user X's ratings is also important: if you want to interpret ratings on an absolute scale, you need to normalize the incoming ratings, because a given user's ratings may average 7.
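A minimal sketch of the kind of per-user normalization meant here (illustrative Python only, not Mahout's API): center each user's ratings on that user's own mean, so a habitual 7-rater and a habitual 4-rater become comparable.

```python
def center_ratings(ratings):
    """Map a user's raw ratings to deviations from that user's own mean.

    `ratings` is a dict of item -> raw score. The returned dict expresses
    each rating relative to the user's personal baseline.
    """
    mean = sum(ratings.values()) / len(ratings)
    return {item: score - mean for item, score in ratings.items()}

# A user whose ratings cluster around 7: the centered values show whether
# each item was above or below that user's own baseline.
user = {"movie_a": 9, "movie_b": 7, "movie_c": 5}
print(center_ratings(user))  # {'movie_a': 2.0, 'movie_b': 0.0, 'movie_c': -2.0}
```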
The code in Mahout doesn't address these issues.

On Mon, Dec 27, 2010 at 6:54 AM, Otis Gospodnetic <[email protected]> wrote:
> Hi,
>
> I was wondering how people evaluate the quality of recommendations other
> than RMSE and such in the eval package.
> For example, what are some good ways to measure/evaluate the quality of
> recommendations based on simply observing users' usage of recommendations?
> Here are 2 ideas.
>
> * If you have a mechanism to capture the user's rating of the watched item,
> that gives you (in)direct feedback about the quality of the recommendation.
> When evaluating and comparing, you probably also want to take into account
> the ordinal of the recommended item in the list of recommended items. If a
> person chooses the 1st recommendation and gives it a score of 10 (best),
> that's different than when a person chooses the 7th recommendation and
> gives it a score of 10. Or if a person chooses the 1st recommendation and
> gives it a rating of 1.0 (worst) vs. choosing the 10th recommendation and
> rating it 1.0.
>
> * Even if you don't have a mechanism to capture rating feedback from
> viewers, you can still evaluate and compare. You can do that by looking
> purely at the ordinals of items selected from recommendations. If a person
> chooses something closer to "the top" of the recommendation list, the
> recommendations can be considered better than if the user chooses something
> closer to "the bottom". This idea is similar to MRR in search -
> http://en.wikipedia.org/wiki/Mean_reciprocal_rank
>
> * The above ideas assume recommendations are not shuffled, meaning that
> their order represents their real recommendation score-based order.
>
> I'm wondering:
> A) if these ways of measuring/evaluating the quality of recommendations
> are good/bad/flawed
> B) if there are other, better ways of doing this
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

--
Lance Norskog
[email protected]
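For reference, the ordinal-based idea in the quoted mail can be sketched as a mean reciprocal rank over observed clicks (illustrative Python; `clicked_positions` and its 1-based convention are assumptions, not anything from Mahout's eval package):

```python
def mean_reciprocal_rank(clicked_positions):
    """MRR over sessions.

    Each entry in `clicked_positions` is the 1-based position, within the
    recommendation list shown to the user, of the item the user actually
    chose in that session. Higher MRR means users tend to pick items
    nearer the top of the list.
    """
    return sum(1.0 / pos for pos in clicked_positions) / len(clicked_positions)

# Three sessions: users picked the 1st, 2nd, and 4th recommendation,
# so the score is (1 + 1/2 + 1/4) / 3.
print(mean_reciprocal_rank([1, 2, 4]))
```

Note this matches the mail's assumption that the list is shown in true score order; if recommendations are shuffled before display, the clicked position no longer reflects the recommender's ranking.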
