Thanks for your thorough response. It is really helpful, as we are new to Mahout and to recommendations in general. The approach you mention, training on data up to a certain point in time and having the recommender score the observations that actually came next, is very interesting. It seems like it would work well with our Boolean dataset. We will give this a try.
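To check that we understand the idea, here is a minimal sketch of how we read it, in plain Python rather than Mahout. Everything in it is an assumption made for illustration: the function names are hypothetical, the events are made up, and the simple co-occurrence scorer is only a stand-in for whatever score a real recommender would produce. It trains on events up to a cutoff, then checks how highly each later observation scores against the other items the user had not yet seen.

```python
from collections import defaultdict


def split_by_time(events, cutoff):
    """Split (user, item, timestamp) events into train (<= cutoff) and held-out (> cutoff)."""
    train = [e for e in events if e[2] <= cutoff]
    held_out = [e for e in events if e[2] > cutoff]
    return train, held_out


def build_cooccurrence(train):
    """Count how often two items appear in the same user's training history."""
    items_by_user = defaultdict(set)
    for user, item, _ in train:
        items_by_user[user].add(item)
    cooc = defaultdict(int)
    for items in items_by_user.values():
        for a in items:
            for b in items:
                if a != b:
                    cooc[(a, b)] += 1
    return items_by_user, cooc


def score(user, item, items_by_user, cooc):
    """Stand-in score: total co-occurrence of `item` with the user's training items."""
    return sum(cooc.get((seen, item), 0) for seen in items_by_user.get(user, ()))


def mean_percentile(train, held_out):
    """Average fraction of unseen candidate items scoring below each later observation."""
    items_by_user, cooc = build_cooccurrence(train)
    all_items = {item for _, item, _ in train}
    percentiles = []
    for user, item, _ in held_out:
        seen = items_by_user.get(user, set())
        if item in seen:
            continue  # a repeat interaction, not a new observation
        candidates = all_items - seen - {item}
        if not candidates:
            continue
        target = score(user, item, items_by_user, cooc)
        lower = sum(1 for c in candidates
                    if score(user, c, items_by_user, cooc) < target)
        percentiles.append(lower / len(candidates))
    return sum(percentiles) / len(percentiles) if percentiles else None


# Made-up Boolean interaction data: (user, item, timestamp).
events = [
    ("u1", "trek1", 1), ("u1", "trek2", 2),
    ("u2", "trek1", 1), ("u2", "trek2", 2), ("u2", "trek3", 3),
    ("u3", "simpsons1", 1), ("u3", "simpsons2", 2),
    ("u1", "trek3", 10),  # happens after the cutoff
]

train, held_out = split_by_time(events, cutoff=5)
result = mean_percentile(train, held_out)
```

On this toy data the one later observation (u1 watching trek3) outscores every other unseen item, so the result is 1.0. A value near 0.5 would suggest the scores are no better than chance at ranking what really happened next.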
Thanks again for the help.

-Jonathan

On Sun, Aug 26, 2012 at 3:55 PM, Sean Owen <[email protected]> wrote:
> Most watched by that particular user.
>
> The issue is that the recommender is trying to answer, "of all items
> the user has not interacted with, which is the user most likely to
> interact with?" So the 'right answers' to the quiz it gets ought to be
> answers to this question. That is why the test data ought to be what
> appears to be the most interacted / preferred items.
>
> For example, if you watched 10 Star Trek episodes, then 1 episode of
> the Simpsons, and then held out the Simpsons episode -- the recommender
> is almost surely not going to predict it, not above more Star Trek.
> That seems like correct behavior, but would be scored badly by a
> simple precision test.
>
> There are two downsides to this approach. Firstly, removing well liked
> items from the training set may meaningfully skew a user's
> recommendations. It's not such a big issue if the test set is small --
> and it should be.
>
> The second is that by taking out data this way you end up with a
> training set which never really existed at one point in time. That
> also could be a source of bias.
>
> Using recent data points tends to avoid both of these problems -- but
> then has the problem above.
>
>
> There's another approach I've been playing with, which works when the
> recommender produces some score for each rec, not just a ranked list.
> You can train on data up to a certain point in time, then have the
> recommender score the observations that really happened after that
> point. Ideally it should produce a high score for things that really
> were observed next.
>
> This isn't implemented in Mahout but you do get a score with recs even
> without ratings.
