Most watched by that particular user.

The issue is that the recommender is trying to answer: of all items
the user has not interacted with, which is the user most likely to
interact with? So the 'right answers' to the quiz it gets ought to be
answers to that question. That is why the test data ought to be what
appear to be the most interacted-with / preferred items.
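To make that concrete, here is a minimal sketch of such a split: hold out a small fraction of each user's most-preferred items as the test set, and train on the rest. This is purely illustrative (plain Python, not Mahout's API), and the `ratings` dict shape is an assumption.

```python
def split_by_preference(ratings, holdout_frac=0.1):
    """Per-user split: hold out a small fraction of each user's
    MOST-preferred items as test data; the rest is training data.
    `ratings` maps user -> {item: rating}. Illustrative sketch only."""
    train, test = {}, {}
    for user, items in ratings.items():
        # Sort this user's items by rating, highest first
        ranked = sorted(items, key=items.get, reverse=True)
        # Hold out at least one item per user
        n_test = max(1, int(len(ranked) * holdout_frac))
        held_out = set(ranked[:n_test])
        test[user] = {i: items[i] for i in held_out}
        train[user] = {i: r for i, r in items.items() if i not in held_out}
    return train, test
```

Keeping `holdout_frac` small matters for the first downside discussed below: the fewer well-liked items you remove, the less you distort the training data.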

For example, if you watched 10 Star Trek episodes and then 1 episode
of The Simpsons, and the Simpsons episode is held out -- the
recommender is almost surely not going to predict it, certainly not
above more Star Trek. That seems like correct behavior, but it would
be scored badly by a simple precision test.
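A tiny precision-at-k sketch shows the failure mode: the recommendations are perfectly reasonable, but because the held-out item is the one Simpsons episode, the score is zero. (Illustrative Python, not Mahout; item names are made up.)

```python
def precision_at_k(recommended, held_out, k=5):
    """Fraction of the top-k recommendations that appear in the
    held-out test set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in held_out)
    return hits / k

# The recommender sensibly suggests more Star Trek, but the held-out
# item was the lone Simpsons episode, so precision is 0.0 anyway.
recs = ["trek_11", "trek_12", "trek_13", "trek_14", "trek_15"]
print(precision_at_k(recs, {"simpsons_1"}))  # 0.0
```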

There are two downsides to this approach. First, removing well-liked
items from the training set may meaningfully skew a user's
recommendations. That's not such a big issue if the test set is
small -- and it should be.

The second is that by taking out data this way, you end up with a
training set that never actually existed at any point in time. That
could also be a source of bias.

Using the most recent data points as the test set avoids both of
these problems -- but then runs into the problem above.


There's another approach I've been playing with, which works when the
recommender produces a score for each recommendation, not just a
ranked list. You can train on data up to a certain point in time,
then have the recommender score the observations that actually
happened after that point. Ideally it should produce high scores for
the things that really were observed next.
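A sketch of that evaluation, under assumptions: events are (user, item, time) triples, and the recommender exposes some way to score an arbitrary (user, item) pair after training. The `score_fn_factory` interface and the trivial popularity-based stand-in model are both hypothetical, not Mahout's API.

```python
from collections import Counter

def temporal_score_eval(events, cutoff, score_fn_factory):
    """Train on events (user, item, t) with t < cutoff, then score
    the (user, item) pairs that actually occurred at or after the
    cutoff. A good recommender should give those real future
    interactions high scores on average."""
    train = [(u, i) for (u, i, t) in events if t < cutoff]
    future = [(u, i) for (u, i, t) in events if t >= cutoff]
    score = score_fn_factory(train)
    return sum(score(u, i) for (u, i) in future) / len(future)

def popularity_scorer(train):
    # Trivial stand-in model: score an item by its share of all
    # training interactions, ignoring the user entirely.
    counts = Counter(i for (_, i) in train)
    total = sum(counts.values()) or 1
    return lambda user, item: counts[item] / total
```

The nice property is that the training set here is exactly what existed at the cutoff time, so neither of the two biases above applies.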

This isn't implemented in Mahout, but you do get a score with
recommendations even without ratings.
