Another problem is that activity changes over time. The Netflix viewing data showed a striking change in viewing patterns (I think sometime in 2004). Suppose you test 6 months of training data vs. the following month of test data: Q3 and Q4 of 2003 vs. January 2004. Now do this for every month in 2004-2005, rolling the training and test sets forward month by month. You will see recommendation quality dip and then recover across that period, because the recent past stopped predicting the future for several months.
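
For concreteness, a minimal sketch of that rolling split might look like the code below. This is a hypothetical harness, not the actual Netflix Prize tooling: the dates follow the example above, and the model-fitting and metric steps are left as comments.

import java.time.YearMonth;

public class RollingSplits {

    public static void main(String[] args) {
        // Hypothetical rolling evaluation: 6 months of training data,
        // then the following month as test, advanced one month at a time.
        YearMonth firstTest = YearMonth.of(2004, 1);   // train on 2003-07..2003-12
        YearMonth lastTest  = YearMonth.of(2005, 12);

        for (YearMonth test = firstTest; !test.isAfter(lastTest); test = test.plusMonths(1)) {
            YearMonth trainStart = test.minusMonths(6);
            YearMonth trainEnd   = test.minusMonths(1);
            System.out.printf("train %s..%s -> test %s%n", trainStart, trainEnd, test);
            // At each step: fit on ratings timestamped in [trainStart, trainEnd],
            // score the test month, and plot the metric against the test month.
            // A dip-and-recovery in that curve is the pattern described above.
        }
    }
}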
On Mon, Jan 30, 2012 at 10:31 AM, Ted Dunning <[email protected]> wrote:

> I don't know that I have any secrets.
>
> I have observed garbage performance from recommenders based on behavior.
> That performance got enormously better as we chose different behaviors to
> indicate engagement.
>
> As an example of what we looked at, consider a video site which records
> ratings, video views and 30 second (or more) video views.
>
> Ratings information was minute compared to the other data and thus had
> little value. Many videos never had any ratings and the vast majority of
> all users never rated anything. Even worse, it was impossible to ever
> detect any improvement in performance when we added ratings information.
> Performance with ratings alone was not discernibly better than random
> recommendations.
>
> Video views was the largest data source and, after the problems with the
> paucity of ratings, it looked better. Unfortunately, our users often
> clicked on videos due to misleading meta-data or because they were vaguely
> curious. Neither of those situations represented an expression of user
> preference. In practice, recommender performance with video views was
> better than random, but still pretty poor.
>
> 30 second video views produced very good results in spite of the fact that
> the data was 10x smaller than raw video views. This was demonstrated by
> heuristic examination (aka the "laugh test"), by click-through and by
> user session length. Mixing in video views degraded performance visibly.
>
> In building these systems, it was critical to incorporate a system like the
> LogLikelihoodSimilarity for building the item-item model. Direct user-based
> recommenders that used cosine and similar user-user metrics were laughably
> bad and were dominated by popular items.
>
> In earlier work at Musicmatch, we had similar results in that we had to
> carefully select which interactions we used as input to the recommender.
> The overall process was much simpler, however, since we came closer to
> good results in our first tries.
>
>
> On Mon, Jan 30, 2012 at 1:43 PM, Lee Carroll
> <[email protected]> wrote:
>
>> > So I find that mental state estimations are the indirect way to model
>> > and predict behaviors while directly modeling behaviors based on
>> > observed behaviors is, well, more direct.
>>
>> That's a lovely switch :-) You should come and work for our business
>> unit, they would love you :-)
>>
>> However, the experience of using page behaviour to recommend products
>> has been really disappointing, never outperforming simple heuristics
>> (and I mean really simple market segmentation). Maybe we should look
>> again, but having fallen for the engagement metric stuff once, what
>> would we need to look out for to make it better?
>> What's your secret, Ted!

--
Lance Norskog
[email protected]
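
For anyone who wants to try the item-item setup Ted describes above, a minimal Mahout (Taste) sketch might look like the following. The class names are real Mahout classes; the input file, user ID, and list size are made-up placeholders, and the CSV is assumed to hold only the userID,itemID pairs for 30-second-or-longer views.

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ThirtySecondViewRecommender {

    public static void main(String[] args) throws Exception {
        // Hypothetical input: one "userID,itemID" line per 30-second view.
        // Lines without a preference value give a boolean data model.
        DataModel model = new FileDataModel(new File("30sec-views.csv"));

        // Log-likelihood ratio similarity ignores preference magnitudes,
        // which is what makes it a reasonable fit for implicit "watched" signals.
        ItemSimilarity similarity = new LogLikelihoodSimilarity(model);

        GenericItemBasedRecommender recommender =
                new GenericItemBasedRecommender(model, similarity);

        // Top 10 recommendations for an arbitrary example user.
        List<RecommendedItem> recs = recommender.recommend(42L, 10);
        for (RecommendedItem rec : recs) {
            System.out.println(rec.getItemID() + "\t" + rec.getValue());
        }
    }
}

The point of the sketch is the pairing Ted describes: an item-item recommender with LogLikelihoodSimilarity over a single well-chosen event type, rather than mixing raw views back in.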
