The effect of downweighting popular items is very similar to removing them from recommendations, so I still suspect precision will go down using IDF. Obviously this can be tested fairly easily; I just wondered if anyone had already done it.
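For anyone who wants to run that test, here is a minimal sketch of the comparison, assuming a binary user-item matrix, a simple item-item cosine recommender, and synthetic data. Nothing here is a Mahout API; every function name is made up for illustration:

    # Rough sketch: compare holdout precision@k with and without
    # IDF item weighting. All helpers are hypothetical.
    import numpy as np

    def idf_weights(R):
        # R: (n_users, n_items) binary interaction matrix.
        item_counts = R.sum(axis=0)
        n_users = R.shape[0]
        # Standard IDF: log(N / df); +1 avoids division by zero.
        return np.log(n_users / (item_counts + 1.0))

    def recommend(R_train, weights, k=10):
        # Item-item cosine scores on the (optionally IDF-weighted) matrix.
        W = R_train * weights          # broadcast per-item weights
        norms = np.linalg.norm(W, axis=0) + 1e-9
        sim = (W.T @ W) / np.outer(norms, norms)
        scores = R_train @ sim         # score items by similarity to the user's items
        scores[R_train > 0] = -np.inf  # don't re-recommend seen items
        return np.argsort(-scores, axis=1)[:, :k]

    def precision_at_k(recs, R_test, k=10):
        hits = np.array([R_test[u, recs[u]].sum() for u in range(len(recs))])
        return (hits / k).mean()

    rng = np.random.default_rng(0)
    R = (rng.random((500, 200)) < 0.05).astype(float)
    mask = rng.random(R.shape) < 0.2      # hold out ~20% of interactions
    R_train, R_test = R * ~mask, R * mask

    for name, w in [("uniform", np.ones(R.shape[1])), ("idf", idf_weights(R_train))]:
        p = precision_at_k(recommend(R_train, w), R_test)
        print(f"{name:8s} precision@10 = {p:.4f}")

On real data the popularity distribution is far more skewed than this synthetic matrix, so the gap between the two runs should be larger.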
This brings up a problem with holdout-based precision. It measures how well a model trained on the training set predicts what is in the holdout set, which may or may not correlate with actually affecting user behavior. Using purchases as preference indicators, a precision metric would measure how well purchases in the training set predict purchases in the test set. If IDF lowers precision, it may still affect user behavior strongly by recommending non-obvious (non-inevitable) items. That effect on user behavior, AFAIK, can't be measured from holdout tests. I worry that precision-related measures may point us in the wrong direction. Are A/B tests our only reliable metric for questions like this?

On Feb 6, 2013, at 9:04 AM, Paulo Villegas <[email protected]> wrote:

> This results in no information for universally preferred items, which
> is indeed what I was looking for. It looks like this should also work
> for other values or explicit preferences--item prices, ratings, etc.
>
> Intuition says this will result in a lower precision-related
> cross-validation measure, since you are discounting the obvious
> recommendations. I have no experience with measuring something like
> this; any experience you have would be appreciated.

(this is just guesswork, so I could be terribly wrong)

In a non-IDF-weighted recommender, if you take out the top N% of items (those with the most occurrences in the user-item matrix), precision will suffer badly, since the recommender will miss opportunities to recommend "easy targets" (items with a high probability of occurring in the test set).

In an IDF-weighted recommender, it could improve precision instead, since you remove items that are highly likely to appear in the test set but were never going to be recommended in top positions anyway, due to their strong IDF down-weight. That would be a hint that the IDF weight is working to suppress the "obvious" recommendations.

In this last case, precision would tend to go up as you remove a bigger share of top items, until you reach a point of diminishing returns, at which the growing loss of user-item interaction data caused by removing the top items' interactions spoils any advantage of taking them out of the picture. That might be the point at which you decide pruning top items is best. So you could use that % of top items pruned as the place for your "canonical" precision value.

Highly application- and domain-dependent, anyway.

Paulo
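Paulo's sweep could be scripted the same way. Continuing the sketch above (and reusing its hypothetical helpers idf_weights, recommend, precision_at_k, plus the R_train/R_test split), and guessing that "pruning" means dropping the top items' interactions from training data only:

    # Guesswork-in-code: sweep the share of top items pruned and watch
    # precision@k, looking for Paulo's diminishing-returns point.
    import numpy as np

    def prune_top_items(R, frac):
        # Zero out training interactions for the most popular `frac` of items.
        counts = R.sum(axis=0)
        n_prune = int(len(counts) * frac)
        top = np.argsort(-counts)[:n_prune]
        R = R.copy()
        R[:, top] = 0.0
        return R

    for frac in [0.0, 0.01, 0.02, 0.05, 0.10, 0.20]:
        Rp = prune_top_items(R_train, frac)
        w = idf_weights(Rp)  # recompute IDF on the pruned matrix
        p = precision_at_k(recommend(Rp, w), R_test)
        print(f"pruned top {frac:4.0%}: precision@10 = {p:.4f}")

If the curve really does rise and then fall as Paulo predicts, the peak would be his "canonical" pruning point; whether pruned items should also be excluded from the test set is a separate choice this sketch doesn't make.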
