That is the opposite of what you'd expect. The explanation you've identified is possible, but it still seems unlikely to me; something else may be wrong. Is this repeatable, and not just a fluke of the random number generator? What exact arguments are you using, just to confirm that the training percentage and so on are really set the way you think?
If you have more data available, I would indeed use it, especially if it more accurately reflects your real environment. You can try to exclude these low-rank items, though that makes the test less representative of reality, since those kinds of items do exist and are an issue. Which ItemSimilarity are you using? Some, like log-likelihood, already account for these issues by nature. But yes, if you do want to go that way, you can use IDRescorer to exclude such items.

On Wed, Jan 4, 2012 at 1:51 AM, Nick Jordan <[email protected]> wrote:
> Hi All,
>
> I'm currently running an item based recommendation
> using KnnItemBasedRecommender. My data set isn't very large at
> approximately 30k preferences over 10k items. When running
> an AverageAbsoluteDifferenceRecommenderEvaluator evaluation on a 0.9
> training set the result is ~0.80 (on a preference scale of 1-5). When
> tuning that training set down to only 0.1 the mean difference is closer to
> 0.2.
>
> I assume that this number is actually lower because there are less
> recommendations that can actually be made. Meaning that with the smaller
> training set there isn't enough similarity to make recommendations, and so
> those that it does make are more accurate. So the question for me becomes,
> what does the evaluation look like when only providing recommendations for
> items with more than x declared preferences? I'm wondering what the best
> way to determine this. Should I create a new recommender that only will
> return items with x or more preferences (maybe using IDRescorer?) or should
> I create a new evaluator to do something similar? Is there a native method
> to accomplish this that I've missed? Is my hypothesis just likely wrong?
>
> Appreciate the feedback.
>
> Nick
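
For the IDRescorer route, here is a minimal sketch of the filtering logic: exclude any item with fewer than x preferences. It is written so that it compiles without Mahout on the classpath, so the `Rescorer` interface below is only a local stand-in that mirrors the two methods of Mahout's IDRescorer (`rescore` and `isFiltered`). The class name, the helper names, and the toy data are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MinPreferenceFilter {

    // Local stand-in with the same two methods as Mahout's IDRescorer.
    // Assumption: declared here only so the sketch runs without Mahout.
    interface Rescorer {
        double rescore(long id, double originalScore);
        boolean isFiltered(long id);
    }

    // Counts how many preferences each item has.
    // Each row of prefs is {userID, itemID}.
    static Map<Long, Integer> countPreferencesPerItem(long[][] prefs) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long[] p : prefs) {
            counts.merge(p[1], 1, Integer::sum);
        }
        return counts;
    }

    // Builds a rescorer that leaves scores untouched but filters out
    // every item with fewer than minPrefs preferences.
    static Rescorer minPreferenceRescorer(Map<Long, Integer> counts, int minPrefs) {
        return new Rescorer() {
            @Override
            public double rescore(long id, double originalScore) {
                return originalScore;
            }

            @Override
            public boolean isFiltered(long id) {
                return counts.getOrDefault(id, 0) < minPrefs;
            }
        };
    }

    public static void main(String[] args) {
        // Toy data: item 10 has 3 preferences, item 20 has 1, item 30 has 2.
        long[][] prefs = { {1, 10}, {2, 10}, {3, 10}, {1, 20}, {2, 30}, {3, 30} };
        Rescorer rescorer = minPreferenceRescorer(countPreferencesPerItem(prefs), 2);
        for (long itemID : new long[] {10, 20, 30, 40}) {
            System.out.println("item " + itemID + " filtered=" + rescorer.isFiltered(itemID));
        }
    }
}
```

With Mahout itself, the same two methods would go into a class implementing IDRescorer, and the instance would be passed to the recommender's `recommend(userID, howMany, rescorer)` call. The per-item counts would then come from your DataModel rather than from an array.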
