You can do whatever you like if it works for you, but this sounds wrong to me. Yes, you got more recommendations, but are those last recommendations actually good ones? The algorithm may be "telling you" there isn't enough information to confidently recommend that many items.
A neighborhood of hundreds of users is very large. It's such a crowd that much of the neighborhood is undoubtedly "far" from the user. Yes, those are the nearest 1000 users, but perhaps 20 of them are really similar and the other 980 just introduce more and more noise into the computation.

I would actually suggest you use a threshold-based neighborhood definition instead. The right cutoff depends on your similarity metric; with Pearson correlation, maybe 0.5 or so. Yes, you may get fewer recommendations, but maybe that's good. (Another plug: if you are interested in this tradeoff, and in evaluating metrics and such, this is all written up pretty thoroughly in Mahout in Action: http://manning.com/owen/)

2010/8/30 Young <[email protected]>:
> Hi Sean,
> Thanks. When I expand the neighborhood size to 1000, there are 80 items in
> common when giving 500 recommendations. That's quite reasonable and acceptable.
>
> -- Young
>
> At 2010-08-30 23:55:15, "Sean Owen" <[email protected]> wrote:
>
>> That result is quite possible. For example, with a user-based
>> recommender, the only items that can possibly be recommended are those
>> in the user's neighborhood. If the neighborhood is small, it's
>> possible that only 23 unique items exist among users in that
>> neighborhood. You can never get more recommendations than that.
>>
>> I don't think this result is "bad" per se, but if you want to try to
>> get more recommendations, you really need denser data. Or, another
>> algorithm may have different properties that are more desirable to
>> you. Try SlopeOneRecommender.
>>
>> 2010/8/30 Young <[email protected]>:
>>> Hi all,
>>> Based on the 1M GroupLens data, I tried to use a user-based recommender
>>> and an item-based recommender to give the same user recommendations,
>>> but the results vary a lot. There are 4302 items in the DataModel. For
>>> user 3 or 8, when returning 500 recommended items, only 23 items are in
>>> common.
>>> In the item-based recommender, I use PearsonCorrelationSimilarity.
>>> In the user-based recommender, I use NearestNUserNeighborhood (size 100)
>>> and PearsonCorrelationSimilarity.
>>> Are these results to be expected? Or what should I do to improve this
>>> situation?
>>>
>>> Thank you very much.
>>>
>>> -- Young
>>>
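
P.S. To illustrate the threshold-based neighborhood idea above, here is a toy sketch in plain Python (made-up ratings data, not Mahout's actual implementation; in Mahout itself you'd swap NearestNUserNeighborhood for ThresholdUserNeighborhood). The point is that weakly similar neighbors are dropped entirely rather than kept just to fill out a fixed-size list:

```python
import math

def pearson(a, b):
    """Pearson correlation over the items two users co-rated."""
    common = [i for i in a if i in b]
    n = len(common)
    if n < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def threshold_neighborhood(user_id, ratings, threshold):
    """Keep only neighbors whose similarity clears the cutoff,
    instead of taking the nearest N no matter how weak the tail
    of that list is."""
    me = ratings[user_id]
    return [
        (other, s)
        for other, prefs in ratings.items()
        if other != user_id
        for s in [pearson(me, prefs)]
        if s >= threshold
    ]

# Toy data (user -> {item: rating}), purely illustrative:
ratings = {
    1: {"a": 5.0, "b": 3.0, "c": 4.0},
    2: {"a": 4.5, "b": 2.5, "c": 4.0},  # rates like user 1
    3: {"a": 1.0, "b": 5.0, "c": 2.0},  # opposite tastes
    4: {"a": 3.0, "b": 5.0, "c": 1.0},  # opposite tastes
}

# With a 0.5 cutoff, only user 2 survives as a neighbor of user 1;
# users 3 and 4 are excluded rather than padding out a top-N list.
print(threshold_neighborhood(1, ratings, 0.5))
```

With a fixed nearest-3 neighborhood, users 3 and 4 would be "neighbors" of user 1 despite negative correlations; the threshold version simply leaves them out, which is the noise-reduction effect described above.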
