You can do whatever you like if it works for you, but this sounds
wrong to me. Yes, you got more recommendations, but are those
additional recommendations actually good ones? The algorithm may be
"telling you" there's not enough information to be confident about
recommending many items.

A neighborhood of hundreds of users is very large. It's such a crowd
that much of the neighborhood is undoubtedly "far" from the user. Yes,
those are the nearest 1000 users, but perhaps 20 of them are really
similar and the other 980 are introducing more and more noise into
the computation.

I would actually suggest you use a threshold-based neighborhood
definition. The cutoff value depends on your similarity metric. If you
use Pearson... maybe 0.5 or so?
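To make the contrast concrete, here is a minimal, self-contained sketch of the idea (plain Java, not Mahout's actual classes — the class and method names are illustrative only): compute Pearson similarity over co-rated items and keep only neighbors whose similarity clears a threshold, rather than taking a fixed N nearest users regardless of how dissimilar they are.

```java
import java.util.*;

public class NeighborhoodSketch {

    // Pearson correlation computed over the items two users have rated in common.
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                pairs.add(new double[]{e.getValue(), other});
            }
        }
        int n = pairs.size();
        if (n < 2) {
            return Double.NaN; // not enough overlap to say anything
        }
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (double[] p : pairs) {
            sx += p[0]; sy += p[1];
            sxx += p[0] * p[0]; syy += p[1] * p[1]; sxy += p[0] * p[1];
        }
        double cov = sxy - sx * sy / n;
        double denom = Math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
        return denom == 0 ? Double.NaN : cov / denom;
    }

    // Threshold-based neighborhood: keep every user whose similarity to
    // userId is at least `threshold`, however many or few that turns out to be.
    static List<Integer> thresholdNeighborhood(int userId,
            Map<Integer, Map<Integer, Double>> ratings, double threshold) {
        List<Integer> neighbors = new ArrayList<>();
        for (Integer other : ratings.keySet()) {
            if (other.equals(userId)) {
                continue;
            }
            double sim = pearson(ratings.get(userId), ratings.get(other));
            if (!Double.isNaN(sim) && sim >= threshold) {
                neighbors.add(other);
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Toy data: user -> (item -> rating)
        Map<Integer, Map<Integer, Double>> ratings = new HashMap<>();
        ratings.put(1, Map.of(10, 5.0, 11, 3.0, 12, 4.0));
        ratings.put(2, Map.of(10, 5.0, 11, 3.0, 12, 4.0)); // agrees with user 1
        ratings.put(3, Map.of(10, 1.0, 11, 5.0, 12, 2.0)); // disagrees with user 1

        // Only user 2 clears the 0.5 cutoff; user 3 is negatively correlated.
        System.out.println(thresholdNeighborhood(1, ratings, 0.5));
    }
}
```

A fixed-N neighborhood would have kept user 3 too, just because it's the "next nearest" user; the threshold version simply returns a smaller neighborhood when the data doesn't support more.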

Yes, you may get fewer recommendations, but maybe that's good.

(Another plug: if you are interested in this tradeoff, and evaluating
metrics and such, this is all written up pretty thoroughly in Mahout
in Action: http://manning.com/owen/)

2010/8/30 Young <[email protected]>:
> Hi Sean,
> Thanks. When I expand the neighborhood size to 1000, there are 80 items in
> common when giving 500 recommendations. That's quite reasonable and acceptable.
>
> -- Young
>
> At 2010-08-30 23:55:15,"Sean Owen" <[email protected]> wrote:
>
>>That result is quite possible. For example, with a user-based
>>recommender, the only items that can possibly be recommended are those
>>in the user's neighborhood. If the neighborhood is small, it's
>>possible that only 23 unique items exist among users in that
>>neighborhood. You can never get more recommendations than this.
>>
>>I don't think this result is "bad" per se, but if you want to try to
>>get more recommendations, you really need more 'dense' data. Or,
>>another algorithm may have different properties that are more
>>desirable to you. Try SlopeOneRecommender.
>>
>>2010/8/30 Young <[email protected]>:
>>> Hi all,
>>> Based on the 1M GroupLens data, I tried to use a user-based recommender and
>>> an item-based recommender to give the same user recommendations. But the
>>> results vary greatly. There are 4302 items in the DataModel. For user 3 or 8,
>>> when returning 500 recommended items, only 23 items are in common.
>>> In the item-based recommender, I use PearsonCorrelationSimilarity.
>>> In the user-based recommender, I use NearestNUserNeighborhood (size 100)
>>> with PearsonCorrelationSimilarity.
>>> Are these results acceptable? Or what should I do to improve the
>>> situation?
>>>
>>> Thank you very much.
>>>
>>> -- Young
>>>
>
