This is expected behavior as far as I understand the algorithm. I don't see how a user-based recommender can estimate a preference by X for Y if nobody who rated Y is in X's neighborhood at all.
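To make that concrete, here is a toy sketch in Python. It is not Mahout's code: the ratings, the similarity measure, and the function names (`similarity`, `nearest_n`, `estimate`) are all made up for illustration. It only assumes the general scheme described above, where the n nearest users are picked without looking at the target item and the estimate is a similarity-weighted average over the neighbours who rated that item.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}. Only u3 has rated item "Y".
RATINGS = {
    "X": {"a": 5.0, "b": 4.0},
    "u1": {"a": 5.0, "b": 4.0, "c": 2.0},
    "u2": {"a": 4.0, "b": 5.0, "c": 3.0},
    "u3": {"a": 1.0, "b": 2.0, "Y": 4.0},
}


def similarity(u, v):
    """Illustrative similarity in (0, 1] from the mean absolute rating
    difference on co-rated items; NaN when there is no overlap."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return float("nan")
    mad = sum(abs(RATINGS[u][i] - RATINGS[v][i]) for i in common) / len(common)
    return 1.0 / (1.0 + mad)


def nearest_n(user, n):
    """Pick the n most similar users. The target item plays no role here."""
    scored = [(similarity(user, v), v) for v in RATINGS if v != user]
    scored = [(s, v) for s, v in scored if not math.isnan(s)]
    scored.sort(reverse=True)
    return [v for _, v in scored[:n]]


def estimate(user, item, n):
    """Similarity-weighted average over neighbours who rated the item.
    Returns NaN when no neighbour has rated it."""
    total, weight = 0.0, 0.0
    for v in nearest_n(user, n):
        if item in RATINGS[v]:
            s = similarity(user, v)
            total += s * RATINGS[v][item]
            weight += s
    return total / weight if weight > 0 else float("nan")


print(estimate("X", "Y", n=2))  # neighbourhood is u1, u2: nobody rated Y
print(estimate("X", "Y", n=3))  # u3 is now included and has rated Y
```

With n = 2 the neighbourhood of X is u1 and u2, neither of whom rated Y, so the estimate is NaN. With n = 3 the only rater of Y (u3) is included and the estimate becomes 4.0. Nothing is wrong with the neighbourhood itself; it just has no information about Y.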
You can use a PreferenceInferrer to fill in a lot of missing data, but I don't really recommend this for more than experimentation. The issue here is mostly that the user-item matrix is too sparse. And yes, there are loads of follow-up suggestions that tackle that, depending on your data, as Alex hinted at.

On Mon, Aug 9, 2010 at 3:31 AM, Yanir Seroussi <[email protected]> wrote:
> Hi,
>
> The first example here (
> https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation)
> shows how to create a GenericUserBasedRecommender with a
> NearestNUserNeighborhood. My problem/question is that setting n to any small
> number seems to limit the coverage of the recommender, because the nearest n
> users are calculated without taking the target item into account.
> For example, given a user X and n = 10, if we want to estimatePreference()
> for an item Y, if this item is not rated by any user in the neighbourhood,
> the prediction will be NaN. I don't think that this is what one would expect
> from a user-based nearest-neighbour recommender, as Herlocker et al. (1999),
> who are cited in the example page above, didn't mention any change in
> coverage based on the number of nearest neighbours.
> Am I doing something wrong, or is this the way it should be? I have a
> feeling it is not the way it should be, because then using small
> neighbourhood sizes makes no sense as it severely restricts the ability of
> the recommender to estimate preferences.
>
> Please note that I observed this behaviour in version 0.3, but it seems to
> be the same in the latest version.
>
> Cheers,
> Yanir
