Hi,

It seems like you have a very sparse user-item matrix, which is very common in real-world applications. You can try to improve coverage with an item-based recommender (assuming your dataset has many more users than items), or you can do the CF neighbourhood formation (the similarity computation) in an SVD-reduced space. How to find the best low-rank SVD setting is another long story.
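To make the coverage point concrete, here is a minimal, self-contained sketch in plain Java. It is not Mahout code: the class name, the method names, the toy ratings and the choice of cosine similarity over co-rated entries are all assumptions made for illustration. It compares a user-based estimate that picks the N nearest users without looking at the target item with an item-based estimate for the same user and item.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy example; none of these names come from Mahout.
final class CoverageSketch {
    static final double NA = Double.NaN; // marks a missing rating

    // Rows are users, columns are items (made-up data).
    static final double[][] RATINGS = {
        {5, 4, 1, NA}, // user 0: the target user, has not rated item 3
        {5, 4, 1, NA}, // user 1: identical taste, has not rated item 3 either
        {4, 5, 2, 4},  // user 2: similar taste, rated item 3
        {1, 2, 5, 1},  // user 3: different taste, rated item 3
    };

    // Cosine similarity over the positions where both vectors have a value.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            if (Double.isNaN(a[i]) || Double.isNaN(b[i])) continue;
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? Double.NaN : dot / Math.sqrt(na * nb);
    }

    static double[] column(double[][] r, int item) {
        double[] col = new double[r.length];
        for (int u = 0; u < r.length; u++) col[u] = r[u][item];
        return col;
    }

    // User-based: take the n most similar users regardless of the target item,
    // then average the ratings of those neighbours who rated the item.
    static double userBasedEstimate(double[][] r, int user, int item, int n) {
        List<Integer> others = new ArrayList<>();
        for (int u = 0; u < r.length; u++) {
            if (u != user && !Double.isNaN(cosine(r[user], r[u]))) others.add(u);
        }
        others.sort(Comparator.comparingDouble((Integer u) -> -cosine(r[user], r[u])));
        double num = 0, den = 0;
        for (int u : others.subList(0, Math.min(n, others.size()))) {
            if (Double.isNaN(r[u][item])) continue; // neighbour never rated the item
            double s = cosine(r[user], r[u]);
            num += s * r[u][item];
            den += s;
        }
        return den == 0 ? Double.NaN : num / den; // NaN: no neighbour rated the item
    }

    // Item-based: weight the user's own ratings by item-to-item similarity.
    static double itemBasedEstimate(double[][] r, int user, int item) {
        double[] target = column(r, item);
        double num = 0, den = 0;
        for (int j = 0; j < r[user].length; j++) {
            if (j == item || Double.isNaN(r[user][j])) continue;
            double s = cosine(target, column(r, j));
            if (Double.isNaN(s)) continue; // no user rated both items
            num += s * r[user][j];
            den += s;
        }
        return den == 0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        System.out.println("user-based, n=1: " + userBasedEstimate(RATINGS, 0, 3, 1));
        System.out.println("user-based, n=2: " + userBasedEstimate(RATINGS, 0, 3, 2));
        System.out.println("item-based:      " + itemBasedEstimate(RATINGS, 0, 3));
    }
}
```

With this toy data the n = 1 call returns NaN, because the single nearest user has not rated item 3. The item-based call still produces a value, because it only needs some users to have co-rated item 3 with the items the target user has already rated.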
FYI, the changes above will not address the low coverage caused by the cold-start problem. The cold-start problem needs to be addressed by changing the data representation and the features we use to construct the user-item matrix.

Cheers,
Alex

On Mon, Aug 9, 2010 at 4:31 AM, Yanir Seroussi <[email protected]> wrote:

> Hi,
>
> The first example here (
> https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation
> ) shows how to create a GenericUserBasedRecommender with a
> NearestNUserNeighborhood. My problem/question is that setting n to any small
> number seems to limit the coverage of the recommender, because the nearest n
> users are calculated without taking the target item into account.
> For example, given a user X and n = 10, if we want to estimatePreference()
> for an item Y, if this item is not rated by any user in the neighbourhood,
> the prediction will be NaN. I don't think that this is what one would expect
> from a user-based nearest-neighbour recommender, as Herlocker et al. (1999),
> who are cited in the example page above, didn't mention any change in
> coverage based on the number of nearest neighbours.
> Am I doing something wrong, or is this the way it should be? I have a
> feeling it is not the way it should be, because then using small
> neighbourhood sizes makes no sense as it severely restricts the ability of
> the recommender to estimate preferences.
>
> Please note that I observed this behaviour in version 0.3, but it seems to
> be the same in the latest version.
>
> Cheers,
> Yanir
