I think it would be surprising for a recommender to return data it already knows; the implicit contract is to return only predictions. That's how real-world recommender systems appear to behave to the end user: Amazon doesn't show you books you have already read, even if they may be some of your all-time favorites.

That's how it's built now anyway, so I would prefer not to change it: you can combine the result with data you already have, if that's what you want, more easily than you can strip the data you already have out of the result, if that's not what you want. You also run the risk of the top items all being existing data points; then the recommender is not providing any useful extra info. You can make a RecommendedItem for each existing data point, mix these with the recommendations, and sort.

If you don't have rating values, then you can't use a recommender built on predicting ratings, since they will all be 1, and your result is, as you say, random. The answer is, don't do that! Either you don't use ratings, and use the boolean versions, or you do use ratings (like your decaying click value), and then you can use either.

On Fri, Jan 27, 2012 at 10:09 AM, Anatoliy Kats <[email protected]> wrote:

> So you're proposing that we separate the actions of estimating preferences
> for unknown items, and recommending items to users to click: the latter
> could include some items for which a preference has been expressed. It's a
> good idea to think that way, thanks for the tip. I would argue, though,
> that .recommend() is aimed at the latter task: it predicts preferences,
> and sorts them, and returns the top N items. It is a final step in a
> process that includes unknown preference estimation as an intermediate
> step. This is built into Mahout as I see it, by separating .recommend()
> and .estimatePreference(). That's why I still think the most elegant
> solution is simply adding known preference values to the predicted ones to
> the set of possible recommendations. AFAIK this is most easily
> accomplished by playing around with CandidateItemStrategies. How would you
> go about it without having to write your own sorting function?
>
> About boolean recommenders: Many of my users made no purchases, only
> clicks. So, if I use a generic recommender, it will make random
> recommendations because my training data is essentially boolean. Has
> anyone else run into this problem? One solution I am about to try is
> letting the rating value of a click decay with time since the click was
> made. I am not sure if the ratings will be different enough for
> GenericRecommender to work, and I am also not sure I am justified in
> reducing the item similarity between two items because two users clicked on
> them at different times. Has anyone tried a solution based on a
> regularized normalization of some sort?
>
> Thanks.
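
For the merge-and-sort suggestion above, here is a rough sketch of what I have in mind. It deliberately does not use Mahout classes, so it runs on its own: ScoredItem is a hypothetical stand-in for RecommendedItem (an item ID plus a value), and MergeKnownAndPredicted and topN are names invented for this example. In real code the two maps would presumably be filled from the user's existing preferences and from the output of recommend().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Mahout.
final class MergeKnownAndPredicted {

    // Stand-in for an (item ID, value) pair such as Mahout's RecommendedItem.
    static final class ScoredItem {
        final long itemId;
        final float value;
        final boolean known; // true if value is an actual rating, not a prediction

        ScoredItem(long itemId, float value, boolean known) {
            this.itemId = itemId;
            this.value = value;
            this.known = known;
        }
    }

    // Mixes known ratings with predicted ones and returns the top howMany by value.
    static List<ScoredItem> topN(Map<Long, Float> knownRatings,
                                 Map<Long, Float> predictions,
                                 int howMany) {
        List<ScoredItem> merged = new ArrayList<>();
        for (Map.Entry<Long, Float> e : knownRatings.entrySet()) {
            merged.add(new ScoredItem(e.getKey(), e.getValue(), true));
        }
        for (Map.Entry<Long, Float> e : predictions.entrySet()) {
            // Predictions should not cover already-rated items; skip any that do.
            if (!knownRatings.containsKey(e.getKey())) {
                merged.add(new ScoredItem(e.getKey(), e.getValue(), false));
            }
        }
        // Highest value first; ties broken by item ID so the order is deterministic.
        merged.sort(Comparator.comparingDouble((ScoredItem s) -> s.value)
                .reversed()
                .thenComparingLong(s -> s.itemId));
        int limit = Math.max(0, Math.min(howMany, merged.size()));
        return new ArrayList<>(merged.subList(0, limit));
    }
}
```

Note that this only makes sense when known ratings and predictions are on the same scale, and it shows the risk I mentioned: if the user's own ratings are high, they will crowd the predictions out of the top N.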
