User-user similarity is based on these counts? That sounds a bit like the Tanimoto / Jaccard coefficient. See TanimotoCoefficientSimilarity. Yes, you can use that, though log-likelihood is probably a more sophisticated choice.
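
For reference, the Tanimoto coefficient mentioned above is just the size of the overlap between two users' item sets divided by the size of their union. Here is a minimal plain-Java sketch of that formula. It is not Mahout's implementation; the class name TanimotoSketch, the method name, and the sample item IDs are made up for illustration, and Set.of assumes Java 9 or later.

```java
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    // Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|.
    static double tanimoto(Set<Long> itemsA, Set<Long> itemsB) {
        if (itemsA.isEmpty() && itemsB.isEmpty()) {
            // The ratio is undefined for two empty sets; returning 0.0 is
            // a convention chosen for this sketch only.
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(itemsA);
        intersection.retainAll(itemsB);
        int unionSize = itemsA.size() + itemsB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Hypothetical item IDs for two users.
        Set<Long> user1 = Set.of(1L, 2L, 3L, 4L);
        Set<Long> user2 = Set.of(3L, 4L, 5L);
        // 2 shared items out of 5 distinct items.
        System.out.println(tanimoto(user1, user2)); // prints 0.4
    }
}
```

Note that only the presence of a preference matters here, not its value, which is why this fits the "counts" view of similarity.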

Recommending the item that occurs most in the neighborhood? Sure, you can make it work that way. It probably works "OK" in practice, though you can see possible problems with it. What if everyone in the neighborhood hates an item? This would recommend it highly. It also throws away the degree of similarity to the user who likes an item.

The conventional wisdom in recommenders is that you want to fight the tendency to always recommend well-known items. People probably already know about the well-known items even if they've not rated them yet. It also makes the recommendations less personalized in a sense: the result approaches the one you'd get by just recommending the globally most-preferred items.

If your goal is to fight sparseness, start looking at SVD-based methods. This is really the point of the SVD: to "summarize" a very high-dimensional user-item matrix in a much lower-dimensional "user group" - "item group" matrix. Maybe you don't have enough information to recommend Bauhaus to Joan, a teenage goth, but the SVD lets you sort of draw conclusions like "gothy teens like Peter Murphy's albums". That is, the summary is much less sparse and so works better for recommending to users or items that otherwise have little connection to the rest of the matrix.

On Sat, Feb 19, 2011 at 2:43 AM, Chris Schilling wrote:
> Hello again,
>
> Very simple question here: I am also testing the user-user cf in mahout.
> So, once I define my user neighborhood, is it possible to select the
> recommendations from that based on the number of preferences per item rather
> than a weighted average? Basically, I'd like to recommend the items with the
> most preferences. It would be simple to implement, so I was curious if this
> was already possible. I understand that in this case, the counts become
> dependent on the size of the neighborhood. This is something I'd want to use
> for testing.
>
> Thanks
> Chris
