Some of the references for the newer cooccurrence recommender, which we now suggest you use, are at the top of this page:
http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

The new method has many benefits. At its core is a new similarity algorithm that measures cooccurrence strength with the log-likelihood ratio (LLR). LLR-based similarities don’t suffer from the problems you mention.

On Mar 8, 2015, at 8:06 AM, Ted Dunning wrote:

On Sat, Mar 7, 2015 at 3:05 AM, Tevfik Aytekin wrote:

> There can be two solutions:
> 1. There should be a parameter n, which determines the minimum number
> of common ratings needed to compute a similarity otherwise the system
> should return NaN.
> 2. The similarity should be computed using all the ratings, for the
> above two vectors, the cosine similarity should be
>
> (3*5+2*4)/(sqrt(3^2+4^2+2^2)+sqrt(3^2+5^2+2^2+4^2))

or 3. Use the more modern and scalable recommendation methods.
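
For anyone wondering why an LLR score is less sensitive to a handful of common ratings, here is a minimal Python sketch of the log-likelihood ratio for a 2x2 cooccurrence table, written in the usual entropy form. The function names are mine and are not Mahout's API. I am assuming the counts mean: k11 = users who interacted with both items, k12 = item A only, k21 = item B only, k22 = neither.

```python
import math


def x_log_x(x):
    # x * ln(x), with the convention that 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    # Assumed layout of the 2x2 table (hypothetical naming):
    #   k11: both A and B    k12: A without B
    #   k21: B without A     k22: neither
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


# Same proportions, ten times the evidence
print(llr(1, 0, 0, 1))      # about 2.77
print(llr(10, 0, 0, 10))    # about 27.73
# Counts consistent with independence score (close to) zero
print(llr(10, 10, 10, 10))
```

The first two calls have identical proportions, but the score grows with the amount of evidence, so a pair that looks perfectly correlated on one common user scores far lower than a pair that agrees on ten.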
