All preferences are "1" in your world. Therefore the user vectors are always like (1,1,...,1), the distance between any two of them is 0, and the similarity is 1. Euclidean distance is not an appropriate metric for binary data. The closest thing to what I think you want is TanimotoCoefficientSimilarity, but also try LogLikelihoodSimilarity.
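
To make that concrete, here is a minimal sketch in plain Python. It is not Mahout code. It assumes the distance is taken only over users who expressed a preference for both items and that similarity is 1 / (1 + distance), which is a simplification and may not match Mahout's exact formula. The item and user names are made up for illustration.

```python
import math


def euclidean_similarity(a, b):
    """a, b: dicts mapping user id -> preference value for one item.

    Assumption: only users present in both dicts contribute, and the
    distance d is mapped to a similarity as 1 / (1 + d).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None  # no common users, so no similarity is produced
    distance = math.sqrt(sum((a[u] - b[u]) ** 2 for u in shared))
    return 1.0 / (1.0 + distance)


def tanimoto_similarity(a, b):
    """Users who liked both items divided by users who liked either."""
    union = a.keys() | b.keys()
    if not union:
        return None
    return len(a.keys() & b.keys()) / len(union)


# Hypothetical "like" data: every stored preference is 1.
item_x = {"u1": 1, "u2": 1, "u3": 1}
item_y = {"u2": 1, "u3": 1, "u4": 1}
item_z = {"u5": 1}

# Every shared preference is 1 - 1 = 0 apart, so the distance is 0.
binary_euclidean = euclidean_similarity(item_x, item_y)

# No users in common: nothing to compare.
no_overlap = euclidean_similarity(item_x, item_z)

# Tanimoto looks at the overlap itself: 2 shared users out of 4.
binary_tanimoto = tanimoto_similarity(item_x, item_y)

# With a dislike (-1) in the data, the distance is no longer always 0.
item_y_mixed = {"u2": 1, "u3": -1, "u4": 1}
mixed_euclidean = euclidean_similarity(item_x, item_y_mixed)

print(binary_euclidean, no_overlap, binary_tanimoto, mixed_euclidean)
```

Under these assumptions, any pair of items with at least one user in common scores exactly 1 on like-only data, and pairs with no common user get no score at all. The Tanimoto value depends on how much the two sets of users overlap, so it can actually rank items.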

Yes, if you have a range of ratings, not just 1, it becomes meaningful again to look at distance as a similarity metric.

Sean

On Sun, May 8, 2011 at 5:37 PM, Thomas Söhngen <[email protected]> wrote:
> Hello everyone,
>
> I am calculating similar items with the SIMILARITY_EUCLIDEAN_DISTANCE
> class. My input is binary data, users clicking a like button. The output
> only generates similarities with a similarity score of "1". It doesn't
> calculate all items similar to each other, but for the items it finds a
> similarity, the output is always "1". Why is this?
>
> I don't have the problem when I also add "dislike" information, with
> input lines "item_id,user_id,1" for a Like interaction and
> "item_id,user_id,-1" for dislikes. The similarity lies between 0 and 1 then.
>
> Regards and thanks for suggestions,
> Thomas
