Is there a way to tell Mahout to "fill up" the user-item matrix with zeros when no rating is given for a user-item combination? I assume distance would become meaningful again then.
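A minimal sketch of what zero-filling would do, written in plain Java rather than as a Mahout option (no such option is assumed here). If missing entries are treated as 0, each user becomes a dense 0/1 vector, and the squared Euclidean distance simply counts the items on which two users disagree. The class name, users, items and values are all hypothetical.

```java
/**
 * Hypothetical illustration (not a Mahout option): if missing ratings are treated as 0,
 * binary "like" vectors become dense 0/1 vectors and Euclidean distance varies again.
 */
public class ZeroFilledDistance {

    /** Plain Euclidean distance between two equal-length dense vectors. */
    public static double distance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        // Columns are items i1..i5; 1 = liked, 0 = no rating (filled in with zero).
        double[] alice = {1, 1, 1, 0, 0};
        double[] bob   = {1, 0, 0, 1, 0};
        double[] carol = {1, 1, 1, 0, 1};

        System.out.println(distance(alice, bob));   // sqrt(3), about 1.732 (3 disagreements)
        System.out.println(distance(alice, carol)); // 1.0 (1 disagreement)
    }
}
```

The trade-off of this design is that zero-filling treats "never clicked" exactly like "disliked", which may or may not match what the data means.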

Do you have any suggestions for scientific sources that could help in choosing an appropriate similarity function?

Regards
Thomas


On 08.05.2011 19:57, Sean Owen wrote:
All preferences are "1" in your world. Therefore, over the items they
have in common, user vectors always look like (1,1,...,1). The distance
between any two is 0, and the similarity is 1. This metric is not
appropriate for binary data. The closest thing to what I think you want
is TanimotoCoefficientSimilarity, but also try LogLikelihoodSimilarity.
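The point above can be checked with a small, self-contained Java sketch (not Mahout code). It assumes the common mapping similarity = 1 / (1 + d), with d computed only over co-rated items; Mahout's exact normalization may differ, but any such mapping gives the same value for every pair when d is always 0. The Tanimoto coefficient (intersection size over union size of the item sets) ignores the values and still separates the pairs. The class, method names and data are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Illustration only: Euclidean-based similarity over co-rated items versus the
 * Tanimoto coefficient, on binary "like" data. The 1 / (1 + d) mapping is an assumption.
 */
public class BinaryEuclideanDemo {

    /** Euclidean-based similarity over items both users rated; NaN (undefined) if no overlap. */
    public static double euclideanSimilarity(Map<String, Double> a, Map<String, Double> b) {
        double sumSq = 0.0;
        int overlap = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSq += diff * diff;
                overlap++;
            }
        }
        if (overlap == 0) {
            return Double.NaN; // no co-rated items: similarity undefined here
        }
        return 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    /** Tanimoto (Jaccard) coefficient: size of intersection / size of union. */
    public static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical binary data: every stored preference is 1.0 ("like").
        Map<String, Double> alice = Map.of("i1", 1.0, "i2", 1.0, "i3", 1.0);
        Map<String, Double> bob = Map.of("i1", 1.0, "i4", 1.0);
        Map<String, Double> carol = Map.of("i1", 1.0, "i2", 1.0, "i3", 1.0, "i5", 1.0);

        // All overlapping values are 1.0, so d = 0 and the similarity is always 1.0.
        System.out.println(euclideanSimilarity(alice, bob));   // 1.0
        System.out.println(euclideanSimilarity(alice, carol)); // 1.0

        // Tanimoto distinguishes the pairs by how much their item sets overlap.
        System.out.println(tanimoto(alice.keySet(), bob.keySet()));   // 0.25
        System.out.println(tanimoto(alice.keySet(), carol.keySet())); // 0.75
    }
}
```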

Yes, if you have a range of ratings, not just 1, it becomes meaningful
again to look at distance as a similarity metric.

Sean

On Sun, May 8, 2011 at 5:37 PM, Thomas Söhngen <[email protected]> wrote:
Hello everyone,

I am calculating similar items with the SIMILARITY_EUCLIDEAN_DISTANCE
class. My input is binary data: users clicking a like button. The output
only contains similarities with a score of "1". It doesn't find every
pair of items to be similar, but for each pair it does find, the score
is always "1". Why is this?

I don't have this problem when I also add "dislike" information, with
input lines "item_id,user_id,1" for a like interaction and
"item_id,user_id,-1" for a dislike. The similarity then lies between 0 and 1.
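A sketch of why adding dislikes changes the result, under the same assumed 1 / (1 + d) mapping over co-rated entries: once values can be 1 or -1, two item vectors can disagree on a shared user, so d is no longer always 0. Items are represented here as hypothetical maps from user ID to preference, matching the item-similarity setting described above; the names and data are illustrative only.

```java
import java.util.Map;

/**
 * Illustration only: with like = 1 and dislike = -1, co-rated values can disagree,
 * so the (assumed) similarity 1 / (1 + d) drops below 1 for pairs that disagree.
 */
public class LikeDislikeDemo {

    /** Euclidean-based similarity over users who rated both items; NaN if there is no overlap. */
    public static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        double sumSq = 0.0;
        boolean overlap = false;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSq += diff * diff;
                overlap = true;
            }
        }
        return overlap ? 1.0 / (1.0 + Math.sqrt(sumSq)) : Double.NaN;
    }

    public static void main(String[] args) {
        // Hypothetical item vectors keyed by user ID: 1 = like, -1 = dislike.
        Map<String, Integer> itemA = Map.of("u1", 1, "u2", 1, "u3", -1);
        Map<String, Integer> itemB = Map.of("u1", 1, "u2", -1, "u3", -1);

        // One disagreement (u2: 1 vs -1) gives d = 2, so similarity = 1 / 3.
        System.out.println(similarity(itemA, itemB));
    }
}
```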

Regards, and thanks for any suggestions,
Thomas
