Thanks for your reply! But all the other similarity classes give me pretty similar results:

Pearson: -0.14 < similarity < 0.12
Uncentered_cosine: 0.79 < similarity < 0.85
Tanimoto: 0.001 < similarity < 0.2
Loglikelihood: 0.8 < similarity < 0.99
 
Thanks

-----Original Message-----
From: Sean Owen [mailto:[email protected]] 
Sent: Thursday, September 06, 2012 11:27 AM
To: [email protected]
Subject: Re: Simple Result Interpretation Question

This sounds like rounding error. If I recall correctly the Euclidean distance 
is converted to similarity with a function like 1/(1+d). I suppose the embedded 
assumption is that distances are "not extremely small". If your vector space 
has small values and distances are commonly 0.000001 or something, the results 
would always be near 1.
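
A minimal sketch of the effect described above, assuming the conversion really is 1/(1+d) (that mapping is recalled from memory in this message, not checked against the Mahout source). The vectors and helper names are made up for illustration:

```python
import math

def euclidean_distance(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_to_similarity(d):
    # Assumed mapping 1/(1+d); hypothetical helper, not Mahout's API.
    return 1.0 / (1.0 + d)

# Same shape of data at two different scales (illustrative values only).
small_a = [0.000001, 0.000002, 0.000003]
small_b = [0.000003, 0.000001, 0.000002]
large_a = [1.0, 2.0, 3.0]
large_b = [3.0, 1.0, 2.0]

sim_small = distance_to_similarity(euclidean_distance(small_a, small_b))
sim_large = distance_to_similarity(euclidean_distance(large_a, large_b))

# With tiny values the distance is tiny, so the similarity collapses toward 1.
print("small-scale similarity:", sim_small)
print("large-scale similarity:", sim_large)
```

The two pairs differ in exactly the same way relative to their scale, yet only the large-scale pair gets a similarity clearly below 1.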

You can make up another mapping to [0,1], or scale your values if that's 
the cause. Or try another metric; basing a similarity on the Euclidean 
distance has always been a bit artificial.
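
One possible "other translation", sketched under assumptions: divide each distance by a typical distance in the data (here the median, an arbitrary choice) before applying 1/(1+x). The function names, the choice of the median, and the sample vectors are all hypothetical; this is not something Mahout provides out of the box as far as this thread states.

```python
import itertools
import math
import statistics

def euclidean_distance(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scaled_similarity(d, scale):
    # Hypothetical mapping: a distance equal to `scale` maps to 0.5.
    return 1.0 / (1.0 + d / scale)

# Illustrative item vectors with very small values.
items = {
    "A": [1e-6, 2e-6, 3e-6],
    "B": [3e-6, 1e-6, 2e-6],
    "C": [1e-6, 2e-6, 3.5e-6],
}

# Pairwise distances, keyed by (item1, item2).
distances = {
    (i, j): euclidean_distance(items[i], items[j])
    for i, j in itertools.combinations(items, 2)
}

# Use the median pairwise distance as the scale (an assumption).
scale = statistics.median(distances.values())

sims = {pair: scaled_similarity(d, scale) for pair, d in distances.items()}
for pair, s in sims.items():
    print(pair, round(s, 3))
```

Because the scale comes from the data itself, the similarities spread over (0, 1) instead of clustering near 1, at the cost of making the values relative to this particular data set.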

On Thu, Sep 6, 2012 at 4:13 PM, Thomas, Sebastien <[email protected]> 
wrote:
> Hi community,
>
> I am new to Mahout and I am looking for some hints. I am running the 
> "itemsimilarity" job; I have about 8 million users and 32 items. My output 
> file (with the format: <item1, item2, similarity>) is basically telling me 
> that all my items are similar (if my interpretation is right). For example, 
> all the similarities are 1s when I run the EUCLIDEAN_DISTANCE similarity class.
>
> I would appreciate any help to understand and know what to do.
>
> Thank you
>
> Sebastien
