I'm curious: with 8 million users and only 32 products, your data might not be sparse enough (I never thought that would be a problem). Enough of your users may have purchased a high enough percentage of your products that you end up with every item being recommended for every other item.

On Thu, Sep 6, 2012 at 8:54 AM, Thomas, Sebastien <[email protected]> wrote:

> Thanks for your reply! But all the others give me pretty similar results.
>
> Pearson: -0.14<similarity<0.12
> Uncentered_cosine: 0.79<similarity<0.85
> Tanimoto: 0.001<similarity<0.2
> Loglikelihood: 0.8<similarity<0.99
>
> Thanks
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Thursday, September 06, 2012 11:27 AM
> To: [email protected]
> Subject: Re: Simple Result Interpretation Question
>
> This sounds like rounding error. If I recall correctly the Euclidean
> distance is converted to similarity with a function like 1/(1+d). I suppose
> the embedded assumption is that distances are "not extremely small". If
> your vector space has small values and distances are commonly 0.000001 or
> something, the results would always be near 1.
>
> You can make up another translation to [0,1], or scale your values if
> that's the cause. Or try another metric; basing on the Euclidean distance
> has always been a bit artificial.
>
> On Thu, Sep 6, 2012 at 4:13 PM, Thomas, Sebastien <[email protected]> wrote:
> > Hi community,
> >
> > I am new to Mahout and I am looking for some hints. I am running
> > "itemsimilarity"; I have about 8 million users and 32 items. My output file
> > (with the format: <item1, item2, similarity>) is basically telling me that
> > all my items are similar (if my interpretation is right). For example, all
> > the similarities are 1s when I run the EUCLIDEAN_DISTANCE similarity class.
> >
> > I would appreciate any help to understand and know what to do.
> >
> > Thank you
> >
> > Sebastien

--
Thanks,
John C
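
For anyone reading the thread later, here is a minimal sketch of the 1/(1+d) conversion mentioned in the quoted reply. It is not taken from the Mahout source; the function name and the sample distances are made up for illustration, and the formula is only the one recalled above ("a function like 1/(1+d)"). It shows why very small distances all map to similarities near 1, and how rescaling the values spreads them out.

```python
def euclidean_similarity(distance: float) -> float:
    """Map a non-negative distance to a similarity in (0, 1] via 1 / (1 + d).

    Assumption: this mirrors the conversion recalled in the thread,
    not necessarily what Mahout actually implements.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


# Hypothetical distances, ranging from extremely small to moderate.
sample_distances = [0.000001, 0.001, 0.1, 1.0, 10.0]

for d in sample_distances:
    print(f"d={d:<10g} similarity={euclidean_similarity(d):.6f}")

# Rescaling the underlying values rescales the distances by the same factor,
# which pulls the similarities away from 1. The factor is arbitrary here.
scale = 1_000_000
for d in sample_distances[:2]:
    print(f"scaled d={d * scale:<10g} similarity={euclidean_similarity(d * scale):.6f}")
```

With distances around 0.000001 the similarity is indistinguishable from 1 at typical output precision, which would match the "all 1s" result reported for EUCLIDEAN_DISTANCE if the distances really are that small.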
