I'm curious: with 8 million users and only 32 products, your data might not
be sparse enough (never thought that would be a problem).  You might have
enough users who purchased a high enough percentage of your products that
you end up with every item being recommended for every other item.
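
To make Sean's rounding point (quoted below) concrete, here is a minimal
Python sketch. It assumes the distance-to-similarity mapping really is
1/(1+d), as he recalls; I have not checked that against the Mahout source.
The function name and the item vectors are made up for illustration.

```python
import math

def euclidean_similarity(a, b):
    """Map the Euclidean distance d between a and b to 1 / (1 + d).

    Assumption: this mirrors the 1/(1+d) form recalled in the thread,
    not necessarily what Mahout actually computes.
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

# Hypothetical item vectors with very small values.
small_a = [0.000001, 0.000002, 0.000003]
small_b = [0.000003, 0.000001, 0.000002]

# The same vectors rescaled by a factor of one million.
big_a = [v * 1e6 for v in small_a]
big_b = [v * 1e6 for v in small_b]

sim_small = euclidean_similarity(small_a, small_b)
sim_big = euclidean_similarity(big_a, big_b)

print(sim_small)  # very close to 1, because d is on the order of 1e-6
print(sim_big)    # noticeably below 1, because d is on the order of 1
```

The two pairs differ only by scale, yet the first comes out almost
exactly 1. If your values are that small, rescaling them (or using a
different mapping to [0,1]) should spread the similarities out.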



On Thu, Sep 6, 2012 at 8:54 AM, Thomas, Sebastien <
[email protected]> wrote:

> Thanks for your reply! But all the others give me pretty similar results.
>
> Pearson: -0.14 < similarity < 0.12
> Uncentered_cosine: 0.79 < similarity < 0.85
> Tanimoto: 0.001 < similarity < 0.2
> Loglikelihood: 0.8 < similarity < 0.99
>
> Thanks
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Thursday, September 06, 2012 11:27 AM
> To: [email protected]
> Subject: Re: Simple Result Interpretation Question
>
> This sounds like rounding error. If I recall correctly the Euclidean
> distance is converted to similarity with a function like 1/(1+d). I suppose
> the embedded assumption is that distances are "not extremely small". If
> your vector space has small values and distances are commonly 0.000001 or
> something, the results would always be near 1.
>
> You can make up another translation to [0,1], or scale your values if
> that's the cause. Or try another metric; basing on the Euclidean distance
> has always been a bit artificial.
>
> On Thu, Sep 6, 2012 at 4:13 PM, Thomas, Sebastien <
> [email protected]> wrote:
> > Hi community,
> >
> > I am new to Mahout and I am looking for some hints. I am running
> > "itemsimilarity" with about 8 million users and 32 items. My output file
> > (with the format: <item1, item2, similarity>) is basically telling me that
> > all my items are similar (if my interpretation is right). For example, all
> > the similarities are 1s when I run the EUCLIDEAN_DISTANCE similarity class.
> >
> > I would appreciate any help to understand and know what to do.
> >
> > Thank you
> >
> > Sebastien
>



-- 

Thanks,
John C
