Re: ItemSimilarity algorithm

Ted Dunning Thu, 05 Jul 2012 10:05:28 -0700

On Thu, Jul 5, 2012 at 8:36 AM, Saikat Kanjilal <[email protected]> wrote:


>
> Thanks for the input, Sean. One other question: in the scenario where most
> of the preferences are boolean-style (i.e. a CSV file that just says that a
> user has some sort of association with an item), is it fair to say that the
> Tanimoto and log-likelihood coefficients perform better than the other
> coefficients? I wanted to get a deeper understanding of this as well.
> Thanks for your insight.
>

That would definitely be my expectation.
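
To make the boolean case concrete, here is a minimal sketch (plain Python,
not Mahout code) of the two measures computed from nothing but "user is
associated with item" sets. With boolean data there are no preference
values to correlate, only co-occurrence counts, and both measures work
directly from those counts. The function names and the set-of-user-IDs
representation are assumptions for illustration; the log-likelihood part
is the standard G^2 statistic on a 2x2 contingency table, written in its
entropy form.

```python
import math


def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) coefficient between two sets of user IDs."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0


def _x_log_x(x):
    # Convention: 0 * ln(0) is treated as 0.
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts):
    # Unnormalized entropy: N*ln(N) - sum(k*ln(k)) over the counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of co-occurrence counts.

    k11: users with both items     k12: users with A only
    k21: users with B only         k22: users with neither
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def item_llr(users_a, users_b, num_users):
    """Log-likelihood ratio for two items, given the total user count."""
    k11 = len(users_a & users_b)
    k12 = len(users_a - users_b)
    k21 = len(users_b - users_a)
    k22 = num_users - k11 - k12 - k21
    return log_likelihood_ratio(k11, k12, k21, k22)
```

Note that the Tanimoto coefficient ignores users who touched neither item,
while the log-likelihood ratio uses the total user count, so the same
overlap scores as more significant when both items are rare.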



>
> > Date: Tue, 3 Jul 2012 19:19:07 +0300
> > Subject: Re: ItemSimilarity algorithm
> > From: [email protected]
> > To: [email protected]
> >
> > Item-item similarity is a property of the information you have on two
> > items and just those items. Whether there are just those 2 items over
> > 500K users, or 2M items over 500K users, makes no difference. So no, I
> > don't think that this skew implies you should use any particular
> > algorithm, by itself.
> >
> > I think other considerations tend to dominate. For example, very sparse
> > data makes the Pearson and cosine measures work poorly. But with so
> > relatively few items... I imagine it is not so sparse.
> >
> > On Tue, Jul 3, 2012 at 6:57 PM, Saikat Kanjilal <[email protected]> wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > I was reading through the documentation on the different item
> > > > similarity algorithms in Mahout and had a question. If one has a
> > > > scenario where the number of items is significantly less than the
> > > > number of users (say 500,000 users to 1,000 items), are there
> > > > particular item similarity coefficients (namely log-likelihood or the
> > > > Tanimoto coefficient) that lend themselves to producing better
> > > > recommendations? I've read through Mahout in Action and the Javadocs
> > > > and can't seem to find any clues on this.
> > > >
> > > > Any insight based on your experience would be much appreciated.
> > > Regards
>
