RE: ItemSimilarity algorithm

Saikat Kanjilal Thu, 05 Jul 2012 08:37:16 -0700

Thanks for the input, Sean. One other question: in the scenario where most of 
the recommendations are based on boolean-style data (i.e. a CSV file that just 
says that a user has some sort of association with an item), is it fair to say 
that the Tanimoto and log-likelihood coefficients perform better than the 
other coefficients? I would like to get a deeper understanding of this as 
well. Thanks for your insight.
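
To make the question concrete, here is a minimal Python sketch (not Mahout code) of the two coefficients computed from boolean data alone. The function names and the set-of-user-IDs representation are assumptions for illustration, and the log-likelihood function returns the raw G^2 statistic for the 2x2 co-occurrence table rather than whatever scaled similarity value Mahout reports. Note that both functions need only the users attached to the two items being compared (plus the total user count for the log-likelihood ratio).

```python
import math


def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) coefficient: size of intersection over size of union."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0


def log_likelihood_ratio(users_a, users_b, num_users):
    """G^2 statistic for the 2x2 co-occurrence table of two items.

    num_users is the total number of users in the data set (assumed > 0).
    """
    k11 = len(users_a & users_b)        # users associated with both items
    k12 = len(users_a) - k11            # item A only
    k21 = len(users_b) - k11            # item B only
    k22 = num_users - k11 - k12 - k21   # neither item
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    total = 0.0
    for i, row in enumerate(((k11, k12), (k21, k22))):
        for j, k in enumerate(row):
            if k > 0:  # empty cells contribute nothing to the sum
                total += k * math.log(k * num_users / (rows[i] * cols[j]))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * total)
```

Neither function reads a preference value, only set membership, which is why these two measures are usable on boolean data at all.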


> Date: Tue, 3 Jul 2012 19:19:07 +0300
> Subject: Re: ItemSimilarity algorithm
> From: [email protected]
> To: [email protected]
> 
> Item-item similarity is a property of the information you have on two
> items and just those items. Whether there are just those 2 items over
> 500K users, or 2M items over 500K users, makes no difference. So no, I
> don't think that this skew, by itself, implies you should use any
> particular algorithm.
> 
> I think other considerations tend to dominate. For example, very sparse
> data makes the Pearson and cosine measures work poorly. But with so
> relatively few items... I imagine it is not so sparse.
> 
> On Tue, Jul 3, 2012 at 6:57 PM, Saikat Kanjilal <[email protected]> wrote:
> >
> > Hello everyone, I was reading through the documentation on the different 
> > item similarity algorithms in Mahout and had a question. If one has a 
> > scenario where the number of items is significantly smaller than the number 
> > of users (say 500,000 users to 1,000 items), are there particular item 
> > similarity coefficients (namely the log-likelihood or Tanimoto coefficient) 
> > that lend themselves to producing better recommendations? I've read through 
> > Mahout in Action and the Javadocs and can't seem to find any clues on this. 
> > Any insight based on your experience would be much appreciated.
> > Regards
