Thanks for the input, Sean. One other question: in the scenario where most of the preference data is boolean-style (i.e. a CSV file that just says that a user has some sort of association with an item), is it fair to say that the Tanimoto and log-likelihood coefficients perform better than the other coefficients? I wanted to get a deeper understanding of this as well. Thanks for your insight.
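
To make the question concrete, here is a minimal sketch of what the two coefficients compute when the only input is "user X is associated with item Y". It is plain Java, not Mahout's own code: the class and method names are hypothetical, and the log-likelihood part is the standard G^2 statistic on the 2x2 co-occurrence table, which is assumed here to be close in spirit to what Mahout's LogLikelihoodSimilarity does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Mahout.
public class BooleanSimilaritySketch {

    // Tanimoto (Jaccard) coefficient between two items, each represented by
    // the set of user IDs associated with it: |intersection| / |union|.
    public static double tanimoto(Set<Long> usersOfA, Set<Long> usersOfB) {
        Set<Long> intersection = new HashSet<>(usersOfA);
        intersection.retainAll(usersOfB);
        int union = usersOfA.size() + usersOfB.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    // Log-likelihood ratio (G^2) for a 2x2 co-occurrence table:
    //   k11 = users with both items, k12 = users with A only,
    //   k21 = users with B only,     k22 = users with neither.
    // Zero cells contribute nothing to the sum.
    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        long n = k11 + k12 + k21 + k22;
        long[][] k = {{k11, k12}, {k21, k22}};
        long[] row = {k11 + k12, k21 + k22};
        long[] col = {k11 + k21, k12 + k22};
        double sum = 0.0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                if (k[i][j] > 0) {
                    double expected = (double) row[i] * col[j] / n;
                    sum += k[i][j] * Math.log(k[i][j] / expected);
                }
            }
        }
        return 2.0 * sum;
    }

    public static void main(String[] args) {
        // Made-up user IDs for two items.
        Set<Long> usersOfA = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> usersOfB = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println(tanimoto(usersOfA, usersOfB)); // 2 shared / 4 total = 0.5

        // Same two items, assuming 10 users overall: 2 both, 1 A only, 1 B only, 6 neither.
        System.out.println(logLikelihoodRatio(2, 1, 1, 6));
    }
}
```

Neither function looks at a rating value, only at which users co-occur, so both are defined directly on boolean data. Whether they actually produce better recommendations than the other coefficients is the open question above.
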
> Date: Tue, 3 Jul 2012 19:19:07 +0300
> Subject: Re: ItemSimilarity algorithm
> From: [email protected]
> To: [email protected]
>
> Item-item similarity is a property of the information you have on two
> items and just those items. Whether there are just those 2 items over
> 500K users, or 2M items over 500K users, makes no difference. So no, I
> don't think that this skew implies you should use any particular
> algorithm, by itself.
>
> I think other considerations tend to dominate. For example, very sparse
> data makes the Pearson / cosine measures work poorly. But with so
> relatively few items... I imagine it is not so sparse.
>
> On Tue, Jul 3, 2012 at 6:57 PM, Saikat Kanjilal <[email protected]> wrote:
> >
> > Hello Everyone,
> > I was reading through the documentation on the different
> > item similarity algorithms in Mahout and had a question: if one has a
> > scenario where the number of items is significantly less than the number
> > of users (say 500,000 users to 1000 items), are there particular item
> > similarity coefficients (namely log-likelihood or the Tanimoto coefficient) that
> > lend themselves to producing better recommendations? I've read through
> > Mahout in Action and the Javadocs and can't seem to find any clues on this.
> > Any insight based on your experience would be much appreciated.
> > Regards
