On Thu, Jul 5, 2012 at 8:36 AM, Saikat Kanjilal <[email protected]> wrote:
>
> Thanks for the input, Sean. One other question: in the scenario where
> most of the recommendations are boolean-style recommendations (i.e. a
> CSV file that just says that a user has some sort of association with
> an item), is it fair to say that the Tanimoto and log-likelihood
> coefficients perform better than the other coefficients? I wanted to
> get a deeper understanding of this as well. Thanks for your insight.
>

That would definitely be my expectation.

> > Date: Tue, 3 Jul 2012 19:19:07 +0300
> > Subject: Re: ItemSimilarity algorithm
> > From: [email protected]
> > To: [email protected]
> >
> > Item-item similarity is a property of the information you have on two
> > items and just those items. Whether there are just those 2 items over
> > 500K users, or 2M items over 500K users, makes no difference. So no, I
> > don't think that this skew implies you should use any particular
> > algorithm, by itself.
> >
> > I think other considerations tend to dominate. For example, very
> > sparse data makes the Pearson / cosine measures not work well. But
> > with so relatively few items... I imagine it is not so sparse.
> >
> > On Tue, Jul 3, 2012 at 6:57 PM, Saikat Kanjilal <[email protected]>
> > wrote:
> > >
> > > Hello Everyone, I was reading through the documentation on the
> > > different item similarity algorithms in Mahout and had a question.
> > > If one has a scenario where the number of items is significantly
> > > smaller than the number of users (say 500,000 users to 1,000 items),
> > > are there particular item similarity coefficients (namely the
> > > log-likelihood or Tanimoto coefficient) that lend themselves to
> > > producing better recommendations? I've read through Mahout in
> > > Action and the Javadocs and can't seem to find any clues on this.
> > > Any insight based on your experience would be much appreciated.
> > >
> > > Regards
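
For readers following the boolean-data question above, here is a minimal
sketch of the two coefficients being discussed, written in plain Python
rather than against the Mahout API. It assumes each item is represented as
a set of user IDs (no preference values). The function names are
hypothetical, and the final 1 - 1/(1 + LLR) squashing into [0, 1) is an
assumption chosen for illustration, not a statement of what any library
does internally.

```python
import math


def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B| for two user sets."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0


def _x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized Shannon entropy of a group of counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: users with both items     k12: users with item A only
    k21: users with item B only    k22: users with neither item
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(users_a, users_b, num_users):
    """Map the log-likelihood ratio into [0, 1) (assumed squashing)."""
    k11 = len(users_a & users_b)
    k12 = len(users_a - users_b)
    k21 = len(users_b - users_a)
    k22 = num_users - k11 - k12 - k21
    llr = log_likelihood_ratio(k11, k12, k21, k22)
    return 1.0 - 1.0 / (1.0 + llr)


if __name__ == "__main__":
    # Hypothetical boolean data: item -> set of users associated with it.
    item_x = {1, 2, 3}
    item_y = {2, 3, 4}
    print("tanimoto:", tanimoto(item_x, item_y))
    print("llr similarity:", llr_similarity(item_x, item_y, num_users=100))
```

Neither measure looks at preference values, only at which users co-occur
on the two items, which is why both remain usable when the input is a bare
user,item association file. Unlike Tanimoto, the log-likelihood version
also takes the total number of users into account, so the same overlap
counts for more when it is unlikely to have arisen by chance.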
