Thank you and Sean for your immediate advice :) I very much appreciate it.

On Mon, Dec 26, 2011 at 11:25 PM, Ted Dunning <[email protected]> wrote:
> Log-likelihood is very much like PMI (but better).
>
> This is a general recommendation problem, but should not be a problem after
> using the log-likelihood ratio. It is easy to show that any item that
> cooccurs with everything will have zero score with LLR.
>
> It may also be possible that these common items are prevalent in distinct
> sub-populations. In that case, you may actually have some strong signal
> there. In that case, down-sampling common items and downsampling prolific
> consumers is very much a good idea.
>
> Downsampling is better in most cases than reweighting because it has pretty
> much the same effect but makes things run much faster as well. You might
> as well get both benefits at once.
>
> On Mon, Dec 26, 2011 at 2:20 PM, Valentin Pletzer <[email protected]>
> wrote:
>
> > I am already using Log-likelihood. But since the items are free downloads
> > some items tend to cooccur very often with nearly every other item. So
> > maybe my problem isn't a Mahout problem but a more general recommendation
> > problem?
> >
> > I am thinking about some dampening factor for very popular items or
> > something similar to PMI
> > (http://en.wikipedia.org/wiki/Pointwise_mutual_information)
> >
> > On Mon, Dec 26, 2011 at 11:07 PM, Sean Owen <[email protected]> wrote:
> >
> > > What item similarity metric are you using? Log-likelihood tends to
> > > account for an item's baseline popularity and normalize it away. So a
> > > best-seller isn't similar to an item just because it's a best-seller
> > > and shows up a lot, but because it shows up an unusually large number
> > > of times, even granting it's a best seller. Try that if you're not
> > > already using it.
> > >
> > > On Mon, Dec 26, 2011 at 4:01 PM, Valentin Pletzer <[email protected]>
> > > wrote:
> > > > Hi,
> > > >
> > > > I am trying to achieve some item-to-item-recommendations and the
> > > > setup works quite well. But one thing I stumbled across is that some
> > > > items are so popular that they are a recommendation for nearly every
> > > > other item. In the Amazon paper they say that they are sampling the
> > > > bestseller buying customers. Do I have to do this preprocessing step
> > > > myself or does Mahout help with that?
> > > >
> > > > Thanks
> > > > Valentin
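
For reference, the two points quoted above (LLR scores an item that cooccurs with everything as zero, and downsampling popular items and prolific users) can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Mahout's code: the function names `llr` and `downsample`, the cap parameters, and the counts in the comments are all made up for the example. The 2x2 counts are k11 = users with both items, k12 = users with A only, k21 = users with B only, k22 = users with neither.

```python
import math
import random
from collections import defaultdict


def x_log_x(x):
    # Convention: 0 * log(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)), with N = sum(counts).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 cooccurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def downsample(pairs, max_per_user, max_per_item, seed=0):
    """Keep a random subset of (user, item) pairs, capped per user and per item.

    Hypothetical helper: shuffles the pairs, then keeps each one only while
    both its user and its item are still under their caps.
    """
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    per_user = defaultdict(int)
    per_item = defaultdict(int)
    kept = []
    for user, item in pairs:
        if per_user[user] < max_per_user and per_item[item] < max_per_item:
            kept.append((user, item))
            per_user[user] += 1
            per_item[item] += 1
    return kept


# Item B is held by all 1000 users, so nobody has A without B (k12 = 0)
# and nobody lacks both (k22 = 0): the score evaluates to 0.0.
print(llr(100, 0, 900, 0))
# Two items that mostly appear together get a clearly positive score.
print(llr(100, 10, 10, 880))
```

Capping with `downsample` before counting cooccurrences shrinks the input, which is where the speed benefit mentioned above comes from; the caps themselves would need tuning on real data.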
