Re: sampling bestseller buyers for recommendations

Valentin Pletzer Mon, 26 Dec 2011 14:33:56 -0800

Thank you, and Sean too, for the quick advice :) I really appreciate it.

On Mon, Dec 26, 2011 at 11:25 PM, Ted Dunning <[email protected]> wrote:


> Log-likelihood is very much like PMI (but better).
>
> This is a general recommendation problem, but it should not be a problem
> once you use the log-likelihood ratio.  It is easy to show that any item
> that co-occurs with everything will get a score of zero under LLR.
>
> It may also be that these common items are prevalent in distinct
> sub-populations.  In that case, you may actually have some strong signal
> there, and downsampling both common items and prolific consumers is very
> much a good idea.
>
> Downsampling is better in most cases than reweighting because it has pretty
> much the same effect but makes things run much faster as well.  You might
> as well get both benefits at once.
>
> On Mon, Dec 26, 2011 at 2:20 PM, Valentin Pletzer <[email protected]>
> wrote:
>
> > I am already using log-likelihood. But since the items are free downloads,
> > some items tend to co-occur very often with nearly every other item. So
> > maybe my problem isn't a Mahout problem but a more general recommendation
> > problem?
> >
> > I am thinking about some dampening factor for very popular items or
> > something similar to PMI (
> > http://en.wikipedia.org/wiki/Pointwise_mutual_information)
> >
> > On Mon, Dec 26, 2011 at 11:07 PM, Sean Owen <[email protected]> wrote:
> >
> > > What item similarity metric are you using? Log-likelihood tends to
> > > account for an item's baseline popularity and normalize it away. So a
> > > best-seller isn't similar to an item just because it's a best-seller
> > > and shows up a lot, but because it shows up an unusually large number
> > > of times, even granting it's a best-seller. Try that if you're not
> > > already using it.
> > >
> > > On Mon, Dec 26, 2011 at 4:01 PM, Valentin Pletzer <[email protected]>
> > > wrote:
> > > > Hi,
> > > >
> > > > I am trying to build item-to-item recommendations, and the setup
> > > > works quite well. But one thing I stumbled across is that some items
> > > > are so popular that they show up as a recommendation for nearly every
> > > > other item. In the Amazon paper they say that they sample the
> > > > customers who buy bestsellers. Do I have to do this preprocessing
> > > > step myself, or does Mahout help with that?
> > > >
> > > > Thanks
> > > > Valentin
> > >
> >
>
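
Ted's claim that an item co-occurring with everything scores zero, and Sean's point that log-likelihood normalizes away baseline popularity, can be checked with a small standalone sketch of the G² log-likelihood ratio on a 2x2 user co-occurrence table. This is written from the textbook formula, not taken from Mahout's code or API, and the item counts in the example are hypothetical.

```python
import math


def x_log_x(x: int) -> float:
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    """Unnormalized Shannon entropy: N ln N - sum(k ln k), where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: users with both A and B    k12: users with A but not B
    k21: users with B but not A     k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_from_counts(n_users: int, n_a: int, n_b: int, n_ab: int) -> float:
    """Build the 2x2 table from per-item user counts and their overlap."""
    return llr(n_ab, n_a - n_ab, n_b - n_ab, n_users - n_a - n_b + n_ab)


if __name__ == "__main__":
    # Hypothetical catalogue of 1000 users: item A has 20 users,
    # bestseller B has 500, niche item C has 15.
    ab = llr_from_counts(1000, 20, 500, 10)  # 10 shared users, exactly what chance predicts
    ac = llr_from_counts(1000, 20, 15, 8)    # 8 shared users vs. 0.3 expected by chance
    print(f"A-B: co-occurrences=10, LLR={ab:.2f}")
    print(f"A-C: co-occurrences=8,  LLR={ac:.2f}")
    # An item present in every history (100 users, A in 10 of them) scores 0.
    print(f"A-everywhere: LLR={llr_from_counts(100, 10, 100, 10):.2f}")
```

Although bestseller B shares more users with A than niche item C does (10 vs. 8), B's score is essentially zero because that overlap is exactly what B's popularity alone would predict, while C's overlap is far above chance.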

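The downsampling Ted recommends, and the sampling of bestseller buyers mentioned in the Amazon paper, can be done as a preprocessing step before the data reaches the similarity computation. Below is a minimal sketch under assumed inputs: `histories` maps each user ID to the items that user downloaded, and the caps `max_items_per_user` and `max_users_per_item` are hypothetical knobs, not Mahout parameters. It uses plain uniform random sampling; the Amazon paper's exact procedure may differ.

```python
import random
from collections import defaultdict


def downsample(histories, max_items_per_user, max_users_per_item, seed=0):
    """Return a thinned copy of `histories` (user -> iterable of item IDs).

    1. Prolific users keep a random subset of at most `max_items_per_user` items.
    2. Common items keep a random subset of at most `max_users_per_item` users.
    Users left with no items are dropped. IDs must be sortable (e.g. str or int).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible

    # Step 1: cap each user's (deduplicated) history.
    capped = {}
    for user, items in histories.items():
        distinct = sorted(set(items))
        if len(distinct) > max_items_per_user:
            distinct = rng.sample(distinct, max_items_per_user)
        capped[user] = distinct

    # Step 2: invert to item -> users and cap each item's audience.
    users_by_item = defaultdict(list)
    for user in sorted(capped):
        for item in capped[user]:
            users_by_item[item].append(user)

    thinned = defaultdict(list)
    for item in sorted(users_by_item):
        users = users_by_item[item]
        if len(users) > max_users_per_item:
            users = rng.sample(users, max_users_per_item)
        for user in users:
            thinned[user].append(item)
    return {user: sorted(items) for user, items in thinned.items()}


if __name__ == "__main__":
    # Hypothetical data: bestseller "B" is in all 100 ordinary histories,
    # plus one heavy user with 51 downloads.
    data = {f"u{i}": ["B", f"item{i}"] for i in range(100)}
    data["heavy"] = ["B"] + [f"x{j}" for j in range(50)]
    out = downsample(data, max_items_per_user=10, max_users_per_item=20)
    print(sum("B" in items for items in out.values()), "users keep B")  # prints: 20 users keep B
    print(len(out["heavy"]), "items kept for the heavy user")
```

Capping users first stops a few heavy downloaders from dominating the co-occurrence counts; capping items afterwards bounds the work spent on bestsellers. Because step 2 only ever removes entries, the per-user cap from step 1 still holds in the output.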