I concur.  Log is a natural choice for lots of measures like this.
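To see concretely why the base doesn't matter, here is a minimal plain-Java sketch (no Mahout dependency; the class and method names `LogScaleDemo` and `logTransform` are made up for illustration). Since log_b(x) = ln(x) / ln(b), changing the base just multiplies every transformed play count by a constant, and Pearson correlation is invariant under that kind of linear rescaling:

```java
// Hypothetical demo class -- not part of Mahout.
public class LogScaleDemo {

    // Transform raw play counts into preference values using log base `base`.
    static double[] logTransform(long[] plays, double base) {
        double[] prefs = new double[plays.length];
        for (int i = 0; i < plays.length; i++) {
            prefs[i] = Math.log(plays[i]) / Math.log(base);
        }
        return prefs;
    }

    public static void main(String[] args) {
        long[] plays = {1, 5, 42, 1000};
        double[] ln   = logTransform(plays, Math.E);
        double[] log2 = logTransform(plays, 2.0);
        // Every base-2 value equals the natural-log value times 1/ln(2):
        // a constant linear rescaling, so similarity rankings are unchanged.
        for (int i = 0; i < plays.length; i++) {
            System.out.printf("plays=%4d  ln=%7.4f  log2=%7.4f%n",
                    plays[i], ln[i], log2[i]);
        }
    }
}
```

So whichever base is handy (natural log, log2, log10) gives the same correlations and the same recommendations.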

On Tue, Nov 23, 2010 at 7:12 AM, Sean Owen <[email protected]> wrote:

> Yes I think the logarithm is a fine choice. The base doesn't matter as
> the scale of ratings doesn't make a difference.
>
> On Tue, Nov 23, 2010 at 2:07 PM, Sebastian Schelter <[email protected]>
> wrote:
> > Hi,
> >
> > I'm currently looking into the last.fm dataset (from
> > http://denoiserthebetter.posterous.com/music-recommendation-datasets) as
> > I'm planning to write a magazine article or blog post on how to create a
> > simple music recommender with Mahout. It should be an easy-to-follow
> > tutorial that encourages people to download Mahout and play a little
> > with the recommender stuff.
> >
> > The dataset consists of several million
> > (userID, artist, numberOfPlays) tuples, and my goal is to find the most
> > similar artists and to recommend new artists to users. I extracted a 20%
> > sample of the data, ignored the numberOfPlays, used an
> > ItemBasedRecommender with LogLikelihoodSimilarity, ran some random spot
> > checks, and got reasonable results.
> >
> > Now I want to go on and include the "strength" of each preference in the
> > computation. What would be the best way to deal with the numberOfPlays?
> > I thought about using the log of the numberOfPlays as the rating value
> > and applying PearsonCorrelationSimilarity as the measure. Would that be
> > a viable way to approach this problem?
> >
> > --sebastian
> >
>
