Yes, I think the logarithm is a fine choice. The base doesn't matter, as
changing it only multiplies every rating by the same constant, and the
scale of the ratings doesn't make a difference.
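
To illustrate the point about the base, here is a minimal sketch in plain
Python (not Mahout code). The play counts are made up, and the pearson()
and log_ratings() helpers are hypothetical names introduced only for this
example. It assumes the same five users have played both artists, and it
checks that the Pearson correlation of the log-transformed counts comes
out the same for base e, base 2 and base 10.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_ratings(plays, base):
    """Turn raw play counts into ratings by taking the log in the given base."""
    return [math.log(p, base) for p in plays]


# Made-up numberOfPlays values of two artists for the same five users.
plays_a = [1, 12, 150, 3, 40]
plays_b = [2, 30, 90, 1, 25]

# Same data, three different log bases.
r_e = pearson(log_ratings(plays_a, math.e), log_ratings(plays_b, math.e))
r_2 = pearson(log_ratings(plays_a, 2), log_ratings(plays_b, 2))
r_10 = pearson(log_ratings(plays_a, 10), log_ratings(plays_b, 10))

# The three values agree up to floating-point rounding.
print(r_e, r_2, r_10)
```

Switching the base divides every log rating by the same constant, and
Pearson correlation normalises by the standard deviations, so the constant
cancels out.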

On Tue, Nov 23, 2010 at 2:07 PM, Sebastian Schelter <[email protected]> wrote:
> Hi,
>
> I'm currently looking into the last.fm dataset (from
> http://denoiserthebetter.posterous.com/music-recommendation-datasets) as I'm
> planning to write a magazine article or blog post on how to create a simple
> music recommender with Mahout. It should be an easy-to-follow tutorial that
> encourages people to download Mahout and play a little with the recommender
> stuff.
>
> The dataset consists of several million
> (userID,artist,numberOfPlays)-tuples, and my goal is to find the most
> similar artists and recommend new artists to users. I extracted a 20% sample
> of the data, ignored the numberOfPlays and used an ItembasedRecommender with
> LogLikelihoodSimilarity, did some random tests and got reasonable results.
>
> Now I wanna go on and include the "strength" of the preference into the
> computation. What would be the best way to deal with the numberOfPlays? I
> thought about using the log of the numberOfPlays as rating value and
> applying PearsonCorrelationSimilarity as measure, would that be a viable way
> to approach this problem?
>
> --sebastian
>
