Playing with the last.fm dataset

Sebastian Schelter Tue, 23 Nov 2010 06:08:14 -0800

Hi,

I'm currently looking into the last.fm dataset (from http://denoiserthebetter.posterous.com/music-recommendation-datasets) as I'm planning to write a magazine article or blogpost on how to create a simple music recommender with Mahout. It should be an easy-to-follow tutorial that encourages people to download Mahout and play a little with the recommender stuff.

The dataset consists of several million (userID, artist, numberOfPlays) tuples, and my goal is to find the most similar artists and recommend new artists to users. I extracted a 20% sample of the data, ignored the numberOfPlays and used an ItembasedRecommender with LoglikelihoodSimilarity, did some random tests and got reasonable results.
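To make the boolean setup concrete, here is a minimal sketch in plain Python (not Mahout code) of a log-likelihood-ratio similarity between two artists when numberOfPlays is ignored. The toy tuples, the function names and the mapping of the ratio to a bounded score via 1 - 1/(1 + LLR) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def xlogx(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts: N log N - sum(k log k).
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # G^2 statistic of a 2x2 contingency table:
    # k11 = users with both artists, k12 = only A, k21 = only B, k22 = neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def artist_similarity(listeners_a, listeners_b, num_users):
    k11 = len(listeners_a & listeners_b)
    k12 = len(listeners_a) - k11
    k21 = len(listeners_b) - k11
    k22 = num_users - k11 - k12 - k21
    llr = log_likelihood_ratio(k11, k12, k21, k22)
    # Assumed mapping of the unbounded ratio into [0, 1).
    return 1.0 - 1.0 / (1.0 + llr)


# Made-up (userID, artist, numberOfPlays) tuples; the play counts are ignored.
tuples = [
    ("u1", "artist_a", 120), ("u2", "artist_a", 5),
    ("u3", "artist_a", 10), ("u4", "artist_a", 2),
    ("u1", "artist_b", 30), ("u2", "artist_b", 80),
    ("u3", "artist_b", 7), ("u4", "artist_b", 1),
    ("u1", "artist_c", 3), ("u2", "artist_c", 9),
    ("u5", "artist_c", 40), ("u6", "artist_c", 15),
    ("u7", "artist_d", 1), ("u8", "artist_d", 60),
]

listeners = defaultdict(set)
for user, artist, _plays in tuples:
    listeners[artist].add(user)

num_users = len({user for user, _, _ in tuples})

sim_ab = artist_similarity(listeners["artist_a"], listeners["artist_b"], num_users)
sim_ac = artist_similarity(listeners["artist_a"], listeners["artist_c"], num_users)
print("a~b:", sim_ab, "a~c:", sim_ac)
```

In this toy data artist_a and artist_b share all of their listeners, while artist_a and artist_c overlap exactly as much as chance would predict, so the first pair scores higher than the second.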

Now I wanna go on and include the "strength" of the preference into the computation. What would be the best way to deal with the numberOfPlays? I thought about using the log of the numberOfPlays as rating value and applying PearsonCorrelationSimilarity as measure, would that be a viable way to approach this problem?
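The idea in the previous paragraph can be sketched as follows, again in plain Python rather than with Mahout's classes. The toy tuples and function names are hypothetical, and the sketch assumes the correlation is computed only over users who played both artists; it is one way to read the proposal, not a statement of how PearsonCorrelationSimilarity behaves.

```python
import math
from collections import defaultdict


def log_rating(plays):
    # Rating value = natural log of numberOfPlays (a single play maps to 0).
    return math.log(plays)


def pearson(xs, ys):
    # Plain Pearson correlation; returns NaN when it is undefined.
    n = len(xs)
    if n < 2:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def artist_pearson(ratings, artist_a, artist_b):
    # Assumption: only users who played both artists contribute.
    common = sorted(set(ratings[artist_a]) & set(ratings[artist_b]))
    xs = [ratings[artist_a][u] for u in common]
    ys = [ratings[artist_b][u] for u in common]
    return pearson(xs, ys)


# Made-up (userID, artist, numberOfPlays) tuples.
tuples = [
    ("u1", "artist_a", 1), ("u2", "artist_a", 10), ("u3", "artist_a", 100),
    ("u1", "artist_b", 2), ("u2", "artist_b", 20), ("u3", "artist_b", 200),
    ("u1", "artist_c", 100), ("u2", "artist_c", 10), ("u3", "artist_c", 1),
]

# ratings[artist][user] = log(numberOfPlays)
ratings = defaultdict(dict)
for user, artist, plays in tuples:
    ratings[artist][user] = log_rating(plays)

r_ab = artist_pearson(ratings, "artist_a", "artist_b")
r_ac = artist_pearson(ratings, "artist_a", "artist_c")
print("a~b:", r_ab, "a~c:", r_ac)
```

The log compresses heavy listeners, so a user with 1000 plays no longer dominates the means and variances the way they would with raw counts. The correlation is undefined when fewer than two users overlap or when one artist's ratings are constant, which is likely to happen often with sparse play data.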


--sebastian
