Another dataset to play with is this compilation of song listenings scraped from the last.fm API:

http://mtg.upf.edu/node/1671

Should include about 20M ratings.

--sebastian

On 08.07.2011 09:17, Sean Owen wrote:
The link is http://www.occamslab.com/petricek/data/

The KDD or Netflix data are plenty big to play with. How big is big for your
purpose?

On Fri, Jul 8, 2011 at 7:05 AM, web service<[email protected]>  wrote:

Has it been taken offline as well?

On Thu, Jul 7, 2011 at 10:40 PM, Alex Kozlov<[email protected]>  wrote:

There is still a libimseti dataset at
http://www.occamslab.com/petricek/data with 17,359,346 ratings.  People
are scared after the Netflix lawsuit.

On Thu, Jul 7, 2011 at 10:17 PM, Ted Dunning<[email protected]>  wrote:

Those are both reasonably large, but not commercial in scale.

At Veoh, we had about 10 non-zero elements in our raw data.  I think
Netflix has 100 million.

On Thu, Jul 7, 2011 at 8:05 PM, Lance Norskog<[email protected]>  wrote:

What recommendation datasets, that are available, are considered
"large" by Mahout testing standards? Yahoo KDD Cup is offline, the
Netflix data went under a cloud...

--
Lance Norskog
[email protected]





