Another dataset to play with is this compilation of song-listening records
scraped from the last.fm API:
http://mtg.upf.edu/node/1671.
It should include about 20M ratings.
--sebastian
On 08.07.2011 09:17, Sean Owen wrote:
The link is http://www.occamslab.com/petricek/data/
The KDD or Netflix data are plenty big to play with. How big is big for your
purpose?
On Fri, Jul 8, 2011 at 7:05 AM, web service <[email protected]> wrote:
Has it been taken offline as well?
On Thu, Jul 7, 2011 at 10:40 PM, Alex Kozlov <[email protected]> wrote:
There is still a libimseti dataset at
http://www.occamslab.com/petricek/data with 17,359,346 ratings. People
are scared after the Netflix lawsuit.
On Thu, Jul 7, 2011 at 10:17 PM, Ted Dunning <[email protected]> wrote:
Those are both reasonably large, but not commercial in scale.
At Veoh, we had about 10 non-zero elements in our raw data. I think
Netflix has 100 million.
On Thu, Jul 7, 2011 at 8:05 PM, Lance Norskog <[email protected]> wrote:
Which available recommendation datasets are considered "large" by
Mahout testing standards? The Yahoo KDD Cup data is offline, and the
Netflix data went under a cloud...
--
Lance Norskog
[email protected]