On 17 Sep 2007, at 12:06, David Welton wrote:


> I'm in the process of evaluating Solr and Sphinx, and have come to
> realize that actually having a large data set to run them against
> would be handy.  However, I'm pretty new to both systems, so I thought
> that perhaps asking around might produce something useful.
>
> What *I* mean by largish is something that won't fit into memory, say
> 5 or 6 gigs, which is probably puny for some and huge for others.

The IMDb plain-text data is about 1.2 GB:

<http://www.imdb.com/interfaces#plain>

You can extract real queries from the TPB data collection; it should contain about 1M queries in the movie category:

<http://torrents.thepiratebay.org/3783572/db_dump_and_query_log_from_piratebay.org__summer_of_2006.3783572.TPB.torrent>


--
karl
