On 17 Sep 2007, at 12:06, David Welton wrote:
I'm in the process of evaluating solr and sphinx, and have come to
realize that actually having a large data set to run them against
would be handy. However, I'm pretty new to both systems, so I thought
that perhaps asking around might produce something useful.
What *I* mean by largish is something that won't fit into memory - say
5 or 6 gigs, which is probably puny for some and huge for others.
IMDB is about 1.2GB of data:
<http://www.imdb.com/interfaces#plain>
You can extract real queries from the TPB data collection; it should
contain about 1M queries in the movie category:
<http://torrents.thepiratebay.org/3783572/db_dump_and_query_log_from_piratebay.org__summer_of_2006.3783572.TPB.torrent>
--
karl