You might be interested in the Lucene Java contrib/Benchmark task,
which provides an indexing implementation for a Wikipedia dump
(available at http://people.apache.org/~gsingers/wikipedia/).
It is pretty trivial to convert the indexing code to send add
commands to Solr.
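As a rough illustration of what "sending add commands to Solr" looks like,
here is a minimal sketch that builds a Solr XML add command for one document
and POSTs it to the update handler over plain HTTP. The field names (id,
title, body) and the URL http://localhost:8983/solr/update are assumptions
for a default local Solr setup, not part of the benchmark code; real code
would also need to XML-escape field values.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrAdd {

    // Build a Solr XML add command for a single document.
    // NOTE: field values are not XML-escaped here; real code must escape them.
    static String addCommand(String id, String title, String body) {
        return "<add><doc>"
             + "<field name=\"id\">" + id + "</field>"
             + "<field name=\"title\">" + title + "</field>"
             + "<field name=\"body\">" + body + "</field>"
             + "</doc></add>";
    }

    // POST the command to a local Solr update handler (assumed URL).
    static void post(String xml) throws Exception {
        URL url = new URL("http://localhost:8983/solr/update");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        out.write(xml.getBytes("UTF-8"));
        out.close();
        System.out.println("Solr responded: " + conn.getResponseCode());
    }

    public static void main(String[] args) {
        // Print the XML we would send; post() would ship it to a running Solr.
        System.out.println(addCommand("doc1", "Wikipedia", "Sample article text"));
    }
}
```

Hooking this into the benchmark's indexing loop is then a matter of calling
something like post(addCommand(...)) per document instead of writing to a
local Lucene index.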
HTH,
Grant
On Sep 17, 2007, at 6:06 AM, David Welton wrote:
Hi,
I'm in the process of evaluating Solr and Sphinx, and have come to
realize that actually having a large data set to run them against
would be handy. However, I'm pretty new to both systems, so I thought
that perhaps asking around might produce something useful.
What *I* mean by largish is something that won't fit into memory - say
5 or 6 gigs - which is probably puny for some and huge for others.
BTW, I would also welcome any input from others who have done the
above comparison, although what we'll be using it for is specific
enough that of course I'll need to do my own testing.
Thanks!
--
David N. Welton
http://www.welton.it/davidw/