Re: largish test data set?
You might be interested in the Lucene Java contrib/benchmark task, which provides an indexing implementation for a download of Wikipedia (available at http://people.apache.org/~gsingers/wikipedia/). It is pretty trivial to convert the indexing code to send add commands to Solr.

HTH,
Grant

On Sep 17, 2007, at 6:06 AM, David Welton wrote:

> Hi,
>
> I'm in the process of evaluating Solr and Sphinx, and have come to realize that actually having a large data set to run them against would be handy. However, I'm pretty new to both systems, so I thought that asking around may produce something useful.
>
> What *I* mean by largish is something that won't fit into memory - say 5 or 6 gigs, which is probably puny for some and huge for others.
>
> BTW, I would also welcome any input from others who have done the above comparison, although what we'll be using it for is specific enough that of course I'll need to do my own testing.
>
> Thanks!
> --
> David N. Welton
> http://www.welton.it/davidw/
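Converting an indexer to "send add commands to Solr", as Grant describes, amounts to wrapping each document's fields in Solr's XML `<add>` message and POSTing it to the update handler. Here is a minimal sketch; the field names (`id`, `title`, `body`) and the `http://localhost:8983/solr/update` endpoint are illustrative assumptions, not part of the original post:

```python
import xml.etree.ElementTree as ET

def make_solr_add(docs):
    """Build a Solr XML <add> command from a list of field dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each value becomes a <field name="..."> element.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Example: one Wikipedia article converted to an add command.
# (Field names here are hypothetical -- match them to your schema.xml.)
xml = make_solr_add([{"id": "12", "title": "Anarchism", "body": "..."}])
print(xml)

# To index, POST the XML to Solr's update handler, e.g.:
#   import urllib.request
#   urllib.request.urlopen("http://localhost:8983/solr/update",
#                          data=xml.encode("utf-8"))
```

Batching many `<doc>` elements per `<add>` message, followed by a single `<commit/>`, is generally much faster than one HTTP request per document.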
Re: largish test data set?
Hi Yonik,

Do you have any performance statistics for those changes? Is it possible to upgrade to this new Lucene version while using the stable Solr 1.2 release?

Regards,
Daniel

On 17/9/07 17:37, Yonik Seeley [EMAIL PROTECTED] wrote:

> If you want to see what performance will be like on the next release, you could try upgrading Solr's internal version of Lucene to trunk (the current dev version)... there have been some fantastic improvements in indexing speed. For query speed/throughput, Solr 1.2 or trunk should do fine.
>
> -Yonik
>
> On 9/17/07, David Welton [EMAIL PROTECTED] wrote:
>> Hi,
>>
>> I'm in the process of evaluating Solr and Sphinx, and have come to realize that actually having a large data set to run them against would be handy. However, I'm pretty new to both systems, so I thought that asking around may produce something useful.
>>
>> What *I* mean by largish is something that won't fit into memory - say 5 or 6 gigs, which is probably puny for some and huge for others.
>>
>> BTW, I would also welcome any input from others who have done the above comparison, although what we'll be using it for is specific enough that of course I'll need to do my own testing.
>>
>> Thanks!
>> --
>> David N. Welton
>> http://www.welton.it/davidw/
Re: largish test data set?
If you want to see what performance will be like on the next release, you could try upgrading Solr's internal version of Lucene to trunk (the current dev version)... there have been some fantastic improvements in indexing speed. For query speed/throughput, Solr 1.2 or trunk should do fine.

-Yonik

On 9/17/07, David Welton [EMAIL PROTECTED] wrote:

> Hi,
>
> I'm in the process of evaluating Solr and Sphinx, and have come to realize that actually having a large data set to run them against would be handy. However, I'm pretty new to both systems, so I thought that asking around may produce something useful.
>
> What *I* mean by largish is something that won't fit into memory - say 5 or 6 gigs, which is probably puny for some and huge for others.
>
> BTW, I would also welcome any input from others who have done the above comparison, although what we'll be using it for is specific enough that of course I'll need to do my own testing.
>
> Thanks!
> --
> David N. Welton
> http://www.welton.it/davidw/
Re: largish test data set?
On 17 Sep 2007, at 12:06, David Welton wrote:

> I'm in the process of evaluating Solr and Sphinx, and have come to realize that actually having a large data set to run them against would be handy. However, I'm pretty new to both systems, so I thought that asking around may produce something useful.
>
> What *I* mean by largish is something that won't fit into memory - say 5 or 6 gigs, which is probably puny for some and huge for others.

IMDB is about 1.2GB of data: http://www.imdb.com/interfaces#plain

You can extract real queries from the TPB data collection; it should contain about 1M queries in the movie category: http://torrents.thepiratebay.org/3783572/db_dump_and_query_log_from_piratebay.org__summer_of_2006.3783572.TPB.torrent

--
karl
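To turn the IMDB plain-text lists Karl mentions into indexable documents, each line must be parsed into fields. The sketch below assumes a line shape like `Title (Year)` followed by whitespace-separated columns, which is roughly how the plain-text list files are laid out; verify the pattern against the actual dump before relying on it:

```python
import re

# Assumed line shape, e.g. "The Matrix (1999)\t\t\t1999".
# This is a guess at the format -- check the real list files.
LINE_RE = re.compile(r"^(?P<title>.+?)\s+\((?P<year>\d{4})\)")

def parse_movie_lines(lines):
    """Yield (title, year) pairs from IMDB-style list lines,
    silently skipping lines that do not match the pattern."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group("title"), int(m.group("year"))

sample = ["The Matrix (1999)\t\t\t1999", "some header line"]
movies = list(parse_movie_lines(sample))
print(movies)  # [('The Matrix', 1999)]
```

Each parsed pair can then be mapped to Solr fields and indexed in batches; the ~1M real queries from the query log make a more realistic load test than synthetic queries.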