My limited experience with larger indexes is:
1) the logistics of copying around and backing up this much data are a
burden, and
2) indexing is disk-bound. We're on SAS disks, and it makes no difference
whether we run one indexing thread or a dozen (we have small records).

Smaller result sets come back faster. Limit the search results with as many
parameters as you can; filters are the way to do this.
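
As a rough illustration of the filtering advice above, here is a minimal
Python sketch that builds a Solr select URL with one `fq` (filter query)
parameter per restriction and a small `rows` value. The host, the field
names (`title`, `type`, `year`) and the helper `build_select_url` are all
made up for the example; only `q`, `fq` and `rows` are standard Solr
request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filters, rows=10):
    """Return a Solr /select URL with one fq parameter per filter."""
    # q is the main query; rows caps how many results are returned.
    params = [("q", query), ("rows", str(rows))]
    # Each filter becomes its own fq parameter, so Solr can apply
    # (and cache) them independently of the main query.
    params.extend(("fq", f) for f in filters)
    return base_url + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr",        # hypothetical host
    "title:lucene",                      # hypothetical field name
    ["type:article", "year:2007"],       # hypothetical filters
    rows=20,
)
print(url)
```

How much this helps depends on your schema and on how selective the
filters are, so treat it as a starting point rather than a recipe.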

-----Original Message-----
From: Walter Underwood [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: dataset parameters suitable for lucene application

No one can answer that, because it depends on how you configure Solr.
How many fields do you want to search? Are you using fuzzy search?
Facets? Highlighting?

We are searching a much smaller collection, about 250K docs, with great
success. We see 80 queries/sec on each of four servers, and response times
under 100ms. Each query searches against seven fields and we don't use any
of the features I listed above.
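
To put the figures above next to the question quoted below, here is a
back-of-the-envelope calculation in Python. It uses only numbers stated in
this thread; the variable names are mine, and nothing here assumes that
throughput scales linearly with collection size.

```python
# Figures quoted in this thread.
servers = 4
qps_per_server = 80
docs_reported = 250_000          # "about 250K docs"
docs_asked = 2_500_000_000       # "2.5 billion docs"
qps_asked = 200                  # "200 queries per second"

# Combined throughput across the four servers.
aggregate_qps = servers * qps_per_server
# How many times larger the collection in the question is.
size_ratio = docs_asked // docs_reported

print(f"aggregate throughput reported: {aggregate_qps} queries/sec")
print(f"throughput asked about:        {qps_asked} queries/sec")
print(f"collection size ratio:         {size_ratio}x")
```

The reported aggregate rate is above the rate asked about, but on a
collection 10,000 times smaller, so these numbers cannot simply be
extrapolated to 2.5 billion docs.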

wunder

On 9/26/07 10:50 AM, "Law, John" <[EMAIL PROTECTED]> wrote:

> Thanks all! One last question...
> 
> If I had a collection of 2.5 billion docs and a demand averaging 200 
> queries per second, what's the confidence that Solr/Lucene could 
> handle this volume and execute search with sub-second response times?
> 
> 
> -----Original Message-----
> From: Charlie Jackson [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 26, 2007 1:32 PM
> To: solr-user@lucene.apache.org
> Subject: RE: dataset parameters suitable for lucene application
> 
> Sorry, I meant that it maxed out in the sense that my maxDoc field on 
> the stats page was 8.8 million, which indicates that the most docs it 
> has ever had was around 8.8 million. It's down to about 7.8 million 
> currently. I have seen no signs of a "maximum" number of docs Solr can 
> handle.
> 
> 
> -----Original Message-----
> From: Chris Harris [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 26, 2007 11:49 AM
> To: solr-user@lucene.apache.org
> Subject: Re: dataset parameters suitable for lucene application
> 
> By "maxed out" do you mean that Solr's performance became unacceptable 
> beyond 8.8M records, or that you only had 8.8M records to index? If 
> the former, can you share the particular symptoms?
> 
> On 9/26/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
>> My experiences so far with this level of data have been good.
>> 
>> Number of records: Maxed out at 8.8 million
>> Database size: friggin huge (100+ GB)
>> Index size: ~24 GB
>> 
>> 1) It took me about a day to index 8 million docs using a
>> non-optimized program I wrote. It's non-optimized in the sense that
>> it's not multi-threaded. It batched together groups of about 5,000
>> docs at a time to be indexed.
>> 
>> 2) Search times for a basic search are almost always sub-second. If 
>> we toss in some faceting, it takes a little longer, but I've hardly 
>> ever seen it go above 1-2 seconds even with the most advanced queries.
>> 
>> Hope that helps.
>> 
>> 
>> Charlie
>> 
>> ____________________________________________
>> 
>> -----Original Message-----
>> From: Law, John [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, September 26, 2007 9:28 AM
>> To: solr-user@lucene.apache.org
>> Subject: dataset parameters suitable for lucene application
>> 
>> I am new to the list and new to lucene and solr. I am considering
>> Lucene for a potential new application and need to know how well it
>> scales.
>> 
>> Following are the parameters of the dataset.
>> 
>> Number of records: 7+ million
>> Database size: 13.3 GB
>> Index Size:  10.9 GB
>> 
>> My questions are simply:
>> 
>> 1) Approximately how long would it take Lucene to index these
>> documents?
>> 2) What would the approximate retrieval time be (i.e. search response 
>> time)?
>> 
>> Can someone provide me with some informed guidance in this regard?
>> 
>> Thanks in advance,
>> John
>> 
>> ______________________________________________
>> John Law
>> Director, Platform Management
>> ProQuest
>> 789 Eisenhower Parkway
>> Ann Arbor, MI 48106
>> 734-997-4877
>> [EMAIL PROTECTED]
>> www.proquest.com
>> www.csa.com
>> 
>> ProQuest... Start here.
>> 
>> 
>> 
>> 
