dataset parameters suitable for lucene application

2007-09-26 Thread Law, John
I am new to the list and new to Lucene and Solr. I am considering Lucene
for a potential new application and need to know how well it scales.

Following are the parameters of the dataset.

Number of records: 7+ million
Database size: 13.3 GB
Index size: 10.9 GB

My questions are simply:

1) Approximately how long would it take Lucene to index these documents?
2) What would the approximate retrieval time be (i.e. search response
time)?

Can someone provide me with some informed guidance in this regard?

Thanks in advance,
John

__
John Law
Director, Platform Management
ProQuest
789 Eisenhower Parkway
Ann Arbor, MI 48106
734-997-4877
[EMAIL PROTECTED]
www.proquest.com
www.csa.com

ProQuest... Start here.





RE: dataset parameters suitable for lucene application

2007-09-26 Thread Law, John
Thanks all! One last question...

If I had a collection of 2.5 billion docs and a demand averaging 200
queries per second, what's the confidence that Solr/Lucene could handle
this volume and execute search with sub-second response times?
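
For a rough sense of scale, here is a back-of-envelope sketch that extrapolates from the figures Charlie reported earlier in this thread (about 8 million docs indexed in about a day, single-threaded, with sub-second basic searches at that size). The per-shard document count is an assumption made only for illustration, not a Solr limit or a recommendation, and the sketch says nothing about whether 200 queries per second is achievable.

```python
# Back-of-envelope scaling arithmetic. Every input is either quoted from
# this thread or an explicitly labeled assumption.
SECONDS_PER_DAY = 24 * 60 * 60

# Reported in this thread: ~8 million docs indexed in about a day by a
# single-threaded program.
docs_indexed = 8_000_000
index_seconds = SECONDS_PER_DAY
docs_per_second = docs_indexed / index_seconds

# From the question above.
target_docs = 2_500_000_000

# ASSUMPTION (not from the thread): each shard is capped at roughly the
# index size that was reported to give sub-second basic searches.
docs_per_shard = 8_000_000

# Ceiling division: number of shards needed at that per-shard size.
shards = -(-target_docs // docs_per_shard)

# Time to index everything at the reported single-threaded rate.
single_thread_days = target_docs / docs_per_second / SECONDS_PER_DAY

print(f"observed indexing rate: {docs_per_second:.0f} docs/s")
print(f"shards at {docs_per_shard:,} docs each: {shards}")
print(f"single-threaded indexing time: {single_thread_days:.1f} days")
```

Under these assumptions the numbers come out to roughly 93 docs/s, 313 shards, and about 312.5 days of single-threaded indexing, which suggests that a collection of this size would need to be partitioned and indexed in parallel rather than handled as one index.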


-----Original Message-----
From: Charlie Jackson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 1:32 PM
To: solr-user@lucene.apache.org
Subject: RE: dataset parameters suitable for lucene application

Sorry, I meant that it maxed out in the sense that my maxDoc field on
the stats page was 8.8 million, which indicates that the most docs it
has ever had was around 8.8 million. It's down to about 7.8 million
currently. I have seen no signs of a maximum number of docs Solr can
handle. 


-----Original Message-----
From: Chris Harris [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: dataset parameters suitable for lucene application

By maxed out do you mean that Solr's performance became unacceptable
beyond 8.8M records, or that you only had 8.8M records to index? If
the former, can you share the particular symptoms?

On 9/26/07, Charlie Jackson [EMAIL PROTECTED] wrote:
 My experiences so far with this level of data have been good.

 Number of records: Maxed out at 8.8 million
 Database size: friggin huge (100+ GB)
 Index size: ~24 GB

 1) It took me about a day to index 8 million docs using a non-optimized
 program I wrote. It's non-optimized in the sense that it's not
 multi-threaded. It batched together groups of about 5,000 docs at a
 time to be indexed.

 2) Search times for a basic search are almost always sub-second. If we
 toss in some faceting, it takes a little longer, but I've hardly ever
 seen it go above 1-2 seconds even with the most advanced queries.
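
The batching described in point 1 can be sketched roughly as follows. This is not Charlie's program, which was not posted; it is a minimal single-threaded illustration in which `batched`, `index_all`, and the `send_batch` callback are hypothetical names. In a real setup `send_batch` would be whatever function posts a group of documents to the index.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(docs: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `docs`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_all(
    docs: Iterable[T],
    send_batch: Callable[[List[T]], None],  # hypothetical: submits one batch
    batch_size: int = 5000,  # group size mentioned in the thread
) -> int:
    """Send docs to the indexer one batch at a time; return the doc count."""
    total = 0
    for batch in batched(docs, batch_size):
        send_batch(batch)
        total += len(batch)
    return total


# Usage: collect the batches in a list instead of sending them anywhere.
sent: List[List[int]] = []
count = index_all(range(12_000), sent.append)
print(count, [len(b) for b in sent])
```

Because the documents are consumed lazily, only one batch is held in memory at a time, which matters when the source database is much larger than the index.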

 Hope that helps.


 Charlie

 

 -----Original Message-----
 From: Law, John [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, September 26, 2007 9:28 AM
 To: solr-user@lucene.apache.org
 Subject: dataset parameters suitable for lucene application

 I am new to the list and new to Lucene and Solr. I am considering
 Lucene for a potential new application and need to know how well it
 scales.

 Following are the parameters of the dataset.

 Number of records: 7+ million
 Database size: 13.3 GB
 Index size: 10.9 GB

 My questions are simply:

 1) Approximately how long would it take Lucene to index these documents?
 2) What would the approximate retrieval time be (i.e. search response
 time)?

 Can someone provide me with some informed guidance in this regard?

 Thanks in advance,
 John
