On Thursday April 5 2007 9:33 am, David Pratt wrote:
> I realize that the amount of RAM needed will be based on the size of the
> index, how many documents and what you are storing in the index itself -
> but some anecdotal information would be helpful. I am looking at an
> index that could reach 20-50 million documents. Will a commodity
> server with 2 GB be enough?

IIRC, RAM usage while indexing is more a function of how quickly you're adding 
data than of total index size, though that may not hold while segments are 
being merged (aka optimizing).  A fast disk helps quite a lot too.

You'll want to configure the IndexWriter for bulk loading.  The relevant items 
are setMergeFactor, which controls how often segments are merged on disk, and 
setMaxBufferedDocs, which controls how many docs are held in RAM before being 
written out.  Higher values for both will index faster, though be aware that 
an index built with a high merge factor is slower to query, so you'd probably 
want to optimize() at the end.  On our indexing server, with ~4 KB documents, 
setMaxBufferedDocs(200) uses about 700 MB of RAM.  See the Javadocs & Lucene 
in Action for more details.
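
For concreteness, here's a minimal bulk-loading sketch (assuming the 
PyLucene 2.x-era API; the index path, field name, and get_documents() 
iterator are placeholders, and the values are just the ones mentioned 
above, not a recommendation):

    from PyLucene import IndexWriter, StandardAnalyzer, Document, Field

    # Create a new index; the final True overwrites any existing index.
    writer = IndexWriter("/path/to/index", StandardAnalyzer(), True)
    writer.setMergeFactor(50)       # merge on-disk segments less often
    writer.setMaxBufferedDocs(200)  # ~700 MB of RAM at ~4 KB/doc (see above)

    for text in get_documents():    # hypothetical source of raw documents
        doc = Document()
        doc.add(Field("content", text, Field.Store.YES,
                      Field.Index.TOKENIZED))
        writer.addDocument(doc)

    writer.optimize()  # collapse segments so searches stay fast
    writer.close()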

On the searching front, a dedicated commodity box w/ 2 GB can probably serve 
around 2 million documents (again, depending on document size).  Multiple 
CPUs will let you serve more simultaneous queries.
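
The query side might look like this (same era-of-API caveats as above; 
note that a single IndexSearcher is thread-safe, so one shared instance 
can serve all your query threads):

    from PyLucene import IndexSearcher, QueryParser, StandardAnalyzer

    searcher = IndexSearcher("/path/to/index")  # share one instance
    query = QueryParser("content", StandardAnalyzer()).parse("some query")
    hits = searcher.search(query)
    print hits.length(), "matching documents"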

> I guess it is possible to build a test index with sample data to
> determine this also. Many thanks.

You should probably ask the Lucene list, but please report any test results 
here as well (you could put them on the wiki too).

-- 
Peter Fein   ||   773-575-0694   ||   [EMAIL PROTECTED]
http://www.pobox.com/~pfein/   ||   PGP: 0xCCF6AE6B
irc: [EMAIL PROTECTED]   ||   jabber: [EMAIL PROTECTED]