On Thursday, April 5, 2007, 9:33 am, David Pratt wrote:
> I realize that the amount of RAM needed will be based on the size of the
> index, how many documents, and what you are storing in the index itself -
> but some anecdotal information would be helpful. I am looking at an
> index that could reach 20-50 million documents. Will a commodity
> server with 2 GB be enough?
IIRC, it's more a function of how quickly you're adding data than of total index size, though this may not hold while merging segments (aka optimizing). A fast disk helps quite a lot too.

You'll want to configure the IndexWriter for bulk loading. The relevant settings are setMergeFactor(), which controls how often segments are merged on disk, and setMaxBufferedDocs(), which controls how many docs are held in RAM before being written out. Higher values for both will be faster, though be aware that an index built with a high merge factor is slower to query, so you'd probably want to call optimize() at the end. On our indexing server, with ~4 KB documents, setMaxBufferedDocs(200) uses about 700 MB of RAM. See the Javadocs & Lucene in Action for more details.

On the searching front, a dedicated commodity box with 2 GB can probably serve around 2 million documents (again, depending on document size). Multiple CPUs will let you serve more simultaneous queries.

> I guess it is possible to build a test index with sample data to
> determine this also. Many thanks.

You should probably ask the Lucene list, but please report any test results here as well (you could put them on the wiki too).

-- 
Peter Fein || 773-575-0694 || [EMAIL PROTECTED]
http://www.pobox.com/~pfein/ || PGP: 0xCCF6AE6B
irc: [EMAIL PROTECTED] || jabber: [EMAIL PROTECTED]

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
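The mergeFactor/maxBufferedDocs tradeoff described in the reply can be sketched with a small stdlib-only simulation. This is a toy model, not Lucene's actual merge policy (real Lucene merges segments by size, and `simulate_bulk_load` is a name invented here for illustration): every `max_buffered_docs` documents, the RAM buffer is flushed to a new on-disk segment, and whenever `merge_factor` segments accumulate at one "level" they are merged into a single segment one level up.

```python
# Toy model of logarithmic segment merging during a bulk load.
# Shows the tradeoff: a bigger merge_factor means fewer merge
# operations during indexing (faster build), but it changes how
# segments accumulate -- hence the advice to optimize() at the end.

def simulate_bulk_load(num_docs, max_buffered_docs, merge_factor):
    """Return (flushes, merges, final_segments) for a bulk load."""
    levels = {}              # level -> segment count at that level
    flushes = merges = 0
    for _ in range(0, num_docs, max_buffered_docs):
        flushes += 1         # RAM buffer written out as a segment
        levels[0] = levels.get(0, 0) + 1
        level = 0
        # Cascade: merge_factor same-level segments merge into one
        # segment at the next level up.
        while levels.get(level, 0) >= merge_factor:
            levels[level] -= merge_factor
            levels[level + 1] = levels.get(level + 1, 0) + 1
            merges += 1
            level += 1
    return flushes, merges, sum(levels.values())

# 10 million docs, flushing every 200 (as in the 700 MB figure above):
print(simulate_bulk_load(10_000_000, 200, 10))   # default-ish mergeFactor
print(simulate_bulk_load(10_000_000, 200, 100))  # bulk-load mergeFactor
```

Raising merge_factor from 10 to 100 here cuts the number of merge operations by roughly 10x for the same document count. Separately, at the figures quoted above (roughly 2 million documents per 2 GB search box), a 20-50 million document index suggests on the order of 10-25 search machines, or correspondingly more RAM per box.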
