Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: A colleague of mine found the fastest way to index was to use a RAMDirectory, letting it grow to a pre-defined maximum size, then merging it into a new temporary file-based index to flush it. Repeat this, creating new directories for all the file-based indexes, then perform
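A minimal sketch of that batching approach, against the Lucene 1.4-era API (RAMDirectory, FSDirectory, IndexWriter.addIndexes). The buffer size, directory paths, and the documents iterator are illustrative assumptions, not taken from the post:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class BatchedRamIndexer {

        static final int MAX_BUFFERED_DOCS = 50000; // the "pre-defined maximum size" -- the value is a guess

        // 'documents' is a hypothetical source of Document objects.
        static void indexAll(Iterator documents) throws Exception {
            Analyzer analyzer = new StandardAnalyzer();
            List batchDirs = new ArrayList();        // one temporary file-based index per flush
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
            int batch = 0;

            while (documents.hasNext()) {
                ramWriter.addDocument((Document) documents.next());
                if (ramWriter.docCount() >= MAX_BUFFERED_DOCS) {
                    ramWriter.close();
                    Directory diskDir = FSDirectory.getDirectory("/tmp/batch-" + batch++, true);
                    IndexWriter diskWriter = new IndexWriter(diskDir, analyzer, true);
                    diskWriter.addIndexes(new Directory[] { ramDir }); // flush the RAM index to disk
                    diskWriter.close();
                    batchDirs.add(diskDir);
                    ramDir = new RAMDirectory();                       // start a fresh RAM buffer
                    ramWriter = new IndexWriter(ramDir, analyzer, true);
                }
            }
            ramWriter.close();
            batchDirs.add(ramDir);                                     // whatever is left in the last buffer

            // Final step: merge all the temporary indexes into one target index in a single call.
            IndexWriter finalWriter =
                new IndexWriter(FSDirectory.getDirectory("/tmp/final-index", true), analyzer, true);
            finalWriter.addIndexes((Directory[]) batchDirs.toArray(new Directory[batchDirs.size()]));
            finalWriter.close();
        }
    }

In the 1.4-era API, addIndexes() optimizes the target index as part of the merge, so no separate optimize() pass should be needed at the end.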

Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Harald Kirsch
On Tue, Jul 06, 2004 at 10:44:40PM -0700, Kevin A. Burton wrote: I'm trying to burn an index of 14M documents. I have two problems. 1. I have to run optimize() every 50k documents or I run out of file handles. This takes TIME and is, of course, linear in the size of the index, so it just

Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Doug Cutting
A mergeFactor of 5000 is a bad idea. If you want to index faster, try increasing minMergeDocs instead; if you have lots of memory, this can probably be 5000 or higher. Also, why do you optimize before you're done? That only slows things down. Perhaps you have to do it because you've set
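For reference, a small sketch of that tuning against the Lucene 1.4-era API, where mergeFactor and minMergeDocs are public fields on IndexWriter; the index path and the values are only illustrative:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class TunedIndexing {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
            writer.mergeFactor = 10;     // near the default; very large values keep too many segment files open
            writer.minMergeDocs = 5000;  // buffer more documents in RAM before a segment is written to disk
            // ... addDocument() calls go here ...
            writer.optimize();           // optimize once, after all documents have been added
            writer.close();
        }
    }

Raising minMergeDocs trades RAM for speed without multiplying the number of on-disk segments the way a huge mergeFactor does.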

Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Julien Nioche
- Original Message - From: Kevin A. Burton [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, July 07, 2004 7:44 AM Subject: Most efficient way to index 14M documents (out of memory/file handles) I'm trying to burn an index of 14M documents. I have two problems. 1

Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Doug Cutting
[EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, July 07, 2004 7:44 AM Subject: Most efficient way to index 14M documents (out of memory/file handles) I'm trying to burn an index of 14M documents. I have two problems. 1. I have to run optimize() every 50k documents or I run out

Most efficient way to index 14M documents (out of memory/file handles)

2004-07-06 Thread Kevin A. Burton
I'm trying to burn an index of 14M documents. I have two problems. 1. I have to run optimize() every 50k documents or I run out of file handles. This takes TIME and is, of course, linear in the size of the index, so it just gets slower as the index grows. It starts to crawl at about 3M
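For context, a minimal sketch of the indexing loop as described, using the Lucene 1.4-era API; the field name and document content are placeholders. Each optimize() rewrites the whole index into a single segment, which is why the periodic calls get more expensive as the index grows:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BurnIndex {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
            for (int i = 0; i < 14000000; i++) {
                Document doc = new Document();
                doc.add(Field.Text("body", "document body " + i)); // placeholder content
                writer.addDocument(doc);
                if (i > 0 && i % 50000 == 0) {
                    writer.optimize(); // keeps the open-file count down, but rewrites the whole index each time
                }
            }
            writer.optimize();
            writer.close();
        }
    }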