[EMAIL PROTECTED] wrote:
A colleague of mine found the fastest way to index was to use a RAMDirectory, letting it grow
to a pre-defined maximum size, then merging it into a new temporary file-based index to
flush it. Repeat this, creating new directories for all the file-based indexes, then perform
a final merge of all of them into a single index.
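
Roughly, that approach might look like the sketch below. This is a hypothetical
illustration against the Lucene 1.4-era API, not anyone's actual code: the paths,
the DOCS_PER_FLUSH threshold, and the Iterable document source are all made up,
and it flushes by document count rather than by byte size for simplicity.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class RamBatchIndexer {

        // Assumption: flush by document count; the post describes flushing
        // when the RAMDirectory reaches a pre-defined size.
        private static final int DOCS_PER_FLUSH = 50000;

        public static void index(Iterable<Document> docs) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();
            int batch = 0;
            int inBatch = 0;
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);

            for (Document doc : docs) {
                ramWriter.addDocument(doc);          // fast: writes to RAM only
                if (++inBatch >= DOCS_PER_FLUSH) {
                    ramWriter.close();
                    flushToDisk(ramDir, analyzer, "/tmp/index-part-" + batch++);
                    ramDir = new RAMDirectory();     // start a fresh in-memory batch
                    ramWriter = new IndexWriter(ramDir, analyzer, true);
                    inBatch = 0;
                }
            }
            ramWriter.close();
            if (inBatch > 0) {
                flushToDisk(ramDir, analyzer, "/tmp/index-part-" + batch++);
            }

            // Final step: merge all the file-based partial indexes into one.
            Directory[] parts = new Directory[batch];
            for (int i = 0; i < batch; i++) {
                parts[i] = FSDirectory.getDirectory("/tmp/index-part-" + i, false);
            }
            IndexWriter finalWriter = new IndexWriter(
                    FSDirectory.getDirectory("/tmp/index-final", true), analyzer, true);
            finalWriter.addIndexes(parts);           // merges the partial indexes
            finalWriter.close();
        }

        // Merge one RAMDirectory batch into its own new on-disk index.
        private static void flushToDisk(Directory ramDir, StandardAnalyzer analyzer,
                String path) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory(path, true), analyzer, true);
            writer.addIndexes(new Directory[] { ramDir });
            writer.close();
        }
    }

The win is that addDocument() runs entirely against RAM, and disk I/O happens as
a few large sequential merges instead of many small segment writes.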
On Tue, Jul 06, 2004 at 10:44:40PM -0700, Kevin A. Burton wrote:
I'm trying to burn an index of 14M documents.
I have two problems.
1. I have to run optimize() every 50k documents or I run out of file
handles. This takes TIME and of course is linear to the size of the
index, so it just gets slower by the time I complete. It starts to crawl
at about 3M documents.
A mergeFactor of 5000 is a bad idea. If you want to index faster, try
increasing minMergeDocs instead. If you have lots of memory this can
probably be 5000 or higher.
Also, why do you optimize before you're done? That only slows things
down. Perhaps you have to do it because you've set mergeFactor so high.
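
For context, in the Lucene 1.4-era API mergeFactor and minMergeDocs were public
fields on IndexWriter, so the advice above amounts to something like this sketch
(path and values are illustrative only):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class TunedWriter {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/tmp/index", true),
                    new StandardAnalyzer(), true);
            // Keep mergeFactor modest: it multiplies the number of segments,
            // and therefore files, held open at once. 5000 exhausts handles.
            writer.mergeFactor = 10;
            // Instead, buffer more documents in RAM before a segment hits
            // disk; this is the knob to raise when memory is plentiful.
            writer.minMergeDocs = 5000;
            // ... addDocument() calls go here ...
            writer.close();
        }
    }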
- Original Message -
From: Kevin A. Burton [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 7:44 AM
Subject: Most efficient way to index 14M documents (out of memory/file
handles)
I'm trying to burn an index of 14M documents.
I have two problems.
1. I have to run optimize() every 50k documents or I run out of file
handles. This takes TIME and of course is linear to the size of the
index, so it just gets slower by the time I complete. It starts to crawl
at about 3M documents.
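
For clarity, the workaround being described is roughly the pattern below (a
hypothetical sketch; the document source is invented). optimize() collapses all
segments into one, which frees file handles but rewrites the entire index,
which is why each call takes longer as the index grows:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class PeriodicOptimize {
        public static void index(Iterable<Document> docs) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/tmp/index", true),
                    new StandardAnalyzer(), true);
            int count = 0;
            for (Document doc : docs) {
                writer.addDocument(doc);
                // Collapse all segments every 50k docs so open file handles
                // don't accumulate. Each optimize() rewrites the whole index,
                // so the cost grows linearly with index size.
                if (++count % 50000 == 0) {
                    writer.optimize();
                }
            }
            writer.optimize();
            writer.close();
        }
    }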