From: [EMAIL PROTECTED]
Sent: 15 December 2004 18:43
To: Lucene Users List
Subject: RE: Indexing a large number of DB records
Hello Homam,
The batches I was referring to were batches of DB rows.
Instead of SELECT * FROM table ... in one shot, do SELECT * FROM table ...
LIMIT Y OFFSET X (the exact syntax depends on your database) and loop over
the pages; a rough sketch follows below.
Don't close the IndexWriter between batches - keep using the single instance.
There is no MakeStable()-like method in Lucene, but you can control the
number of in-memory Documents that are buffered before being flushed to
disk (in the 1.4 API, the IndexWriter.minMergeDocs setting).
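To make that concrete, here is a minimal sketch of the paging loop in Java
against the Lucene 1.4-era API. The table and column names (articles, id,
body) and the LIMIT/OFFSET syntax are illustrative only; Homam's
SqlDataReader world would use the Lucene.Net equivalents.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class RowBatches {
    // Fetch one batch of rows and feed them to the one shared IndexWriter.
    // Table/column names are made up; LIMIT/OFFSET syntax varies by database.
    static int indexBatch(Connection conn, IndexWriter writer,
                          int offset, int limit) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
            "SELECT id, body FROM articles ORDER BY id LIMIT ? OFFSET ?");
        ps.setInt(1, limit);
        ps.setInt(2, offset);
        ResultSet rs = ps.executeQuery();
        int rows = 0;
        while (rs.next()) {
            Document doc = new Document();
            doc.add(Field.Keyword("id", rs.getString("id")));  // stored, not tokenized
            doc.add(Field.Text("body", rs.getString("body"))); // tokenized full text
            writer.addDocument(doc);
            rows++;
        }
        rs.close();
        ps.close();
        return rows; // fewer than 'limit' means we hit the last page
    }
}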
Hi Homam,
I had a similar problem to yours, in that I was indexing a lot of data.
Essentially, how I got round it was to batch the indexing: I would add
10,000 documents to a temporary index, then use addIndexes() to merge the
temporary index into the live index (which also optimizes the live index),
and repeat.
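For reference, that temp-index pattern looks roughly like this in the
Lucene 1.x API (the index path and batch plumbing are made up for
illustration); as the follow-up below explains, these extra steps turn out
to be unnecessary.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class TempIndexMerge {
    static void mergeBatch(Document[] batch) throws Exception {
        // Build a throwaway index in RAM for this batch.
        Directory temp = new RAMDirectory();
        IndexWriter tempWriter = new IndexWriter(temp, new StandardAnalyzer(), true);
        for (int i = 0; i < batch.length; i++) {
            tempWriter.addDocument(batch[i]);
        }
        tempWriter.close();

        // Merge it into the live index; addIndexes() also optimizes.
        IndexWriter live = new IndexWriter("/indexes/live", new StandardAnalyzer(), false);
        live.addIndexes(new Directory[] { temp });
        live.close();
    }
}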
Note that this approach really includes some extra steps.
You don't need a temp index. Add everything to a single index using a
single IndexWriter instance. There is no need to call addIndexes() or
optimize() until the very end. Adding Documents to an index takes a
constant amount of time, regardless of the index size.
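Put together, the simplified flow might look like the sketch below: one
IndexWriter for the whole run, batches fed to it directly (reusing the
hypothetical indexBatch helper sketched earlier in the thread), and a
single optimize() at the end. The driver class, connection URL, and index
path are again made up.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class SingleWriterRun {
    public static void main(String[] args) throws Exception {
        // One writer, opened once for the whole run (true = create a new index).
        IndexWriter writer = new IndexWriter("/indexes/live", new StandardAnalyzer(), true);
        writer.minMergeDocs = 1000; // buffer more Documents in RAM before each flush

        Class.forName("com.mysql.jdbc.Driver"); // hypothetical JDBC driver
        java.sql.Connection conn =
            java.sql.DriverManager.getConnection("jdbc:mysql://localhost/mydb");
        int batch = 1000;
        for (int offset = 0; ; offset += batch) {
            // indexBatch is the paging helper sketched earlier
            if (RowBatches.indexBatch(conn, writer, offset, batch) < batch) break;
        }
        conn.close();

        writer.optimize(); // once, at the very end
        writer.close();
    }
}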
Hello,
There are a few things you can do:
1) Don't just pull all rows from the DB at once. Do that in batches.
2) If you can get a Reader from your SqlDataReader, consider this:
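The suggestion being pointed at here is presumably Field.Text(String name,
Reader value), which accepts a Reader so the column value is consumed as a
stream during analysis rather than held in memory as one large String (a
Reader-valued field is indexed and tokenized but not stored). A small
sketch, in Java:

import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class StreamedField {
    static Document docFromRow(String id, Reader bodyReader) {
        Document doc = new Document();
        doc.add(Field.Keyword("id", id));
        // The Reader is consumed during analysis; the field is indexed
        // and tokenized but not stored, so the full text never has to
        // sit in memory as a single String.
        doc.add(Field.Text("body", bodyReader));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = docFromRow("42", new StringReader("large column value ..."));
        System.out.println(doc);
    }
}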
Thanks Otis!
What do you mean by building it in batches? Does it
mean I should close the IndexWriter every 1,000 rows
and reopen it? Does that release the references to the
document objects so that they can be
garbage-collected?
I'm calling optimize() only at the end.
I agree that 1500 documents is