RE: Indexing a large number of DB records

2004-12-16 Thread Garrett Heaver
Note that this really includes some extra steps. You don't need a temp index. Add everything to a single index using a single IndexWriter instance. No need to call addIndexes

Re: Indexing a large number of DB records

2004-12-15 Thread Otis Gospodnetic
Hello Homam, The batches I was referring to were batches of DB rows. Instead of SELECT * FROM table ..., do SELECT * FROM table ... LIMIT Y OFFSET X. Don't close the IndexWriter - keep using the single instance. There is no MakeStable()-like method in Lucene, but you can control the number of in-memory
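
In code, the pattern Otis describes looks roughly like this - a sketch only, assuming a JDBC source with MySQL-style LIMIT/OFFSET paging and the Lucene 1.4-era API; the connection URL, table, and column names are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Batched DB indexing through one long-lived IndexWriter.
    public class BatchedDbIndexer {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
            Statement stmt = conn.createStatement();

            // Open the writer once; reuse it for every batch and close it only at the very end.
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
            writer.mergeFactor = 10;       // how many segments get merged at a time on disk
            writer.minMergeDocs = 1000;    // how many documents are buffered in RAM before a flush (1.4-era knob)

            int batchSize = 1000;
            for (int offset = 0; ; offset += batchSize) {
                ResultSet rs = stmt.executeQuery(
                    "SELECT id, body FROM docs LIMIT " + batchSize + " OFFSET " + offset);
                int rows = 0;
                while (rs.next()) {
                    Document doc = new Document();
                    doc.add(Field.Keyword("id", rs.getString("id")));
                    doc.add(Field.Text("body", rs.getString("body")));
                    writer.addDocument(doc);
                    rows++;
                }
                rs.close();
                if (rows < batchSize) break;   // last page of rows
            }

            writer.optimize();   // one optimize, only after everything has been added
            writer.close();
            stmt.close();
            conn.close();
        }
    }

Only the result set is opened and closed per batch; the IndexWriter stays open for the whole run.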

RE: Indexing a large number of DB records

2004-12-15 Thread Garrett Heaver
Hi Homam, I had a similar problem to yours in that I was indexing A LOT of data. Essentially how I got round it was to batch the indexing. What I was doing was to add 10,000 documents to a temporary index, then use addIndexes() to merge the temporary index into the live index (which also optimizes the live
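
A sketch of the temp-index approach Garrett describes, assuming the Lucene 1.4-era API; the result set and the row-to-Document helper are hypothetical stand-ins for his actual code:

    // Build a batch of 10,000 documents in a temporary RAM index...
    // (RAMDirectory, Directory and IndexWriter come from org.apache.lucene.store / .index)
    RAMDirectory tempDir = new RAMDirectory();
    IndexWriter tempWriter = new IndexWriter(tempDir, new StandardAnalyzer(), true);
    for (int i = 0; i < 10000 && rs.next(); i++) {   // rs: an open java.sql.ResultSet over the next rows
        tempWriter.addDocument(documentFromRow(rs)); // hypothetical row-to-Document helper
    }
    tempWriter.close();

    // ...then merge it into the live index; addIndexes() also optimizes the live index.
    IndexWriter liveWriter = new IndexWriter("/path/to/live-index", new StandardAnalyzer(), false);
    liveWriter.addIndexes(new Directory[] { tempDir });
    liveWriter.close();

As Otis points out in his follow-up, the temporary index and the addIndexes() call are extra steps: adding everything through one IndexWriter and optimizing once at the end gives the same result with fewer moving parts.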

RE: Indexing a large number of DB records

2004-12-15 Thread Otis Gospodnetic
Note that this really includes some extra steps. You don't need a temp index. Add everything to a single index using a single IndexWriter instance. No need to call addIndexes nor optimize until the end. Adding Documents to an index takes a constant amount of time, regardless of the index size,
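
Reduced to a skeleton, the recommended shape is simply the following (a sketch; moreRows() and nextDocument() are hypothetical helpers standing in for whatever batched DB loop feeds the writer):

    // One writer for the whole run; documents go straight in, one optimize at the end.
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    while (moreRows()) {                     // page through the DB however you like
        writer.addDocument(nextDocument());  // per-document cost does not grow with index size
    }
    writer.optimize();                       // no temp index, no addIndexes()
    writer.close();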

Re: Indexing a large number of DB records

2004-12-14 Thread Otis Gospodnetic
Hello, There are a few things you can do: 1) Don't just pull all rows from the DB at once. Do that in batches. 2) If you can get a Reader from your SqlDataReader, consider this:
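
The preview is cut off at "consider this:", but the usual trick being hinted at is Lucene's Reader-accepting field, which streams the column into the analyzer instead of materializing it as a String. A sketch against the Java API, reusing the writer and result set from the batching sketch above (Homam is on .NET, so the SqlDataReader side would need to expose an equivalent character stream):

    // A field built from a Reader is indexed and tokenized, but not stored.
    java.io.Reader body = rs.getCharacterStream("body");   // rs: an open java.sql.ResultSet
    Document doc = new Document();
    doc.add(Field.Keyword("id", rs.getString("id")));
    doc.add(Field.Text("contents", body));
    writer.addDocument(doc);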

Re: Indexing a large number of DB records

2004-12-14 Thread Homam S.A.
Thanks Otis! What do you mean by building it in batches? Does it mean I should close the IndexWriter every 1000 rows and reopen it? Does that release references to the document objects so that they can be garbage-collected? I'm calling optimize() only at the end. I agree that 1500 documents is