I'm trying with batchSize=-1 now. So far it seems to be working, but very slowly. I will update when it completes or crashes.
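For anyone following along, batchSize is set on the <dataSource> element in the DIH data-config.xml. Below is a minimal sketch; the connection URL, database name, credentials, and entity query are placeholders, and selectMethod=cursor is an assumption worth verifying against your driver version:

```xml
<dataConfig>
  <!-- batchSize="-1" asks DIH to request row streaming from the driver
       instead of buffering a batch of rows in memory. With the Microsoft
       sqljdbc driver, adding selectMethod=cursor to the connection URL may
       also be needed for true server-side cursoring (assumption - check
       your driver's documentation). -->
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=mydb;selectMethod=cursor"
              user="solr"
              password="secret"
              batchSize="-1"/>
  <document>
    <entity name="table1" query="select field1, field2 from table1">
      <!-- field mappings elided -->
    </entity>
  </document>
</dataConfig>
```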
Even with a batchSize of 100 I was running out of memory. I'm running on a
32-bit Windows machine. I've set -Xmx to 1.5 GB - I believe that's the
maximum for my environment. The batchSize parameter doesn't seem to control
what happens: when I select top 5,000,000 with a batchSize of 10,000, it
works; when I select top 10,000,000 with the same batchSize, it runs out of
memory. Also, I'm using the SOLR-469 patch posted on 2008-06-11 08:41 AM.

Noble Paul നോബിള് नोब्ळ् wrote:
>
> DIH streams rows one by one.
> Set fetchSize="-1"; this might help. It may make the indexing a bit
> slower, but memory consumption would be low.
> The memory is consumed by the JDBC driver. Try tuning the -Xmx value for
> the VM.
> --Noble
>
> On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
> <[EMAIL PROTECTED]> wrote:
>> Setting the batchSize to 10000 would mean that the JDBC driver will keep
>> 10000 rows in memory *for each entity* which uses that data source (if
>> correctly implemented by the driver). Not sure how well the SQL Server
>> driver implements this. Also keep in mind that Solr also needs memory to
>> index documents. You can probably try setting the batch size to a lower
>> value.
>>
>> The regular memory tuning stuff should apply here too -- try disabling
>> autoCommit, turn off autowarming, and see if it helps.
>>
>> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <[EMAIL PROTECTED]> wrote:
>>
>>> I'm trying to load ~10 million records into Solr using the
>>> DataImportHandler. I'm running out of memory
>>> (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading
>>> more than about 5 million records.
>>>
>>> Here's my configuration: I'm connecting to a SQL Server database using
>>> the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I
>>> have set the dataSource batchSize to 10000. My SQL query is
>>> "select top XXX field1, ... from table1". I have about 40 fields in my
>>> Solr schema.
>>>
>>> I thought the DataImportHandler would stream data from the DB rather
>>> than loading it all into memory at once. Is that not the case? Any
>>> thoughts on how to get around this (aside from getting a machine with
>>> more memory)?
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>
> --
> --Noble Paul

--
View this message in context:
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html
Sent from the Solr - User mailing list archive at Nabble.com.