Hi, I don't think the problem is within DataImportHandler, since it just streams the result set. The fetchSize is simply passed to Statement#setFetchSize(), and the JDBC driver is supposed to honor it and keep only that many rows in memory at a time.

From what I could find about the SQL Server driver, there is a connection property called responseBuffering whose default value is "full", which causes the entire result set to be fetched into memory. Setting it to "adaptive" should make the driver buffer only as much of the response as it needs. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for more details. You can set connection properties like this directly in the JDBC URL specified in DataImportHandler's dataSource configuration.
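To illustrate, here is a rough, untested sketch of the JDBC calls this boils down to -- this is not DIH's actual code, and the host, port, database name, credentials and query are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlServerStreamingTest {
    public static void main(String[] args) throws Exception {
        // Load the Microsoft driver (needed on pre-JDBC-4 JVMs).
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");

        // responseBuffering=adaptive asks the driver to buffer only as
        // much of the response as it needs instead of the whole result
        // set. Host, port, database and credentials are placeholders.
        String url = "jdbc:sqlserver://localhost:1433;"
                + "databaseName=mydb;responseBuffering=adaptive;";
        Connection conn = DriverManager.getConnection(url, "user", "password");

        Statement stmt = conn.createStatement();
        // DIH passes the dataSource's batchSize here; it is only a hint
        // and the driver is free to ignore it.
        stmt.setFetchSize(10000);

        ResultSet rs = stmt.executeQuery("select top 100 field1 from table1");
        while (rs.next()) {
            // Rows should now arrive without the driver holding the
            // entire result set in memory at once.
            System.out.println(rs.getString("field1"));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}

In your DIH configuration that just means appending ;responseBuffering=adaptive; to the url attribute of the dataSource element, alongside your existing batchSize -- no code changes needed.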
On Wed, Jun 25, 2008 at 10:17 PM, wojtekpia <[EMAIL PROTECTED]> wrote:
>
> I'm trying with batchSize=-1 now. So far it seems to be working, but very
> slowly. I will update when it completes or crashes.
>
> Even with a batchSize of 100 I was running out of memory.
>
> I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I
> believe that's the maximum for my environment.
>
> The batchSize parameter doesn't seem to control what happens... when I
> select top 5,000,000 with a batchSize of 10,000, it works. When I select
> top 10,000,000 with the same batchSize, it runs out of memory.
>
> Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM.
>
>
> Noble Paul നോബിള് नोब्ळ् wrote:
> >
> > DIH streams rows one by one.
> > Set fetchSize="-1"; this might help. It may make the indexing a bit
> > slower but memory consumption would be low.
> > The memory is consumed by the JDBC driver. Try tuning the -Xmx value
> > for the VM.
> > --Noble
> >
> > On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
> > <[EMAIL PROTECTED]> wrote:
> >> Setting the batchSize to 10000 would mean that the JDBC driver will
> >> keep 10000 rows in memory *for each entity* which uses that data
> >> source (if correctly implemented by the driver). Not sure how well
> >> the SQL Server driver implements this. Also keep in mind that Solr
> >> itself needs memory to index documents. You can probably try setting
> >> the batch size to a lower value.
> >>
> >> The regular memory tuning stuff should apply here too -- try
> >> disabling autoCommit and turning off autowarming and see if it helps.
> >>
> >> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <[EMAIL PROTECTED]> wrote:
> >>
> >>> I'm trying to load ~10 million records into Solr using the
> >>> DataImportHandler. I'm running out of memory
> >>> (java.lang.OutOfMemoryError: Java heap space) as soon as I try
> >>> loading more than about 5 million records.
> >>>
> >>> Here's my configuration: I'm connecting to a SQL Server database
> >>> using the sqljdbc driver. I've given my Solr instance 1.5 GB of
> >>> memory. I have set the dataSource batchSize to 10000. My SQL query
> >>> is "select top XXX field1, ... from table1". I have about 40 fields
> >>> in my Solr schema.
> >>>
> >>> I thought the DataImportHandler would stream data from the DB rather
> >>> than loading it all into memory at once. Is that not the case? Any
> >>> thoughts on how to get around this (aside from getting a machine
> >>> with more memory)?
> >>>
> >>> --
> >>> View this message in context:
> >>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >
> > --
> > --Noble Paul
>
> --
> View this message in context:
> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shalin Shekhar Mangar.