Hi, I don't think the problem is within DataImportHandler, since it just streams the result set. The fetchSize is simply passed to Statement#setFetchSize(), and the JDBC driver is supposed to honor it and keep only that many rows in memory at a time.

From what I could find about the SQL Server driver, there is a connection property called responseBuffering whose default value is "full", which causes the entire result set to be fetched into memory. Setting it to "adaptive" should make the driver buffer only as much of the response as it needs. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for more details. You can set connection properties like this directly in the JDBC URL specified in DataImportHandler's dataSource configuration.
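To illustrate, here is a rough, untested sketch of the JDBC calls this boils down to -- this is not DIH's actual code, and the host, port, database name, credentials and query are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlServerStreamingTest {
    public static void main(String[] args) throws Exception {
        // Load the Microsoft driver (needed on pre-JDBC-4 JVMs).
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");

        // responseBuffering=adaptive asks the driver to buffer only as
        // much of the response as it needs instead of the whole result
        // set. Host, port, database and credentials are placeholders.
        String url = "jdbc:sqlserver://localhost:1433;"
                + "databaseName=mydb;responseBuffering=adaptive;";
        Connection conn = DriverManager.getConnection(url, "user", "password");

        Statement stmt = conn.createStatement();
        // DIH passes the dataSource's batchSize here; it is only a hint
        // and the driver is free to ignore it.
        stmt.setFetchSize(10000);

        ResultSet rs = stmt.executeQuery("select top 100 field1 from table1");
        while (rs.next()) {
            // Rows should now arrive without the driver holding the
            // entire result set in memory at once.
            System.out.println(rs.getString("field1"));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}

In your DIH configuration that just means appending ;responseBuffering=adaptive; to the url attribute of the dataSource element, alongside your existing batchSize -- no code changes needed.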
On Wed, Jun 25, 2008 at 10:17 PM, wojtekpia <[EMAIL PROTECTED]> wrote:
>
> I'm trying with batchSize=-1 now. So far it seems to be working, but very
> slowly. I will update when it completes or crashes.
>
> Even with a batchSize of 100 I was running out of memory.
>
> I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I
> believe that's the maximum for my environment.
>
> The batchSize parameter doesn't seem to control what happens... when I
> select top 5,000,000 with a batchSize of 10,000, it works. When I select
> top 10,000,000 with the same batchSize, it runs out of memory.
>
> Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM.
>
>
> Noble Paul നോബിള് नोब्ळ् wrote:
> >
> > DIH streams rows one by one.
> > Set fetchSize="-1"; this might help. It may make the indexing a bit
> > slower but memory consumption would be low.
> > The memory is consumed by the JDBC driver. Try tuning the -Xmx value
> > for the VM.
> > --Noble
> >
> > On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
> > <[EMAIL PROTECTED]> wrote:
> >> Setting the batchSize to 10000 would mean that the JDBC driver will
> >> keep 10000 rows in memory *for each entity* which uses that data
> >> source (if correctly implemented by the driver). Not sure how well
> >> the SQL Server driver implements this. Also keep in mind that Solr
> >> itself needs memory to index documents. You can probably try setting
> >> the batch size to a lower value.
> >>
> >> The regular memory tuning stuff should apply here too -- try
> >> disabling autoCommit and turning off autowarming and see if it helps.
> >>
> >> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <[EMAIL PROTECTED]> wrote:
> >>
> >>> I'm trying to load ~10 million records into Solr using the
> >>> DataImportHandler. I'm running out of memory
> >>> (java.lang.OutOfMemoryError: Java heap space) as soon as I try
> >>> loading more than about 5 million records.
> >>>
> >>> Here's my configuration: I'm connecting to a SQL Server database
> >>> using the sqljdbc driver. I've given my Solr instance 1.5 GB of
> >>> memory. I have set the dataSource batchSize to 10000. My SQL query
> >>> is "select top XXX field1, ... from table1". I have about 40 fields
> >>> in my Solr schema.
> >>>
> >>> I thought the DataImportHandler would stream data from the DB rather
> >>> than loading it all into memory at once. Is that not the case? Any
> >>> thoughts on how to get around this (aside from getting a machine
> >>> with more memory)?
> >>>
> >>> --
> >>> View this message in context:
> >>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >
> > --
> > --Noble Paul
>
> --
> View this message in context:
> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shalin Shekhar Mangar.