The OP is actually using SQL Server (not MySQL), per his mail.

On Wed, Jun 25, 2008 at 4:40 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:

> I'm assuming, of course, that the DIH doesn't automatically modify the SQL
> statement according to the batch size.
>
> -Grant
>
>
> On Jun 25, 2008, at 7:05 AM, Grant Ingersoll wrote:
>
>  I think it's a bit different.  I ran into this exact problem about two
>> weeks ago on a 13 million record DB. MySQL's v5 JDBC driver doesn't honor
>> the fetch size.
>>
>> See
>> http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or
>> do a search for MySQL fetch size.
>>
>> You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work)
>> in order to get streaming in MySQL.
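>>
>> For reference, here's roughly what that looks like in plain JDBC -- a
>> minimal sketch, with the connection URL, credentials, and query as
>> placeholders:
>>
>> import java.sql.*;
>>
>> public class StreamingRead {
>>     public static void main(String[] args) throws Exception {
>>         // Placeholder URL/credentials -- substitute your own
>>         Connection conn = DriverManager.getConnection(
>>                 "jdbc:mysql://localhost/mydb", "user", "pass");
>>         // Connector/J only streams results from a forward-only,
>>         // read-only statement with fetch size Integer.MIN_VALUE
>>         Statement stmt = conn.createStatement(
>>                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
>>         stmt.setFetchSize(Integer.MIN_VALUE);
>>         ResultSet rs = stmt.executeQuery("select field1 from table1");
>>         while (rs.next()) {
>>             System.out.println(rs.getString(1)); // handle one row at a time
>>         }
>>         rs.close();
>>         stmt.close();
>>         conn.close();
>>     }
>> }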
>>
>> -Grant
>>
>>
>> On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:
>>
>>  Setting the batchSize to 10000 would mean that the Jdbc driver will keep
>>> 10000 rows in memory *for each entity* which uses that data source (if
>>> correctly implemented by the driver). Not sure how well the Sql Server
>>> driver implements this. Also keep in mind that Solr also needs memory to
>>> index documents. You can probably try setting the batch size to a lower
>>> value.
>>>
>>> The regular memory tuning stuff should apply here too -- try disabling
>>> autoCommit and turning off autowarming, and see if that helps.
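>>>
>>> For illustration, the batchSize is set on the dataSource element in
>>> data-config.xml -- a sketch, with the driver class, URL, and credentials
>>> as placeholders for a SQL Server setup:
>>>
>>> <dataSource type="JdbcDataSource"
>>>             driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>>>             url="jdbc:sqlserver://localhost;databaseName=mydb"
>>>             user="user" password="pass"
>>>             batchSize="500"/>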
>>>
>>> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>> I'm trying to load ~10 million records into Solr using the
>>>> DataImportHandler. I'm running out of memory
>>>> (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading
>>>> more than about 5 million records.
>>>>
>>>> Here's my configuration: I'm connecting to a SQL Server database using
>>>> the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have
>>>> set the dataSource batchSize to 10000. My SQL query is "select top XXX
>>>> field1, ... from table1". I have about 40 fields in my Solr schema.
>>>>
>>>> I thought the DataImportHandler would stream data from the DB rather
>>>> than loading it all into memory at once. Is that not the case? Any
>>>> thoughts on how to get around this (aside from getting a machine with
>>>> more memory)?
>>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ


-- 
Regards,
Shalin Shekhar Mangar.
