On 4/18/2014 6:15 PM, Candygram For Mongo wrote:
> We are getting Out Of Memory errors when we try to execute a full import
> using the Data Import Handler. This error originally occurred on a
> production environment with a database containing 27 million records. Heap
> memory was configured for 6GB and the server had 32GB of physical memory.
> We have been able to replicate the error on a local system with 6 million
> records. We set the memory heap size to 64MB to accelerate the error
> replication. The indexing process has been failing in different scenarios.
> We have 9 test cases documented. In some of the test cases we increased
> the heap size to 128MB. In our first test case we set heap memory to 512MB
> which also failed.
One characteristic of a JDBC connection is that unless you tell it
otherwise, it will try to retrieve the entire resultset into RAM before
any results are delivered to the application. It's not Solr doing this,
it's JDBC. In this case, there are 27 million rows in the resultset.
It's highly unlikely that this much data (along with the rest of Solr's
memory requirements) will fit in 6GB of heap.

JDBC has a built-in way to deal with this: fetchSize. By using the
batchSize parameter on your JdbcDataSource config, you can set the JDBC
fetchSize. Set it to something small, between 100 and 1000, and you'll
probably get rid of the OOM problem.

http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource

If you had been using MySQL, I would have recommended that you set
batchSize to -1. This sets fetchSize to Integer.MIN_VALUE, which tells
the MySQL driver to stream results instead of trying to either batch
them or return everything. I'm pretty sure the Oracle driver doesn't
work this way -- you would have to modify the dataimport source code to
use its streaming method.

Thanks,
Shawn
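
P.S. For reference, batchSize goes on the dataSource element in your DIH
config file (data-config.xml or whatever yours is named). A rough sketch
below -- the driver class, URL, credentials, and entity query are only
placeholders standing in for your Oracle setup; the batchSize attribute
is the part that matters:

  <dataConfig>
    <!-- batchSize is passed through to JDBC as the fetchSize -->
    <dataSource type="JdbcDataSource"
                driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
                user="solr_import"
                password="********"
                batchSize="500"/>
    <document>
      <entity name="item" query="SELECT * FROM item">
        <!-- field mappings go here -->
      </entity>
    </document>
  </dataConfig>

With batchSize="500", the driver fetches rows in chunks of 500 instead
of buffering the whole resultset, so heap usage during a full import
stays roughly flat no matter how many rows the query returns.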