On 6/6/2013 4:13 AM, Sebastian Steinfeld wrote:
The number of documents I want to index is 8 million. The first 1.6 million are 
indexed in 2 minutes, but completing the import takes nearly 2 hours.
The size of the index on the hard drive is 610MB.
I started the solr server with 2GB memory.

I read that the duration of indexing might be connected to the batch size, so I 
increased the batchSize in the dataSource to 10,000, but this didn't make any 
difference.
I also tried to disable the autocommit, which is configured in the 
solrconfig.xml. I disabled it by commenting it out, but this also didn't make any 
difference.

If you are importing from MySQL, you actually want the batchSize to be -1. This streams the results so they don't take up large blocks of memory. Other JDBC drivers have different ways of configuring this mode of operation. You fully redacted the driver and URL in your config file, so I don't know what you are using.
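
As a rough sketch, assuming you are using MySQL with the dataimport handler's JdbcDataSource, the dataSource definition might look something like this. The host, database name, and credentials are placeholders, since I can't see your real ones:

```xml
<!-- Sketch only: url, user, and password are placeholder values. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/yourdb"
            user="youruser"
            password="yourpassword"
            batchSize="-1"/>
<!-- batchSize="-1" asks the MySQL driver to stream rows one at a time
     instead of buffering the whole result set in memory. -->
```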

2GB of Java heap for Solr is probably not enough. It's likely that once your index gets big enough, Solr is starved for memory and has to perform constant garbage collections to free up enough for basic operation. I would bet that you also don't have enough free memory for the OS to cache the index well:

http://wiki.apache.org/solr/SolrPerformanceProblems

If you are using 4.x with the updateLog turned on, then you want autoCommit enabled with openSearcher set to false. This is covered on the wiki page I linked.
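
For illustration, the relevant part of solrconfig.xml could look roughly like this. The 15 second maxTime is only an example value, not something tuned for your setup:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log; without regular hard commits it keeps growing. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit every 15 seconds (example value). openSearcher=false
       flushes to disk without opening a new searcher, so the commit is cheap. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With openSearcher set to false, newly indexed documents do not become visible to searches until a separate commit opens a new searcher, for example the commit the import issues when it finishes.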

Thanks,
Shawn
