I'm using Perl to indirectly call the Solr ExtractingRequestHandler to stream remote documents into a Solr index. Every 100 URLs I process, I do a commit. I've got about 30K documents to index. I'm using a stock, out-of-the-box Solr 1.4.1 with the necessary schema changes for the fields I'm indexing.
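For reference, here's roughly what my indexing loop looks like (a simplified sketch, not my exact code; the host/port, core path, and the "id" literal field are placeholders for whatever your setup uses):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use URI;

    # Placeholders -- adjust host/port and the unique key field to your setup.
    my $extract_url = 'http://localhost:8080/solr/update/extract';
    my $update_url  = 'http://localhost:8080/solr/update';

    my $ua = LWP::UserAgent->new( timeout => 600 );

    # @urls holds the remote document URLs to index (~30K in my case).
    my @urls = @ARGV;

    my $count = 0;
    for my $doc_url (@urls) {
        # stream.url makes Solr fetch the remote document itself
        # (requires enableRemoteStreaming="true" in solrconfig.xml).
        my $req = URI->new($extract_url);
        $req->query_form(
            'stream.url' => $doc_url,
            'literal.id' => $doc_url,    # using the URL as the unique key
        );

        my $resp = $ua->get($req);
        warn 'Failed: ' . $resp->status_line . " for $doc_url\n"
            unless $resp->is_success;

        # Commit every 100 documents.
        if ( ++$count % 100 == 0 ) {
            $ua->post( $update_url,
                'Content-Type' => 'text/xml',
                Content        => '<commit/>' );
        }
    }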

I seem to be running into performance problems about 40 documents in. I start getting "Failed: 500 read timeout" errors that last about 4 minutes each, slowing processing to a crawl. I've tried a later version of Tika (0.8), but that didn't seem to help, and I'm not sure it's the problem anyway.

Given that I'm running a pretty much unaltered version of Solr, could the out-of-the-box configuration be my problem? I'm running everything under a typical Tomcat install on a Linux VM. I understand there are performance tweaks I can make to the Solr config, but I'd like to focus first on resolving this specific problem rather than blanket-tweaking the entire config.

Is there anything in particular I should look at? Can I provide any more information?


Thanks - Tod
