Hello everyone,

I'm about to add a few hundred thousand scanned PDF pages (with OCR) to one of my Invenio 1.2.x installations, and I was wondering whether anyone has been using Solr for fulltext indexing in production. I'm a bit skeptical because even the latest master of the Invenio 1.x branch still uses the same old solr-3.1.0 from 2011... It's been a while since I last tested this feature, and although it worked back then for a few demo records, indexing words from almost a million PDF pages is something different :)
So I could really use your feedback and suggestions, if you have any!

(N.B. I suppose Elasticsearch would excel in such use cases, but upgrading to Invenio 2.x or even 3.x is out of the question for the time being...)

Kind regards,
Theodoros Theodoropoulos

P.S. FYI, my lame attempt to compile the provided Java classes with java-1.7 (using the appropriate new classpaths) in order to hook it up to the latest solr-4.10.x has been a disaster... If anyone has succeeded in this, please come forward!
