Hello everyone,
I'm about to add a few hundred thousand scanned PDF pages (with OCR) into
one of my 1.2.x Invenio installations and I was wondering if anyone has
been using Solr for fulltext indexing in production.
I'm a bit skeptical because even the latest master of the
Invenio 1.x branch still uses the same old solr-3.1.0 from 2011...
It's been a while since I last tested this feature, and although it
worked back then for a few demo records, indexing words from almost a
million PDF pages is something different :)
So I could really use your feedback and suggestions, if you have any!
(N.B. I suppose Elasticsearch would excel in such use cases, but
upgrading to 2.x or even 3.x is out of the question for the time being...)
Kind regards,
Theodoros Theodoropoulos
P.S. FYI, my lame attempt to compile the provided Java classes with
java-1.7 (using the appropriate new classpaths) in order to hook it up
to the latest solr-4.10.x has been a disaster... If anyone has succeeded
in this, please come forward!