Hi Theodoros,
On Monday 22 February 2016 17:55:24, Theodoros Theodoropoulos wrote:
> I'm about to add a few hundred thousand scanned pdf pages WITH OCR into
> one of my 1.2.x Invenio installations and I was wondering if anyone has
> been using SOLR for fulltext indexing in production.
> The reason I'm a bit skeptical is because even the latest master for
> Invenio 1.x branch uses the same old solr-3.1.0 from 2011...
> It's been a while since I last tested this feature (and although it
> worked back then for a few demo records) indexing words from almost a
> million PDF pages is something different :)
> So I could really use your feedback and suggestions, if you have any!
>
> (N.B. I suppose elasticsearch will excel in such work cases, but
> upgrading to 2.x or even 3.x is out of the question for the time being...)
>
> Kind regards,
> Theodoros Theodoropoulos
>
> ps. FYI, my lame attempt to compile the provided java classes with
> java-1.7 (using the appropriate new classpaths) in order to hook it up
> to the latest solr-4.10.x has been a disaster... If anyone has succeeded in
> this please come forward!
Solr integration is indeed unmaintained. In INSPIRE and CDS we use exactly the
implementation in Invenio 1.2.x (on SLC6/RHEL6/CentOS6), which is based
on Solr 3. For us it works in a basic way, just to index fulltext files. We
have no major performance issues: nearly 600K fulltext files are indexed on a
single-node machine, using nearly 200GB of disk space.
Indeed, Invenio 2/3 relies entirely on ES for its indexing core.
Cheers!
Sam
--
Samuele Kaplun
INSPIRE Service Manager ** <http://inspirehep.net/>