Re: Anyone using SOLR for fulltext indexing?

Samuele Kaplun Mon, 22 Feb 2016 23:39:07 -0800

Hi Theodoros,

On Monday 22 February 2016 17:55:24, Theodoros Theodoropoulos wrote:
> I'm about to add a few hundred thousand scanned pdf pages WITH OCR into
> one of my 1.2.x Invenio installations and I was wondering if anyone has
> been using SOLR for fulltext indexing in production.
> The reason I'm a bit skeptical is because even the latest master for
> Invenio 1.x branch uses the same old solr-3.1.0 from 2011...
> It's been a while since I last tested this feature (and although it
> worked back then for a few demo records) indexing words from almost a
> million PDF pages is something different :)
> So I could really use your feedback and suggestions, if you have any!
> 
> (N.B. I suppose elasticsearch will excel in such work cases, but
> upgrading to 2.x or even 3.x is out of the question for the time being...)
> 
> Kind regards,
> Theodoros Theodoropoulos
> 
> ps. FYI, my lame attempt to compile the provided java classes with
> java-1.7 (using the appropriate new classpaths) in order to hook it up
> to the latest solr-4.10.x has been a disaster... If anyone has succeeded in
> this, please come forward!


Solr integration is indeed unmaintained. In INSPIRE and CDS we use exactly the
implementation shipped with Invenio 1.2.x (on SLC6/RHEL6/CentOS6), which is based
on Solr 3. For us it works in a basic way, just to index fulltext files. We
have no major performance issues: nearly 600K fulltext files are indexed on a
single-node machine, using nearly 200GB of disk space.

Indeed, Invenio 2/3 relies entirely on Elasticsearch (ES) for its indexing core.

Cheers!
        Sam

-- 
Samuele Kaplun
INSPIRE Service Manager ** <http://inspirehep.net/>
