Tika is already distributed with Solr. It should “just work”, since the path is 
already in solrconfig.xml:

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
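
For a quick test of the server-side route, here is a minimal sketch that POSTs one PDF to Solr's extracting request handler using only the JDK's HTTP client (Java 11+). It assumes the core's solrconfig.xml also registers the handler at /update/extract; the base URL, the core name "mycore", and the class and method names are placeholders made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ExtractPost {

    // Build the /update/extract URL. literal.id supplies the document's unique key,
    // and commit=true makes the document searchable right away (fine for a test,
    // too expensive to do per document in a bulk load).
    public static URI extractUri(String solrBase, String core, String docId) {
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/" + core
                + "/update/extract?literal.id=" + id + "&commit=true");
    }

    // POST the raw PDF bytes; Solr hands them to Tika on the server side.
    public static int postPdf(HttpClient client, URI uri, Path pdf)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: ExtractPost <file.pdf>");
            return;
        }
        Path pdf = Path.of(args[0]);
        // Assumed values: a local Solr on the default port and a core named "mycore".
        URI uri = extractUri("http://localhost:8983/solr", "mycore",
                pdf.getFileName().toString());
        System.out.println("HTTP status: " + postPdf(HttpClient.newHttpClient(), uri, pdf));
    }
}
```

Using the file name as the id is only a convenience for the sketch; any value that is unique per document would do.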

Other PDF converters? I’m sure there are, but Tika is free.

But I wouldn’t really recommend that you just ship the docs to Solr. I’d 
recommend that you build a little program to do the extraction on one or more 
clients. The details of why are here:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

It’s a little old, but the concepts are still valid. The RDBMS parts you can 
just rip out.
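
To make the shape of such a client concrete, here is a rough sketch using only the JDK. It is not the code from the article: the extractor function is a stand-in for wherever Tika (or any other converter) would plug in, and the field names "id" and "text" and the class name are assumptions that would have to match your schema.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ClientSideIndexer {

    // Escape a string so it can sit inside a JSON string literal.
    public static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    // Collect the PDF files under a directory tree, in a stable order.
    public static List<Path> findPdfs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Build one JSON batch of documents. The extractor runs on the client,
    // so a bad PDF fails here instead of inside Solr.
    public static String toUpdateJson(List<Path> pdfs, Function<Path, String> extractor) {
        List<String> docs = new ArrayList<>();
        for (Path pdf : pdfs) {
            docs.add("{\"id\":\"" + jsonEscape(pdf.toString()) + "\",\"text\":\""
                    + jsonEscape(extractor.apply(pdf)) + "\"}");
        }
        return "[" + String.join(",", docs) + "]";
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        // Placeholder extractor: a real client would return the text extracted from the PDF.
        String json = toUpdateJson(findPdfs(root), p -> "extracted text of " + p.getFileName());
        System.out.println(json);
    }
}
```

The resulting JSON array could then be sent to the core's /update handler with Content-Type application/json, in batches rather than one document at a time. SolrJ, as used in the article, handles that part for you.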

Best,
Erick

> On Mar 14, 2019, at 2:53 PM, Paul Buiocchi <pfb6...@yahoo.com.INVALID> wrote:
> 
> Greetings,
> I am setting up solr 8 on a vanilla Linux Ubuntu server (16.04)
> The whole reason for the setup is to index 1000s of PDF files (newspaper 
> scans).
> - I created my core and have Solr up and running.
> - I am assuming that I need Apache Tika to index the files.
> - Do I tie Tika into Solr via the solrconfig.xml file?
> - If so, does anyone have a sample syntax?
> - Tika.jar? Server or client?
> - Are there other PDF converters other than Tika? If so, how do they compare?
> Any other advice/suggestions?
> Thank you all, I really appreciate the help!