Tika is already distributed with Solr. It should “just work”, since the path to its jars is already in solrconfig.xml:

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
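With those jars on the path, sending a PDF straight to Solr could look roughly like the sketch below. It is only an illustration and makes some assumptions: the core has an /update/extract request handler configured (check your solrconfig.xml), Solr listens on the default port, and the core name, document id and file path are hypothetical placeholders. It uses only the Python standard library.

```python
import urllib.parse
import urllib.request


def build_extract_request(solr_url, core, doc_id, pdf_bytes, commit=True):
    """Build a POST request that sends raw PDF bytes to the extract handler.

    Assumes the core exposes /update/extract (Solr Cell); the handler
    runs Tika on the server side and indexes the extracted text.
    """
    # literal.id supplies the unique key for the document being indexed.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    url = "{}/{}/update/extract?{}".format(
        solr_url.rstrip("/"),
        urllib.parse.quote(core),
        urllib.parse.urlencode(params),
    )
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def index_pdf(solr_url, core, doc_id, path):
    """Read one PDF from disk, post it to Solr, and return the HTTP status."""
    with open(path, "rb") as f:
        request = build_extract_request(solr_url, core, doc_id, f.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, index_pdf("http://localhost:8983/solr", "newspapers", "scan-0001", "/data/scans/scan-0001.pdf") would post a single file, where every one of those values is made up for illustration. Committing on every request is convenient for a quick test but slow for thousands of files; pass commit=False and commit once at the end instead.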
Other PDF converters? I’m sure there are, but Tika is free.

But I wouldn’t really recommend that you just ship the docs to Solr. I’d recommend that you build a little program to do the extraction on one or more clients. The details of why are here:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

It’s a little old, but the concepts are still valid. The RDBMS parts you can just rip out.

Best,
Erick

> On Mar 14, 2019, at 2:53 PM, Paul Buiocchi <pfb6...@yahoo.com.INVALID> wrote:
>
> Greetings,
> I am setting up solr 8 on a vanilla Linux Ubuntu server (16.04)
> The whole reason for the setup is to index 1000s of PDF files (newspaper scans).
> - I created my core and have Solr up and running.
> - I am assuming that I need Apache Tika to index the files
> - Do I tie Tika into Solr via the SOLCONFIG.XML file ?
> - if so , does anyone have a sample syntax ?
> - Tika.jar ? Server or client ?
> - are there other PDF converters other than Tika . if so how do they compare.
> Any other advice /suggestions
> Thank you all , I really appreciate the help !