Re: Integrating grobid with Tika in solr

Shawn Heisey Wed, 04 May 2016 07:00:53 -0700

On 5/4/2016 7:15 AM, Betsey Benagh wrote:
> (Cross-posted from Stack Overflow)
> 
> This feels like a basic, dumb question, but my reading of the documentation 
> has not led me to an answer.
> 
> 
> I'm using Solr to index journal articles. Using the out-of-the-box 
> configuration, it indexed the text of the documents, but I'm looking to use 
> Grobid to pull out the authors, title, affiliations, etc. I got Grobid up and 
> running as a service.
> 
> I added
> 
> <str name="tika.config">/path/to/tika-config.xml</str>
> 
> to the requestHandler for /update/extract in solrconfig.xml.
> 
> The tika-config looks like:
> 
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.journal.JournalParser">
>       <mime>application/pdf</mime>
>     </parser>
>   </parsers>
> </properties>
> 
> 
> I'm getting a ClassNotFoundException when I try to import a document, but I 
> can't figure out where to set the classpath to fix it.
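For reference, the handler change described above would look roughly like this in solrconfig.xml. This is a sketch, not your actual file: only the tika.config line comes from your message, the class is the stock ExtractingRequestHandler, and the "defaults" entry is just an illustrative placeholder. In the stock examples, tika.config sits directly inside the handler definition rather than inside "defaults".

```xml
<!-- Sketch of the /update/extract handler in solrconfig.xml.
     tika.config is placed directly under requestHandler, not inside "defaults".
     The path is the placeholder from the original message. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <str name="tika.config">/path/to/tika-config.xml</str>
  <lst name="defaults">
    <!-- Illustrative default only; adjust to your schema. -->
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
```

Getting this part right doesn't help if the parser class named in tika-config.xml can't be loaded, which is what the ClassNotFoundException suggests.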


I do not know anything about grobid.

We'll need to see the exception -- the entire multi-line stacktrace,
including any "caused by" sections.

In general, you should create a lib directory in the Solr home and place
all extra jars in that directory.  Otherwise, you need <lib> elements in
solrconfig.xml to load jars, and those jars will be loaded once for every
core that uses that <lib> element.  Jars in ${solr.solr.home}/lib are loaded
*once* when Solr starts and are made available to all cores.
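If you do go the <lib> route, a minimal sketch of those elements (inside the top-level <config> element of solrconfig.xml) looks like this. Both paths and the jar name are hypothetical placeholders; "dir" plus "regex" pulls in every matching jar in a directory, while "path" names a single jar. The ${solr.solr.home}/lib approach needs no config at all; you just drop the jars into that directory and restart Solr.

```xml
<!-- Alternative to ${solr.solr.home}/lib. Placeholder paths only.
     These jars are loaded separately for each core whose
     solrconfig.xml contains these elements. -->
<lib dir="/path/to/extra/jars" regex=".*\.jar" />
<!-- Hypothetical single jar, loaded by explicit path. -->
<lib path="/path/to/extra/jars/some-parser.jar" />
```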

Thanks,
Shawn
