Thanks, I'm currently using 5.5, and will try upgrading to 6.0.
On 5/4/16, 10:37 AM, "Allison, Timothy B." <[email protected]> wrote:

>Y. Solr 6.0.0 is shipping with Tika 1.7. Grobid came in with Tika 1.11.
>
>-----Original Message-----
>From: Allison, Timothy B. [mailto:[email protected]]
>Sent: Wednesday, May 4, 2016 10:29 AM
>To: [email protected]
>Subject: RE: Integrating grobid with Tika in solr
>
>I think Solr is using a version of Tika that predates that addition of
>the Grobid parser. You'll have to add that manually somehow until Solr
>upgrades to Tika 1.13 (soon to be released...I think). SOLR-8981.
>
>-----Original Message-----
>From: Betsey Benagh [mailto:[email protected]]
>Sent: Wednesday, May 4, 2016 10:07 AM
>To: [email protected]
>Subject: Re: Integrating grobid with Tika in solr
>
>Grobid runs as a service, and I'm (theoretically) configuring Tika to
>call it.
>
>From the Grobid wiki, here are instructions for integrating with Tika
>application:
>
>First we need to create the GrobidExtractor.properties file that points
>to the Grobid REST Service. My file looks like the following:
>
>grobid.server.url=http://localhost:[port]
>
>Now you can run GROBID via Tika-app with the following command on a
>sample PDF file.
>
>java -classpath $HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=$HOME/src/grobidparser-resources/tika-config.xml -J $HOME/src/grobid/papers/ICSE06.pdf
>
>Here's the stack trace.
>
><lst name="error"><lst name="metadata">
><str name="error-class">org.apache.solr.common.SolrException</str>
><str name="root-error-class">java.lang.ClassNotFoundException</str></lst>
><str name="msg">org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser</str>
><str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser
>at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:82)
>at org.apache.solr.core.PluginBag$LazyPluginHolder.createInst(PluginBag.java:367)
>at org.apache.solr.core.PluginBag$LazyPluginHolder.get(PluginBag.java:348)
>at org.apache.solr.core.PluginBag.get(PluginBag.java:148)
>at org.apache.solr.handler.RequestHandlerBase.getRequestHandler(RequestHandlerBase.java:231)
>at org.apache.solr.core.SolrCore.getRequestHandler(SolrCore.java:1362)
>at org.apache.solr.servlet.HttpSolrCall.extractHandlerFromURLPath(HttpSolrCall.java:326)
>at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:296)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:412)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
>at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>at org.eclipse.jetty.server.Server.handle(Server.java:499)
>at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>at java.lang.Thread.run(Thread.java:745)
>Caused by: org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser
>at org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:362)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:127)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:115)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:111)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:92)
>at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:80)
>... 30 more
>Caused by: java.lang.ClassNotFoundException: org.apache.tika.parser.journal.JournalParser
>at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:348)
>at org.apache.tika.config.ServiceLoader.getServiceClass(ServiceLoader.java:189)
>at org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:338)
>... 35 more
></str><int name="code">500</int></lst>
>
>
>
>On 5/4/16, 10:00 AM, "Shawn Heisey"
><[email protected]<mailto:[email protected]>> wrote:
>
>On 5/4/2016 7:15 AM, Betsey Benagh wrote:
>(X-posted from stack overflow)
>This feels like a basic, dumb question, but my reading of the
>documentation has not led me to an answer.
>i'm using Solr to index journal articles. Using the out-of-the-box
>configuration, it indexed the text of the documents, but I'm looking to
>use Grobid to pull out the authors, title, affiliations, etc. I got
>grobid up and running as a service.
>I added
><str name="tika.config">/path/to/tika-config.xml</str>
>to the requestHandler for /update/extract in solrconfig.xml
>The tika-config looks like:
><?xml version="1.0" encoding="UTF-8" standalone="no"?>
><properties>
>  <parsers>
>    <parser class="org.apache.tika.parser.journal.JournalParser">
>      <mime>application/pdf</mime>
>    </parser>
>  </parsers>
></properties>
>I'm getting a ClassNotFound exception when I try to import a document,
>but can't figure out where to set the classpath to fix it.
>
>I do not know anything about grobid.
>
>We'll need to see the exception -- the entire multi-line stacktrace,
>including any "caused by" sections.
>
>In general, you should create a lib directory in the solr home and place
>all extra jars in that directory. Otherwise you need <lib> elements in
>solrconfig.xml to load jars -- and they will be loaded once for every
>core that uses that <lib> element. ${solr.solr.home}/lib loads jars
>*once* when Solr starts and makes them available to all cores.
>
>Thanks,
>Shawn
>
>
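For anyone following the thread, here is a rough sketch of the per-core <lib> option Shawn mentions, next to the tika.config setting from the original question. The directory /path/to/extra-jars is a made-up placeholder, and this assumes the jars placed there actually contain org.apache.tika.parser.journal.JournalParser (per Tim's note, that means Tika 1.11 or later). It has not been tested against this setup.

```xml
<!-- solrconfig.xml fragment; a sketch, not a verified configuration -->
<config>
  <!-- Hypothetical directory: every .jar in it is loaded for this core only.
       The alternative from the thread is to drop the jars into
       ${solr.solr.home}/lib, which needs no <lib> element at all. -->
  <lib dir="/path/to/extra-jars" regex=".*\.jar" />

  <requestHandler name="/update/extract"
                  class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <!-- Points the handler at the custom Tika configuration from the question -->
    <str name="tika.config">/path/to/tika-config.xml</str>
  </requestHandler>
</config>
```

The <lib> route keeps the extra jars scoped to one core, at the cost of loading them again for every core that declares the element; the shared lib directory loads them once for all cores.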
