I have an existing Spring / Solr / SolrJ application that indexes a mixture
of documents.  It has been working fine for a couple of weeks, but today I
started getting an exception when processing one particular PDF file.

The exception is:

ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308)
    at uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
    at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
    at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
    at org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
    at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
    at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
    at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
    at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
    ... 44 more
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSString cannot be cast to org.pdfbox.cos.COSName
    at org.pdfbox.cos.COSDictionary.getNameAsString(COSDictionary.java:600)
    at org.pdfbox.pdmodel.PDDocumentInformation.getTrapped(PDDocumentInformation.java:275)
    at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:66)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:50)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
    ... 46 more

After a bit of investigation, it seems that this bug was present in pdfbox
0.7.3 and was fixed in 0.8.  See:

http://osdir.com/ml/tika-dev.lucene.apache.org/2009-09/msg00037.html

Looking at my libraries, I am indeed using pdfbox 0.7.3.  I build with Maven,
and pdfbox 0.7.3 appears to be pulled in transitively by the tika-parsers 0.4
pom file, which in turn appears to come from the solr-cell 1.4.0 pom file.
My project's Maven pom file has the following entries and does not explicitly
declare pdfbox or a particular version of it:

        <dependency>
            <artifactId>solr-solrj</artifactId>
            <groupId>org.apache.solr</groupId>
            <version>1.4.0</version>
            <type>jar</type>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <artifactId>solr-core</artifactId>
            <groupId>org.apache.solr</groupId>
            <version>1.4.0</version>
            <type>jar</type>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.solr</groupId>
            <artifactId>solr-cell</artifactId>
            <version>1.4.0</version>
        </dependency>

Can anyone confirm that a Maven build against Solr 1.4 brings in pdfbox
0.7.3?  I'm wondering whether there is a problem with my Maven setup.  If I
am correct, can anyone advise what I need to do to get the right version of
pdfbox, apart from editing my pom file locally?
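
For illustration, here is a minimal sketch of the kind of override in question:
excluding the transitive pdfbox from solr-cell and declaring a newer one
explicitly.  This is untested.  The pdfbox coordinates (pdfbox:pdfbox for the
old artifact, org.apache.pdfbox:pdfbox:0.8.0-incubating for the new one) are
assumptions that have not been checked against the tika-parsers 0.4 pom.  If
the newer release uses different package names from the org.pdfbox.* classes
seen in the trace above, Tika 0.4 would fail to load them and a matching Tika
upgrade would be needed as well.

        <dependency>
            <groupId>org.apache.solr</groupId>
            <artifactId>solr-cell</artifactId>
            <version>1.4.0</version>
            <exclusions>
                <!-- Drop the transitive pdfbox 0.7.3 (assumed coordinates) -->
                <exclusion>
                    <groupId>pdfbox</groupId>
                    <artifactId>pdfbox</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- Explicit replacement (assumed coordinates and version) -->
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>0.8.0-incubating</version>
        </dependency>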

Thanks
Shaun
