I've got an existing Spring/SolrJ application that indexes a mixture of documents. It had been working fine for a couple of weeks, but today I started getting an exception when processing one particular PDF file.
The exception is:

ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
    at uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308)
    at uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
    at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
    at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
    at org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
    at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
    at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
    at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
    at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
    ... 44 more
Caused by: java.lang.ClassCastException: org.pdfbox.cos.COSString cannot be cast to org.pdfbox.cos.COSName
    at org.pdfbox.cos.COSDictionary.getNameAsString(COSDictionary.java:600)
    at org.pdfbox.pdmodel.PDDocumentInformation.getTrapped(PDDocumentInformation.java:275)
    at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:66)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:50)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
    ... 46 more

After a bit of investigation, it seems this bug exists in pdfbox 0.7.3 and was fixed in 0.8. See http://osdir.com/ml/tika-dev.lucene.apache.org/2009-09/msg00037.html

Looking at my libraries, I am indeed using pdfbox 0.7.3. I build with Maven, and pdfbox 0.7.3 appears to come from the tika-parsers 0.4 pom file, which in turn is pulled in by the solr-cell 1.4.0 pom file.
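To confirm which PDFBox jar is actually loaded at runtime (as opposed to what the pom suggests), a small classpath probe can help. This is a minimal sketch, not part of Solr or Tika: the class name `PdfBoxVersionCheck` is hypothetical, `org.pdfbox.cos.COSName` is the 0.7.x class seen in the trace above, and the `org.apache.pdfbox` package for the later Apache releases is an assumption. Run it with the same classpath as the webapp (for example the jars in the webapp's WEB-INF/lib) so it sees what Tomcat sees.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic: reports which PDFBox classes are visible on the
 * classpath and which jar (or directory) each one was loaded from.
 */
public class PdfBoxVersionCheck {

    // Classes to probe. The first package appears in the stack trace (pdfbox 0.7.x);
    // the second is an ASSUMPTION about the package used by later Apache PDFBox releases.
    static final String[] CANDIDATES = {
        "org.pdfbox.cos.COSName",
        "org.apache.pdfbox.cos.COSName"
    };

    /** Returns where a class was loaded from, or null if it is not on the classpath. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false,
                    PdfBoxVersionCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.*) have no CodeSource
            if (src == null || src.getLocation() == null) {
                return "(unknown location)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : CANDIDATES) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "not on classpath" : where));
        }
    }
}
```

If only the `org.pdfbox` class is found and its location is a pdfbox-0.7.3 jar, that would match what the Maven dependency chain (solr-cell 1.4.0, then tika-parsers 0.4) suggests.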
In my project's Maven pom file I have the following entries; I don't explicitly specify pdfbox or any particular version of it:

<dependency>
    <artifactId>solr-solrj</artifactId>
    <groupId>org.apache.solr</groupId>
    <version>1.4.0</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>
<dependency>
    <artifactId>solr-core</artifactId>
    <groupId>org.apache.solr</groupId>
    <version>1.4.0</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-cell</artifactId>
    <version>1.4.0</version>
</dependency>

Can anyone confirm that a Maven build for Solr 1.4 brings in pdfbox 0.7.3? I'm wondering whether there is a problem with my Maven setup.

If I am correct, can anyone advise what I need to do to get the right version of pdfbox, apart from editing my pom file locally?

Thanks,
Shaun