SEVERE: org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@2ca72c6c
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:215)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1322)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.rondhuit.servlet.ConvDoubleByteSpaceToHalfFilter.doFilter(ConvDoubleByteSpaceToHalfFilter.java:32)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.rondhuit.servlet.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:105)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@2ca72c6c
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:148)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:194)
... 24 more
Caused by: java.io.IOException: For input string: "00000000-1"
at org.apache.pdfbox.pdfparser.PDFParser.parseXrefTable(PDFParser.java:709)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:449)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:814)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:63)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
Hello,

One of my users posted a PDF document through Solr Cell. The document
contains only a CAD image.
When it was posted, Tika threw the Illegal IOException shown in the attached
stack trace.
The error appears to be raised inside PDFBox (PDFParser.parseXrefTable), as if
PDFBox cannot parse something in the xref table of the PDF.
What kind of error is this? If you have any clue, please let me know.
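
To illustrate my guess: in a classic PDF cross-reference table, each entry starts with a 10-digit byte offset, and "00000000-1" is exactly 10 characters long, so it looks like an offset field holding -1. The sketch below is not PDFBox code. It only assumes that the field is handed to a standard Java number parser, and the class and method names are hypothetical. It shows that such a value fails with the same "For input string" message that appears in the trace.

```java
public class XrefEntryCheck {

    /**
     * Tries to read the 10-character byte-offset field of a classic
     * xref entry ("nnnnnnnnnn ggggg n"). Returns null when the field
     * is not a plain decimal number.
     * Hypothetical helper for illustration only, not part of PDFBox.
     */
    static Long tryParseOffset(String field) {
        try {
            return Long.parseLong(field);
        } catch (NumberFormatException e) {
            // The exception message has the form: For input string: "<field>"
            System.out.println("Bad xref offset: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // A well-formed offset field parses to 17.
        System.out.println(tryParseOffset("0000000017"));
        // The value from the stack trace: the minus sign is not in the
        // first position, so parsing fails and null is returned.
        System.out.println(tryParseOffset("00000000-1"));
    }
}
```

If this guess is right, the PDF itself carries a malformed xref entry, and the tool that produced the CAD drawing would be the place to look.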
Environment:
Solr 3.1.0-dev 2010-09-10
(Apache Tika 0.8-SNAPSHOT, POI 3.6)
Regards,
Shinichiro Abe