Re: Illegal IOException from tika.parser

Jukka Zitting Tue, 05 Apr 2011 02:02:10 -0700

Hi,

On 04/05/2011 09:07 AM, Shinichiro Abe wrote:

It seems like an error raised at pdfbox, and pdfbox cannot recognize
something about XrefTable of the pdf? What kind of error is it?

The PDF in question might be malformed, or there could be a bug in PDFBox that prevents it from correctly parsing this file.

To solve the problem, the best way is to report the issue to the PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX, ideally with the sample PDF as an attachment.

Such troubles are fairly common when you are dealing with large numbers of files from various different sources. Usually they aren't too troublesome, as you often can live with not being able to search such documents based on their full text contents. For example, in Apache Jackrabbit we simply log such problems and index the document as if it was empty. It's of course a good idea to report such issues so they can be fixed in future versions.
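
The "log the problem and index the document as if it was empty" approach could be sketched roughly as follows. This is only an illustration, not the actual Jackrabbit code: the TextExtractor interface, the SafeExtraction class and the extractOrEmpty method are hypothetical names standing in for whatever call you use to run the Tika parser.

```java
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

final class SafeExtraction {
    private static final Logger LOG =
            Logger.getLogger(SafeExtraction.class.getName());

    /**
     * Hypothetical stand-in for the real extraction call
     * (for example, a call into a Tika parser).
     */
    @FunctionalInterface
    interface TextExtractor {
        String extract(InputStream in) throws Exception;
    }

    /**
     * Returns the extracted text, or an empty string if extraction fails.
     * The failure is logged so the offending file can be reported upstream.
     */
    static String extractOrEmpty(TextExtractor extractor,
                                 InputStream in,
                                 String documentName) {
        try {
            String text = extractor.extract(in);
            return text != null ? text : "";
        } catch (Exception e) {
            // Malformed file or parser bug: keep indexing, just without full text.
            LOG.log(Level.WARNING,
                    "Failed to extract text from " + documentName, e);
            return "";
        }
    }
}
```

With something like this, a single broken PDF only costs you the full text of that one document, and the logged stack trace gives you what you need for the bug report.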


--
Jukka Zitting
