Re: Illegal IOException from tika.parser

Shinichiro Abe Tue, 05 Apr 2011 02:37:39 -0700

Hello.
Thank you for your reply.
I'll try to report it in the PDFBox JIRA, with a sample PDF attached.


Thank you.
Shinichiro Abe

On 2011/04/05, Jukka Zitting wrote:

> Hi,
> 
> On 04/05/2011 09:07 AM, Shinichiro Abe wrote:
>> It seems like an error raised at pdfbox, and pdfbox cannot recognize
>> something about XrefTable of the pdf? What kind of error is it?
> 
> The PDF in question might be malformed, or there could be a bug in PDFBox 
> that prevents it from correctly parsing this file.
> 
> To solve the problem, the best way is to report the issue to the PDFBox issue 
> tracker at https://issues.apache.org/jira/browse/PDFBOX, ideally with the 
> sample PDF as an attachment.
> 
> Such troubles are fairly common when you are dealing with large numbers of 
> files from various different sources. Usually they aren't too troublesome, as 
> you often can live with not being able to search such documents based on 
> their full text contents. For example in Apache Jackrabbit we simply log such 
> problems and index the document as if it was empty. It's of course a good 
> idea to report such issues so they can be fixed in future versions.
> 
> -- 
> Jukka Zitting
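
For reference, the "log the problem and index the document as if it was empty" approach described above could look roughly like the following. This is only a minimal sketch, not Jackrabbit's actual code: TextExtractor, TolerantIndexer and extractOrEmpty are hypothetical names, and TextExtractor merely stands in for whatever performs the real extraction (for example a Tika parser call), so that the sketch compiles with the JDK alone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the real extractor (e.g. a Tika parser call). */
interface TextExtractor {
    String extract(InputStream in) throws IOException;
}

/** Hypothetical helper: never lets one bad document abort indexing. */
final class TolerantIndexer {
    private static final Logger LOG =
            Logger.getLogger(TolerantIndexer.class.getName());

    private TolerantIndexer() {
    }

    /**
     * Returns the extracted full text, or an empty string when extraction
     * fails, so that the caller can still index the document's metadata.
     */
    static String extractOrEmpty(TextExtractor extractor, InputStream in, String name) {
        try {
            String text = extractor.extract(in);
            return text == null ? "" : text;
        } catch (IOException | RuntimeException e) {
            // Malformed file or parser bug: log it and carry on.
            LOG.log(Level.WARNING,
                    "Text extraction failed for " + name + "; indexing as empty", e);
            return "";
        }
    }
}
```

The trade-off is that such a document cannot be found by its full text until the underlying parser problem is fixed, which is why the logged failure is still worth reporting upstream.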
