Re: PDF Parse Failure only on an IBM JDK?

Chris Bowditch Thu, 13 Aug 2009 03:03:57 -0700

Chris Bowditch wrote:

Hi All,
I am facing a very strange problem with PDFBox 0.8.0 (revision 779577). On a Sun JDK the PDF parses without error, but on an IBM JDK I get the following error:
Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'


UPDATE on this issue:

There was actually an error being retried that occurred before the error below:


java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'

        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:482)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
        at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
        at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
        at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)

The characters that fail to parse occur at the start of the PDF:

%PDF-1.4
%âãÏÓ
6 0 obj
<</Filter /FlateDecode
/Length 489
>>
stream

I have debugged the PDFParser class and the problem lies in the call to the skipToNextObject method. On the IBM JDK, when the bytes are converted to a String some of the bytes are skipped (specifically those with a negative value), but when the bytes are subsequently unread, the unreading goes back too far. I'm working on a patch now and will raise a Jira entry for this.
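
To illustrate the kind of mismatch I mean, here is a minimal standalone sketch. It is not the PDFBox code and not the patch: the class UnreadSketch and its methods are made up for illustration, and it needs Java 7 or later for StandardCharsets. It assumes the bytes are decoded as ISO-8859-1, which maps every byte to exactly one char, and that the push-back uses the number of bytes actually read rather than a length derived from the String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnreadSketch {

    // The binary comment line from the PDF header, followed by the start of "6 0 obj".
    static final byte[] DATA = {
        '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n', '6', ' ', '0'
    };

    /** Length of the String produced by decoding the bytes with the given charset. */
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    /**
     * Reads up to n bytes, decodes them for inspection, pushes back exactly the
     * number of bytes read, and returns the next byte of the stream (-1 if empty).
     */
    static int peekThenNextByte(byte[] data, int n) throws IOException {
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data), n);
        byte[] buf = new byte[n];
        int read = in.read(buf, 0, n);
        if (read <= 0) {
            return -1;
        }
        // ISO-8859-1 gives one char per byte, so the two counts must agree.
        String text = new String(buf, 0, read, StandardCharsets.ISO_8859_1);
        if (text.length() != read) {
            throw new IllegalStateException("char count differs from byte count");
        }
        // Unread by byte count, never by a length taken from a decoded String.
        in.unread(buf, 0, read);
        return in.read();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes: " + DATA.length
                + ", chars (ISO-8859-1): "
                + decodedLength(DATA, StandardCharsets.ISO_8859_1));
        System.out.println("next byte after unread: " + peekThenNextByte(DATA, 6));
    }
}
```

With a multi-byte or platform default charset the decoded length is not guaranteed to equal the byte count, so any unread that relies on the String length can land at the wrong offset.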

        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:490)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
        at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:322)
        at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:285)
        at org.apache.pdfbox.PDFReader.main(PDFReader.java:270)

The error can be reproduced using the PDFReader class.
Unfortunately I cannot attach the PDF as-is since it's confidential, but if anyone knows a tool I can use to obfuscate the PDF please let me know!
My question is how can I debug this error?

Thanks,

Chris


Thanks,

Chris
