Chris Bowditch wrote:

Hi All,

I am facing a very strange problem with PDFBox 0.8.0 (revision 779577). On a Sun JDK the PDF parses without error, but on an IBM JDK I get the following error:

Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ãÃÃ'

UPDATE on this issue:

There was actually an error being retried that occurred before the error below:

java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'
        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:482)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
        at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
        at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
        at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)

The characters that fail to parse occur at the start of the PDF:

%PDF-1.4
%âãÏÓ
6 0 obj
<</Filter /FlateDecode
/Length 489
>>
stream

I have debugged the PDFParser class and the problem lies in the skipToNextObject method. On the IBM JDK, when the bytes are converted to a String, some of the bytes are skipped (specifically those with a negative value), but when the bytes are subsequently unread, the unreading goes back too far. I'm working on a patch now and will raise a Jira entry for this.
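
To illustrate the mechanism, here is a minimal Java sketch. It is not the PDFBox code: the class UnreadSketch, the lossyDecode helper and the sample buffer are hypothetical, and the sample assumes the binary comment after %PDF-1.4 is stored as the bytes 0xE2 0xE3 0xCF 0xD3. The lossy helper simply drops negative bytes to mimic the skipping I saw. The point is that an offset computed from the String is only a valid byte offset when every byte maps to exactly one char, which ISO-8859-1 guarantees.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class UnreadSketch {

    // Assumed sample: "%" + binary comment bytes + newline + "6 0 obj".
    static final byte[] DATA = {
        '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n',
        '6', ' ', '0', ' ', 'o', 'b', 'j'
    };

    // Hypothetical lossy conversion: drops every negative byte,
    // so the String ends up shorter than the byte buffer.
    static String lossyDecode(byte[] buf, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            if (buf[i] >= 0) {
                sb.append((char) buf[i]);
            }
        }
        return sb.toString();
    }

    // Reads the buffer, finds "6 0 obj" in a String view of it, unreads
    // from that index onwards, and returns the next byte the stream yields.
    static int nextByteAfterReposition(byte[] data, boolean lossy) {
        try {
            PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data), data.length);
            byte[] buf = new byte[data.length];
            int read = in.read(buf, 0, buf.length);

            String view = lossy
                ? lossyDecode(buf, read)
                : new String(buf, 0, read, StandardCharsets.ISO_8859_1);

            // Treating a char index as a byte offset is only safe
            // when the decoding is one byte to one char.
            int offset = view.indexOf("6 0 obj");
            in.unread(buf, offset, read - offset);
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One-to-one decoding: the stream resumes at '6' (54).
        System.out.println(nextByteAfterReposition(DATA, false));
        // Lossy decoding: the stream resumes at 0xE3 (227), inside the comment.
        System.out.println(nextByteAfterReposition(DATA, true));
    }
}
```

With the lossy view the index is too small, so too many bytes are pushed back and the next read lands inside the binary comment rather than on the object number. Decoding with a fixed single-byte charset, or tracking the position in bytes rather than chars, avoids depending on how a particular JDK converts the bytes.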


        at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:490)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
        at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:322)
        at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:285)
        at org.apache.pdfbox.PDFReader.main(PDFReader.java:270)

The error can be reproduced using the PDFReader class.

Unfortunately, I cannot attach the PDF as-is since it's confidential, but if anyone knows a tool I can use to obfuscate the PDF, please let me know!

My question is how can I debug this error?

Thanks,

Chris
