[ https://issues.apache.org/jira/browse/PDFBOX-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Bowditch updated PDFBOX-504:
----------------------------------

    Attachment: readable.pdf

I have attached a random PDF that I found, which contains no confidential information.

> Can't Parse any PDF using IBM JDK
> ---------------------------------
>
>                 Key: PDFBOX-504
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-504
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>         Environment: RedHat Linux IBM JDK
>            Reporter: Chris Bowditch
>            Priority: Critical
>         Attachments: readable.pdf
>
>
> All PDFs (that I have tried) fail to parse using IBM JDK 1.5 on RedHat Linux.
> The error you receive is:
>
> Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ãÃÃ'
>         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
>         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:493)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
>         at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
>         at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
>         at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)
>
> However, debugging reveals the actual error, which is hidden:
>
> java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'
>         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
>         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:483)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
>         at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
>         at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
>         at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)
>
> The characters shown in the hidden message occur at the start of most PDF
> files that I have checked:
>
> %PDF-1.4
> %âãÏÓ
> 6 0 obj
> <</Filter /FlateDecode
> /Length 489
> >>
> stream
>
> Tracing the code, I can see that the problem lies in the skipToNextObject()
> method of the PDFParser class. This method is new since v0.7.4.
> It converts an array of 16 bytes to a String. The characters âãÏÓ are read
> as negative numbers in both the Sun and IBM JDKs, but whereas on the Sun JDK
> the String created from the byte array contains these characters, on the IBM
> JDK they are missing from the String. So when 16 characters are read back,
> the stream offset is incorrect.
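A minimal Java sketch of the offset problem described in this report. It assumes the byte-to-String conversion in skipToNextObject() depends on a charset (the report does not say which one; `new String(byte[])` uses the platform default charset), which would explain why the result differs between JDKs. The class and its helpers (`OffsetSafeDecoding`, `samplePdfHead`, `decodeOneCharPerByte`, `roundTrips`) are hypothetical, not PDFBox APIs, and this is an illustration rather than the project's actual fix: decoding with ISO-8859-1 keeps exactly one char per byte, so a character offset can be used directly as a stream offset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of PDFBox.
public class OffsetSafeDecoding {

    // Start of a typical PDF: the binary comment line "%âãÏÓ" holds the bytes
    // 0xE2 0xE3 0xCF 0xD3 (all >= 0x80), followed by an object header.
    static byte[] samplePdfHead() {
        return "%PDF-1.4\n%\u00E2\u00E3\u00CF\u00D3\n6 0 obj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: ISO-8859-1 maps each byte value 0x00..0xFF to
    // exactly one char, so char index i always equals byte offset i.
    static String decodeOneCharPerByte(byte[] buf, int len) {
        return new String(buf, 0, len, StandardCharsets.ISO_8859_1);
    }

    // True if decoding with cs and re-encoding returns identical bytes,
    // i.e. no byte was dropped, merged or replaced along the way.
    static boolean roundTrips(byte[] buf, Charset cs) {
        return Arrays.equals(buf, new String(buf, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        byte[] head = samplePdfHead();

        // Java bytes are signed, so 0xE2 is read as a negative number.
        System.out.println("signed: " + head[10] + ", unsigned: " + (head[10] & 0xFF));

        // With one char per byte, the char offset of "6 0 obj" is its byte offset.
        String text = decodeOneCharPerByte(head, head.length);
        System.out.println("\"6 0 obj\" at offset " + text.indexOf("6 0 obj")
                + " of " + head.length + " bytes");

        // Other charsets give no such guarantee for binary data.
        System.out.println("ISO-8859-1 round-trips: "
                + roundTrips(head, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 round-trips: "
                + roundTrips(head, StandardCharsets.UTF_8));
        System.out.println("default (" + Charset.defaultCharset() + ") round-trips: "
                + roundTrips(head, Charset.defaultCharset()));
    }
}
```

Running it should report the byte 0xE2 as -30 signed and 226 unsigned, place "6 0 obj" at offset 15 of 23 bytes, and show that the ISO-8859-1 decoding round-trips while UTF-8 does not; the last line depends on the JVM's default charset.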