Hi, first of all, many thanks to the contributors of the pdfbox project, which I have been using for a long time for anything relating to PDF in Java.
I am using pdfbox to process various PDF files. Recently I received a file whose parsing failed with:

    ...
    Exception in thread "main" java.io.IOException: Error: Unknown annotation type COSInt{49633506}
        at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation.createAnnotation(PDAnnotation.java:198)
        at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:696)
        at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:663)
    ...

Looking further into this error, the cause is the parsing of /ObjStm ... which expects each object serialised in the stream to be followed by a separator (i.e. white space), whereas this PDF has some COS objects serialised without such a separator.

In the current code base, accessible on GitHub, the following test passes:

    @Test
    void testParse2NumberObjects() throws IOException
    {
        COSStream stream = new COSStream();
        stream.setItem(COSName.N, COSInteger.TWO);
        stream.setItem(COSName.FIRST, COSInteger.get(8));
        OutputStream outputStream = stream.createOutputStream();
        // header "6 0 4 2 " (8 bytes): object 6 at offset 0, object 4 at offset 2
        outputStream.write("6 0 4 2 1 2".getBytes());
        outputStream.close();
        PDFObjectStreamParser objectStreamParser = new PDFObjectStreamParser(stream, null);
        Map<COSObjectKey, COSBase> objectNumbers = objectStreamParser.parseAllObjects();
        assertEquals(2, objectNumbers.size());
        assertEquals(COSInteger.get(1), objectNumbers.get(new COSObjectKey(6, 0)));
        assertEquals(COSInteger.get(2), objectNumbers.get(new COSObjectKey(4, 0)));
    }

while this one fails:

    @Test
    void testParse2NumberObjectsNoSpace() throws IOException
    {
        COSStream stream = new COSStream();
        stream.setItem(COSName.N, COSInteger.TWO);
        stream.setItem(COSName.FIRST, COSInteger.get(8));
        OutputStream outputStream = stream.createOutputStream();
        // header "6 0 4 1 " (8 bytes): object 6 at offset 0, object 4 at offset 1,
        // so the data "12" holds the two objects with no separator between them
        outputStream.write("6 0 4 1 12".getBytes());
        outputStream.close();
        PDFObjectStreamParser objectStreamParser = new PDFObjectStreamParser(stream, null);
        Map<COSObjectKey, COSBase> objectNumbers = objectStreamParser.parseAllObjects();
        assertEquals(2, objectNumbers.size());
        assertEquals(COSInteger.get(1), objectNumbers.get(new COSObjectKey(6, 0)));
        assertEquals(COSInteger.get(2), objectNumbers.get(new COSObjectKey(4, 0)));
    }

with error:

    org.opentest4j.AssertionFailedError:
    Expected :COSInt{1}
    Actual   :COSInt{12}
        at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        ...
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParserTest.testParse2NumberObjectsNoSpace(PDFObjectStreamParserTest.java:103)
        ...

Notes:
a- In the failing test, the second object (number = 4) now has 1 as its offset, so '1' and '2' are adjacent and the parser reads them 'joined' as the single number 12.
b- The file was created in November last year and converted from Word to PDF by 'Adobe Acrobat Pro (64-bit) 24 Paper Capture Plug-in'. I expect to see this (valid) PDF construct more often in the (near) future.

@ (Tilman & Andreas): I was able to get pdfbox working by changing the PDFObjectStreamParser implementation: I rewrote the privateReadObjectOffsets() method to return an array, and used a parser that does not read beyond the implicit limit given by the next object's offset.

Let me know if you would like access to this change.

Thank you,
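
P.S. To illustrate the idea of bounding each object by the next object's offset, here is a minimal, self-contained sketch. It is not the actual change and does not use any pdfbox classes: the class name BoundedObjStmSketch and the method splitObjects are hypothetical, and it assumes the offsets in the header are listed in ascending order. It only slices the decoded stream into one string per object, so a real parser would still have to parse each slice.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of pdfbox.
class BoundedObjStmSketch
{
    /**
     * Splits the decoded content of an object stream into one slice per object.
     *
     * @param data  decoded stream bytes (header followed by the objects)
     * @param n     value of the /N entry (number of objects)
     * @param first value of the /First entry (byte offset of the first object)
     * @return object number mapped to the raw text of that object
     */
    static Map<Integer, String> splitObjects(byte[] data, int n, int first)
    {
        // The header holds n pairs "objectNumber offset", separated by white space.
        String header = new String(data, 0, first, StandardCharsets.ISO_8859_1).trim();
        String[] tokens = header.split("\\s+");
        int[] numbers = new int[n];
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++)
        {
            numbers[i] = Integer.parseInt(tokens[2 * i]);
            offsets[i] = Integer.parseInt(tokens[2 * i + 1]);
        }

        Map<Integer, String> objects = new LinkedHashMap<>();
        for (int i = 0; i < n; i++)
        {
            int start = first + offsets[i];
            // Assumption: offsets are ascending, so the next offset is the
            // implicit end of this object; the last object ends with the data.
            int end = (i + 1 < n) ? first + offsets[i + 1] : data.length;
            String raw = new String(data, start, end - start, StandardCharsets.ISO_8859_1);
            objects.put(numbers[i], raw.trim());
        }
        return objects;
    }

    public static void main(String[] args)
    {
        byte[] data = "6 0 4 1 12".getBytes(StandardCharsets.ISO_8859_1);
        // prints {6=1, 4=2}
        System.out.println(splitObjects(data, 2, 8));
    }
}
```

With the input from the failing test, object 6 is limited to the single byte before offset 1, so '1' and '2' are no longer read as one number.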