Hi Andreas,

re: 'is there any chance ...': I would have to ask the owner (a company) for authorisation, and I doubt I could get it sent quickly. I can, however, share the actual /ObjStm content, decompressed; let me know if that would help.
re: "which version ...": I am using an old version ... (that I am patching myself) ... I can, however, reproduce the problem with the current code on the trunk branch ... (hence the two unit tests that exhibit the current behavior).

On Mon, Feb 17, 2025 at 6:02 PM Andreas Lehmkühler <andr...@lehmi.de.invalid> wrote:

> Hi,
>
> is there any chance to get hold of the PDF in question?
>
> Which version of PDFBox are you using?
>
> Andreas
>
> On 17.02.25 at 17:16, mountain the blue wrote:
> > hi,
> >
> > first of all, many thanks to the contributors of the PDFBox project, which
> > I've been using for a long time for anything relating to PDF in Java.
> >
> > I am using PDFBox to process various PDF files.
> > Lately, I received a file whose parsing failed, i.e.:
> > ...
> > Exception in thread "main" java.io.IOException: Error: Unknown annotation type COSInt{49633506}
> >     at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation.createAnnotation(PDAnnotation.java:198)
> >     at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:696)
> >     at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:663)
> > ...
> >
> > Looking further into this error, I found that it came from the parsing of
> > /ObjStm ...
> > that expects each object serialised in the stream to be followed by a
> > separator (i.e. white space), while this PDF had some COS objects
> > serialised without such a separation.
> >
> > In the current code base, accessible on GitHub, the following test passes:
> >
> >     @Test
> >     void testParse2NumberObjects() throws IOException
> >     {
> >         COSStream stream = new COSStream();
> >         stream.setItem(COSName.N, COSInteger.TWO);
> >         stream.setItem(COSName.FIRST, COSInteger.get(8));
> >         OutputStream outputStream = stream.createOutputStream();
> >         // 8-byte header "6 0 4 2 " (= /First): object 6 at offset 0, object 4 at offset 2
> >         outputStream.write("6 0 4 2 1 2".getBytes());
> >         outputStream.close();
> >         PDFObjectStreamParser objectStreamParser = new PDFObjectStreamParser(stream, null);
> >         Map<COSObjectKey, COSBase> objectNumbers = objectStreamParser.parseAllObjects();
> >         assertEquals(2, objectNumbers.size());
> >         assertEquals(COSInteger.get(1), objectNumbers.get(new COSObjectKey(6, 0)));
> >         assertEquals(COSInteger.get(2), objectNumbers.get(new COSObjectKey(4, 0)));
> >     }
> >
> > while this one fails:
> >
> >     @Test
> >     void testParse2NumberObjectsNoSpace() throws IOException
> >     {
> >         COSStream stream = new COSStream();
> >         stream.setItem(COSName.N, COSInteger.TWO);
> >         stream.setItem(COSName.FIRST, COSInteger.get(8));
> >         OutputStream outputStream = stream.createOutputStream();
> >         // 8-byte header "6 0 4 1 ": object 6 at offset 0, object 4 at offset 1,
> >         // so the two values are written as "12" with no separator
> >         outputStream.write("6 0 4 1 12".getBytes());
> >         outputStream.close();
> >         PDFObjectStreamParser objectStreamParser = new PDFObjectStreamParser(stream, null);
> >         Map<COSObjectKey, COSBase> objectNumbers = objectStreamParser.parseAllObjects();
> >         assertEquals(2, objectNumbers.size());
> >         assertEquals(COSInteger.get(1), objectNumbers.get(new COSObjectKey(6, 0)));
> >         assertEquals(COSInteger.get(2), objectNumbers.get(new COSObjectKey(4, 0)));
> >     }
> >
> > with error:
> > org.opentest4j.AssertionFailedError:
> > Expected :COSInt{1}
> > Actual   :COSInt{12}
> >
> >     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> >     ...
> >     at org.apache.pdfbox.pdfparser.PDFObjectStreamParserTest.testParse2NumberObjectsNoSpace(PDFObjectStreamParserTest.java:103)
> >     ...
> >
> > Notes:
> >
> > a- The second object (number 4) now has 1 as its offset, so the values
> > '1' and '2' are 'joined' (written as '12' with no separator).
> >
> > b- The file was created in November last year and converted from Word to
> > PDF by 'Adobe Acrobat Pro (64-bit) 24 Paper Capture Plug-in': I do expect
> > to see such (valid) PDF constructions more often in the (near) future.
> >
> > @ (Tilman & Andreas): I was able to get PDFBox working by changing the
> > PDFObjectStreamParser implementation: I rewrote the
> > privateReadObjectOffsets() method to return an array and used a parser
> > that does not read past the implicit limit given by the next object's
> > offset. Let me know if you want access to this change.
> >
> > Thank you,
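For illustration only, here is a minimal standalone sketch of the bounded-parsing idea described above, in plain Java without PDFBox classes. It is not the actual patch: the class and method names (ObjStmSketch, readObjectOffsets, parseUntilWhitespace, parseBounded) are hypothetical, only the header pairs are parsed, and each object is returned as raw text rather than as a COS object. The point it shows is that each object is cut at the next object's offset instead of at the next white space.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone sketch, not PDFBox code: objects in a decompressed
// /ObjStm are delimited by the header offsets rather than by white space.
final class ObjStmSketch {

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP
    static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // Reads the n "objectNumber offset" pairs of the header into an array.
    static long[][] readObjectOffsets(byte[] data, int n) {
        long[][] pairs = new long[n][2];
        int pos = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < 2; j++) {
                while (pos < data.length && isPdfWhitespace(data[pos])) {
                    pos++;
                }
                long value = 0;
                while (pos < data.length && data[pos] >= '0' && data[pos] <= '9') {
                    value = value * 10 + (data[pos++] - '0');
                }
                pairs[i][j] = value;
            }
        }
        return pairs;
    }

    // Failing behaviour: each object is read from its offset up to the next white space.
    static Map<Long, String> parseUntilWhitespace(byte[] data, int n, int first) {
        Map<Long, String> result = new LinkedHashMap<>();
        for (long[] pair : readObjectOffsets(data, n)) {
            int start = first + (int) pair[1];
            int pos = start;
            while (pos < data.length && !isPdfWhitespace(data[pos])) {
                pos++;
            }
            result.put(pair[0], new String(data, start, pos - start, StandardCharsets.ISO_8859_1));
        }
        return result;
    }

    // Bounded behaviour: an object ends where the next object (by offset) starts;
    // the last object runs to the end of the data. Surrounding white space is trimmed.
    static Map<Long, String> parseBounded(byte[] data, int n, int first) {
        long[][] pairs = readObjectOffsets(data, n);
        Arrays.sort(pairs, Comparator.comparingLong(p -> p[1]));
        Map<Long, String> result = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            int start = first + (int) pairs[i][1];
            int end = (i + 1 < n) ? first + (int) pairs[i + 1][1] : data.length;
            String raw = new String(data, start, end - start, StandardCharsets.ISO_8859_1);
            result.put(pairs[i][0], raw.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same data as testParse2NumberObjectsNoSpace: N = 2, First = 8
        byte[] data = "6 0 4 1 12".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseUntilWhitespace(data, 2, 8)); // {6=12, 4=2}
        System.out.println(parseBounded(data, 2, 8));         // {6=1, 4=2}
    }
}
```

Running main prints {6=12, 4=2} for the whitespace-delimited variant, which reproduces the wrong value 12 for object 6 seen in the failing test, and {6=1, 4=2} for the bounded variant. Sorting the pairs by offset is a defensive choice of this sketch so that "the next offset" always means the next object in the stream.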