Hi Maruan,

Thanks for pointing out that the attachments didn't get through. Two PDF files and one patch file (containing a test case to reproduce the issue) are available here: https://www.dropbox.com/sh/291b24dstixowgt/aQTZl5j_pP

Kind regards,
Tim

-----Original Message-----
From: Maruan Sahyoun [mailto:[email protected]]
Sent: Monday, 31 March 2014 16:47
To: [email protected]
Subject: Re: PDFBox 1.8.4 and pdf's generated by MS Word

Hi Tim,

the attachment didn't make it through - could you upload it to a public location?

BR
Maruan

On 31.03.2014 at 12:56, Tim Costermans <[email protected]> wrote:

> Hello,
>
> I've written a test case to reproduce the issue (see patch).
>
> Could someone have a look at it and give me some pointers on how to solve
> this issue? I applied this patch to the 1.8.4 tag I checked out locally.
> The problem is that I don't know the PDF spec, so I don't know how to fix
> this in the PDFBox source code.
>
> Word2010.pdf is the input PDF. I open the document with PDFBox and add a
> string to it, in this case 'Hello world!'. Afterwards I save the PDF.
>
> If I look at the content of the PDF before and after I modified it (using
> Notepad++) I see this:
>
> Word2010.pdf:
> Line 647: <</Size 18/Root 1 0 R/Info 7 0 R/ID[<AE9AF29D5A22AE47B47C4DA29170BE64><AE9AF29D5A22AE47B47C4DA29170BE64>] /Prev 81972/XRefStm 81702>>
>
> modified_Word2010.pdf:
> Line 791: /XRefStm 81702
>
> XRefStm is not updated, although the original PDF had multiple revisions
> that were merged into a new PDF document.
>
> A third-party library we use depends on this XRefStm value and cannot
> open the PDF after it was modified (for the stack trace, see my previous
> message). Any help would be much appreciated.
>
> Kind regards,
>
> Tim Costermans
>
> From: Tim Costermans
> Sent: Wednesday, 26 March 2014 14:31
> To: '[email protected]'
> Subject: PDFBox 1.8.4 and pdf's generated by MS Word
>
> Hello,
>
> It seems that PDFs generated by MS Word 2010 or 2013 are a recipe for
> trouble in combination with PDFBox version 1.8.0 or 1.8.4.
> I upgraded to PDFBox 1.8.4 and one issue remains:
>
> Caused by: **thirdparty.pdf.exceptions.PDFParsingException: [offset=91308]Expected numeric object for object number
>     at **thirdparty.pdf.exceptions.PDFParsingException.newInstance(PDFParsingException.java:58)
>     at **thirdparty.pdf.io.PDFParser.throwEx(PDFParser.java:1215)
>     at **thirdparty.pdf.io.PDFParser.readCompressedCrossRefTable(PDFParser.java:805)
>     at **thirdparty.pdf.io.PDFParser.readCrossRefTable(PDFParser.java:1175)
>     at **thirdparty.pdf.PDFDocument.open(PDFDocument.java:154)
>     at **thirdparty.PDFDocument.open(PDFDocument.java:124)
>     at com.*****.sign.pdf.PDFPresigner.presign(PDFPresigner.java:24)
>     ... 26 more
>
> How to reproduce:
> 1) Fire up MS Word 2010, type some text, save as PDF.
> 2) Open this PDF file with Notepad++; you will notice the following at the
>    bottom of the file:
> ...
> trailer
> <</Size 18/Root 1 0 R/Info 7 0 R/ID[<7AE435CBC968B94F8B050F40F6D5CE5F><7AE435CBC968B94F8B050F40F6D5CE5F>] >>
> startxref
> 82089
> %%EOF
> xref
> 0 0
> trailer
> <</Size 18/Root 1 0 R/Info 7 0 R/ID[<7AE435CBC968B94F8B050F40F6D5CE5F><7AE435CBC968B94F8B050F40F6D5CE5F>] /Prev 82089/XRefStm 81819>>
> startxref
> 82605
> %%EOF
>
> Our application is trying to add an image to this PDF using PDFBox. When
> calling PDFDocument.save(), the "revisions" are merged and a new PDF is
> created.
> The newly created PDF is passed to a third party that tries to open it,
> but it fails because XRefStm is not correctly updated during save.
> Probably related to https://issues.apache.org/jira/browse/PDFBOX-1822
>
> I also tried PDFDocument.incrementalSave(), but then I run into a
> NullPointerException caused by PDFXRefStream: List<Integer> indexEntry =
> getIndexEntry(); contains two null objects (first and last are still
> null when they are added to the list).
> How do I solve this issue?
> What's the real issue here?
> I'm not in control of the PDFs that the application can receive.
>
> I also ran into the following bug, but worked around it:
> https://issues.apache.org/jira/browse/PDFBOX-1838
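
The stale /XRefStm value described in this thread can be checked without a PDF library by reading the file as raw bytes. What follows is only a minimal sketch in plain Java: the class and method names (XRefStmCheck, findXRefStmOffsets, pointsAtObject) are hypothetical and not part of PDFBox or the third-party library. It assumes that /XRefStm holds the byte offset of a cross-reference stream and that such a stream is an indirect object, so the text at that offset should start with "<number> <generation> obj".

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a PDFBox class): lists /XRefStm offsets and checks what they point at. */
class XRefStmCheck {

    // "/XRefStm <digits>" as written in a trailer dictionary.
    private static final Pattern XREF_STM = Pattern.compile("/XRefStm\\s+(\\d+)");
    // An indirect object begins with "<objNum> <genNum> obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    /** Returns every /XRefStm value found in the raw file content, in file order. */
    static List<Long> findXRefStmOffsets(String pdf) {
        List<Long> offsets = new ArrayList<>();
        Matcher m = XREF_STM.matcher(pdf);
        while (m.find()) {
            offsets.add(Long.parseLong(m.group(1)));
        }
        return offsets;
    }

    /** True if an object header starts exactly at the given offset. */
    static boolean pointsAtObject(String pdf, long offset) {
        if (offset < 0 || offset >= pdf.length()) {
            return false; // offset lies outside the file
        }
        Matcher m = OBJ_HEADER.matcher(pdf);
        m.region((int) offset, pdf.length());
        return m.lookingAt(); // must match at the region start, not later
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java XRefStmCheck <file.pdf>");
            return;
        }
        // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
        String pdf = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.ISO_8859_1);
        for (long offset : findXRefStmOffsets(pdf)) {
            System.out.println("/XRefStm " + offset + " -> "
                    + (pointsAtObject(pdf, offset)
                        ? "object header found"
                        : "no object header at this offset"));
        }
    }
}
```

Running this on both Word2010.pdf and modified_Word2010.pdf would show whether offset 81702 still lands on an object header after the save. If the report above is accurate, the modified file would be expected to fail that check, but the sketch only detects the symptom and does not repair the file.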

