Re: PDFParser Conflict Resolution

Cary L. Schofield Mon, 24 Feb 2014 14:15:48 -0800

Thanks for your reply. I have followed your recommendation. There was a TODO in the NonSequentialParser indicating that signature contents are not encrypted and thus should not be decrypted. I have added code to skip decryption in this case, and my documents now seem to be parsed correctly.


Thanks again.



On 02/22/2014 09:23 AM, Maruan Sahyoun wrote:

Hi,

the PDFParser works sequentially through the file from top to bottom and 
collects all objects. Conflict resolution assumes that if an object with the 
same number appears again later in the file, the later one is the correct 
one.

NonSequentialParser works through the file by looking at the Xref information 
(table or stream). This is in line with the PDF specification.
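
To illustrate the difference, here is a minimal sketch in plain Java. It does 
not use the PDFBox API; RawObject, sequentialScan and xrefLookup are 
hypothetical names made up for this example, and the sample data is invented. 
It only shows how "last object with a given number wins" can disagree with 
"the object the Xref points to wins".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParserStrategies {

    // Hypothetical stand-in for an object found in the file:
    // its object number, its byte offset, and its content.
    static final class RawObject {
        final int number;
        final long offset;
        final String body;

        RawObject(int number, long offset, String body) {
            this.number = number;
            this.offset = offset;
            this.body = body;
        }
    }

    // Sequential strategy: walk the objects in file order; a later object
    // with the same number overwrites the earlier one.
    static Map<Integer, String> sequentialScan(List<RawObject> objectsInFileOrder) {
        Map<Integer, String> pool = new HashMap<>();
        for (RawObject o : objectsInFileOrder) {
            pool.put(o.number, o.body);
        }
        return pool;
    }

    // Xref strategy: only load the object stored at the offset that the
    // cross-reference information names for each object number.
    static Map<Integer, String> xrefLookup(List<RawObject> objects, Map<Integer, Long> xref) {
        Map<Long, RawObject> byOffset = new HashMap<>();
        for (RawObject o : objects) {
            byOffset.put(o.offset, o);
        }
        Map<Integer, String> pool = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : xref.entrySet()) {
            RawObject o = byOffset.get(entry.getValue());
            if (o != null && o.number == entry.getKey()) {
                pool.put(entry.getKey(), o.body);
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        // Invented sample: object 5 occurs twice, and the Xref points at the first copy.
        List<RawObject> objects = Arrays.asList(
                new RawObject(5, 100L, "form with fields"),
                new RawObject(5, 900L, "empty form"));
        Map<Integer, Long> xref = new HashMap<>();
        xref.put(5, 100L);

        System.out.println("sequential: " + sequentialScan(objects).get(5));
        System.out.println("xref:       " + xrefLookup(objects, xref).get(5));
    }
}
```

With this sample data the sequential scan keeps the copy at offset 900, while 
the Xref lookup keeps the copy at offset 100, so the two strategies can return 
different content for the same object number.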

So patching as you’ve done might resolve your issue but might also introduce 
issues with other files. The best way would be to find out why 
NonSequentialParser has issues parsing your file. If you think it’s a bug 
please open an issue in JIRA [https://issues.apache.org/jira/browse/PDFBOX] and 
attach the PDF file together with some sample code.

BR
Maruan Sahyoun

On 21.02.2014 at 23:47, Cary L. Schofield <[email protected]> wrote:

I have a signed document that is getting parsed incorrectly.

Using PDFParser, the document form is missing all fields and I can't get to 
the signature fields.
Using NonSequentialPDFParser, I can get to the signature fields but the signed 
data appears to have been corrupted.

I was able to determine that the form was being replaced or corrupted during 
conflict resolution.

I solved the problem by patching PDFParser.ConflictObj to ignore an object in 
the conflict list when the existing object (from the object pool) is a direct 
object.

I know I should do the research, but was hoping someone would already know if 
the patch is reasonable or likely to cause more/other problems.

Thanks
