Re: Signed PDF with non-encrypted headers causes issue in PDFBox 2.0.9

Tilman Hausherr Fri, 27 Apr 2018 02:54:19 -0700

On 25.04.2018 at 14:20, Evert-Jan de Bruin wrote:

Hi,


Okay, so it seems the document header is incorrect according to the official 
standards.

But this document is not some minor exception, because this is an official 
document from the Dutch government proving your educational history. We have 
tens of thousands of these documents to process in our system. We can't ask the 
government to change their PDFs and have 10.000+ students re-request their 
certificate because PDFBox can't handle the (incorrect) header :-)

Readers like Adobe (or others) don't complain either.


I tried displaying the file with GSView 6.0 and it froze.

And it's not just the document information... I see that the strings in the output intents are also in clear text.

And yes, you can ask your government to create their 10.000+ PDF files in a correct way. They (or their "inexpensive external contractor") have been using iText 2.1.7, which is 9 years old.

The file claims to be PDF/A-1b but it is not (because it is encrypted). Test it here

https://www.pdf-online.com/osa/validate.aspx
or with veraPDF.
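
For screening a large batch before running a real validator, the conflict described above (a PDF/A claim together with encryption) can be approximated with a raw byte scan. This is only a rough sketch under stated assumptions: the class and method names are hypothetical, it uses plain Java rather than PDFBox or veraPDF, and it assumes the trailer and the XMP packet are stored uncompressed and in clear text, so that the `/Encrypt` key and the `pdfaid:part` marker are visible in the raw bytes. It is not a substitute for the validators mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of PDFBox or veraPDF.
class PdfaEncryptScreen {

    // Matches the /Encrypt key but not longer names such as /EncryptMetadata.
    private static final Pattern ENCRYPT_KEY =
            Pattern.compile("/Encrypt(?![A-Za-z0-9])");

    // The XMP property through which a file declares its PDF/A part.
    private static final Pattern PDFA_CLAIM = Pattern.compile("pdfaid:part");

    // Heuristic: true if the raw bytes contain both a PDF/A claim and an
    // /Encrypt key. Assumes both are readable without decoding any stream.
    static boolean claimsPdfaButEncrypted(byte[] raw) {
        // ISO-8859-1 maps every byte to one char, so nothing is lost.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        return ENCRYPT_KEY.matcher(text).find()
                && PDFA_CLAIM.matcher(text).find();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfaEncryptScreen <file.pdf>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(claimsPdfaButEncrypted(raw)
                ? "claims PDF/A but has an /Encrypt key"
                : "no conflict detected by this heuristic");
    }
}
```

A hit only means the file deserves a proper check with a validator; a miss proves nothing, since compressed cross-reference streams or an encrypted XMP packet would hide both markers.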





Tilman

Isn't there some way to make PDFBox work with these kinds of documents?

Thanks,
Evert-Jan de Bruin

-----Original Message-----
From: Tilman Hausherr <[email protected]>
Sent: Tuesday, 24 April 2018 21:36
To: [email protected]
Subject: Re: Signed PDF with non-encrypted headers causes issue in PDFBox 2.0.9

I had a quick look... yes, the document info is unencrypted, which is incorrect.
EncryptMetadata is false, but this applies only to XMP metadata streams.
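
The rule stated above can be written down as a tiny decision function. This is a sketch for illustration only: the enum and method names are hypothetical and are not PDFBox API, and the flag is the `/EncryptMetadata` entry of the PDF encryption dictionary. It covers just the three kinds of content discussed in this thread.

```java
// Hypothetical illustration; these names are not PDFBox API.
class EncryptMetadataScope {

    enum Part {
        INFO_DICTIONARY_STRING,
        OUTPUT_INTENT_STRING,
        XMP_METADATA_STREAM
    }

    // In an encrypted document, EncryptMetadata=false exempts only the XMP
    // metadata stream; the other parts are still expected to be encrypted.
    static boolean expectedEncrypted(Part part, boolean encryptMetadata) {
        if (part == Part.XMP_METADATA_STREAM) {
            return encryptMetadata;
        }
        return true;
    }

    public static void main(String[] args) {
        // Show what a reader should expect when EncryptMetadata is false.
        for (Part part : Part.values()) {
            System.out.println(part + " -> " + expectedEncrypted(part, false));
        }
    }
}
```

With the flag set to false, only `XMP_METADATA_STREAM` comes out as not encrypted, which is why clear-text document info strings in an otherwise encrypted file count as an error.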

Tilman

On 24.04.2018 at 12:48, Evert-Jan de Bruin wrote:

Hello,

For my project I have to merge PDF files together. This usually works
fine, but it does not always work with digitally signed PDF files.

Simply calling load() on the document already fails with an
InvalidBlockSizeException. Here is an example document:
https://ufile.io/mgshz

I went into the PDFBox code, and the issue seems to be that it detects
AES encryption in the PDF due to the digital signature, but then
assumes everything is encrypted and needs to be decrypted. However,
the headers are **not** encrypted so decryption fails.
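
The failure mode described above can be reproduced without PDFBox. The sketch below is an illustration under assumptions, not PDFBox's actual code path: the class name, the all-zero key, and the sample string are made up. It treats a clear-text string as if it were an AES-encrypted PDF string (a 16-byte IV followed by AES-CBC ciphertext) and tries to decrypt it. The JDK reports the resulting error as `javax.crypto.IllegalBlockSizeException`, which is presumably the exception meant in the message.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class; not part of PDFBox.
class ClearTextDecryptDemo {

    // Treats the first 16 bytes as the IV and the rest as AES-CBC ciphertext.
    // Returns "ok", or the simple name of the exception that was thrown.
    static String tryDecrypt(byte[] data, byte[] key) {
        if (data.length < 16) {
            return "TooShortForIv";
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE,
                    new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(Arrays.copyOfRange(data, 0, 16)));
            cipher.doFinal(data, 16, data.length - 16);
            return "ok";
        } catch (GeneralSecurityException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero key, for illustration only
        // Made-up clear-text value, standing in for an unencrypted info string.
        byte[] clearText = "Example producer string, not encrypted"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(tryDecrypt(clearText, key));
    }
}
```

Here the 38-byte string leaves 22 bytes after the IV, which is not a multiple of the 16-byte AES block size, so the run prints `IllegalBlockSizeException`. A clear-text string whose remainder happens to be a multiple of 16 would typically fail at the padding check instead.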

I can get it all to work by going to PDFParser.java and disabling
these three lines in prepareDecryption():

    // securityHandler = encryption.getSecurityHandler();
    // securityHandler.prepareForDecryption(encryption,
    //         document.getDocumentID(), decryptionMaterial);
    // accessPermission = securityHandler.getCurrentAccessPermission();

However, this is of course very ugly as decryption is now totally
disabled. I also get warnings about offset issues but the end result
seems fine.

Is there a more elegant solution or is this really a bug?

It seems to be a repetition of
https://issues.apache.org/jira/browse/PDFBOX-3229, which should have
been fixed in 2.0.0 but still occurs in 2.0.9.

Regards,

Evert-Jan de Bruin

CACI bv
www.osiris-ho.nl
De Ruyterkade 7
1013 AA Amsterdam
088 - 654 3594
[email protected]

