Hello Andrea,
I disagree - IMHO your PDF is incorrect. "PK" means that it is a ZIP
file. Apparently with an uncompressed PDF in it (yes, ZIP can have
uncompressed files). Of course one could adjust the offsets, but this
wouldn't be right: the PDF has been modified, the PK header has been
added. Try renaming that file and then click on it to confirm my theory
that it is really a ZIP file.
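A quick programmatic way to check the same thing is to look at the first bytes of the file. The following is only a minimal sketch: the class and method names (FileSniffer, sniff) are made up for illustration and are not part of PDFBox, and it assumes the file begins with the standard ZIP local file header signature ("PK" followed by 0x03 0x04).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not a PDFBox class.
final class FileSniffer {
    // ZIP local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // A PDF is expected to start with "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies the leading bytes of a file as "zip", "pdf" or "unknown".
    static String sniff(byte[] head) {
        if (startsWith(head, ZIP_MAGIC)) {
            return "zip";
        }
        if (startsWith(head, PDF_MAGIC)) {
            return "pdf";
        }
        return "unknown";
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[8];
            int n = in.readNBytes(head, 0, head.length);
            return sniff(Arrays.copyOf(head, n));
        }
    }
}
```

If sniff(...) returns "zip" for a file with a .pdf extension, the file is a ZIP container, even if a PDF is stored inside it uncompressed.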
(I suspect you'll tell me that it validates with Adobe Reader. If so,
then I'd say Adobe is wrong. I just tried adding "XXXX" in front of a
file with NOTEPAD++ and Adobe does not report that the file was modified.)
The good thing is that there is no bug in COSFilterInputStream (I was
afraid of that), so I'll use getSignedContent() in the signature example
instead of the code I have now.
Tilman
On 09.06.2016 at 10:45, Andrea Canu wrote:
Hi Tilman
thank you for your answer.
The PDF is a real document so I can't share it, but I can give you an
extract:
Those are the first 1044 bytes of the document.
--------------------------------------------------------------
*PK ¹Js: ¼àð3£ 3£ < CAACT-00-00-08 document.pdf*%PDF-1.6
%âãÏÓ
3582 0 obj
<</Linearized 1/L 697139/O 3585/E 118808/N 42/T 625450/H [ 1000 1986]>>
endobj
xref
3582 34
0000000016 00000 n
0000003154 00000 n
0000003481 00000 n
0000003680 00000 n
0000004019 00000 n
0000004048 00000 n
0000004265 00000 n
0000004495 00000 n
0000004765 00000 n
0000004950 00000 n
0000006189 00000 n
0000007372 00000 n
0000007629 00000 n
0000060752 00000 n
0000061525 00000 n
0000062245 00000 n
0000062284 00000 n
0000062509 00000 n
0000062740 00000 n
0000062819 00000 n
0000064540 00000 n
0000064945 00000 n
0000065082 00000 n
0000065306 00000 n
0000065606 00000 n
0000072471 00000 n
0000075166 00000 n
0000078960 00000 n
0000079194 00000 n
0000079411 00000 n
0000118645 00000 n
0000118722 00000 n
0000002986 00000 n
0000001000 00000 n
trailer
<</Size 3616/Prev 625437/XRefStm 2986/Root 3583 0 R/Info 3580 0
R/ID[<A71F76F2A24FB6D888EDCB04CB86B815><6CCE97BD63E74F479ED22F39881647F0>]>>
startxref
0
%%EOF
.....
--------------------------------------------------------------
I would like to draw your attention to the first 60 bytes.
Those bytes are stripped out by the *COSParser* parser, skipped as
garbage.
The method that skips those bytes is:
COSParser.parseHeader(PDF_HEADER, PDF_DEFAULT_VERSION)
....
private static final String PDF_HEADER = "%PDF-";
I've noticed that I also have to skip those 60 bytes manually from the
*pdfInputStream* before calling the method
signature.getSignedContent(*pdfInputStream*)
That way, the digest of the returned byte array and the hash inside the
signature match.
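For illustration, the number of leading bytes could be found by searching for the "%PDF-" marker instead of hard-coding 60. This is only a sketch of the workaround: HeaderOffset and find are hypothetical names, not PDFBox API, and the scan limit is an arbitrary assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a PDFBox class.
final class HeaderOffset {
    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns the index of the first "%PDF-" that lies entirely within the
    // first maxScan bytes of data, or -1 if there is none.
    static int find(byte[] data, int maxScan) {
        int limit = Math.min(data.length, maxScan) - PDF_HEADER.length;
        for (int i = 0; i <= limit; i++) {
            boolean match = true;
            for (int j = 0; j < PDF_HEADER.length; j++) {
                if (data[i + j] != PDF_HEADER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

The returned value is the number of bytes that would have to be skipped from the stream before the signed content is extracted. A result greater than 0 also means that something was put in front of the PDF.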
Andrea
On Wed, Jun 8, 2016 at 6:06 PM, Tilman Hausherr <[email protected]>
wrote:
On 08.06.2016 at 13:27, Andrea Canu wrote:
Hi guys
I'd like to ask about the correct way to get the signed content from a
signature.
Until now I've used the PDSignature class's method:
signature.getSignedContent(*pdfInputStream*)
With this method I'm able to extract from the *pdfInputStream* the
byte array of the signed content, based on the signature's ByteRange.
I've noticed that if I try to verify the signature against that
byte array, the verification sometimes fails unexpectedly!
Hello Andrea,
Can you share the PDF (upload it)?
I doubt your theory about a bug in COSParser. I'd rather check whether
there is a bug in COSFilterInputStream.
If you can't share the PDF, then please extract the bytes "the hard way":
// needs: import java.io.RandomAccessFile;
// read the signed content described by the /ByteRange COSArray:
// [offset1 len1 offset2 len2]
int[] byteRange = sig.getByteRange();
byte[] buf = new byte[byteRange[1] + byteRange[3]];
RandomAccessFile raf = new RandomAccessFile(infile, "r");
// first range: copied to the start of buf
raf.seek(byteRange[0]);
raf.readFully(buf, 0, byteRange[1]);
// second range: appended directly after the first one
raf.seek(byteRange[2]);
raf.readFully(buf, byteRange[1], byteRange[3]);
raf.close();
This code is not fully correct, because /ByteRange might have more than 4
elements. So have a look at it to be sure.
Then compare the byte array "buf" with the one from getSignedContent.
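A variant that copes with more than 4 elements in /ByteRange could look like the following. It is a minimal sketch under assumptions: ByteRanges and extract are hypothetical names (not PDFBox API), the array is taken to be a sequence of [offset, length] pairs, and the byte[] variant assumes the whole file fits in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not a PDFBox class.
final class ByteRanges {
    // Concatenates all [offset, length] pairs of byteRange, taken from an
    // in-memory copy of the file.
    static byte[] extract(byte[] file, int[] byteRange) {
        checkPairs(byteRange);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < byteRange.length; i += 2) {
            out.write(file, byteRange[i], byteRange[i + 1]);
        }
        return out.toByteArray();
    }

    // Same as above, but reads each range directly from the file.
    static byte[] extract(RandomAccessFile raf, int[] byteRange) throws IOException {
        checkPairs(byteRange);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < byteRange.length; i += 2) {
            byte[] chunk = new byte[byteRange[i + 1]];
            raf.seek(byteRange[i]);
            raf.readFully(chunk);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    private static void checkPairs(int[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must contain offset/length pairs");
        }
    }
}
```

The result can then be compared with the array returned by getSignedContent, for example with java.util.Arrays.equals(a, b).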
Another possible reason for the failure is that there are different
signature methods. See the code at
https://svn.apache.org/viewvc/pdfbox/branches/2.0/examples/src/main/java/org/apache/pdfbox/examples/signature/ShowSignature.java?view=markup
I didn't use getSignedContent() there but I think I should. So I'd be very
interested to find out if there is a bug there.
Tilman
Now, looking at the COSParser class, I've found this method:
COSParser.parseHeader
While trying to find the document's header, this method is able to skip
some garbage in the PDF document by looking for the markers "%PDF-" and
"%FDF-".
I've noticed that the signature verification succeeds if I skip that
garbage during the signed-content extraction.
My question is:
Why isn't this garbage handling also present in the getSignedContent
code?
The workaround I found is to skip that garbage manually from the
*pdfInputStream*, but now the problem is how to correctly calculate the
offset for the *pdfInputStream*.
Any suggestions?
Kind regards
Andrea.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]