[jira] [Resolved] (PDFBOX-3541) Use /L entry to determine if a linearized file shall be treated as such for PDF/A validation

2016-10-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-3541.
-
Resolution: Fixed
  Assignee: Tilman Hausherr

> Use /L entry to determine if a linearized file shall be treated as such for 
> PDF/A validation
> 
>
> Key: PDFBOX-3541
> URL: https://issues.apache.org/jira/browse/PDFBOX-3541
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Preflight
>Affects Versions: 2.0.3
>Reporter: Maruan Sahyoun
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.4, 2.1.0
>
>
> With PDFBOX-3540 the detection of a linearized file which has later been 
> updated for PDF/A validation was improved so that provisions can be properly 
> applied or ignored. That could be improved by checking the /L entry of the 
> linearization dictionary. The *ISO 19005-1:2005/Cor.2:2011* has this:
> {quote}
> In a linearized PDF, if the ID keyword is present in both the first page 
> trailer dictionary and the last
> trailer dictionary, the value to both instances of the ID keyword shall be 
> identical.
> ...
> This provision shall not apply where the value to the L key in the 
> linearization dictionary does not match the actual length of the PDF.
> {quote} 
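
The /L comparison described in the issue can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Preflight code from the commit: the class name LinearizationLengthCheck, the method treatAsLinearized, the 1024-byte search window and the regex lookup are all assumptions made for this example. A real implementation would read /L from the parsed linearization dictionary instead of scanning raw bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class LinearizationLengthCheck {
    // Matches "/L 12345"; "/Linearized" and "/Length" do not match because
    // whitespace followed by digits must come directly after "/L".
    private static final Pattern L_ENTRY = Pattern.compile("/L\\s+(\\d+)");

    /**
     * Returns true only if the head of the file declares /Linearized and the
     * /L value equals the actual file length. A mismatch suggests the file
     * was changed (e.g. by an incremental update) after linearization.
     */
    static boolean treatAsLinearized(byte[] pdf) {
        int headLength = Math.min(pdf.length, 1024); // assumed search window
        String head = new String(pdf, 0, headLength, StandardCharsets.ISO_8859_1);
        if (!head.contains("/Linearized")) {
            return false;
        }
        Matcher matcher = L_ENTRY.matcher(head);
        return matcher.find() && Long.parseLong(matcher.group(1)) == pdf.length;
    }

    public static void main(String[] args) {
        String prefix = "%PDF-1.4\n1 0 obj\n<</Linearized 1/L ";
        String suffix = ">>\nendobj\n%%EOF\n";
        // The total length is assumed to have two digits, which holds for this tiny sample.
        int total = prefix.length() + 2 + suffix.length();
        String original = prefix + total + suffix;
        String updated = original + "% appended by an incremental update\n";

        System.out.println(treatAsLinearized(original.getBytes(StandardCharsets.ISO_8859_1))); // true
        System.out.println(treatAsLinearized(updated.getBytes(StandardCharsets.ISO_8859_1)));  // false
    }
}
```

With such a check, the ID provision quoted above would only be enforced when the method returns true.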



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-3541) Use /L entry to determine if a linearized file shall be treated as such for PDF/A validation

2016-10-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3541:

Affects Version/s: 2.0.3







[jira] [Commented] (PDFBOX-3541) Use /L entry to determine if a linearized file shall be treated as such for PDF/A validation

2016-10-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608970#comment-15608970
 ] 

ASF subversion and git services commented on PDFBOX-3541:
-

Commit 1766703 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1766703 ]

PDFBOX-3541: also use length to determine whether it's a linearized file







[jira] [Commented] (PDFBOX-3541) Use /L entry to determine if a linearized file shall be treated as such for PDF/A validation

2016-10-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608969#comment-15608969
 ] 

ASF subversion and git services commented on PDFBOX-3541:
-

Commit 1766702 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1766702 ]

PDFBOX-3541: also use length to determine whether it's a linearized file







[jira] [Commented] (PDFBOX-3542) Can PDFBOX use Streams to read PDSignatures from document?

2016-10-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608930#comment-15608930
 ] 

Tilman Hausherr commented on PDFBOX-3542:
-

I don't have any better ideas now. The reason that PDFBox loads all document 
structures for signing is that PDFBox doesn't have "parse on demand". PDFBox 
must load the document because the signature field + annotation must be 
appended in a way that conforms with the existing structures.

If you have a non-confidential file, I could have a look at whether there is 
some optimization that we missed. But don't expect any miracles, and this may 
take a few days.

Here's a list of huge files that I have:
{code}
72.168.407 475419.pdf
71.460.416 620038.pdf
50.820.260 302439.pdf
49.749.322 209086.pdf
46.733.747 755045.pdf
38.696.108 503657.pdf
37.580.965 767115.pdf
37.148.455 942416.pdf
36.775.297 364591.pdf
31.845.407 240242.pdf
31.196.981 574442.pdf
30.773.313 560466.pdf
30.397.179 134823.pdf
27.228.247 234570.pdf
26.519.885 071300.pdf
26.338.972 884613.pdf
26.262.215 022391.pdf
25.805.316 160655.pdf
25.465.233 898927.pdf
23.065.331 509787.pdf
22.805.751 125112.pdf
22.381.412 486395.pdf
21.718.527 510488.pdf
21.486.589 586504.pdf
{code}
These files can be found here:
http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
The first three digits of each file name in my list (e.g. 586504.pdf, whose 
size is 21.486.589 bytes) give the name of the zip file (586.zip).
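
The naming rule above can be written as a small helper. This is only a sketch: the class name GovdocsZipName and the method zipNameFor are made up for this example, and it assumes every file name in the list starts with at least three characters identifying the archive.

```java
// Hypothetical helper illustrating the naming rule; not part of PDFBox.
final class GovdocsZipName {
    /** Maps a corpus file name such as "586504.pdf" to its archive name "586.zip". */
    static String zipNameFor(String pdfFileName) {
        if (pdfFileName.length() < 3) {
            throw new IllegalArgumentException("file name too short: " + pdfFileName);
        }
        // The first three digits of the file name are the zip file name.
        return pdfFileName.substring(0, 3) + ".zip";
    }

    public static void main(String[] args) {
        System.out.println(zipNameFor("586504.pdf")); // 586.zip
        System.out.println(zipNameFor("071300.pdf")); // 071.zip
    }
}
```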

> Can PDFBOX use Streams to read PDSignatures from document?
> --
>
> Key: PDFBOX-3542
> URL: https://issues.apache.org/jira/browse/PDFBOX-3542
> Project: PDFBox
>  Issue Type: Wish
>  Components: PDModel
>Affects Versions: 2.0.3
>Reporter: Andrea Paternesi
>Priority: Critical
>
> I did not find a way to avoid loading the whole PDDocument into memory to 
> read the signature dictionaries.
> If you have very big PDF files (30MB or more), Java gets an Out of Memory 
> error.
> So far I have not found a correct way to load signatures using a stream.
> Can you give any hint?
> Thanks in advance.
> Andrea.






[jira] [Commented] (PDFBOX-3544) Invalid ByteRange for getContents() method

2016-10-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608927#comment-15608927
 ] 

Tilman Hausherr commented on PDFBOX-3544:
-

I've renamed the confusing variable names in PDFBOX-2852 ("len" instead of 
"end").

> Invalid ByteRange for getContents() method
> --
>
> Key: PDFBOX-3544
> URL: https://issues.apache.org/jira/browse/PDFBOX-3544
> Project: PDFBox
>  Issue Type: Bug
>  Components: Signing
>Affects Versions: 2.0.3
>Reporter: Lonzak
>
> PDSignature.java class, getContents() method, line 325ff.
> {code:title=PDSignature.java|borderStyle=solid}
> /**
>  * Will return the embedded signature between the byterange gap.
>  *
>  * @param pdfFile The signed pdf file as byte array
>  * @return a byte array containing the signature
>  * @throws IOException if the pdfFile can't be read
>  */
> public byte[] getContents(byte[] pdfFile) throws IOException
> {
>     int[] byteRange = getByteRange();
>     int begin = byteRange[0]+byteRange[1]+1;
>     int end = byteRange[2]-begin;
>     return getContents(new COSFilterInputStream(pdfFile,new int[] {begin,end}));
> }
> {code:}
> Let's assume a byte range of 
> /ByteRange[ 0, 840, 960, 240]
> The current implementation would return
> {841, 119} which is from *841 - 960*
> According to 
> [adobe|http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/DigitalSignaturesInPDF.pdf]
>  (page 5) this is invalid:
> {quote}
> "In this example, the hash is calculated for bytes 0 through 839, and 960 
> through 1200."
> {quote}
> Thus the values for the signature should be
> {840, 119} which is from *840 - 959*
> The implementation should be:
> {code:title=PDSignature.java|borderStyle=solid}
> /**
>  * Will return the embedded signature between the byterange gap.
>  *
>  * @param pdfFile The signed pdf file as byte array
>  * @return a byte array containing the signature
>  * @throws IOException if the pdfFile can't be read
>  */
> public byte[] getContents(byte[] pdfFile) throws IOException
> {
>     int[] byteRange = getByteRange();
>     int begin = byteRange[0]+byteRange[1];
>     int end = byteRange[2]-begin-1;
>     return getContents(new COSFilterInputStream(pdfFile,new int[] {begin,end}));
> }
> {code:}






[jira] [Commented] (PDFBOX-2852) Improve code quality (2)

2016-10-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608921#comment-15608921
 ] 

ASF subversion and git services commented on PDFBOX-2852:
-

Commit 1766696 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1766696 ]

PDFBOX-2852: rename misleading variable names

> Improve code quality (2)
> 
>
> Key: PDFBOX-2852
> URL: https://issues.apache.org/jira/browse/PDFBOX-2852
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
> Attachments: PDNameTreeNode.java.patch, XMPSchema.java.patch, 
> explicit_array_creation.patch, fix_javadoc.patch, foreach.patch, 
> noarray.patch, semicolon.patch, stringbuilder.patch, 
> unnecessary_type_casting.patch, unused_imports.patch, usestatic.patch, 
> winansiencoding.patch, winansiencoding2.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2576, which was getting too long.






[jira] [Commented] (PDFBOX-2852) Improve code quality (2)

2016-10-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608922#comment-15608922
 ] 

ASF subversion and git services commented on PDFBOX-2852:
-

Commit 1766697 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1766697 ]

PDFBOX-2852: rename misleading variable names







[jira] [Commented] (PDFBOX-3544) Invalid ByteRange for getContents() method

2016-10-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608901#comment-15608901
 ] 

Tilman Hausherr commented on PDFBOX-3544:
-

I didn't test your code change (by your own admission, it doesn't work); I 
looked at what we do: we take the offset at "<" and after the ">" and that one 
is in /ByteRange. So there has to be an adjustment to get the actual 
signature. Adobe does the same, see the signed files in PDFBOX-3540. Because 
of that we have to add 1. I agree that this contradicts the document you 
mention, which sets the offset after the "<".

You can also find a file signed with itext in PDFBOX-1751, and with Adobe in 
PDFBOX-3114.
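
The offset convention described above can be illustrated with a little arithmetic. The following is a sketch under that stated convention (the first range ends just before "<", the second range starts just after ">"); ByteRangeGap, gap and hexDigits are hypothetical names for this example and are not PDFBox API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; assumes /ByteRange [a b c d] where a+b is the offset
// of '<' and c is the offset just after '>'.
final class ByteRangeGap {
    /** Returns {offset, length} of the unsigned gap, including '<' and '>'. */
    static int[] gap(int[] byteRange) {
        int gapStart = byteRange[0] + byteRange[1];
        int gapLength = byteRange[2] - gapStart;
        return new int[] {gapStart, gapLength};
    }

    /** Returns {offset, length} of the hex digits only, without '<' and '>'. */
    static int[] hexDigits(int[] byteRange) {
        int[] g = gap(byteRange);
        // Skip the opening '<' (hence the +1) and drop both delimiters from the length.
        return new int[] {g[0] + 1, g[1] - 2};
    }

    public static void main(String[] args) {
        int[] reported = {0, 840, 960, 240};
        System.out.println(Arrays.toString(gap(reported)));       // [840, 120]
        System.out.println(Arrays.toString(hexDigits(reported))); // [841, 118]

        // Tiny synthetic "file": '<' at offset 4, '>' at offset 9.
        byte[] file = "AAAA<0a1b>BBBB".getBytes(StandardCharsets.US_ASCII);
        int[] digits = hexDigits(new int[] {0, 4, 10, 4});
        System.out.println(new String(file, digits[0], digits[1], StandardCharsets.US_ASCII)); // 0a1b
    }
}
```

Under this convention, byte 840 in the reported example is the "<" itself, which is why the start offset has to be moved by one to reach the first hex digit.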







[jira] [Comment Edited] (PDFBOX-3544) Invalid ByteRange for getContents() method

2016-10-26 Thread Lonzak (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608607#comment-15608607
 ] 

Lonzak edited comment on PDFBOX-3544 at 10/26/16 2:33 PM:
--

Or is that document wrong? Strangely I get an AIOOB Exception if I try that 
version...


was (Author: teewetee):
Or is that document wrong?







[jira] [Commented] (PDFBOX-3544) Invalid ByteRange for getContents() method

2016-10-26 Thread Lonzak (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608607#comment-15608607
 ] 

Lonzak commented on PDFBOX-3544:


Or is that document wrong?







[jira] [Created] (PDFBOX-3544) Invalid ByteRange for getContents() method

2016-10-26 Thread TvT (JIRA)
TvT created PDFBOX-3544:
---

 Summary: Invalid ByteRange for getContents() method
 Key: PDFBOX-3544
 URL: https://issues.apache.org/jira/browse/PDFBOX-3544
 Project: PDFBox
  Issue Type: Bug
  Components: Signing
Affects Versions: 2.0.3
Reporter: TvT


PDSignature.java class, getContents() method, line 325ff.

{code:title=PDSignature.java|borderStyle=solid}
/**
 * Will return the embedded signature between the byterange gap.
 *
 * @param pdfFile The signed pdf file as byte array
 * @return a byte array containing the signature
 * @throws IOException if the pdfFile can't be read
 */
public byte[] getContents(byte[] pdfFile) throws IOException
{
    int[] byteRange = getByteRange();
    int begin = byteRange[0]+byteRange[1]+1;
    int end = byteRange[2]-begin;

    return getContents(new COSFilterInputStream(pdfFile,new int[] {begin,end}));
}
{code:}
Let's assume a byte range of 
/ByteRange[ 0, 840, 960, 240]

The current implementation would return
{841, 119} which is from *841 - 960*

According to 
[adobe|http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/DigitalSignaturesInPDF.pdf]
 (page 5) this is invalid:
{quote}
"In this example, the hash is calculated for bytes 0 through 839, and 960 
through 1200."
{quote}
Thus the values for the signature should be
{840, 119} which is from *840 - 959*

The implementation should be:
{code:title=PDSignature.java|borderStyle=solid}
/**
 * Will return the embedded signature between the byterange gap.
 *
 * @param pdfFile The signed pdf file as byte array
 * @return a byte array containing the signature
 * @throws IOException if the pdfFile can't be read
 */
public byte[] getContents(byte[] pdfFile) throws IOException
{
    int[] byteRange = getByteRange();
    int begin = byteRange[0]+byteRange[1];
    int end = byteRange[2]-begin-1;

    return getContents(new COSFilterInputStream(pdfFile,new int[] {begin,end}));
}
{code:}






[jira] [Resolved] (PDFBOX-3532) Java 6 errors

2016-10-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-3532.
-
Resolution: Fixed

> Java 6 errors
> -
>
> Key: PDFBOX-3532
> URL: https://issues.apache.org/jira/browse/PDFBOX-3532
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12, 2.0.3
>Reporter: simon steiner
>Assignee: Tilman Hausherr
> Fix For: 1.8.13, 2.0.4, 2.1.0
>
>
> Under java 6 and 8 and clean ~/.m2 directory:
> mvn clean install -DskipTests
> Downloading: 
> http://www.pdfa.org/wp-content/uploads/2011/08/isartor-pdfa-2008-08-13.zip
> javax.net.ssl.SSLHandshakeException: 
> sun.security.validator.ValidatorException: PKIX path building failed: 
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
> valid certification path to requested target
> Java 6 only:
> [ERROR] 
> pdf-box-svn/pdfbox/src/test/java/org/apache/pdfbox/pdmodel/TestPDDocument.java:[205,41]
>  cannot find symbol
> [ERROR] symbol  : class Builder
> [ERROR] location: class java.util.Locale






[jira] [Commented] (PDFBOX-3540) Trailer Syntax error, ID is different in the first and the last trailer - for PDF with incremental updates

2016-10-26 Thread Maya Angelova (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607765#comment-15607765
 ] 

Maya Angelova commented on PDFBOX-3540:
---

Well, I have the file as an attachment in a mail, my tests extract the 
attachment, and thereafter perform the validation... I'll be able to look into 
it in a few hours and write the result here. Thank you for your effort!

> Trailer Syntax error, ID is different in the first and the last trailer - for 
> PDF with incremental updates
> --
>
> Key: PDFBOX-3540
> URL: https://issues.apache.org/jira/browse/PDFBOX-3540
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 1.8.12, 2.0.3
>Reporter: Maruan Sahyoun
>Assignee: Tilman Hausherr
> Fix For: 2.0.4, 2.1.0
>
> Attachments: Pardes13_Rez02.pdf, testfile_original.pdf, 
> testfile_signed_once.pdf, testfile_signed_twice.pdf
>
>
> As reported at the users mailing list:
> 
> Hello guys,
> I have the following problem using apache.pdfbox when validating a valid 
> PDF/A-1 file, which is being signed twice:
> 1. The online validator confirms that the file is valid 
> (https://www.pdf-tools.com/pdf/validate-pdfa-online.aspx)
> 2. But when I validate it using the following code:
> {code}
> PreflightParser parser = new PreflightParser(byteDatasource);
> parser.parse();
> PreflightDocument document = parser.getPreflightDocument();
> document.validate();
> result = document.getResult();
> {code}
> 3. The file is linearized
> 4. I get that the file is invalid and the error description reads:
> {code}
> Trailer Syntax error, ID is different in the first and the last trailer
> {code}
> According to issues PDFBOX-3256 and PDFBOX-2502 this should be fixed?
> Could anyone give me a tip how to go around this problem or would that be a 
> bug?
> The pdf file is attached.
> 
> *Analysis:*
> The original PDF is linearized with a subsequent incremental update.
> According to ISO 32000-1 F1
> {quote}
> Incremental update shall still be permitted, but the resulting PDF is no 
> longer linearized and subsequently shall be treated as ordinary PDF. 
> Linearizing it again may require reprocessing the entire file; see G.7, 
> "Accessing an Updated File" for details.
> {quote}
> As the file shall no longer be treated as linearized, the provision about 
> matching IDs as outlined in PDFBOX-2502 no longer applies.






[jira] [Commented] (PDFBOX-3540) Trailer Syntax error, ID is different in the first and the last trailer - for PDF with incremental updates

2016-10-26 Thread Maya Angelova (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607763#comment-15607763
 ] 

Maya Angelova commented on PDFBOX-3540:
---

Well, I have the file as an attachment in a mail, my tests extract the 
attachment, and thereafter perform the validation... I'll be able to look into 
it in a few hours and write the result here. Thank you for your effort!







[jira] [Issue Comment Deleted] (PDFBOX-3540) Trailer Syntax error, ID is different in the first and the last trailer - for PDF with incremental updates

2016-10-26 Thread Maya Angelova (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maya Angelova updated PDFBOX-3540:
--
Comment: was deleted

(was: Well, I have the file as an attachment in a mail, my tests extract the 
attachment, and thereafter perform the validation... I'll be able to look into 
it in a few hours and write the result here. Thank you for your effort!)

> Trailer Syntax error, ID is different in the first and the last trailer - for 
> PDF with incremental updates
> --
>
> Key: PDFBOX-3540
> URL: https://issues.apache.org/jira/browse/PDFBOX-3540
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 1.8.12, 2.0.3
>Reporter: Maruan Sahyoun
>Assignee: Tilman Hausherr
> Fix For: 2.0.4, 2.1.0
>
> Attachments: Pardes13_Rez02.pdf, testfile_original.pdf, 
> testfile_signed_once.pdf, testfile_signed_twice.pdf
>
>






[jira] [Comment Edited] (PDFBOX-3542) Can PDFBOX use Streams to read PDSignatures from document?

2016-10-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607728#comment-15607728
 ] 

Tilman Hausherr edited comment on PDFBOX-3542 at 10/26/16 7:33 AM:
---

Loading from a file is best. To save memory, try
{code}
PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly());
{code}
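
For reference, a fuller version of the call above might look like the
following sketch. It assumes PDFBox 2.0.x on the classpath. The class name
SignatureLister and the method listSignatures are made up for illustration.
PDDocument.load(File, MemoryUsageSetting) and
PDDocument.getSignatureDictionaries() are existing 2.0.x calls, but whether
temp-file-only buffering avoids the out-of-memory error for a particular file
is not established in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;

// Hypothetical helper, for illustration only (assumes PDFBox 2.0.x).
class SignatureLister {
    /** Returns one short summary line per signature dictionary in the file. */
    public static List<String> listSignatures(File file) throws IOException {
        List<String> summaries = new ArrayList<String>();
        // Buffer parsed data in a temporary scratch file instead of main memory.
        try (PDDocument document =
                 PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly())) {
            for (PDSignature signature : document.getSignatureDictionaries()) {
                summaries.add(signature.getName() + " / " + signature.getSubFilter());
            }
        }
        return summaries;
    }
}
```

The document is still parsed as a whole; only the buffering moves from the heap
to a temporary file.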





> Can PDFBOX use Streams to read PDSignatures from document?
> --
>
> Key: PDFBOX-3542
> URL: https://issues.apache.org/jira/browse/PDFBOX-3542
> Project: PDFBox
>  Issue Type: Wish
>  Components: PDModel
>Affects Versions: 2.0.3
>Reporter: Andrea Paternesi
>Priority: Critical
>
> I did not find a way to avoid loading the whole PDDocument into memory to 
> read the signature dictionaries.
> If you have very big PDF files (30 MB or more), Java gets an out-of-memory 
> error.
> So far I have not found a correct way to load signatures using a stream.
> Can you give me any hint?
> Thanks in advance.
> Andrea.






[jira] [Commented] (PDFBOX-3542) Can PDFBOX use Streams to read PDSignatures from document?

2016-10-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607728#comment-15607728
 ] 

Tilman Hausherr commented on PDFBOX-3542:
-

Loading from a file is best. To save memory, try
{code}
PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly());
{code}








[jira] [Comment Edited] (PDFBOX-3542) Can PDFBOX use Streams to read PDSignatures from document?

2016-10-26 Thread Andrea Paternesi (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607660#comment-15607660
 ] 

Andrea Paternesi edited comment on PDFBOX-3542 at 10/26/16 7:18 AM:


I usually call PDDocument.load() with a File, and in certain circumstances it
runs out of memory.

In some cases you cannot use JVM parameters to make more memory available.
In my case we use the old Java AX Bridge to integrate some signing features
with some old MS Visual Fox code. There is no way to pass arguments in Java 7
within the bridge.

If I use a FileInputStream to load the PDDocument, will the stream be handled
differently?

All I need to do is read the signatures in order to validate them.
So I suppose I do not need to load the entire document into memory, only
extract the signature dictionary, which is very small even with many
signatures inside.

I noticed that this does not happen while signing, even with a big file.

I will try 2.0.4, but what is the scratch file method?

Any clue?
Thanks.
Andrea.















[jira] [Comment Edited] (PDFBOX-3542) Can PDFBOX use Streams to read PDSignatures from document?

2016-10-26 Thread Andrea Paternesi (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607660#comment-15607660
 ] 

Andrea Paternesi edited comment on PDFBOX-3542 at 10/26/16 7:17 AM:


I usually call PDDocument.load() with a File, and in certain circumstances it
runs out of memory.

In some cases you cannot use JVM parameters to make more memory available.
In my case we use the old Java AX Bridge to integrate some signing features
with some old MS Visual Fox code. There is no way to pass arguments in Java 7
within the bridge.

If I use a FileInputStream to load the PDDocument, will the stream be handled
differently?

All I need to do is read the signatures in order to validate them.
So I suppose I do not need to load the entire document into memory, only
extract the signature dictionary, which is very small even with many
signatures inside.

I noticed that this does not happen while signing, even with a big file.

I will try 2.0.4 and look at the scratch file method.

Any clue?
Thanks.
Andrea.















[jira] [Commented] (PDFBOX-3542) Can PDFBOX use Streams to read PDSignatures from document?

2016-10-26 Thread Andrea Paternesi (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607660#comment-15607660
 ] 

Andrea Paternesi commented on PDFBOX-3542:
--

I usually call PDDocument.load() with a File, and in certain circumstances it
runs out of memory.

In some cases you cannot use JVM parameters to make more memory available.
In my case we use the old Java AX Bridge to integrate some signing features
with some old MS Visual Fox code. There is no way to pass arguments in Java 7
within the bridge.

If I use a FileInputStream to load the PDDocument, will the stream be handled
differently?

All I need to do is read the signatures in order to validate them.
So I suppose I do not need to load the entire document into memory, only
extract the signature dictionary, which is very small even with many
signatures inside.

I noticed that this does not happen while signing, even with a big file.

Any clue?
Thanks.
Andrea.






