Re: Relocating git repositories on git-wip-us.apache.org

2018-12-21 Thread Maruan Sahyoun
Opened as INFRA-17488 and INFRA-17487.

> +1
> 
> Tilman
> 
> On 21.12.2018 at 14:48, Maruan Sahyoun wrote:
> > Hi,
> > 
> > Given that the move to GitBox has been smooth from what I see, I'd propose 
> > that we ask INFRA to move the remaining repos:
> > 
> > # pdfbox-jbig2 - jbig2 subproject
> > # pdfbox-testfiles - build test files for jbig2
> > 
> > WDYT?
> > 
> > BR
> > Maruan
> > 
> >   
> > > INFRA-17416 has been closed and I changed the Maven configuration for 
> > > pdfbox-docs accordingly [PDFBOX-4405]. Please change the
> > > git repo to https://gitbox.apache.org/repos/asf/pdfbox-docs.git
> > > 
> > > In addition, you can also use GitHub, after linking your ASF and GitHub 
> > > accounts [https://gitbox.apache.org/setup/], to make changes
> > > to the documentation, but for now the local build using Jekyll is still 
> > > needed.
> > > 
> > > BR
> > > Maruan
> > >   
> > > > I opened INFRA-17415 (removal of pdfbox-examples) and INFRA-17416 
> > > > (relocation of pdfbox-docs)
> > > > 
> > > > BR
> > > > Maruan
> > > > 
> > > > > On 07.12.2018 at 22:56, Andreas Lehmkuehler wrote:
> > > > > > On 07.12.18 at 18:39, Maruan Sahyoun wrote:
> > > > > > > hi,
> > > > > > > 
> > > > > > > As announced earlier, INFRA will decommission
> > > > > > > git-wip-us.apache.org in favour of the GitBox service.
> > > > > > > AFAIK we currently have four repos there:
> > > > > > > 
> > > > > > > # pdfbox-docs - our documentation
> > > > > > > # pdfbox-examples - empty
> > > > > > > # pdfbox-jbig2 - jbig2 subproject
> > > > > > > # pdfbox-testfiles - build test files for jbig2
> > > > > > > 
> > > > > > > Although there is a voluntary period until January 9th (after 
> > > > > > > which we need to move within a month), I'd propose to start
> > > > > > > right away as this will give us more time.
> > > > > > +1
> > > > > +1
> > > > > 
> > > > > 
> > > > > > > I'd propose to move the documentation first, as it's not so
> > > > > > > critical if we lose some of the commit information, and after
> > > > > > > that pdfbox-jbig2. pdfbox-examples can be retired.
> > > > > > +1
> > > > > +1
> > > > > 
> > > > > If we agree on
> > > > > 
> > > > > https://issues.apache.org/jira/browse/PDFBOX-4401
> > > > > 
> > > > > then please remove the KCMS segment in
> > > > > 
> > > > > https://pdfbox.apache.org/2.0/getting-started.html
> > > > > 
> > > > > or replace it with a warning that it applies only to older JDK 8 / 9 
> > > > > versions.
> > > > > 
> > > > > "2.0.4" in the pom segment should be replaced with 2.0.13 or "insert
> > > > > version here"
> > > > > 
> > > > > Tilman
> > > > > 
> > > > > 
> > > > > > > WDYT?
> > > > > > I somehow missed the announcement due to a suboptimal rule within my
> > > > > > mailer ;-)
> > > > > > 
> > > > > > @Maruan: are you going to handle this?
> > > > > > 
> > > > > > One more thing. I just saw that we need a documented consensus in 
> > > > > > the
> > > > > > project. We should simply start a vote once we have a plan on how to
> > > > > > proceed (Maruan already proposed a good one). IMHO lazy consensus
> > > > > > should be sufficient.
> > > > > > 
> > > > > > 
> > > > > > Andreas
> > > > > > 
> > > > > > > BR
> > > > > > > Maruan
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > -
> > > > > > > To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> > > > > > > For additional commands, e-mail: dev-h...@pdfbox.apache.org
> > > > > > > 
-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827





Re: Relocating git repositories on git-wip-us.apache.org

2018-12-21 Thread Tilman Hausherr

+1

Tilman



Re: Relocating git repositories on git-wip-us.apache.org

2018-12-21 Thread Andreas Lehmkuehler

+1

and BTW thanks for taking care of that!

Andreas



[jira] [Updated] (PDFBOX-4413) Add support for AES256 encryption for public key

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4413:

Affects Version/s: 2.0.13

> Add support for AES256 encryption for public key
> 
>
> Key: PDFBOX-4413
> URL: https://issues.apache.org/jira/browse/PDFBOX-4413
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Crypto
>Affects Versions: 2.0.13
>Reporter: Tilman Hausherr
>Priority: Major
>
> Adobe 9 added support for AES 256 encryption. This should also be implemented 
> for public key encryption; currently it is only implemented for symmetric key 
> encryption.
> Further information is available at 
> [http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf].
> I suspect that much of it is already available in the base class, but 
> PublicKeySecurityHandler.prepareDocumentForEncryption() is definitely old 
> stuff.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (PDFBOX-4413) Add support for AES256 encryption for public key

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4413:

Description: 
Adobe 9 added support for AES 256 encryption. This should also be implemented 
for public key encryption; currently it is only implemented for symmetric key 
encryption.

Further information is available at 
[http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf].

I suspect that much of it is already available in the base class, but 
PublicKeySecurityHandler.prepareDocumentForEncryption() is definitely old 
stuff.

  was:
Adobe 9 added support for AES 256 encryption. This should also be implemented 
for public key encryption, currently it is only implemented for symmetric key 
encryption.

Further information is available at 
[http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf].





Re: Relocating git repositories on git-wip-us.apache.org

2018-12-21 Thread Maruan Sahyoun
Hi,

Given that the move to GitBox has been smooth from what I see, I'd propose that 
we ask INFRA to move the remaining repos:

# pdfbox-jbig2 - jbig2 subproject
# pdfbox-testfiles - build test files for jbig2

WDYT?

BR
Maruan

 
> INFRA-17416 has been closed and I changed the Maven configuration for 
> pdfbox-docs accordingly [PDFBOX-4405]. Please change the
> git repo to https://gitbox.apache.org/repos/asf/pdfbox-docs.git
> 
> In addition, you can also use GitHub, after linking your ASF and GitHub 
> accounts [https://gitbox.apache.org/setup/], to make changes
> to the documentation, but for now the local build using Jekyll is still needed.
> 
> BR
> Maruan
>  
> > I opened INFRA-17415 (removal of pdfbox-examples) and INFRA-17416 
> > (relocation of pdfbox-docs)
> > 
> > BR
> > Maruan
> > 
> > > On 07.12.2018 at 22:56, Andreas Lehmkuehler wrote:
> > > > On 07.12.18 at 18:39, Maruan Sahyoun wrote:
> > > > > hi,
> > > > > 
> > > > > As announced earlier, INFRA will decommission git-wip-us.apache.org in 
> > > > > favour of the GitBox service. AFAIK we currently have four
> > > > > repos there:
> > > > > 
> > > > > # pdfbox-docs - our documentation
> > > > > # pdfbox-examples - empty
> > > > > # pdfbox-jbig2 - jbig2 subproject
> > > > > # pdfbox-testfiles - build test files for jbig2
> > > > > 
> > > > > Although there is a voluntary period until January 9th (after which 
> > > > > we need to move within a month), I'd propose to start right
> > > > > away as this will give us more time.
> > > > +1
> > > 
> > > +1
> > > 
> > > 
> > > > > I'd propose to move the documentation first, as it's not so critical if 
> > > > > we lose some of the commit information, and after that
> > > > > pdfbox-jbig2. pdfbox-examples can be retired.
> > > > +1
> > > 
> > > +1
> > > 
> > > If we agree on
> > > 
> > > https://issues.apache.org/jira/browse/PDFBOX-4401
> > > 
> > > then please remove the KCMS segment in
> > > 
> > > https://pdfbox.apache.org/2.0/getting-started.html
> > > 
> > > or replace it with a warning that it applies only to older JDK 8 / 9 versions.
> > > 
> > > "2.0.4" in the pom segment should be replaced with 2.0.13 or "insert 
> > > version here"
> > > 
> > > Tilman
> > > 
> > > 
> > > > > WDYT?
> > > > I somehow missed the announcement due to a suboptimal rule within my 
> > > > mailer ;-)
> > > > 
> > > > @Maruan: are you going to handle this?
> > > > 
> > > > One more thing. I just saw that we need a documented consensus in the 
> > > > project. We should simply start a vote once we have a plan on how to 
> > > > proceed (Maruan already proposed a good one). IMHO lazy consensus 
> > > > should be sufficient.
> > > > 
> > > > 
> > > > Andreas
> > > > 
> > > > > BR
> > > > > Maruan
> > > > > 
> > > > > 
> > > > > 
-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827





[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726742#comment-16726742
 ] 

Tilman Hausherr commented on PDFBOX-4407:
-

Yes, I guess so, that is the usual rhythm. The last release was earlier this 
month.

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.13
>Reporter: Dan Anderson
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: StructureTree
> Fix For: 2.0.14, 3.0.0 PDFBox
>
> Attachments: 4407.patch, reading-order-merged-bad.pdf, 
> reading-order-merged-good.pdf, reading-order.pdf
>
>
> After merging tagged documents together, the second page of the resulting 
> document is no longer valid.  When the field objects are cloned in 
> PDFMergerUtility, the new and old objects are stored in a map named 
> objMapping.  This is used to replace the old references with the new 
> references for the AcroForm, K array, and annotation list.  However, the 
> ParentTree is not updated to this new object reference.  This results in the 
> K array and the ParentTree having different references to the same object.  
> This causes issues when using an a11y reader like JAWS, and also causes 
> problems displaying the tags in Adobe DC.
> Here is a failing unit test that was created in PDFMergerUtilityTest to 
> demonstrate the issue.  It was created using an example from W3: 
> https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF3/reading-order.pdf
> {code:java}
> public void testStructureTreeMerge3() throws IOException
> {
>     PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>     PDDocument src = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     PDDocument dst = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     pdfMergerUtility.appendDocument(dst, src);
>     src.close();
>     dst.save(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     dst.close();
>     PDDocument doc = PDDocument.load(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>
>     assertTrue(checkAnnotationMatches(
>             doc.getDocumentCatalog().getStructureTreeRoot().getKArray(),
>             doc.getDocumentCatalog().getAcroForm().getFields(),
>             (COSArray) doc.getDocumentCatalog().getStructureTreeRoot().getParentTree().getCOSObject().getDictionaryObject(COSName.NUMS)));
> }
>
> private boolean checkAnnotationMatches(COSArray kArray, List<PDField> acroformFields, COSArray numbersArray) {
>     for (int i = 0; i < kArray.size(); i++) {
>         COSBase entry = kArray.get(i);
>         if (entry instanceof COSArray) {
>             COSArray entryAsArray = (COSArray) entry;
>             if (!checkAnnotationMatches(entryAsArray, acroformFields, numbersArray)) {
>                 return false;
>             }
>         } else if (entry instanceof COSInteger) {
>             //do nothing, just need to screen these out so next line doesn't blow up
>         } else if (((COSObject) entry).getObject() instanceof COSDictionary) {
>             COSDictionary entryDictionary = (COSDictionary) ((COSObject) entry).getObject();
>             if (entryDictionary.getItem(COSName.K) != null) {
>                 COSBase kids = entryDictionary.getItem(COSName.K);
>                 if (kids != null) {
>                     if (kids instanceof COSInteger) {
>                         //do nothing, don't care about marked content tags
>                     } else if (kids instanceof COSDictionary) {
>                         COSDictionary kidsAsDictionary = (COSDictionary) kids;
>                         if (!checkForMatches(kidsAsDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     } else if (kids instanceof COSArray) {
>                         COSArray kidsAsArray = (COSArray) kids;
>                         if (!checkAnnotationMatches(kidsAsArray, acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     }
>                 }
>             } else if (entryDictionary.getDictionaryObject(COSName.OBJ) != null) {
>                 if (!checkForMatches(entryDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                     return false;
>                 }
>             }
>         }
>     }
>     return true;
> }
>
> private boolean checkForMatches(COSBase objectReference, List<PDField> acroformFields, COSArray numbersArray) {
>     boolean result = false;
>     for (PDField field : acroformFields) {
>         if 
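
The mismatch described in the report above can be pictured with a small,
self-contained sketch. It deliberately uses no PDFBox classes: Node, remap
and sameReferences are hypothetical names invented for this illustration, and
the identity-keyed map only stands in for the objMapping the report mentions.
It is not the PDFMergerUtility code and not the fix from 4407.patch; it only
shows why remapping one structure but not the other leaves the two pointing
at different objects.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class ReferenceRemapSketch
{
    /** Hypothetical stand-in for a cloned PDF object; not a PDFBox class. */
    static final class Node
    {
        final String label;

        Node(String label)
        {
            this.label = label;
        }
    }

    /**
     * Returns a copy of refs in which every entry that is a key of objMapping
     * (an old object) is replaced by its clone (the new object). Entries
     * without a mapping are kept unchanged.
     */
    static List<Node> remap(List<Node> refs, Map<Node, Node> objMapping)
    {
        List<Node> result = new ArrayList<>(refs.size());
        for (Node ref : refs)
        {
            Node clone = objMapping.get(ref);
            result.add(clone != null ? clone : ref);
        }
        return result;
    }

    /** True if both lists reference the very same objects, position by position. */
    static boolean sameReferences(List<Node> a, List<Node> b)
    {
        if (a.size() != b.size())
        {
            return false;
        }
        for (int i = 0; i < a.size(); i++)
        {
            if (a.get(i) != b.get(i))
            {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        Node oldField = new Node("field");
        Node newField = new Node("field"); // clone created during the merge

        // Identity-based map: old object -> cloned object (assumed shape).
        Map<Node, Node> objMapping = new IdentityHashMap<>();
        objMapping.put(oldField, newField);

        List<Node> kArray = new ArrayList<>();
        kArray.add(oldField);
        List<Node> parentTree = new ArrayList<>();
        parentTree.add(oldField);

        // Remapping only the K array reproduces the reported mismatch.
        List<Node> kRemapped = remap(kArray, objMapping);
        System.out.println(sameReferences(kRemapped, parentTree)); // false

        // Remapping both structures keeps them in sync.
        List<Node> parentRemapped = remap(parentTree, objMapping);
        System.out.println(sameReferences(kRemapped, parentRemapped)); // true
    }
}
```

The identity-based map is a design choice of the sketch: two clones with equal
content must still count as different objects, which is exactly the
distinction the failing test in the report checks for.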

[jira] [Resolved] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-4407.
-
Resolution: Fixed

Thank you for the feedback and for your analysis of the problem! Setting to 
resolved.


[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Dan Anderson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726737#comment-16726737
 ] 

Dan Anderson commented on PDFBOX-4407:
--

From your comment on another issue, it looks like the 2.0.14 release will
probably be in about 3 months?


[jira] [Created] (PDFBOX-4413) Add support for AES256 encryption for public key

2018-12-21 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-4413:
---

 Summary: Add support for AES256 encryption for public key
 Key: PDFBOX-4413
 URL: https://issues.apache.org/jira/browse/PDFBOX-4413
 Project: PDFBox
  Issue Type: Improvement
  Components: Crypto
Reporter: Tilman Hausherr


Adobe 9 added support for AES 256 encryption. This should also be implemented 
for public key encryption; currently it is only implemented for symmetric key 
encryption.

Further information is available at 
[http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf].






[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Dan Anderson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726720#comment-16726720
 ] 

Dan Anderson commented on PDFBOX-4407:
--

That works!  JAWS was able to read the appended document, and the reading order 
looked correct in Adobe.  Thank you!

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.13
>Reporter: Dan Anderson
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: StructureTree
> Fix For: 2.0.14, 3.0.0 PDFBox
>
> Attachments: 4407.patch, reading-order-merged-bad.pdf, 
> reading-order-merged-good.pdf, reading-order.pdf
>
>
> After merging tagged documents together, the second page of the resulting 
> document is no longer valid.  When the field objects are cloned in 
> PDFMergerUtility, the new and old objects are stored in a map named 
> objMapping.  This is used to replace the old references with the new 
> references for the acroform, K array, and annotation list.  However, the 
> ParentTree is not updated to this new object reference.  This results in the 
> K array and the ParentTree having different references to the same object.  
> This causes issues when using an a11y reader like JAWS, and also causes 
> problems displaying the tags in Adobe DC.
> Here is a failing unit test that was created in PDFMergerUtilityTest to 
> demonstrate the issue.  It was created using an example from W3: 
> https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF3/reading-order.pdf
> {code:java}
> public void testStructureTreeMerge3() throws IOException
> {
>     PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>     PDDocument src = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     PDDocument dst = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     pdfMergerUtility.appendDocument(dst, src);
>     src.close();
>     dst.save(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     dst.close();
>     PDDocument doc = PDDocument.load(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     assertTrue(checkAnnotationMatches(
>             doc.getDocumentCatalog().getStructureTreeRoot().getKArray(),
>             doc.getDocumentCatalog().getAcroForm().getFields(),
>             (COSArray) doc.getDocumentCatalog().getStructureTreeRoot().getParentTree()
>                     .getCOSObject().getDictionaryObject(COSName.NUMS)));
> }
>
> private boolean checkAnnotationMatches(COSArray kArray, List<PDField> acroformFields,
>         COSArray numbersArray) {
>     for (int i = 0; i < kArray.size(); i++) {
>         COSBase entry = kArray.get(i);
>         if (entry instanceof COSArray) {
>             COSArray entryAsArray = (COSArray) entry;
>             if (!checkAnnotationMatches(entryAsArray, acroformFields, numbersArray)) {
>                 return false;
>             }
>         } else if (entry instanceof COSInteger) {
>             // do nothing, just need to screen these out so next line doesn't blow up
>         } else if (((COSObject) entry).getObject() instanceof COSDictionary) {
>             COSDictionary entryDictionary = (COSDictionary) ((COSObject) entry).getObject();
>             if (entryDictionary.getItem(COSName.K) != null) {
>                 COSBase kids = entryDictionary.getItem(COSName.K);
>                 if (kids != null) {
>                     if (kids instanceof COSInteger) {
>                         // do nothing, don't care about marked content tags
>                     } else if (kids instanceof COSDictionary) {
>                         COSDictionary kidsAsDictionary = (COSDictionary) kids;
>                         if (!checkForMatches(kidsAsDictionary.getDictionaryObject(COSName.OBJ),
>                                 acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     } else if (kids instanceof COSArray) {
>                         COSArray kidsAsArray = (COSArray) kids;
>                         if (!checkAnnotationMatches(kidsAsArray, acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     }
>                 }
>             } else if (entryDictionary.getDictionaryObject(COSName.OBJ) != null) {
>                 if (!checkForMatches(entryDictionary.getDictionaryObject(COSName.OBJ),
>                         acroformFields, numbersArray)) {
>                     return false;
>                 }
>             }
>         }
>     }
>     return true;
> }
>
> private boolean checkForMatches(COSBase objectReference, List<PDField> acroformFields,
>         COSArray numbersArray) {
>     boolean result = false;
>     for (PDField field : acroformFields) {
>         if (field.getCOSObject() == objectReference &&
>                 numbersArray.indexOfObject(objectReference.getCOSObject()) > 0) {
>             result = true;
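
The defect described in the quoted report comes down to one rule: once objects have been cloned, every structure that referenced the originals has to be rewritten through the same old-to-new map. A minimal sketch of that rule in plain Java follows. The class and method names are hypothetical, plain Object instances stand in for COS objects, and this is not the PDFMergerUtility code or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the map plays the role of objMapping in the report.
final class ReferenceRemapSketch {

    // Returns a copy of refs in which every object that has a clone is replaced by it.
    public static List<Object> remap(List<Object> refs, Map<Object, Object> objMapping) {
        List<Object> result = new ArrayList<>(refs.size());
        for (Object ref : refs) {
            Object clone = objMapping.get(ref);
            result.add(clone != null ? clone : ref);
        }
        return result;
    }

    // True if every K array entry is found, by identity, in the parent tree entries.
    public static boolean sameObjects(List<Object> kArray, List<Object> parentTree) {
        for (Object k : kArray) {
            boolean found = false;
            for (Object p : parentTree) {
                if (p == k) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }
}
```

With an identity-based map such as java.util.IdentityHashMap, remapping the K array but not the parent tree makes sameObjects return false, which is the situation the failing test detects; remapping both makes it return true.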

[jira] [Commented] (PDFBOX-4363) [Patch] Add a common interface PDShadingPaint for all shading paints

2018-12-21 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726689#comment-16726689
 ] 

Tilman Hausherr commented on PDFBOX-4363:
-

If you can provide a patch that doesn't break the API and that is as nice as 
the one here that was committed, then please create an issue :)

> [Patch] Add a common interface PDShadingPaint for all shading paints
> 
>
> Key: PDFBOX-4363
> URL: https://issues.apache.org/jira/browse/PDFBOX-4363
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.12
>Reporter: Emmeran Seehuber
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.13, 3.0.0 PDFBox
>
> Attachments: pdshadingpaint_base_interface.patch, 
> shadingpaint_generic_baseclass.patch
>
>
> The attached patch adds a common interface PDShadingPaint to all 
> PDShading-based Paints. It allows access to the underlying PDShading object 
> and the matrix of the paint.
> At the moment it is not possible to access these fields without using dirty 
> accessibility hacks, see also [this 
> commit|https://github.com/rototor/pdfbox-graphics2d/commit/8603f6f9c781604d4684b8cb46499703c34b4711].
> Why would you need that? I need that for my 
> [PdfBoxGraphics2D|https://github.com/rototor/pdfbox-graphics2d] adapter. I 
> draw PDF pages using PDFRenderer/PageDrawer back into a PDF. While doing so I 
> derive from both classes and change / filter certain aspects of the PDF. One 
> use case is to extract a specific separation color into its own PDF page, 
> remap it to another color and also draw some overfill (i.e. an additional 
> border of 0.5pt around all shapes drawn with this color). The page prepared 
> this way is then used with a machine which glues foil (gold, silver or 
> copper) on the places marked with that color.
> You can look at an [example 
> here|https://github.com/rototor/pdfbox-graphics2d/blob/master/src/test/java/de/rototor/pdfbox/graphics2d/PdfRerenderTest.java];
>  it does not use a separation color, but you should get the idea. 
> My long-term goal is to be able to use PDFBox for all possible pre-press PDF 
> manipulations, e.g. changing/remapping colorspaces, resampling images to the 
> target resolution, ...
> At the moment I know of these "special cases" which will need special 
> treatment, as they are normally handled by rendering them first into a 
> BufferedImage:
>  - Transparency groups
>  - Softmasks
> Are there other places which resort to rendering to a BufferedImage first?
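
To make the request concrete, here is a rough sketch of what a common accessor interface buys a caller. All types below are stand-ins invented for this illustration (ShadingStub, ShadingPaintSketch, AxialPaintStub, ShadingInspector); they are not the classes from the attached patches or from PDFBox, and the method names are assumptions.

```java
import java.awt.geom.AffineTransform;

// Stand-in for a PDShading-like object; not the PDFBox class.
final class ShadingStub {
    final int shadingType;

    ShadingStub(int shadingType) {
        this.shadingType = shadingType;
    }
}

// The kind of common interface the issue asks for, with assumed method names.
interface ShadingPaintSketch {
    ShadingStub getShading();

    AffineTransform getMatrix();
}

// One concrete paint; other shading paints would implement the same interface.
final class AxialPaintStub implements ShadingPaintSketch {
    private final ShadingStub shading;
    private final AffineTransform matrix;

    AxialPaintStub(ShadingStub shading, AffineTransform matrix) {
        this.shading = shading;
        this.matrix = new AffineTransform(matrix);
    }

    @Override
    public ShadingStub getShading() {
        return shading;
    }

    @Override
    public AffineTransform getMatrix() {
        // Defensive copy so callers cannot alter the paint's state.
        return new AffineTransform(matrix);
    }
}

final class ShadingInspector {
    // Works for any paint implementing the interface, without reflection.
    public static String describe(Object paint) {
        if (paint instanceof ShadingPaintSketch) {
            ShadingPaintSketch sp = (ShadingPaintSketch) paint;
            return "shading type " + sp.getShading().shadingType
                    + ", scaleX " + sp.getMatrix().getScaleX();
        }
        return "not a shading paint";
    }
}
```

A custom renderer can then branch on a single instanceof check instead of reaching into private fields of each paint class.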






[jira] [Commented] (PDFBOX-3184) Throwing in PDType1Font.encode for chars above 255 is wrong.

2018-12-21 Thread JIRA


[ 
https://issues.apache.org/jira/browse/PDFBOX-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726653#comment-16726653
 ] 

Andreas Lehmkühler commented on PDFBOX-3184:


[~gau...@ainosoft.com] There is no new conclusion. My last comment is nearly 3 
years old and the issue was fixed at the same time; see the SVN comment just 
above my last one. That said, IMHO there isn't any reason to reopen this ticket 
or, better, to create a new one.

> Throwing in PDType1Font.encode for chars above 255 is wrong.
> 
>
> Key: PDFBOX-3184
> URL: https://issues.apache.org/jira/browse/PDFBOX-3184
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
>Reporter: Maaartinus
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Chars like `'\u2019'` can be handled by the code following the test, so 
> throwing in `PDType1Font.encode` whenever `unicode > 0xff` is wrong. See
> [http://stackoverflow.com/a/34598915/581205]
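
The point of the quoted report can be checked with the JDK alone: a character whose code point is above 0xff may still have a one-byte code in an 8-bit encoding. The sketch below uses Java's windows-1252 charset as an approximation of PDF's WinAnsiEncoding (the two are close but not identical); the class name is made up and none of this is PDFBox code.

```java
import java.nio.charset.Charset;

// Illustration only: windows-1252 stands in for WinAnsiEncoding here.
final class AboveFfSketch {

    // Returns the single-byte code for c, or -1 if the charset cannot encode it.
    public static int winAnsiCode(char c) {
        Charset cp1252 = Charset.forName("windows-1252");
        if (!cp1252.newEncoder().canEncode(c)) {
            return -1;
        }
        return String.valueOf(c).getBytes(cp1252)[0] & 0xff;
    }
}
```

For U+2019 (right single quotation mark) this yields 0x92, so rejecting every character above 0xff before consulting the encoding throws away encodable input.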






[jira] [Commented] (PDFBOX-3184) Throwing in PDType1Font.encode for chars above 255 is wrong.

2018-12-21 Thread Debasish (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726645#comment-16726645
 ] 

Debasish commented on PDFBOX-3184:
--

[~lehmi] should this issue be reopened in light of your new conclusion?







[jira] [Commented] (PDFBOX-4363) [Patch] Add a common interface PDShadingPaint for all shading paints

2018-12-21 Thread simon steiner (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726580#comment-16726580
 ] 

simon steiner commented on PDFBOX-4363:
---

Can this be done for TilingPaint as well?







[jira] [Assigned] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-4407:
---

Assignee: Tilman Hausherr


[jira] [Updated] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4407:

Fix Version/s: 3.0.0 PDFBox
               2.0.14


[jira] [Updated] (PDFBOX-4009) Structure tree lost when merging from the command line

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4009:

Component/s: (was: PDModel)
             Utilities

> Structure tree lost when merging from the command line
> --
>
> Key: PDFBOX-4009
> URL: https://issues.apache.org/jira/browse/PDFBOX-4009
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>  Labels: StructureTree, merge
>
> When merging with PDFMerger from the command line (e.g. the files from 
> PDFBOX-4007) the structure tree is lost. The cause is that {{mergeDocuments}} 
> merges into an empty {{PDDocument}} object, and {{appendDocument()}} does not 
> copy the structure tree if it doesn't exist in the destination. It does 
> something only if source and destination have one.
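
The cause named in the quoted description can be illustrated with a toy model. In the sketch below a document is reduced to a nullable list of structure element names; the Doc class and both append methods are hypothetical, and appendCreatingTree is only one conceivable remedy, not a description of how PDFBox fixed or will fix this.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a null structureTree means the document is untagged.
final class MergeSketch {

    static final class Doc {
        List<String> structureTree;

        Doc(List<String> structureTree) {
            this.structureTree = structureTree;
        }
    }

    // Behaviour described in the issue: tags survive only if both sides have a tree.
    public static void appendAsReported(Doc dst, Doc src) {
        if (dst.structureTree != null && src.structureTree != null) {
            dst.structureTree.addAll(src.structureTree);
        }
    }

    // Assumed alternative: create the destination tree when only the source has one.
    public static void appendCreatingTree(Doc dst, Doc src) {
        if (src.structureTree == null) {
            return;
        }
        if (dst.structureTree == null) {
            dst.structureTree = new ArrayList<>();
        }
        dst.structureTree.addAll(src.structureTree);
    }
}
```

Merging into a freshly created, empty destination is exactly the case in which the first variant drops the tags of every appended source.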






[jira] [Updated] (PDFBOX-4408) Object StructParent property does not match entry in parent tree

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4408:

Component/s: Utilities

> Object StructParent property does not match entry in parent tree
> 
>
> Key: PDFBOX-4408
> URL: https://issues.apache.org/jira/browse/PDFBOX-4408
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.13
>Reporter: Dan Anderson
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: StructureTree
> Fix For: 2.0.14, 3.0.0 PDFBox
>
> Attachments: form-merged-good.pdf, form-merged.pdf, form.pdf
>
>
> After merging 2 documents together, the parent tree entries for the second 
> page do not match their expected key.  For the attached document you can look 
> at the "Date you are available" field after the merge.  In the parent tree 
> the second page field is located with key of 15.  However the object expects 
> its key to be 21.  The /StructParent value is 21 but it should be 15.  This 
> mismatch causes issues with a11y readers and the reading order in Adobe DC.
>  
> The fields brought from the destination document are correct, but fields from 
> the source document do not match.  They continue with the pattern, so the 
> next field is located at key 16 but the /StructParent value is 22, 17 for 23, 
> etc.
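
The invariant that the quoted report says is broken can be written as a small check: the key under which the parent tree stores an object must equal that object's /StructParent value. The sketch below models both sides as plain maps with string identifiers; the class name, the method name and the identifiers are made up for illustration and do not correspond to PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistency check over a simplified model of the parent tree.
final class StructParentCheck {

    // parentTree: key -> object id; structParentOf: object id -> declared /StructParent.
    public static List<String> mismatches(Map<Integer, String> parentTree,
                                          Map<String, Integer> structParentOf) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable report order sorted by parent tree key.
        for (Map.Entry<Integer, String> e : new TreeMap<>(parentTree).entrySet()) {
            Integer declared = structParentOf.get(e.getValue());
            if (declared == null || !declared.equals(e.getKey())) {
                problems.add(e.getValue() + ": tree key " + e.getKey()
                        + ", /StructParent " + declared);
            }
        }
        return problems;
    }
}
```

Fed with the numbers from the report (a field stored under key 15 that declares 21), the check lists that field as inconsistent, while fields carried over from the destination document pass.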






[jira] [Updated] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4407:

Component/s: Utilities


[jira] [Updated] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4407:

Affects Version/s: (was: 3.0.0 PDFBox)
                   2.0.13


[jira] [Commented] (PDFBOX-4007) Merged documents don't retain tags

2018-12-21 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726539#comment-16726539
 ] 

Tilman Hausherr commented on PDFBOX-4007:
-

Hello [~DavesPlanet], could you test with the current snapshot whether the 
problem still happens?

[https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.14-SNAPSHOT/]

Several problems have been fixed recently (PDFBOX-4407 and PDFBOX-4408).

> Merged documents don't retain tags
> --
>
> Key: PDFBOX-4007
> URL: https://issues.apache.org/jira/browse/PDFBOX-4007
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.8
>Reporter: Dave Hill
>Priority: Minor
>  Labels: StructureTree, merge
> Attachments: FourFontsTagged.pdf, HelloWorldTagged.pdf, 
> PDFMergeUtility-2.patch, PDFMergeUtility.patch, 
> Tagged+GeneralForbearance-Merged.pdf, 
> Tagged-GeneralForbearance-merged-21.12.2018.pdf, Tagged.pdf
>
>
> Certain combinations of documents don't retain tags when merged. The document 
> [^Tagged.pdf] is just a basic one-word PDF created and tagged with Pro DC. If 
> you try to merge this with the government [General Forbearance 
> form|https://studentloans.gov/myDirectLoan/downloadForm.action?searchType=library=general=en-us]
>  the output crashes DC when you try to view the tags. If you use a flattened 
> version of the General Forbearance form then the tags are just munged.
> {code}
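> import java.io.File;
> import org.apache.pdfbox.multipdf.PDFMergerUtility;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> // The class name is illustrative; the original report only gave the main method.
> public class MergeTagsRepro {
>     public static void main(String[] args) throws Exception {
>         PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>         PDDocument src = PDDocument.load(new File("Tagged.pdf"));
>         PDDocument dest = PDDocument.load(new File("GeneralForbearance.pdf"));
>         // Append the tagged one-word document to the form.
>         pdfMergerUtility.appendDocument(dest, src);
>         src.close();
>         dest.save(new File("BrokenTags.pdf"));
>         dest.close();
>     }
> }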
> {code}
> The included patch appears to make tagging more reliable, but I'm still 
> relying heavily on cloning, which can apparently cause other issues.  The 
> documents I get out with this code seem to present correctly in Adobe readers 
> for all combinations of documents that I tested against.
> My patch is made and tested against yesterday's production head, and it 
> includes my changes from 
> [PDFBOX-3999|https://issues.apache.org/jira/browse/PDFBOX-3999] since it is 
> in the exact same place in the code.
> The priority of this is a blocker for 508 compliance of merged documents but 
> I guessed it to be more of a minor issue in the overall scheme of things, 
> please correct me if I am mistaken.
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-4007) Merged documents don't retain tags

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4007:

Attachment: Tagged-GeneralForbearance-merged-21.12.2018.pdf







[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726516#comment-16726516
 ] 

Tilman Hausherr commented on PDFBOX-4407:
-

[~tschusssl] it passes your test; however, it also passes your test when the 
fix is undone. Could you please test whether the file I just attached works 
with your screen reader?


[jira] [Updated] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4407:

Attachment: reading-order-merged-good.pdf

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 3.0.0 PDFBox
> Reporter: Dan Anderson
> Priority: Major
> Labels: StructureTree
> Attachments: 4407.patch, reading-order-merged-bad.pdf, 
> reading-order-merged-good.pdf, reading-order.pdf
>
>
> After merging tagged documents together, the second page of the resulting 
> document is no longer valid.  When the field objects are cloned in 
> PDFMergerUtility, the new and old objects are stored in a map named 
> objMapping.  This is used to replace the old references with the new 
> references for the acroform, K array, and annotation list.  However, the 
> ParentTree is not updated to this new object reference.  This results in the 
> K array and the ParentTree having different references to the same object.  
> This causes issues when using an a11y reader like JAWS, and also causes 
> problems displaying the tags in Adobe DC.
> Here is a failing unit test that was created in PDFMergerUtilityTest to 
> demonstrate the issue.  It was created using an example from W3: 
> https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF3/reading-order.pdf
> {code:java}
> public void testStructureTreeMerge3() throws IOException
> {
>     PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>     PDDocument src = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     PDDocument dst = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     pdfMergerUtility.appendDocument(dst, src);
>     src.close();
>     dst.save(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     dst.close();
>     PDDocument doc = PDDocument.load(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     assertTrue(checkAnnotationMatches(
>             doc.getDocumentCatalog().getStructureTreeRoot().getKArray(),
>             doc.getDocumentCatalog().getAcroForm().getFields(),
>             (COSArray) doc.getDocumentCatalog().getStructureTreeRoot().getParentTree().getCOSObject().getDictionaryObject(COSName.NUMS)));
> }
> private boolean checkAnnotationMatches(COSArray kArray, List<PDField> acroformFields, COSArray numbersArray) {
>     for (int i = 0; i < kArray.size(); i++) {
>         COSBase entry = kArray.get(i);
>         if (entry instanceof COSArray) {
>             COSArray entryAsArray = (COSArray) entry;
>             if (!checkAnnotationMatches(entryAsArray, acroformFields, numbersArray)) {
>                 return false;
>             }
>         } else if (entry instanceof COSInteger) {
>             // do nothing, just need to screen these out so next line doesn't blow up
>         } else if (((COSObject) entry).getObject() instanceof COSDictionary) {
>             COSDictionary entryDictionary = (COSDictionary) ((COSObject) entry).getObject();
>             if (entryDictionary.getItem(COSName.K) != null) {
>                 COSBase kids = entryDictionary.getItem(COSName.K);
>                 if (kids != null) {
>                     if (kids instanceof COSInteger) {
>                         // do nothing, don't care about marked content tags
>                     } else if (kids instanceof COSDictionary) {
>                         COSDictionary kidsAsDictionary = (COSDictionary) kids;
>                         if (!checkForMatches(kidsAsDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     } else if (kids instanceof COSArray) {
>                         COSArray kidsAsArray = (COSArray) kids;
>                         if (!checkAnnotationMatches(kidsAsArray, acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     }
>                 }
>             } else if (entryDictionary.getDictionaryObject(COSName.OBJ) != null) {
>                 if (!checkForMatches(entryDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                     return false;
>                 }
>             }
>         }
>     }
>     return true;
> }
> private boolean checkForMatches(COSBase objectReference, List<PDField> acroformFields, COSArray numbersArray) {
>     boolean result = false;
>     for (PDField field : acroformFields) {
>         if (field.getCOSObject() == objectReference
>                 && numbersArray.indexOfObject(objectReference.getCOSObject()) > 0) {
>             result = true;
>         }
>     }
>     return result;
> }
> {code}
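
The description quoted above says that cloned objects are recorded in objMapping and that the K array is rewritten from it while the ParentTree is not. The following is only a toy sketch of that mismatch in plain Java; it is not PDFBox code. The class RemapSketch, the Node type and both helper methods are hypothetical names made up for illustration, and plain lists stand in for the K array and the ParentTree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of reference remapping after cloning; hypothetical, not PDFBox code. */
public class RemapSketch {

    /** Stand-in for a PDF object; instances are compared by identity only. */
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** Replaces, in place, every reference that has a clone recorded in objMapping. */
    static void remap(List<Node> refs, Map<Node, Node> objMapping) {
        for (int i = 0; i < refs.size(); i++) {
            Node clone = objMapping.get(refs.get(i));
            if (clone != null) {
                refs.set(i, clone);
            }
        }
    }

    /** True when both lists hold the very same instances, position by position. */
    static boolean sameReferences(List<Node> a, List<Node> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (a.get(i) != b.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node oldField = new Node("field");
        Node newField = new Node("field"); // the clone created during the merge

        // Old object -> cloned object, keyed by identity rather than equals().
        Map<Node, Node> objMapping = new IdentityHashMap<>();
        objMapping.put(oldField, newField);

        List<Node> kArray = new ArrayList<>(Arrays.asList(oldField));
        List<Node> parentTree = new ArrayList<>(Arrays.asList(oldField));

        remap(kArray, objMapping); // the step the description says already happens
        System.out.println(sameReferences(kArray, parentTree)); // false: parent tree is stale

        remap(parentTree, objMapping); // the step the description says is missing
        System.out.println(sameReferences(kArray, parentTree)); // true
    }
}
```

An identity-keyed map is used here because the issue is about two references pointing at different instances of what should be one object; whether the actual fix in 4407.patch works this way is not stated in this thread.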