Please upload the file to a file-sharing host.

Tilman
-- Original Message --
From: Alastair Porter <alast...@porter.net.nz>
Subject: Splitter does not include structure tree in documents past the first split
Date: 14.05.2025, 18:37
To: users@pdfbox.apache.org

Hi,

Apologies if my terminology is wrong on some of the following topics; I've not worked with PDFs in much detail before.

When using the Splitter to split PDFs, it appears that any split that doesn't start on the first page of the input document does not include structure tree elements / accessibility tags. I note the recent work in PDFBOX-2725 ([PATCH] Split pdf lose accessibility tags) and PDFBOX-5929 (Remove orphan annotations in structure tree), which may have affected some of this related code.

I can reproduce this with both the app CLI:

    java -jar pdfbox/app/target/pdfbox-app-4.0.0-SNAPSHOT.jar split -i input.pdf -outputPrefix output-split

and also with the API:

    Splitter splitter = new Splitter();
    splitter.setSplitAtPage(20);
    List<PDDocument> documents = splitter.split(inputDocument);

I also checked PDFBox 3.0.3 (the last release before PDFBOX-5929) and the behaviour appears to be the same; that is, the patch does not appear to have broken already existing functionality.

I am evaluating the resulting PDFs using the PAC PDF Accessibility Checker (https://pac.pdf-accessibility.org/en) and also the PDFBox debugger. I expect to see items in Root/StructTreeRoot/K in the debugger. In the first file, I correctly see the /K element. What's more, this element has correctly been pruned and doesn't include any items from the input document that point to pages not in this split. In subsequent split files, I see no /K element in the StructTreeRoot at all.

I attached a PDF which I've been using for simple testing, which exhibits this behaviour.
I had a bit of a look through the existing code, and I see that in Splitter.java, in cloneStructureTree:

    COSBase k1 = srcStructureTreeRoot.getK();
    COSBase k2 = new KCloner(dstPageTree).createClone(k1, dstStructureTreeRoot.getCOSObject(), null);
    dstStructureTreeRoot.setK(k2);

k2 is always null after the first split, so it seems it may not be created correctly.

Is this a known bug, or perhaps an issue with the way I'm using the API or the format of the input documents?

Thanks,
Alastair
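
To make the expected result concrete, here is a minimal, self-contained Java sketch of the pruning described above: each split should keep only the structure elements that point to pages in that split, so a later split should still end up with a non-empty /K. This does not use PDFBox and is not how Splitter or KCloner are implemented. The Node class, the prune method, and the assumption that only leaf elements refer to a page are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StructPruneSketch {
    /** Hypothetical stand-in for a structure element. Containers use page = -1. */
    static final class Node {
        final String type;
        final int page;                       // page index a leaf points to, or -1
        final List<Node> kids = new ArrayList<>();

        Node(String type, int page, Node... kids) {
            this.type = type;
            this.page = page;
            this.kids.addAll(List.of(kids));
        }
    }

    /**
     * Returns a pruned copy of node that keeps only leaves on keptPages and
     * the containers above them, or null if nothing below node survives.
     */
    static Node prune(Node node, Set<Integer> keptPages) {
        if (node.kids.isEmpty()) {
            // Leaf: keep it only if its page is part of this split.
            return keptPages.contains(node.page) ? new Node(node.type, node.page) : null;
        }
        Node copy = new Node(node.type, node.page);
        for (Node kid : node.kids) {
            Node prunedKid = prune(kid, keptPages);
            if (prunedKid != null) {
                copy.kids.add(prunedKid);
            }
        }
        // Container: drop it when none of its kids survived.
        return copy.kids.isEmpty() ? null : copy;
    }

    public static void main(String[] args) {
        // Three-page document: two paragraphs on pages 0 and 1, a section on page 2.
        Node root = new Node("Document", -1,
                new Node("P", 0),
                new Node("P", 1),
                new Node("Sect", -1, new Node("P", 2)));

        Node firstSplit = prune(root, Set.of(0, 1));   // pages 0-1
        Node secondSplit = prune(root, Set.of(2));     // page 2 only

        System.out.println(firstSplit.kids.size());    // prints 2
        System.out.println(secondSplit.kids.size());   // prints 1
    }
}
```

In this model the second split keeps the Sect element and its paragraph, which corresponds to the non-empty /K that the report expects to see in every split file, not only the first one.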
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org