On 14.05.2025 18:35, Alastair Porter wrote:

In the first file, I correctly see the /K element. What's more, this element has correctly been pruned and doesn't include any items from the input document which point to pages that are not in this split. In subsequent split files, I see no /K element in the StructTreeRoot at all.

I attached a PDF which I've been using for simple testing, which exhibits this behaviour.

I had a bit of a look through the existing code, and I see that in Splitter.java, in cloneStructureTree:

COSBase k1 = srcStructureTreeRoot.getK();
COSBase k2 = new KCloner(dstPageTree).createClone(k1, dstStructureTreeRoot.getCOSObject(), null);
dstStructureTreeRoot.setK(k2);

k2 is always null after the first split; it seems like it may not be created correctly.

I wrote that code in January 2024 (PDFBOX-2725); however, I would have to look into my own code to see how I solved it. I think that when cloning, it ignores any element whose page is not among the pages of the target document.

One anomaly I noticed immediately is this:


The page in this top /K element is page 1. Thus it will ignore that element when the destination doesn't contain page 1. Maybe this algorithm will have to be refined, e.g. only remove the page but not the element itself if it is not at a "leaf". The description of /Pg is "a page object representing a page on which some or all of the content items designated by the K entry shall be rendered"
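
To make the suggested refinement concrete, here is a minimal sketch of the idea. It is not the existing KCloner code and does not use PDFBox at all: Elem, cloneFor and keptPages are hypothetical names, and the model assumes every leaf carries its own page number, which ignores the fact that real marked-content kids can inherit /Pg from their parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PruneSketch {
    /** Hypothetical stand-in for a structure element: an optional page (/Pg) and kids (/K). */
    static final class Elem {
        final String type;
        final Integer pg; // null when the element has no /Pg entry
        final List<Elem> kids = new ArrayList<>();

        Elem(String type, Integer pg, Elem... kids) {
            this.type = type;
            this.pg = pg;
            this.kids.addAll(List.of(kids));
        }
    }

    /** Returns a pruned copy of src, or null if nothing in this subtree is on a kept page. */
    static Elem cloneFor(Elem src, Set<Integer> keptPages) {
        boolean pageKept = src.pg != null && keptPages.contains(src.pg);
        if (src.kids.isEmpty()) {
            // Leaf: keep it only if its own page is part of this split.
            return pageKept ? new Elem(src.type, src.pg) : null;
        }
        List<Elem> clonedKids = new ArrayList<>();
        for (Elem kid : src.kids) {
            Elem clone = cloneFor(kid, keptPages);
            if (clone != null) {
                clonedKids.add(clone);
            }
        }
        if (clonedKids.isEmpty()) {
            return null; // nothing below this element survives
        }
        // Inner element: keep it, but drop /Pg if it points to a page outside this split.
        Elem dst = new Elem(src.type, pageKept ? src.pg : null);
        dst.kids.addAll(clonedKids);
        return dst;
    }

    public static void main(String[] args) {
        // Top element points at page 1, but one of its kids is on page 2.
        Elem root = new Elem("Document", 1, new Elem("P", 1), new Elem("P", 2));
        Elem split2 = cloneFor(root, Set.of(2));
        System.out.println("top element kept: " + (split2 != null));
    }
}
```

With this rule, a top element whose /Pg is page 1 would still be kept in the split that only contains page 2, just without its /Pg entry, as long as at least one of its descendants is on page 2.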

Tilman
