On 14.05.2025 18:35, Alastair Porter wrote:
In the first file, I correctly see the /K element. What's more, this
element has correctly been pruned and doesn't include any items from
the input document which point to pages that are not in this split.
In subsequent split files, I see no /K element in the StructTreeRoot
at all.
I've attached a PDF that I've been using for simple testing; it
exhibits this behaviour.
I had a look through the existing code, and I see this in
cloneStructureTree() in Splitter.java:
    COSBase k1 = srcStructureTreeRoot.getK();
    COSBase k2 = new KCloner(dstPageTree).createClone(k1, dstStructureTreeRoot.getCOSObject(), null);
    dstStructureTreeRoot.setK(k2);
k2 is always null after the first split, so it seems the clone is not
being created correctly.
I wrote that code in January 2024 (PDFBOX-2725); however, I would have
to look at it again to see how I solved this. I think that, when
cloning, it ignores any element whose page is not in the target
document.
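A minimal sketch of the behaviour I suspect, on a hypothetical simplified model (the `Elem` and `NaiveCloner` classes below are illustrations, not PDFBox's `KCloner` or its structure classes): an element whose /Pg is not among the destination pages is dropped, and its whole subtree goes with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical simplified structure element, NOT the PDFBox API:
// "page" models /Pg (null when absent), "kids" models /K.
final class Elem {
    final String type;
    final Integer page;
    final List<Elem> kids;

    Elem(String type, Integer page, List<Elem> kids) {
        this.type = type;
        this.page = page;
        this.kids = kids;
    }
}

final class NaiveCloner {
    // Clones the subtree for a split containing dstPages. An element whose
    // /Pg points outside the split is skipped together with all its kids.
    static Elem cloneFor(Elem src, Set<Integer> dstPages) {
        if (src.page != null && !dstPages.contains(src.page)) {
            return null; // the whole subtree below this element is lost
        }
        List<Elem> kids = new ArrayList<>();
        for (Elem kid : src.kids) {
            Elem c = cloneFor(kid, dstPages);
            if (c != null) {
                kids.add(c);
            }
        }
        return new Elem(src.type, src.page, kids);
    }
}
```

With a top-level `Document` element whose /Pg is page 1 and two `P` kids on pages 1 and 2, `cloneFor(root, Set.of(1))` keeps the root and the first `P`, while `cloneFor(root, Set.of(2))` returns `null`. If the real cloner works like this, that would match the missing /K in every split after the first.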
One anomaly I noticed immediately is this:
The /Pg of this top-level /K element is page 1. Thus it will ignore that
element, and with it everything below it, whenever the destination
doesn't contain page 1. Maybe this algorithm will have to be refined,
e.g. if the element is not a "leaf", only remove its /Pg entry but keep
the element itself. The PDF specification describes /Pg as "a page
object representing a page on which some or all of the content items
designated by the K entry shall be rendered".
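The refinement could look roughly like this, again as a sketch on a hypothetical simplified model (`Elem` and `RefinedCloner` are illustrations, not PDFBox classes): leaves are pruned by page, while a non-leaf element is kept as long as something below it survives, losing only its /Pg if that page is not in the split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical simplified structure element, NOT the PDFBox API:
// "page" models /Pg (null when absent), "kids" models /K.
final class Elem {
    final String type;
    final Integer page;
    final List<Elem> kids;

    Elem(String type, Integer page, List<Elem> kids) {
        this.type = type;
        this.page = page;
        this.kids = kids;
    }
}

final class RefinedCloner {
    static Elem cloneFor(Elem src, Set<Integer> dstPages) {
        return cloneFor(src, null, dstPages);
    }

    // "inherited" is the /Pg of the nearest ancestor that has one, if any.
    private static Elem cloneFor(Elem src, Integer inherited, Set<Integer> dstPages) {
        Integer page = src.page != null ? src.page : inherited;
        boolean pageInDst = page == null || dstPages.contains(page);
        if (src.kids.isEmpty()) {
            // Leaf: keep it only if its (possibly inherited) page is in the
            // split, and write that page explicitly so the clone no longer
            // depends on an ancestor /Pg that may be removed.
            return pageInDst ? new Elem(src.type, page, List.of()) : null;
        }
        List<Elem> kids = new ArrayList<>();
        for (Elem kid : src.kids) {
            Elem c = cloneFor(kid, page, dstPages);
            if (c != null) {
                kids.add(c);
            }
        }
        if (kids.isEmpty()) {
            return null; // nothing below this element belongs to the split
        }
        // Non-leaf: keep the element, dropping /Pg if it points outside the split.
        return new Elem(src.type, pageInDst ? src.page : null, kids);
    }
}
```

Pushing the inherited page down to the kept leaves is a design choice of this sketch: once a parent's /Pg is removed, a kid without its own /Pg would otherwise lose its page information.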
Tilman