Re: Splitter does not include structure tree in documents past the first split

Tilman Hausherr Thu, 15 May 2025 12:20:56 -0700

On 14.05.2025 18:35, Alastair Porter wrote:

In the first file, I correctly see the /K element. What's more, this element has correctly been pruned and doesn't include any items from the input document which point to pages that are not in this split. In subsequent split files, I see no /K element in the StructTreeRoot at all.
I attached a PDF which I've been using for simple testing, which exhibits this behaviour.
I had a bit of a look through the existing code, and I see that in Splitter.java, in cloneStructureTree
COSBase k1 = srcStructureTreeRoot.getK();
COSBase k2 = new KCloner(dstPageTree).createClone(k1, dstStructureTreeRoot.getCOSObject(), null);
dstStructureTreeRoot.setK(k2);
k2 is always null after the first split; it seems like it may not be created correctly.

I wrote that code in January 2024 (PDFBOX-2725); however, I would have to look into my own code to see how I solved it. I think that when cloning, it ignores any elements with pages that don't belong to the target.


One anomaly I noticed immediately is this:

The page in this top /K element is page 1. Thus it will ignore that element when the destination doesn't contain page 1. Maybe this algorithm will have to be refined, e.g. only remove the page but not the element itself if it is not at a "leaf". The description of /Pg is "a page object representing a page on which some or all of the content items designated by the K entry shall be rendered".
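
To make the idea concrete, here is a rough sketch of that refinement on a toy tree. It is not the actual KCloner code and uses no PDFBox types: StructurePruneSketch, Node and prune are names made up for this illustration, "page" stands in for /Pg and "kids" for /K. It also assumes that an element without its own page inherits the page of its nearest ancestor, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model only: these are hypothetical classes, not PDFBox types. */
final class StructurePruneSketch {

    /** Stand-in for a structure element: page models /Pg, kids models /K. */
    static final class Node {
        final String name;
        final Integer page;                 // null when the element has no page of its own
        final List<Node> kids = new ArrayList<>();

        Node(String name, Integer page, Node... children) {
            this.name = name;
            this.page = page;
            for (Node child : children) {
                kids.add(child);
            }
        }
    }

    /**
     * Returns a pruned copy of src for a destination holding keptPages,
     * or null if nothing in the subtree belongs to those pages.
     * inheritedPage is the nearest ancestor's page, used when src has none.
     */
    static Node prune(Node src, Set<Integer> keptPages, Integer inheritedPage) {
        Integer effectivePage = src.page != null ? src.page : inheritedPage;
        boolean pageKept = effectivePage != null && keptPages.contains(effectivePage);

        if (src.kids.isEmpty()) {
            // Leaf: keep it only if its (possibly inherited) page is in the destination.
            return pageKept ? new Node(src.name, effectivePage) : null;
        }

        List<Node> keptKids = new ArrayList<>();
        for (Node kid : src.kids) {
            Node clone = prune(kid, keptPages, effectivePage);
            if (clone != null) {
                keptKids.add(clone);
            }
        }
        if (keptKids.isEmpty()) {
            return null;                    // nothing below this element survives
        }

        // Non-leaf with surviving kids: keep the element, drop only a foreign page.
        Node copy = new Node(src.name, pageKept ? effectivePage : null);
        copy.kids.addAll(keptKids);
        return copy;
    }
}
```

With a root on page 1 that has one kid on page 1 and one kid on page 2, pruning for a destination that contains only page 2 returns the root without a page and with just the page 2 kid, instead of dropping the whole tree.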


Tilman
