On 23.12.2024 20:42, Tilman Hausherr wrote:
On 23.12.2024 20:00, Tilman Hausherr wrote:
Hi,
In the meantime I was able to reproduce it with pages 49 and 50, and I
have a theory about what happened. The newer versions of PDFBox do a lot of
cleanup in the structure tree when removing annotation
destinations that don't exist in the destination document. Because you deleted
the annotations manually, these "orphan pages" are possibly still in
the part of the structure tree that is kept.
1) Sorry, I see you mentioned 20 in the initial post. I looked too
much into the code.
2) I inspected the result file; it does indeed have orphan pages. I'll
look at the code (which I wrote in January) to find out whether I
intended to remove orphan pages when cloning the
structure tree. In the worst case (for you) I never had that intention
and can't do it, so you created the problem by deleting the annotations
without cleaning up the structure tree. In the best case (for you),
there's either a bug in the code, or there isn't but I can add some
orphan cleaning.
Updates:
3) fixed a typo in your last name, sorry
4) created and fixed https://issues.apache.org/jira/browse/PDFBOX-5928 ,
so that the orphan test works better: it now detects the orphan.
Still need to find out what I mentioned in (2).
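For illustration only, here is a minimal sketch of what "orphan page" means
in this discussion: a page that a structure element still points to although
it is no longer in the page tree. It uses simplified stand-in classes. Page,
StructNode and OrphanCheck are hypothetical names, not PDFBox API, and this
is not the code from PDFBOX-5928.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a page object (not a PDFBox class).
final class Page {
    final String label;
    Page(String label) { this.label = label; }
}

// Hypothetical stand-in for a structure element (not a PDFBox class).
final class StructNode {
    final Page page;                                  // models the /Pg entry; may be null
    final List<StructNode> kids = new ArrayList<>();  // models the /K entry
    StructNode(Page page) { this.page = page; }
    StructNode add(StructNode kid) { kids.add(kid); return this; }
}

final class OrphanCheck {
    // Returns every page referenced from the structure tree that is not
    // part of the page tree, each one reported once, in traversal order.
    static List<Page> findOrphanPages(StructNode root, List<Page> pageTree) {
        // Compare by identity: two distinct page objects are distinct pages.
        Set<Page> known = Collections.newSetFromMap(new IdentityHashMap<>());
        known.addAll(pageTree);
        Set<Page> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Page> orphans = new ArrayList<>();
        collect(root, known, seen, orphans);
        return orphans;
    }

    // Depth-first walk over the structure tree.
    private static void collect(StructNode node, Set<Page> known,
                                Set<Page> seen, List<Page> orphans) {
        if (node.page != null && !known.contains(node.page) && seen.add(node.page)) {
            orphans.add(node.page);
        }
        for (StructNode kid : node.kids) {
            collect(kid, known, seen, orphans);
        }
    }

    public static void main(String[] args) {
        Page kept = new Page("kept page");
        Page removed = new Page("removed page");  // no longer in the page tree

        StructNode root = new StructNode(null)
                .add(new StructNode(kept))
                .add(new StructNode(removed));    // stale reference left behind

        for (Page orphan : findOrphanPages(root, Arrays.asList(kept))) {
            System.out.println("Orphan page: " + orphan.label);
        }
    }
}
```

In a real file the same idea would apply to the page entries of the structure
elements versus the pages of the document, but the details depend on how the
structure tree was cloned.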
Tilman
PS: In case this wasn't clear, you don't have to write any further test code.