On 24.12.2024 10:05, Tilman Hausherr wrote:
On 23.12.2024 20:42, Tilman Hausherr wrote:
On 23.12.2024 20:00, Tilman Hausherr wrote:
Hi,
In the meantime I was able to reproduce it with pages 49 and 50, and I
have a theory about what happened. The newer versions of PDFBox do a lot
of cleanup in the structure tree when cleaning up annotation
destinations that don't exist in the target file. Because you
deleted the annotations manually, these "orphan pages" are possibly
still in the part of the structure tree that is kept.
1) Sorry, I see you mentioned 20 in the initial post. I looked too
much at the code.
2) I inspected the result file; it does indeed have orphan pages.
I'll look at the code (which I wrote in January) to find out whether I
intended to remove orphan pages when cloning the structure tree. In
the worst case (for you), I never had that intention and can't do it,
which means you created the problem by deleting the annotations
without cleaning up the structure tree. In the best case (for you),
either there's a bug in the code, or there isn't but I can add some
orphan cleaning.
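
To make "orphan pages" concrete, here is a minimal sketch of what such a check amounts to: walk the structure tree and flag every element whose page reference (the /Pg entry) points to a page that is no longer in the document's page tree. This is not PDFBox code. `Page`, `StructNode` and `findOrphans` are hypothetical stand-ins chosen for illustration, and the real cleanup may well work differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class OrphanPageCheck {

    /** Hypothetical stand-in for a page object (not a PDFBox class). */
    static final class Page {
        final String label;
        Page(String label) { this.label = label; }
    }

    /** Hypothetical stand-in for a structure element with an optional /Pg entry. */
    static final class StructNode {
        final String type;
        final Page page; // null when the element has no page reference
        final List<StructNode> kids = new ArrayList<>();

        StructNode(String type, Page page) {
            this.type = type;
            this.page = page;
        }

        StructNode add(StructNode kid) {
            kids.add(kid);
            return this;
        }
    }

    /** Returns all nodes whose page reference is not part of the page tree. */
    static List<StructNode> findOrphans(StructNode root, List<Page> pageTree) {
        // Compare pages by identity: two distinct page objects are never "the same page".
        Set<Page> known = Collections.newSetFromMap(new IdentityHashMap<Page, Boolean>());
        known.addAll(pageTree);
        List<StructNode> orphans = new ArrayList<>();
        collect(root, known, orphans);
        return orphans;
    }

    private static void collect(StructNode node, Set<Page> known, List<StructNode> orphans) {
        if (node.page != null && !known.contains(node.page)) {
            orphans.add(node);
        }
        for (StructNode kid : node.kids) {
            collect(kid, known, orphans);
        }
    }

    public static void main(String[] args) {
        Page p49 = new Page("49");
        Page p50 = new Page("50");
        Page removed = new Page("removed"); // referenced by the tree, absent from the page tree

        StructNode root = new StructNode("Document", null)
                .add(new StructNode("P", p49))
                .add(new StructNode("Link", removed))
                .add(new StructNode("P", p50));

        for (StructNode orphan : findOrphans(root, List.of(p49, p50))) {
            System.out.println("orphan: " + orphan.type + " -> page " + orphan.page.label);
        }
        // prints "orphan: Link -> page removed"
    }
}
```

The set is identity-based on purpose: a structure element pointing to a page object that was dropped from the page tree should count as an orphan even if a similar-looking page still exists.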
Updates:
3) fixed a typo in your last name, sorry
4) created and fixed https://issues.apache.org/jira/browse/PDFBOX-5928
so that the orphan test works better; it now detects the orphan.
I still need to find out what I mentioned in (2).
Tilman
PS in case this wasn't clear, you don't have to write any further test
code.
The orphans will now be removed:
https://issues.apache.org/jira/browse/PDFBOX-5929
snapshot build:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.4-SNAPSHOT/
Ideally, though, you shouldn't remove annotations without also cleaning
up the structure tree (which is quite difficult) or removing it
completely. Maybe you originally did this to avoid orphans due to
destinations that don't exist in the target files, but PDFBox now fixes
these itself; it has done so since 3.0.2, with bugs fixed in 3.0.3.
Tilman