Yes, I was removing the orphans due to the issue you mention. I will try without removing the annotations.
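
For anyone else who wants to test against the snapshot build linked in the quoted mail, here is a minimal sketch of how it could be pulled into a Maven build. It assumes the project uses Maven and that the `pdfbox` library artifact is published in the same snapshot repository as the `pdfbox-app` link in the thread. The repository id `apache-snapshots` is just an arbitrary label.

```xml
<!-- Sketch only: assumes a Maven build; the repository id is an arbitrary label. -->
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <!-- Base of the snapshot URL given in the thread -->
    <url>https://repository.apache.org/content/groups/snapshots/</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.pdfbox</groupId>
    <!-- Assumption: the library artifact, not the pdfbox-app bundle linked in the thread -->
    <artifactId>pdfbox</artifactId>
    <version>3.0.4-SNAPSHOT</version>
  </dependency>
</dependencies>
```

Snapshots change over time, so results may differ between builds until a release is published.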
PS: It looks like it's no longer needed, but here is runnable code to reproduce it: https://github.com/jfisbein/pdfbox-issue

On Tue, 24 Dec 2024 at 13:37, Tilman Hausherr <thaush...@t-online.de> wrote:

> On 24.12.2024 10:05, Tilman Hausherr wrote:
> > On 23.12.2024 20:42, Tilman Hausherr wrote:
> >> On 23.12.2024 20:00, Tilman Hausherr wrote:
> >>> Hi,
> >>>
> >>> In the meantime I was able to reproduce it with page 49 and 50 and I
> >>> have a theory what happened. The newer versions of PDFBox do a lot
> >>> of cleanup in the structure tree when cleaning up annotation
> >>> destinations that don't exist in the destination. Because you
> >>> deleted the annotations manually, these "orphan pages" are possibly
> >>> still in the structure tree part that is kept.
> >>
> >>
> >> 1) Sorry, I see you mentioned 20 in the initial post. I looked too
> >> much into the code
> >>
> >> 2) I inspected the result file, it does indeed have orphan pages.
> >> I'll look at the code (that I wrote in January) to find out whether I
> >> had the intention of removing orphan pages or not when cloning the
> >> structure tree. In the worst case (for you) I never had the intention
> >> and can't, so you created the problem by deleting the annotations
> >> without cleaning up the structure tree. In the best case (for you),
> >> there's either a bug in the code, or there isn't but I can add some
> >> orphan cleaning.
> >
> > Updates:
> >
> > 3) fixed a typo in your last name, sorry
> >
> > 4) created and fixed https://issues.apache.org/jira/browse/PDFBOX-5928,
> > so that the orphan test works better now, now it does detect the
> > orphan. Still need to find out what I mentioned in (2).
> >
> > Tilman
> >
> > PS in case this wasn't clear, you don't have to write any further test
> > code.
> >
>
> The orphans will now be removed:
>
> https://issues.apache.org/jira/browse/PDFBOX-5929
>
> snapshot build:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.4-SNAPSHOT/
>
> but ideally, you shouldn't remove annotations without cleaning up the
> structure tree too (which is quite difficult), or removing it
> completely. Maybe you originally did this to avoid orphans due to
> destinations that don't exist in the target files, but PDFBox now fixes
> these itself, this is since 3.0.2 and bugs removed in 3.0.3.
>
> Tilman

--
Joan Fisbein | Engineering Manager
joan.fisb...@clarity.ai
www.clarity.ai