Yes, I was removing the orphans because of the issue you mention.
I will try without removing the annotations.

PS: It looks like it's no longer needed, but here is runnable code to
reproduce it: https://github.com/jfisbein/pdfbox-issue

On Tue, 24 Dec 2024 at 13:37, Tilman Hausherr <thaush...@t-online.de> wrote:

> On 24.12.2024 10:05, Tilman Hausherr wrote:
> > On 23.12.2024 20:42, Tilman Hausherr wrote:
> >> On 23.12.2024 20:00, Tilman Hausherr wrote:
> >>> Hi,
> >>>
> >>> In the meantime I was able to reproduce it with pages 49 and 50, and I
> >>> have a theory about what happened. The newer versions of PDFBox do a lot
> >>> of cleanup in the structure tree when cleaning up annotation
> >>> destinations that don't exist in the target document. Because you
> >>> deleted the annotations manually, these "orphan pages" may
> >>> still be in the part of the structure tree that is kept.
> >>
> >>
> >> 1) Sorry, I see you mentioned 20 in the initial post. I was too
> >> focused on the code.
> >>
> >> 2) I inspected the result file, and it does indeed have orphan pages.
> >> I'll look at the code (which I wrote in January) to find out whether I
> >> intended to remove orphan pages when cloning the structure tree.
> >> In the worst case (for you), I never intended to and it can't be done,
> >> so you created the problem by deleting the annotations
> >> without cleaning up the structure tree. In the best case (for you),
> >> either there's a bug in the code, or there isn't but I can add some
> >> orphan cleaning.
> >
> > Updates:
> >
> > 3) Fixed a typo in your last name, sorry.
> >
> > 4) Created and fixed https://issues.apache.org/jira/browse/PDFBOX-5928
> > so that the orphan test works better; it now detects the
> > orphan. I still need to find out what I mentioned in (2).
> >
> > Tilman
> >
> > PS: In case this wasn't clear, you don't have to write any further test
> > code.
> >
>
> The orphans will now be removed:
>
> https://issues.apache.org/jira/browse/PDFBOX-5929
>
> snapshot build:
>
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.4-SNAPSHOT/
>
> but ideally, you shouldn't remove annotations without also cleaning up the
> structure tree (which is quite difficult) or removing it
> completely. Maybe you originally did this to avoid orphans caused by
> destinations that don't exist in the target files, but PDFBox now fixes
> these itself: it has done so since 3.0.2, with bugs fixed in 3.0.3.
>
> Tilman
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org
>
>

-- 

Joan Fisbein | Engineering Manager
joan.fisb...@clarity.ai
www.clarity.ai <https://clarity.ai/>
