Hi, thank you. The fix works for us.
> -----Original Message-----
> From: Andreas Lehmkühler [mailto:andr...@lehmi.de.INVALID]
> Sent: Saturday, 6 April 2024 16:09
> To: users@pdfbox.apache.org
> Subject: Re: Text extraction from a certain PDF does not seem to terminate
>
> Hi,
>
> On 03.04.24 at 15:53, Brangs, Erik wrote:
> > Hi,
> >
> > when attempting text extraction from the PDF at
> > https://d-nb.info/1324982411/34, using either PDFBox 3.0.0 or PDFBox
> > 4.0.0-SNAPSHOT, the extraction uses about 1.8 GB of heap memory and does
> > not seem to terminate. I cancelled the extraction attempt after roughly
> > 20 minutes. Is this another bad PDF or is there a bug in PDFBox?
>
> Thanks for the report. As Tilman already pointed out, the described
> behavior is a performance regression and was fixed recently; see [1] for
> details.
>
> Andreas
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-5799
>
> > --
> > Erik Brangs
> > Deutsche Nationalbibliothek
> > Informationstechnik
> > Adickesallee 1
> > 60322 Frankfurt am Main
> > Phone: +49 69 1525-1792
> > Fax: +49 69 1525-1799
> > mailto:e.bra...@dnb.de
> > https://www.dnb.de

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org