Hi,
On 03.04.24 at 15:53, Brangs, Erik wrote:
Hi,
when attempting text extraction from the PDF at https://d-nb.info/1324982411/34
using either PDFBox 3.0.0 or PDFBox 4.0.0-SNAPSHOT, the extraction uses about
1.8 GB of heap memory and does not seem to terminate. I cancelled the extraction
attempt after roughly 20 minutes. Is this another bad PDF, or is there a bug in
PDFBox?
Thanks for the report. As Tilman already pointed out, the behavior you
describe is a performance regression that was fixed recently; see [1] for
details.
Andreas
[1] https://issues.apache.org/jira/browse/PDFBOX-5799
--
Erik Brangs
Deutsche Nationalbibliothek
Informationstechnik
Adickesallee 1
60322 Frankfurt am Main
Telefon: +49 69 1525-1792
Telefax: +49 69 1525-1799
mailto:e.bra...@dnb.de
https://www.dnb.de