Hi,

On 03.04.24 at 15:53, Brangs, Erik wrote:
Hi,

when attempting text extraction from the PDF at https://d-nb.info/1324982411/34,
using either PDFBox 3.0.0 or PDFBox 4.0.0-SNAPSHOT, the extraction uses about
1.8 GB of heap memory and does not seem to terminate. I cancelled the extraction
attempt after roughly 20 minutes. Is this another bad PDF or is there a bug in
PDFBox?

Thanks for the report. As Tilman already pointed out, the described behavior is a performance regression that was fixed recently; see [1] for details.

Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5799



--
Erik Brangs
Deutsche Nationalbibliothek
Informationstechnik
Adickesallee 1
60322 Frankfurt am Main
Telefon: +49 69 1525-1792
Telefax: +49 69 1525-1799
mailto:e.bra...@dnb.de
https://www.dnb.de


