Re: Text extraction from a certain PDF does not seem to terminate

Andreas Lehmkühler Sat, 06 Apr 2024 07:09:56 -0700

Hi,

On 03.04.24 at 15:53, Brangs, Erik wrote:

Hi,


when attempting text extraction from the PDF at https://d-nb.info/1324982411/34
using either PDFBox 3.0.0 or PDFBox 4.0.0-SNAPSHOT, the extraction uses about
1.8 GB of heap memory and does not seem to terminate. I cancelled the extraction
attempt after roughly 20 minutes. Is this another bad PDF or is there a bug in
PDFBox?

Thanks for the report. As Tilman already pointed out, the described behavior is a performance regression that was fixed recently; see [1] for details.


Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5799


--
Erik Brangs
Deutsche Nationalbibliothek
Informationstechnik
Adickesallee 1
60322 Frankfurt am Main
Telefon: +49 69 1525-1792
Telefax: +49 69 1525-1799
mailto:e.bra...@dnb.de
https://www.dnb.de


