Text extraction from a certain PDF does not seem to terminate

Brangs, Erik Wed, 03 Apr 2024 07:22:48 -0700

Hi,

When attempting text extraction from the PDF at https://d-nb.info/1324982411/34,
using either PDFBox 3.0.0 or PDFBox 4.0.0-SNAPSHOT, the extraction uses about
1.8 GB of heap memory and does not seem to terminate. I cancelled the extraction
attempt after roughly 20 minutes. Is this another bad PDF, or is there a bug in
PDFBox?
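
For anyone trying to reproduce this without cancelling by hand, here is a
minimal sketch of a time-limited run. It uses only java.util.concurrent; the
class and method names (BoundedExtraction, runWithTimeout) are made up for
illustration and are not part of PDFBox. Note that cancelling only interrupts
the worker thread, so a parser loop that never checks the interrupt flag may
keep running in the background (hence the daemon thread).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a PDFBox class.
final class BoundedExtraction {

    /**
     * Runs the given task on a daemon worker thread and waits at most
     * the given time. Returns the task's result, or null on timeout.
     */
    static String runWithTimeout(Callable<String> task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        // Daemon thread: a stuck extraction cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extraction-worker");
            t.setDaemon(true);
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Best effort only: this interrupts the worker, it does not kill it.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With PDFBox 3.x on the classpath, the task passed in would presumably be
something like `() -> { try (PDDocument doc = Loader.loadPDF(new File("1324982411.pdf"))) { return new PDFTextStripper().getText(doc); } }`,
where the local file name is an assumption.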


--
Erik Brangs
Deutsche Nationalbibliothek
Informationstechnik
Adickesallee 1
60322 Frankfurt am Main
Telefon: +49 69 1525-1792
Telefax: +49 69 1525-1799
mailto:e.bra...@dnb.de
https://www.dnb.de
