AW: Text extraction from a certain PDF does not seem to terminate

Brangs, Erik Fri, 19 Apr 2024 03:03:26 -0700

Hi,

thank you. The fix works for us.


> -----Original Message-----
> From: Andreas Lehmkühler [mailto:[email protected]]
> Sent: Saturday, 6 April 2024 16:09
> To: [email protected]
> Subject: Re: Text extraction from a certain PDF does not seem to terminate
> 
> Hi,
> 
> On 03.04.24 at 15:53, Brangs, Erik wrote:
> > Hi,
> >
> > when attempting text extraction from the PDF at
> > https://d-nb.info/1324982411/34 , either using PDFBox 3.0.0 or PDFBox
> > 4.0.0-SNAPSHOT, the extraction uses about 1,8 GB heap memory and does not
> > seem to terminate. I cancelled the extraction attempt after roughly 20
> > minutes. Is this another bad PDF or is there a bug in PDFBox?
> 
> Thanks for the report. As Tilman already pointed out, the described
> behavior is a performance regression that was fixed recently; see [1] for
> details.
> 
> Andreas
> 
> [1] https://issues.apache.org/jira/browse/PDFBOX-5799
> 
> 
> >
> > --
> > Erik Brangs
> > Deutsche Nationalbibliothek
> > Informationstechnik
> > Adickesallee 1
> > 60322 Frankfurt am Main
> > Telefon: +49 69 1525-1792
> > Telefax: +49 69 1525-1799
> > mailto:[email protected]
> > https://www.dnb.de
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

