On 08/02/2022 at 19:17, Tilman Hausherr wrote:
On 08.02.2022 at 14:35, Marton Róbert wrote:
Dear PDFBox developers,

We are working on an application that is parsing and verifying the quality of PDF files. We wanted to extract the text of the pages using multiple threads to speed up the process but eventually we found that PDFBox 2 is not thread safe. Do you plan to make it thread safe in PDFBox 3 at least for text extraction?

I don't know (but maybe yes, due to PDFBOX-5214).


Hello,

Do you have any details regarding this thread safety issue?
(Sorry if this is not the right place for this discussion;
I ask because we have a strange Heisenbug in production.)

Andreas Lehmkühler wrote in https://issues.apache.org/jira/browse/PDFBOX-5214#comment-17398118 : «After some debugging I've found the reason. The static instances of the 14 standard fonts within PDType1Font aren't threadsafe. Those fonts are backed by a COSDictionary which isn't supposed to be stable. »

I do not understand which field is mutable in the PDType1Font object?
(Yes, some fields are lazily computed, but they use a thread-safe Map etc.,
so we should be safe?)
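
To make the concern in the quoted comment concrete, here is a minimal Java sketch. It is not PDFBox code: `FakeFont`, `setItem`, `dictSize` and `runIsolated` are hypothetical names, and the plain `HashMap` only stands in for the backing dictionary, assuming that dictionary behaves like a non-synchronized map. The point it illustrates is that a thread-safe lazy cache does not protect a second, mutable map held by the same shared object; giving each task its own instance is one way to avoid sharing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedFontSketch {

    /** Hypothetical stand-in for a font object; this is NOT PDFBox code. */
    static final class FakeFont {
        // Plain HashMap standing in for the backing dictionary (assumption):
        // concurrent writes to it from several threads are not safe.
        private final Map<String, Object> dict = new HashMap<>();
        // Lazily filled cache; this part is thread safe on its own.
        private final Map<Integer, Float> widths = new ConcurrentHashMap<>();

        FakeFont(String baseFont) {
            dict.put("BaseFont", baseFont);
        }

        float width(int code) {
            // Dummy width computation, for illustration only.
            return widths.computeIfAbsent(code, c -> 500f + c);
        }

        void setItem(String key, Object value) {
            dict.put(key, value); // mutates the unsynchronized map
        }

        int dictSize() {
            return dict.size();
        }
    }

    /**
     * Runs one task per thread. Each task creates its OWN FakeFont, so the
     * unsynchronized dictionary is never touched by two threads.
     * Returns the dictionary size seen by each task.
     */
    static List<Integer> runIsolated(int threads, int writesPerTask) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    FakeFont font = new FakeFont("Helvetica"); // not shared
                    for (int i = 0; i < writesPerTask; i++) {
                        font.setItem("key" + i, i);
                    }
                    return font.dictSize();
                }));
            }
            List<Integer> sizes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                sizes.add(f.get());
            }
            return sizes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runIsolated(4, 1000));
    }
}
```

If all tasks instead wrote to a single static `FakeFont`, the `ConcurrentHashMap` cache would still be fine, but the `HashMap` could lose entries or be corrupted, which is the kind of intermittent failure that is hard to reproduce.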


  MM.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
