Re: Adding per-character OCR to Poppler

Oliver Sander Fri, 30 Jan 2026 00:30:09 -0800

Hi Jason,

thank you for this code. Can you please post it as a merge request at


  https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests

? That way it can get a proper review.

Best,
Oliver

On 1/29/26 20:29, Gans, Jason David wrote:

Hello Poppler project,
I have been working towards a solution for extracting text from PDF files that contain embedded Unicode values that do not match rendered glyphs. This idea was mentioned in the Poppler mailing lists back in 2012 (https://lists.freedesktop.org/archives/poppler/2012-April/009035.html), but I couldn’t find any information suggesting that it was implemented and tested.
I have posted an experimental version of Poppler (“Poppler-science”; https://github.com/lanl/poppler-science) that has been modified to include a multilayer perceptron to decode font glyph symbols that are commonly used in the scientific literature. I would appreciate any feedback from the Poppler community and any suggestions for improvements!
Regards,

Jason Gans

Bioscience Division
Los Alamos National Laboratory
