Hi Jason,
Thank you for this code. Can you please post it as a merge request?
https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests
That way it can get a proper review.
Best,
Oliver
On 1/29/26 20:29, Gans, Jason David wrote:
Hello Poppler project,
I have been working towards a solution for extracting text from PDF
files that contain embedded Unicode values that do not match rendered
glyphs. This idea was mentioned on the Poppler mailing list back in 2012
(https://lists.freedesktop.org/archives/poppler/2012-April/009035.html),
but I couldn’t find any information suggesting that it was implemented
and tested.
I have posted an experimental version of Poppler (“Poppler-science”;
https://github.com/lanl/poppler-science) that has been modified to
include a multilayer perceptron to decode font glyph symbols that are
commonly used in the scientific literature. I would appreciate any
feedback from the Poppler community and any suggestions for improvements!
Regards,
Jason Gans
Bioscience Division
Los Alamos National Laboratory