"No Unicode mapping for" when extracting text from a PDF

Luca Loiodice Thu, 04 Jan 2018 11:21:17 -0800

I am trying to migrate a project from a commercial Windows PDF library to
PDFBox, but I see reduced accuracy when I extract text from arbitrary files.


For example, I have a PDF (enclosed) that does not have Unicode mappings
for certain glyphs ... and so when I try to extract the text using PDFBox
I get the following:

WARNING: No Unicode mapping for G70 (112) in font HAGLDF+MSTT31c5ed
Jan 04, 2018 10:24:02 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode

The Windows library returns the correct text for the glyph with the missing
character mapping.
Is there a way for me to add some code so that PDFBox or my program can
figure out what the text is in this case?
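
One heuristic that might apply here, offered as an assumption and not as
documented PDFBox behaviour: in the warning above, the glyph name G70 read as
hexadecimal is 0x70 = 112, which matches the character code in parentheses
and is the ASCII code for "p". The sketch below guesses a character only when
the glyph name and the code agree in that way. The class and method names are
hypothetical, it uses no PDFBox calls, and where such a fallback would be
wired into the extraction is left open.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class GlyphNameFallback {

    // Matches names such as "G70": the letter G followed by two hex digits.
    private static final Pattern HEX_GLYPH_NAME =
            Pattern.compile("G([0-9A-Fa-f]{2})");

    /**
     * Guesses a code point for a glyph that has no Unicode mapping.
     * ASSUMPTION: the font names its glyphs "G" + hex character code and the
     * code is ASCII-compatible. This is a guess about this kind of font,
     * not something the PDF guarantees.
     */
    public static OptionalInt guessCodePoint(String glyphName, int charCode) {
        Matcher m = HEX_GLYPH_NAME.matcher(glyphName);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        int fromName = Integer.parseInt(m.group(1), 16);
        // Trust the guess only if the name agrees with the character code
        // and the value is printable ASCII.
        if (fromName != charCode || fromName < 0x20 || fromName > 0x7E) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(fromName);
    }

    public static void main(String[] args) {
        // Values taken from the warning: glyph G70, code 112.
        OptionalInt guess = guessCodePoint("G70", 112);
        System.out.println(guess.isPresent()
                ? new String(Character.toChars(guess.getAsInt()))
                : "no guess");   // prints "p"
    }
}
```

Whether "p" is really the intended character would still have to be checked
against the rendered page.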

Thanks for any help,
Luca
