On 04.01.2018 at 20:20, Luca Loiodice wrote:
I am trying to migrate a project from a commercial Windows PDF library
to PDFBox, but I see reduced accuracy when I extract text from
arbitrary files.
For example, I have a PDF (enclosed) that does not have Unicode
mappings for certain glyphs ... and so when I try to extract the text
using PDFBox I get the following:
Attachments are stripped by the mailing list; you'd need to upload the
file to a file-sharing host.
WARNING: No Unicode mapping for G70 (112) in font HAGLDF+MSTT31c5ed
Jan 04, 2018 10:24:02 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
The Windows library returns the correct text for the glyph with the
missing character mapping.
Is there a way for me to add some code so that PDFBox or my program
can figure out what the text is in this case?
Yes, but you'd need to build from source because G70 is a non-standard
glyph name. The change is described at the bottom of
https://issues.apache.org/jira/browse/PDFBOX-3962
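
To illustrate the kind of fallback involved, here is a minimal sketch. It
is not the change from the JIRA issue and it does not use any PDFBox API.
It rests on one assumption taken from the warning above: "G70 (112)"
pairs the name G70 with code 112, and 0x70 is 112, so the digits after
"G" may simply be the character code in hexadecimal. The class name
GlyphNameFallback and the method guessUnicode are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlyphNameFallback {
    // ASSUMPTION: names look like "G" followed by 2 to 4 hex digits (e.g. "G70").
    private static final Pattern G_HEX = Pattern.compile("G([0-9A-Fa-f]{2,4})");

    /**
     * Guesses a Unicode string for a non-standard glyph name.
     * Returns null when the name does not fit the assumed pattern
     * or decodes to a control character or a surrogate.
     */
    public static String guessUnicode(String glyphName) {
        if (glyphName == null) {
            return null;
        }
        Matcher m = G_HEX.matcher(glyphName);
        if (!m.matches()) {
            return null;
        }
        int codePoint = Integer.parseInt(m.group(1), 16);
        boolean control = codePoint < 0x20;
        boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (control || surrogate) {
            return null;
        }
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // "G70" decodes to code point 0x70
        System.out.println(guessUnicode("G70"));
    }
}
```

A heuristic like this can produce wrong text for fonts that use the same
naming scheme with a different meaning, so it is best applied only when
the regular lookup has already failed.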
Tilman
Thanks for any help,
Luca