Thanks. Is there any chance I can add the conversion as a post-processing step and avoid having to build from source? I get the code back as part of the extracted text, so I was wondering whether I can load the font from the PDF and use its code -> glyph name mapping to replace each code with the correct character.
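A minimal sketch of that post-processing idea, assuming the code -> glyph name table has already been read from the font and that a glyph name -> text table is supplied by hand. The class name `GlyphPostProcessor`, the method `replaceUnmappedCodes`, and the replacement value for `G70` are all hypothetical; nothing in the thread says which character `G70` really stands for. In PDFBox 2.x the first table could presumably be built from a simple font's encoding (for example via `PDSimpleFont.getEncoding()` and `Encoding.getName(int)`), but that step is left out here and should be checked against the API documentation.

```java
import java.util.HashMap;
import java.util.Map;

class GlyphPostProcessor {

    /**
     * Replaces every character of the extracted text whose code has a glyph
     * name with a known replacement. All other characters are kept as they are.
     */
    static String replaceUnmappedCodes(String extracted,
                                       Map<Integer, String> codeToName,
                                       Map<String, String> nameToText) {
        StringBuilder out = new StringBuilder(extracted.length());
        for (int i = 0; i < extracted.length(); i++) {
            char c = extracted.charAt(i);
            // The raw character code is used as the lookup key.
            String name = codeToName.get((int) c);
            String replacement = (name == null) ? null : nameToText.get(name);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Code 112 -> "G70" is taken from the warning in this thread.
        Map<Integer, String> codeToName = new HashMap<>();
        codeToName.put(112, "G70");

        // Hypothetical placeholder: the real character for G70 is not known here.
        Map<String, String> nameToText = new HashMap<>();
        nameToText.put("G70", "X");

        String extracted = "a" + (char) 112 + "b";
        System.out.println(replaceUnmappedCodes(extracted, codeToName, nameToText));
    }
}
```

One caveat with this approach: once the text has been extracted, the information about which font produced each character is gone, so a table like this is only safe if it is applied to text that is known to come from the affected font.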
In that case I am not sure how I can load the data from the font, but I see the debugger is able to do it.

Luca Loiodice

On Thu, Jan 4, 2018 at 7:28 PM, Tilman Hausherr <[email protected]> wrote:

> On 04.01.2018 at 20:20, Luca Loiodice wrote:
>
>> I am trying to migrate a project from a commercial Windows PDF library to
>> PDFBox, but I see reduced accuracy when I extract text from arbitrary files.
>>
>> For example, I have a PDF (enclosed) that does not have Unicode mappings
>> for certain glyphs, so when I try to extract the text using PDFBox
>> I get the following:
>
> Attachments are stripped by the list; you'd need to upload the file to a
> file-sharing host.
>
>> WARNING: No Unicode mapping for G70 (112) in font HAGLDF+MSTT31c5ed
>> Jan 04, 2018 10:24:02 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
>>
>> The Windows library returns the correct text for the glyph with the missing
>> character mapping.
>> Is there a way for me to add some code so that PDFBox or my program can
>> figure out what the text is in this case?
>
> Yes, but you'd need to build from source because G70 is non-standard. The
> change is described at the bottom of
> https://issues.apache.org/jira/browse/PDFBOX-3962
>
> Tilman
>
>> Thanks for any help,
>> Luca

