Re: PDType0Font toUnicode Mapping

Tilman Hausherr Mon, 18 Jul 2016 09:43:36 -0700

On 18.07.2016 at 11:08, OYEBISI, Daniel wrote:

Hi,


While extracting text from a PDF (screenshot attached), I came across a "No
Unicode mapping" warning. The extracted text does not contain the Wingdings 3
characters present in the PDF. I have been trying to debug this PDF for some
time now, but I can't seem to understand the issues involved.

Please can someone explain why PDFBox is unable to correctly extract these 
symbols?


The codes are missing from the ToUnicode CMap:

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo <<
/Registry (LNDPFO+TT11+0) /Ordering (T42UV) /Supplement 0
>> def
/CMapName /LNDPFO+TT11+0 def
/CMapType 2 def
1 begincodespacerange
<0003> <0003>
endcodespacerange
1 beginbfchar
<0003> <0020>    % <=======================
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end


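All you have is code 3, which maps to a space (U+0020). None of the other
codes has a Unicode mapping, so there is nothing for PDFBox to extract for them.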
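The lookup described above can be sketched as a simple table from character code to Unicode string. This is a minimal, stdlib-only Java illustration, not the PDFBox API and not how PDFBox actually parses CMaps; the class and method names (ToUnicodeSketch, parseBfChar) are hypothetical. It reads only the bfchar entries, ignoring bfrange sections, codespace ranges and named destinations, and the code 0x41 is an arbitrary example rather than a code taken from the PDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToUnicodeSketch {

    // Matches each "beginbfchar ... endbfchar" section of a CMap.
    private static final Pattern BFCHAR_BLOCK =
            Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);

    // Matches one "<src> <dst>" pair of hex strings inside a bfchar section
    // (entries whose destination is a name such as /space are skipped).
    private static final Pattern HEX_PAIR =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    /** Builds a code -> Unicode string table from the bfchar entries only. */
    public static Map<Integer, String> parseBfChar(String cmap) {
        Map<Integer, String> map = new HashMap<>();
        Matcher block = BFCHAR_BLOCK.matcher(cmap);
        while (block.find()) {
            Matcher pair = HEX_PAIR.matcher(block.group(1));
            while (pair.find()) {
                int code = Integer.parseInt(pair.group(1), 16);
                map.put(code, utf16beHexToString(pair.group(2)));
            }
        }
        return map;
    }

    // Destination values in a ToUnicode CMap are UTF-16BE bytes written as hex.
    private static String utf16beHexToString(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // The bfchar part of the CMap quoted above.
        String cmap = String.join("\n",
                "1 begincodespacerange <0003> <0003> endcodespacerange",
                "1 beginbfchar",
                "<0003> <0020>",
                "endbfchar");
        Map<Integer, String> toUnicode = parseBfChar(cmap);
        // Code 3 is the only entry: it yields a space.
        System.out.println("code 3 -> [" + toUnicode.get(3) + "]");
        // Any other code has no entry, so there is no text to emit for it.
        System.out.println("code 0x41 -> " + toUnicode.get(0x41));
    }
}
```

Running main prints `code 3 -> [ ]` and `code 0x41 -> null`. In this sketch a missing key stands in for the missing mapping: with only code 3 in the table, every other glyph code has no Unicode value.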

Tilman


Kindly find the links related to this PDF below:

PDF file on Dropbox
https://www.dropbox.com/s/57cvb36h4x2v96k/page2.pdf?dl=0

Screenshot (Text extraction)
https://www.dropbox.com/s/ftb3tuwvq3npg8o/page2%20no%20unicode%20mapping.PNG?dl=0


