pdffonts shows that the fonts are embedded but subsetted. Subsetting preserves
the glyphs but sometimes loses the mapping back to a Unicode code point, which
can make the text unextractable. See for example
https://forums.adobe.com/thread/1990373
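
One way to check a given file for this is the "uni" column of the pdffonts table, which reports whether each font carries a ToUnicode map. The following is a minimal Python sketch, not a definitive tool. It assumes the usual pdffonts column layout (name, type, encoding, emb, sub, uni, object ID) and that pdffonts is on the PATH. The function names parse_pdffonts, suspect_fonts and check_pdf are hypothetical, and the "Identity encoding without a ToUnicode map" test is only a heuristic for fonts whose text may not extract.

```python
import subprocess


def parse_pdffonts(text):
    """Parse the table printed by pdffonts into a list of dicts.

    Assumes the usual layout: a header row, a dashed rule, then one
    row per font ending in "emb sub uni <object> <generation>".
    """
    fonts = []
    for line in text.splitlines()[2:]:  # skip the header row and the dashed rule
        tokens = line.split()
        if len(tokens) < 8:
            continue  # blank or unexpected line
        # Read columns from the right, because the type column can
        # contain spaces (for example "CID TrueType" or "Type 1").
        emb, sub, uni = tokens[-5:-2]
        fonts.append({
            "name": tokens[0],  # assumes the font name has no spaces
            "encoding": tokens[-6],
            "embedded": emb == "yes",
            "subset": sub == "yes",
            "to_unicode": uni == "yes",
        })
    return fonts


def suspect_fonts(fonts):
    """Heuristic: Identity-H/V fonts with no ToUnicode map."""
    return [
        f for f in fonts
        if f["encoding"].startswith("Identity") and not f["to_unicode"]
    ]


def check_pdf(pdf_path):
    """Run pdffonts on pdf_path and return the suspect fonts."""
    result = subprocess.run(
        ["pdffonts", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return suspect_fonts(parse_pdffonts(result.stdout))
```

Calling check_pdf("FD30BA-MF-201904.pdf") on a local copy of the file would list the fonts worth a closer look. An empty list does not prove the text is extractable, and a non-empty one does not prove it is not.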
Regards, William




________________________________
From: poppler <[email protected]> on behalf of Zhong, 
Steven <[email protected]>
Sent: Tuesday, May 28, 2019 11:34 PM
To: '[email protected]'
Subject: [poppler] How to recognize the Japan Font.


Hi All,



I want to convert the PDF at this link to text:
https://www.fidelity.jp/static/pdf/fund/5111893-FD30BA/Reports/Monthly/FD30BA-MF-201904.pdf



But I can't read it correctly. I find the font is

MS-PGothic-90ms-RKSJ-H

Encoding is Identity-H



Converting to txt gives output like the below. I guess a font is missing. How do
I install the font so that it is read correctly? Many thanks.

ᅜෆ⥲⏕⏘䠄㻳㻰㻼䠅ᡂ㛗⋡䛜๓ᅄ༙ᮇ䛸ྠỈ‽䛻䛺䜚䚸୰ᅜᬒẼ䛻ᗏධ䜜ឤ䛜ฟጞ䜑䛯䛣䛸䜒㈙䛔Ᏻᚰឤ䛻䛴䛺





vcap@e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -v
pdfinfo version 0.62.0
Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC



My poppler is 0.62.0

vcap@e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -listenc
Available encodings are:
ASCII7
Big5
Big5ascii
EUC-CN
EUC-JP
GBK
ISO-2022-CN
ISO-2022-JP
ISO-2022-KR
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
KOI8-R
Latin1
Latin2
Shift-JIS
Symbol
TIS-620
UTF-16
UTF-8
Windows-1255
ZapfDingbats


_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
