Actually, subsetting doesn't (necessarily) change how the glyph-to-Unicode code point mapping is done. The same set of mechanisms is used regardless of whether a font is embedded in full or as a subset; see clause 9.10 of ISO 32000-2 (the PDF standard).
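As a rough illustration of one of those mechanisms, the ToUnicode CMap, here is a minimal Python sketch. It assumes a hand-written CMap fragment (SAMPLE_CMAP is hypothetical and not taken from the PDF discussed in this thread) and handles only bfchar entries and the simple three-value form of bfrange. The helper names parse_tounicode and decode are made up for the example and are not Poppler APIs. If a font's ToUnicode CMap is missing or wrong, this lookup is what fails, whether or not the font is subset.

```python
import re

# Hypothetical ToUnicode CMap fragment, written by hand for illustration only.
SAMPLE_CMAP = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0001> <65E5>
<0002> <672C>
endbfchar
1 beginbfrange
<0010> <0012> <3042>
endbfrange
endcmap
"""

HEX = r"<([0-9A-Fa-f]+)>"


def parse_tounicode(cmap_text):
    """Return {character code: Unicode string} from a ToUnicode CMap.

    Simplified: supports bfchar and the <lo> <hi> <start> form of bfrange,
    and assumes each bfrange start value is a single BMP code point.
    The array form of bfrange is not handled.
    """
    mapping = {}
    # bfchar: each entry maps one source code to a UTF-16BE string.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    # bfrange: consecutive codes map to consecutive code points.
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, block):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(start + offset)
    return mapping


def decode(codes, mapping):
    """Map character codes to text; unmapped codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    table = parse_tounicode(SAMPLE_CMAP)
    print(decode([0x0001, 0x0002], table))
```

With the sample fragment, codes 0x0001 and 0x0002 resolve through the bfchar entries, and a code with no entry falls back to the replacement character, which is roughly what unextractable text looks like from the consumer's side.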
Leonard

From: poppler <[email protected]> on behalf of William Bader <[email protected]>
Date: Thursday, May 30, 2019 at 12:45 AM
To: "Zhong, Steven" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [poppler] How to recognize the Japan Font.

pdffonts shows that the fonts are embedded but subsetted. Subsetting preserves the glyphs but sometimes loses the mapping back to a Unicode code point, which can make the text unextractable.
See for example https://forums.adobe.com/thread/1990373

Regards, William
________________________________
From: poppler <[email protected]> on behalf of Zhong, Steven <[email protected]>
Sent: Tuesday, May 28, 2019 11:34 PM
To: '[email protected]'
Subject: [poppler] How to recognize the Japan Font.

Hi All,

I want to convert the PDF at this link:
https://www.fidelity.jp/static/pdf/fund/5111893-FD30BA/Reports/Monthly/FD30BA-MF-201904.pdf

But I can't read it correctly. I find that the font is MS-PGothic-90ms-RKSJ-H and the encoding is Identity-H. The text I get after converting to txt is shown below. I guess a font is missing. How do I install the font so the text is read correctly?
Many Thanks

ᅜෆ⥲⏕⏘䠄㻳㻰㻼䠅ᡂ㛗⋡䛜๓ᅄ༙ᮇ䛸ྠỈ‽䛻䛺䜚䚸୰ᅜᬒẼ䛻ᗏධ䜜ឤ䛜ฟጞ䜑䛯䛣䛸䜒㈙䛔Ᏻᚰឤ䛻䛴䛺

vcap@e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -v
pdfinfo version 0.62.0
Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

My poppler is 0.62.0

vcap@e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -listenc
Available encodings are:
ASCII7
Big5
Big5ascii
EUC-CN
EUC-JP
GBK
ISO-2022-CN
ISO-2022-JP
ISO-2022-KR
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
KOI8-R
Latin1
Latin2
Shift-JIS
Symbol
TIS-620
UTF-16
UTF-8
Windows-1255
ZapfDingbats
