Hi Suzuki-san,

Thanks for your help. I have already tried recognizing it with Tesseract (OCR), but the OCR result is sometimes not good. So I think it would be better if the text could be extracted directly from the PDF.

Thanks again.
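For reference, a minimal sketch of the OCR route mentioned above. It assumes that `pdftoppm` (from the poppler utilities) and `tesseract` with the `jpn` language data are installed and on the PATH. The function names and the 300 dpi resolution are illustrative choices, not part of poppler or Tesseract.

```python
import subprocess
from pathlib import Path


def render_cmd(pdf, prefix, dpi=300):
    """Build the pdftoppm command that renders each page to <prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image, out_base, lang="jpn"):
    """Build the tesseract command; tesseract itself appends '.txt' to out_base."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_pdf(pdf, workdir):
    """Render every page of the PDF to PNG, OCR each page, return the joined text."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(render_cmd(pdf, workdir / "page"), check=True)

    texts = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(workdir.glob("page-*.png")):
        out_base = image.with_suffix("")  # e.g. page-01
        subprocess.run(ocr_cmd(image, out_base), check=True)
        texts.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling `ocr_pdf("FD30BA-MF-201904.pdf", "ocr_out")` would return the recognized text of all pages. A higher resolution may help with small glyphs, at the cost of slower recognition.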
-----Original Message-----
From: suzuki toshiya <[email protected]>
Sent: May 29, 2019 14:24
To: Zhong, Steven <[email protected]>; '[email protected]' <[email protected]>
Cc: Leonard Rosenthol <[email protected]>
Subject: Re: [poppler] How to recognize the Japan Font.

Dear Zhong,

Oooooh, I apologize for giving a wrong comment. I've confirmed that this PDF cannot be searched even when I open it in Adobe products (this does not mean data protection; I guess it is caused by a poor PDF-generation workflow).

At present I cannot suggest an easy method to extract the text from this PDF - maybe OCR is the easiest?

Regards,
mpsuzuki

On 2019/05/29 15:10, Zhong, Steven wrote:
> Hi Suzuki-san,
>
> We have installed the latest poppler and poppler-data, but the result is the same.
>
> By the way, we can't copy the content correctly from the PDF on Windows 10 with Ctrl+C, Ctrl+V. Thanks
>
> root@08a02db0d267:/home/vcap/app/pop/bin# ./pdfinfo -v
> pdfinfo version 0.77.0
> Copyright 2005-2019 The Poppler Developers - http://poppler.freedesktop.org
> Copyright 1996-2011 Glyph & Cog, LLC
> root@08a02db0d267:/home/vcap/app/pop/bin#
>
> head: cannot open '10' for reading: No such file or directory
> ==> sss <==
> 㻝㻛㻥
>
> 䚷
>
> タᐃ᪥䠖㻞㻜㻝㻡ᖺ㻝㻞᭶㻣᪥
> ಙクᮇ㛫䠖㻞㻜㻝㻡ᖺ㻝㻞᭶㻣᪥䛛䜙㻞㻜㻟㻝ᖺ㻥᭶㻞㻡᪥䜎䛷
> Ỵ⟬᪥䠖ཎ๎䛸䛧䛶ẖᖺ㻥᭶㻞㻡᪥䠄ఇᴗ᪥䛾ሙྜ䛿⩣Ⴀᴗ᪥䠅
> 䈜ᙜヱᐇ⦼䛿㐣ཤ䛾䜒䛾䛷䛒䜚䚸ᑗ᮶䛾㐠⏝ᡂᯝ➼䜢ಖド䛩䜛䜒䛾䛷䛿䛒䜚䜎䛫䜣䚹
>
> 䕔ᇶ‽౯㢠䞉⣧㈨⏘⥲㢠䛾᥎⛣
> root@08a02db0d267:/home/vcap/app/pop/bin#
>
>
> -----Original Message-----
> From: suzuki toshiya <[email protected]>
> Sent: May 29, 2019 12:25
> To: '[email protected]' <[email protected]>
> Cc: Leonard Rosenthol <[email protected]>; Zhong, Steven <[email protected]>
> Subject: Re: [poppler] How to recognize the Japan Font.
>
> Hi Zhong,
>
> As Leonard pointed out, the fonts are embedded in the document. My comments are 3 points.
>
> * maybe you should install the poppler-data package, which includes the mapping tables from Adobe CID (please google or baidu to understand what it is) to character encodings.
> * but your poppler 0.62.0 might be too old to find a matching poppler-data package.
> * I suggest upgrading poppler and installing poppler-data.
>
> Regards,
> mpsuzuki
>
> On 2019/05/29 13:01, Leonard Rosenthol wrote:
>> The font is embedded in the PDF, but that is only for the purposes of rendering.
>>
>> [inline image]
>>
>> Leonard
>>
>> From: poppler <[email protected]> on behalf of "Zhong, Steven" <[email protected]>
>> Date: Wednesday, May 29, 2019 at 11:58 AM
>> To: "[email protected]" <[email protected]>
>> Subject: [poppler] How to recognize the Japan Font.
>>
>> Hi All,
>>
>> I want to convert the PDF at the following link to text:
>> https://www.fidelity.jp/static/pdf/fund/5111893-FD30BA/Reports/Monthly/FD30BA-MF-201904.pdf
>>
>> But the text can't be read correctly. I found that the font is MS-PGothic-90ms-RKSJ-H and the encoding is Identity-H.
>>
>> The converted text looks like the line below. I guess a font is missing. How can I install the font so that the text is read correctly? Many Thanks
>> ᅜෆ⥲⏕⏘䠄㻳㻰㻼䠅ᡂ㛗⋡䛜๓ᅄ༙ᮇ䛸ྠỈ‽䛻䛺䜚䚸୰ᅜᬒẼ䛻ᗏධ䜜ឤ䛜ฟጞ䜑䛯䛣䛸䜒㈙䛔Ᏻᚰឤ䛻䛴䛺
>>
>>
>> vcap@e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -v
>> pdfinfo version 0.62.0
>> Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
>> Copyright 1996-2011 Glyph & Cog, LLC
>>
>> My poppler is 0.62.0
>> vcap@e0779423-b47e-499c-4c1b-4ecd:~/app/pop/bin$ ./pdfinfo -listenc
>> Available encodings are:
>> ASCII7
>> Big5
>> Big5ascii
>> EUC-CN
>> EUC-JP
>> GBK
>> ISO-2022-CN
>> ISO-2022-JP
>> ISO-2022-KR
>> ISO-8859-6
>> ISO-8859-7
>> ISO-8859-8
>> ISO-8859-9
>> KOI8-R
>> Latin1
>> Latin2
>> Shift-JIS
>> Symbol
>> TIS-620
>> UTF-16
>> UTF-8
>> Windows-1255
>> ZapfDingbats
>>
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
