Re: [poppler] How poppler deal with multiple charsets?

suzuki toshiya Tue, 01 Nov 2011 02:38:19 -0700

Hi,

Please look at GfxCIDFont::getNextChar in GfxFont.cc. For strings that
are not 8-bit, it shows how poppler translates a byte stream into a
Unicode string. Note that text in a PDF is always tied to a font in
that PDF, so the encoding is determined by the font.


Also, please check the poppler-data package for the mapping table resources.
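
To make the idea concrete, here is a rough sketch of the two lookups
involved: bytes are first grouped into character codes and mapped to
CIDs by the font's CMap, and the CIDs are then mapped to Unicode by a
separate table. This is NOT poppler code. The function names, the
GBK-like lead-byte rule, and the CID numbers are all made up for
illustration; only the GBK byte pairs and their Unicode values are real.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a font's CMap: character code -> CID.
// The CID values here are invented; real ones come from the CMap resource.
static const std::map<std::uint32_t, std::uint32_t> kToyCodeToCid = {
    {0x41, 1},    // single byte 0x41
    {0xD6D0, 2},  // GBK bytes D6 D0
    {0xCEC4, 3},  // GBK bytes CE C4
};

// Hypothetical stand-in for a CID -> Unicode table.
static const std::map<std::uint32_t, char32_t> kToyCidToUnicode = {
    {1, 0x0041},  // LATIN CAPITAL LETTER A
    {2, 0x4E2D},  // CJK ideograph U+4E2D
    {3, 0x6587},  // CJK ideograph U+6587
};

// Read one character code starting at pos and advance pos.
// Assumption: a GBK-like codespace where a lead byte in 0x81..0xFE
// starts a two-byte code. A real font defines its own codespace ranges.
std::uint32_t toyNextCode(const std::string &bytes, std::size_t &pos) {
    std::uint32_t b0 = static_cast<unsigned char>(bytes[pos++]);
    if (b0 >= 0x81 && b0 <= 0xFE && pos < bytes.size()) {
        std::uint32_t b1 = static_cast<unsigned char>(bytes[pos++]);
        return (b0 << 8) | b1;
    }
    return b0;
}

// Decode a whole byte string; anything unmapped becomes U+FFFD.
std::u32string toyDecode(const std::string &bytes) {
    std::u32string out;
    std::size_t pos = 0;
    while (pos < bytes.size()) {
        std::uint32_t code = toyNextCode(bytes, pos);
        char32_t u = 0xFFFD;
        auto cid = kToyCodeToCid.find(code);
        if (cid != kToyCodeToCid.end()) {
            auto uni = kToyCidToUnicode.find(cid->second);
            if (uni != kToyCidToUnicode.end()) {
                u = uni->second;
            }
        }
        out.push_back(u);
    }
    return out;
}
```

With these toy tables, toyDecode("\xD6\xD0\xCE\xC4") yields the two code
points U+4E2D U+6587. The same bytes under a different font's tables
could mean something else, which is why the font decides the encoding.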

Regards,
mpsuzuki

杨辉强 wrote:

Hi, all:
I am a newbie to poppler. I want to extract text from PDF files which contain Chinese GBK or other charsets. Can poppler deal with this situation, and how does it do it? I am now hacking on the source code, so I want to know which parts of the source code are related to dealing with multiple charsets.
Thank you very much.
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler


