Hi,

Please look at GfxCIDFont::getNextChar in GfxFont.cc. For non-8-bit
strings, it shows how poppler translates a byte stream into a Unicode string.
Note that text in a PDF is always associated with a font in that PDF,
so the encoding is determined by the font.

Also, please check the poppler-data package for the mapping table resources.
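
To give a rough picture of what "the font determines the encoding" means,
here is a toy sketch. It is not poppler's actual code: ToyCidFont, nextChar,
decodeString and makeDemoFont are hypothetical names, the CID values are made
up, and the one-or-two-byte code space is a simplifying assumption. The idea
it illustrates is the two-step lookup: bytes are grouped into a character
code, the code is mapped to a CID by the font's CMap, and the CID is mapped
to Unicode by a table of the kind that ships as a mapping resource.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a CID font's lookup tables; not a poppler type.
struct ToyCidFont {
    std::map<std::uint32_t, std::uint32_t> codeToCid; // plays the role of the CMap
    std::map<std::uint32_t, char32_t> cidToUnicode;   // plays the role of a CID-to-Unicode table
};

// Decode one character starting at s[pos], store its Unicode value in out,
// and return the number of bytes consumed (1 or 2).
// Assumed code space: bytes below 0x80 are one-byte codes, anything else
// starts a two-byte code. Real CMaps declare their own code space ranges.
std::size_t nextChar(const ToyCidFont &font, const std::string &s,
                     std::size_t pos, char32_t &out)
{
    assert(pos < s.size());
    std::uint32_t code = static_cast<unsigned char>(s[pos]);
    std::size_t used = 1;
    if (code >= 0x80 && pos + 1 < s.size()) {
        code = (code << 8) | static_cast<unsigned char>(s[pos + 1]);
        used = 2;
    }
    out = U'\uFFFD'; // replacement character for anything unmapped
    auto cid = font.codeToCid.find(code);
    if (cid != font.codeToCid.end()) {
        auto uni = font.cidToUnicode.find(cid->second);
        if (uni != font.cidToUnicode.end()) {
            out = uni->second;
        }
    }
    return used;
}

// Decode a whole byte string by calling nextChar until the input is consumed.
std::u32string decodeString(const ToyCidFont &font, const std::string &s)
{
    std::u32string result;
    std::size_t pos = 0;
    while (pos < s.size()) {
        char32_t u = 0;
        pos += nextChar(font, s, pos, u);
        result.push_back(u);
    }
    return result;
}

// Demo tables: 'A' (0x41) and the GBK bytes D6 D0, with made-up CIDs 1 and 2.
ToyCidFont makeDemoFont()
{
    ToyCidFont font;
    font.codeToCid = {{0x41, 1}, {0xD6D0, 2}};
    font.cidToUnicode = {{1, U'A'}, {2, U'\u4E2D'}};
    return font;
}
```

With these demo tables, decoding the three bytes 41 D6 D0 gives two
characters, U+0041 and U+4E2D. The same bytes under a font with a different
CMap would decode differently, which is why the byte stream cannot be
interpreted without the font.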

Regards,
mpsuzuki

杨辉强 wrote:
Hi all,
I am new to poppler. I want to extract text from PDF files that contain Chinese GBK or other character sets. Can poppler handle this, and if so, how does it do it? I am currently reading through the source code, so I would like to know which parts of it deal with multiple character sets.



Thank you very much.
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
