Hi,
Please check GfxCIDFont::getNextChar in GfxFont.cc. For non-8-bit strings,
it shows how poppler translates a byte stream into a Unicode string.
Note that text in a PDF is always tied to a font in that PDF,
so the encoding is determined by the font.
Also, please check the poppler-data package for the mapping table resources.
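
To make the idea concrete, here is a rough, self-contained sketch of the
two-step lookup (bytes -> character code -> CID -> Unicode). It is not
poppler's actual code: ToyCIDFont, nextChar and decode are hypothetical
names, the 2-byte rule is a simplified GBK-like assumption, and the CID
values used with it are made up. In a real PDF, the byte ranges and
mappings come from the font's CMap and related tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a CID font; NOT poppler's real classes.
struct ToyCIDFont {
    std::map<uint32_t, uint32_t> codeToCID;     // plays the role of the font's CMap
    std::map<uint32_t, char32_t> cidToUnicode;  // plays the role of a CID-to-Unicode table
};

// Reads one character code starting at bytes[pos], stores its Unicode
// value in `out` (U+FFFD if unmapped), and returns the bytes consumed.
// Assumption: a lead byte in 0x81..0xFE starts a 2-byte code (GBK-like).
std::size_t nextChar(const ToyCIDFont& font, const std::string& bytes,
                     std::size_t pos, char32_t& out) {
    assert(pos < bytes.size());
    const auto b0 = static_cast<unsigned char>(bytes[pos]);
    uint32_t code = b0;
    std::size_t consumed = 1;
    if (b0 >= 0x81 && b0 <= 0xFE && pos + 1 < bytes.size()) {
        code = (code << 8) | static_cast<unsigned char>(bytes[pos + 1]);
        consumed = 2;
    }

    out = U'\uFFFD';  // replacement character for unmapped codes
    const auto cid = font.codeToCID.find(code);
    if (cid != font.codeToCID.end()) {
        const auto uni = font.cidToUnicode.find(cid->second);
        if (uni != font.cidToUnicode.end()) {
            out = uni->second;
        }
    }
    return consumed;
}

// Decodes a whole string by calling nextChar until the input is consumed.
std::u32string decode(const ToyCIDFont& font, const std::string& bytes) {
    std::u32string result;
    for (std::size_t pos = 0; pos < bytes.size();) {
        char32_t u = 0;
        pos += nextChar(font, bytes, pos, u);
        result.push_back(u);
    }
    return result;
}
```

The point of the sketch is that the same bytes can decode differently
under a different font, because both tables belong to the font and not
to the document as a whole.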
Regards,
mpsuzuki
杨辉强 wrote:
Hi, all:
I am new to poppler. I want to extract text from PDF files
that contain Chinese (GBK) or other character sets.
Can poppler handle this, and if so, how does it do it?
I am currently reading through the source code,
so I would like to know which parts of the source deal
with multiple character sets.
Thank you very much.
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler