There is a fair amount of code to load the Unicode mapping; where there is none, or where it has holes, the code guesses the mapping, and where that fails it maps to the Unicode "private use" area. This code is in the Gfx8BitFont constructor. I have recently made a number of modifications (which I can share on request) to ensure that each character code is mapped uniquely and, optionally, to only a single Unicode character (i.e. the fl ligature is not mapped to f and l, but to the single code point U+FB02).
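
As a rough illustration of the behaviour described above (this is not the actual Gfx8BitFont code, and every name in it is hypothetical), the fallback chain could look something like the sketch below. It assumes a 256-entry 8-bit encoding, a tiny stand-in glyph-name table instead of a real glyph list, and the BMP Private Use Area starting at U+E000 as the last resort. Collisions are resolved first-come-first-served, which is a simplification.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>

// Local typedef for this sketch only.
using Unicode = std::uint32_t;

// Hypothetical stand-in for a real glyph-name lookup.
std::optional<Unicode> guessFromGlyphName(const std::string &name) {
    static const std::map<std::string, Unicode> table = {
        {"A", 0x0041}, {"f", 0x0066}, {"l", 0x006C},
        {"fi", 0xFB01},
        {"fl", 0xFB02},  // ligature kept as one code point, not "f" + "l"
    };
    auto it = table.find(name);
    if (it == table.end()) return std::nullopt;
    return it->second;
}

// Hypothetical per-code input: what the font supplied, if anything.
struct CodeInfo {
    std::optional<Unicode> loaded;  // value from the font's own mapping
    std::string glyphName;          // may be empty
};

// Map every character code to exactly one Unicode value, with no two
// codes sharing a value.
std::array<Unicode, 256> buildUniqueMap(const std::array<CodeInfo, 256> &codes) {
    std::array<Unicode, 256> out{};
    std::set<Unicode> used;
    Unicode nextPrivate = 0xE000;  // start of the BMP Private Use Area

    auto takePrivate = [&]() {
        while (used.count(nextPrivate)) ++nextPrivate;
        assert(nextPrivate <= 0xF8FF);  // end of the BMP Private Use Area
        return nextPrivate++;
    };

    for (std::size_t c = 0; c < codes.size(); ++c) {
        // 1. Use the loaded mapping if there is one.
        std::optional<Unicode> u = codes[c].loaded;
        // 2. Otherwise try to guess from the glyph name.
        if (!u && !codes[c].glyphName.empty())
            u = guessFromGlyphName(codes[c].glyphName);
        // 3. If nothing was found, or the value is already taken,
        //    fall back to an unused private-use code point.
        if (!u || used.count(*u)) u = takePrivate();
        used.insert(*u);
        out[c] = *u;
    }
    return out;
}
```

With this sketch, a code whose glyph name is "fl" comes out as U+FB02, a second code with the same glyph name is pushed into the private use area, and codes with no information at all get consecutive private-use values.
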
If the Unicode mapping functionality is useful elsewhere in the codebase, we may want to factor it out of the Gfx8BitFont constructor; alternatively, it may be good enough to construct a font and use the resulting Unicode mapping.

Maybe I'm missing the whole point, but I just wanted to let you know about this.

--josh

On 11/14/11 2:37 PM, "Max Filippov" <[email protected]> wrote:

>> There are changes in that code added in xpdf 3.02, I'm not sure they
>> fix your issue though, but you might want to take a look, see:
>>
>> http://cgit.freedesktop.org/~carlosgc/poppler-xpdf3merge/tree/ALL_DIFF#n6462
>
>Thanks for the answer.
>I've taken a look at the patch and 17 lines below the place you've
>spotted there's a comment:
>
>  //~ this currently drops all non-Latin1 characters
>
>which is 100% accurate, non-latin-1 symbols are replaced with question
>marks.
>So, I'm still searching for guidance before I possibly reinvent a wheel (:
>
>Thanks.
>-- Max
>_______________________________________________
>poppler mailing list
>[email protected]
>http://lists.freedesktop.org/mailman/listinfo/poppler
