On Fri, Jan 10, 2014 at 9:13 PM, Juraj Sukop <juraj.su...@gmail.com> wrote:

> On Sat, Jan 11, 2014 at 12:49 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
>
>> Also, when you say you've never encountered UTF-16 text in PDFs, it
>> sounds like those people who've never encountered any non-ASCII data in
>> their programs.
>
> Let me clarify: one does not think in "writing text in Unicode"-terms in
> PDF. Instead, one records the sequence of "character codes" which
> correspond to "glyphs" or the glyph IDs directly. That's because one
> Unicode character may have more than one glyph and more characters can be
> shown as one glyph.

AFAIK (and just for the record), there could be both Latin1 text and UTF-16
in a PDF (and other encodings too), depending on the font used:

/Encoding /WinAnsiEncoding (mostly latin1 "standard" fonts)
/Encoding /Identity-H (generally for Unicode UTF-16 TrueType "embedded" fonts)

For example, in PyFPDF (a PHP library ported to Python), the following code
writes out text that could be encoded in two different encodings:

    s = sprintf("BT %.2f %.2f Td (%s) Tj ET", x*self.k, (self.h-y)*self.k, txt)

https://code.google.com/p/pyfpdf/source/browse/fpdf/fpdf.py#602

In Python 2, txt is just a str, but in Python 3 handling everything as a
latin1 string obviously doesn't work for TTF in this case.

Best regards

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com
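
To make the point about the two encodings concrete, here is a minimal Python 3 sketch of the text-showing line above, built as bytes rather than as a str. It is not PyFPDF's actual code: the names `escape_pdf_literal`, `text_operator` and the `unicode_font` flag are hypothetical, the coordinates are assumed to be already scaled, "latin-1" stands in for /WinAnsiEncoding as in the message, and the /Identity-H branch assumes the embedded font is set up so that its two-byte character codes match UTF-16BE code units.

```python
def escape_pdf_literal(data: bytes) -> bytes:
    """Escape the bytes that are special inside a PDF literal string."""
    # The backslash must be handled first so later escapes are not doubled.
    return (data.replace(b"\\", b"\\\\")
                .replace(b"(", b"\\(")
                .replace(b")", b"\\)")
                .replace(b"\r", b"\\r"))


def text_operator(x: float, y: float, txt: str, unicode_font: bool) -> bytes:
    """Build a 'BT ... Tj ET' line for txt as bytes (hypothetical helper)."""
    if unicode_font:
        # Assumption: /Identity-H font whose 2-byte codes equal UTF-16BE units.
        payload = txt.encode("utf-16-be")
    else:
        # Assumption: latin-1 as a stand-in for /WinAnsiEncoding.
        payload = txt.encode("latin-1")
    prefix = ("BT %.2f %.2f Td (" % (x, y)).encode("ascii")
    # Escaping is done on the encoded bytes, not on the str, because the
    # special bytes can also occur as halves of two-byte codes.
    return prefix + escape_pdf_literal(payload) + b") Tj ET"


if __name__ == "__main__":
    print(text_operator(10, 20, "caf\u00e9", unicode_font=False))
    print(text_operator(10, 20, "caf\u00e9", unicode_font=True))
```

The same str yields different byte sequences depending on the font, which is why a single "everything is latin1" formatting step is not enough once TrueType fonts are involved.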
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com