Hi, I was doing some testing on the new _string_io module, since I was slightly skeptical about my handling of wide Unicode characters (32 bits long in UTF-16, instead of the usual 16 bits). So, I ran this little test:

>>> s = _string_io.StringIO()
>>> s.write(u'晉')
>>> s.tell()
2

As I expected, wide Unicode characters count for two. However, I was surprised that Python treats them as two characters as well:

>>> len(u'晉')
2
>>> u'晉'
u'\ud87e\udccd'

Is it a bug, or only an implementation choice?

Cheers,
-- Alexandre
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
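
P.S. For reference, the two code units in the repr above look like a UTF-16 surrogate pair. Here is a minimal sketch of how such a pair maps back to a single code point, and of how many 16-bit units that code point takes in UTF-16. The helper names combine_surrogates and utf16_length are hypothetical and are not part of _string_io. The snippet assumes a recent Python 3, where chr() accepts code points above 0xFFFF.

```python
# Sketch only: helper names are made up for illustration.

def combine_surrogates(high, low):
    """Return the code point encoded by a UTF-16 surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of the value above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_length(text):
    """Count the 16-bit code units needed to encode text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


code_point = combine_surrogates(0xD87E, 0xDCCD)
print(hex(code_point))                # 0x2f8cd
print(utf16_length(chr(code_point)))  # 2
```

If the interpreter stores strings as UTF-16 code units internally (an assumption about this build, not something I have checked), that would explain why both len() and tell() report 2.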