Hi,

I was doing some testing on the new _string_io module, since I was
slightly skeptical about my handling of wide Unicode characters (those
that take 32 bits in UTF-16, instead of the usual 16). So I ran this
little test:

   >>> s = _string_io.StringIO()
   >>> s.write(u'晉')
   >>> s.tell()
   2

As I expected, wide Unicode characters count for two. However, I was
surprised that Python itself treats them as two characters as well:

   >>> len(u'晉')
   2
   >>> u'晉'
   u'\ud87e\udccd'
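
For reference, here is a minimal sketch of the arithmetic behind the two
code units shown above. It is written for a current Python 3 interpreter,
not the build used in the session, and the helper names to_surrogates and
from_surrogates are made up for illustration. The code point is derived
from the escape sequence in the output, not taken from anywhere else.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000                 # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair printed in the session above.
cp = from_surrogates(0xD87E, 0xDCCD)
pair = to_surrogates(cp)

# Number of 16-bit code units the character needs in UTF-16.
units = len(chr(cp).encode("utf-16-le")) // 2

print(hex(cp), [hex(u) for u in pair], units)
```

An interpreter that stores strings as 16-bit code units would report a
length equal to units for this character, which matches the len() result
above.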

Is this a bug, or just an implementation choice?

Cheers,
-- Alexandre