[Python-3000] Handling of wide Unicode characters

Alexandre Vassalotti Fri, 01 Jun 2007 15:57:47 -0700

Hi,

I was doing some testing on the new _string_io module, since I was
slightly skeptical of my handling of wide Unicode characters (those
outside the BMP, which take 32 bits in UTF-16 instead of the usual
16). So, I ran this little test:


   >>> s = _string_io.StringIO()
   >>> s.write(u'晉')
   >>> s.tell()
   2

As I expected, wide Unicode characters count as two. However, I was
surprised that Python treats them as two characters as well:

   >>> len(u'晉')
   2
   >>> u'晉'
   u'\ud87e\udccd'

Is this a bug, or just an implementation choice?
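
To make the arithmetic behind the output above concrete, here is a
minimal sketch of how the two code units u'\ud87e\udccd' combine into a
single code point (U+2F8CD by the standard UTF-16 formula). It assumes
Python 3.3 or later, where len() counts code points rather than 16-bit
units; the helper names combine_surrogates and utf16_length are made up
for illustration and are not part of _string_io or any library.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one code point."""
    # High surrogates live in D800..DBFF, low surrogates in DC00..DFFF.
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_length(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    # utf-16-le emits no byte order mark, so bytes // 2 is the unit count.
    return len(text.encode("utf-16-le")) // 2


code_point = combine_surrogates(0xD87E, 0xDCCD)
ch = chr(code_point)

print(hex(code_point))   # 0x2f8cd
print(len(ch))           # 1 (assumes Python 3.3+)
print(utf16_length(ch))  # 2
```

On a build that stores strings as 16-bit units, len() reports the
utf16_length() figure instead, which is the behaviour shown in the
session above.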

Cheers,
-- Alexandre