Re: How do I display unicode value stored in a string variable using ord()

Paul Rubin Sat, 18 Aug 2012 11:33:13 -0700

Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> writes:
> (There is an extension to UCS-2, UTF-16, which encodes non-BMP characters 
> using two code points. This is fragile and doesn't work very well, 
> because string-handling methods can break the surrogate pairs apart, 
> leaving you with invalid unicode string. Not good.)
...
> With PEP 393, each Python string will be stored in the most efficient 
> format possible:


Can you explain the issue of "breaking surrogate pairs apart" a little
more?  Switching between encodings based on the string contents seems
silly at first glance.  Strings are immutable, so I don't understand why
Python shouldn't just use UTF-8 or UTF-16 for everything.  UTF-8 is more
efficient for Latin-based alphabets and UTF-16 may be more efficient for
some other languages.  I think even UCS-4 doesn't completely fix the
surrogate pair issue, if that issue means the only thing I can think of.
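
For concreteness, here is a minimal sketch of what I take "breaking
surrogate pairs apart" to mean.  It assumes Python 3 and simulates
UTF-16 storage by hand; utf16_units is a hypothetical helper name, not
something from the stdlib or from PEP 393.

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# 'a', U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), 'b'
s = "a\U0001D11Eb"
units = utf16_units(s)

# The non-BMP character occupies two code units: a high and a low surrogate.
print([hex(u) for u in units])

# Slicing by code unit rather than by code point cuts the pair in half,
# leaving a lone high surrogate at the end of the data.
broken = units[:2]
broken_bytes = struct.pack("<%dH" % len(broken), *broken)

try:
    broken_bytes.decode("utf-16-le")
    valid = True
except UnicodeDecodeError:
    # A strict decoder rejects the unpaired surrogate.
    valid = False

print(valid)
```

Any indexing or slicing that counts 16-bit units instead of code points
can land between the two halves like this.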
