On 9/18/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Guido has stated that the
> internal representation used by Python strings is a sequence of
> Unicode code units, not characters.  I don't think that's reached the
> status of "pronouncement" yet, but you will probably need a PEP to get
> the guarantees you want.

I think of this as cast in stone; we can't reasonably guarantee more
if we want to be compatible with the UTF-16 (*) Unicode
representations used on Windows and in Java. How much more
pronouncement do you want?
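Here is a minimal Python sketch of the difference between code units and characters. It assumes a modern Python 3 (3.9+ for the `list[int]` hint). In that version `str` indexes code points, as CPython has done since 3.3, so the sketch simulates the 16-bit code-unit view by encoding to UTF-16. `utf16_code_units` is a hypothetical helper, not a standard-library function.

```python
def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit UTF-16 code units for s (little-endian, no BOM)."""
    data = s.encode("utf-16-le")
    # Every code unit is two bytes; reassemble each pair as a 16-bit integer.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

text = "a\U0001D11E"   # 'a' followed by MUSICAL SYMBOL G CLEF (U+1D11E)
print(len(text))                                    # 2 code points
print(len(utf16_code_units(text)))                  # 3 code units
print([hex(u) for u in utf16_code_units(text)])     # ['0x61', '0xd834', '0xdd1e']
```

A UTF-16 string of this text has length 3, not 2. An API that promises only code units cannot promise one element per character.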

(*) I'm not at all sure that it's called that -- you guys keep asking
trick questions based on terminology that's only clear to people who
have read the Unicode standard several times forwards and backwards. I
mean the representation that uses 16-bit values, where characters >=
2**16 are represented as two 16-bit "surrogate" values. (I hope I at
least have the 'surrogate' thing right this time.)
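The surrogate scheme in the footnote is simple arithmetic. Subtract 0x10000 from the code point. Put the top 10 of the remaining 20 bits into a high surrogate (0xD800..0xDBFF) and the bottom 10 bits into a low surrogate (0xDC00..0xDFFF). The sketch below assumes this standard mapping. The helper names are hypothetical, and the result is cross-checked against Python's own UTF-16 codec.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points >= 2**16 need surrogates")
    v = cp - 0x10000               # 20 bits remain after the offset
    high = 0xD800 + (v >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low))         # 0xd834 0xdd1e
assert from_surrogates(high, low) == 0x1D11E
# Cross-check against Python's UTF-16 codec (big-endian, no BOM).
assert chr(0x1D11E).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```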

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000