On 9/18/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Guido has stated that the internal representation used by Python
> strings is a sequence of Unicode code units, not characters. I don't
> think that's reached the status of "pronouncement" yet, but you will
> probably need a PEP to get the guarantees you want.

I think of this as cast in stone; we can't reasonably guarantee more
if we want to be compatible with the UTF-16 (*) Unicode representations
used on Windows and in Java. How much more pronouncement do you want?

(*) I'm not at all sure that it's called that -- you guys keep asking
trick questions based on terminology that's only clear to people who
have read the Unicode standard several times forwards and backwards.
I mean the representation that uses 16-bit values, where characters
>= 2**16 are represented as two 16-bit "surrogate" values. (I hope I
at least have the 'surrogate' thing right this time.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
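
For readers unsure what the footnote above means by two 16-bit "surrogate" values, here is a minimal sketch in Python. The helper names utf16_code_units and to_surrogate_pair are hypothetical, made up for illustration; only str.encode and struct.unpack are real library calls. It shows the standard UTF-16 arithmetic, not anything about how a particular interpreter build stores strings.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of the UTF-16 encoding of text."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def to_surrogate_pair(code_point):
    """Split a code point >= 2**16 into its (high, low) surrogate values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000  # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)  # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) lies above 2**16, so it needs two units.
print([hex(u) for u in utf16_code_units("A\U0001D11E")])
print([hex(u) for u in to_surrogate_pair(0x1D11E)])
```

The point of the sketch is that "A" stays a single 16-bit value while the character above 2**16 becomes two, which is why a sequence of code units and a sequence of characters can differ in length.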