On Thu, Sep 1, 2011 at 8:02 AM, Terry Reedy <tjre...@udel.edu> wrote:
> On 8/31/2011 1:10 PM, Guido van Rossum wrote:
>> Ok, I dig this, to some extent. However saying it is UCS-2 is equally
>> bad.
>
> As I said on the tracker, our narrow builds are in-between (while moving
> closer to UTF-16), and both terms are deceptive, at least to some.
We should probably just explicitly document that the internal representation in narrow builds is a UCS-2/UTF-16 hybrid: like UTF-16, it can handle the full code point space, but, like UCS-2, it allows code unit sequences (such as lone surrogates) that strict UTF-16 would reject.

Perhaps we should also finally split strings out into a dedicated section on the same tier as Sequence Types in the library reference. Yes, they're sequences, but they're also much more than that: try as you might, you're unlikely to succeed in duck-typing strings the way you can sequences, mappings, files, numbers and other interfaces. Needing a "real string" is even more common than needing a "real dict", especially after the efforts to make most parts of the interpreter that previously cared about the latter distinction accept arbitrary mapping objects.

I've created http://bugs.python.org/issue12874, suggesting that the "Sequence Types" and "memoryview type" sections could be usefully rearranged as:

    Sequence Types - list, tuple, range
    Text Data - str
    Binary Data - bytes, bytearray, memoryview

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
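The UCS-2/UTF-16 hybrid behaviour of narrow builds can be illustrated on a current CPython, which no longer has narrow builds, by emulating the 16-bit code units a narrow build would store. This is only a sketch: the helper names `narrow_code_units` and `is_strict_utf16` are hypothetical, and it assumes Python 3.4 or later, where the `surrogatepass` error handler works with the UTF-16 codecs.

```python
import struct


def narrow_code_units(s):
    """Emulate the 16-bit code units a narrow build would store for ``s``.

    Hypothetical helper: encodes to UTF-16-LE and unpacks the result.
    The 'surrogatepass' handler lets lone surrogates through as single
    code units, mirroring the permissive (UCS-2-like) narrow builds.
    """
    data = s.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_strict_utf16(s):
    """Return True if ``s`` is encodable as well-formed (strict) UTF-16."""
    try:
        s.encode("utf-16-le")  # default 'strict' handler rejects lone surrogates
    except UnicodeEncodeError:
        return False
    return True


# A non-BMP code point becomes a surrogate pair (UTF-16-like), which is why
# a narrow build reported len() == 2 for it.
print([hex(u) for u in narrow_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
# A lone surrogate is still one storable code unit (UCS-2-like)...
print([hex(u) for u in narrow_code_units("\ud800")])  # ['0xd800']
# ...but strict UTF-16 refuses it.
print(is_strict_utf16("\U0001F600"), is_strict_utf16("\ud800"))  # True False
```

Both halves of the "hybrid" show up here: the full code point space is reachable via surrogate pairs, yet a sequence that is not well-formed UTF-16 is still representable.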