On Sun, Oct 27, 2019, at 03:39, Andrew Barnert via Python-ideas wrote:
> (Actually, IIRC, one of the two has a str type that, despite being 2.x,
> is unicode rather than bytes, but with some extra undocumented
> functionality to smuggle bytes around in a str and have it sometimes
> work.)

I do like the way GNU Emacs represents strings. Abstractly, a string can contain any character, or any byte > 127 as a value distinct from every character. Concretely, IIRC, strings are stored either as pure byte strings or as UTF-8, with "bytes > 127" stored as the extended UTF-8 representations of code points 0x3FFF80 through 0x3FFFFF (values between 0x110000 and 0x3FFF7F are used for other purposes).
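
To make the abstract model concrete, here is a rough Python sketch of the idea as described above. It is not taken from the Emacs sources: the names `RAW_BYTE_BASE`, `decode_lossless`, `encode_lossless` and `extended_utf8` are made up for illustration, code points are kept as plain integers because Python's `str` stops at 0x10FFFF, and the 5-byte form shown is simply what the original (pre-RFC 3629) UTF-8 scheme gives for values in that range, which may not match what Emacs actually stores.

```python
# Hypothetical sketch: raw byte b (0x80..0xFF) maps to code 0x3FFF00 + b,
# i.e. the range 0x3FFF80..0x3FFFFF mentioned in the message.
RAW_BYTE_BASE = 0x3FFF00


def decode_lossless(data):
    """Decode UTF-8 where possible; keep undecodable bytes as raw-byte codes."""
    codes = []
    i = 0
    while i < len(data):
        # Try to decode a 1- to 4-byte UTF-8 sequence starting at position i.
        for n in (1, 2, 3, 4):
            try:
                ch = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                continue
            codes.append(ord(ch))
            i += n
            break
        else:
            # No valid sequence starts here, so data[i] is a stray byte > 127.
            codes.append(RAW_BYTE_BASE + data[i])
            i += 1
    return codes


def encode_lossless(codes):
    """Inverse of decode_lossless: raw-byte codes become the original bytes."""
    out = bytearray()
    for c in codes:
        if 0x3FFF80 <= c <= 0x3FFFFF:
            out.append(c - RAW_BYTE_BASE)
        else:
            out += chr(c).encode("utf-8")
    return bytes(out)


def extended_utf8(c):
    """5-byte form of the original UTF-8 scheme (covers 0x200000..0x3FFFFFF)."""
    return bytes([
        0xF8 | (c >> 24),
        0x80 | ((c >> 18) & 0x3F),
        0x80 | ((c >> 12) & 0x3F),
        0x80 | ((c >> 6) & 0x3F),
        0x80 | (c & 0x3F),
    ])


data = b"caf\xc3\xa9 \xff\x80"          # valid UTF-8 followed by two stray bytes
codes = decode_lossless(data)
print([hex(c) for c in codes])
print(encode_lossless(codes) == data)   # the round trip preserves every byte
print(extended_utf8(0x3FFF80).hex())
```

The point of the design is that a stray byte never collides with a real character, so arbitrary byte strings round-trip unchanged. Python's own `surrogateescape` error handler has the same goal, but borrows the lone surrogates U+DC80 through U+DCFF instead of values above 0x10FFFF.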