Steve Dower writes:

 > I plan to use only Unicode to interact with the OS and then utf8
 > within Python if the caller wants bytes.
This doesn't answer Victor's questions, or mine.  This proposal
requires identifying and transcoding bytes that represent text in
encodings other than UTF-8.

1. How do you propose to identify "bytes that represent text (and
   might be filenames)" if they did *not* originate in a filesystem
   or console API?

2. How do you propose to identify the non-UTF-8 encoding, if you
   have forced all variables signifying bytes encodings to UTF-8?

Additional considerations:

As far as I can see, this is just a recipe for a different way to get
mojibake (a concrete sketch is at the end of this message).  *The*
way to avoid mojibake is to "let text be text" *internally*.
Developers who insist on processing text as bytes are going to get
what they deserve *in edge cases*.  But mostly (i.e., in the
mono-encoding environments of most users) it just (barely ;-) works.

And there are many use cases where you *can* process bytes that
happen to encode text as "just bytes" (e.g., low-level networking
code).  These cases have performance issues if the
bytes-text-bytes-text-bytes double round-trip implied for *stream
content* (vs. the OS APIs you're concerned with, which effectively
round-trip text-bytes-text) is imposed on them.
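To put a rough shape on that cost, here is a hypothetical
micro-benchmark sketch; the 1 MiB ASCII payload and the no-op
baseline are my illustrative assumptions, not anything taken from
the proposal:

    import timeit

    payload = b"GET / HTTP/1.1\r\n" * 65536   # exactly 1 MiB of "just bytes"

    def passthrough(data):
        return data                     # bytes handled as bytes: no work

    def round_trip(data):
        text = data.decode("utf-8")     # bytes -> text
        return text.encode("utf-8")     # text -> bytes; the stream path
                                        # sketched above would do this twice

    print(timeit.timeit(lambda: passthrough(payload), number=100))
    print(timeit.timeit(lambda: round_trip(payload), number=100))

The decode/encode pair has to scan and copy the whole payload each
time, while the passthrough does no work at all, and that is before
any actual processing of the stream happens.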
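And to return to the mojibake point above, here is a minimal sketch
of why question 2 matters.  Latin-1 is chosen purely for
illustration; the whole problem is that nothing in the bytes
themselves identifies the encoding:

    data = "café".encode("latin-1")     # b'caf\xe9'

    try:
        data.decode("utf-8")            # forcing UTF-8 fails outright
    except UnicodeDecodeError as exc:
        print(exc)

    # "Rescuing" the bytes with surrogateescape merely smuggles them
    # through; the result is not usable text until someone guesses
    # the real encoding.
    text = data.decode("utf-8", "surrogateescape")
    print(repr(text))                                       # 'caf\udce9'
    print(text.encode("utf-8", "surrogateescape") == data)  # True

Try to display or compare that "text" and you get mojibake (or an
encode error in a strict codec); only out-of-band knowledge of the
original encoding can repair it.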