> PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode
> strings in a reversible way.

That isn't really true; it is not, inherently, about UTF-8. Instead, it tries to represent byte sequences that cannot be decoded in the filesystem encoding as Unicode strings, in a reversible way.

> Quietly escaping a bad UTF-8 encoding with private Unicode characters is
> unlikely to be the right thing

And indeed, the PEP stopped using PUA (Private Use Area) characters; it now uses lone surrogates (U+DC80..U+DCFF) instead.

> Therefore, when Python encounters path names on a file system
> that are not consistent with the (assumed) encoding for that file
> system, Python should raise an error.

This is what happens currently, and users are quite unhappy about it.

> If you really don't care what the string looks like and you just want an
> encoding that round-trips without loss, you can probably just set your
> encoding to one of the 8 bit encodings, like ISO 8859-15. Decoding
> arbitrary byte sequences to unicode strings as ISO 8859-15 is no less
> correct than decoding them as the proposed "utf-8b". In fact, the most
> likely source of non-UTF-8 sequences is ISO 8859 encodings.

Yes, users can do that (to a degree), but they are still unhappy about it. The approach actually fails for command line arguments, which Python decodes at startup, before the program gets a chance to choose an encoding.

> As for what the byte-oriented interfaces should do, they are simply
> platform dependent. On UNIX, they should do the obvious thing. On
> Windows, they can either hook up to the low-level byte-oriented system
> calls that the systems supply, or Windows could fake it and have the
> byte-oriented interfaces use UTF-8 encodings always and reject non-UTF-8
> sequences as illegal (there are already many illegal byte sequences
> anyway).

As is, these interfaces are incomplete: they don't support command line arguments or environment variables. If you want to complete them, you should write a PEP.

Regards,
Martin

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
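The three behaviours argued over in this thread (strict decoding, PEP 383 escaping, and decoding as an 8-bit ISO 8859 encoding) can be compared on a single file name. This is a minimal sketch assuming Python 3.1 or later, where PEP 383 shipped as the "surrogateescape" error handler (the quoted message calls the proposal "utf-8b"). It uses ISO 8859-1 ("latin-1") as the 8-bit encoding because every byte value maps to a code point; the file name itself is a made-up example.

```python
# Sketch only: "surrogateescape" is the name under which PEP 383's
# error handler shipped in Python 3.1 (called "utf-8b" in the thread).

# A file name as raw bytes: 0xE9 is "é" in ISO 8859-1/-15, but here it
# is followed by "." rather than a UTF-8 continuation byte, so the
# sequence is not valid UTF-8. (Hypothetical example name.)
raw_name = b"caf\xe9.txt"

# 1. Strict decoding raises, which is what the quoted message proposes.
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# 2. PEP 383: each undecodable byte 0xXY becomes the lone surrogate
#    U+DCXY, and encoding with the same handler restores the byte.
escaped = raw_name.decode("utf-8", "surrogateescape")
assert escaped == "caf\udce9.txt"
assert escaped.encode("utf-8", "surrogateescape") == raw_name

# 3. The 8-bit alternative: ISO 8859-1 maps every byte to a code point,
#    so any byte string round-trips, but names that really are UTF-8
#    come back as mojibake.
latin = raw_name.decode("latin-1")
assert latin.encode("latin-1") == raw_name
utf8_name = "café.txt".encode("utf-8")
mojibake = utf8_name.decode("latin-1")
print(mojibake)
```

Both the escaping and the 8-bit approach are lossless at the byte level; the difference is that PEP 383 leaves correctly encoded UTF-8 names readable and only escapes the bytes that do not decode.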