2017-01-06 10:50 GMT+01:00 M.-A. Lemburg <m...@egenix.com>:
> Victor: I think you are taking the UTF-8 idea a bit too far.

Hum, sorry, the PEP is still a draft, and the rationale is far from perfect yet. Let me try to simplify the issue: users are unable to configure a locale for various reasons and expect that Python 3 "just works", so that it never fails on encoding or decoding.

Do you mean that we should not try to fix this issue? Or that my approach is not the right one?

> Nick was trying to address the situation where the locale is
> set to "C", or rather not set at all (in which case the lib C
> defaults to the "C" locale). The latter is a fairly standard
> situation when piping data on Unix or when spawning processes
> which don't inherit the current OS environment.

In the second version of my PEP, Python 3.7 will basically "just work" with the POSIX locale (or C locale if you prefer). This locale enables the UTF-8 mode, which forces UTF-8/surrogateescape, and this error handler prevents the most common encode/decode errors (but not all of them!).

When I read the different issues on the bug tracker, I understood that people have different opinions because they have different use cases and so different expectations. I tried to describe a few use cases to help to understand why we don't have the same expectations:

https://www.python.org/dev/peps/pep-0540/#replace-a-word-in-a-text

I guess that "piping data on Unix" is represented by my "Replace a word in a text" example, right? It implements the "sed -e s/apple/orange/g" command using Python 3.

Classical usage:

   cat input_file | sed -e s/apple/orange/g > output

"UNIX users" don't want Unicode errors here.

> The problem with the "C" locale is that the encoding defaults to
> "ASCII" and thus does not allow Python to show its built-in
> Unicode support.

I don't think that it's the main annoying issue for users.
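
Going back to the "replace a word in a text" case above, here is a minimal sketch of why UTF-8/surrogateescape avoids the Unicode error. It is an illustration only, not code from the PEP: the helper name `replace_word` and the sample bytes are hypothetical, and it works on an in-memory byte string rather than on real stdin/stdout.

```python
def replace_word(data: bytes, old: str, new: str) -> bytes:
    """Replace `old` with `new` without failing on undecodable bytes."""
    # With surrogateescape, each byte that is not valid UTF-8 is decoded
    # to a lone surrogate (U+DC80..U+DCFF) instead of raising an error.
    text = data.decode("utf-8", "surrogateescape")
    text = text.replace(old, new)
    # Encoding with the same handler turns those surrogates back into
    # the original bytes, so unknown bytes pass through unchanged.
    return text.encode("utf-8", "surrogateescape")


# Hypothetical input: 0xFF can never appear in valid UTF-8.
raw = b"apple \xff pie, apple juice\n"
result = replace_word(raw, "apple", "orange")

# The same input with the default strict handler raises an error.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

The undecodable byte survives the round trip untouched, which is what a "sed"-like tool is expected to do; the strict decode of the same input fails.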

Users complain because basic functions like (1) "List a directory into stdout" or (2) "List a directory into a text file" fail badly:

(1) https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout
(2) https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-a-text-file

They don't really care about powerful Unicode features, but are bitten early, just on writing data back to the disk, into a pipe, or somewhere else.

Python 3.6 tries to be nice with users when *getting* data, but it is very pedantic when you try to put the data somewhere. The only exception is that stdout now uses the surrogateescape error handler, but only with the POSIX locale.

> Nick's PEP and the discussion on the ticket
> http://bugs.python.org/issue28180 are trying to address this
> particular situation, not enforce any particular encoding
> overriding the user's configured environment.
>
> So I think it would be better if you'd focus your PEP on the
> same situation: locale set to "C" or not set at all.

I'm not sure that I understood: are you suggesting that we only modify the behaviour when the POSIX locale is used, without adding any option to ignore the locale and force UTF-8?

At least, I would like to get a UTF-8/strict mode which would require an option to enable it.

About -X utf8, the idea is to state explicitly that you are sure that all inputs are encoded in UTF-8 and that you request outputs to be encoded to UTF-8.

I guess that you are concerned about locales using encodings other than ASCII or UTF-8, like Latin1, ShiftJIS or something else?

> BTW: You mention a locale "POSIX" in a few places. I have
> never seen this used in practice and wonder why we should
> even consider this in Python as possible work-around for
> a particular set of features.
> The locale setting in your
> environment does have a lot of influence on your user
> experience, so forcing people to set a "POSIX" locale doesn't
> sound like a good idea - if they have to go through the
> trouble of correctly setting up their environment for Python
> to correctly run, they would much more likely use the correct
> setting rather than a generic one like "POSIX", which is
> defined as alias for the "C" locale and not as a separate
> locale: (...)

Hum, the POSIX locale is the "C" locale in my PEP.

I don't ask users to force the POSIX locale. I propose to make Python nicer when users already *get* the POSIX locale for various reasons:

* OS not correctly configured
* SSH connection failing to set the locale
* user using LANG=C to get messages in English
* LANG=C used for a bad reason
* program run in an empty environment
* user locale set to a non-existent locale => the libc falls back on POSIX
* etc.

About "LANG=C": "LC_ALL=C" is more correct, but it seems like LANG=C is more common than LC_ALL=C or LC_CTYPE=C in the wild.

>> It's actually very similar to your PEP, except that instead of adding
>> the ability to make CPython ignore the C level locale settings (which
>> I think is a bad idea based on your own previous work in that area and
>> on the way that CPython interacts with other C/C++ components in the
>> same process and in subprocesses), it just *changes* those settings
>> when we're pretty sure they're wrong.
>
> ... and this is taking the original intent of the ticket
> a little too far as well :-)

By ticket, do you mean a Python issue?

By the way, I'm aware of these two issues:

http://bugs.python.org/issue19846
http://bugs.python.org/issue28180

I'm sure that other issues were opened to request something similar, but they probably got less feedback, and I have been too lazy so far to find them all.
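
On the LANG=C versus LC_ALL=C remark above: POSIX resolves the locale of a category by checking LC_ALL first, then the category variable (here LC_CTYPE), then LANG, and treats an empty value as unset. The sketch below applies that precedence to an environment mapping to tell whether the C/POSIX locale is being requested. The function names are hypothetical, and the sketch deliberately ignores the "non-existent locale" case from the list, since only the libc can tell whether a locale name is actually installed.

```python
def requested_ctype_locale(env):
    """Return the LC_CTYPE locale name requested by `env` (a mapping)."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # empty strings count as "not set"
            return value
    # Nothing set at all: the libc defaults to the "C" locale.
    return "C"


def requests_posix_locale(env):
    """True if `env` asks for the C locale or its alias, POSIX."""
    return requested_ctype_locale(env) in ("C", "POSIX")


# An empty environment and LANG=C both end up with the POSIX locale.
empty_env = requests_posix_locale({})
lang_c = requests_posix_locale({"LANG": "C"})
# LANG=C is silently overridden when LC_ALL is also set, which is why
# LC_ALL=C is the more reliable spelling.
overridden = requests_posix_locale({"LANG": "C", "LC_ALL": "en_US.UTF-8"})
```

For the real process environment, `requests_posix_locale(os.environ)` gives the same answer before the libc has had a chance to reject an unknown locale name.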

> Without the "C.UTF-8" locale available, your PEP [538] only affects
> the FS encoding, AFAICT, unless other parts of the application
> try to interpret the locale env settings as well and use their
> own logic for the interpretation.

I decided to write the PEP 540 because only a few operating systems provide C.UTF-8 or C.utf8. I'm trying to find a solution working on all UNIX and BSD systems. Maybe I'm wrong, and my approach (ignore the locale, rather than really "fixing" the locale) is plain wrong.

Again, it's a very hard issue, and I don't think that any perfect solution exists. Otherwise, we would already have fixed this issue 8 years ago! It's a matter of compromises and of finding a practical design which works for most users.

> For the purpose of experimentation, I would find it better
> to start with just fixing the FS encoding in 3.7 and
> leaving the option to adjust the locale setting turned off
> per default.

Sorry, what do you mean by "fixing the FS encoding"? Do I understand correctly that it's basically my PEP 540 without -X utf8 and PYTHONUTF8, with the UTF-8 mode enabled only for the POSIX locale?

By the way, Nick's PEP 538 doesn't mention surrogateescape. IMHO if we override or ignore the locale, it's safer to use surrogateescape. The Use Cases of my PEP 540 should help to understand why.

Victor
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/