On 09/05/2017, Nick Coghlan <ncogh...@gmail.com> wrote:
>
> Enough changes have accumulated in PEP 538 since the start of the
> previous thread that it seems sensible to me to start a new thread
> specifically covering the current design (which aims to address all
> the concerns raised in the previous thread).
>
> I haven't requoted the PEP in full since it's so long, but will
> instead refer readers to the web version:
> https://www.python.org/dev/peps/pep-0538/
I did try to follow along via the mailing list threads, and have now read over the PEP again. I'm responding now because I'm actually touching code relevant to this again.

Broadly the proposal looks good to me. It does help one of the two cases I care about, and does no serious harm.

For a command line Python script, making sure Python itself uses UTF-8 for the C locale is sufficient, and setting LC_CTYPE so that spawned processes that aren't Python have a chance at doing the right thing too is a reasonable upgrade. This is probably good enough to drop one hack[1] rather than porting it to Python 3.

For hosted Python code this does nothing (apart from printing to stderr), so mod_wsgi, for instance, is still going to need the same kind of dance to get users to set LANG themselves as configuration[2]. Ideally this PEP would provide a C API or something similar, so I could file bugs to get mod_wsgi and similar hosts to just do the right thing.

A few notes on specifics; I'll try not to stray too much into choices already made.

The PEP doesn't persuade me that Py_Initialize is actually too late to switch *specifically* from ASCII to UTF-8. Any preceding operations on unicode would have been limited to a safe subset, since ASCII is a subset of UTF-8. Are there issues with other internals, or with surrogateescape, or is it just a pain?

I don't like the side effect of changing the standard stream error handler to surrogateescape if LANG=C.UTF-8 is actually set. Obviously bad data vs. an exception is a trade-off anyway, but it means that to get a Python script that will always either output valid data or exit, you have to set an arbitrary language such as en_US. Yes, that's true of the change as implemented in 3.5 anyway.

Not setting LANG and only setting LC_CTYPE seems fine. Obviously, things can go wrong due to odd behaviours of spawned processes, but it works for the normal idioms.

I'm not sold on adding the PYTHONCOERCECLOCALE runtime configuration. All it really does is turn off the stderr kipple if you must use the C locale for other reasons?
Anyone with the ability to set that variable could just set LANG instead. I was going to suggest just documenting LC_ALL=C as the override instead of adding a Python-specific variable, but I notice, looking around, that Debian discourages that[3].

That's all, though I will also grumble a bit about how long the PEP is.

Martin

[1] Override Py_FileSystemDefaultEncoding to utf-8 from ascii for the bzr script
<https://code.launchpad.net/~gz/bzr/filesystem_default_encoding_794353/+merge/85170>
[2] WSGIDaemonProcess lang and locale options
<https://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html>
[3] "Using LC_ALL is strongly discouraged as it overrides everything"
<https://wiki.debian.org/Locale#Configuration>

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev