On Fri, Jan 29, 2021 at 4:00 AM Christopher Barker <python...@gmail.com> wrote:
>
> The "real" solution is to change the defaults not to use the system encoding
> at all -- which, of course, we are moving towards with PEP 597. So first a
> plug to do that as fast as possible! I myself would love to see PEP 597
> implemented tomorrow -- for all supported versions of Python.
>
Note that PEP 597 doesn't change the default encoding. It just adds an option to emit a warning when the default encoding is used. I think changing the default itself might take about 10 years.

> However, the real trick here is that Python is a programming
> language/library/runtime -- not an application. So the folks starting up the
> interpreter are very often NOT the same as the folks writing the code.
>
> And this is why this is the issue it is -- folks write code on *nix systems,
> or maybe Windows with utf-8 as a system encoding, or only test with ASCII
> data, or ... -- then someone else actually runs the code, on Windows, and it
> doesn't work. Even if the person is technically writing the code, they may
> have copy and pasted it or who knows what? Think about it -- of all the
> Python code you run (libraries, etc) -- how much of it did you write yourself?
>
> (I myself have been highly negligent with my teaching materials in this
> regard -- so have personally unleashed dozens of folks writing buggy code
> on the world.)

You are right. Much of the code people run was written by someone else, and that code causes UnicodeDecodeError on Windows. UTF-8 mode rescues these users.

>
> Anyway -- I'm afraid any combination of start-up flags, environment
> variables, etc. will not be enough -- is there a way to enable UTF-8 mode in
> the code, e.g. with a __future__ import?
> This may be impossible, as UTF-8 mode is an interpreter global setting, and
> it could get very messy if a __future__ import in one library changes the
> behavior of all the other code -- but maybe there's some way to accomplish
> something similar?
>
> Could monkey patch open() for that module, but would there be any way to have
> it work, on a module basis, for all other uses of TextIOWrapper?

On Unix, UTF-8 mode is also used to decode command-line arguments and environment variables, which happens before any user code runs. So, for now, UTF-8 mode can be enabled only at startup.
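Because UTF-8 mode is fixed when the interpreter starts, the practical way to get it today is to start a new interpreter with `-X utf8` or `PYTHONUTF8=1`. The following is a minimal sketch that launches child interpreters and reports `sys.flags.utf8_mode` from each; the helper name `child_utf8_mode` is made up for illustration. `-X utf8=0` is used to switch the mode off explicitly, because on Unix it can also be enabled automatically under the C/POSIX locale.

```python
import os
import subprocess
import sys

# Code executed by each child interpreter: print its UTF-8 mode flag.
PROBE = "import sys; print(sys.flags.utf8_mode)"


def child_utf8_mode(extra_args=(), extra_env=None):
    """Start a fresh interpreter and return its sys.flags.utf8_mode (0 or 1).

    UTF-8 mode is decided during interpreter startup, so it can only be
    requested through the new process's command line or environment.
    """
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # do not inherit the parent's setting
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("current process:", sys.flags.utf8_mode)
    print("-X utf8:        ", child_utf8_mode(["-X", "utf8"]))
    print("PYTHONUTF8=1:   ", child_utf8_mode(extra_env={"PYTHONUTF8": "1"}))
    # Explicitly disabled; on Unix the mode can otherwise be turned on
    # automatically when the locale is C or POSIX.
    print("-X utf8=0:      ", child_utf8_mode(["-X", "utf8=0"]))
```

There is no public Python-level API that turns the flag on inside an interpreter that is already running, which is why the sketch has to go through a child process.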
This restriction comes from Unix, so I think we could add something like `sys._enable_utf8_mode()` on Windows only, if it is really needed. But code using `sys._enable_utf8_mode()` would then be Windows-only, which doesn't make sense.

Another way is adding a runtime option that changes only the default text encoding (e.g. `io.set_default_encoding("utf-8")`). This is an option worth considering. If we call it at the top of a script or notebook, files are opened with UTF-8 on all platforms. On the other hand, it adds yet another "xxx encoding" term to Python. Python already has too many "xxx encoding"s, and they confuse users. So I am cautious about adding another encoding option, and I am focusing on UTF-8 mode for now.

>
> Maybe one work around would be for the __future__ import (Or something) to
> set the mode, and then trigger warnings for all uses of TextIOWrapper that
> don't use utf-8 -- that is, turn on PEP 597
>
> So you'd use one library that had the __future__ import, and it wouldn't
> break any other code, but it would turn on Warnings.
>

Please don't discuss PEP 597 in this thread; let's focus on UTF-8 mode. They are different approaches, and they are not mutually exclusive.

* UTF-8 mode helps users who see UnicodeDecodeError during `pip install`.
* PEP 597 helps developers notice `open("README.md").read()` in `setup.py`.

> Anyway, this is a very hard problem, but what I'm trying to get at is that we
> don't want the exact same code to run differently depending on what
> environment it's running in. Currently, it depends on the system encoding,
> we'd just be switching to it depending on whether UTF-8 mode is turned on,
> which is better, I suppose, (e.g. Jupyter could choose to turn UTF-8 mode on
> by default for example), but would still have the same fundamental problem.
>
> Imagine someone runs some code in Jupyter, and it's fine, and then they run
> it in plain Python, on the same machine, and it breaks -- ouch!
>

You are right.
UTF-8 mode must be accessible both for Jupyter on conda Python and for Python installed by the official installer. If UTF-8 mode is accessible enough, users can fix the problem themselves by enabling it.

> BTW: is there a way at runtime to check for UTF8 mode? Then at least I could
> raise a warning in my code. Or maybe simply check if
> locale.getpreferredencoding() returns utf-8, and raise a warning if not.

There is `sys.flags.utf8_mode`. But most Unix users don't use UTF-8 mode, because their locale encoding is already UTF-8. So checking `locale.getpreferredencoding(False)` is better. Note, however, that `locale.getpreferredencoding(False)` may return "utf8", "utf-8", "utf_8", "UTF-8", and so on.

> That wouldn't be hard to do, but it might be worth having a small utility
> that does it in a __future__ import:
>
> from __future__ import warn_if_not_utf8

It seems you are misusing the __future__ import. A __future__ import is for the compiler and parser, not for runtime behavior. And I don't think we should add `warn_if_not_utf8()` for now.

>>
>> Is it possible to enable UTF-8 mode in a configuration file like
>> `pyvenv.cfg`?
>
> I can't see how that's any more powerful/flexible than an environment
> variable.
>

An environment variable is powerful and flexible for power users, but not for beginners. Imagine users who launch Jupyter from the Start menu:

* The `-Xutf8` command-line option and `set PYTHONUTF8=1` are not accessible to them.
* A user environment variable is not accessible either, and it may affect other Python installations.

>> Is it possible to make it easier to configure?
>>
>> * Put a checkbox in the installer?
>> * Provide a small tool to allow configuration after installation?
>>   * python3 -m utf8mode enable|disable?
>>   * Accessible only for CLI user
>> * Add "Enable UTF-8 mode" and "Disable UTF-8 mode" to Start menu?
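On the earlier question of checking at runtime: the sketch below is one way to implement the check described above in your own code, assuming what matters is whether `open()` without `encoding=` would use UTF-8. The helper names `is_utf8_name` and `default_text_encoding_is_utf8` are hypothetical and not part of any library. `codecs.lookup()` normalizes spellings such as "utf8", "utf_8" and "UTF-8" to the canonical codec name "utf-8".

```python
import codecs
import locale
import sys
import warnings


def is_utf8_name(encoding_name):
    """Return True if encoding_name is any spelling of UTF-8.

    codecs.lookup() maps aliases such as "utf8", "utf_8" and "UTF-8"
    to the canonical codec name "utf-8". Unknown names return False.
    """
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False


def default_text_encoding_is_utf8():
    """Return True if open() without encoding= would use UTF-8."""
    if sys.flags.utf8_mode:
        # UTF-8 mode forces UTF-8 regardless of the locale.
        return True
    # Most Unix users get UTF-8 from the locale without UTF-8 mode,
    # so fall back to the locale encoding that open() uses by default.
    return is_utf8_name(locale.getpreferredencoding(False))


if __name__ == "__main__":
    if not default_text_encoding_is_utf8():
        warnings.warn(
            "The default text encoding is not UTF-8; "
            "consider enabling UTF-8 mode (-X utf8 or PYTHONUTF8=1).",
            RuntimeWarning,
        )
```

Normalizing through `codecs.lookup()` avoids hard-coding a list of spellings, and an unknown name is treated as "not UTF-8" instead of raising.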
>
>
> This is still going to have the same fundamental problems of the same code
> running differently on different machines or even the same machine in
> different environments, installs -- someone upgrades and forgets to check
> that box again, etc ....
>

There are pros and cons. If we use a user-wide (or system-wide) setting, such as `PYTHONUTF8` in a user environment variable, all Python environments use UTF-8 mode consistently. But it will break legacy applications running in old Python environments. If we have a per-environment option, it's easy to recommend that users enable UTF-8 mode.

> Maybe this would be a good thing to do once there are Warnings in place?
>

Do you mean that programs which only run in UTF-8 mode should warn if UTF-8 mode is not enabled? For example:

```
import sys
if sys.platform == "win32" and not sys.flags.utf8_mode:
    sys.exit("This program runs only in UTF-8 mode. Please enable UTF-8 mode.")
```

If so, I don't like it... A Windows-only API to enable UTF-8 mode at runtime seems better:

```
import sys
if sys.platform == "win32":
    # Hypothetical Windows-only API; it does not exist in Python today.
    sys._win32_enable_utf8mode()
```

Regards,
--
Inada Naoki <songofaca...@gmail.com>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/KZYWEPFI4TNBBOJB3ZFGVTRWKL73XXRO/
Code of Conduct: http://python.org/psf/codeofconduct/