On Thu, Jan 28, 2021 at 4:25 PM Inada Naoki <songofaca...@gmail.com> wrote:
> > The "real" solution is to change the defaults not to use the system
> > encoding at all -- which, of course, we are moving towards with PEP 597.
> > So first a plug to do that as fast as possible! I myself would love to
> > see PEP 597 implemented tomorrow -- for all supported versions of Python.
>
> Note that PEP 597 doesn't change the default encoding. It just adds an
> option to emit a warning when the default encoding is used.

I know -- and THAT could be done soon, yes?

> I think it might take about 10 years to change it.

I hope it's not that long -- having code that runs differently in
different environments is not good ...

> > However, the real trick here is that Python is a programming
> > language/library/runtime -- not an application. So the folks starting up
> > the interpreter are very often NOT the same as the folks writing the code.
> >
> > And this is why this is the issue it is -- folks write code on *nix
> > systems, or maybe Windows with utf-8 as a system encoding, or only test
> > with ASCII data, or ... -- then someone else actually runs the code, on
> > Windows, and it doesn't work. Even if the person is technically writing
> > the code, they may have copy and pasted it or who knows what? Think about
> > it -- of all the Python code you run (libraries, etc) -- how much of it
> > did you write yourself?
> >
> > (I myself have been highly negligent with my teaching materials in this
> > regard -- so have personally unleashed dozens of folks writing buggy code
> > on the world.)
>
> Many codes are written by other people. It cause
> UnicodeDecodeError on Windows.
> And UTF-8 mode rescues it.

exactly. But the trick is that UTF-8 mode is in control of the end user /
installer of Python, not the writer of the code.

> UTF-8 mode is used to decode command-line arguments and environment
> variables on Unix. So UTF-8 mode can be enabled only at startup for
> now.
> This restriction is caused by Unix so I think we can add something
> like `sys._enable_utf8_mode()` only on Windows if it is really needed.
> But it means codes using `sys._enable_utf8_mode()` are Windows-only.
> It doesn't make sense.

well, that would be a no-op on other platforms.

> Another way is adding runtime option to change only the default text
> encoding. (e.g. `io.set_default_encoding("utf-8")`)
> This is a considerable option. When we add it on the top of scripts or
> Notebook, it uses UTF-8 to open files on all platforms.
>
> On the other hand, it adds another "xxx encoding" terminology to
> Python. Python has too many "xxx encoding"s and it confuses users.
> So I am cautious about adding another encoding option

I appreciate that -- but I do like handing control over to the
code-writer, rather than the python-installer.

> > Maybe one work around would be for the __future__ import (Or something)
> > to set the mode, and then trigger warnings for all uses of TextIOWrapper
> > that don't use utf-8 -- that is, turn on PEP 597
> >
> > So you'd use one library that had the __future__ import, and it wouldn't
> > break any other code, but it would turn on Warnings.
>
> Please don't discuss PEP 597 in this thread. Let's focus on UTF-8 mode.
> They are different approaches and they are not mutually exclusive.

Sure, but they are related. But I'll try to find the right thread for
PEP 597.

> > Imagine someone runs some code in Jupyter, and it's fine, and then they
> > run it in plain Python, on the same machine, and it breaks -- ouch!
>
> You are right. UTF-8 mode must be accessible for both of Jupyter on
> conda Python and Python installed by official installer.
> If UTF-8 mode is accessible enough, user can fix it by enabling UTF-8 mode.

Sure -- but these days folks may have multiple environments and multiple
ways to run code (Jupyter, IDEs), so it's way too easy to have UTF-8 mode on
in some but not others -- all on the same machine.
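
(As a rough sketch, not a recommendation: one way to see which environments
actually have UTF-8 mode on is to run something like this in each of them.
`sys.flags.utf8_mode` and `locale.getpreferredencoding(False)` are real; the
helper name `describe_text_defaults` is made up for illustration.)

```python
import locale
import sys


def describe_text_defaults():
    """Collect the settings that influence default text decoding here."""
    return {
        # Which interpreter is answering (helps tell environments apart).
        "executable": sys.executable,
        # True when Python was started with -X utf8 or PYTHONUTF8=1 (3.7+).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The locale-derived encoding used when no encoding= is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_text_defaults().items():
        print(f"{key}: {value}")
```

Running it from Jupyter, an IDE, and a plain prompt on the same machine
should show whether they agree.
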
I'm not a Windows user (much), but users of my library are, and my students
are, and I'm having a hard time figuring out what will make this work for
them. In the case of my students, I can encourage UTF-8 mode for all
installations. In the case of my library users -- it's harder, but I can do
the same to some extent -- I do currently suggest a conda environment for my
code -- so yes, making it easier to turn it on in an environment would be
good.

Hmm -- sorry for thinking as I write here, but if UTF-8 mode could be part
of an environment spec -- that would be good. So is there a way to have a
package installed that turned it on? (obviously a no-op on other
platforms). So you would specify a dependency on the utf8_mode package. At
run time, if the utf8_mode package was installed, then UTF-8 mode would be
turned on.

So that wouldn't quite put it in the hands of the coder -- but would put it
in the hands of the application developer -- the person writing the
requirements file.

> So checking `locale.getpreferredencoding(False)` is better.
> But note that `locale.getpreferredencoding(False)` may return "utf8",
> "utf-8", "utf_8", "UTF-8"...

A good reason to provide a utility for this then -- I know I have no idea
all the ways it could be spelled.

> > That wouldn't be hard to do, but it might be worth having a small
> > utility that does it in a __future__ import:
> >
> >     from __future__ import warn_if_not_utf8
>
> It seems you are misusing __future__ import. __future__ import is for
> compilers and parsers. It is not for runtime behavior.

well yes -- but to the "layperson" -- it's a way to say: "make this code
act like it will in the future" -- which is the case here.

> And I don't think we should add `warn_if_not_utf8()` for now.

I've been thinking about this -- on the one hand, if I, as a library or
application author, am thinking about this issue, then I can (and should)
add the ``encoding="utf-8"`` flag everywhere I open a text file in my code.
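
(On the spelling problem above: a minimal sketch of what such a utility
might look like, assuming it lives in ordinary library code rather than in
`__future__`. `codecs.lookup()` already normalizes the aliases, so there is
no need to list them by hand. The names `is_utf8` and `warn_if_not_utf8` are
hypothetical; nothing like this exists in the stdlib as far as I know.)

```python
import codecs
import locale
import warnings


def is_utf8(name):
    """Return True if `name` is any spelling the codec registry maps to UTF-8."""
    try:
        # The registry normalizes case, hyphens and underscores for us.
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown codec names are simply "not UTF-8".
        return False


def warn_if_not_utf8():
    """Hypothetical helper: warn when the locale's default encoding is not UTF-8.

    Returns True if the default is UTF-8, False otherwise.
    """
    encoding = locale.getpreferredencoding(False)
    if is_utf8(encoding):
        return True
    warnings.warn(
        f"default text encoding is {encoding!r}, not UTF-8; "
        "pass encoding='utf-8' explicitly when opening text files",
        stacklevel=2,
    )
    return False
```

Calling `warn_if_not_utf8()` once at the top of a script or notebook would
at least make the problem visible on a machine where the default differs.
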
So why not just pass the encoding explicitly, rather than adding an extra
import or function call, or whatever?

But in fact, I know I (and my dev team) have been lazy, and have a lot of
places where I should be setting the encoding and am not. And sure, I know
how to use grep -- I can find all those places. But it would actually be a
lot easier and more reliable to have a way to set up the future behavior.

But maybe a topic for another thread.

> > > Is it possible to enable UTF-8 mode in a configuration file like
> > > `pyvenv.cfg`?
> >
> > I can't see how that's any more powerful/flexible than an environment
> > variable.
>
> It is powerful/flexible for power users. But not for beginners.
> Imagine users execute Jupyter from the start menu.
>
> * Command-line `-Xutf8` or `set PYTHONUTF8=1` is not accessible.
> * User environment variable is not accessible too, and it may affect
>   other Python installations.

which is actually what I like about environment variables -- it could apply
to all Python installations on the system -- which would be a good thing!

Where would Python look for a "configuration file like `pyvenv.cfg`"?

> If we use user-wide (or system-wide) setting like `PYTHONUTF8` in user
> environment variable, all Python environments use UTF-8 mode
> consistently.
> But it will break legacy applications running on old Python environment.

not ones old enough not to look for PYTHONUTF8 -- it would only change if
the Python were upgraded. And at least some legacy applications are using
py2exe and the like, and those would still be safe.

> If we have per-environment option, it's easy to recommend users to
> enable UTF-8 mode.

Back to my idea above -- any way to have that be a pip (and conda)
installable package? So it could be in a requirements file?

> Do you mean programs only runs on UTF-8 mode warns if UTF-8 mode is
> not enabled? e.g.
>
> ```
> if sys.platform == "win32" and not sys.flags.utf8_mode:
>     sys.exit("This programs runs only on UTF-8 mode. Please enable UTF-8 mode.")
> ```
>
> Then, I don't like it... Windows only API to enable UTF-8 mode in
> runtime seems better.
>
> ```
> if sys.platform == "win32":
>     sys._win32_enable_utf8mode()
> ```

I agree -- if that's possible, then it's a better option. Though I would
make it simply: ``sys._enable_utf8mode()`` and have it be a no-op outside of
Windows.

-CHB

--
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython