On Thu, Jan 28, 2021 at 4:25 PM Inada Naoki <songofaca...@gmail.com> wrote:

> > The "real" solution is to change the defaults not to use the system
> encoding at all -- which, of course, we are moving towards with PEP 597. So
> first a plug to do that as fast as possible! I myself would love to see PEP
> 597 implemented tomorrow -- for all supported versions of Python.
> >
>
> Note that PEP 597 doesn't change the default encoding. It just adds an
> option to emit a warning when the default encoding is used.
>

I know -- and THAT could be done soon, yes?


> I think it might take about 10 years to change it.
>

I hope it's not that long -- having code that runs differently in different
environments is not good ...

> However, the real trick here is that Python is a programming
> language/library/runtime -- not an application. So the folks starting up
> the interpreter are very often NOT the same as the folks writing the code.
> >
> > And this is why this is the issue it is -- folks write code on *nix
> systems, or maybe Windows with utf-8 as a system encoding, or only test
> with ASCII data, or ...  -- then someone else actually runs the code, on
> Windows, and it doesn't work. Even if the person is technically writing the
> code, they may have copy and pasted it or who knows what? Think about it --
> of all the Python code you run (libraries, etc) -- how much of it did you
> write yourself?
> >
> > (I myself have been highly negligent with my teaching materials in this
> regard -- so have personally unleashed dozens of folks writing buggy code
> on the world.)
>
> Much code is written by other people. It causes
> UnicodeDecodeError on Windows.
> And UTF-8 mode rescues it.
>

Exactly. But the trick is that UTF-8 mode is in the control of the end user /
installer of Python, not the writer of the code.


> UTF-8 mode is used to decode command-line arguments and environment
> variables on Unix. So UTF-8 mode can be enabled only at startup for
> now.
> This restriction is caused by Unix so I think we can add something
> like `sys._enable_utf8_mode()` only on Windows if it is really needed.
> But it means codes using `sys._enable_utf8_mode()` are Windows-only.
> It doesn't make sense.
>

Well, it could simply be a no-op on other platforms.


> Another way is adding runtime option to change only the default text
> encoding. (e.g. `io.set_default_encoding("utf-8")`)
> This is a considerable option. When we add it on the top of scripts or
> Notebook, it uses UTF-8 to open files on all platforms.
>
> On the other hand, it adds another "xxx encoding" terminology to
> Python. Python has too many "xxx encoding"s and it confuses users.
> So I am cautious about adding another encoding option


I appreciate that -- but I do like handing control over to the code-writer,
rather than the python-installer.


> > Maybe one work around would be for the __future__ import (Or something)
> to set the mode, and then trigger warnings for all uses of TextIOWrapper
> that don't use utf-8 -- that is, turn on PEP 597
> >
> > So you'd use one library that had the __future__ import, and it wouldn't
> break any other code,  but it would turn on Warnings.
>
> Please don't discuss PEP 597 in this thread. Let's focus on UTF-8 mode.
> They are different approaches and they are not mutually exclusive.
>

Sure, but they are related. But I'll try to find the right thread for PEP
597.


> > Imagine someone runs some code in Jupyter, and it's fine, and then they
> run it in plain Python, on the same machine, and it breaks -- ouch!
>
> You are right. UTF-8 mode must be accessible for both of Jupyter on
> conda Python and Python installed by official installer.
> If UTF-8 mode is accessible enough, user can fix it by enabling UTF-8 mode.
>

Sure -- but these days folks may have multiple environments and multiple
ways to run code (Jupyter, IDEs), so it's way too easy to have UTF-8 mode
on in some but not others -- all on the same machine.
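
To see which environments actually differ, a small report can be run in each one (Jupyter, an IDE, plain Python) and the results compared. This is only a sketch; the function name `encoding_report` is made up for illustration, while `sys.flags.utf8_mode` and `locale.getpreferredencoding(False)` are the standard checks mentioned in this thread.

```python
import locale
import sys


def encoding_report():
    """Collect the settings that decide how text files get decoded."""
    return {
        # True when started with -X utf8 or PYTHONUTF8=1
        "utf8_mode": bool(sys.flags.utf8_mode),
        # What open() falls back to when no encoding is given
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # sys.stdout can be None or replaced, e.g. under pythonw or a notebook
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

If two environments on the same machine print different values for `utf8_mode` or `preferred_encoding`, code that opens text files without an explicit encoding may behave differently between them.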

I'm not a Windows user (much), but users of my library are, and my students
are, and I'm having a hard time figuring out what will make this work for
them.

In the case of my students, I can encourage UTF-8 mode for all
installations.

In the case of my library users -- it's harder, but I can do the same to
some extent -- I do currently suggest a conda environment for my code -- so
yes, making it easier to turn it on in an environment would be good.

Hmm -- sorry for thinking as I write here, but if UTF-8 mode could be part
of an environment spec -- that would be good.

So is there a way to have a package installed that turned it on? (obviously
a no-op on other platforms). So you would specify a dependency on the
utf8_mode package. At run time, if the utf8_mode package was installed,
then UTF-8 mode would be turned on.

So that wouldn't quite put it in the hands of the coder -- but would put it
in the hands of the application developer -- the person writing the
requirements file.

> So checking `locale.getpreferredencoding(False)` is better.
> But note that `locale.getpreferredencoding(False)` may return "utf8",
> "utf-8", "utf_8", "UTF-8"...
>

A good reason to provide a utility for this then -- I know I have no idea
of all the ways it could be spelled.
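
One way such a utility might look, as a sketch: let the `codecs` registry normalize the spelling instead of listing the variants by hand. The helper names `is_utf8` and `default_is_utf8` are made up for illustration; `codecs.lookup()` and `locale.getpreferredencoding()` are standard library calls.

```python
import codecs
import locale


def is_utf8(name):
    """Return True if `name` is any registered spelling of UTF-8."""
    try:
        # codecs.lookup() resolves aliases such as "utf8", "UTF-8" and
        # "utf_8" to one codec whose canonical name is "utf-8".
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown encoding name: certainly not UTF-8.
        return False


def default_is_utf8():
    """Return True if the default text encoding is UTF-8."""
    return is_utf8(locale.getpreferredencoding(False))
```

Note that "utf-8-sig" is a distinct codec, so this check deliberately reports it as not UTF-8.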


> > That wouldn't be hard to do, but it might be worth having a small
> utility that does it in a __future__ import:
> >
> > from __future__ import warn_if_not_utf8
>
> It seems you are misusing __future__ import. __future__ import is for
> compilers and parsers. It is not for runtime behavior.
>

Well, yes -- but to the "layperson" it's a way to say: "make this code
act like it will in the future" -- which is the case here.


> And I don't think we should add `warn_if_not_utf8()` for now.
>

I've been thinking about this -- on the one hand, if I, as a library or
application author, am thinking about this issue, then I can (and should)
add the ``encoding="utf-8"`` flag everywhere I open a text file in my code.
So why not just do that, rather than adding an extra import or function
call, or whatever?
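
For reference, the explicit form is short. This sketch round-trips text through a temporary file and names the encoding on both the write and the read, so the result does not depend on the platform default; the function name is made up for illustration.

```python
import os
import tempfile


def write_and_read_utf8(text):
    """Write text to a temporary file and read it back, always as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Naming the encoding here is what makes this portable to Windows.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```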

But in fact, I know I (and my dev team) have been lazy, and have a lot
of places where I should be setting the encoding and am not. And sure, I
know how to use grep -- I can find all those places. But it would actually
be a lot easier and more reliable to have a way to set up the future
behavior.

But maybe a topic for another thread.
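
As a somewhat more reliable alternative to grep, the `open()` calls that omit the encoding can be found by walking the syntax tree. This is a rough sketch under stated assumptions: the function name is made up, it only recognizes calls spelled literally as `open(...)` (not `io.open`, `Path.open`, or an alias), and it skips a call only when the mode is a string literal containing "b".

```python
import ast


def find_open_without_encoding(source):
    """Return sorted line numbers of open() calls that pass no encoding."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        # encoding given by keyword, or positionally as the 4th argument
        if any(kw.arg == "encoding" for kw in node.keywords):
            continue
        if len(node.args) >= 4:
            continue
        # The mode is the 2nd positional argument or the mode= keyword.
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        if (isinstance(mode, ast.Constant)
                and isinstance(mode.value, str)
                and "b" in mode.value):
            continue  # binary mode takes no encoding
        hits.append(node.lineno)
    return sorted(hits)
```

Running it over each file, for example `find_open_without_encoding(pathlib.Path(name).read_text(encoding="utf-8"))`, gives the lines to review by hand.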

> > Is it possible to enable UTF-8 mode in a configuration file like
> `pyvenv.cfg`?
> >
> > I can't see how that's any more powerful/flexible than an environment
> variable.
>
> It is powerful/flexible for power users. But not for beginners.
> Imagine users execute Jupyter from the start menu.
>
> * Command-line `-Xutf8` or `set PYTHONUTF8=1` is not accessible.
> * User environment variable is not accessible too, and it may affect
> other Python installations.
>

which is actually what I like about environment variables -- it could apply
to all Python installations on the system -- which would be a good thing!
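
The variable is read once, at interpreter startup, so its effect is easiest to see by starting a child interpreter with it set. A minimal sketch; the helper name `utf8_mode_in_child` is made up for illustration, and it assumes a Python new enough to know about `PYTHONUTF8`.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(enabled):
    """Start a child Python with PYTHONUTF8 set and report its UTF-8 mode."""
    env = dict(os.environ, PYTHONUTF8="1" if enabled else "0")
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # The child prints 1 when UTF-8 mode is on and 0 when it is off.
    return result.stdout.strip() == "1"
```

Setting the same variable in the user or system environment has the same effect on every interpreter started afterwards, which is both the appeal and the risk discussed here.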

Where would Python look for a "configuration file like `pyvenv.cfg`" ?

> If we use user-wide (or system-wide) setting like `PYTHONUTF8` in user
> environment variable, all Python environments use UTF-8 mode
> consistently.
> But it will break legacy applications running on old Python environment.
>

Not ones running on a Python old enough that it doesn't look for PYTHONUTF8 --
they would only change if the Python were upgraded.

and at least some legacy applications are using py2exe and the like, and
those would still be safe.

> If we have per-environment option, it's easy to recommend users to
> enable UTF-8 mode.
>

Back to my idea above -- any way to have that be a pip (and conda)
installable package? So it could be in a requirements file?

> Do you mean programs that only run in UTF-8 mode warn if UTF-8 mode is
> not enabled? e.g.
>
> ```
> import sys
>
> if sys.platform == "win32" and not sys.flags.utf8_mode:
>     sys.exit("This program runs only in UTF-8 mode. Please enable UTF-8 mode.")
> ```
>
> Then, I don't like it... Windows only API to enable UTF-8 mode in
> runtime seems better.
>
> ```
> if sys.platform == "win32":
>     sys._win32_enable_utf8mode()
> ```
>
I agree -- if that's possible, then it's a better option.

Though I would make it simply:

``sys._enable_utf8mode()``

and have it be a no-op outside of Windows.
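
From the caller's side that might look like the sketch below. To be clear, `sys._enable_utf8mode()` is only the name proposed in this thread and is an assumption here, not an existing API; the wrapper `enable_utf8_mode` is likewise made up, and it uses `getattr` so it runs harmlessly on interpreters that lack the hook.

```python
import sys


def enable_utf8_mode():
    """Try to switch on UTF-8 mode at runtime; return True if it was done."""
    if sys.platform != "win32":
        # Elsewhere this is a no-op, so scripts stay portable.
        return False
    # Hypothetical hook: assumed for illustration, not a real sys function.
    hook = getattr(sys, "_enable_utf8mode", None)
    if hook is None:
        return False
    hook()
    return True
```

A script or notebook would call `enable_utf8_mode()` once at the top, before opening any files.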

-CHB

-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SPLE5KTKSRDO77OJRXOG346SX6FH3W5Y/
Code of Conduct: http://python.org/psf/codeofconduct/
