[Python-Dev] Re: PEP 597: Add optional EncodingWarning

Inada Naoki Tue, 09 Feb 2021 17:34:29 -0800

On Wed, Feb 10, 2021 at 5:50 AM Paul Moore <[email protected]> wrote:
>
> On Tue, 9 Feb 2021 at 16:54, Inada Naoki <[email protected]> wrote:
> >
> > On Tue, Feb 9, 2021 at 9:31 PM Paul Moore <[email protected]> wrote:
> > >
> > > Personally, I'm not at all keen on the idea of making users always
> > > specify encoding in the first place, even if it's "just for the
> > > transition".
> >
> > I agree with you. But as I wrote in the PEP, omitting the encoding has
> > already caused a lot of trouble.
> > Windows users can not just `pip install somepkg` because some library
> > authors write `long_description=open("README.md").read()` in setup.py.
> >
> > I am trying to fix this situation by two parallel approaches:
> >
> > * (This PEP) Provide a tool for finding this type of bug, and
> > recommend `encoding="utf-8"` for cross-platform library authors.
> > * (Another thread) Make UTF-8 mode more usable for Windows users,
> > especially students.
>
> Thanks for explaining (again). There's so much debate, across multiple
> proposals, that I can barely follow it. I'm impressed that you're
> managing to keep things straight at all :-)
>
> I guess my views on this PEP come down to
>
> * I see no harm in having a tool that helps developers spot
> platform-specific assumptions about encoding.
> * Realistically, I'd be surprised if developers actually use such a
> tool. If they were likely to do so, they could probably just as easily
> locate all the uses of open() in their code, and check that way. So
> I'm not sure this proposal is actually worth it, even if the end
> result would be very beneficial.
> * In the setup.py case, why don't those same Windows users complain
> that the library fails to install? A quick bug report, followed by a
> simple fix, seems more likely to happen than the developer suddenly
> deciding to scan their code for encoding issues.
>


Yes, some issues have already been solved that way.
On the other hand, there are dozens of questions about UnicodeDecodeError
on Q&A sites like Stack Overflow.
Many people don't know what the error means or how to report it correctly.
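
As a minimal sketch of what such a failure looks like (the README text and
the choice of cp1252 as the legacy Windows code page are assumptions made for
illustration, not taken from a specific bug report), the same UTF-8 bytes
decode fine with an explicit encoding and fail under a legacy code page:

```python
# Hypothetical README content containing non-ASCII text, stored as UTF-8.
readme_bytes = "Example package: こんにちは\n".encode("utf-8")


def read_like_open(data, encoding):
    """Decode bytes roughly the way open(..., encoding=encoding).read() would."""
    return data.decode(encoding)


# An explicit encoding="utf-8" gives the same result on every platform.
text = read_like_open(readme_bytes, "utf-8")

# cp1252 stands in for a legacy Windows locale encoding (assumption).
try:
    read_like_open(readme_bytes, "cp1252")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)
```

This is the same shape as the `long_description=open("README.md").read()`
case: nothing looks wrong on macOS or Linux, and the error only appears on a
machine whose locale encoding is not UTF-8.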

I sometimes set PYTHONWARNINGS=default in my .bashrc, find
DeprecationWarnings in libraries I am using, and report them.

In contrast, it is difficult to find an omitted `encoding="utf-8"`,
because I use macOS and Linux for daily development, where the default
encoding is usually UTF-8 and the bug never shows up.
If there were a PYTHONWARNENCODING option, I could write `export
PYTHONWARNENCODING=1` in my .bashrc.
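
A rough sketch of that workflow follows. Note the assumption: the option name
in this thread was still a draft. The code below uses the spelling that
shipped in Python 3.10, `-X warn_default_encoding` (environment variable
PYTHONWARNDEFAULTENCODING), and the helper name is hypothetical. On older
interpreters the helper simply returns None.

```python
import subprocess
import sys


def run_with_encoding_warning(code):
    """Run `code` in a child interpreter with the default-encoding warning on.

    Returns the child's stderr, or None on interpreters older than 3.10,
    which do not have the option.
    """
    if sys.version_info < (3, 10):
        return None
    proc = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.stderr


# open() in text mode without encoding= is the pattern the warning targets.
stderr = run_with_encoding_warning("import os; open(os.devnull).close()")
if stderr is not None:
    print(stderr)
```

The point is that the warning fires even on a UTF-8 platform, so the
omission becomes visible without a Windows machine.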


> Regarding the wider question of UTF8 as default, my views can probably
> be summarised as follows:
>
> * If you want to write correct code to deal with encodings, there is
> no substitute for carefully considering every bytes/string conversion,
> deciding how you are going to identify the encoding to use, and then
> specifying that encoding explicitly. Default values for encodings have
> no place in such code.
> * In reality, though, that's far too much work for many situations.
> Default encodings are a necessary convenience, particularly for simple
> scripts, or for people who can't, or don't want to, do the analysis
> that the "correct" approach implies.

Yes. And UTF-8 is already the default encoding for s.encode().
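
To illustrate the difference (a small sketch; the sample string is arbitrary),
str.encode() ignores the locale, while text-mode open() without `encoding=`
falls back to the locale's preferred encoding:

```python
import locale

s = "naïve café"

# str.encode() with no argument uses UTF-8 on every platform.
default_bytes = s.encode()
explicit_bytes = s.encode("utf-8")

# Text-mode open() without encoding= uses this locale-dependent value instead
# (unless UTF-8 mode is enabled).
locale_encoding = locale.getpreferredencoding(False)

print(default_bytes == explicit_bytes, locale_encoding)
```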

> * Picking the right default is *hard*. Changing the default is even
> harder, unfortunately.
> * I feel that we already have a number of mechanisms (PEPs 538 and
> 540) trying to tackle this issue. Adding yet more suggests to me that
> we'd be better off pausing and working out why we still have an issue.
> We should be moving towards *fewer* mechanisms, not more.
> * We have UTF-8 mode, and users can set it per-process (via flag or
> environment variable) per-user or per-site (by environment variable).
> I don't honestly believe that a user (whatever OS they work on) who is
> capable of writing Python code, can't be shown how to set an
> environment variable. I see no reason to suggest we need yet another
> way to set UTF-8 mode, or that a per-interpreter or per-virtualenv
> setting is particularly crucial (suggestions that have been made in
> the Python-Ideas threads).

Note that many Python users don't use consoles. They just start
Jupyter Notebook, or they just write a .py file and run it in a
Minecraft mod.
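
For reference, the two per-process switches for UTF-8 mode mentioned above
are the `-X utf8` flag and the PYTHONUTF8 environment variable. A minimal
sketch (the helper name is hypothetical) that checks both from a parent
process:

```python
import os
import subprocess
import sys


def utf8_mode_flag(extra_args=(), extra_env=None):
    """Start a child interpreter and report its sys.flags.utf8_mode value."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if extra_env:
        env.update(extra_env)
    proc = subprocess.run(
        [sys.executable, *extra_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True,
        text=True,
        env=env,
    )
    return proc.stdout.strip()


via_flag = utf8_mode_flag(extra_args=["-X", "utf8"])
via_env = utf8_mode_flag(extra_env={"PYTHONUTF8": "1"})
print(via_flag, via_env)
```

Both require control over how the interpreter is launched, which is exactly
what a user who only clicks a notebook icon does not have.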

> * UTF-8 is likely to be the most appropriate default encoding for
> Python in the longer term, and I agree that Windows is fast
> approaching the point where a UTF-8 encoding is more appropriate than
> the ANSI codepage for "new stuff". But there's a lot of legacy files
> and applications around, and I suspect that a UTF-8 default will
> inconvenience a lot of people working with such data. But equally,
> such people may not be in a huge rush to switch to the latest Python
> version. Whichever way we go, though, some people will be
> inconvenienced.
>
> I'm also somewhat bemused by the rather negative view of "Windows
> beginners" that lies behind a lot of these discussions. People's
> experiences may well differ, but the people I see using (and learning)
> Python on Windows are often experienced computer users, maybe
> developers with significant experience in Java or other "enterprise
> languages", or data scientists who have a lot of knowledge of
> computers, but are relatively new to programming. Or systems admins,
> or database specialists, who want to use Python to write scripts on
> Windows. None of those people fit the picture of people who wouldn't
> know how to set an environment variable, or configure their
> environment. On the other hand, (in my experience) they often don't
> really have much knowledge of character encodings, and tend to just
> use whatever default their PC uses, and expect it to work. They *can*,
> however, understand when an encoding problem is explained to them, and
> can set an explicit encoding once they know they need to.
>
> Paul

Of course, it is not a problem for experts.

But Python is very widely used for learning programming. There are
many programming courses for junior high school students, or even for
elementary school students.
They may start learning Python in the browser. When they install
Python and run a program that worked in the browser, it can fail
because of an encoding issue...

-- 
Inada Naoki  <[email protected]>
_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/KKPKCAOJHBF5XULJLUPEZLS6SQ5QITRX/
Code of Conduct: http://python.org/psf/codeofconduct/
