On Fri, Feb 12, 2021 at 5:18 AM Jim J. Jewett <jimjjew...@gmail.com> wrote:
>
> Inada Naoki wrote:
>
> > Default encoding is used for:
>
> > a. Really need to use locale specific encoding
> > b. UTF-8 (bug. not work on Windows)
> > c. ASCII (not a bug, but slow on Windows)
>
> > I assume most usages are (b) and (c). This PEP can reduce them soon.
>
> Is this just an assumption, based on those times being visible to someone who 
> installs a lot of packages, or has the use of any locale other than UTF-8 and 
> ASCII really gone down a lot?  Have browsers stopped using charset sniffing?
>

Using "most" was my mistake; my English is not good. I should have written "many" here.
You can find many bugs caused by not specifying `encoding="utf-8"` on Q&A sites.
I included some numbers about these common bugs in the PEP.
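
As a minimal sketch of the fix those Q&A answers usually arrive at (the file name and sample text are made up for illustration), passing `encoding="utf-8"` on both the write and the read makes the result independent of the platform's locale:

```python
import os
import tempfile

# Hypothetical sample text containing non-ASCII characters.
text = "naïve café, 日本語"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Explicit encoding: the bytes on disk no longer depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Read back with the same explicit encoding.
    with open(path, encoding="utf-8") as f:
        roundtrip = f.read()

    # Inspect the raw bytes to confirm they are UTF-8.
    with open(path, "rb") as f:
        raw = f.read()
```

Without the `encoding` argument, the same code would use the locale encoding, which is often a legacy code page on Windows.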

UTF-8 is used by 96.3% of websites [1], although browsers still use
charset sniffing. But how does that relate to this PEP?
[1] https://w3techs.com/technologies/details/en-utf8


> > Additionally, encoding="locale" will be backward/forward compatible
>
> What would be the problem with changing the default from None to locale?

It doesn't work on Python 3.9 and earlier.
So using `encoding="locale"` is not recommended until
users can drop Python 3.9 support.
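
For code that must still run on Python 3.9 and earlier, one possible workaround is a small version check. This is only a sketch: the helper name `locale_encoding_arg` is hypothetical, and it assumes `encoding="locale"` becomes available in Python 3.10 as the PEP proposes.

```python
import os
import sys
import tempfile


def locale_encoding_arg():
    """Return an `encoding` argument that means "use the locale encoding".

    Hypothetical helper. Assumes encoding="locale" is accepted from
    Python 3.10 on; older versions only understand encoding=None.
    """
    if sys.version_info >= (3, 10):
        return "locale"
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "legacy.txt")
    with open(path, "w", encoding=locale_encoding_arg()) as f:
        f.write("hello\n")
    with open(path, encoding=locale_encoding_arg()) as f:
        content = f.read()
```

On older versions this falls back to `encoding=None`, which has the same behavior there because EncodingWarning does not exist.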

> (I think you mentioned that they are the same 99% of the time; is that other 
> 1% likely to be cases where locale is wrong but None is right?  Would there 
> be a better way to represent that 1%?)
>

`encoding="locale"` and `encoding=None` behave the same, except that
`encoding="locale"` doesn't emit EncodingWarning even when the warning
is opted in.
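
To see this difference, here is a rough sketch that runs `open()` in a child interpreter with the opt-in enabled. It assumes the opt-in is spelled `-X warn_default_encoding` as in the PEP, and the helper name `child_stderr` is made up for this example.

```python
import os
import subprocess
import sys
import tempfile


def child_stderr(open_kwargs):
    """Run open(path<open_kwargs>) in a child interpreter with the
    EncodingWarning opt-in enabled and return what it wrote to stderr.

    Hypothetical helper; open_kwargs is a source fragment such as
    ", encoding='locale'".
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write("hello\n")
        path = f.name
    try:
        code = f"open({path!r}{open_kwargs}).close()"
        result = subprocess.run(
            [sys.executable, "-X", "warn_default_encoding", "-c", code],
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",
        )
        return result.stderr
    finally:
        os.unlink(path)


default_stderr = child_stderr("")                     # encoding omitted
locale_stderr = child_stderr(", encoding='locale'")   # explicit locale
```

On a Python version that implements the PEP, `default_stderr` should mention EncodingWarning while `locale_stderr` should not. On Python 3.9 and earlier the second call fails in the child instead, because `"locale"` is not a known encoding there.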

There is only a small difference between `encoding=None` and
`encoding=locale.getpreferredencoding(False)`. They differ only:

* When Python is running on Windows, and
* When the file is a console, and
* (for open()) When PYTHONLEGACYWINDOWSSTDIO is set
* (for TextIOWrapper()) When the file is not _WindowsConsoleIO

In that case, encoding=None uses the console code page, but
encoding=locale.getpreferredencoding(False) uses the ANSI code page.
Otherwise, encoding=None and
encoding=locale.getpreferredencoding(False) are the same.
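
For an ordinary file on disk (not a console), the "otherwise" case can be checked with a short sketch like the one below. The names are normalized through `codecs.lookup()` because the two values may differ in spelling or case.

```python
import codecs
import locale
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plain.txt")

    # encoding=None: let Python pick the default for this regular file.
    with open(path, "w", encoding=None) as f:
        default_name = f.encoding

# The explicit spelling of the locale-specific encoding.
preferred_name = locale.getpreferredencoding(False)

# Normalize aliases (e.g. "UTF-8" vs "utf-8") before comparing.
same = codecs.lookup(default_name).name == codecs.lookup(preferred_name).name
```

Here `same` is expected to be true, since the file is not a Windows console.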

So `encoding=locale.getpreferredencoding(False)` can be used to
specify locale-specific encoding explicitly.
But this PEP doesn't recommend it. This PEP recommends using
EncodingWarning only to find missing `encoding="utf-8"` (or any
other specific encoding).

-- 
Inada Naoki  <songofaca...@gmail.com>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PD4BTBAQHFUYOCF5QKIBDIMHATPVEFPW/
Code of Conduct: http://python.org/psf/codeofconduct/
