[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-13 Thread Inada Naoki
To demonstrate how useful this warning is, I tried my reference implementation.

When I tried `pip install`, I quickly found these issues:

https://bugs.python.org/issue43214 (opens .pth files with the locale encoding)
https://github.com/pypa/pip/pull/9608 (not a real bug, but opens a JSON
file with the locale encoding)

And when creating a PR for pip, I found this issue in tox:

https://github.com/tox-dev/tox/issues/1908 (opens a TOML file with the
locale encoding; may not work on Windows)

Although most developers won't use this option, I and a few other
developers can put `export PYTHONWARNENCODING=1` in .bashrc and will
find many possible bugs that happen only on Windows, even if we
don't use Windows for daily development.

Isn't this option worth it?
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/K7PVGEHDB3BXLNFZ6UWFJOKCC337UTWO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-12 Thread Jim J. Jewett
In the documentation (not sure whether it should be the documentation
for "open" or for encoding), include at least a link to instructions
on how to (try to) verify that your codebase is using the encoding
parameter properly.  Those instructions would say something like "Add
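the following lines to the end of Lib\site.py:

    import builtins, warnings
    _origopen = builtins.open
    def open(file, mode="r", buffering=-1, encoding=None, *args, **kwargs):
        if encoding is None and "b" not in mode:
            warnings.warn("open() called without encoding", stacklevel=2)
        return _origopen(file, mode, buffering, encoding, *args, **kwargs)
    builtins.open = open
"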

-jJ

On Fri, Feb 12, 2021 at 6:28 PM Inada Naoki  wrote:
>
> On Sat, Feb 13, 2021 at 4:53 AM Jim J. Jewett  wrote:
> >
> > Offering encoding="locale" (or open.locale or ... ) instead of a long 
> > function call using False (locale.getpreferredencoding(False)) seems like a 
> > win for Explicit is Better Than Implicit.  It would then be possible to say 
> > "yeah, locale really is what I meant".
> >
> > Err... unless the charset determination is so tricky that it ends up just 
> > adding another not-quite-right near-but-not-exact-synonym.
> >
> > Adding a new Warning subclass, and maybe a new warning type, and maybe a 
> > new environment variable, and maybe a new launch flag ... these all seem to 
> > risk just making things more complicated without sufficient gain.
> >
> > Would a recipe for site-packages be sufficient, or does this need to run 
> > too early in the bootstrapping process?
> >
> > -jJ
>
> What does "a recipe for site-packages" mean?
>
> --
> Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MSK5HN4IGUMBRF4PM7IZYMI7OJGD4KJC/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-12 Thread Inada Naoki
On Sat, Feb 13, 2021 at 4:53 AM Jim J. Jewett  wrote:
>
> Offering encoding="locale" (or open.locale or ... ) instead of a long 
> function call using False (locale.getpreferredencoding(False)) seems like a 
> win for Explicit is Better Than Implicit.  It would then be possible to say 
> "yeah, locale really is what I meant".
>
> Err... unless the charset determination is so tricky that it ends up just 
> adding another not-quite-right near-but-not-exact-synonym.
>
> Adding a new Warning subclass, and maybe a new warning type, and maybe a new 
> environment variable, and maybe a new launch flag ... these all seem to risk 
> just making things more complicated without sufficient gain.
>
> Would a recipe for site-packages be sufficient, or does this need to run too 
> early in the bootstrapping process?
>
> -jJ

What does "a recipe for site-packages" mean?

-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4ZOMEDEZ72SU7FDTTF5XUIPOA5SU72R6/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-12 Thread Jim J. Jewett
Offering encoding="locale" (or open.locale or ... ) instead of a long function 
call using False (locale.getpreferredencoding(False)) seems like a win for 
Explicit is Better Than Implicit.  It would then be possible to say "yeah, 
locale really is what I meant".  

Err... unless the charset determination is so tricky that it ends up just 
adding another not-quite-right near-but-not-exact-synonym.

Adding a new Warning subclass, and maybe a new warning type, and maybe a new 
environment variable, and maybe a new launch flag ... these all seem to risk 
just making things more complicated without sufficient gain.

Would a recipe for site-packages be sufficient, or does this need to run too 
early in the bootstrapping process?

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VUVVGVCBLVR55ELDLX44SFLBK7ED7WGG/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Inada Naoki
On Fri, Feb 12, 2021 at 12:45 PM Jim J. Jewett  wrote:
>
> On Thu, Feb 11, 2021 at 7:35 PM Inada Naoki  wrote:
>
> > The PEP helps developers living in a UTF-8 locale find missing
> > `encoding="utf-8"` bugs.
> > This type of bug is very common, and many Windows users suffer
> > from it when reading JSON, YAML, TOML, Markdown, or any other UTF-8
> > files.
>
> I think this is where we have been talking past each other.
>
> You seem to be assuming that the programmer knows the correct
> encoding, presumably because they (or their program) wrote it.

Not always, but often.

>  You
> then assume that they neglected to mention the encoding out of
> forgetfulness, perhaps because on their system, everything is always
> UTF-8.  This clearly does happen, but the people who would make this
> mistake most often -- they probably wouldn't think to test their code
> under a special mode that catches only this.  (They might run a linter
> that looked for all sorts of problems, including this.)
>

Some Python experts can put `export PYTHONWARNENCODING=1` in their .bashrc.
They can find such mistakes not only in their own code but also in
the libraries they use.
Since they are experts, they can understand the warning and report it
to the library author correctly.

So this option helps library authors even if the authors themselves don't use it.


> I instead assume that the programmer really doesn't know the encoding,
> because the file is supplied by the user.  (The user may not know
> either, since it is really supplied by some other program, but ...
> neither python nor the programmer knows for sure.)
>  In this case, the
> warning is not just a false alarm, but is actively misleading.
>
> -jJ

This option is opt-in.  People who don't understand what this warning
means should not opt in to it.

Regards,
-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KLYUYKLHWCTTK7HOYNPDRPRS6WIQQU7K/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Inada Naoki
On Fri, Feb 12, 2021 at 12:28 PM Jim J. Jewett  wrote:
>
> (I apologize if my summaries distort what Inada Naoki
>  explained.)
>
> He said that some people use the default None when they really want
> either UTF-8 or ASCII.

Yes. Even Python core developers.
For example: https://bugs.python.org/issue33684

This is just one example. I have seen a lot of code using the default
encoding to read JSON, YAML, TOML, Markdown, etc.


>
> My concern is that the warning will be a false alarm if they really do
> need whatever locale returns, and that case may still be common.  (If
> web browsers had stopped bothering to sniff for other charsets, then
> maybe that situation really was getting rare.)
>

That's one of the reasons why this warning is opt-in, like BytesWarning.

> I asked when encoding=None is actually different from encoding=locale,
> currently spelled encoding=locale.getpreferredencoding(False)
>

I don't understand this sentence. This PEP proposes
`encoding="locale"`, which is equivalent to encoding=None but doesn't emit
EncodingWarning.

There was a discussion about the difference between `encoding=None` and
`encoding=locale.getpreferredencoding(False)` in this thread.


> They can be different on Windows console, presumably because the
> environment settings that control locale may differ from the charset
> actually used by the console.  Even then, it only differs for open()
> when PYTHONLEGACYWINDOWSSTDIO is set, and for TextIOWrapper() when the
> file is not _WindowsConsoleIO.
>
> To me, that sounds narrow enough to be a Windows issue, rather than an
> issue with open.

Yes. So if users want to specify the locale-specific encoding without
dropping Python 3.9 support, they can use
encoding=locale.getpreferredencoding(False).

But this PEP doesn't recommend it. Third-party libraries can use
`encoding="locale"` after they drop Python 3.9 support.
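A minimal sketch of the version switch described above, assuming (as the PEP proposes) that `encoding="locale"` is accepted only from Python 3.10 on. `locale_encoding_arg` is a hypothetical helper, not part of the PEP or the stdlib.

```python
import locale
import sys


def locale_encoding_arg() -> str:
    """Return an encoding= value that explicitly requests the locale encoding.

    On Python 3.10+ this is the special value "locale" (PEP 597), which also
    keeps EncodingWarning quiet. Older versions don't accept "locale", so fall
    back to spelling the locale encoding out explicitly.
    """
    if sys.version_info >= (3, 10):
        return "locale"
    return locale.getpreferredencoding(False)
```

Usage would be `open(path, encoding=locale_encoding_arg())`. On 3.9 and earlier the fallback matches `encoding=None` except in the Windows console corner cases discussed in this thread.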


>  Is there some way to write an encoding that sniffs
> for charsets, particularly on windows, and to use that as the default
> instead of assuming that locale will be correct?
>
> -jJ

There is no reliable way, AFAIK.

-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LJASRUN5G2PYEUOT7H34LGGBYEHBUB3C/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Jim J. Jewett
On Thu, Feb 11, 2021 at 7:35 PM Inada Naoki  wrote:

> The PEP helps developers living in a UTF-8 locale find missing
> `encoding="utf-8"` bugs.
> This type of bug is very common, and many Windows users suffer
> from it when reading JSON, YAML, TOML, Markdown, or any other UTF-8
> files.

I think this is where we have been talking past each other.

You seem to be assuming that the programmer knows the correct
encoding, presumably because they (or their program) wrote it.  You
then assume that they neglected to mention the encoding out of
forgetfulness, perhaps because on their system, everything is always
UTF-8.  This clearly does happen, but the people who would make this
mistake most often -- they probably wouldn't think to test their code
under a special mode that catches only this.  (They might run a linter
that looked for all sorts of problems, including this.)

I instead assume that the programmer really doesn't know the encoding,
because the file is supplied by the user.  (The user may not know
either, since it is really supplied by some other program, but ...
neither python nor the programmer knows for sure.)  In this case, the
warning is not just a false alarm, but is actively misleading.

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LHMFNFIOATO46NVOVCUOKFQCRWCZLY7M/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Jim J. Jewett
(I apologize if my summaries distort what Inada Naoki
 explained.)

He said that some people use the default None when they really want
either UTF-8 or ASCII.

My concern is that the warning will be a false alarm if they really do
need whatever locale returns, and that case may still be common.  (If
web browsers had stopped bothering to sniff for other charsets, then
maybe that situation really was getting rare.)

I asked when encoding=None is actually different from encoding=locale,
currently spelled encoding=locale.getpreferredencoding(False)

They can be different on Windows console, presumably because the
environment settings that control locale may differ from the charset
actually used by the console.  Even then, it only differs for open()
when PYTHONLEGACYWINDOWSSTDIO is set, and for TextIOWrapper() when the
file is not _WindowsConsoleIO.

To me, that sounds narrow enough to be a Windows issue, rather than an
issue with open.  Is there some way to write an encoding that sniffs
for charsets, particularly on windows, and to use that as the default
instead of assuming that locale will be correct?

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SSALFO3RTPX7QZ7B2MOWTZKYCJ5XKWK4/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Eryk Sun
On 2/11/21, Inada Naoki  wrote:
>
> There is little difference between `encoding=None` and
> `encoding=locale.getpreferredencoding(False)`. The difference is:
>
> * When Python is using Windows, and
> * When the file is a console, and
> * (for open()) When PYTHONLEGACYWINDOWSSTDIO is set
> * (for TextIOWrapper()) When the file is not _WindowsConsoleIO
>
> encoding=None uses console codepage but

os.device_encoding() -- i.e. _Py_device_encoding() -- only works for
hard-coded file descriptors 0, 1, and 2, instead of detecting a
console file. So opening "CON", "CONIN$", or "CONOUT$" has never used
the console input or output code page, nor has opening a duped
standard I/O fd such as open(os.dup(0)). It would be easy to
generalize _Py_device_encoding() to detect console files, but it's new
behavior.

Python 3.8+ introduced a bug (issue 42261) in which, even with legacy
standard I/O enabled and file descriptors 0-2, the console input and
output code pages are ignored. For example:

C:\>chcp 437
Active code page: 437
C:\>set PYTHONLEGACYWINDOWSSTDIO=1
C:\>py -3.9 -c "import sys; print(sys.stdout.encoding)"
cp1252

Regarding the last bullet point, io.TextIOWrapper doesn't know
anything about io._WindowsConsoleIO. The decision to use UTF-8 is in
io.open(). So manually wrapping a _WindowsConsoleIO file with
TextIOWrapper uses the locale preferred encoding instead of UTF-8. For
example:

>>> fb = open('conin$', 'rb')
>>> fb.raw
<_io._WindowsConsoleIO mode='rb' closefd=True>
>>> f = io.TextIOWrapper(fb)
>>> f.encoding
'cp1252'

I don't know whether it's worth making TextIOWrapper check for
_WindowsConsoleIO in order to make it use UTF-8. It's not common to
manually wrap a binary-mode file.
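Because the encoding a bare `io.TextIOWrapper(raw)` picks depends on the platform, and (as in the example above) may differ from what `open()` picks for a console, code that wraps binary streams itself can sidestep the question by always passing the encoding. A minimal, portable sketch; `wrap_text` is a hypothetical helper and UTF-8 is an assumed default, not something the thread prescribes.

```python
import io


def wrap_text(raw: io.BufferedIOBase, encoding: str = "utf-8") -> io.TextIOWrapper:
    """Wrap a binary stream as text with an explicit encoding.

    Because encoding is always passed, the result does not depend on the
    locale or on what kind of file `raw` happens to be.
    """
    return io.TextIOWrapper(raw, encoding=encoding)


if __name__ == "__main__":
    data = "naïve café\n".encode("utf-8")
    f = wrap_text(io.BytesIO(data))
    print(f.encoding, repr(f.read()))
```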
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QBNH3XGSNBQ7XIJ5E542JIQ5Q5E63MCU/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Inada Naoki
On Thu, Feb 11, 2021 at 4:44 PM Jim J. Jewett  wrote:
>
> The PEP helps when the locale is ASCII or C, but that isn't enforced in 
> actual files.  I am confident that this is a frequent problem for packages 
> downloaded from mostly-English sites, including many software repositories.
>

The PEP helps developers living in a UTF-8 locale find missing
`encoding="utf-8"` bugs.
This type of bug is very common, and many Windows users suffer
from it when reading JSON, YAML, TOML, Markdown, or any other UTF-8
files.


> It does not seem to be a win when the locale is something incompatible with 
> utf-8, such as Latin-1, or whatever is still common in Japan.  The 
> surrogate-escape mechanism allows a proper round-trip, but python itself will 
> stop processing the characters correctly.
>

The surrogate-escape mechanism isn't related to this PEP.


> For interactive use, when talking to another program (such as a terminal) 
> instead of an already existing file, the backwards compatibility problem 
> seems worse.
>

This PEP is 100% backward compatible.


> Changing the default to utf-8 (after a deprecation period showing how to make 
> locale an explicit default) may be reasonable, but claiming that it is 
> backwards compatible ... I didn't get that impression from the PEP.
>

This PEP doesn't propose to change the default encoding.

*If* we decide to change the default encoding in the future (maybe in
2025 or later) and start emitting DeprecationWarning wherever the `encoding`
option is omitted, this PEP helps by:

* making the `encoding="locale"` option available from Python 3.10, and
* reducing the number of DeprecationWarnings shown, because we can
add `encoding="utf-8"` in many places before then. At the very least, we can
fix every EncodingWarning in the stdlib.

Maybe the "Prepare to change the default encoding to UTF-8" section is misleading.
I will try to fix the section or remove it.

-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JBBRBR6AUTGP2SAVAUJVZJ3GM6FJQEBV/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Inada Naoki
On Fri, Feb 12, 2021 at 5:18 AM Jim J. Jewett  wrote:
>
> Inada Naoki wrote:
>
> > Default encoding is used for:
>
> > a. Really need to use locale specific encoding
> > b. UTF-8 (bug. not work on Windows)
> > c. ASCII (not a bug, but slow on Windows)
>
> > I assume most usages are (b) and (c). This PEP can reduce them soon.
>
> Is this just an assumption, based on those times being visible to someone who 
> installs a lot of packages, or has the use of any locale other than UTF-8 and 
> ASCII really gone down a lot?  Have browsers stopped using charset sniffing?
>

Using "most" was my mistake; I am not good at English. I should have used "many" here.
You can see many bugs caused by not specifying `encoding="utf-8"` on Q&A sites.
I included some numbers about these common bugs in the PEP.

UTF-8 is used by 96.3% of websites [1], although browsers still use
charset sniffing. But how does that relate to this PEP?
[1] https://w3techs.com/technologies/details/en-utf8


> > Additionally, encoding="locale" will be backward/forward compatible
>
> What would be the problem with changing the default from None to locale?

It doesn't work on Python 3.9 and earlier.
So using `encoding="locale"` is not recommended until
users drop Python 3.9 support.

> (I think you mentioned that they are the same 99% of the time; is that other 
> 1% likely to be cases where locale is wrong but None is right?  Would there 
> be a better way to represent that 1%?)
>

`encoding="locale"` and `encoding=None` have the same behavior, except that
`encoding="locale"` doesn't emit EncodingWarning even when the warning is
enabled.

There is little difference between `encoding=None` and
`encoding=locale.getpreferredencoding(False)`. They differ only when:

* Python is running on Windows, and
* the file is a console, and
* (for open()) PYTHONLEGACYWINDOWSSTDIO is set
* (for TextIOWrapper()) the file is not _WindowsConsoleIO

In that case, encoding=None uses the console code page, but
encoding=locale.getpreferredencoding(False) uses the ANSI code page.
Otherwise, encoding=None and
encoding=locale.getpreferredencoding(False) are the same.

So `encoding=locale.getpreferredencoding(False)` can be used to
specify locale-specific encoding explicitly.
But this PEP doesn't recommend it. This PEP recommends using
EncodingWarning just to find missing `encoding="utf-8"` (or any
other specific encoding).

-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PD4BTBAQHFUYOCF5QKIBDIMHATPVEFPW/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Inada Naoki
On Fri, Feb 12, 2021 at 6:34 AM Paul Moore  wrote:
>
> On Thu, 11 Feb 2021 at 21:05, Jim J. Jewett  wrote:
> >
> > Who will benefit from this new warning?
> >
> > Is this basically just changing builtins.open by adding:
> >
> > if encoding is None and sys.flags.encoding_warning:  # and not Android and not -X utf8 ?
> >     warnings.warn(EncodingWarning("Are you sure you want locale instead of utf-8?"))
> >
> > Even for the few people with the knowledge, time, interest, and authority 
> > to fix the code, is that really helpful?
> >
> > Helpful enough to put it directly in python as an optional mode, separate 
> > from the dev mode or show all warnings mode?  Why not just add it to a 
> > linter, or write a 2to3 style checker?  Or at least emit or not based on a 
> > warnings filter?
>
> That's a very good point. If this warning is of use, why have none of
> the well-known linters implemented it? And why not prototype the
> proposal in them, at least? Python-ideas posts routinely get pushed to
> justify "why can't this be done in an external library?" and that
> probably applies here too.
>

* Linters cannot add `encoding="locale"` to Python.
* This PEP provides a way to shift where the warning is emitted:

    import io

    def my_read_file(filename, encoding=None):
        encoding = io.text_encoding(encoding)
        with open(filename, encoding=encoding) as f:
            return f.read()

No warning is emitted for this function itself; the caller of this
function is warned instead. That is difficult to implement in a linter.
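A sketch of this behavior, assuming Python 3.10+ (where `io.text_encoding()` and `-X warn_default_encoding` exist): it runs the function above in a child interpreter and collects the line numbers that EncodingWarnings are attributed to. With `io.text_encoding()`'s default `stacklevel=2`, the warning should point at the caller's line rather than at the `open()` call inside the function. `warning_locations` and `CHILD` are hypothetical names.

```python
import re
import subprocess
import sys

# Child program; line 8 is the call site that omits encoding.
CHILD = (
    "import io, os\n"
    "\n"
    "def my_read_file(filename, encoding=None):\n"
    "    encoding = io.text_encoding(encoding)\n"
    "    with open(filename, encoding=encoding) as f:\n"
    "        return f.read()\n"
    "\n"
    "my_read_file(os.devnull)\n"
)


def warning_locations(code: str) -> list[int]:
    """Return the line numbers of EncodingWarnings raised by `code`."""
    stderr = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    ).stderr
    # Warnings from -c code are printed as "<string>:LINE: EncodingWarning: ..."
    return [int(n) for n in re.findall(r"<string>:(\d+): EncodingWarning", stderr)]


if __name__ == "__main__":
    print(warning_locations(CHILD))
```

Passing an explicit encoding at the call site (for example `my_read_file(os.devnull, "utf-8")`) should leave the list empty.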

-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CKMRUBEI3UHEXSELZIQBA6NZCK77O75T/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Paul Moore
On Thu, 11 Feb 2021 at 21:05, Jim J. Jewett  wrote:
>
> Who will benefit from this new warning?
>
> Is this basically just changing builtins.open by adding:
>
> if encoding is None and sys.flags.encoding_warning:  # and not Android and not -X utf8 ?
>     warnings.warn(EncodingWarning("Are you sure you want locale instead of utf-8?"))
>
> Even for the few people with the knowledge, time, interest, and authority to 
> fix the code, is that really helpful?
>
> Helpful enough to put it directly in python as an optional mode, separate 
> from the dev mode or show all warnings mode?  Why not just add it to a 
> linter, or write a 2to3 style checker?  Or at least emit or not based on a 
> warnings filter?

That's a very good point. If this warning is of use, why have none of
the well-known linters implemented it? And why not prototype the
proposal in them, at least? Python-ideas posts routinely get pushed to
justify "why can't this be done in an external library?" and that
probably applies here too.
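To make the linter alternative concrete, here is a minimal sketch of such a check (hypothetical, not a rule from any existing linter) built on the standard `ast` module. It flags direct text-mode `open()` calls that omit `encoding`; as Inada's reply in this thread points out, a purely syntactic check like this cannot see through wrappers that forward `encoding=None`.

```python
import ast


def find_open_without_encoding(source: str) -> list[int]:
    """Return line numbers of text-mode open() calls that omit `encoding`.

    Heuristic only: it sees direct calls to the name `open`, treats a
    non-literal mode as text mode, and cannot follow wrapper functions.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            continue
        keywords = {kw.arg: kw.value for kw in node.keywords}
        # encoding is open()'s 4th positional parameter
        if "encoding" in keywords or len(node.args) >= 4:
            continue
        mode = keywords.get("mode")
        if mode is None and len(node.args) >= 2:
            mode = node.args[1]
        if (
            isinstance(mode, ast.Constant)
            and isinstance(mode.value, str)
            and "b" in mode.value
        ):
            continue  # binary mode: encoding does not apply
        hits.append(node.lineno)
    return sorted(hits)
```

A wrapper such as `def f(p, encoding=None): return open(p, encoding=encoding)` passes this check, and so does every caller of `f`; that is the gap `io.text_encoding()` closes at runtime.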

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VRBGH3ECNJBZMX7LINDNXYHQSKTKRTEX/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Jim J. Jewett
Who will benefit from this new warning?

Is this basically just changing builtins.open by adding:

if encoding is None and sys.flags.encoding_warning:  # and not Android and not -X utf8 ?
    warnings.warn(EncodingWarning("Are you sure you want locale instead of utf-8?"))

Even for the few people with the knowledge, time, interest, and authority to 
fix the code, is that really helpful?  

Helpful enough to put it directly in python as an optional mode, separate from 
the dev mode or show all warnings mode?  Why not just add it to a linter, or 
write a 2to3 style checker?  Or at least emit or not based on a warnings filter?

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GXZFCYFK7VOUSVZ5BVDCUW3JNJG6KPRS/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Jim J. Jewett
Inada Naoki wrote:

> Default encoding is used for:

> a. Really need to use locale specific encoding
> b. UTF-8 (bug. not work on Windows)
> c. ASCII (not a bug, but slow on Windows)

> I assume most usages are (b) and (c). This PEP can reduce them soon.

Is this just an assumption, based on those times being visible to someone who 
installs a lot of packages, or has the use of any locale other than UTF-8 and 
ASCII really gone down a lot?  Have browsers stopped using charset sniffing?

> Additionally, encoding="locale" will be backward/forward compatible

What would be the problem with changing the default from None to locale?  (I 
think you mentioned that they are the same 99% of the time; is that other 1% 
likely to be cases where locale is wrong but None is right?  Would there be a 
better way to represent that 1%?)

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NGAF753ALAPMUKNJWFBYLDOTYTUJH6ZG/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Inada Naoki
On Thu, Feb 11, 2021 at 4:44 PM Jim J. Jewett  wrote:
>
> I just reread PEP 597, then re-reread the Rationale.
>

Did you read the current PEP 597, or the old PEP 597 on discuss.python.org?


-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UKGDVMHUNNNRA4D4UCG4RLPZDIVKNNEY/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Victor Stinner
On Tue, Feb 9, 2021 at 9:51 PM Paul Moore  wrote:
> * Realistically, I'd be surprised if developers actually use such a
> tool. If they were likely to do so, they could probably just as easily
> locate all the uses of open() in their code, and check that way. So
> I'm not sure this proposal is actually worth it, even if the end
> result would be very beneficial.

That's OK; there are many Python features which are only used by a
minority of users. For me it's similar to the Python Development Mode:
https://docs.python.org/dev/library/devmode.html

Most users and developers will never use it, but for developers who
care, it's a useful tool (it eases the discovery of issues).

Victor
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QRYY3M2RXQ7W33DO4TRJBNXWTV6N6BQE/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Jim J. Jewett
I just reread PEP 597, then re-reread the Rationale.

The PEP helps when the locale is ASCII or C, but that isn't enforced in actual 
files.  I am confident that this is a frequent problem for packages downloaded 
from mostly-English sites, including many software repositories.

It does not seem to be a win when the locale is something incompatible with 
utf-8, such as Latin-1, or whatever is still common in Japan.  The 
surrogate-escape mechanism allows a proper round-trip, but python itself will 
stop processing the characters correctly.

For interactive use, when talking to another program (such as a terminal) 
instead of an already existing file, the backwards compatibility problem seems 
worse.

Changing the default to utf-8 (after a deprecation period showing how to make 
locale an explicit default) may be reasonable, but claiming that it is 
backwards compatible ... I didn't get that impression from the PEP.

-jJ
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RA5SLRB4M7IDLVZKQ3NWVACBLHII2BTR/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Inada Naoki
On Wed, Feb 10, 2021 at 11:58 PM Anders Munch  wrote:
>
> On Wed, Feb 10, 2021 at 1:46 AM Anders Munch  wrote:
> >> How about swapping around "locale" and None?
> Inada Naoki   wrote:
> >
> > I thought it, but it may not work. Consider about function like this:
> >
> > ```
> > def read_text(self, encoding=None):
> >     with open(self._filename, encoding=encoding) as f:
> >         return f.read()
> > ```
> >
> > If `encoding=None` suppresses the warning, functions like this never warned.
>
> I don't see why they should be.  The author clearly knew about the encoding
> argument to open, they clearly intended for a None value to be given in some
> cases, and at the time of writing None meant to use a locale-dependent 
> encoding.
>

It is not clear. The author may just want to "use the same default
encoding as open()".
If so, the caller of the function should be warned. To warn the caller,
this function can use
`encoding=io.text_encoding(encoding)` as described in the PEP.


> > We are not discussing about changing default encoding for now.
>
> The section "Prepare to change the default encoding to UTF-8" gave me the
> impression that this was meant as a stepping stone on the way to doing just
> that.  If that was not the intention, my apologies for the misread.
>

This *can* be a stepping stone, but it is not the first goal. This PEP
doesn't discourage omitting the encoding option anytime soon when users
really need the locale encoding.

Default encoding is used for:

 a. Really need to use locale specific encoding
 b. UTF-8 (a bug; does not work on Windows)
 c. ASCII (not a bug, but slow on Windows)

I assume most usages are (b) and (c). This PEP can reduce them soon.

If we decide to change the default encoding in the future, we will need to
warn when the encoding option is omitted. Reducing (b) and (c) will reduce
the total number of warnings shown then. This is what "Prepare" means.

Additionally, `encoding="locale"` will be a backward/forward-compatible
way to use the locale-specific encoding if we decide to change the
default encoding.
So this PEP can be a very important stepping stone.

On the other hand, it is not a problem that we cannot use
`encoding="locale"` in backward-compatible code *for now*.
Python 3.9 reaches EOL in 2025. We won't emit warnings for the default
encoding until then.

People can use `encoding="locale"` after they drop Python 3.9 support.
No problem.
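Until then, code that supports both old and new versions could pick the spelling at run time. A sketch, assuming Python 3.10 is the first version accepting `encoding="locale"`; the helper name `locale_encoding()` and the file name in the usage comment are hypothetical:

```python
import locale
import sys


def locale_encoding():
    """Return an encoding= value that means "use the locale encoding"."""
    if sys.version_info >= (3, 10):
        # Python 3.10+ accepts encoding="locale" in open() and does not
        # emit EncodingWarning for it.
        return "locale"
    # Older versions: spell out the current default explicitly.
    # Note that in UTF-8 mode this returns UTF-8.
    return locale.getpreferredencoding(False)


# Usage (hypothetical file name):
#     with open("settings.ini", encoding=locale_encoding()) as f:
#         ...
```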

Regards,
-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DBDI5FEJCF2IOTSAS7VELO27MNEQMK2Z/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Paul Moore
On Wed, 10 Feb 2021 at 16:06, Anders Munch  wrote:
>
> Paul Moore [mailto:p.f.mo...@gmail.com]: wrote:
> > On Wed, 10 Feb 2021 at 14:33, Anders Munch  wrote:
> >> The idea is to make it so that working code only needs to change once, 
> >> even when supporting multiple Python versions.
> >> That one change is to add either an explicit encoding=None (for 
> >> backwards-compatibility) or an explicit encoding='utf-8' (because that was 
> >> intended all along).  No twice about it, one change.
>
> > But then people who added an explicit utf-8 encoding need to remove the 
> > encoding argument again once the default value changes
>
> Why would they do that?  There's no need to remove anything.  Code that 
> doesn't use a default doesn't break because the default changes.

Because I'm against a proposal that forces *everyone* to explicitly
specify the value... Your argument implies that removing the default
altogether would be fine as well.
Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GA3EICY5IN7S7VVH24CX3SPAENKCAMFW/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Eric V. Smith

On 2/10/2021 10:29 AM, Paul Moore wrote:

On Wed, 10 Feb 2021 at 14:33, Anders Munch  wrote:


The idea is to make it so that working code only needs to change once, even 
when supporting multiple Python versions.
That one change is to add either an explicit encoding=None (for 
backwards-compatibility) or an explicit encoding='utf-8' (because that was 
intended all along).  No twice about it, one change.

But then people who added an explicit utf-8 encoding need to remove
the encoding argument again once the default value changes. Your
proposal leads to a situation where no-one leaves the encoding
argument to default. If we're going to permanently discourage omitting
the encoding argument, we should just make it mandatory (a change that
I'll argue against, but no-one is currently proposing it, luckily).


Except that all code written after the default has changed (and all 
python versions without that default are no longer supported) won't need 
to specify utf-8. And presumably there's more code to be written in the 
future than already exists.


Eric
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EG5UUYGC2R36NPXBDKFTFDKUYSYHP6JR/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Anders Munch
Paul Moore [mailto:p.f.mo...@gmail.com]: wrote:
> On Wed, 10 Feb 2021 at 14:33, Anders Munch  wrote:
>> The idea is to make it so that working code only needs to change once, even 
>> when supporting multiple Python versions.
>> That one change is to add either an explicit encoding=None (for 
>> backwards-compatibility) or an explicit encoding='utf-8' (because that was 
>> intended all along).  No twice about it, one change.

> But then people who added an explicit utf-8 encoding need to remove the 
> encoding argument again once the default value changes

Why would they do that?  There's no need to remove anything.  Code that doesn't 
use a default doesn't break because the default changes.

regards, Anders

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WBUP2BNLAKHILUM3ZE3A2LVQKAQRXQ7T/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Anders Munch
Inada Naoki [mailto:songofaca...@gmail.com] wrote:
> There are several ways:
> * encoding="latin1" -- This is the best. Works perfectly.
> * Don't touch -- You don't need to enable EncodingWarning.
>  [...]

I'm replying to Victor's statement that `encoding="utf8"` is backward 
compatible.

If you're adding encoding="latin1" to the user program, then you are doing 
something very different from what Victor proposed.

regards, Anders

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IENGLLP67LAOVLVSMJSJX4W6K2ZDTPJ7/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Paul Moore
On Wed, 10 Feb 2021 at 14:33, Anders Munch  wrote:
>
> On Tue, 9 Feb 2021 at 16:52, Anders Munch  wrote:
> >> How about swapping around "locale" and None?  That is, make "locale" the 
> >> new default that emits a warning, and encoding=None emits no warning.  
> >> That has the advantage that old code can be updated to say encoding=None, 
> >> and then it will work on both old and new Pythons without warning.
> Paul Moore [mailto:p.f.mo...@gmail.com]
> > I don't understand why working code should have to change *twice*.
>
> The idea is to make it so that working code only needs to change once, even 
> when supporting multiple Python versions.
> That one change is to add either an explicit encoding=None (for 
> backwards-compatibility) or an explicit encoding='utf-8' (because that was 
> intended all along).  No twice about it, one change.

But then people who added an explicit utf-8 encoding need to remove
the encoding argument again once the default value changes. Your
proposal leads to a situation where no-one leaves the encoding
argument to default. If we're going to permanently discourage omitting
the encoding argument, we should just make it mandatory (a change that
I'll argue against, but no-one is currently proposing it, luckily).

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QBX375BJCACY726YHZOXVUOWCG2EH3GI/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Anders Munch
On Wed, Feb 10, 2021 at 1:46 AM Anders Munch  wrote:
>> How about swapping around "locale" and None?
Inada Naoki   wrote:
> 
> I thought about it, but it may not work. Consider a function like this:
> 
> ```
> def read_text(self, encoding=None):
>     with open(self._filename, encoding=encoding) as f:
>         return f.read()
> ```
> 
> If `encoding=None` suppresses the warning, functions like this never emit the warning.

I don't see why they should be.  The author clearly knew about the encoding
argument to open, they clearly intended for a None value to be given in some
cases, and at the time of writing None meant to use a locale-dependent encoding.

> We are not discussing changing the default encoding for now.

The section "Prepare to change the default encoding to UTF-8" gave me the
impression that this was meant as a stepping stone on the way to doing just
that.  If that was not the intention, my apologies for the misread.

regards, Anders
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2VWZMIBG2VLASF7NCKDEJ5I22PXWI7D7/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Anders Munch
On Tue, 9 Feb 2021 at 16:52, Anders Munch  wrote:
>> How about swapping around "locale" and None?  That is, make "locale" the new 
>> default that emits a warning, and encoding=None emits no warning.  That has 
>> the advantage that old code can be updated to say encoding=None, and then it 
>> will work on both old and new Pythons without warning.
Paul Moore [mailto:p.f.mo...@gmail.com]
> I don't understand why working code should have to change *twice*.

The idea is to make it so that working code only needs to change once, even 
when supporting multiple Python versions.
That one change is to add either an explicit encoding=None (for 
backwards-compatibility) or an explicit encoding='utf-8' (because that was 
intended all along).  No twice about it, one change.
 
regards, Anders

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BTIXV47LDYPKNIGWNRQSG6LXG4DORS7W/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Inada Naoki
On Wed, Feb 10, 2021 at 11:14 PM Anders Munch  wrote:
>
> This program runs just fine on 3.8.7 Windows, against a file.txt that 
> contains latin-1 text:
>
> with open('file.txt', 'rt') as f:
>     print(f.read())
>
> But if I change it to this:
>
> with open('file.txt', 'rt', encoding='utf-8') as f:
>     print(f.read())
>
> then it fails with UnicodeDecodeError.   How is that backwards compatible?
>

There are several ways:

* encoding="latin1" -- This is the best. Works perfectly.
* Don't touch -- You don't need to enable EncodingWarning.
* encoding=locale.getpreferredencoding(False) -- Backward compatible.
But doesn't work if UTF-8 mode is enabled.
* encoding="mbcs" -- Backward compatible. Works even when UTF-8 mode is
enabled. But it works only on Windows.
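The last two options could be folded into one helper when the goal is only to keep the old implicit behaviour while passing an explicit value. A sketch; the name `legacy_default_encoding()` is made up for illustration, and the `"mbcs"` codec exists only on Windows:

```python
import locale
import sys


def legacy_default_encoding():
    """Approximate open()'s old implicit default, preferring the ANSI code page on Windows."""
    if sys.platform == "win32":
        # "mbcs" is the Windows ANSI code page; unlike the value below it
        # is not switched to UTF-8 when UTF-8 mode is enabled.
        return "mbcs"
    # Elsewhere: the locale encoding (this follows UTF-8 mode if enabled).
    return locale.getpreferredencoding(False)


# When the real encoding of the file is known, naming it is still best:
#     open("file.txt", "rt", encoding="latin1")
# Otherwise, to keep the old behaviour explicitly:
#     open("file.txt", "rt", encoding=legacy_default_encoding())
```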

Regards,

-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XPBVG5GU37UDQPDTZIFIGI2WOFYHYQBU/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Anders Munch
Victor Stinner [mailto:vstin...@python.org] wrote:
> encoding="utf8" is backward compatible and is likely to fix encoding bugs 
> when the locale encoding is not UTF-8

This program runs just fine on 3.8.7 Windows, against a file.txt that contains 
latin-1 text:

with open('file.txt', 'rt') as f:
    print(f.read())

But if I change it to this:

with open('file.txt', 'rt', encoding='utf-8') as f:
    print(f.read())

then it fails with UnicodeDecodeError.   How is that backwards compatible?
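The failure does not need Windows to reproduce: in Latin-1, "é" is the single byte 0xE9, which in UTF-8 starts a multi-byte sequence and cannot be decoded on its own. A self-contained sketch, where a temporary file stands in for `file.txt` and `read_as()` is a hypothetical helper:

```python
import os
import tempfile


def read_as(path, encoding):
    """Return the file's text, or None if it does not decode."""
    try:
        with open(path, "rt", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        return None


fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("café".encode("latin-1"))  # the bytes b"caf\xe9"

try:
    print(read_as(path, "latin-1"))  # café
    print(read_as(path, "utf-8"))    # None
finally:
    os.remove(path)
```

So `encoding="utf-8"` is only a safe replacement for the default when the file really is UTF-8.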

regards, Anders

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SDFLXIW64ESKDBARCHC2A2JA4NFPBZ2Y/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Inada Naoki
On Wed, Feb 10, 2021 at 5:00 PM Paul Moore  wrote:
>
> Let's just assume until you can convince me that setting UTF-8 mode
> globally is a good idea,

Oh, you misunderstood me. My proposal is not setting UTF-8 mode globally.
What I proposed is setting UTF-8 mode per env (e.g. installation,
venv, or conda env).

But this is off topic. The thread for this topic is here.
https://mail.python.org/archives/list/python-id...@python.org/thread/LQVK2UKPSOI2AHYFUWK6ZII2U6QKK6BP/

Regards,
-- 
Inada Naoki  
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZXPBI3WSZ6FCAWWKXNBRNKYXUXUG5FEH/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-10 Thread Paul Moore
On Wed, 10 Feb 2021 at 01:29, Inada Naoki  wrote:

> Note that many Python users don't use consoles.

I never suggested that they did. There's a GUI for setting user-level
and system-level environment variables. And whoever is introducing the
user to Python can show them how to set the necessary environment
variable - or do it for them.

Please be clear, I'm not saying I don't understand the difficulties.
But I do question why PYTHONUTF8 is so much different than all the
other environment variables that Python responds to that it needs
special additional options.

Remember - I've already said that I'm not convinced that setting UTF8
mode globally is the right approach. So what you're saying in effect
is that you want to convince me that we should add a new mechanism to
globally set an option that I don't believe should be set globally.
Let's just assume until you can convince me that setting UTF-8 mode
globally is a good idea, there's no point trying to convince me that
we need a new mechanism to do so because environment variables aren't
good enough.
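For reference, whether UTF-8 mode is active can be checked through `sys.flags.utf8_mode`. A sketch that starts child interpreters with the `PYTHONUTF8` environment variable or the `-X utf8` flag; the helper name `child_utf8_mode()` is hypothetical:

```python
import os
import subprocess
import sys

PROBE = "import sys; print(sys.flags.utf8_mode)"


def child_utf8_mode(flags=(), pythonutf8=None):
    """Start a child interpreter and return its sys.flags.utf8_mode (0 or 1)."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if pythonutf8 is not None:
        env["PYTHONUTF8"] = pythonutf8
    proc = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(proc.stdout)


print(child_utf8_mode(pythonutf8="1"))        # 1: enabled by the environment variable
print(child_utf8_mode(flags=("-X", "utf8")))  # 1: enabled by the command-line flag
print(child_utf8_mode(pythonutf8="0"))        # 0: explicitly disabled
```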

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4UHPNJVJ3ACKREUH7WLEAD5ZB4IR5R2K/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Jonathan Goble
On Tue, Feb 9, 2021 at 11:29 PM Terry Reedy  wrote:
>
> On 2/9/2021 8:28 PM, Inada Naoki wrote:
>
> > Note that many Python users don't use consoles.
>
> Those of us who do may find it hard to imagine just how easy we have
> made computing.
>
> My daughter minored in Computer Science about 6 years ago.  She never
> saw Command Prompt until the summer after her Junior year when she
> watched me use it to install numpy and other packages for her.  I had to
> do it because 'Run pip install numpy', etc, was met with a blank stare.
>   I had taught her Python with IDLE, downloaded and installed with a
> browser, and had neglected to teach her 'Dos' until then.
>
> So had her CS classes.  Those previously used Racket in a Dr. something
> environment and Java in, I believe, Eclipse.  Also downloaded and
> installed with a browser.

Speaking as a current CS undergraduate student here (senior graduating
in December 2021). At my university, the freshman/sophomore-level
programming classes do not assume or expect any type of command line
knowledge. They all rely on GUI tools (Eclipse, IntelliJ, or NetBeans
for the freshman Java courses, Visual Studio for Data Structures in
C++).

There is one course, typically taken in either the second or third
semester for traditional students, called Operating Systems Concepts
and Usage, that broadly discusses how operating systems function, but
is also designed as a first introduction to Linux and to the command
line. (Until this point, the only operating system students are
assumed to be familiar with is Windows.) For many students, this
course is their first ever exposure to the command prompt.

After that, students in this program don't generally *need* to touch
the command line again in their studies until they hit 4000-level
courses, and even then only a few courses require it. Outside of that
one introductory course, I've only had two courses so far that
actually required command line usage. Everything else so far has
offered GUI options, even many upper level courses.

I think it's a disservice to fail to expose students to the command
line more and earlier, but the fact is, that failure happens and
happens often, and developers need to be conscious of that.

Despite my own ease and comfort with the command-line (which dates
back to learning my way around DOS at the age of 5) to the point of
almost always having a terminal window open on my daily Debian
machine, I frequently find myself opting for point-and-click solutions
to common problems, even Git operations (which are so easy and
powerful in VS Code with the GitLens extension). GUI tools grow more
powerful by the day, and it's very easy to get deep into a computer
science program these days and not be comfortable with the command
line and/or not know how to change environment variables.

Python, as a common introductory language used by many thousands of
people who have never taken a university computer course, never mind
majoring in computer science, shouldn't have basic features that
depend on the likely false assumption that the user has ever seen a
command prompt or an environment variable, much less comprehend how to
use them.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UKGADFE6OK66BNUPE36NVHXBZZSOR7OD/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Terry Reedy

On 2/9/2021 8:28 PM, Inada Naoki wrote:


Note that many Python users don't use consoles.


Those of us who do may find it hard to imagine just how easy we have 
made computing.


My daughter minored in Computer Science about 6 years ago.  She never 
saw Command Prompt until the summer after her Junior year when she 
watched me use it to install numpy and other packages for her.  I had to 
do it because 'Run pip install numpy', etc, was met with a blank stare. 
 I had taught her Python with IDLE, downloaded and installed with a 
browser, and had neglected to teach her 'Dos' until then.


So had her CS classes.  Those previously used Racket in a Dr. something 
environment and Java in, I believe, Eclipse.  Also downloaded and 
installed with a browser.



They just start
Jupyter Notebook, or they just write .py file and run it in the
Minecraft mods.


Also similar.

--
Terry Jan Reedy
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/O5SS7I2H2GRCM2YBOVYX7CWUYTX2J7AO/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Inada Naoki
On Wed, Feb 10, 2021 at 5:50 AM Paul Moore  wrote:
>
> On Tue, 9 Feb 2021 at 16:54, Inada Naoki  wrote:
> >
> > On Tue, Feb 9, 2021 at 9:31 PM Paul Moore  wrote:
> > >
> > > Personally, I'm not at all keen on the idea of making users always
> > > specify encoding in the first place, even if it's "just for the
> > > transition".
> >
> > I agree with you. But as I wrote in the PEP, omitting the encoding has
> > caused a lot of trouble already.
> > Windows users cannot just `pip install somepkg` because some library
> > authors write `long_description=open("README.md").read()` in setup.py.
> >
> > I am trying to fix this situation by two parallel approaches:
> >
> > * (This PEP) Provide a tool for finding this type of bugs, and
> > recommend `encoding="utf-8"` for cross-platform library authors.
> > * (Another thread) Make UTF-8 mode more usable for Windows users,
> > especially students.
>
> Thanks for explaining (again). There's so much debate, across multiple
> proposals, that I can barely follow it. I'm impressed that you're
> managing to keep things straight at all :-)
>
> I guess my views on this PEP come down to
>
> * I see no harm in having a tool that helps developers spot
> platform-specific assumptions about encoding.
> * Realistically, I'd be surprised if developers actually use such a
> tool. If they were likely to do so, they could probably just as easily
> locate all the uses of open() in their code, and check that way. So
> I'm not sure this proposal is actually worth it, even if the end
> result would be very beneficial.
> * In the setup.py case, why don't those same Windows users complain
> that the library fails to install? A quick bug report, followed by a
> simple fix, seems more likely to happen than the developer suddenly
> deciding to scan their code for encoding issues.
>

Yes, some issues are solved already.
On the other hand, there are dozens of questions about UnicodeDecodeError
on Q&A sites like Stack Overflow.
Many people don't know what the error means, and how to report it correctly.

I sometimes set PYTHONWARNINGS=default in my .bashrc and find
DeprecationWarnings in libraries I am using, and report them.

On the other hand, it is difficult to find omitted `encoding="utf-8"`,
because I use macOS and Linux in daily development.
If there is PYTHONWARNENCODING, I can write `export
PYTHONWARNENCODING=1` in my .bashrc.


> Regarding the wider question of UTF8 as default, my views can probably
> be summarised as follows:
>
> * If you want to write correct code to deal with encodings, there is
> no substitute for carefully considering every bytes/string conversion,
> deciding how you are going to identify the encoding to use, and then
> specifying that encoding explicitly. Default values for encodings have
> no place in such code.
> * In reality, though, that's far too much work for many situations.
> Default encodings are a necessary convenience, particularly for simple
> scripts, or for people who can't, or don't want to, do the analysis
> that the "correct" approach implies.

Yes. And UTF-8 is already the default encoding for s.encode().
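A quick check of that point: `str.encode()` and `bytes.decode()` default to UTF-8 on every platform, while `open()` without `encoding=` follows the locale:

```python
import locale

text = "café"

# str.encode() and bytes.decode() ignore the locale: their default is UTF-8.
assert text.encode() == b"caf\xc3\xa9"
assert b"caf\xc3\xa9".decode() == text

# open() without encoding= uses the locale encoding instead; this prints
# something like "cp1252" on many Western-European Windows setups.
print(locale.getpreferredencoding(False))
```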

> * Picking the right default is *hard*. Changing the default is even
> harder, unfortunately.
> * I feel that we already have a number of mechanisms (PEPs 538 and
> 540) trying to tackle this issue. Adding yet more suggests to me that
> we'd be better off pausing and working out why we still have an issue.
> We should be moving towards *fewer* mechanisms, not more.
> * We have UTF-8 mode, and users can set it per-process (via flag or
> environment variable) per-user or per-site (by environment variable).
> I don't honestly believe that a user (whatever OS they work on) who is
> capable of writing Python code, can't be shown how to set an
> environment variable. I see no reason to suggest we need yet another
> way to set UTF-8 mode, or that a per-interpreter or per-virtualenv
> setting is particularly crucial (suggestions that have been made in
> the Python-Ideas threads).

Note that many Python users don't use consoles. They just start
Jupyter Notebook, or they just write .py file and run it in the
Minecraft mods.

> * UTF-8 is likely to be the most appropriate default encoding for
> Python in the longer term, and I agree that Windows is fast
> approaching the point where a UTF-8 encoding is more appropriate than
> the ANSI codepage for "new stuff". But there's a lot of legacy files
> and applications around, and I suspect that a UTF-8 default will
> inconvenience a lot of people working with such data. But equally,
> such people may not be in a huge rush to switch to the latest Python
> version. Whichever way we go, though, some people will be
> inconvenienced.
>
> I'm also somewhat bemused by the rather negative view of "Windows
> beginners" that lies behind a lot of these discussions. People's
> experiences may well differ, but the people I see using (and learning)
> Python on Windows are often experienced computer users, maybe
> developers with significant 

[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Paul Moore
On Tue, 9 Feb 2021 at 16:54, Inada Naoki  wrote:
>
> On Tue, Feb 9, 2021 at 9:31 PM Paul Moore  wrote:
> >
> > Personally, I'm not at all keen on the idea of making users always
> > specify encoding in the first place, even if it's "just for the
> > transition".
>
> I agree with you. But as I wrote in the PEP, omitting the encoding has
> caused a lot of trouble already.
> Windows users cannot just `pip install somepkg` because some library
> authors write `long_description=open("README.md").read()` in setup.py.
>
> I am trying to fix this situation by two parallel approaches:
>
> * (This PEP) Provide a tool for finding this type of bugs, and
> recommend `encoding="utf-8"` for cross-platform library authors.
> * (Another thread) Make UTF-8 mode more usable for Windows users,
> especially students.
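For the setup.py case, the fix recommended in the thread is to name the encoding. A sketch; `read_long_description()` is a hypothetical helper, and `README.md` is assumed to be saved as UTF-8:

```python
from pathlib import Path


def read_long_description(path="README.md"):
    # open("README.md").read() decodes with the locale encoding, so a UTF-8
    # README containing non-ASCII text can fail (or turn into mojibake) on
    # Windows. Naming the encoding gives the same result on every platform.
    return Path(path).read_text(encoding="utf-8")


# In setup.py:
#     setup(..., long_description=read_long_description())
```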

Thanks for explaining (again). There's so much debate, across multiple
proposals, that I can barely follow it. I'm impressed that you're
managing to keep things straight at all :-)

I guess my views on this PEP come down to

* I see no harm in having a tool that helps developers spot
platform-specific assumptions about encoding.
* Realistically, I'd be surprised if developers actually use such a
tool. If they were likely to do so, they could probably just as easily
locate all the uses of open() in their code, and check that way. So
I'm not sure this proposal is actually worth it, even if the end
result would be very beneficial.
* In the setup.py case, why don't those same Windows users complain
that the library fails to install? A quick bug report, followed by a
simple fix, seems more likely to happen than the developer suddenly
deciding to scan their code for encoding issues.

Regarding the wider question of UTF8 as default, my views can probably
be summarised as follows:

* If you want to write correct code to deal with encodings, there is
no substitute for carefully considering every bytes/string conversion,
deciding how you are going to identify the encoding to use, and then
specifying that encoding explicitly. Default values for encodings have
no place in such code.
* In reality, though, that's far too much work for many situations.
Default encodings are a necessary convenience, particularly for simple
scripts, or for people who can't, or don't want to, do the analysis
that the "correct" approach implies.
* Picking the right default is *hard*. Changing the default is even
harder, unfortunately.
* I feel that we already have a number of mechanisms (PEPs 538 and
540) trying to tackle this issue. Adding yet more suggests to me that
we'd be better off pausing and working out why we still have an issue.
We should be moving towards *fewer* mechanisms, not more.
* We have UTF-8 mode, and users can set it per-process (via flag or
environment variable) per-user or per-site (by environment variable).
I don't honestly believe that a user (whatever OS they work on) who is
capable of writing Python code, can't be shown how to set an
environment variable. I see no reason to suggest we need yet another
way to set UTF-8 mode, or that a per-interpreter or per-virtualenv
setting is particularly crucial (suggestions that have been made in
the Python-Ideas threads).
* UTF-8 is likely to be the most appropriate default encoding for
Python in the longer term, and I agree that Windows is fast
approaching the point where a UTF-8 encoding is more appropriate than
the ANSI codepage for "new stuff". But there's a lot of legacy files
and applications around, and I suspect that a UTF-8 default will
inconvenience a lot of people working with such data. But equally,
such people may not be in a huge rush to switch to the latest Python
version. Whichever way we go, though, some people will be
inconvenienced.
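As a sketch of what "specifying that encoding explicitly" can look like in practice, here is a hypothetical function that names the codec at every bytes/str boundary it crosses: writing a file, reading it back, and exchanging text with a child process, whose stdio encoding would otherwise follow its own locale:

```python
import os
import subprocess
import sys
import tempfile


def roundtrip_explicit(text):
    """Send text through a file and a child process, naming every encoding."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:  # str -> bytes on disk
            f.write(text)
        with open(path, encoding="utf-8") as f:       # bytes on disk -> str
            from_file = f.read()
    finally:
        os.remove(path)

    # PYTHONIOENCODING pins the child's stdin/stdout codec, so both sides
    # agree on UTF-8 regardless of the locale either process runs under.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"],
        input=text, env=env, capture_output=True, encoding="utf-8", check=True,
    )
    return from_file, proc.stdout
```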

I'm also somewhat bemused by the rather negative view of "Windows
beginners" that lies behind a lot of these discussions. People's
experiences may well differ, but the people I see using (and learning)
Python on Windows are often experienced computer users, maybe
developers with significant experience in Java or other "enterprise
languages", or data scientists who have a lot of knowledge of
computers, but are relatively new to programming. Or systems admins,
or database specialists, who want to use Python to write scripts on
Windows. None of those people fit the picture of people who wouldn't
know how to set an environment variable, or configure their
environment. On the other hand, (in my experience) they often don't
really have much knowledge of character encodings, and tend to just
use whatever default their PC uses, and expect it to work. They *can*,
however, understand when an encoding problem is explained to them, and
can set an explicit encoding once they know they need to.

Paul
___
Message archived at 

[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Victor Stinner
On Tue, Feb 9, 2021 at 5:51 PM Anders Munch  wrote:
> Victor Stinner [mailto:vstin...@python.org] wrote:
> > The warning can explicitly suggest to use encoding="utf8", it should work 
> > in almost all cases.
>
> The warning should also explain how to get backwards-compatible behaviour, 
> i.e. suggest encoding="locale".

encoding="utf8" is backward compatible and is likely to fix encoding
bugs when the locale encoding is not UTF-8. It is likely what the
developer expected, without knowing that open(filename) does not
always use UTF-8. See PEP 597 rationale.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/L2H34EXNE7XUQX3XILLPDNGOQHVK6ENR/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Inada Naoki
On Wed, Feb 10, 2021 at 1:46 AM Anders Munch  wrote:
>
>
> Inada Naoki   wrote:
> > This warning is opt-in warning like BytesWarning.
>
> What use is a warning that no-one sees?

At least, I see.
We can fix stdlib and tests first, and fix some major tools too.

After that, `encoding="locale"` becomes backward/forward compatible at
some point.

> When the default is switched to encoding="utf8", it will break software, and 
> people need to be warned of that.
> UnicodeDecodeErrors will abound when files that used to be read in a 
> single-byte encoding fail to decode as utf-8. All it takes is a single é.
> If the default encoding is ever to change, there's no way around a noisy 
> warning.
>

Please read the PEP and some of my posts in this thread.
We are not discussing changing the default encoding for now.

This PEP provides a tool to find missing `encoding="utf-8"` bugs for now.
The goal of the PEP is to encourage `encoding="utf-8"` when the user
assumes the encoding is UTF-8.

If we decide to change the default encoding, EncodingWarning can be
used to discourage omitting the `encoding` option.
But it is out of scope of the PEP. We don't discourage omitting
encoding option in Python 3.10.


> How about swapping around "locale" and None?  That is, make "locale" the new 
> default that emits a warning, and encoding=None emits no warning.  That has 
> the advantage that old code can be updated to say encoding=None, and then it 
> will work on both old and new Pythons without warning.
>

I thought about it, but it may not work. Consider a function like this:

```
def read_text(self, encoding=None):
    with open(self._filename, encoding=encoding) as f:
        return f.read()
```

If `encoding=None` suppresses the warning, functions like this will never warn.

So I think the current PEP is better.
If users want to use the locale encoding, they don't need to fix the
warning anytime soon. They can wait until they drop Python 3.9 support.
If they want to fix all warnings soon, they can use
`encoding=locale.getpreferredencoding(False)`.
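A minimal sketch of how a wrapper like read_text() can forward the "no encoding given" decision to its caller instead of hiding it. It assumes Python 3.10+ for `io.text_encoding()`, which returns `encoding` unchanged when it is not None and otherwise returns the default ("locale", or "utf-8" in UTF-8 mode), emitting EncodingWarning for the wrapper's caller when the warning is enabled. On older versions the sketch just passes None through. The `TextFile` class is hypothetical.

```python
import io
import os
import tempfile


class TextFile:
    """Hypothetical class holding the read_text() method from the example."""

    def __init__(self, filename):
        self._filename = filename

    def read_text(self, encoding=None):
        # Python 3.10+: resolve None to the default encoding. When the
        # warning is enabled, it is attributed to *our caller* (the default
        # stacklevel=2), who is the one that actually omitted the encoding.
        if hasattr(io, "text_encoding"):
            encoding = io.text_encoding(encoding)
        with open(self._filename, encoding=encoding) as f:
            return f.read()


# Usage: create a small ASCII file and read it back both ways.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("hello")
explicit = TextFile(path).read_text(encoding="utf-8")  # never warns
implicit = TextFile(path).read_text()  # locale default; ASCII decodes anywhere
os.remove(path)
```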

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4Q74PW673RMBMQTDZXHTVE6X7FT6DSAL/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Paul Moore
On Tue, 9 Feb 2021 at 16:52, Anders Munch  wrote:
> How about swapping around "locale" and None?  That is, make "locale" the new 
> default that emits a warning, and encoding=None emits no warning.  That has 
> the advantage that old code can be updated to say encoding=None, and then it 
> will work on both old and new Pythons without warning.

I don't understand why working code should have to change *twice*. I'm
fine with the idea that people *actually* relying on the current
default will need to switch when the default changes, but making them
change once to silence the warning and then again to explicitly select
the old default is pretty annoying.

If we don't want people to use the default encoding, we should just
make encoding a required argument and stop pretending. If omitting the
encoding and using the default is intended to be a supported usage,
then we should *not* penalise people doing that. Changing the default
is a backward-incompatible change, that's enough of an inconvenience.
Changing the (behaviour of the) default *twice* is just making things
worse.

Paul
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/36HJRYU6R6NEDZY7QSKS3DEKRY6OLTI4/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Inada Naoki
On Tue, Feb 9, 2021 at 9:31 PM Paul Moore  wrote:
>
> Personally, I'm not at all keen on the idea of making users always
> specify encoding in the first place, even if it's "just for the
> transition".

I agree with you. But as I wrote in the PEP, omitting the encoding has
already caused a lot of trouble.
Windows users cannot just `pip install somepkg` because some library
authors write `long_description=open("README.md").read()` in setup.py.
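For the setup.py case above, the fix is simply to name the encoding. A minimal sketch; `read_long_description()` is a hypothetical helper, and the README is a throwaway temporary file.

```python
import os
import tempfile


def read_long_description(path):
    # Explicit UTF-8: the result no longer depends on the user's locale,
    # so a Windows machine with a legacy code page decodes it the same way.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Usage: a README containing a non-ASCII character.
fd, readme = tempfile.mkstemp(suffix=".md")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("# Café\n")
long_description = read_long_description(readme)
os.remove(readme)
```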

I am trying to fix this situation with two parallel approaches:

* (This PEP) Provide a tool for finding this type of bug, and
recommend `encoding="utf-8"` for cross-platform library authors.
* (Another thread) Make UTF-8 mode more usable for Windows users,
especially students.


> If we want to switch the default encoding from the locale encoding to
> UTF-8, we should find a way to do that which *doesn't* mean that
> there's a "transitional" state where using the default is considered
> bad practice. That helps no-one, and just adds confusion, which will
> last far longer than that one release (there will be people
> encountering StackOverflow questions on the topic long after the
> default has changed).
>
> Maybe we just have to accept that we can't work out what people are
> intending, and just state in advance in the documentation that the
> default will change, then it's documented as an upcoming breaking
> change that people can address (if they read the release notes, but we
> seem to be assuming they'll spot a warning, so why not assume they
> read the release notes, too?).
>

This PEP doesn't cover how to change the default encoding, so this is
slightly off topic.
I have two ideas for changing the default encoding:

(a) Regular deprecation period: emit EncodingWarning by default
(3.14 or later), and change the default encoding later (3.17 or
later).
(b) Enable UTF-8 mode by default on Windows. Users can disable UTF-8 mode
for backward compatibility.

Steve Dower was very strongly against (b). He suggested emitting
DeprecationWarning instead.
https://discuss.python.org/t/pep-597-enable-utf-8-mode-by-default-on-windows/3122/16

On the other hand, some core devs don't like emitting a warning for every
omitted `encoding` option.

So I don't have a strong opinion about which approach is better. I want
to see how EncodingWarning and UTF-8 mode are adopted.

I want to implement both EncodingWarning and a per-site UTF-8 mode
setting in Python 3.10.
In 5+ years, we will see which approach users have adopted.

* If EncodingWarning is widely adopted by many developers, we can
discuss approach (a).
* If UTF-8 mode becomes the best practice for Windows users, we can
discuss approach (b).
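To see what approach (b) would change on a given machine, one can compare a child interpreter with UTF-8 mode forced on (`-X utf8`, available since Python 3.7) and forced off (`-X utf8=0`). This is a sketch; `probe()` is a hypothetical helper, and the result with UTF-8 mode off depends on the machine's locale or ANSI code page.

```python
import subprocess
import sys

# Child program: print the UTF-8 mode flag and the normalized default encoding.
CHECK = (
    "import codecs, locale, sys; "
    "print(sys.flags.utf8_mode, "
    "codecs.lookup(locale.getpreferredencoding(False)).name)"
)


def probe(*xoptions):
    """Run a fresh isolated interpreter with -X options; return (flag, encoding)."""
    args = [sys.executable, "-I"]  # -I: ignore PYTHON* variables such as PYTHONUTF8
    for opt in xoptions:
        args += ["-X", opt]
    out = subprocess.run(
        args + ["-c", CHECK], capture_output=True, text=True, check=True
    ).stdout
    flag, encoding = out.split()
    return int(flag), encoding


utf8_on = probe("utf8")     # UTF-8 mode: the default encoding is UTF-8 everywhere
utf8_off = probe("utf8=0")  # legacy behaviour: follows the locale / ANSI code page
```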

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DY4OPCBKHHRJZMXEJ43MXPNXJ4EUS6MM/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Anders Munch
Victor Stinner [mailto:vstin...@python.org] wrote:
> The warning can explicitly suggest using encoding="utf8"; it should work in 
> almost all cases.

The warning should also explain how to get backwards-compatible behaviour, i.e. 
suggest encoding="locale". 

Inada Naoki   wrote:
> This warning is opt-in warning like BytesWarning.

What use is a warning that no-one sees?  When the default is switched to 
encoding="utf8", it will break software, and people need to be warned of that.
UnicodeDecodeErrors will abound when files that used to be read in a 
single-byte encoding fail to decode as utf-8. All it takes is a single é.
If the default encoding is ever to change, there's no way around a noisy 
warning.
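The claim that a single é is enough can be checked directly. A small sketch; cp1252 stands in for whatever single-byte legacy encoding originally wrote the file, and `try_decode()` is a hypothetical helper.

```python
legacy_bytes = "café".encode("cp1252")  # b'caf\xe9': é is the single byte 0xE9


def try_decode(raw, encoding):
    """Return the decoded text, or the UnicodeDecodeError instance on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


as_utf8 = try_decode(legacy_bytes, "utf-8")     # 0xE9 starts an incomplete sequence
as_cp1252 = try_decode(legacy_bytes, "cp1252")  # round-trips to "café"
```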

How about swapping around "locale" and None?  That is, make "locale" the new 
default that emits a warning, and encoding=None emits no warning.  That has the 
advantage that old code can be updated to say encoding=None, and then it will 
work on both old and new Pythons without warning.

regards, Anders
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GZOHZAXKJDRJPF32U2ET5E32SOYXHR5E/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Paul Moore
On Tue, 9 Feb 2021 at 16:28, Inada Naoki  wrote:
>
> On Wed, Feb 10, 2021 at 1:19 AM Paul Moore  wrote:
> >
> > But people who currently don't specify the encoding, and *don't* have
> > any issue (because the system locale is correct) will be getting told
> > to introduce a bug into their code, if they follow that advice :-(
> >
>
> This warning is opt-in warning like BytesWarning.
>
> It will be a good tool to find potential problems for people who know
> what the problem is.
> But it is not recommended for users who don't understand what the problem is.

Ah, OK. I missed that point in the long email chain. Sorry.
Paul
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UUX2IR655Y6JOCOQBPHHQTPUUTVFA5XA/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Inada Naoki
On Wed, Feb 10, 2021 at 1:19 AM Paul Moore  wrote:
>
> But people who currently don't specify the encoding, and *don't* have
> any issue (because the system locale is correct) will be getting told
> to introduce a bug into their code, if they follow that advice :-(
>

This warning is opt-in warning like BytesWarning.

It will be a good tool to find potential problems for people who know
what the problem is.
But it is not recommended for users who don't understand what the problem is.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SJKTVKW3DQCPRFRTGOUL73EI6BOGWDFF/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Paul Moore
But people who currently don't specify the encoding, and *don't* have
any issue (because the system locale is correct) will be getting told
to introduce a bug into their code, if they follow that advice :-(

Paul

On Tue, 9 Feb 2021 at 16:03, Victor Stinner  wrote:
>
> On Tue, Feb 9, 2021 at 1:31 PM Paul Moore  wrote:
> > If we can't provide a good recommendation
> > to the user on what to do, we shouldn't be warning them that what they
> > are currently doing is wrong.
>
> The warning can explicitly suggest using encoding="utf8"; it should
> work in almost all cases.
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HSV6QSJKAUFS7LWZVEZUWTUD5A6DCFFL/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Victor Stinner
On Tue, Feb 9, 2021 at 1:31 PM Paul Moore  wrote:
> If we can't provide a good recommendation
> to the user on what to do, we shouldn't be warning them that what they
> are currently doing is wrong.

The warning can explicitly suggest using encoding="utf8"; it should
work in almost all cases.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NMHJAPSNE7XI65DKV6EB55HGQW5XRAA6/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Paul Moore
On Tue, 9 Feb 2021 at 11:55, Inada Naoki  wrote:

> I think the only thing we can do is document the option like this:
>
> """
> EncodingWarning is a warning to find a missing encoding="utf-8" option. It
> is a common pitfall that many Windows user
> Don't try to fix them if you need to use a locale-specific encoding.
> """

I'm a very strong -1 on having programs generate warnings that the
user isn't supposed to fix. If we can't provide a good recommendation
to the user on what to do, we shouldn't be warning them that what they
are currently doing is wrong. I've seen far too many examples of
people thinking "well, users can ignore the warning, it's not shown by
default" and then users' code being broken because of a situation we
didn't think about (most recently, the Python test suite, which runs
the venv tests with warnings converted to errors, which broke on a pip
release that contains a deprecation warning from packaging).

IMO, if we issue a warning, we *must* be able to advise the user how
to fix it. Otherwise we shouldn't be assuming we know what's correct
better than the user.

Personally, I'm not at all keen on the idea of making users always
specify encoding in the first place, even if it's "just for the
transition". There are far too many people in my experience who
wouldn't have a clue what to do when faced with that decision. And the
people (again in my experience) who don't know how to make that choice
are *precisely* the people for whom the system-defined default is what
they want. Certainly, if they are getting stuff off the internet, they
will more often get UTF-8, but I tend to find that people with limited
understanding of these issues are much more comfortable with the idea
that "stuff off the internet needs weird settings like this UTF-8
thing whatever it is", than they are with the idea that they have to
tell Python how to read that text file they just got from their boss,
who's still got Windows 7 on his PC...

If we want to switch the default encoding from the locale encoding to
UTF-8, we should find a way to do that which *doesn't* mean that
there's a "transitional" state where using the default is considered
bad practice. That helps no-one, and just adds confusion, which will
last far longer than that one release (there will be people
encountering StackOverflow questions on the topic long after the
default has changed).

Maybe we just have to accept that we can't work out what people are
intending, and just state in advance in the documentation that the
default will change, then it's documented as an upcoming breaking
change that people can address (if they read the release notes, but we
seem to be assuming they'll spot a warning, so why not assume they
read the release notes, too?).

Paul

PS I've hesitated about saying this before, as I'm very aware that
being from the UK, any problems I have with encodings are relatively
minor, so I want to let the people with real problems have their say.
But when we're talking about telling users not to fix warnings, I feel
the need to speak up.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5W7YFNY7BLCS25ZICWMH57XH5REITI34/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Inada Naoki
On Tue, Feb 9, 2021 at 7:23 PM Victor Stinner  wrote:
>
> I recall that something like 1 year ago, I basically tried to
> implement something like your PEP, to see if the stdlib calls open()
> without specifying an encoding. There were so many warnings, that the
> output was barely readable.
>
> The warning would only be useful if there is a way to modify the code
> to make the warning quiet (fix the issue) without losing support for
> Python 3.9 and older.
>
> I understand that open(filename) must be replaced with open(filename,
> encoding=("locale" if sys.version_info >= (3, 10) else None)) to make
> it backward and forward compatible without emitting an
> EncodingWarning.

I think most of them should be replaced with encoding="ascii" or encoding="utf-8".

And encoding=locale.getpreferredencoding(False) is a backward/forward
compatible way.
There is a very small difference between encoding=None and
encoding=locale.getpreferredencoding(False).
But it is not a problem for most use cases.
Only applications that use PYTHONLEGACYWINDOWSSTDIO and open() for
console I/O are affected by the difference between them.
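A sketch of the backward/forward compatible spelling described above: ask for the locale encoding explicitly, so open() is never called with the encoding omitted. `open_locale()` is a hypothetical helper name.

```python
import locale
import os
import tempfile


def open_locale(path, mode="r"):
    # Explicitly request the locale encoding. This works on every Python 3
    # version, unlike encoding="locale", which needs Python 3.10+.
    return open(path, mode, encoding=locale.getpreferredencoding(False))


# Usage: round-trip ASCII text through the locale encoding.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open_locale(path, "w") as f:
    f.write("plain ASCII")
with open_locale(path) as f:
    round_trip = f.read()
os.remove(path)
```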


> One issue is that some people may blindly copy/paste
> this code pattern without thinking if "locale" is the proper encoding.
>

Isn't it the same if the code pattern becomes `encoding=getattr(io,
"LOCALE_ENCODING", None)`,
or `encoding=locale.getpreferredencoding(False)`?

I think the only thing we can do is document the option like this:

"""
EncodingWarning is a warning to find a missing encoding="utf-8" option. It
is a common pitfall that many Windows user
Don't try to fix them if you need to use a locale-specific encoding.
"""

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YLAC2WJZ2TX7I3I6TSWA4GWPP5NNETUH/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-09 Thread Victor Stinner
On Sat, Feb 6, 2021 at 3:26 PM Inada Naoki  wrote:
> I changed my mind. Since there is no plan to change the default
> encoding for now,
> there is no need to encourage `encoding="locale"` soon.
>
> Until users can drop Python 3.9 support, they can use EncodingWarning
> only for finding missing `encoding="utf-8"` or `encoding="ascii"`.
>
> I will remove the io.LOCALE_ENCODING.

I recall that something like 1 year ago, I basically tried to
implement something like your PEP, to see if the stdlib calls open()
without specifying an encoding. There were so many warnings, that the
output was barely readable.

The warning would only be useful if there is a way to modify the code
to make the warning quiet (fix the issue) without losing support for
Python 3.9 and older.

I understand that open(filename) must be replaced with open(filename,
encoding=("locale" if sys.version_info >= (3, 10) else None)) to make
it backward and forward compatible without emitting an
EncodingWarning. One issue is that some people may blindly copy/paste
this code pattern without thinking about whether "locale" is the proper encoding.
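The version check can be written once instead of being copied to every call site, which also keeps the decision to use the locale encoding in one reviewable place. A sketch; `DEFAULT_LOCALE_ENCODING` and `open_text()` are hypothetical names.

```python
import os
import sys
import tempfile

# Python 3.10+ accepts encoding="locale", which names the locale encoding
# explicitly and therefore does not trigger EncodingWarning. Older versions
# reject "locale" with LookupError, so fall back to None (the old default).
DEFAULT_LOCALE_ENCODING = "locale" if sys.version_info >= (3, 10) else None


def open_text(path, mode="r"):
    """Hypothetical helper: open a text file with the locale-based default."""
    return open(path, mode, encoding=DEFAULT_LOCALE_ENCODING)


# Usage: round-trip ASCII text.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open_text(path, "w") as f:
    f.write("plain ASCII")
with open_text(path) as f:
    round_trip = f.read()
os.remove(path)
```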

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6CRWIH6AJ43H2IRQZDJFUSSYUFPDSY3L/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-06 Thread Inada Naoki
I sent a pull request: https://github.com/python/peps/pull/1799

* Add Backward/Forward Compatibility section
* Add How to teach this section
* Remove io.LOCALE_ENCODING constant


-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TRIGYFRJSVSUWFQDYIUZI64BB4J323UN/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-06 Thread Inada Naoki
On Tue, Feb 2, 2021 at 1:40 PM Inada Naoki  wrote:
>
> On Tue, Feb 2, 2021 at 12:23 AM Victor Stinner  wrote:
> >
> >
> > > Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> > > be used to avoid confusing ``LookupError: unknown encoding: locale``
> > > error when the code is run in old Python accidentally.
> >
> > I'm not sure that it is useful. I like a simple "locale" literal
> > string. If there is a constant in io, people may start to think that
> > it's specific and will add "import io" just to get the string
> > "locale".
> >
> > I don't think that we should care too much about the error message
> > raised by old Python versions.
> >
>
> This constant is not only for replacing the "locale" literal. As in the example code
> in the PEP, it can be used to test whether TextIOWrapper supports
> `encoding="locale"`.
>
> `open(fn, encoding=getattr(io, "LOCALE_ENCODING", None))` works both
> for Python ~3.9 and Python 3.10~.
>

I changed my mind. Since there is no plan to change the default
encoding for now,
there is no need to encourage `encoding="locale"` soon.

Until users can drop Python 3.9 support, they can use EncodingWarning
only for finding missing `encoding="utf-8"` or `encoding="ascii"`.

I will remove the io.LOCALE_ENCODING.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4SRSQQXRLQSXG4RLZGXHFEFTTBVDKPWK/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-04 Thread Inada Naoki
On Tue, Feb 2, 2021 at 8:16 PM Victor Stinner  wrote:
>
> > > I understand that encoding=locale.get_locale_encoding() would be
> > > different from encoding="locale":
> > > encoding=locale.get_locale_encoding() doesn't call
> > > os.device_encoding(), right?
> > >
> >
> > Yes.
>
> Would it be useful to add an io.get_locale_encoding(fd)->str (maybe
> "get_default_encoding"?) function which gives the chosen encoding from
> a file descriptor, similar to open(fd, encoding="locale").encoding?
> The os.device_encoding() call is not obvious.
>

I don't think it's so useful. encoding=None is 99% the same as
encoding=locale.getpreferredencoding(False).

On Unix, os.device_encoding() just returns the locale encoding.
On Windows, os.device_encoding() is very unlikely to be used: open() uses
WindowsConsoleIO for the console unless PYTHONLEGACYWINDOWSSTDIO is set,
and its encoding is UTF-8.

And that's why I removed the detailed behavior from the PEP. It is too
detailed and almost unrelated to EncodingWarning.
I wrote a simple comment in this section instead.
https://www.python.org/dev/peps/pep-0597/#locale-is-not-a-codec-alias

>
> > > > Opt-in warning
> > > > ---
> > > >
> > > > Although ``DeprecationWarning`` is suppressed by default, emitting
> > > > ``DeprecationWarning`` always when ``encoding`` option is omitted
> > > > would be too noisy.
> > >
> > > The PEP is not very clear. Does "-X warn_encoding" only emit the
> > > warning, or does it also display it by default? Does it add a warning
> > > filter for EncodingWarning?
> > >
> >
> > This section is not the spec. This section is the rationale for adding
> > EncodingWarning instead of using DeprecationWarning.
> >
> > As the spec says, EncodingWarning is a subclass of Warning. So it is
> > displayed by default. But it is not emitted by default.
> >
> > When -X encoding_warning (or -X warn_default_encoding) is used, the
> > warning is emitted and shown unless the user suppresses warnings.
>
> I understand that EncodingWarning is always displayed by default
> (default warning filters don't ignore it, whereas DeprecationWarnings
> are ignored by default), but no warning is emitted by default. Ok,
> that makes sense. Maybe try to say it explicitly in the PEP.
>
>
> > This PEP doesn't have "backward compatibility" section because the PEP
> > doesn't break any backward compatibility.
>
> IMO it's a good thing to always have the section, just to say that you
> took time to think about backward compatibility ;-) The section can be
> empty, like just say "there is no incompatible change" ;-)
>
>
> > And if developers want to support Python ~3.9 and use -X
> > warn_default_encoding on 3.10, they need to write
> > `encoding=getattr(io, "LOCALE_ENCODING", None)`, as written in the
> > spec.
>
> Maybe repeat it in the Backward Compatibility section.
>
> It's important to provide a way to prevent the warning without losing
> the support for old Python versions.
>

Will do.

>
> > > The main question is if it's possible to use encoding="locale" on
> > > Python 3.6-3.9 (maybe using some ugly hacks).
> >
> > No.
>
> Hum. To write code compatible with Python 3.9, I understand that
> encoding=None is the closest to encoding="locale".
>
> And I understand that encoding=getattr(io, "LOCALE_ENCODING", None) is
> backward and forward compatible ;-)
>
> Well, encoding=None will hopefully remain accepted with your PEP
> anyway for lazy developers ;-)
>

Yes. I don't think this warning will be enabled by default in the near future.
So developers can just use the option to find bugs caused by a missing `encoding="utf-8"`.


>
> > Oh, I'm sorry. I want to make it in 3.10.
>
> Since it doesn't change anything by default, the warning is only
> displayed when you opt-in for it, IMO Python 3.10 target is
> reasonable.
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FZ567UQIEKO5IIVSQPUFCSZJOZBMYD4D/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-02 Thread Victor Stinner
On Tue, Feb 2, 2021 at 5:40 AM Inada Naoki  wrote:
> > In Python 3.10, I added a _locale._get_locale_encoding() function which
> > returns exactly the encoding used by open() when no encoding is
> > specified (encoding=None) and when os.device_encoding(fd) returns
> > None. See _Py_GetLocaleEncoding() for the C implementation
> > (Python/fileutils.c).
> >
> > Maybe we should add a public locale.get_locale_encoding() function? On
> > Unix, this function uses nl_langinfo(CODESET) *without* setting
> > LC_CTYPE locale to the user preferred locale.
> >
>
> I cannot imagine any use case. Isn't it just confusing?

It's the same as locale.getpreferredencoding(False) but with a more
explicit name, no argument and a *sane default behavior* (it doesn't change
the LC_CTYPE locale temporarily).

The use case is to pass text to the OS (or get text from the OS) when
you cannot pass text directly, but must encode it (or decode it)
manually. Not all use cases involve files ;-)
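A sketch of that use case: encoding text manually with the locale encoding, without temporarily switching LC_CTYPE. It assumes `locale.getencoding()`, which Python 3.11 later added and which, unlike `getpreferredencoding(False)`, ignores UTF-8 mode; on older versions it falls back to `locale.getpreferredencoding(False)`, whose `do_setlocale=False` argument avoids the temporary LC_CTYPE change. `to_os_bytes()` is a hypothetical name.

```python
import locale


def to_os_bytes(text):
    """Encode text with the locale encoding, e.g. for a bytes-only OS API."""
    getencoding = getattr(locale, "getencoding", None)  # Python 3.11+
    if getencoding is not None:
        encoding = getencoding()  # never touches LC_CTYPE; ignores UTF-8 mode
    else:
        # do_setlocale=False: don't temporarily change LC_CTYPE, which would
        # affect all threads. Note that this variant honours UTF-8 mode.
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding)


query = to_os_bytes("q=python")  # ASCII-only text encodes the same in common locale encodings
```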

Examples of locale.getpreferredencoding() usage:

* XML ElementTree uses locale.getpreferredencoding() when
encoding="unicode" is used
* Deprecated gettext functions use it to encode to bytes
* the cgi module uses it to encode the URL query string for the CGI
stdin (GET and HEAD methods)

I dislike getpreferredencoding() because by default it temporarily
changes the LC_CTYPE locale, which affects all threads, and this is
bad.

Well, it doesn't have to be part of the PEP ;-)

> > I understand that encoding=locale.get_locale_encoding() would be
> > different from encoding="locale":
> > encoding=locale.get_locale_encoding() doesn't call
> > os.device_encoding(), right?
> >
>
> Yes.

Would it be useful to add an io.get_locale_encoding(fd)->str (maybe
"get_default_encoding"?) function which gives the chosen encoding from
a file descriptor, similar to open(fd, encoding="locale").encoding?
The os.device_encoding() call is not obvious.


> > Maybe the PEP should also explain (in a "How to teach this" section?)
> > when encoding="locale" is better than a specific encoding, like
> > encoding="utf-8" or encoding="cp1252". In my experience, it's mostly
> > for interoperability with other applications which also use the
> > current locale encoding.
>
> This option is for experts who are publishing cross-platform
> libraries, frameworks, etc.
>
> For students, I am suggesting another idea that makes UTF-8 mode more 
> accessible.

Maybe just say that in "How to teach this" section in the PEP?

In case of doubt, pass encoding="utf-8". Only use encoding="locale" if
you understand that the encoding changes depending on the platform and
the user locale. The common issue with encoding="locale" is that files
written with it should not be exchanged between two computers.
encoding="locale" is good for files which remain local. It's also good
for interoperability with other applications which use the locale
encoding, and with the terminal.
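A small sketch of the "in case of doubt, pass encoding="utf-8"" advice: a file written with an explicit UTF-8 encoding has the same bytes on disk on every machine, which is what makes it safe to exchange. `newline=""` is added only so that the byte comparison is also platform-independent (no `\r\n` translation on Windows).

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# With an explicit encoding, the bytes on disk do not depend on the platform
# or the user's locale, so the file can be shared between machines.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("naïve café\n")
with open(path, "rb") as f:
    on_disk = f.read()
with open(path, encoding="utf-8", newline="") as f:
    text = f.read()
os.remove(path)
```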


> > > Opt-in warning
> > > ---
> > >
> > > Although ``DeprecationWarning`` is suppressed by default, emitting
> > > ``DeprecationWarning`` always when ``encoding`` option is omitted
> > > would be too noisy.
> >
> > The PEP is not very clear. Does "-X warn_encoding" only emit the
> > warning, or does it also display it by default? Does it add a warning
> > filter for EncodingWarning?
> >
>
> This section is not the spec. This section is the rationale for adding
> EncodingWarning instead of using DeprecationWarning.
>
> As the spec says, EncodingWarning is a subclass of Warning. So it is
> displayed by default. But it is not emitted by default.
>
> When -X encoding_warning (or -X warn_default_encoding) is used, the
> warning is emitted and shown unless the user suppresses warnings.

I understand that EncodingWarning is always displayed by default
(default warning filters don't ignore it, whereas DeprecationWarnings
are ignored by default), but no warning is emitted by default. Ok,
that makes sense. Maybe try to say it explicitly in the PEP.
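The distinction (not emitted by default, but displayed once emitted) can be observed with a child interpreter. A sketch assuming Python 3.10+, where the option ended up spelled `-X warn_default_encoding` (the drafts in this thread used `-X warn_encoding`); `run_child()` is a hypothetical helper.

```python
import os
import subprocess
import sys
import tempfile

# Child program: open a text file without passing an encoding.
CHILD = "import sys; open(sys.argv[1]).close()"


def run_child(*extra_args):
    """Run CHILD in an isolated interpreter (-I ignores PYTHON* env vars); return stderr."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        result = subprocess.run(
            [sys.executable, "-I", *extra_args, "-c", CHILD, path],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.remove(path)
    return result.stderr


default_stderr = run_child()                               # not emitted by default
opt_in_stderr = run_child("-X", "warn_default_encoding")   # emitted, and shown by default filters
```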


> This PEP doesn't have "backward compatibility" section because the PEP
> doesn't break any backward compatibility.

IMO it's a good thing to always have the section, just to say that you
took time to think about backward compatibility ;-) The section can be
empty, like just say "there is no incompatible change" ;-)


> And if developers want to support Python ~3.9 and use -X
> warn_default_encoding on 3.10, they need to write
> `encoding=getattr(io, "LOCALE_ENCODING", None)`, as written in the
> spec.

Maybe repeat it in the Backward Compatibility section.

It's important to provide a way to prevent the warning without losing
the support for old Python versions.


> > The main question is if it's possible to use encoding="locale" on
> > Python 3.6-3.9 (maybe using some ugly hacks).
>
> No.

Hum. To write code compatible with Python 3.9, I understand that
encoding=None is the closest to encoding="locale".

And I understand that encoding=getattr(io, "LOCALE_ENCODING", None) is
backward and forward 

[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-01 Thread Inada Naoki
On Tue, Feb 2, 2021 at 12:23 AM Victor Stinner  wrote:
>
> Hi Inada-san,
>
> I followed the discussions on your different PEPs and overall I like
> your latest PEP :-) I have some minor remarks.
>
> On Mon, Feb 1, 2021 at 6:55 AM Inada Naoki  wrote:
> > The warning is disabled by default. New ``-X warn_encoding``
> > command-line option and ``PYTHONWARNENCODING`` environment variable
> > are used to enable the warnings.
>
> Maybe "warn implicit encoding" or "warn omit encoding" (not sure if
> it makes sense written like that in English ;-)) would be more
> explicit.
>

Yes, it's explicit. So I used `PYTHONWARNDEFAULTENCODING` at first.
But I feel it's unreadable. That's why I shortened the option name.

I'll wait for more feedback about naming.

>
> > Options to enable the warning
> > --
> >
> > ``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
> > environment variable are added. They are used to enable the
> > ``EncodingWarning``.
> >
> > ``sys.flags.encoding_warning`` is also added. The flag indicates whether
> > ``EncodingWarning`` is enabled.
>
> Nitpick: I would prefer using the same name for the -X option and the
> sys.flags attribute (ex: sys.flags.warn_encoding).
>

OK, I will change the flag name to match the option name.

>
> > ``encoding="locale"`` option
> > 
> >
> > ``io.TextIOWrapper`` accepts the ``encoding="locale"`` option. It means
> > the same as the current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
> > emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
>
> Can you please define if os.device_encoding(fd) is called if
> encoding="locale" is used? It seems so, but it's not obvious from the
> PEP.
>

OK.

>
> In Python 3.10, I added a _locale._get_locale_encoding() function which
> returns exactly the encoding used by open() when no encoding is
> specified (encoding=None) and when os.device_encoding(fd) returns
> None. See _Py_GetLocaleEncoding() for the C implementation
> (Python/fileutils.c).
>
> Maybe we should add a public locale.get_locale_encoding() function? On
> Unix, this function uses nl_langinfo(CODESET) *without* setting
> LC_CTYPE locale to the user preferred locale.
>

I cannot imagine any use case for it. Wouldn't it just be confusing?


> I understand that encoding=locale.get_locale_encoding() would be
> different from encoding="locale":
> encoding=locale.get_locale_encoding() doesn't call
> os.device_encoding(), right?
>

Yes.
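
So encoding="locale" consults os.device_encoding() first, while a plain locale-encoding lookup does not. The difference can be approximated in pure Python; `approximate_locale_choice()` is a hypothetical helper, not CPython's actual code path, and it uses the public `locale.getpreferredencoding(False)` in place of the private `_locale._get_locale_encoding()`:

```python
import locale
import os


def approximate_locale_choice(fd=None):
    """Hypothetical approximation of how encoding="locale" is resolved.

    Not CPython's real code path: it only mirrors the order discussed here.
    """
    if fd is not None:
        # For terminals (and Windows consoles) this may report the device's
        # encoding; for regular files and pipes it typically returns None.
        device = os.device_encoding(fd)
        if device:
            return device
    # Fallback: the locale encoding, without changing the LC_CTYPE locale.
    return locale.getpreferredencoding(False)
```

Calling it without an `fd` corresponds to the encoding=locale.get_locale_encoding() case, which skips the device check entirely.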

>
> Maybe the PEP should also explain (in a "How to teach this" section?)
> when encoding="locale" is better than a specific encoding, like
> encoding="utf-8" or encoding="cp1252". In my experience, it's mostly
> for interoperability with other applications that also use the
> current locale encoding.
>

This option is for experts who are publishing cross-platform
libraries, frameworks, etc.

For students, I am proposing a separate idea that makes UTF-8 mode more accessible.
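
The distinction Victor asks about can be sketched as two kinds of files. Files the program owns get a fixed encoding, while files exchanged with other locale-aware tools follow the locale on purpose. `save_settings()` and `read_legacy_report()` are hypothetical helpers, and the explicit `locale.getpreferredencoding(False)` call stands in for `encoding="locale"` so the sketch also runs on Python 3.9 and earlier (it skips the `os.device_encoding()` step, which mainly matters for terminals and consoles):

```python
import json
import locale


def save_settings(path, data):
    # A file this program owns: pin UTF-8 so it reads back identically on every platform.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def read_legacy_report(path):
    # A file written by another tool that follows the user's locale:
    # use the locale encoding deliberately and explicitly.
    with open(path, encoding=locale.getpreferredencoding(False)) as f:
        return f.read()
```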

>
> > Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> > be used to avoid confusing ``LookupError: unknown encoding: locale``
> > error when the code is run in old Python accidentally.
>
> I'm not sure that it is useful. I like a simple "locale" literal
> string. If there is a constant in io, people may start to think that
> it's specific and will add "import io" just to get the string
> "locale".
>
> I don't think that we should care too much about the error message
> raised by old Python versions.
>

This constant is not only a replacement for the "locale" literal. As in the
example code in the PEP, it can be used to test whether TextIOWrapper supports
`encoding="locale"`.

`open(fn, encoding=getattr(io, "LOCALE_ENCODING", None))` works both
on Python 3.9 and earlier and on Python 3.10 and later.
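
The same compatibility can also be sketched without relying on the proposed constant, by probing `io.TextIOWrapper` once at import time. `supports_locale_encoding()` and `LOCALE_OR_NONE` are hypothetical names, and the sketch assumes that an interpreter without "locale" support raises `LookupError`, as the quoted PEP text says:

```python
import io


def supports_locale_encoding():
    """Return True if this Python's TextIOWrapper accepts encoding="locale"."""
    try:
        # BytesIO has no file descriptor, so no device encoding is consulted.
        io.TextIOWrapper(io.BytesIO(), encoding="locale")
    except LookupError:
        # Older versions treat "locale" as an unknown codec name.
        return False
    return True


# "locale" where supported, else None: both mean the locale encoding,
# but only the explicit "locale" silences the opt-in warning.
LOCALE_OR_NONE = "locale" if supports_locale_encoding() else None
```

Callers would then write `open(fn, encoding=LOCALE_OR_NONE)`.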


>
>
> > Opt-in warning
> > ---
> >
> > Although ``DeprecationWarning`` is suppressed by default, emitting
> > ``DeprecationWarning`` always when ``encoding`` option is omitted
> > would be too noisy.
>
> The PEP is not very clear. Does "-X warn_encoding" only emit the
> warning, or does it also display it by default? Does it add a warning
> filter for EncodingWarning?
>

This section is not the spec. This section is the rationale for adding
EncodingWarning instead of using DeprecationWarning.

As the spec says, EncodingWarning is a subclass of Warning, so it is
displayed by default. But it is not emitted by default.

When -X encoding_warning (or -X warn_default_encoding) is used, the
warning is emitted and shown unless the user suppresses warnings.
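
To make the opt-in warning fail loudly in a test run, a warnings filter can escalate it to an error. A minimal sketch, assuming the warning class is exposed as the builtin `EncodingWarning` (the name this PEP proposes); `escalate_encoding_warnings()` is a hypothetical helper, and the filter only matters once the interpreter runs with the opt-in option, since otherwise nothing is emitted:

```python
import builtins
import warnings


def escalate_encoding_warnings():
    """Turn EncodingWarning into an error if this Python defines it.

    Returns True when a filter was installed.
    """
    category = getattr(builtins, "EncodingWarning", None)
    if category is None:
        return False  # Python 3.9 or older: nothing to filter
    warnings.simplefilter("error", category)
    return True
```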

>
> The PEP has no "Backward compatibility" section. Is it possible to
> monkey-patch Python to implement this PEP (maybe only partially) on
> old Python versions? I'm asking to prepare existing projects for
> future EncodingWarning.
>

This PEP doesn't have a "Backward compatibility" section because it
doesn't break backward compatibility.
Unless the option is enabled, the PEP emits no warnings, just like the
`-b` option and BytesWarning.

And if developers want to support Python ~3.9 

[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-01 Thread Victor Stinner
Hi Inada-san,

I followed the discussions on your different PEP and I like overall
your latest PEP :-) I have some minor remarks.

On Mon, Feb 1, 2021 at 6:55 AM Inada Naoki  wrote:
> The warning is disabled by default. New ``-X warn_encoding``
> command-line option and ``PYTHONWARNENCODING`` environment variable
> are used to enable the warnings.

Maybe "warn implicit encoding" or "warn omit encoding" (not sure if
it makes sense written like that in English ;-)) would be more
explicit.


> Options to enable the warning
> --
>
> ``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
> environment variable are added. They are used to enable the
> ``EncodingWarning``.
>
> ``sys.flags.encoding_warning`` is also added. The flag indicates whether
> ``EncodingWarning`` is enabled.

Nitpick: I would prefer using the same name for the -X option and the
sys.flags attribute (ex: sys.flags.warn_encoding).


> ``encoding="locale"`` option
> 
>
> ``io.TextIOWrapper`` accepts an ``encoding="locale"`` option. It means
> the same as the current ``encoding=None``, but ``io.TextIOWrapper`` doesn't
> emit ``EncodingWarning`` when ``encoding="locale"`` is specified.

Can you please specify whether os.device_encoding(fd) is called when
encoding="locale" is used? It seems so, but it's not obvious from the
PEP.


In Python 3.10, I added the _locale._get_locale_encoding() function, which
returns exactly the encoding used by open() when no encoding is
specified (encoding=None) and os.device_encoding(fd) returns
None. See _Py_GetLocaleEncoding() for the C implementation
(Python/fileutils.c).

Maybe we should add a public locale.get_locale_encoding() function? On
Unix, this function uses nl_langinfo(CODESET) *without* setting
LC_CTYPE locale to the user preferred locale.

I understand that encoding=locale.get_locale_encoding() would be
different from encoding="locale":
encoding=locale.get_locale_encoding() doesn't call
os.device_encoding(), right?


Maybe the PEP should also explain (in a "How to teach this" section?)
when encoding="locale" is better than a specific encoding, like
encoding="utf-8" or encoding="cp1252". In my experience, it's mostly
for interoperability with other applications that also use the
current locale encoding.


By the way, I recently rewrote the documentation about the encodings
used by Python:

* https://docs.python.org/dev/glossary.html#term-locale-encoding
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_encoding
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.stdio_encoding
* https://docs.python.org/dev/library/os.html#utf8-mode


> Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> be used to avoid confusing ``LookupError: unknown encoding: locale``
> error when the code is run in old Python accidentally.

I'm not sure that it is useful. I like a simple "locale" literal
string. If there is a constant in io, people may start to think that
it's specific and will add "import io" just to get the string
"locale".

I don't think that we should care too much about the error message
raised by old Python versions.



> Opt-in warning
> ---
>
> Although ``DeprecationWarning`` is suppressed by default, emitting
> ``DeprecationWarning`` always when ``encoding`` option is omitted
> would be too noisy.

The PEP is not very clear. Does "-X warn_encoding" only emit the
warning, or does it also display it by default? Does it add a warning
filter for EncodingWarning?


The PEP has no "Backward compatibility" section. Is it possible to
monkey-patch Python to implement this PEP (maybe only partially) on
old Python versions? I'm asking to prepare existing projects for
future EncodingWarning.

The main question is whether it's possible to use encoding="locale" on
Python 3.6-3.9 (maybe using some ugly hacks). By the way, your PEP has
no target Python version ;-) Do you want to get it in Python 3.10 or
3.11?

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YEYN2ZE4AYWFJJDTQZPJHTCXOPA5MGO5/
Code of Conduct: http://python.org/psf/codeofconduct/