[Python-Dev] Re: PEP 597: Add optional EncodingWarning
To demonstrate how useful this warning is, I ran my reference implementation. When I tried `pip install`, I quickly found these issues:

* https://bugs.python.org/issue43214 (opens a .pth file with the locale encoding)
* https://github.com/pypa/pip/pull/9608 (not a real bug, but opens a JSON file with the locale encoding)

And while creating the PR for pip, I found this issue in tox:

* https://github.com/tox-dev/tox/issues/1908 (opens a TOML file with the locale encoding; may not work on Windows)

Although most developers won't use this option, I and a few other developers can put `export PYTHONWARNENCODING=1` in .bashrc and will find many possible bugs that happen only on Windows, even if we don't use Windows for daily development.

Isn't this option worth having?

--
Inada Naoki
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/K7PVGEHDB3BXLNFZ6UWFJOKCC337UTWO/
Code of Conduct: http://python.org/psf/codeofconduct/
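As a rough illustration of the workflow described in this message, the sketch below runs a child interpreter with the warning enabled and captures what it reports. It assumes the option spelling that shipped in Python 3.10 (`-X warn_default_encoding`, equivalent to setting `PYTHONWARNDEFAULTENCODING=1`) rather than the draft `PYTHONWARNENCODING` name used above. The script content and the helper name `run_with_encoding_warning` are made up for the example.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical script that opens a text file without an explicit encoding.
SCRIPT = textwrap.dedent("""
    import sys
    with open(sys.argv[0]) as f:   # no encoding= given
        f.read()
""")


def run_with_encoding_warning(source):
    """Run `source` in a child interpreter with the opt-in warning enabled.

    Returns the child's stderr, which is where warnings are reported.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        result = subprocess.run(
            [sys.executable, "-X", "warn_default_encoding", path],
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",
        )
    return result.stderr


if __name__ == "__main__":
    # On Python 3.10+ this should report an EncodingWarning for the open() call.
    print(run_with_encoding_warning(SCRIPT))
```

On interpreters older than 3.10 the `-X` option is simply ignored, so the sketch prints nothing useful there.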
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
In the documentation (not sure whether it should be the documentation for "open" or for encoding), include at least a link to instructions on how to (try to) verify that your codebase is using the encoding parameter properly. Those instructions would say something like "Add the following lines to end of Lib\site.py: _origopen=open def open(...): if ... warnings.warn(...) _origopen(...) " -jJ On Fri, Feb 12, 2021 at 6:28 PM Inada Naoki wrote: > > On Sat, Feb 13, 2021 at 4:53 AM Jim J. Jewett wrote: > > > > Offering encoding="locale" (or open.locale or ... ) instead of a long > > function call using False (locale.getpreferredencoding(False)) seems like a > > win for Explicit is Better Than Implicit. It would then be possible to say > > "yeah, locale really is what I meant". > > > > Err... unless the charset determination is so tricky that it ends up just > > adding another not-quite-right near-but-not-exact-synonym. > > > > Adding a new Warning subclass, and maybe a new warning type, and maybe a > > new environment variable, and maybe a new launch flag ... these all seem to > > risk just making things more complicated without sufficient gain. > > > > Would a recipe for site-packages be sufficient, or does this need to run > > too early in the bootstrapping process? > > > > -jJ > > What does "a recipe for site-packages" mean? > > -- > Inada Naoki ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MSK5HN4IGUMBRF4PM7IZYMI7OJGD4KJC/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Sat, Feb 13, 2021 at 4:53 AM Jim J. Jewett wrote: > > Offering encoding="locale" (or open.locale or ... ) instead of a long > function call using False (locale.getpreferredencoding(False)) seems like a > win for Explicit is Better Than Implicit. It would then be possible to say > "yeah, locale really is what I meant". > > Err... unless the charset determination is so tricky that it ends up just > adding another not-quite-right near-but-not-exact-synonym. > > Adding a new Warning subclass, and maybe a new warning type, and maybe a new > environment variable, and maybe a new launch flag ... these all seem to risk > just making things more complicated without sufficient gain. > > Would a recipe for site-packages be sufficient, or does this need to run too > early in the bootstrapping process? > > -jJ What does "a recipe for site-packages" mean? -- Inada Naoki ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4ZOMEDEZ72SU7FDTTF5XUIPOA5SU72R6/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Offering encoding="locale" (or open.locale or ... ) instead of a long function call using False (locale.getpreferredencoding(False)) seems like a win for Explicit is Better Than Implicit. It would then be possible to say "yeah, locale really is what I meant". Err... unless the charset determination is so tricky that it ends up just adding another not-quite-right near-but-not-exact-synonym. Adding a new Warning subclass, and maybe a new warning type, and maybe a new environment variable, and maybe a new launch flag ... these all seem to risk just making things more complicated without sufficient gain. Would a recipe for site-packages be sufficient, or does this need to run too early in the bootstrapping process? -jJ ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VUVVGVCBLVR55ELDLX44SFLBK7ED7WGG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Fri, Feb 12, 2021 at 12:45 PM Jim J. Jewett wrote:
>
> On Thu, Feb 11, 2021 at 7:35 PM Inada Naoki wrote:
>
> > The PEP helps developers living on UTF-8 locale to find missing
> > `encoding="utf-8"` bug.
> > This type of bug is very common, and many Windows users are suffered
> > by the bug when reading JSON, YAML, TOML, Markdown, or any other UTF-8
> > files.
>
> I think this is where we have been talking past each other.
>
> You seem to be assuming that the programmer knows the correct
> encoding, presumably because they (or their program) wrote it.

Not always, but many times.

> You
> then assume that they neglected to mention the encoding out of
> forgetfulness, perhaps because on their system, everything is always
> UTF-8. This clearly does happen, but the people who would make this
> mistake most often -- they probably wouldn't think to test their code
> under a special mode that catches only this. (They might run a linter
> that looked for all sorts of problems, including this.)
>

Some Python experts can write `export PYTHONWARNENCODING=1` in their .bashrc. They can then find such mistakes not only in their own code but also in the libraries they use. Since they are experts, they can understand the warning and report it to the library author correctly.

So this option helps library authors even if they don't use it themselves.

> I instead assume that the programmer really doesn't know the encoding,
> because the file is supplied by the user. (The user may not know
> either, since it is really supplied by some other program, but ...
> neither python nor the programmer knows for sure.)
> In this case, the
> warning is not just a false alarm, but is actively misleading.
>
> -jJ

This option is opt-in. People who don't understand what this warning means should not opt in to it.

Regards,
--
Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KLYUYKLHWCTTK7HOYNPDRPRS6WIQQU7K/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Fri, Feb 12, 2021 at 12:28 PM Jim J. Jewett wrote:
>
> (I apologize if my summaries distort what Inada Naoki
> explained.)
>
> He said that some people use the default None when they really want
> either UTF-8 or ASCII.

Yes. Even Python core developers do. For example: https://bugs.python.org/issue33684

This is just one example. I have seen a lot of code that uses the default encoding to read JSON, YAML, TOML, Markdown, etc.

>
> My concern is that the warning will be a false alarm if they really do
> need whatever locale returns, and that case may still be common. (If
> web browsers had stopped bothering to sniff for other charsets, then
> maybe that situation really was getting rare.)
>

That's one of the reasons why this warning is opt-in, like BytesWarning.

> I asked when encoding=None is actually different from encoding=locale,
> currently spelled encoding=locale.getpreferredencoding(False)
>

I don't understand this sentence. This PEP proposes `encoding="locale"`, which is equivalent to `encoding=None` but doesn't emit EncodingWarning.

The difference between `encoding=None` and `encoding=locale.getpreferredencoding(False)` was discussed earlier in this thread.

> They can be different on Windows console, presumably because the
> environment settings that control locale may differ from the charset
> actually used by the console. Even then, it only differs for open()
> when PYTHONLEGACYWINDOWSSTDIO is set, and for TextIOWrapper() When the
> file is not _WindowsConsoleIO
>
> To me, that sounds narrow enough to be a windows issue, rather than an
> issue with open.

Yes. So if users want to specify the locale-specific encoding and don't want to drop Python 3.9 support, they can use `encoding=locale.getpreferredencoding(False)`. But this PEP doesn't recommend it. Third-party libraries can use `encoding="locale"` after they drop Python 3.9 support.

> Is there some way to write an encoding that sniffs
> for charsets, particularly on windows, and to use that as the default
> instead of assuming that locale will be correct?
>
> -jJ

There is no reliable way, AFAIK.

--
Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LJASRUN5G2PYEUOT7H34LGGBYEHBUB3C/
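The two spellings discussed in this message can be combined in a small version check. This is only a sketch of the transitional approach that the message mentions but does not recommend. The helper name `locale_encoding_argument` is hypothetical, and the 3.10 cut-off assumes `encoding="locale"` is accepted from that release on, as stated in the thread.

```python
import locale
import sys


def locale_encoding_argument():
    """Return a value for `encoding=` that explicitly requests the locale encoding.

    Hypothetical helper: `encoding="locale"` is only accepted on Python 3.10+,
    so older versions fall back to the longer spelling from the thread.
    """
    if sys.version_info >= (3, 10):
        return "locale"
    return locale.getpreferredencoding(False)
```

A caller that genuinely wants the locale encoding would then write `open(path, encoding=locale_encoding_argument())`, which states the intent on every supported version.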
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Thu, Feb 11, 2021 at 7:35 PM Inada Naoki wrote: > The PEP helps developers living on UTF-8 locale to find missing > `encoding="utf-8"` bug. > This type of bug is very common, and many Windows users are suffered > by the bug when reading JSON, YAML, TOML, Markdown, or any other UTF-8 > files. I think this is where we have been talking past each other. You seem to be assuming that the programmer knows the correct encoding, presumably because they (or their program) wrote it. You then assume that they neglected to mention the encoding out of forgetfulness, perhaps because on their system, everything is always UTF-8. This clearly does happen, but the people who would make this mistake most often -- they probably wouldn't think to test their code under a special mode that catches only this. (They might run a linter that looked for all sorts of problems, including this.) I instead assume that the programmer really doesn't know the encoding, because the file is supplied by the user. (The user may not know either, since it is really supplied by some other program, but ... neither python nor the programmer knows for sure.) In this case, the warning is not just a false alarm, but is actively misleading. -jJ ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LHMFNFIOATO46NVOVCUOKFQCRWCZLY7M/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
(I apologize if my summaries distort what Inada Naoki explained.) He said that some people use the default None when they really want either UTF-8 or ASCII. My concern is that the warning will be a false alarm if they really do need whatever locale returns, and that case may still be common. (If web browsers had stopped bothering to sniff for other charsets, then maybe that situation really was getting rare.) I asked when encoding=None is actually different from encoding=locale, currently spelled encoding=locale.getpreferredencoding(False) They can be different on Windows console, presumably because the environment settings that control locale may differ from the charset actually used by the console. Even then, it only differs for open() when PYTHONLEGACYWINDOWSSTDIO is set, and for TextIOWrapper() When the file is not _WindowsConsoleIO To me, that sounds narrow enough to be a windows issue, rather than an issue with open. Is there some way to write an encoding that sniffs for charsets, particularly on windows, and to use that as the default instead of assuming that locale will be correct? -jJ ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SSALFO3RTPX7QZ7B2MOWTZKYCJ5XKWK4/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On 2/11/21, Inada Naoki wrote:
>
> There is little difference between `encoding=None` and
> `encoding=locale.getpreferredencoding(False)`. The difference is:
>
> * When Python is using Windows, and
> * When when the file is console, and
> * (for open()) When PYTHONLEGACYWINDOWSSTDIO is set
> * (for TextIOWrapper()) When the file is not _WindowsConsoleIO
>
> encoding=None uses console codepage

but os.device_encoding() -- i.e. _Py_device_encoding() -- only works for hard-coded file descriptors 0, 1, and 2, instead of detecting a console file. So opening "CON", "CONIN$", or "CONOUT$" has never used the console input or output code page, nor has opening a duped standard I/O fd such as open(os.dup(0)). It would be easy to generalize _Py_device_encoding() to detect console files, but it's new behavior.

Python 3.8+ introduced a bug (issue 42261) in which, even with legacy standard I/O enabled and file descriptors 0-2, the console input and output code pages are ignored. For example:

    C:\>chcp 437
    Active code page: 437

    C:\>set PYTHONLEGACYWINDOWSSTDIO=1
    C:\>py -3.9 -c "import sys; print(sys.stdout.encoding)"
    cp1252

Regarding the last bullet point, io.TextIOWrapper doesn't know anything about io._WindowsConsoleIO. The decision to use UTF-8 is in io.open(). So manually wrapping a _WindowsConsoleIO file with TextIOWrapper uses the locale preferred encoding instead of UTF-8. For example:

    >>> fb = open('conin$', 'rb')
    >>> fb.raw
    <_io._WindowsConsoleIO mode='rb' closefd=True>
    >>> f = io.TextIOWrapper(fb)
    >>> f.encoding
    'cp1252'

I don't know whether it's worth making TextIOWrapper check for _WindowsConsoleIO in order to make it use UTF-8. It's not common to manually wrap a binary-mode file.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QBNH3XGSNBQ7XIJ5E542JIQ5Q5E63MCU/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Thu, Feb 11, 2021 at 4:44 PM Jim J. Jewett wrote:
>
> The PEP helps when the locale is ASCII or C, but that isn't enforced in
> actual files. I am confident that this is a frequent problem for packages
> downloaded from mostly-English sites, including many software repositories.
>

This PEP helps developers who live in a UTF-8 locale find missing `encoding="utf-8"` bugs.
This type of bug is very common, and many Windows users suffer from it when reading JSON, YAML, TOML, Markdown, or any other UTF-8 files.

> It does not seem to be a win when the locale is something incompatible with
> utf-8, such as Latin-1, or whatever is still common in Japan. The
> surrogate-escape mechanism allows a proper round-trip, but python itself will
> stop processing the characters correctly.
>

The surrogate-escape mechanism is not related to this PEP.

> For interactive use, when talking to another program (such as a terminal)
> instead of an already existing file, the backwards compatibility problem
> seems worse.
>

This PEP is 100% backward compatible.

> Changing the default to utf-8 (after a deprecation period showing how to make
> locale an explicit default) may be reasonable, but claiming that it is
> backwards compatible ... I didn't get that impression from the PEP.
>

This PEP doesn't propose changing the default encoding.

*If* we decide to change the default encoding in the future (maybe in 2025 or later) and start emitting DeprecationWarning where the `encoding` option is omitted, this PEP helps by:

* making the `encoding="locale"` option available since Python 3.10, and
* reducing the number of DeprecationWarnings shown, because we can add `encoding="utf-8"` in many places before that time. At the very least, we can fix all EncodingWarnings in the stdlib.

Maybe the "Prepare to change the default encoding to UTF-8" section is misleading. I will try to fix or remove that section.

--
Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JBBRBR6AUTGP2SAVAUJVZJ3GM6FJQEBV/
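The failure mode described in this message, a UTF-8 file read without `encoding="utf-8"` on a Windows machine, can be reproduced on any platform by naming the Windows code page explicitly. The sketch below assumes cp1252 as a typical Western European Windows locale encoding; the file name and sample text are made up.

```python
import os
import tempfile

text = "café"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")

    # Written as UTF-8, as JSON, TOML, or YAML files usually are.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Simulates omitting encoding= on a machine whose locale encoding is cp1252.
    with open(path, encoding="cp1252") as f:
        wrong = f.read()

    # The fix the warning is meant to point developers toward.
    with open(path, encoding="utf-8") as f:
        right = f.read()

print(repr(wrong))  # 'cafÃ©'
print(repr(right))  # 'café'
```

On a UTF-8 locale the version without `encoding=` reads the file correctly, which is why the bug tends to go unnoticed until a Windows user hits it.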
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Fri, Feb 12, 2021 at 5:18 AM Jim J. Jewett wrote:
>
> Inada Naoki wrote:
>
> > Default encoding is used for:
> > a. Really need to use locale specific encoding
> > b. UTF-8 (bug. not work on Windows)
> > c. ASCII (not a bug, but slow on Windows)
>
> > I assume most usages are (b) and (c). This PEP can reduce them soon.
>
> Is this just an assumption, based on those times being visible to someone who
> installs a lot of packages, or has the use of any locale other than UTF-8 and
> ASCII really gone down a lot? Have browsers stopped using charset sniffing?
>

Using "most" was my mistake. I am not good at English. I should have said "many" here.

You can see many bugs caused by not specifying `encoding="utf-8"` on Q&A sites. I wrote some numbers about these common bugs in the PEP.

UTF-8 is used by 96.3% of web sites [1], although browsers still use charset sniffing. But how does that relate to this PEP?

[1] https://w3techs.com/technologies/details/en-utf8

> > Additionally, encoding="locale" will be backward/forward compatible
>
> What would be the problem with changing the default from None to locale?

It doesn't work on Python 3.9 or earlier. So using `encoding="locale"` is not recommended until the user drops Python 3.9 support.

> (I think you mentioned that they are the same 99% of the time; is that other
> 1% likely to be cases where locale is wrong but None is right? Would there
> be a better way to represent that 1%?)
>

`encoding="locale"` and `encoding=None` have the same behavior, except that `encoding="locale"` doesn't emit EncodingWarning even when the warning is enabled.

There is little difference between `encoding=None` and `encoding=locale.getpreferredencoding(False)`. They differ only when all of the following hold:

* Python is running on Windows, and
* the file is a console, and
* (for open()) PYTHONLEGACYWINDOWSSTDIO is set
* (for TextIOWrapper()) the file is not _WindowsConsoleIO

In that case, encoding=None uses the console code page but encoding=locale.getpreferredencoding(False) uses the locale's preferred encoding.

Otherwise, encoding=None and encoding=locale.getpreferredencoding(False) are the same. So `encoding=locale.getpreferredencoding(False)` can be used to specify the locale-specific encoding explicitly. But this PEP doesn't recommend it.

This PEP recommends using EncodingWarning just for finding missing `encoding="utf-8"` (or any other specific encoding).

--
Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PD4BTBAQHFUYOCF5QKIBDIMHATPVEFPW/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Fri, Feb 12, 2021 at 6:34 AM Paul Moore wrote:
>
> On Thu, 11 Feb 2021 at 21:05, Jim J. Jewett wrote:
> >
> > Who will benefit from this new warning?
> >
> > Is this basically just changing builtins.open by adding:
> >
> >     if encoding is None and sys.flags.encoding_warning: # and not Android and not -X utf8 ?
> >         warnings.warn(EncodingWarning("Are you sure you want locale instead of utf-8?"))
> >
> > Even for the few people with the knowledge, time, interest, and authority
> > to fix the code, is that really helpful?
> >
> > Helpful enough to put it directly in python as an optional mode, separate
> > from the dev mode or show all warnings mode? Why not just add it to a
> > linter, or write a 2to3 style checker? Or at least emit or not based on a
> > warnings filter?
>
> That's a very good point. If this warning is of use, why have none of
> the well-known linters implemented it? And why not prototype the
> proposal in them, at least? Python-ideas posts routinely get pushed to
> justify "why can't this be done in an external library?" and that
> probably applies here too.
>

* Linters cannot add `encoding="locale"` to Python.
* This PEP provides a way to shift where the warning is emitted.

    import io

    def my_read_file(filename, encoding=None):
        encoding = io.text_encoding(encoding)
        with open(filename, encoding=encoding) as f:
            return f.read()

This function itself does not trigger the warning. The caller of this function is warned instead. That is difficult to implement in a linter.

--
Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CKMRUBEI3UHEXSELZIQBA6NZCK77O75T/
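To see the shifted warning location in practice, one can run the function from this message in a child interpreter with the warning switched on. The sketch below assumes Python 3.10 or later, where `io.text_encoding()` exists and the option is spelled `-X warn_default_encoding`. The `CALLER` marker comment and the helper name `stderr_of` are only there for the demonstration.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# A tiny "library" function plus one caller, written out as a single script.
SCRIPT = textwrap.dedent("""
    import io
    import sys

    def my_read_file(filename, encoding=None):
        encoding = io.text_encoding(encoding)
        with open(filename, encoding=encoding) as f:
            return f.read()

    my_read_file(sys.argv[0])  # CALLER: no encoding given
""")


def stderr_of(source):
    """Run `source` with the opt-in warning enabled and return its stderr."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        result = subprocess.run(
            [sys.executable, "-X", "warn_default_encoding", path],
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",
        )
    return result.stderr


if __name__ == "__main__":
    # The reported source line should be the call site, not the open() line.
    print(stderr_of(SCRIPT))
```

The point of the design is that the library author decides, by calling `io.text_encoding()`, that the missing encoding is the caller's responsibility. A static linter would have to follow the `encoding` argument across function boundaries to reach the same conclusion.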
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Thu, 11 Feb 2021 at 21:05, Jim J. Jewett wrote: > > Who will benefit from this new warning? > > Is this basically just changing builtins.open by adding: > > if encoding is None and sys.flags.encoding_warning: # and not Android and > not -X utf8 ? > warnings.warn(EncodingWarning("Are you sure you want locale instead > of utf-8?")) > > Even for the few people with the knowledge, time, interest, and authority to > fix the code, is that really helpful? > > Helpful enough to put it directly in python as an optional mode, separate > from the dev mode or show all warnings mode? Why not just add it to a > linter, or write a 2to3 style checker? Or at least emit or not based on a > warnings filter? That's a very good point. If this warning is of use, why have none of the well-known linters implemented it? And why not prototype the proposal in them, at least? Python-ideas posts routinely get pushed to justify "why can't this be done in an external library?" and that probably applies here too. Paul ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VRBGH3ECNJBZMX7LINDNXYHQSKTKRTEX/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Who will benefit from this new warning? Is this basically just changing builtins.open by adding: if encoding is None and sys.flags.encoding_warning: # and not Android and not -X utf8 ? warnings.warn(EncodingWarning("Are you sure you want locale instead of utf-8?")) Even for the few people with the knowledge, time, interest, and authority to fix the code, is that really helpful? Helpful enough to put it directly in python as an optional mode, separate from the dev mode or show all warnings mode? Why not just add it to a linter, or write a 2to3 style checker? Or at least emit or not based on a warnings filter? -jJ ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GXZFCYFK7VOUSVZ5BVDCUW3JNJG6KPRS/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Inada Naoki wrote: > Default encoding is used for: > a. Really need to use locale specific encoding > b. UTF-8 (bug. not work on Windows) > c. ASCII (not a bug, but slow on Windows) > I assume most usages are (b) and (c). This PEP can reduce them soon. Is this just an assumption, based on those times being visible to someone who installs a lot of packages, or has the use of any locale other than UTF-8 and ASCII really gone down a lot? Have browsers stopped using charset sniffing? > Additionally, encoding="locale" will be backward/forward compatible What would be the problem with changing the default from None to locale? (I think you mentioned that they are the same 99% of the time; is that other 1% likely to be cases where locale is wrong but None is right? Would there be a better way to represent that 1%?) -jJ ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NGAF753ALAPMUKNJWFBYLDOTYTUJH6ZG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Thu, Feb 11, 2021 at 4:44 PM Jim J. Jewett wrote: > > I just reread PEP 597, then re-reread the Rationale. > Do you read current PEP 597, or old PEP 597 in discuss.python.org? -- Inada Naoki ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UKGDVMHUNNNRA4D4UCG4RLPZDIVKNNEY/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 9, 2021 at 9:51 PM Paul Moore wrote: > * Realistically, I'd be surprised if developers actually use such a > tool. If they were likely to do so, they could probably just as easily > locate all the uses of open() in their code, and check that way. So > I'm not sure this proposal is actually worth it, even if the end > result would be very beneficial. That's ok, they are many Python features which are only used by a minority of users. For me it's similar to the Python Development Mode: https://docs.python.org/dev/library/devmode.html Most users and developers will never use it, but for developers who care, it's a useful tool (it ease the discovery of issues). Victor ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QRYY3M2RXQ7W33DO4TRJBNXWTV6N6BQE/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
I just reread PEP 597, then re-reread the Rationale. The PEP helps when the locale is ASCII or C, but that isn't enforced in actual files. I am confident that this is a frequent problem for packages downloaded from mostly-English sites, including many software repositories. It does not seem to be a win when the locale is something incompatible with utf-8, such as Latin-1, or whatever is still common in Japan. The surrogate-escape mechanism allows a proper round-trip, but python itself will stop processing the characters correctly. For interactive use, when talking to another program (such as a terminal) instead of an already existing file, the backwards compatibility problem seems worse. Changing the default to utf-8 (after a deprecation period showing how to make locale an explicit default) may be reasonable, but claiming that it is backwards compatible ... I didn't get that impression from the PEP. -jJ ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RA5SLRB4M7IDLVZKQ3NWVACBLHII2BTR/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, Feb 10, 2021 at 11:58 PM Anders Munch wrote:
>
> On Wed, Feb 10, 2021 at 1:46 AM Anders Munch wrote:
> >> How about swapping around "locale" and None?
>
> Inada Naoki wrote:
> >
> > I thought it, but it may not work. Consider about function like this:
> >
> > ```
> > def read_text(self, encoding=None):
> >     with open(self._filename, encoding=encoding) as f:
> >         return f.read()
> > ```
> >
> > If `encoding=None` suppresses the warning, functions like this never warned.
>
> I don't see why they should be. The author clearly knew about the encoding
> argument to open, they clearly intended for a None value to be given in some
> cases, and at the time of writing None meant to use a locale-dependent
> encoding.
>

It is not clear. The author may just have wanted to "use the same default encoding as open()". If so, the caller of the function should be warned. To warn the caller, this function can use `encoding=io.text_encoding(encoding)`, as described in the PEP.

> > We are not discussing about changing default encoding for now.
>
> The section "Prepare to change the default encoding to UTF-8" gave me the
> impression that this was meant as a stepping stone on the way to doing just
> that. If that was not the intention, my apologies for the misread.
>

This *can* be a stepping stone, but it is not the first goal. This PEP doesn't discourage omitting the encoding option anytime soon when the user really needs the locale encoding.

The default encoding is used for:

a. Really needing the locale-specific encoding
b. UTF-8 (a bug; doesn't work on Windows)
c. ASCII (not a bug, but slow on Windows)

I assume most usages are (b) and (c). This PEP can reduce them soon.

If we decide to change the default encoding in the future, we will need to warn when the encoding option is omitted. Reducing (b) and (c) will reduce the total number of warnings shown at that point. This is what "Prepare" means.

Additionally, `encoding="locale"` will be a backward/forward-compatible way to use the locale-specific encoding when we decide to change the default encoding. So this PEP can be a very important stepping stone.

On the other hand, it is not a problem that we cannot use `encoding="locale"` in backward-compatible code *for now*. Python 3.9 becomes EOL in 2025. We won't emit warnings for the default encoding until then. People can use `encoding="locale"` after they drop Python 3.9 support. No problem.

Regards,
--
Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DBDI5FEJCF2IOTSAS7VELO27MNEQMK2Z/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, 10 Feb 2021 at 16:06, Anders Munch wrote: > > Paul Moore [mailto:p.f.mo...@gmail.com]: wrote: > > On Wed, 10 Feb 2021 at 14:33, Anders Munch wrote: > >> The idea is to make is so that working code only needs to change once, > >> even when supporting multiple Python versions. > >> That one change is to add either an explicit encoding=None (for > >> backwards-compatibility) or an explicit encoding='utf-8' (because that was > >> intended all along). No twice about it, one change. > > > But then people who added an explicit utf-8 encoding need to remove the > > encoding argument again once the default value changes > > Why would they do that? There's no need to remove anything. Code that > doesn't use a default doesn't break because the default changes. Because I'm against a proposal that forces *everyone* to explicitly specify the value... Your argument implies that removing the default altogether would be fine as well. Paul ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GA3EICY5IN7S7VVH24CX3SPAENKCAMFW/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On 2/10/2021 10:29 AM, Paul Moore wrote:
> On Wed, 10 Feb 2021 at 14:33, Anders Munch wrote:
>> The idea is to make is so that working code only needs to change once, even when supporting multiple Python versions.
>> That one change is to add either an explicit encoding=None (for backwards-compatibility) or an explicit encoding='utf-8' (because that was intended all along). No twice about it, one change.
>
> But then people who added an explicit utf-8 encoding need to remove the encoding argument again once the default value changes. Your proposal leads to a situation where no-one leaves the encoding argument to default. If we're going to permanently discourage omitting the encoding argument, we should just make it mandatory (a change that I'll argue against, but no-one is currently proposing it, luckily).

Except that all code written after the default has changed (and all python versions without that default are no longer supported) won't need to specify utf-8. And presumably there's more code to be written in the future than already exists.

Eric

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EG5UUYGC2R36NPXBDKFTFDKUYSYHP6JR/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Paul Moore [mailto:p.f.mo...@gmail.com]: wrote: > On Wed, 10 Feb 2021 at 14:33, Anders Munch wrote: >> The idea is to make is so that working code only needs to change once, even >> when supporting multiple Python versions. >> That one change is to add either an explicit encoding=None (for >> backwards-compatibility) or an explicit encoding='utf-8' (because that was >> intended all along). No twice about it, one change. > But then people who added an explicit utf-8 encoding need to remove the > encoding argument again once the default value changes Why would they do that? There's no need to remove anything. Code that doesn't use a default doesn't break because the default changes. regards, Anders ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WBUP2BNLAKHILUM3ZE3A2LVQKAQRXQ7T/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Inada Naoki [mailto:songofaca...@gmail.com] wrote: > There are several ways: > * encoding="latin1" -- This is the best. Works perfectly. > * Don't touch -- You don't need to enable EncodingWarning. > [...] I'm replying to Victor's statement that ``encoding="utf8" is backward compatible´´. If you're adding encoding="latin1" to the user program, then you are doing something very different from what Victor proposed. regards, Anders ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IENGLLP67LAOVLVSMJSJX4W6K2ZDTPJ7/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, 10 Feb 2021 at 14:33, Anders Munch wrote: > > On Tue, 9 Feb 2021 at 16:52, Anders Munch wrote: > >> How about swapping around "locale" and None? That is, make "locale" the > >> new default that emits a warning, and encoding=None emits no warning. > >> That has the advantage that old code can be updated to say encoding=None, > >> and then it will work on both old and new Pythons without warning. > Paul Moore [mailto:p.f.mo...@gmail.com] > > I don't understand why working code should have to change *twice*. > > The idea is to make is so that working code only needs to change once, even > when supporting multiple Python versions. > That one change is to add either an explicit encoding=None (for > backwards-compatibility) or an explicit encoding='utf-8' (because that was > intended all along). No twice about it, one change. But then people who added an explicit utf-8 encoding need to remove the encoding argument again once the default value changes. Your proposal leads to a situation where no-one leaves the encoding argument to default. If we're going to permanently discourage omitting the encoding argument, we should just make it mandatory (a change that I'll argue against, but no-one is currently proposing it, luckily). Paul ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QBX375BJCACY726YHZOXVUOWCG2EH3GI/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, Feb 10, 2021 at 1:46 AM Anders Munch wrote:
>> How about swapping around "locale" and None?

Inada Naoki wrote:
>
> I thought it, but it may not work. Consider about function like this:
>
> ```
> def read_text(self, encoding=None):
>     with open(self._filename, encoding=encoding) as f:
>         return f.read()
> ```
>
> If `encoding=None` suppresses the warning, functions like this never warned.

I don't see why they should be. The author clearly knew about the encoding argument to open, they clearly intended for a None value to be given in some cases, and at the time of writing None meant to use a locale-dependent encoding.

> We are not discussing about changing default encoding for now.

The section "Prepare to change the default encoding to UTF-8" gave me the impression that this was meant as a stepping stone on the way to doing just that. If that was not the intention, my apologies for the misread.

regards, Anders

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2VWZMIBG2VLASF7NCKDEJ5I22PXWI7D7/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, 9 Feb 2021 at 16:52, Anders Munch wrote: >> How about swapping around "locale" and None? That is, make "locale" the new >> default that emits a warning, and encoding=None emits no warning. That has >> the advantage that old code can be updated to say encoding=None, and then it >> will work on both old and new Pythons without warning. Paul Moore [mailto:p.f.mo...@gmail.com] > I don't understand why working code should have to change *twice*. The idea is to make is so that working code only needs to change once, even when supporting multiple Python versions. That one change is to add either an explicit encoding=None (for backwards-compatibility) or an explicit encoding='utf-8' (because that was intended all along). No twice about it, one change. regards, Anders ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BTIXV47LDYPKNIGWNRQSG6LXG4DORS7W/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, Feb 10, 2021 at 11:14 PM Anders Munch wrote:
>
> This program runs just fine on 3.8.7 Windows, against a file.txt that
> contains latin-1 text:
>
>     with open('file.txt', 'rt') as f:
>         print(f.read())
>
> But if I change it to this:
>
>     with open('file.txt', 'rt', encoding='utf-8') as f:
>         print(f.read())
>
> then it fails with UnicodeDecodeError. How is that backwards compatible?
>

There are several ways:

* encoding="latin1" -- This is the best. Works perfectly.
* Don't touch -- You don't need to enable EncodingWarning.
* encoding=locale.getpreferredencoding(False) -- Backward compatible. But it doesn't work if you have enabled UTF-8 mode.
* encoding="mbcs" -- Backward compatible. Works even when you have enabled UTF-8 mode. But it works only on Windows.

Regards,
--
Inada Naoki

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XPBVG5GU37UDQPDTZIFIGI2WOFYHYQBU/
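The last two options in the list above can be combined in a small sketch. The helper name `legacy_text_encoding` is hypothetical and not part of the PEP or of any proposal in this thread; it only uses standard-library calls and follows the caveats as stated in the message.

```python
import codecs
import locale
import sys


def legacy_text_encoding():
    """Return an encoding name that mimics open()'s locale-based default.

    Hypothetical helper, for illustration only.
    """
    if sys.platform == "win32":
        # "mbcs" is the ANSI code page codec. It exists only on Windows
        # and, per the message above, keeps working in UTF-8 mode.
        return "mbcs"
    # Elsewhere, ask the locale. In UTF-8 mode this reports UTF-8 rather
    # than the locale's own encoding, which is the caveat noted above.
    return locale.getpreferredencoding(False)


encoding = legacy_text_encoding()
# codecs.lookup() raises LookupError if the name is unknown on this platform.
print(encoding, "->", codecs.lookup(encoding).name)
```

A caller would then pass the result explicitly, for example `open(path, encoding=legacy_text_encoding())`, so the choice is visible in the code.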
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Victor Stinner [mailto:vstin...@python.org] wrote:
> encoding="utf8" is backward compatible and is likely to fix encoding bugs
> when the locale encoding is not UTF-8

This program runs just fine on 3.8.7 Windows, against a file.txt that contains latin-1 text:

    with open('file.txt', 'rt') as f:
        print(f.read())

But if I change it to this:

    with open('file.txt', 'rt', encoding='utf-8') as f:
        print(f.read())

then it fails with UnicodeDecodeError. How is that backwards compatible?

regards, Anders

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SDFLXIW64ESKDBARCHC2A2JA4NFPBZ2Y/
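The failure described above can be reproduced on any platform with a self-contained sketch. The file name and sample text are made up for the demonstration; the point is only that a Latin-1 byte such as 0xE9 is not valid UTF-8 on its own.

```python
import os
import tempfile

# "caf\u00e9" is "cafe" with an acute accent; in Latin-1 the last
# character is the single byte 0xE9.
data = "caf\u00e9".encode("latin-1")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "wb") as f:
        f.write(data)

    # Reading with the encoding the file was written in works.
    with open(path, "rt", encoding="latin-1") as f:
        text = f.read()

    # Forcing UTF-8 fails, because a lone 0xE9 byte is not valid UTF-8.
    try:
        with open(path, "rt", encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

print(repr(text), utf8_failed)
```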
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, Feb 10, 2021 at 5:00 PM Paul Moore wrote: > > Let's just assume until you can convince me that setting UTF-8 mode > globally is a good idea, Oh, you misunderstood me. My proposal is not setting UTF-8 mode globally. What I proposed is setting UTF-8 mode per env (e.g. installation, venv, or conda env). But this is off topic. The thread for this topic is here. https://mail.python.org/archives/list/python-id...@python.org/thread/LQVK2UKPSOI2AHYFUWK6ZII2U6QKK6BP/ Regards, -- Inada Naoki ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZXPBI3WSZ6FCAWWKXNBRNKYXUXUG5FEH/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, 10 Feb 2021 at 01:29, Inada Naoki wrote: > Note that many Python users don't use consoles. I never suggested that they did. There's a GUI for setting user-level and system-level environment variables. And whoever is introducing the user to Python can show them how to set the necessary environment variable - or do it for them. Please be clear, I'm not saying I don't understand the difficulties. But I do question why PYTHONUTF8 is so much different than all the other environment variables that Python responds to that it needs special additional options. Remember - I've already said that I'm not convinced that setting UTF8 mode globally is the right approach. So what you're saying in effect is that you want to convince me that we should add a new mechanism to globally set an option that I don't believe should be set globally. Let's just assume until you can convince me that setting UTF-8 mode globally is a good idea, there's no point trying to convince me that we need a new mechanism to do so because environment variables aren't good enough. Paul ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4UHPNJVJ3ACKREUH7WLEAD5ZB4IR5R2K/ Code of Conduct: http://python.org/psf/codeofconduct/
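For reference, the environment-variable route mentioned above can be checked from Python itself. This is a minimal sketch that sets `PYTHONUTF8=1` for a child interpreter only and reads back `sys.flags.utf8_mode`; it assumes `sys.executable` points at a usable interpreter.

```python
import os
import subprocess
import sys

# Code for the child process: report whether UTF-8 mode is active.
child_code = "import sys; print(sys.flags.utf8_mode)"

# Copy the current environment and enable UTF-8 mode for the child only.
env = dict(os.environ, PYTHONUTF8="1")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
utf8_mode_in_child = result.stdout.strip()
print(utf8_mode_in_child)
```

The parent process is unaffected, which is the sense in which the setting is per-process.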
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 9, 2021 at 11:29 PM Terry Reedy wrote: > > On 2/9/2021 8:28 PM, Inada Naoki wrote: > > > Note that many Python users don't use consoles. > > Those of use who do may find it hard to imagine just how easy we have > made computing. > > My daughter minored in Computer Science about 6 years ago. She never > saw Command Prompt until the summer after her Junior year when she > watched me use it to install numpy and other packages for her. I had to > do it because 'Run pip install numpy', etc, was met with a blank stare. > I had taught her Python with IDLE, downloaded and install with a > browser, and had neglected to teach her 'Dos' until then. > > So had her CS classes. Those previous used Racket in a Dr. something > environment and Java in, I believe, Eclipse. Also downloaded and > installed with a browser. Speaking as a current CS undergraduate student here (senior graduating in December 2021). At my university, the freshman/sophomore-level programming classes do not assume or expect any type of command line knowledge. They all rely on GUI tools (Eclipse, IntelliJ, or NetBeans for the freshman Java courses, Visual Studio for Data Structures in C++). There is one course, typically taken in either the second or third semester for traditional students, called Operating Systems Concepts and Usage, that broadly discusses how operating systems function, but is also designed as a first introduction to Linux and to the command line. (Until this point, the only operating system students are assumed to be familiar with is Windows.) For many students, this course is their first ever exposure to the command prompt. After that, students in this program don't generally *need* to touch the command line again in their studies until they hit 4000-level courses, and even then only a few courses require it. Outside of that one introductory course, I've only had two courses so far that actually required command line usage. 
Everything else so far has offered GUI options, even many upper level courses. I think it's a disservice to fail to expose students to the command line more and earlier, but the fact is, that failure happens and happens often, and developers need to be conscious of that. Despite my own ease and comfort with the command-line (which dates back to learning my way around DOS at the age of 5) to the point of almost always having a terminal window open on my daily Debian machine, I frequently find myself opting for point-and-click solutions to common problems, even Git operations (which are so easy and powerful in VS Code with the GitLens extension). GUI tools grow more powerful by the day, and it's very easy to get deep into a computer science program these days and not be comfortable with the command line and/or not know how to change environment variables. Python, as a common introductory language used by many thousands of people who have never taken a university computer course, never mind majoring in computer science, shouldn't have basic features that depend on the likely false assumption that the user has ever seen a command prompt or an environment variable, much less comprehend how to use them. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UKGADFE6OK66BNUPE36NVHXBZZSOR7OD/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On 2/9/2021 8:28 PM, Inada Naoki wrote:
> Note that many Python users don't use consoles.

Those of us who do may find it hard to imagine just how easy we have made computing.

My daughter minored in Computer Science about 6 years ago. She never saw Command Prompt until the summer after her Junior year when she watched me use it to install numpy and other packages for her. I had to do it because 'Run pip install numpy', etc, was met with a blank stare. I had taught her Python with IDLE, downloaded and installed with a browser, and had neglected to teach her 'Dos' until then.

So had her CS classes. Those previously used Racket in a Dr. something environment and Java in, I believe, Eclipse. Also downloaded and installed with a browser.

> They just starts Jupyter Notebook, or they just write .py file and run it in the Minecraft mods.

Also similar.

--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/O5SS7I2H2GRCM2YBOVYX7CWUYTX2J7AO/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, Feb 10, 2021 at 5:50 AM Paul Moore wrote: > > On Tue, 9 Feb 2021 at 16:54, Inada Naoki wrote: > > > > On Tue, Feb 9, 2021 at 9:31 PM Paul Moore wrote: > > > > > > Personally, I'm not at all keen on the idea of making users always > > > specify encoding in the first place, even if it's "just for the > > > transition". > > > > I agree with you. But as I wrote in the PEP, omitted encoding caused > > much troubles already. > > Windows users can not just `pip install somepkg` because some library > > authors write `long_description=open("README.md").read()` in setup.py. > > > > I am trying to fix this situation by two parallel approaches: > > > > * (This PEP) Provide a tool for finding this type of bugs, and > > recommend `encoding="utf-8"` for cross-platform library authors. > > * (Author thread) Make UTF-8 mode more usable for Windows users, > > especially students. > > Thanks for explaining (again). There's so much debate, across multiple > proposals, that I can barely follow it. I'm impressed that you're > managing to keep things straight at all :-) > > I guess my views on this PEP come down to > > * I see no harm in having a tool that helps developers spot > platform-specific assumptions about encoding. > * Realistically, I'd be surprised if developers actually use such a > tool. If they were likely to do so, they could probably just as easily > locate all the uses of open() in their code, and check that way. So > I'm not sure this proposal is actually worth it, even if the end > result would be very beneficial. > * In the setup.py case, why don't those same Windows users complain > that the library fails to install? A quick bug report, followed by a > simple fix, seems more likely to happen than the developer suddenly > deciding to scan their code for encoding issues. > Yes, some issues are solved already. On the other hand, there are dozen question about UnicodeDecodeError in Q sites like Stack Overflow. 
Many people don't know what the error means, or how to report it correctly.

I sometimes set PYTHONWARNINGS=default in my bashrc and find DeprecationWarnings in libraries I am using, and report them. On the other hand, it is difficult to find omitted `encoding="utf-8"`, because I use macOS and Linux in daily development. If there is PYTHONWARNENCODING, I can write `export PYTHONWARNENCODING=1` in my .bashrc.

> Regarding the wider question of UTF8 as default, my views can probably
> be summarised as follows:
>
> * If you want to write correct code to deal with encodings, there is
> no substitute for carefully considering every bytes/string conversion,
> deciding how you are going to identify the encoding to use, and then
> specifying that encoding explicitly. Default values for encodings have
> no place in such code.
> * In reality, though, that's far too much work for many situations.
> Default encodings are a necessary convenience, particularly for simple
> scripts, or for people who can't, or don't want to, do the analysis
> that the "correct" approach implies.

Yes, and UTF-8 is already the default encoding for s.encode().

> * Picking the right default is *hard*. Changing the default is even
> harder, unfortunately.
> * I feel that we already have a number of mechanisms (PEPs 538 and
> 540) trying to tackle this issue. Adding yet more suggests to me that
> we'd be better off pausing and working out why we still have an issue.
> We should be moving towards *fewer* mechanisms, not more.
> * We have UTF-8 mode, and users can set it per-process (via flag or
> environment variable) per-user or per-site (by environment variable).
> I don't honestly believe that a user (whatever OS they work on) who is
> capable of writing Python code, can't be shown how to set an
> environment variable.
I see no reason to suggest we need yet another > way to set UTF-8 mode, or that a per-interpreter or per-virtualenv > setting is particularly crucial (suggestions that have been made in > the Python-Ideas threads). Note that many Python users don't use consoles. They just starts Jupyter Notebook, or they just write .py file and run it in the Minecraft mods. > * UTF-8 is likely to be the most appropriate default encoding for > Python in the longer term, and I agree that Windows is fast > approaching the point where a UTF-8 encoding is more appropriate than > the ANSI codepage for "new stuff". But there's a lot of legacy files > and applications around, and I suspect that a UTF-8 default will > inconvenience a lot of people working with such data. But equally, > such people may not be in a huge rush to switch to the latest Python > version. Whichever way we go, though, some people will be > inconvenienced. > > I'm also somewhat bemused by the rather negative view of "Windows > beginners" that lies behind a lot of these discussions. People's > experiences may well differ, but the people I see using (and learning) > Python on Windows are often experienced computer users, maybe > developers with significant
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, 9 Feb 2021 at 16:54, Inada Naoki wrote: > > On Tue, Feb 9, 2021 at 9:31 PM Paul Moore wrote: > > > > Personally, I'm not at all keen on the idea of making users always > > specify encoding in the first place, even if it's "just for the > > transition". > > I agree with you. But as I wrote in the PEP, omitted encoding caused > much troubles already. > Windows users can not just `pip install somepkg` because some library > authors write `long_description=open("README.md").read()` in setup.py. > > I am trying to fix this situation by two parallel approaches: > > * (This PEP) Provide a tool for finding this type of bugs, and > recommend `encoding="utf-8"` for cross-platform library authors. > * (Author thread) Make UTF-8 mode more usable for Windows users, > especially students. Thanks for explaining (again). There's so much debate, across multiple proposals, that I can barely follow it. I'm impressed that you're managing to keep things straight at all :-) I guess my views on this PEP come down to * I see no harm in having a tool that helps developers spot platform-specific assumptions about encoding. * Realistically, I'd be surprised if developers actually use such a tool. If they were likely to do so, they could probably just as easily locate all the uses of open() in their code, and check that way. So I'm not sure this proposal is actually worth it, even if the end result would be very beneficial. * In the setup.py case, why don't those same Windows users complain that the library fails to install? A quick bug report, followed by a simple fix, seems more likely to happen than the developer suddenly deciding to scan their code for encoding issues. 
Regarding the wider question of UTF8 as default, my views can probably be summarised as follows: * If you want to write correct code to deal with encodings, there is no substitute for carefully considering every bytes/string conversion, deciding how you are going to identify the encoding to use, and then specifying that encoding explicitly. Default values for encodings have no place in such code. * In reality, though, that's far too much work for many situations. Default encodings are a necessary convenience, particularly for simple scripts, or for people who can't, or don't want to, do the analysis that the "correct" approach implies. * Picking the right default is *hard*. Changing the default is even harder, unfortunately. * I feel that we already have a number of mechanisms (PEPs 538 and 540) trying to tackle this issue. Adding yet more suggests to me that we'd be better off pausing and working out why we still have an issue. We should be moving towards *fewer* mechanisms, not more. * We have UTF-8 mode, and users can set it per-process (via flag or environment variable) per-user or per-site (by environment variable). I don't honestly believe that a user (whatever OS they work on) who is capable of writing Python code, can't be shown how to set an environment variable. I see no reason to suggest we need yet another way to set UTF-8 mode, or that a per-interpreter or per-virtualenv setting is particularly crucial (suggestions that have been made in the Python-Ideas threads). * UTF-8 is likely to be the most appropriate default encoding for Python in the longer term, and I agree that Windows is fast approaching the point where a UTF-8 encoding is more appropriate than the ANSI codepage for "new stuff". But there's a lot of legacy files and applications around, and I suspect that a UTF-8 default will inconvenience a lot of people working with such data. But equally, such people may not be in a huge rush to switch to the latest Python version. 
Whichever way we go, though, some people will be inconvenienced. I'm also somewhat bemused by the rather negative view of "Windows beginners" that lies behind a lot of these discussions. People's experiences may well differ, but the people I see using (and learning) Python on Windows are often experienced computer users, maybe developers with significant experience in Java or other "enterprise languages", or data scientists who have a lot of knowledge of computers, but are relatively new to programming. Or systems admins, or database specialists, who want to use Python to write scripts on Windows. None of those people fit the picture of people who wouldn't know how to set an environment variable, or configure their environment. On the other hand, (in my experience) they often don't really have much knowledge of character encodings, and tend to just use whatever default their PC uses, and expect it to work. They *can*, however, understand when an encoding problem is explained to them, and can set an explicit encoding once they know they need to. Paul ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 9, 2021 at 5:51 PM Anders Munch wrote: > Victor Stinner [mailto:vstin...@python.org] wrote: > > The warning can explicitly suggest to use encoding="utf8", it should work > > in almost all cases. > > The warning should also explain how to get backwards-compatible behaviour, > i.e. suggest encoding="locale". encoding="utf8" is backward compatible and is likely to fix encoding bugs when the locale encoding is not UTF-8. It is likely what the developer expected, without knowing that open(filename) does not always use UTF-8. See PEP 597 rationale. Victor -- Night gathers, and now my watch begins. It shall not end until my death. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/L2H34EXNE7XUQX3XILLPDNGOQHVK6ENR/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, Feb 10, 2021 at 1:46 AM Anders Munch wrote:
>
> Inada Naoki wrote:
> > This warning is opt-in warning like BytesWarning.
>
> What use is a warning that no-one sees?

At least I will see it. We can fix the stdlib and tests first, and fix some major tools too. After that, `encoding="locale"` becomes backward/forward compatible at some point.

> When the default is switched to encoding="utf8", it will break software, and
> people need to be warned of that.
> UnicodeDecodeError's will abound when files that used to be read in a
> single-byte encoding fails to decode as utf-8. All it takes is a single é.
> If the default encoding is ever to change, there's no way around a noisy
> warning.
>

Please read the PEP and some of my posts in this thread. We are not discussing changing the default encoding for now. This PEP provides a tool to find missing `encoding="utf-8"` bugs for now. The goal of the PEP is to encourage `encoding="utf-8"` when the user assumes the encoding is UTF-8.

If we decide to change the default encoding, EncodingWarning can be used to discourage omitting the `encoding` option. But that is out of scope of the PEP. We don't discourage omitting the encoding option in Python 3.10.

> How about swapping around "locale" and None? That is, make "locale" the new
> default that emits a warning, and encoding=None emits no warning. That has
> the advantage that old code can be updated to say encoding=None, and then it
> will work on both old and new Pythons without warning.
>

I thought about it, but it may not work. Consider a function like this:

```
def read_text(self, encoding=None):
    with open(self._filename, encoding=encoding) as f:
        return f.read()
```

If `encoding=None` suppresses the warning, functions like this would never warn. So I think the current PEP is better. If users want to use the locale encoding, they don't need to fix the warning anytime soon. They can wait until they drop Python 3.9 support. If they want to fix all warnings soon, they can use `encoding=locale.getpreferredencoding(False)`.

Regards,
--
Inada Naoki

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4Q74PW673RMBMQTDZXHTVE6X7FT6DSAL/
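The point about wrapper functions can be illustrated with a rough sketch. The names `checked_open` and `DefaultEncodingWarning` are hypothetical stand-ins, not the PEP's implementation; the sketch only shows that if `encoding=None` counts as "not specified", a wrapper that forwards `None` is still caught.

```python
import os
import tempfile
import warnings


class DefaultEncodingWarning(UserWarning):
    """Stand-in for the proposed EncodingWarning (hypothetical name)."""


def checked_open(file, mode="r", encoding=None, **kwargs):
    """Wrap open() and warn when a text-mode call leaves encoding as None."""
    if "b" not in mode and encoding is None:
        warnings.warn(
            "encoding not specified; the locale encoding will be used",
            DefaultEncodingWarning,
            stacklevel=2,
        )
    return open(file, mode, encoding=encoding, **kwargs)


def read_text(filename, encoding=None):
    # The wrapper forwards encoding=None, so the check above still fires.
    with checked_open(filename, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        read_text(path)                    # encoding left as None
        read_text(path, encoding="utf-8")  # explicit encoding

# Count only the stand-in warning, ignoring anything else that was raised.
flagged = [w for w in caught if issubclass(w.category, DefaultEncodingWarning)]
print(len(flagged))
```

Only the first call is flagged, since the second one states its encoding.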
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, 9 Feb 2021 at 16:52, Anders Munch wrote: > How about swapping around "locale" and None? That is, make "locale" the new > default that emits a warning, and encoding=None emits no warning. That has > the advantage that old code can be updated to say encoding=None, and then it > will work on both old and new Pythons without warning. I don't understand why working code should have to change *twice*. I'm fine with the idea that people *actually* relying on the current default will need to switch when the default changes, but making them change once to silence the warning and then again to explicitly select the old default is pretty annoying. If we don't want people to use the default encoding, we should just make encoding a required argument and stop pretending. If omitting the encoding and using the default is intended to be a supported usage, then we should *not* penalise people doing that. Changing the default is a backward-incompatible change, that's enough of an inconvenience. Changing the (behaviour of the) default *twice* is just making things worse. Paul ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/36HJRYU6R6NEDZY7QSKS3DEKRY6OLTI4/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 9, 2021 at 9:31 PM Paul Moore wrote:
>
> Personally, I'm not at all keen on the idea of making users always
> specify encoding in the first place, even if it's "just for the
> transition".

I agree with you. But as I wrote in the PEP, an omitted encoding has already caused a lot of trouble. Windows users cannot just `pip install somepkg` because some library authors write `long_description=open("README.md").read()` in setup.py.

I am trying to fix this situation with two parallel approaches:

* (This PEP) Provide a tool for finding this type of bug, and recommend `encoding="utf-8"` to cross-platform library authors.
* (Another thread) Make UTF-8 mode more usable for Windows users, especially students.

> If we want to switch the default encoding from the locale encoding to
> UTF-8, we should find a way to do that which *doesn't* mean that
> there's a "transitional" state where using the default is considered
> bad practice. That helps no-one, and just adds confusion, which will
> last far longer than that one release (there will be people
> encountering StackOverflow questions on the topic long after the
> default has changed).
>
> Maybe we just have to accept that we can't work out what people are
> intending, and just state in advance in the documentation that the
> default will change, then it's documented as an upcoming breaking
> change that people can address (if they read the release notes, but we
> seem to be assuming they'll spot a warning, so why not assume they
> read the release notes, too?).
>

This PEP doesn't cover how to change the default encoding, so this is slightly off topic.

I have two ideas for changing the default encoding:

(a) Regular deprecation period: emit EncodingWarning by default (3.14 or later), and change the default encoding later (3.17 or later).
(b) Enable UTF-8 mode by default on Windows. Users can disable UTF-8 mode for backward compatibility.

Steve Dower objected to (b) very strongly. He suggested emitting DeprecationWarning.
https://discuss.python.org/t/pep-597-enable-utf-8-mode-by-default-on-windows/3122/16

On the other hand, some core devs don't like emitting a warning for every omitted `encoding` option.

So I don't have a strong opinion about which approach is better. I want to see how EncodingWarning and UTF-8 mode are adopted.

I want to implement both EncodingWarning and a per-site UTF-8 mode setting in Python 3.10. In 5+ years, we will see which approach users have adopted.

* If EncodingWarning is widely adopted by developers, we can discuss approach (a).
* If UTF-8 mode becomes the best practice for Windows users, we can discuss approach (b).

Regards,
-- Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DY4OPCBKHHRJZMXEJ43MXPNXJ4EUS6MM/
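The setup.py pitfall mentioned in this message can be illustrated with a small sketch. It is not code from the PEP: the file contents and the temporary directory are made up for the example, and the only point is that naming the encoding makes the read independent of the machine's locale.

```python
import os
import tempfile

# Hypothetical stand-in for a project's README.md, containing non-ASCII text.
readme_text = "Café README ✓\n"

with tempfile.TemporaryDirectory() as tmp:
    readme_path = os.path.join(tmp, "README.md")

    # Write the file as UTF-8, which is how READMEs are commonly stored.
    with open(readme_path, "w", encoding="utf-8") as f:
        f.write(readme_text)

    # Fragile: open(readme_path).read() decodes with the locale encoding,
    # so the result depends on the machine that runs setup.py.
    # Portable: name the encoding explicitly.
    with open(readme_path, encoding="utf-8") as f:
        long_description = f.read()
```

With the explicit `encoding="utf-8"`, the text read back is the same on a UTF-8 Linux machine and on a Windows machine with a legacy code page.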
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Victor Stinner [mailto:vstin...@python.org] wrote:
> The warning can explicitly suggest to use encoding="utf8", it should work in
> almost all cases.

The warning should also explain how to get backwards-compatible behaviour, i.e. suggest encoding="locale".

Inada Naoki wrote:
> This warning is an opt-in warning like BytesWarning.

What use is a warning that no-one sees? When the default is switched to encoding="utf8", it will break software, and people need to be warned of that. UnicodeDecodeErrors will abound when files that used to be read in a single-byte encoding fail to decode as UTF-8. All it takes is a single é. If the default encoding is ever to change, there's no way around a noisy warning.

How about swapping around "locale" and None? That is, make "locale" the new default that emits a warning, and encoding=None emits no warning. That has the advantage that old code can be updated to say encoding=None, and then it will work on both old and new Pythons without warning.

regards, Anders
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GZOHZAXKJDRJPF32U2ET5E32SOYXHR5E/
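The "single é" failure described in this message is easy to reproduce. The sketch below assumes cp1252 as the single-byte encoding, purely as an example; any legacy code page that encodes é as one byte behaves the same way.

```python
# Bytes as they would sit in a file saved with a single-byte encoding
# (cp1252 is an assumption made for this example).
data = "café".encode("cp1252")  # the é becomes the single byte 0xE9

# Decoding with the encoding the file was written in works.
roundtrip = data.decode("cp1252")

# Decoding the same bytes as UTF-8 fails: 0xE9 announces a multi-byte
# sequence in UTF-8, but no continuation bytes follow.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
else:
    error = None
```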
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, 9 Feb 2021 at 16:28, Inada Naoki wrote:
>
> On Wed, Feb 10, 2021 at 1:19 AM Paul Moore wrote:
> >
> > But people who currently don't specify the encoding, and *don't* have
> > any issue (because the system locale is correct) will be getting told
> > to introduce a bug into their code, if they follow that advice :-(
> >
>
> This warning is an opt-in warning like BytesWarning.
>
> It will be a good tool for finding potential problems for people who know
> what the problem is.
> But it is not recommended for users who don't understand what the problem is.

Ah, OK. I missed that point in the long email chain. Sorry.

Paul
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UUX2IR655Y6JOCOQBPHHQTPUUTVFA5XA/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Wed, Feb 10, 2021 at 1:19 AM Paul Moore wrote:
>
> But people who currently don't specify the encoding, and *don't* have
> any issue (because the system locale is correct) will be getting told
> to introduce a bug into their code, if they follow that advice :-(
>

This warning is an opt-in warning like BytesWarning.

It will be a good tool for finding potential problems for people who know what the problem is. But it is not recommended for users who don't understand what the problem is.

-- Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SJKTVKW3DQCPRFRTGOUL73EI6BOGWDFF/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
But people who currently don't specify the encoding, and *don't* have any issue (because the system locale is correct) will be getting told to introduce a bug into their code, if they follow that advice :-(

Paul

On Tue, 9 Feb 2021 at 16:03, Victor Stinner wrote:
>
> On Tue, Feb 9, 2021 at 1:31 PM Paul Moore wrote:
> > If we can't provide a good recommendation
> > to the user on what to do, we shouldn't be warning them that what they
> > are currently doing is wrong.
>
> The warning can explicitly suggest to use encoding="utf8", it should
> work in almost all cases.
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HSV6QSJKAUFS7LWZVEZUWTUD5A6DCFFL/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 9, 2021 at 1:31 PM Paul Moore wrote:
> If we can't provide a good recommendation
> to the user on what to do, we shouldn't be warning them that what they
> are currently doing is wrong.

The warning can explicitly suggest to use encoding="utf8", it should work in almost all cases.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NMHJAPSNE7XI65DKV6EB55HGQW5XRAA6/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, 9 Feb 2021 at 11:55, Inada Naoki wrote:
> I think all we can do is document the option like this:
>
> """
> EncodingWarning is a warning for finding missing encoding="utf-8" options.
> It is common pitfall that many Windows user
> Don't try to fix them if you need to use a locale-specific encoding.
> """

I'm a very strong -1 on having programs generate warnings that the user isn't supposed to fix. If we can't provide a good recommendation to the user on what to do, we shouldn't be warning them that what they are currently doing is wrong.

I've seen far too many examples of people thinking "well, users can ignore the warning, it's not shown by default" and then users' code being broken because of a situation we didn't think about (most recently, the Python test suite, which runs the venv tests with warnings converted to errors, which broke on a pip release that contains a deprecation warning from packaging).

IMO, if we issue a warning, we *must* be able to advise the user how to fix it. Otherwise we shouldn't be assuming we know what's correct better than the user.

Personally, I'm not at all keen on the idea of making users always specify encoding in the first place, even if it's "just for the transition". There are far too many people in my experience who wouldn't have a clue what to do when faced with that decision. And the people (again in my experience) who don't know how to make that choice are *precisely* the people for whom the system-defined default is what they want. Certainly, if they are getting stuff off the internet, they will more often get UTF-8, but I tend to find that people with limited understanding of these issues are much more comfortable with the idea that "stuff off the internet needs weird settings like this UTF-8 thing whatever it is", than they are with the idea that they have to tell Python how to read that text file they just got from their boss, who's still got Windows 7 on his PC...

If we want to switch the default encoding from the locale encoding to UTF-8, we should find a way to do that which *doesn't* mean that there's a "transitional" state where using the default is considered bad practice. That helps no-one, and just adds confusion, which will last far longer than that one release (there will be people encountering StackOverflow questions on the topic long after the default has changed).

Maybe we just have to accept that we can't work out what people are intending, and just state in advance in the documentation that the default will change, then it's documented as an upcoming breaking change that people can address (if they read the release notes, but we seem to be assuming they'll spot a warning, so why not assume they read the release notes, too?).

Paul

PS I've hesitated about saying this before, as I'm very aware that being from the UK, any problems I have with encodings are relatively minor, so I want to let the people with real problems have their say. But when we're talking about telling users not to fix warnings, I feel the need to speak up.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5W7YFNY7BLCS25ZICWMH57XH5REITI34/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 9, 2021 at 7:23 PM Victor Stinner wrote:
>
> I recall that something like 1 year ago, I basically tried to
> implement something like your PEP, to see if the stdlib calls open()
> without specifying an encoding. There were so many warnings that the
> output was barely readable.
>
> The warning would only be useful if there is a way to modify the code
> to make the warning quiet (fix the issue) without losing support for
> Python 3.9 and older.
>
> I understand that open(filename) must be replaced with open(filename,
> encoding=("locale" if sys.version_info >= (3, 10) else None)) to make
> it backward and forward compatible without emitting an
> EncodingWarning.

I think most of them must be replaced with encoding="ascii" or encoding="utf-8".

And encoding=locale.getpreferredencoding(False) is a backward/forward compatible way. There is a very small difference between encoding=None and encoding=locale.getpreferredencoding(False), but it is not a problem for most use cases. Only applications that use PYTHONLEGACYWINDOWSSTDIO and open() for console I/O are affected by the difference.

> One issue is that some people may blindly copy/paste
> this code pattern without thinking about whether "locale" is the proper
> encoding.
>

Isn't it the same if the code pattern becomes `encoding=getattr(io, "LOCALE_ENCODING", None)`, or `encoding=locale.getpreferredencoding(False)`?

I think all we can do is document the option like this:

"""
EncodingWarning is a warning for finding missing encoding="utf-8" options.
It is common pitfall that many Windows user
Don't try to fix them if you need to use a locale-specific encoding.
"""

-- Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YLAC2WJZ2TX7I3I6TSWA4GWPP5NNETUH/
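A minimal sketch of the `encoding=locale.getpreferredencoding(False)` pattern from this message follows. The file name and the ASCII-only content are assumptions made for the example. Since `encoding` is no longer omitted, there is nothing for EncodingWarning to report, and the same line runs on Python versions that predate `encoding="locale"`.

```python
import locale
import os
import tempfile

# Look up the locale encoding. Passing False asks getpreferredencoding()
# not to switch LC_CTYPE temporarily while it does so.
locale_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # hypothetical file name

    # Same locale-dependent behaviour as omitting `encoding`,
    # but now stated explicitly in the code.
    with open(path, "w", encoding=locale_encoding) as f:
        f.write("plain ASCII text\n")

    with open(path, encoding=locale_encoding) as f:
        content = f.read()
```

As the message notes, this is not exactly identical to `encoding=None` in the console I/O corner case, so it is a reasonable substitute rather than a drop-in equivalent.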
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Sat, Feb 6, 2021 at 3:26 PM Inada Naoki wrote:
> I changed my mind. Since there is no plan to change the default
> encoding for now, there is no need to encourage `encoding="locale"` soon.
>
> Until users can drop Python 3.9 support, they can use EncodingWarning
> only for finding missing `encoding="utf-8"` or `encoding="ascii"`.
>
> I will remove io.LOCALE_ENCODING.

I recall that something like 1 year ago, I basically tried to implement something like your PEP, to see if the stdlib calls open() without specifying an encoding. There were so many warnings that the output was barely readable.

The warning would only be useful if there is a way to modify the code to make the warning quiet (fix the issue) without losing support for Python 3.9 and older.

I understand that open(filename) must be replaced with open(filename, encoding=("locale" if sys.version_info >= (3, 10) else None)) to make it backward and forward compatible without emitting an EncodingWarning.

One issue is that some people may blindly copy/paste this code pattern without thinking about whether "locale" is the proper encoding.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6CRWIH6AJ43H2IRQZDJFUSSYUFPDSY3L/
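The version check quoted in this message can be wrapped in a small helper so the decision lives in one place. The helper name `locale_encoding_arg` and the file name are hypothetical, chosen only for this sketch. It assumes, as the message does, that `encoding="locale"` is accepted from Python 3.10 onward.

```python
import os
import sys
import tempfile


def locale_encoding_arg():
    """Return the `encoding` value that requests the locale encoding.

    Hypothetical helper, not part of the PEP: "locale" is only accepted
    by Python 3.10 and newer, so older versions fall back to None.
    """
    return "locale" if sys.version_info >= (3, 10) else None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "local-notes.txt")  # hypothetical file name

    with open(path, "w", encoding=locale_encoding_arg()) as f:
        f.write("hello\n")

    with open(path, encoding=locale_encoding_arg()) as f:
        text = f.read()
```

The copy/paste concern raised above still applies: the helper only says "I really mean the locale encoding", so it is the wrong choice for files that are defined to be UTF-8.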
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
I sent a pull request: https://github.com/python/peps/pull/1799

* Add "Backward/Forward Compatibility" section
* Add "How to teach this" section
* Remove io.LOCALE_ENCODING constant

-- Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TRIGYFRJSVSUWFQDYIUZI64BB4J323UN/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 2, 2021 at 1:40 PM Inada Naoki wrote:
>
> On Tue, Feb 2, 2021 at 12:23 AM Victor Stinner wrote:
> >
> >
> > > Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> > > be used to avoid confusing ``LookupError: unknown encoding: locale``
> > > error when the code is run in old Python accidentally.
> >
> > I'm not sure that it is useful. I like a simple "locale" literal
> > string. If there is a constant in io, people may start to think that
> > it's specific and will add "import io" just to get the string
> > "locale".
> >
> > I don't think that we should care too much about the error message
> > raised by old Python versions.
> >
>
> This constant is not only for replacing the "locale" literal. As in the
> example code in the PEP, it can be used to test whether TextIOWrapper
> supports `encoding="locale"`.
>
> `open(fn, encoding=getattr(io, "LOCALE_ENCODING", None))` works both
> for Python ~3.9 and Python 3.10~.
>

I changed my mind. Since there is no plan to change the default encoding for now, there is no need to encourage `encoding="locale"` soon.

Until users can drop Python 3.9 support, they can use EncodingWarning only for finding missing `encoding="utf-8"` or `encoding="ascii"`.

I will remove io.LOCALE_ENCODING.

-- Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4SRSQQXRLQSXG4RLZGXHFEFTTBVDKPWK/
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 2, 2021 at 8:16 PM Victor Stinner wrote:
>
> > > I understand that encoding=locale.get_locale_encoding() would be
> > > different from encoding="locale":
> > > encoding=locale.get_locale_encoding() doesn't call
> > > os.device_encoding(), right?
> > >
> >
> > Yes.
>
> Would it be useful to add a io.get_locale_encoding(fd)->str (maybe
> "get_default_encoding"?) function which gives the chosen encoding from
> a file descriptor, similar to open(fd, encoding="locale").encoding?
> The os.device_encoding() call is not obvious.
>

I don't think it's so useful. encoding=None is 99% the same as encoding=locale.getpreferredencoding(False).

On Unix, os.device_encoding() just returns the locale encoding.

On Windows, os.device_encoding() is very unlikely to be used: open() uses WindowsConsoleIO for the console unless PYTHONLEGACYWINDOWSSTDIO is set, and the encoding for it is UTF-8.

And that's why I removed the detailed behavior from the PEP. It is too detailed and almost unrelated to EncodingWarning. I wrote a simple comment in this section instead.
https://www.python.org/dev/peps/pep-0597/#locale-is-not-a-codec-alias

>
> > > > Opt-in warning
> > > > ---
> > > >
> > > > Although ``DeprecationWarning`` is suppressed by default, emitting
> > > > ``DeprecationWarning`` always when ``encoding`` option is omitted
> > > > would be too noisy.
> > >
> > > The PEP is not very clear. Does "-X warn_encoding" only emit the
> > > warning, or does it also display it by default? Does it add a warning
> > > filter for EncodingWarning?
> > >
> >
> > This section is not the spec. This section is the rationale for adding
> > EncodingWarning instead of using DeprecationWarning.
> >
> > As the spec says, EncodingWarning is a subclass of Warning. So it is
> > displayed by default. But it is not emitted by default.
> >
> > When -X encoding_warning (or -X warn_default_encoding) is used, the
> > warning is emitted and shown unless the user suppresses warnings.
>
> I understand that EncodingWarning is always displayed by default
> (default warning filters don't ignore it, whereas DeprecationWarning
> are ignored by default), but no warning is emitted by default. Ok,
> that makes sense. Maybe try to say it explicitly in the PEP.
>
>
> > This PEP doesn't have "backward compatibility" section because the PEP
> > doesn't break any backward compatibility.
>
> IMO it's a good thing to always have the section, just to say that you
> took time to think about backward compatibility ;-) The section can be
> empty, like just say "there is no incompatible change" ;-)
>
>
> > And if developers want to support Python ~3.9 and use -X
> > warn_default_encoding on 3.10, they need to write
> > `encoding=getattr(io, "LOCALE_ENCODING", None)`, as written in the
> > spec.
>
> Maybe repeat it in the Backward Compatibility section.
>
> It's important to provide a way to prevent the warning without losing
> the support for old Python versions.
>

Will do.

>
> > > The main question is if it's possible to use encoding="locale" on
> > > Python 3.6-3.9 (maybe using some ugly hacks).
> >
> > No.
>
> Hum. To write code compatible with Python 3.9, I understand that
> encoding=None is the closest to encoding="locale".
>
> And I understand that encoding=getattr(io, "LOCALE_ENCODING", None) is
> backward and forward compatible ;-)
>
> Well, encoding=None will hopefully remain accepted with your PEP
> anyway for lazy developers ;-)
>

Yes. I don't think this warning will be enabled by default in the near future. So developers can just use the option to find missing `encoding="utf-8"` bugs.

>
> > Oh, I'm sorry. I want to make it in 3.10.
>
> Since it doesn't change anything by default, the warning is only
> displayed when you opt-in for it, IMO Python 3.10 target is
> reasonable.
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.

-- Inada Naoki
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FZ567UQIEKO5IIVSQPUFCSZJOZBMYD4D/
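The "displayed by default, but not emitted by default" distinction discussed in this exchange can be observed from outside the interpreter. The sketch below assumes Python 3.10 or newer, where the option shipped as `-X warn_default_encoding` (one of the names floated in this thread). Older interpreters accept the unknown `-X` option silently and emit nothing. The helper name `run_child` and the file name are made up for the example.

```python
import os
import subprocess
import sys
import tempfile

# Child program: open a text file without the `encoding` argument.
CHILD = "import sys; open(sys.argv[1]).close()"


def run_child(path, extra_args=()):
    """Run CHILD in a fresh interpreter and return what it wrote to stderr."""
    env = dict(os.environ)
    # Drop settings that would change warning behaviour in the child.
    env.pop("PYTHONWARNDEFAULTENCODING", None)
    env.pop("PYTHONWARNINGS", None)
    cmd = [sys.executable, *extra_args, "-c", CHILD, path]
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stderr


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "empty.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8"):
        pass

    # Without the option, the warning is never emitted.
    default_stderr = run_child(path)

    # With the option, the warning is emitted and, because the default
    # filters do not hide EncodingWarning, it is also displayed.
    opt_in_stderr = run_child(path, ["-X", "warn_default_encoding"])
```

No `-W` filter is needed in the second run, which is the practical difference from DeprecationWarning that the message describes.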
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 2, 2021 at 5:40 AM Inada Naoki wrote:
> > In Python 3.10, I added _locale._get_locale_encoding() function which
> > is exactly what the encoding used by open() when no encoding is
> > specified (encoding=None) and when os.device_encoding(fd) returns
> > None. See _Py_GetLocaleEncoding() for the C implementation
> > (Python/fileutils.c).
> >
> > Maybe we should add a public locale.get_locale_encoding() function? On
> > Unix, this function uses nl_langinfo(CODESET) *without* setting
> > LC_CTYPE locale to the user preferred locale.
> >
>
> I cannot imagine any use case. Isn't it just confusing?

It's the same as locale.getpreferredencoding(False) but with a more explicit name, no argument and a *sane default behavior* (don't change the LC_CTYPE locale temporarily).

The use case is to pass text to the OS (or get text from the OS) when you cannot pass text directly, but must encode it (or decode it) manually. Not all use cases involve files ;-)

Example of locale.getpreferredencoding() usage:

* XML ElementTree uses locale.getpreferredencoding() when encoding="unicode" is used
* Deprecated gettext functions use it to encode to bytes
* the cgi module uses it to encode the URL query string for the CGI stdin (GET and HEAD methods)

I dislike getpreferredencoding() because by default it temporarily changes the LC_CTYPE locale, which affects all threads, and this is bad.

Well, it doesn't have to be part of the PEP ;-)

> > I understand that encoding=locale.get_locale_encoding() would be
> > different from encoding="locale":
> > encoding=locale.get_locale_encoding() doesn't call
> > os.device_encoding(), right?
> >
>
> Yes.

Would it be useful to add a io.get_locale_encoding(fd)->str (maybe "get_default_encoding"?) function which gives the chosen encoding from a file descriptor, similar to open(fd, encoding="locale").encoding? The os.device_encoding() call is not obvious.

> > Maybe the PEP should also explain (in a "How to teach this" section?)
> > when encoding="locale" is better than a specific encoding, like
> > encoding="utf-8" or encoding="cp1252". In my experience, it's mostly
> > for the interoperability with other applications which also use the
> > current locale encoding.
>
> This option is for experts who are publishing cross-platform
> libraries, frameworks, etc.
>
> For students, I am suggesting another idea that makes UTF-8 mode more
> accessible.

Maybe just say that in "How to teach this" section in the PEP?

In case of doubt, pass encoding="utf-8". Only use encoding="locale" if you understand that the encoding changes depending on the platform and the user locale.

The common issue with encoding="locale" is that files should not be exchanged between two computers. encoding="locale" is good for files which remain local. It's also good for interoperability with other applications which use the locale encoding and with the terminal.

> > > Opt-in warning
> > > ---
> > >
> > > Although ``DeprecationWarning`` is suppressed by default, emitting
> > > ``DeprecationWarning`` always when ``encoding`` option is omitted
> > > would be too noisy.
> >
> > The PEP is not very clear. Does "-X warn_encoding" only emit the
> > warning, or does it also display it by default? Does it add a warning
> > filter for EncodingWarning?
> >
>
> This section is not the spec. This section is the rationale for adding
> EncodingWarning instead of using DeprecationWarning.
>
> As the spec says, EncodingWarning is a subclass of Warning. So it is
> displayed by default. But it is not emitted by default.
>
> When -X encoding_warning (or -X warn_default_encoding) is used, the
> warning is emitted and shown unless the user suppresses warnings.

I understand that EncodingWarning is always displayed by default (default warning filters don't ignore it, whereas DeprecationWarning are ignored by default), but no warning is emitted by default. Ok, that makes sense. Maybe try to say it explicitly in the PEP.

> This PEP doesn't have "backward compatibility" section because the PEP
> doesn't break any backward compatibility.

IMO it's a good thing to always have the section, just to say that you took time to think about backward compatibility ;-) The section can be empty, like just say "there is no incompatible change" ;-)

> And if developers want to support Python ~3.9 and use -X
> warn_default_encoding on 3.10, they need to write
> `encoding=getattr(io, "LOCALE_ENCODING", None)`, as written in the
> spec.

Maybe repeat it in the Backward Compatibility section.

It's important to provide a way to prevent the warning without losing the support for old Python versions.

> > The main question is if it's possible to use encoding="locale" on
> > Python 3.6-3.9 (maybe using some ugly hacks).
>
> No.

Hum. To write code compatible with Python 3.9, I understand that encoding=None is the closest to encoding="locale".

And I understand that encoding=getattr(io, "LOCALE_ENCODING", None) is backward and forward
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
On Tue, Feb 2, 2021 at 12:23 AM Victor Stinner wrote:
>
> Hi Inada-san,
>
> I followed the discussions on your different PEPs and I like overall
> your latest PEP :-) I have some minor remarks.
>
> On Mon, Feb 1, 2021 at 6:55 AM Inada Naoki wrote:
> > The warning is disabled by default. New ``-X warn_encoding``
> > command-line option and ``PYTHONWARNENCODING`` environment variable
> > are used to enable the warnings.
>
> Maybe "warn implicit encoding" or "warn omit encoding" (not sure if
> it makes sense written like that in English ;-)) would be more
> explicit.
>

Yes, it's explicit. So I used `PYTHONWARNDEFAULTENCODING` first. But I feel it's unreadable. That's why I shortened the option name.

I will wait for more feedback about naming.

>
> > Options to enable the warning
> > --
> >
> > ``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
> > environment variable are added. They are used to enable the
> > ``EncodingWarning``.
> >
> > ``sys.flags.encoding_warning`` is also added. The flag represents
> > ``EncodingWarning`` is enabled.
>
> Nitpick: I would prefer using the same name for the -X option and the
> sys.flags attribute (ex: sys.flags.warn_encoding).
>

OK, I will change the flag name to be the same as the option name.

>
> > ``encoding="locale"`` option
> >
> >
> > ``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means
> > same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
> > emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
>
> Can you please define if os.device_encoding(fd) is called if
> encoding="locale" is used? It seems so, but it's not obvious from the
> PEP.
>

OK.

>
> In Python 3.10, I added _locale._get_locale_encoding() function which
> is exactly what the encoding used by open() when no encoding is
> specified (encoding=None) and when os.device_encoding(fd) returns
> None. See _Py_GetLocaleEncoding() for the C implementation
> (Python/fileutils.c).
>
> Maybe we should add a public locale.get_locale_encoding() function? On
> Unix, this function uses nl_langinfo(CODESET) *without* setting
> LC_CTYPE locale to the user preferred locale.
>

I cannot imagine any use case. Isn't it just confusing?

> I understand that encoding=locale.get_locale_encoding() would be
> different from encoding="locale":
> encoding=locale.get_locale_encoding() doesn't call
> os.device_encoding(), right?
>

Yes.

>
> Maybe the PEP should also explain (in a "How to teach this" section?)
> when encoding="locale" is better than a specific encoding, like
> encoding="utf-8" or encoding="cp1252". In my experience, it's mostly
> for the interoperability with other applications which also use the
> current locale encoding.
>

This option is for experts who are publishing cross-platform libraries, frameworks, etc.

For students, I am suggesting another idea that makes UTF-8 mode more accessible.

>
> > Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> > be used to avoid confusing ``LookupError: unknown encoding: locale``
> > error when the code is run in old Python accidentally.
>
> I'm not sure that it is useful. I like a simple "locale" literal
> string. If there is a constant in io, people may start to think that
> it's specific and will add "import io" just to get the string
> "locale".
>
> I don't think that we should care too much about the error message
> raised by old Python versions.
>

This constant is not only for replacing the "locale" literal. As in the example code in the PEP, it can be used to test whether TextIOWrapper supports `encoding="locale"`.

`open(fn, encoding=getattr(io, "LOCALE_ENCODING", None))` works both for Python ~3.9 and Python 3.10~.

>
> > Opt-in warning
> > ---
> >
> > Although ``DeprecationWarning`` is suppressed by default, emitting
> > ``DeprecationWarning`` always when ``encoding`` option is omitted
> > would be too noisy.
>
> The PEP is not very clear. Does "-X warn_encoding" only emit the
> warning, or does it also display it by default? Does it add a warning
> filter for EncodingWarning?
>

This section is not the spec. This section is the rationale for adding EncodingWarning instead of using DeprecationWarning.

As the spec says, EncodingWarning is a subclass of Warning. So it is displayed by default. But it is not emitted by default.

When -X encoding_warning (or -X warn_default_encoding) is used, the warning is emitted and shown unless the user suppresses warnings.

>
> The PEP has no "Backward compatibility" section. Is it possible to
> monkey-patch Python to implement this PEP (maybe only partially) on
> old Python versions? I'm asking to prepare existing projects for
> future EncodingWarning.
>

This PEP doesn't have a "backward compatibility" section because the PEP doesn't break any backward compatibility. Unless the option is enabled, no warnings are emitted by the PEP, like the `-b` option and BytesWarning.

And if developers want to support Python ~3.9
[Python-Dev] Re: PEP 597: Add optional EncodingWarning
Hi Inada-san,

I followed the discussions on your different PEPs and I like overall your latest PEP :-) I have some minor remarks.

On Mon, Feb 1, 2021 at 6:55 AM Inada Naoki wrote:
> The warning is disabled by default. New ``-X warn_encoding``
> command-line option and ``PYTHONWARNENCODING`` environment variable
> are used to enable the warnings.

Maybe "warn implicit encoding" or "warn omit encoding" (not sure if it makes sense written like that in English ;-)) would be more explicit.

> Options to enable the warning
> --
>
> ``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
> environment variable are added. They are used to enable the
> ``EncodingWarning``.
>
> ``sys.flags.encoding_warning`` is also added. The flag represents
> ``EncodingWarning`` is enabled.

Nitpick: I would prefer using the same name for the -X option and the sys.flags attribute (ex: sys.flags.warn_encoding).

> ``encoding="locale"`` option
>
>
> ``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means
> same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
> emit ``EncodingWarning`` when ``encoding="locale"`` is specified.

Can you please define if os.device_encoding(fd) is called if encoding="locale" is used? It seems so, but it's not obvious from the PEP.

In Python 3.10, I added _locale._get_locale_encoding() function which is exactly what the encoding used by open() when no encoding is specified (encoding=None) and when os.device_encoding(fd) returns None. See _Py_GetLocaleEncoding() for the C implementation (Python/fileutils.c).

Maybe we should add a public locale.get_locale_encoding() function? On Unix, this function uses nl_langinfo(CODESET) *without* setting LC_CTYPE locale to the user preferred locale.

I understand that encoding=locale.get_locale_encoding() would be different from encoding="locale": encoding=locale.get_locale_encoding() doesn't call os.device_encoding(), right?

Maybe the PEP should also explain (in a "How to teach this" section?) when encoding="locale" is better than a specific encoding, like encoding="utf-8" or encoding="cp1252". In my experience, it's mostly for the interoperability with other applications which also use the current locale encoding.

By the way, I recently rewrote the documentation about the encodings used by Python:

* https://docs.python.org/dev/glossary.html#term-locale-encoding
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_encoding
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.stdio_encoding
* https://docs.python.org/dev/library/os.html#utf8-mode

> Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> be used to avoid confusing ``LookupError: unknown encoding: locale``
> error when the code is run in old Python accidentally.

I'm not sure that it is useful. I like a simple "locale" literal string. If there is a constant in io, people may start to think that it's specific and will add "import io" just to get the string "locale".

I don't think that we should care too much about the error message raised by old Python versions.

> Opt-in warning
> ---
>
> Although ``DeprecationWarning`` is suppressed by default, emitting
> ``DeprecationWarning`` always when ``encoding`` option is omitted
> would be too noisy.

The PEP is not very clear. Does "-X warn_encoding" only emit the warning, or does it also display it by default? Does it add a warning filter for EncodingWarning?

The PEP has no "Backward compatibility" section. Is it possible to monkey-patch Python to implement this PEP (maybe only partially) on old Python versions? I'm asking to prepare existing projects for future EncodingWarning.

The main question is if it's possible to use encoding="locale" on Python 3.6-3.9 (maybe using some ugly hacks).

By the way, your PEP has no target Python version ;-) Do you want to get it in Python 3.10 or 3.11?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YEYN2ZE4AYWFJJDTQZPJHTCXOPA5MGO5/