On Sat, Jan 23, 2021 at 9:22 PM Inada Naoki <songofaca...@gmail.com> wrote:

> On Sun, Jan 24, 2021 at 10:17 AM Guido van Rossum <gu...@python.org>
> wrote:
> >
> > I have definitely seen BOMs written by Notepad on Windows 10.
> >
> > Why can’t the future be that open() in text mode guesses the encoding?
>
> I don't like guessing. As a Japanese, I have seen many mojibake caused
> by the wrong guess.
> I don't think guessing the encoding is a good part of reliable software.
>

I agree that guessing encodings in general is a bad idea and is an avenue
for subtle localization issues - bad things will happen when it guesses
wrong, and it will lead to code that works properly on the developer's
machine and fails for end users. It makes sense for a text editor to try to
guess, because showing the user something is better than nothing (and if it
guesses wrong the user can easily see that, and perhaps take some manual
action to correct it). It does not make sense for a programming language to
guess, because the user cannot easily detect or correct an incorrect guess,
and mistakes will tend to be propagated rather than caught.

> On the other hand, if we add `open_utf8()`, it's easy to ignore BOM:
>

Rather than introducing a new `open_utf8` function, I'd suggest the
following:

1. Deprecate calling `open` in text mode (the default) without an explicit
`encoding=`, and 3 years after the deprecation change the default
encoding for `open` to "utf-8-sig" for reading and "utf-8" for writing (to
ignore a BOM if one exists when reading, but not to create a BOM when
writing).
2. At the same time as the deprecation is announced, introduce a new
__future__ import named "utf8_open" or something like that, to opt into the
future behavior of `open` defaulting to utf-8-sig or utf-8 when opening a
file in text mode and no explicit encoding is specified.
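
To make the proposed defaults concrete, here is a rough sketch of how they
might behave. The helper name `open_text` is hypothetical and exists only for
illustration; it is not an existing or proposed API. It relies only on the
"utf-8-sig" and "utf-8" codecs that Python already ships.

```python
import codecs
import os
import tempfile


def open_text(path, mode="r", encoding=None, **kwargs):
    """Hypothetical stand-in for the proposed future default of open()."""
    if "b" in mode:
        raise ValueError("open_text() is for text mode only")
    if encoding is None:
        # Assumption: only pure read modes ("r", "rt") get the BOM-tolerant
        # codec; anything that can write ("w", "a", "r+") gets plain utf-8.
        read_only = "r" in mode and "+" not in mode
        encoding = "utf-8-sig" if read_only else "utf-8"
    return open(path, mode, encoding=encoding, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notepad.txt")

    # Simulate a file saved with a leading UTF-8 BOM.
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + "héllo".encode("utf-8"))

    # Plain "utf-8" keeps the BOM as a U+FEFF character in the text.
    with open(path, encoding="utf-8") as f:
        plain = f.read()

    # "utf-8-sig" silently drops the BOM when reading.
    with open_text(path) as f:
        tolerant = f.read()

    # Writing through the helper uses plain "utf-8", so no BOM is emitted.
    with open_text(path, "w") as f:
        f.write("héllo")
    with open(path, "rb") as f:
        raw = f.read()
```

Reading with plain "utf-8" leaves a stray U+FEFF at the start of the string,
which is the failure the "utf-8-sig" read default is meant to avoid.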

I think a __future__ import solves the problem better than introducing a
new function would. Users who already have a UTF-8 locale (the majority of
users on the majority of platforms) could simply turn on the new __future__
import in any files where they're calling open() with no change in
behavior, suppressing the deprecation warning. Users who have a non-UTF-8
locale and want to keep opening text files in that non-UTF-8 locale by
default can add encoding=locale.getpreferredencoding(False) to retain the
old behavior, suppressing the deprecation warning. And perhaps we could
make a shortcut for that, like encoding="locale".
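
For users in the second group, the opt-out might look roughly like the sketch
below. It uses only the existing `locale.getpreferredencoding(False)` call; the
file name and contents are made up for illustration, and the
`encoding="locale"` shortcut is not used because it is only a suggestion here.

```python
import locale
import os
import tempfile

# Ask for the locale's preferred encoding explicitly, so the code no longer
# depends on whatever open() happens to default to.
legacy_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "legacy.txt")  # illustrative file name

    # Passing encoding= explicitly keeps the old locale-based behavior.
    with open(path, "w", encoding=legacy_encoding) as f:
        f.write("plain ASCII text\n")

    with open(path, encoding=legacy_encoding) as f:
        round_tripped = f.read()
```

The value of `legacy_encoding` depends on the machine (for example, a Windows
code page or "UTF-8"), which is exactly why spelling it out is useful.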

~Matt
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UACU527OLD6DLI5URTMALWVOSPEKKADA/
Code of Conduct: http://python.org/psf/codeofconduct/
