On 3/17/2021 7:34 PM, Ivan Pozdeev via Python-Dev wrote:
On 17.03.2021 20:30, Steve Dower wrote:
On 3/17/2021 8:00 AM, Michał Górny wrote:
How about writing paths as bytestrings in the long term? I think this
should eliminate the necessity of knowing the correct encoding for
the filesystem.
That's what we're trying to do, the problem is that they start as
strings, and so we need to convert them to a bytestring.
That conversion is the encoding ;)
And yeah, for reading, I'd use a UTF-8 reader that falls back to
locale on failure (and restarts reading the file). But for writing, we
need the tools that create these files (including Notepad!) to use the
encoding we want.
I don't see a problem with using a file encoding specification like in
Python source files.
Since site.py is under our control, we can introduce it easily.
We can opt to allow only UTF-8 here -- then we wait out a transitional
period and disallow anything else than UTF-8 (then the specification can
be removed, too).
The only thing we can introduce *easily* is an error when the
(exclusively third-party) tools that create them aren't up to date.
Getting everyone to specify the encoding we want is a much bigger
problem with a much slower solution.
This particular file is probably the worst case scenario, but preferring
UTF-8 and handling existing files with a fallback is the best we can do
(especially since an assumption of UTF-8 can be invalidated on a
particular file, whereas most locale encodings cannot). Once we openly
document that it should be UTF-8, tools will have a chance to catch up,
and eventually the fallback will become harmless.
Cheers,
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5B53GCQNYXFBYAHSJKI6I34XAV6S67HN/
Code of Conduct: http://python.org/psf/codeofconduct/