On Mon, 25 Jan 2021 at 20:02, Christopher Barker <python...@gmail.com> wrote:
> using a system setting as a default is a really bad idea in this day of
> interconnected computers.

I'd mildly dispute this. There are (significant) downsides with the default behaviour being system-dependent, yes, but there are *also* disadvantages in having Python not behave consistently with other tools/programs on the same system.

However, on POSIX, things are generally consistent, and *already* default to UTF-8. So the proposal is mostly going to affect Windows. And on Windows, there's not much consistency even on a single machine at the moment. Between OEM and ANSI codepages, and other tools that default to UTF-8 "because that's the future", there's not much platform consistency for Python to conform to anyway...

> But back to PEP 597, and how to get there:
>
> 1) We need to start with a consensus about where we want Python to be in N
> versions. That is not specifically laid out in the PEP but it does imply that
> in the sometime-long-in-the-future:
>
> - TextIOWrapper will have utf-8 as the default, rather than
> `locale.getpreferredencoding(False)`
> this behaviour will then be inherited by:
> - `open()` without a binary flag in the mode
>
> - `Path.read_text`
> - there will be a string that can be passed to encoding that will indicate
> that the system default should be used.
>
> (and any other utility functions that use TextIOWrapper)
>
> Forgive me if there is already a consensus on this -- but this discussion has
> brought up some thoughts.

There's a fundamental assumption here that I think needs to be made explicit. Which is that we're assuming that whatever N happens to be, we anticipate that `locale.getpreferredencoding(False)` will still be something other than UTF-8. That's *already* false on most POSIX systems, and TBH I get the impression that Microsoft is pushing quite hard to move Windows 10 to a UTF-8 by default position (although "fast" in Microsoft terms may still be slow to the rest of us ;-))

So I think that the real question here is "do we want to move Python to "UTF8-by-default" faster than the OS vendors are going?"
And I think that the answer to that is much less obvious. It probably also depends heavily on your locale - I doubt it's an accident that Inada-san¹ is proposing this, and he's from Japan :-)

Personally, as an English speaker based in the UK, I'll be happy when UTF-8 is the default everywhere, but I can live with the status quo until that happens. But I'm not the main target for this change.

> 1) As TextIOWrapper is an "implementation detail" for most Python developers,
> maybe it shouldn't have a default encoding at all, and leave the default
> implementation(s) up to the helper functions, like open() and
> Path.read_text() -- that would mean changes in more places, but would allow
> different utility functions to make different choices.

*shrug*. That sounds plausible, but it's a backward compatibility break that doesn't offer any significant benefits, so I suspect it's not worth doing in practice.

> 2) Inada proposed an open_text() function be introduced as a stepping stone,
> with the new behaviour. This led to one person asking if that would imply an
> open_binary() function as well. An answer to that was no -- as no one is
> suggesting any changes to open()'s behavior for binary files.
> However, I kind of like the idea. We now have two (at least) different file
> objects potentially returned by open(): TextIOWrapper, and
> BufferedReader/Writer. And the TextIOWrapper has some pretty different
> behavior. I *think* that in virtually all cases, when the code is written,
> the author knows whether they want a binary or text file, so it may make
> sense to have two different open() functions, rather than having the Type
> returned be a function of what mode flags are passed.
>
> This would make it easier for people (and tools) to reason about the code
> with static analysis:
>
> e.g.:
>
> open_text().read() would return a string
> open_binary().read() would return bytes

These are good arguments for having explicit open_text and open_binary functions.
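
For concreteness, here is a minimal sketch of what such a pair of helpers might look like. To be clear, open_text() and open_binary() are hypothetical names taken from this thread, not existing builtins, and the UTF-8 default and the mode checks are my own assumptions rather than anything PEP 597 specifies.

```python
import os
import tempfile
from typing import BinaryIO, TextIO


def open_text(path, mode: str = "r", encoding: str = "utf-8", **kwargs) -> TextIO:
    """Hypothetical helper: always returns a text-mode file object."""
    if "b" in mode:
        raise ValueError("open_text() does not accept binary modes")
    # Passing encoding explicitly means the locale default is never consulted.
    return open(path, mode, encoding=encoding, **kwargs)


def open_binary(path, mode: str = "rb", **kwargs) -> BinaryIO:
    """Hypothetical helper: always returns a binary-mode file object."""
    if "t" in mode:
        raise ValueError("open_binary() does not accept text modes")
    if "b" not in mode:
        mode += "b"  # allow "r"/"w" as shorthand for "rb"/"wb"
    return open(path, mode, **kwargs)


def demo():
    """Write one non-ASCII character, then read it back through both helpers."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open_binary(path, "w") as f:
            f.write("\u00e9".encode("utf-8"))
        with open_text(path) as f:
            text = f.read()  # str, decoded as UTF-8
        with open_binary(path) as f:
            raw = f.read()  # bytes, untouched
    return text, raw
```

With a split like this, a reader or a type checker can tell from the function name alone whether read() gives back str or bytes, without having to inspect the mode string.
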
I don't *like* the idea, because they feel unnecessarily verbose to me, but I can accept that this might just be because I'm used to open(). I do think that having open_text, but *not* having open_binary, would be a bit confusing. Particularly as pathlib has read_text and read_bytes, so it would be inconsistent as well.

> This would also make the path to a future with different defaults smoother --
> plain "open" gets deprecated -- any new code uses one of the open_*
> functions, and that new code will never need to be changed again.
>
> Back in the day, a single open() function made more sense. After all, the
> only difference in the result for binary mode was that linefeed translation
> was turned off (and the C legacy of course). In fact, this did lead to
> errors, when folks accidentally left off the 'b', and tested only on *nix
> systems. That, at least, is less of an issue now; as the text and binary
> objects are more different, you are far more likely to get errors right away
> -- but still at run time -- static analysis is still tricky.

This, on the other hand, I'm unequivocally against. The sheer quantity of breakage that would be caused by deprecating open() makes this a complete non-starter. Even if we only "deprecate in documentation", we'd be invalidating huge amounts of advice, books and training materials.

> On to:
>
>> Path.open() was added in Python 3.4. Path.read_text() and
>> Path.write_text() was added in Python 3.5.
>> Their history is shorter than built-in open(). Changing its default
>> encoding should be easier than built-in open and TextIOWrapper.
>> New default encodings are:
>>
>> * read_text() default encoding is "utf-8-sig"
>> * write_text() default encoding is "utf-8"
>> * open() default encoding is "utf-8-sig" when mode is "r" or None,
>> "utf-8" otherwise.
>
>> How do you think this idea?
>
> +1 there is a lot less legacy with Path -- we can move faster.
> And I honestly still wonder if making utf-8 the default will cause or fix
> more bugs :-)

But having open(filename) do something different than Path(filename).open() seems like it's asking for trouble. It would be a source of a lot of unexpected bugs for people migrating from filenames as strings to pathlib, and the *last* thing you want during a migration is having to track down unexpected behavioural differences you hadn't planned for.

> A thought on that -- there is currently both kinds of code "in the wild":
> (A) code that uses the default, when they really want utf-8 -- currently a
> bug, won't be a bug in the future.
> (B) code that uses the default when it really does want the system encoding.
> -- currently correct, will become a bug in the future
>
> It's anyone's guess which of these is more common, but one thing to consider
> is that (A) is a hidden bug that might reveal itself in the hands of end
> users who knows when in the future. Whereas (B) will be a bug that is likely
> to reveal itself fairly quickly (though perhaps also in the (confused) hands
> of end users as well)

There's also (C) code that uses the default, where that default is already UTF-8. Which is probably most non-Windows systems. Those have no bug, and this change will make no difference to them.

Also, (A) is "currently a bug, won't be a bug when the system encoding switches to UTF-8", whereas (B) is "currently correct, will remain correct when the system default becomes UTF-8". So switching Python's default can be seen as:

(A) removes an existing bug a bit sooner.
(B) introduces a bug which will go away again when the system switches to UTF-8 or the user changes their code.
(C) makes no difference.

Frankly, I don't think there's a good answer here, and there will likely be as many opinions as there are participants in the discussion.
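
The three cases can be written down as a small function, if only to make the bookkeeping explicit. This is an illustrative sketch and nothing more: is_utf8() and classify() are names I made up, and wants_utf8 stands for the author's intent, which no tool can actually read off the code.

```python
import codecs
import locale


def is_utf8(encoding_name: str) -> bool:
    """True if the name is UTF-8 under any of its spellings ("utf8", "UTF-8", ...)."""
    return codecs.lookup(encoding_name).name == "utf-8"


def classify(system_encoding: str, wants_utf8: bool) -> str:
    """Map code that relies on the default encoding to case A, B or C."""
    if is_utf8(system_encoding):
        return "C"  # default is already UTF-8: no bug now, no change later
    if wants_utf8:
        return "A"  # existing hidden bug, removed by a UTF-8 default
    return "B"  # correct today, broken by a UTF-8 default


if __name__ == "__main__":
    # Report which case code on this machine falls into, for both intents.
    current = locale.getpreferredencoding(False)
    print(current, classify(current, True), classify(current, False))
```

On most Linux and macOS machines I'd expect this to report "C" for both intents. On a Windows machine with a legacy ANSI code page it would report "A" and "B", which is the population the proposal actually affects.
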

Paul

¹ I'm not 100% clear on what the polite form of address is for Japanese names, please let me know if I should be using a different form :-)
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VKDWSFDU4WTP3BTPO3LQKVQQDKGOPWDU/