On Sat, Jan 11, 2020 at 2:30 AM Andrew Barnert <abarn...@yahoo.com> wrote:
>
> On Jan 10, 2020, at 03:45, Inada Naoki <songofaca...@gmail.com> wrote:
> >
> > Hi, all.
> >
> > I believe UTF-8 should be chosen by default for text encoding.
>
> Correct me if I’m wrong, but I think in Python 3.7 on Windows 10, the
> filesystem encoding is already UTF-8, and the stdio console files are UTF-8
> (but under the covers actually wrap the native UTF-16 console APIs instead of
> using msvcrt stdio), so the only issue is the locale encoding, right?
>
You're right. The locale encoding is used by default in many places.
Some examples:

* Opening text files: open("README.md")
* Pipes in text mode: subprocess.check_output(["ls", "-l"], text=True)

> Also, PYTHONUTF8 is only supported on Unix, so presumably it’s ignored if you
> set it on Windows, right? If so, you need to also add support for it, not
> just set it in the installer.

PYTHONUTF8 is already supported on Windows. You can use "set PYTHONUTF8=1"
to enable UTF-8 mode.

> One last thing: On Linux, you often use the locale coercion feature instead
> of the assume-UTF-8 feature. (For example, if you’re running a subprocess and
> want to ensure its stdout is UTF-8…) Is there an equivalent issue for
> Windows, or a very different but equally important one that needs to be
> solved differently, or is there just nothing relevant here?
>

On Windows, there is no way to ensure that a subprocess uses UTF-8.

* Some applications always use UTF-8.
* Some applications always use the legacy encoding.
* Some applications check GetConsoleOutputCP. (CLI only)
* Some applications have their own setting for stdout encoding.
  (e.g. PowerShell Core)

> > * Windows 10 (1903) adds per-process option to change active code page
> >   to UTF-8 and call the system code page "legacy".
>
> If you do that, won’t Python 3.7 already use UTF-8 for the locale, because
> the active code page is what it sets the startup value to match?

I don't do that. And I don't think we should do this:

* It can be used only on Windows 10 1903 and later.
* Setting a manifest is harder than setting an environment variable.
  It is too difficult to opt in or out.
* It changes the "mbcs" encoding to UTF-8 too. There is no way to use the
  legacy encoding explicitly.

So I think UTF-8 mode is better than this Windows feature.
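
To make the points about the locale encoding and PYTHONUTF8 concrete, here is a
minimal sketch (not part of any proposal) that reports the encodings an
interpreter falls back to and checks that a child Python process honours
PYTHONUTF8. The helper names text_defaults and child_utf8_mode are made up for
this illustration; it assumes Python 3.7 or later, where sys.flags.utf8_mode
exists.

```python
import locale
import os
import subprocess
import sys


def text_defaults():
    """Report the encodings this interpreter uses when none is given."""
    return {
        # True when UTF-8 mode is on (PYTHONUTF8=1 or -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # What open() and text-mode pipes use without an explicit encoding.
        "locale_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


def child_utf8_mode(enabled):
    """Run a child Python with PYTHONUTF8 set; return its sys.flags.utf8_mode."""
    env = dict(os.environ, PYTHONUTF8="1" if enabled else "0")
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        encoding="utf-8",  # explicit, so the parent's locale does not matter
    )
    return int(out.strip())


if __name__ == "__main__":
    print(text_defaults())
    print("child with PYTHONUTF8=1:", child_utf8_mode(True))
    print("child with PYTHONUTF8=0:", child_utf8_mode(False))
```

This only controls a child that is itself Python. As noted above, other
Windows applications choose their output encoding on their own, so passing an
explicit encoding= when reading their output is still a guess about what they
emit.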
--
Inada Naoki <songofaca...@gmail.com>