On Sat, Jan 11, 2020 at 2:30 AM Andrew Barnert <abarn...@yahoo.com> wrote:
>
> On Jan 10, 2020, at 03:45, Inada Naoki <songofaca...@gmail.com> wrote:
> >
> > Hi, all.
> >
> > I believe UTF-8 should be chosen by default for text encoding.
>
> Correct me if I’m wrong, but I think in Python 3.7 on Windows 10, the 
> filesystem encoding is already UTF-8, and the stdio console files are UTF-8 
> (but under the covers actually wrap the native UTF-16 console APIs instead of 
> using msvcrt stdio), so the only issue is the locale encoding, right?
>

You're right.  The locale encoding is still used by default in many places.  Some examples:

* Opening text files: open("README.md")
* Pipe in text mode: subprocess.check_output(["ls", "-l"], text=True)
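
As a rough illustration of the point above (a sketch, not part of the proposal itself): open() without an encoding= argument falls back to the locale encoding, so code that needs UTF-8 today has to say so explicitly.  The helper name roundtrip_utf8 is made up for this example.

```python
import locale
import os
import tempfile

# Encoding that open() falls back to when no encoding= argument is given.
# On a legacy Windows setup this is typically a code page such as cp1252;
# with UTF-8 mode enabled it reports UTF-8 instead.
default_encoding = locale.getpreferredencoding(False)


def roundtrip_utf8(text):
    """Write and read back text with an explicit encoding (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Passing encoding="utf-8" makes the result independent of the locale.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```

Printing default_encoding on the machine in question shows which encoding a bare open("README.md") would use there.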

> Also, PYTHONUTF8 is only supported on Unix, so presumably it’s ignored if you 
> set it on Windows, right? If so, you need to also add support for it, not 
> just set it in the installer.

PYTHONUTF8 is supported on Windows already.
You can use "set PYTHONUTF8=1" to enable UTF-8 mode.
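
One way to check this yourself, as a minimal sketch: start a child interpreter with PYTHONUTF8 set and read back sys.flags.utf8_mode.  The function name child_utf8_mode is only for illustration; it assumes Python 3.7 or later, where sys.flags.utf8_mode exists.

```python
import os
import subprocess
import sys


def child_utf8_mode(value):
    """Run a child interpreter with PYTHONUTF8=value and return its utf8_mode flag."""
    # Copy the current environment and override only PYTHONUTF8.
    env = dict(os.environ, PYTHONUTF8=value)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
    )
    # The child prints a plain integer, so ASCII decoding is sufficient.
    return int(out.decode("ascii").strip())
```

Calling child_utf8_mode("1") and child_utf8_mode("0") lets you compare the two settings on the same machine.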


> One last thing: On Linux, you often use the locale coercion feature instead 
> of the assume-UTF-8 feature. (For example, if you’re running a subprocess and 
> want to ensure its stdout is UTF-8…) Is there an equivalent issue for 
> Windows, or a very different but equally important one that needs to be 
> solved differently, or is there just nothing relevant here?
>

On Windows, there is no way to ensure that a subprocess uses UTF-8.

* Some applications always use UTF-8.
* Some applications always use the legacy encoding.
* Some applications check GetConsoleOutputCP.  (CLI only)
* Some applications have their own setting for stdout encoding.  (e.g.
PowerShell Core)
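
Since the parent cannot force the child's encoding, one possible workaround (a sketch under the assumption that you know which encoding a given tool emits) is to capture bytes and decode them yourself.  run_and_decode and DEMO_CMD are names invented for this example.

```python
import subprocess
import sys


def run_and_decode(args, encoding, errors="strict"):
    """Capture a child's stdout as raw bytes and decode with a caller-chosen encoding."""
    # No text=True here: the pipe stays binary, so the locale encoding is not involved.
    raw = subprocess.run(args, stdout=subprocess.PIPE, check=True).stdout
    return raw.decode(encoding, errors)


# Demo child that writes the UTF-8 bytes for "caf\u00e9" directly to stdout,
# standing in for an application that always uses UTF-8.
DEMO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'caf\\xc3\\xa9')",
]
```

For a tool that always writes the legacy code page, you would pass that code page's name as the encoding instead; the caller still has to know which case applies.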


> > * Windows 10 (1903) adds per-process option to change active code page
> > to UTF-8 and call the system code page "legacy".
>
> If you do that, won’t Python 3.7 already use UTF-8 for the locale, because 
> the active code page is what it sets the startup value to match?

I don't do that.  And I don't think we should do this:

* It can be used only on Windows 10 1903 and later.
* Setting a manifest is harder than setting an environment variable.  It
is too difficult to opt in or out.
* It makes the "mbcs" encoding UTF-8 too.  There is no way to use the
legacy encoding explicitly.

So I think UTF-8 mode is better than this Windows feature.


-- 
Inada Naoki  <songofaca...@gmail.com>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Z6FWDRORXVWIYMEPTZEM3UC3UUYMQJFD/
Code of Conduct: http://python.org/psf/codeofconduct/
