Re: [Python-ideas] Fix default encodings on Windows

Nick Coghlan Fri, 19 Aug 2016 00:35:29 -0700

On 19 August 2016 at 08:05, Chris Barker <chris.bar...@noaa.gov> wrote:
> On Thu, Aug 18, 2016 at 6:23 AM, Steve Dower <steve.do...@python.org> wrote:
>>
>> "You consistently ignore Makefiles, .ini, etc."
>>
>> Do people really do open('makefile', 'rb'), extract filenames and try to
>> use them without ever decoding the file contents?
>
>
> I'm sure they do :-(
>
> But this has always confused me - back in the Python 2 "good old days" text
> and binary mode were exactly the same on *nix -- so folks sometimes fell
> into the trap of opening binary files as text on *nix, and then having it fail
> on Windows. But I can't imagine why anyone would have done the opposite.
>
> So in porting to py3, they would have had to *add* that 'b' (and a bunch of
> b'filename') to keep the good old bytes-is-text interface.
>
> Why would anyone do that?


For a fair amount of *nix-centric code that primarily works with ASCII
data, adding the 'b' prefix is the easiest way to get into the common
subset of Python 2 & 3.
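As a rough illustration of the bytes-only style described above (a hypothetical sketch, not code from any project discussed in the thread), the following scans a Makefile opened in binary mode for `SRC =` lines and checks the listed files using bytes paths, never decoding anything. The `SRC =` line format and the helper names `makefile_sources` and `missing_sources` are assumptions made for the example.

```python
import os
import tempfile


def makefile_sources(makefile_path):
    """Return the filenames listed on 'SRC =' lines, as bytes.

    Hypothetical helper: the Makefile is opened in binary mode and its
    contents are never decoded, so the filenames stay as bytes.
    """
    sources = []
    with open(makefile_path, 'rb') as f:
        for line in f:
            if line.startswith(b'SRC ='):
                sources.extend(line[len(b'SRC ='):].split())
    return sources


def missing_sources(makefile_path):
    """Return listed sources that do not exist next to the Makefile.

    Hypothetical helper: os.path functions return bytes when given bytes.
    """
    base = os.path.dirname(makefile_path)
    return [name for name in makefile_sources(makefile_path)
            if not os.path.exists(os.path.join(base, name))]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        base = os.fsencode(tmp)  # work with a bytes path throughout
        makefile = os.path.join(base, b'Makefile')
        with open(makefile, 'wb') as f:
            f.write(b'SRC = main.c util.c\n')
        open(os.path.join(base, b'main.c'), 'wb').close()
        print(makefile_sources(makefile))  # [b'main.c', b'util.c']
        print(missing_sources(makefile))   # [b'util.c']
```

The same source runs unchanged on Python 2 and 3, which is the attraction; the catch is that it treats filenames as opaque bytes throughout.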

However, this means that such code is currently relying on deprecated
functionality on Windows, and if we actually followed through on the
deprecation with feature removal, Steve's expectation (which I agree
with) is that many affected projects would just drop Windows support
entirely, rather than changing their code to use str instead of bytes
(at least under Python 3 on Windows).
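For code that does want to move from bytes to str paths, one low-effort option (a sketch, not something proposed in the thread) is to normalise at the boundary with `os.fsdecode()`, which decodes bytes using the filesystem encoding and error handler and returns str unchanged. On POSIX systems that error handler is `surrogateescape`, so even undecodable bytes round-trip through `os.fsencode()`. The helper name `as_text_path` is hypothetical.

```python
import os


def as_text_path(path):
    """Return `path` as str, decoding bytes with the filesystem encoding.

    Hypothetical helper: str input is returned unchanged, bytes input is
    decoded via os.fsdecode(), and os.PathLike objects are accepted too.
    """
    return os.fsdecode(path)


if __name__ == '__main__':
    print(as_text_path(b'build/main.c'))  # build/main.c
    if os.name == 'posix':
        raw = b'caf\xe9.txt'  # Latin-1 bytes, not valid UTF-8
        # surrogateescape maps undecodable bytes to lone surrogates,
        # so the original bytes can be recovered exactly.
        assert os.fsencode(as_text_path(raw)) == raw
```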

The end result of Steve's proposed changes should be that such code
would typically do the right thing across all of Mac OS X, Linux and
Windows, as long as the latter two are configured to use "utf-8" as
their default locale encoding or active code page (respectively).
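To check which encodings a given Python process is actually using, the standard library exposes both settings. The sketch below reports them and normalises codec aliases with `codecs.lookup()`, so that names like `UTF-8`, `utf8` and `utf_8` compare equal; the function names and report format are illustrative assumptions.

```python
import codecs
import locale
import sys


def is_utf8(encoding_name):
    """True if `encoding_name` is an alias of the plain UTF-8 codec."""
    try:
        return codecs.lookup(encoding_name).name == 'utf-8'
    except LookupError:  # unknown codec name on this platform
        return False


def encoding_report():
    """Return (name, is_utf8) for the path and text-file encodings."""
    fs = sys.getfilesystemencoding()  # used for str <-> bytes paths
    # Typically the default open() uses in text mode when no encoding is given.
    text = locale.getpreferredencoding(False)
    return {
        'filesystem': (fs, is_utf8(fs)),
        'locale': (text, is_utf8(text)),
    }


if __name__ == '__main__':
    for kind, (name, utf8) in encoding_report().items():
        print(f'{kind}: {name} (UTF-8: {utf8})')
```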

Linux and Windows would both still have cases, such as SSH environment
variable forwarding and East Asian system configurations, that can
result in mojibake. On Linux these challenges come up mainly in network
communications; on Windows, mainly in local file processing.
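Mojibake is what happens when bytes written in one encoding are decoded with another. A minimal illustration, using cp1252 (a common Western European Windows code page) as the mismatched codec; the function name is hypothetical, and `errors='replace'` is used so undecodable bytes show up as U+FFFD instead of raising.

```python
def mojibake(text, written_as='utf-8', read_as='cp1252'):
    """Encode `text` one way and decode it another, as a mismatched
    locale or code page would."""
    return text.encode(written_as).decode(read_as, errors='replace')


if __name__ == '__main__':
    print(mojibake('café'))  # cafÃ©
    # The reverse mismatch: a single cp1252 byte is not valid UTF-8.
    print(repr(mojibake('café', written_as='cp1252', read_as='utf-8')))  # 'caf\ufffd'
```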

The reason I like Steve's proposal is that it gets us to a better
baseline situation for cross-platform compatibility (including with
the CLR and JVM API models), and replaces the status quo with three
smaller, as-yet-unsolved problems:

- network protocol interoperability on Linux systems configured with a
non-UTF-8 locale
- system access on Linux servers with a forwarded SSH environment that
doesn't match the server settings
- processing file contents on Windows systems with an active code page
other than UTF-8
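For the third item, code that needs predictable results regardless of the active code page can sidestep the default entirely by naming the encoding whenever it opens a text file. A sketch, where the helper names and the choice of UTF-8 as the on-disk format are assumptions:

```python
import os
import tempfile


def write_notes(path, lines):
    """Write lines as UTF-8 text, independent of the locale or code page."""
    # newline='\n' also prevents '\r\n' translation on Windows.
    with open(path, 'w', encoding='utf-8', newline='\n') as f:
        for line in lines:
            f.write(line + '\n')


def read_notes(path):
    """Read the lines back, again naming the encoding explicitly."""
    with open(path, encoding='utf-8') as f:
        return f.read().splitlines()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'notes.txt')
        write_notes(path, ['naïve', '東京'])
        print(read_notes(path))  # ['naïve', '東京']
```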

For Linux, our answer is basically "UTF-8 is really the only system
locale that works properly for other reasons, so we'll essentially
wait for non-UTF-8 Linux systems to slowly age out of humanity's
collective IT infrastructure".

For Windows, our preliminary answer is the same as for Linux, which is
why Stephen is concerned by the proposal: it reduces the incentive for
folks to support Windows *properly* by switching to modeling paths as
text the way pathlib does.

However, it seems to me that those higher level pathlib APIs are the
best way to encourage future code to be more Windows friendly: they
sweep a lot of these messy low level concerns under the API rug, so
more Python 3 native code will use str paths by default, with bytes
paths mainly showing up in Python 2/3 compatible code bases and some
optimised data processing code.
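As a sketch of the pathlib style (not code from the thread), here is the Makefile-scanning task from the quoted question written with str-based `pathlib.Path` objects and an explicitly decoded file. The `SRC =` line format, the UTF-8 assumption for the Makefile, and the function name are all illustrative assumptions.

```python
import tempfile
from pathlib import Path


def makefile_sources(makefile):
    """Return Path objects for files listed on 'SRC =' lines of a Makefile.

    The file contents are decoded as UTF-8 (an assumption about the
    Makefile), so every path handled here is str-based.
    """
    makefile = Path(makefile)
    sources = []
    for line in makefile.read_text(encoding='utf-8').splitlines():
        if line.startswith('SRC ='):
            names = line[len('SRC ='):].split()
            sources.extend(makefile.parent / name for name in names)
    return sources


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        makefile = Path(tmp) / 'Makefile'
        makefile.write_text('SRC = main.c util.c\n', encoding='utf-8')
        (Path(tmp) / 'main.c').touch()
        missing = [p.name for p in makefile_sources(makefile) if not p.exists()]
        print(missing)  # ['util.c']
```

The encoding question doesn't disappear, but it is confined to one explicit `read_text()` call rather than spread across every path operation.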

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
