On 19 August 2016 at 08:05, Chris Barker <chris.bar...@noaa.gov> wrote:
> On Thu, Aug 18, 2016 at 6:23 AM, Steve Dower <steve.do...@python.org> wrote:
>>
>> "You consistently ignore Makefiles, .ini, etc."
>>
>> Do people really do open('makefile', 'rb'), extract filenames and try to
>> use them without ever decoding the file contents?
>
> I'm sure they do :-(
>
> But this has always confused me - back in the python2 "good old days" text
> and binary mode were exactly the same on *nix -- so folks sometimes fell
> into the trap of opening binary files as text on *nix, and then it failing
> on Windows but I can't image why anyone would have done the opposite.
>
> So in porting to py3, they would have had to *add* that 'b' (and a bunch of
> b'filename') to keep the good old bytes is text interface.
>
> Why would anyone do that?

For a fair amount of *nix-centric code that primarily works with ASCII
data, adding the 'b' prefix is the easiest way to get into the common
subset of Python 2 & 3.

However, this means that such code is currently relying on deprecated
functionality on Windows. If we actually followed through on the
deprecation with feature removal, Steve's expectation (which I agree
with) is that many affected projects would just drop Windows support
entirely, rather than changing their code to use str instead of bytes
(at least under Python 3 on Windows).

The end result of Steve's proposed changes should be that such code
would typically do the right thing across all of Mac OS X, Linux and
Windows, as long as the latter two are configured to use "utf-8" as
their default locale encoding or active code page (respectively).

Linux and Windows would both still have situations that can result in
mojibake, such as ssh environment variable forwarding and East Asian
system configurations. These challenges come up mainly with network
communications on Linux, and with local file processing on Windows.

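As an illustration of the two styles being discussed, here is a minimal
sketch. The helper names are hypothetical and are not taken from Steve's
proposal or from any project mentioned in this thread; the sketch only
assumes the standard library's os, tempfile and pathlib modules.

```python
import os
import tempfile
from pathlib import Path


def list_names_as_bytes(directory):
    """Bytes in, bytes out: the style common in Python 2/3 compatible *nix code."""
    # os.fsencode() converts a str path to bytes using the filesystem encoding.
    return sorted(os.listdir(os.fsencode(directory)))


def list_names_as_text(directory):
    """Text paths via pathlib: the same code for every platform."""
    return sorted(entry.name for entry in Path(directory).iterdir())


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Makefile").write_text("all:\n", encoding="utf-8")
    bytes_names = list_names_as_bytes(tmp)
    text_names = list_names_as_text(tmp)
    # os.fsdecode() is the bridge from the bytes API back to str.
    round_tripped = [os.fsdecode(name) for name in bytes_names]
```

With an ASCII-only name such as "Makefile", both helpers agree once the
bytes result is passed through os.fsdecode(). The differences discussed
in this thread only show up for non-ASCII names on systems whose
filesystem encoding is not UTF-8.
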
The reason I like Steve's proposal is that it gets us to a better
baseline situation for cross-platform compatibility (including with the
CLR and JVM API models), and replaces the status quo with three smaller
as yet unsolved problems:

- network protocol interoperability on Linux systems configured with a
  non UTF-8 locale
- system access on Linux servers with a forwarded SSH environment that
  doesn't match the server settings
- processing file contents on Windows systems with an active code page
  other than UTF-8

For Linux, our answer is basically "UTF-8 is really the only system
locale that works properly for other reasons, so we'll essentially wait
for non-UTF-8 Linux systems to slowly age out of humanity's collective
IT infrastructure".

For Windows, our preliminary answer is the same as the situation on
Linux, which is why Stephen's concerned by the proposal - it reduces the
incentive for folks to support Windows *properly*, by switching to
modeling paths as text the way pathlib does.

However, it seems to me that those higher level pathlib APIs are the
best way to encourage future code to be more Windows friendly - they
sweep a lot of these messy low level concerns under the API rug, so more
Python 3 native code will use str paths by default, with bytes paths
mainly showing up in Python 2/3 compatible code bases and some optimised
data processing code.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia