On Mar 10, 2020, at 08:01, David Mertz <me...@gnosis.cx> wrote:
>> Most real-world UNIX systems only support ASCII-compatible encodings. 
>> There's no reason not to solve the problem on such systems by using 
>> os.fsdecode().
> 
> Huh?!
> 
> Is my Ubuntu derivative not "real world"?
> 
> 666-tmp % uname -a
> Linux popkdm 5.3.0-7629-generic #31~1581628825~19.10~f90b7d5-Ubuntu SMP Fri 
> Feb 14 19:56:45 UTC  x86_64 x86_64 x86_64 GNU/Linux
> 667-tmp % touch ✗—Not-ASCII
> 668-tmp % ls ✗*
> ✗—Not-ASCII

Technically your Ubuntu derivative is not a real-world UNIX system, because 
it’s not a UNIX system. Only a handful of Linux distros bother to be certified, 
because it’s not worth the cost unless you need to sell to a corporate or 
government department that has a regulation requiring UNIX.

And practically, I’m pretty sure that’s UTF-8, which is ASCII-compatible: every 
byte from 0-127 always means the same thing as it does in ASCII. This means you 
can, e.g., do path.split(os.pathsep.encode('ascii')) and know you’re getting 
the right behavior. The same thing works for Latin-1 and friends, the IBM 
code pages in the “extended ASCII” group, and so on. Those are the kinds of 
things Random was presumably talking about, because they are commonly used in 
real-world UNIX systems.
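
To make that concrete, here is a minimal sketch of splitting a PATH-style 
byte string on an ASCII separator without knowing which ASCII-compatible 
encoding produced it. The helper name split_bytes and the sample paths are 
made up for illustration, and the separator is hardcoded to ":" rather than 
taken from os.pathsep so the result does not depend on the platform.

```python
def split_bytes(raw, sep=":"):
    """Split raw bytes on an ASCII separator (hypothetical helper)."""
    return raw.split(sep.encode("ascii"))

text = "/opt/café/bin:/usr/bin"

# The same text in two different ASCII-compatible encodings.
utf8_path = text.encode("utf-8")
latin1_path = text.encode("latin-1")

# The split lands in the same places either way, because the byte for ":"
# can only ever mean ":" in these encodings.
utf8_parts = split_bytes(utf8_path)
latin1_parts = split_bytes(latin1_path)

# In UTF-8, every byte of a non-ASCII character is >= 0x80, so it can
# never be mistaken for an ASCII separator.
non_ascii_bytes_are_high = all(b >= 0x80 for b in "✗".encode("utf-8"))
```

Decoding each piece afterwards with the right encoding gives back the 
original two directories in both cases.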

There are also things that are not ASCII-compatible but are close. For example, 
in Shift-JIS, a couple of low bytes have a different meaning than in ASCII, and 
many of them can also appear as part of a 2-byte character—but ASCII NUL and 
slash still always mean NUL and slash, so you can use it for your Linux 
filesystems. (Although you will have a lot of trouble in the shell, because 
your backslash escape is now a yen escape, and 64 other characters have the 
same byte invisibly as their second byte.)
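
A quick sketch of the Shift-JIS situation, using Python's shift_jis codec 
and two characters I picked as examples (katakana "ソ" and kanji "表"), both 
of which have 0x5C as their second byte. This only shows the byte layout; it 
is not a claim about what any particular shell does with it.

```python
so = "ソ".encode("shift_jis")     # second byte is 0x5C, the ASCII backslash
hyou = "表".encode("shift_jis")   # same trailing byte

# A naive byte-level scan sees a backslash inside each character.
so_contains_backslash = b"\\" in so
hyou_contains_backslash = b"\\" in hyou

# Slash (0x2F) never appears as part of a 2-byte character, so splitting
# a path on it is still safe.
path = "ソ/表".encode("shift_jis")
parts = path.split(b"/")
decoded_parts = [p.decode("shift_jis") for p in parts]
```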

Things that are not even that ASCII-compatible include UTF-16, EBCDIC code 
pages, 80s Atari encoding, etc.; they are not commonly used in real-world UNIX 
systems. Which I think was Random’s point.
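
For contrast, a small sketch of why those encodings do not work for byte 
paths, using Python's utf-16-le and cp037 codecs (cp037 is just one EBCDIC 
code page I picked as an example):

```python
# UTF-16 puts NUL bytes inside ordinary ASCII text...
utf16_name = "a/b".encode("utf-16-le")
utf16_has_nul = b"\x00" in utf16_name

# ...and a non-slash character can contain the slash byte 0x2F.
# U+012F (LATIN SMALL LETTER I WITH OGONEK) is the bytes 2F 01 in UTF-16-LE.
ogonek = "\u012f".encode("utf-16-le")
ogonek_has_slash_byte = b"/" in ogonek

# In EBCDIC the slash is not byte 0x2F at all; in cp037 it is 0x61,
# which is the byte ASCII uses for "a".
ebcdic_slash = "/".encode("cp037")
ascii_a_as_ebcdic = b"a".decode("cp037")
```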
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TQB2FHMQ76DWXJ6S7RCBEGA3264IERQ3/
Code of Conduct: http://python.org/psf/codeofconduct/