On Mar 10, 2020, at 08:01, David Mertz <me...@gnosis.cx> wrote:
>> Most real-world UNIX systems only support ASCII-compatible encodings.
>> There's no reason not to solve the problem on such systems by using
>> os.fsdecode().
>
> Huh?!
>
> Is my Ubuntu derivative not "real world"?
>
> 666-tmp % uname -a
> Linux popkdm 5.3.0-7629-generic #31~1581628825~19.10~f90b7d5-Ubuntu SMP Fri
> Feb 14 19:56:45 UTC x86_64 x86_64 x86_64 GNU/Linux
> 667-tmp % touch ✗—Not-ASCII
> 668-tmp % ls ✗*
> ✗—Not-ASCII

Technically, your Ubuntu derivative is not a real-world UNIX system, because
it’s not a UNIX system. Only a handful of Linux distros bother to be certified,
because it’s not worth the cost unless you need to sell to a corporate or
government department whose regulations require UNIX.

And practically, I’m pretty sure that filename is UTF-8, which is
ASCII-compatible: every byte from 0 to 127 always means the same thing as it
does in ASCII, and no byte in that range ever appears as part of a multi-byte
character. This means you can, e.g., do path.split(os.pathsep.encode('ascii'))
and know you’re getting the right behavior. The same thing works for Latin-1
and friends, the IBM code pages in the “extended ASCII” group, and so on. Those
are the kinds of things Random was presumably talking about, because they are
commonly used in real-world UNIX systems.
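
To make the ASCII-compatibility point concrete, here is a minimal sketch. The
PATH-style value and the helper names (split_raw, round_trips) are made up for
illustration, and the separator is hard-coded to the POSIX value of os.pathsep
so the result does not depend on the platform running it.

```python
# Hypothetical PATH-style value; the directory name is made up for illustration.
SEP = ":"  # the value of os.pathsep on POSIX, hard-coded so this runs the same anywhere
TEXT = "/home/café/bin:/usr/bin"


def split_raw(raw):
    """Split an encoded PATH-style bytes value on the ASCII separator byte."""
    return raw.split(SEP.encode("ascii"))


def round_trips(encoding):
    """True if splitting the bytes gives the same pieces as splitting the text."""
    raw = TEXT.encode(encoding)
    pieces = [part.decode(encoding) for part in split_raw(raw)]
    return pieces == TEXT.split(SEP)


# UTF-8, Latin-1, and an IBM "extended ASCII" code page all keep ':' as 0x3A.
for name in ("utf-8", "latin-1", "cp437"):
    print(name, round_trips(name))
```

The bytes-level split agrees with the text-level split for each of these
encodings because none of them ever reuses the separator byte inside another
character.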

There are also things that are not ASCII-compatible but are close. For example,
in Shift-JIS, a couple of low bytes have a different meaning than in ASCII
(0x5C is the yen sign and 0x7E is an overline), and many of the low bytes can
also appear as the second byte of a 2-byte character. But ASCII NUL and slash
still always mean NUL and slash, so you can use it for your Linux filesystems.
(Although you will have a lot of trouble in the shell, because your backslash
escape is now a yen escape, and 64 other characters invisibly have that same
byte as their second byte.)
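
The Shift-JIS claims can be checked with a rough sketch like the one below. It
assumes Python's built-in shift_jis codec is a fair stand-in for what a real
system would use, it only scans the Basic Multilingual Plane, and the helper
name shift_jis_trail_bytes is made up.

```python
def shift_jis_trail_bytes():
    """Collect every second byte used by a 2-byte character in Python's shift_jis codec."""
    trail = set()
    for cp in range(0x80, 0x10000):
        try:
            raw = chr(cp).encode("shift_jis")
        except UnicodeEncodeError:
            continue  # not representable in Shift-JIS (this also skips surrogates)
        if len(raw) == 2:
            trail.add(raw[1])
    return trail


TRAIL = shift_jis_trail_bytes()
print("slash used as a second byte:", 0x2F in TRAIL)
print("NUL used as a second byte:", 0x00 in TRAIL)
print("0x5C used as a second byte:", 0x5C in TRAIL)

# A made-up filename whose first character ends in the 0x5C byte.
name = "表.txt".encode("shift_jis")
print(name, b"\\" in name, b"/" in name)
```

So splitting raw Shift-JIS paths on the slash byte is safe, while anything that
treats 0x5C as an escape character is not.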

Things that are not even that ASCII-compatible include UTF-16, EBCDIC code
pages, 80s Atari encoding, etc. They are not commonly used in real-world UNIX
systems, which I think was Random’s point.
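
For contrast, a small sketch of how the slash fares in those encodings. It uses
cp037 as one representative EBCDIC code page (my choice, not something named
above), and slash_survives is a made-up helper.

```python
def slash_survives(encoding):
    """True if '/' encodes to the single ASCII slash byte 0x2F in this encoding."""
    return "/".encode(encoding) == b"/"


for name in ("utf-8", "shift_jis", "utf-16-le", "cp037"):
    raw = "a/b".encode(name)
    # The last column shows whether the encoded path contains a NUL byte,
    # which a POSIX filesystem API would treat as the end of the name.
    print(name, raw, slash_survives(name), b"\x00" in raw)
```

UTF-16 pads every ASCII character with a NUL byte, and the EBCDIC code page
puts the slash at a different byte value altogether, so neither can be used
for byte-level path handling on a POSIX system.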
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/TQB2FHMQ76DWXJ6S7RCBEGA3264IERQ3/