On Fri, Jun 8, 2018 at 3:10 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2018-06-07 08:45, Chris Angelico wrote:
>> Under Linux, a file name contains bytes, most commonly representing
>> UTF-8 sequences. So... an ASCIIZ string *can* contain that character,
>> or at least a representation of it. Yet it cannot contain "\0".
>>
> I've seen a variation of UTF-8 that encodes U+0000 as 2 bytes so that a zero
> byte can be used as a terminator.
>
> It's therefore not impossible to have a version of Linux that allowed a
> (Unicode) "\0" in a filename.

Considering that Linux treats filenames as raw bytes, that's not
surprising. The mangled encoding you refer to is a horrendous cheat,
though, and violates several of the design principles of UTF-8, so I do
not recommend it EVER.

The correct way for Python to handle and represent such a file name
would be to use the U+DCxx range (the "surrogateescape" error handler)
to carry the undecodable bytes through unchanged - not to use "\0".

ChrisA
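
To illustrate the U+DCxx point, here is a minimal sketch. It assumes the
"variation of UTF-8" is the overlong two-byte form 0xC0 0x80 for U+0000,
and the file name bytes are made up for the example. It decodes with the
"surrogateescape" error handler explicitly rather than relying on the
platform's filesystem encoding.

```python
# Hypothetical file name bytes containing the overlong encoding of
# U+0000 (0xC0 0x80). This byte pair is an assumption for illustration.
raw = b"foo\xc0\x80bar"

# A strict UTF-8 decoder rejects the overlong form outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN, so no "\0" ever appears in the resulting str.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = name.encode("utf-8", "surrogateescape")

print(ascii(name))
print(round_trip == raw)
```

The str is only a carrier for the bytes: it round-trips losslessly, but
it cannot be encoded as strict UTF-8 until the surrogates are dealt with.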