On 16 May 2017, at 14:23, Hans Åberg via Unicode <[email protected]> wrote:
>
> You don't. You have a filename, which is an octet sequence of unknown
> encoding, and want to deal with it. Therefore, valid Unicode transformations
> of the filename may result in it not being reachable.
>
> It only matters that the correct octet sequence is handed back to the
> filesystem. All current filesystems, as far as experts could recall, use octet
> sequences at the lowest level; whatever encoding is used is built in a layer
> above.

HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. FAT 8.3 names are also encoded, but the encoding isn’t specified (more specifically, MS-DOS and Windows assume an encoding based on your locale, which could cause all kinds of fun if you swapped disks with someone from a different country, and IIRC there are some shenanigans for Japan because of the use of 0xe5 as a deleted file marker). There are some less widely used filesystems that require a particular encoding also (BeOS’ BFS used UTF-8, for instance).

Also, Mac OS X and iOS use UTF-8 at the BSD layer; if a filesystem is in use whose names can’t be converted to UTF-8, the Darwin kernel uses a percent encoding scheme(!)

It looks like Apple has changed its mind for APFS and is going with the “bag of bytes” approach that’s typical of other systems; at least, that’s what it appears to have done on iOS.

Kind regards,

Alastair.

--
http://alastairs-place.net
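
P.S. On the 0xe5 shenanigans: as far as I understand Microsoft's FAT specification, a first name byte of 0xe5 marks a directory entry as deleted, and since 0xe5 is also a legitimate lead byte in the Japanese (Shift-JIS) code page, a name that really starts with 0xe5 is stored with 0x05 in its place. A rough Python sketch of that substitution follows; the function names are made up for illustration, and it ignores the other special first byte (0x00, end of directory) as well as everything else in the directory entry.

```python
DELETED_MARKER = 0xE5  # first byte of a deleted (free) directory entry
ESCAPED_E5 = 0x05      # stored in place of a genuine leading 0xE5


def store_short_name(raw: bytes) -> bytes:
    """Return the 11-byte 8.3 name field as it would be written to disk.

    Hypothetical helper: swaps a genuine leading 0xE5 for 0x05 so the
    entry is not mistaken for a deleted one.
    """
    if len(raw) != 11:
        raise ValueError("8.3 name field must be exactly 11 bytes")
    if raw[0] == DELETED_MARKER:
        return bytes([ESCAPED_E5]) + raw[1:]
    return raw


def load_short_name(field: bytes):
    """Return the original name bytes, or None if the entry is deleted.

    Hypothetical helper: reverses the 0x05 substitution on read.
    """
    if len(field) != 11:
        raise ValueError("8.3 name field must be exactly 11 bytes")
    if field[0] == DELETED_MARKER:
        return None
    if field[0] == ESCAPED_E5:
        return bytes([DELETED_MARKER]) + field[1:]
    return field


# A double-byte name whose lead byte happens to be 0xE5, padded to 8 + 3.
name = b"\xe5\x40" + b" " * 6 + b"TXT"
on_disk = store_short_name(name)
print(on_disk.hex(), load_short_name(on_disk) == name)
```

Note that none of this tells you which code page the remaining bytes are in; that is still guessed from the locale, which is the point made above.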

