On 16 May 2017, at 14:23, Hans Åberg via Unicode <[email protected]> wrote:
> 
> You don't. You have a filename, which is an octet sequence of unknown 
> encoding, and want to deal with it. Therefore, valid Unicode transformations 
> of the filename may result in it no longer being reachable.
> 
> It only matters that the correct octet sequence is handed back to the 
> filesystem. All current filesystems, as far as experts could recall, use octet 
> sequences at the lowest level; whatever encoding is used is built in a layer 
> above. 

HFS(+), NTFS and VFAT long filenames are all encoded in some variation on 
UCS-2/UTF-16.  FAT 8.3 names are also encoded, but the encoding isn’t specified 
(more specifically, MS-DOS and Windows assume an encoding based on your locale, 
which could cause all kinds of fun if you swapped disks with someone from a 
different country, and IIRC there are some shenanigans for Japan because of the 
use of 0xe5 as a deleted file marker).  There are some less widely used 
filesystems that require a particular encoding also (BeOS’ BFS used UTF-8, for 
instance).
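
To make the FAT 8.3 problem concrete, here is a minimal Python sketch of 
decoding the 11-byte short-name field of a directory entry. The function name 
decode_short_name and the sample bytes are made up for illustration; the code 
page is a parameter precisely because the disk doesn't record it. The 0x05 
handling reflects my understanding of the Japanese workaround: a name whose 
real first byte is 0xe5 is stored with 0x05 there instead, so it isn't 
mistaken for a deleted entry.

```python
def decode_short_name(raw: bytes, codepage: str) -> str:
    """Decode an 11-byte FAT 8.3 name field using a caller-supplied code page.

    Illustrative only: the code page is a guess made by the reader of the
    disk, since the filesystem itself does not store it.
    """
    if len(raw) != 11:
        raise ValueError("8.3 name field must be exactly 11 bytes")
    if raw[0] == 0xE5:
        # 0xE5 in the first byte marks a deleted directory entry.
        raise ValueError("deleted entry")
    if raw[0] == 0x05:
        # 0x05 stands in for a genuine leading 0xE5 byte.
        raw = b"\xe5" + raw[1:]
    base = raw[:8].decode(codepage).rstrip(" ")
    ext = raw[8:].decode(codepage).rstrip(" ")
    return f"{base}.{ext}" if ext else base


# The same on-disk bytes read under two different locale code pages.
entry = b"\x80BC     TXT"
us_name = decode_short_name(entry, "cp437")  # US/Western European DOS
ru_name = decode_short_name(entry, "cp866")  # Russian DOS
```

Here us_name starts with a Latin C-cedilla and ru_name with a Cyrillic A, 
although the bytes on disk are identical.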

Also, Mac OS X and iOS use UTF-8 at the BSD layer; if a filesystem is in use 
whose names can’t be converted to UTF-8, the Darwin kernel uses a percent 
encoding scheme(!)
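
For a feel of what such a scheme looks like, here is a rough Python sketch. 
To be clear, this is not Darwin's actual algorithm, whose details I haven't 
reproduced here; percent_escape_invalid is a hypothetical helper that simply 
replaces each byte that isn't part of valid UTF-8 with %XX.

```python
def percent_escape_invalid(raw: bytes) -> str:
    """Turn an arbitrary byte-string name into valid Unicode text.

    Hypothetical illustration, not the kernel's real scheme: bytes that
    do not form valid UTF-8 are written out as %XX.
    """
    # surrogateescape maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, which makes the bad bytes easy to find afterwards.
    text = raw.decode("utf-8", errors="surrogateescape")
    return "".join(
        f"%{ord(ch) - 0xDC00:02X}" if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )


latin1_name = percent_escape_invalid(b"caf\xe9.txt")            # Latin-1 bytes
utf8_name = percent_escape_invalid("café.txt".encode("utf-8"))  # already UTF-8
```

Note that this toy version doesn't escape a literal "%", so it can't be 
reversed unambiguously; a real scheme would have to deal with that.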

It looks like Apple has changed its mind for APFS and is going with the “bag of 
bytes” approach that’s typical of other systems; at least, that’s what it 
appears to have done on iOS.
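
One consequence of the "bag of bytes" model, sketched in Python below: two 
names that Unicode regards as canonically equivalent are simply different 
octet sequences, so a filesystem that compares bytes will treat them as 
different files. The sample name is made up; whether a given system 
normalises names before they reach the filesystem is a separate question 
that this sketch doesn't address.

```python
import unicodedata


def name_octets(name: str, form: str) -> bytes:
    """Return the UTF-8 octets of a name after Unicode normalisation."""
    return unicodedata.normalize(form, name).encode("utf-8")


# A-with-ring: one code point in NFC, "A" plus a combining ring in NFD.
nfc = name_octets("\u00c5.txt", "NFC")
nfd = name_octets("\u00c5.txt", "NFD")

# A byte-comparing filesystem sees two unrelated names here.
same_on_disk = nfc == nfd
```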

Kind regards,

Alastair.

--
http://alastairs-place.net

