Roland Kuhn wrote:

Well, certainly. But I find it very irritating that a filesystem should somehow interpret and _change_ a filename based on the assumption of UTF-8 encoding, even if the filename's byte sequence happens to conform to the UTF-8 rules. Why bother? It's much easier and much more portable to regard filenames as opaque byte sequences.

If you do not canonicalize the form that is written to the file server
directory entry, there are two problems:

(1) The same text in Unicode can be represented by different sequences of code points (for example, "é" as a single precomposed character, or as "e" followed by a combining accent). As a result, client A and client B could each create a file whose name the end user cannot visually distinguish from the other. Now which one do you open?

(2) Since directory lookups are performed using a hash table, a file with the name being searched for might exist but not be found, because the input to the hash function on client B differs from the input used to create the entry on client A.
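
To make both problems concrete, here is a rough Python sketch. It is only an illustration: ToyDirectory is a hypothetical stand-in for a directory, a Python dict stands in for the real directory hash table, and the choice of NFC as the canonical form is an assumption made for the example, not a statement about what the file server does.

```python
import unicodedata

# Two spellings of the same visible name "café".
NAME_A = "caf\u00e9"     # client A: precomposed U+00E9
NAME_B = "cafe\u0301"    # client B: "e" followed by combining acute U+0301


class ToyDirectory:
    """Hypothetical directory: maps name bytes to a file id via a hash table."""

    def __init__(self, canonicalize):
        self.canonicalize = canonicalize
        self.entries = {}  # dict stands in for the directory hash table

    def _key(self, name):
        # Assumption for this sketch: NFC is the canonical form.
        if self.canonicalize:
            name = unicodedata.normalize("NFC", name)
        return name.encode("utf-8")

    def create(self, name, file_id):
        self.entries[self._key(name)] = file_id

    def lookup(self, name):
        return self.entries.get(self._key(name))


# Without canonicalization: the lookup misses, and a second
# visually identical entry can be created.
raw = ToyDirectory(canonicalize=False)
raw.create(NAME_A, 1)
miss = raw.lookup(NAME_B)
raw.create(NAME_B, 2)
duplicate_count = len(raw.entries)

# With canonicalization: both spellings reach the same entry.
canon = ToyDirectory(canonicalize=True)
canon.create(NAME_A, 1)
hit = canon.lookup(NAME_B)
```

In the uncanonicalized directory the lookup from client B finds nothing, and the directory ends up holding two entries that look identical on screen. In the canonicalizing one, both spellings map to the same key.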

Storing file names as opaque octet sequences is broken in other ways. Depending on the character set used on the client, the file name might or might not be representable, since the octet sequence contains no indication of whether it is CP437, CP850, CP1252, ISO Latin-1, ISO Latin-9, UTF-7, UTF-8, etc.
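
As a small illustration of the ambiguity, the Python sketch below decodes one and the same stored octet under several of those character sets. The helper try_decode and the sample byte are made up for the example; the codec names are the ones Python's standard library uses.

```python
# One stored octet, as an opaque file name would be kept on the server.
raw_name = b"\xe9"


def try_decode(data, charset):
    """Return the decoded text, or None if the bytes are invalid in charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# Each client interprets the same octet according to its own character set.
decoded = {
    charset: try_decode(raw_name, charset)
    for charset in ("cp437", "cp850", "cp1252", "latin-1", "utf-8")
}

for charset, text in decoded.items():
    print(charset, repr(text))
```

The octet comes out as a different character depending on the client's code page, and as UTF-8 it is not a valid sequence at all, so nothing in the stored name tells a client which reading was intended.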



_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
