Re: [OpenAFS-devel] AFS vs UNICODE

Jeffrey Altman Tue, 06 May 2008 14:02:10 -0700

Roland Kuhn wrote:

Well, certainly. But I find it very irritating that a filesystem should somehow interpret and _change_ a filename based on the assumption of UTF-8 encoding, even if the filename's byte sequence happens to conform to the UTF-8 rules. Why bother? It's much easier and much more portable to regard filenames as opaque byte sequences.


If you do not canonicalize the form that is written to the file server
directory entry there are two problems:

(1) The same text in Unicode can be represented by different sequences of characters. As a result you could have client A and client B both create a file with the same name that cannot be visually distinguished by the end user. Now which one do you open?

(2) Since the directory lookups are performed using a hash table, a file with the name being searched for might exist but it cannot be found because the input to the hash function on client B is different than the input used to create the entry on client A.
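Both problems can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: a plain dict stands in for the hashed directory, NFC is picked as the canonical form purely for illustration (nothing above says which form is used), and `canonical`, the file names, and the "vnode" values are hypothetical.

```python
import unicodedata

# Two spellings of the same visible name: precomposed U+00E9 versus
# "e" followed by the combining acute accent U+0301.
name_a = "caf\u00e9.txt"   # what client A might send
name_b = "cafe\u0301.txt"  # what client B might send

raw_a = name_a.encode("utf-8")
raw_b = name_b.encode("utf-8")


def canonical(name: str) -> bytes:
    """Hypothetical canonicalization step: normalize to NFC, then encode."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


# Problem (1): without canonicalization, both clients can create an entry,
# and the two entries look identical to the end user.
directory_raw = {raw_a: "vnode-1", raw_b: "vnode-2"}  # dict stands in for the directory hash table
duplicate_count = len(directory_raw)

# Problem (2): an entry created by client A is not found by client B,
# because the bytes fed to the hash differ.
directory_a_only = {raw_a: "vnode-1"}
found_raw = directory_a_only.get(raw_b)

# With canonicalization applied on both create and lookup, the keys agree.
directory_canon = {canonical(name_a): "vnode-1"}
found_canon = directory_canon.get(canonical(name_b))

print(duplicate_count, found_raw, found_canon)
```

The point of the sketch is only that the create path and the lookup path must agree on a single form before hashing; which normalization form is chosen matters less than applying it consistently.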

Storing file names as opaque octet sequences is broken in other ways. Depending on the character set used on the client, the file name might or might not be representable, since the octet sequence contains no indication whether the sequence is CP437, CP850, CP1252, ISO Latin-1, ISO Latin-9, UTF-7, UTF-8, etc.
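The ambiguity is easy to demonstrate by decoding one stored octet sequence under several of the character sets listed above. The following is a sketch, not a description of any AFS client: the name `raw` and the helper `try_decode` are hypothetical, and the codec names are the ones Python's standard library uses.

```python
# One stored name, with no record of which character set produced it.
raw = b"caf\xe9.txt"  # hypothetical directory entry


def try_decode(data: bytes, codec: str):
    """Return the decoded text, or None if the bytes are invalid in that codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


codecs = ["cp437", "cp850", "cp1252", "latin-1", "iso8859-15", "utf-8"]
decoded = {codec: try_decode(raw, codec) for codec in codecs}

for codec, text in decoded.items():
    print(f"{codec:>10}: {text!r}")
```

The same bytes come out as different text under the DOS code pages than under CP1252 or Latin-1, and they are not valid UTF-8 at all, so a UTF-8 client has no faithful way to display the name.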



_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
