Dave Botsch wrote:
(1) The same text in Unicode can be represented by different sequences of characters. As a result, you could have client A and client B both create a file with the same name that cannot be visually distinguished by the end user. Now which one do you open?

If everyone is using UTF-8 encoding, does the above problem still exist? And by UTF-8, I mean the "real" UTF-8 encoding, not one of the many variants (which would mean that the OpenAFS client would have to do some translation)?
UTF-8 is simply an encoding of Unicode. The problem is with Unicode itself. I suggest you read the Unicode specification, and in particular Annex #15, Unicode Normalization Forms: http://unicode.org/reports/tr15/
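To make the normalization point concrete, here is a minimal Java sketch (Java only because its standard library ships a normalizer; any Unicode-aware language would show the same thing): two different code-point sequences render as the same "café", compare unequal, and only compare equal after both are normalized per UAX #15.

    import java.text.Normalizer;

    public class NormalizationDemo {
        public static void main(String[] args) {
            // The same visual name written two ways:
            // precomposed U+00E9 vs. plain 'e' followed by combining acute U+0301
            String composed   = "caf\u00E9";
            String decomposed = "cafe\u0301";

            System.out.println(composed.equals(decomposed));    // false: different code points
            System.out.println(composed.length());              // 4
            System.out.println(decomposed.length());            // 5

            // After normalizing both to NFC (UAX #15) they compare equal
            String a = Normalizer.normalize(composed,   Normalizer.Form.NFC);
            String b = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
            System.out.println(a.equals(b));                     // true
        }
    }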
(2) Since the directory lookups are performed using a hash table, a file with the name being searched for might exist but it cannot be found because the input to the hash function on client B is different than the input used to create the entry on client A.

It would be different because of #1 above?
No.
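Put differently: the lookup misses whenever the octets handed to the directory hash differ, whatever the reason they differ (normalization form, client charset, ...). A small Java sketch of the effect, with java.util.Arrays.hashCode standing in for whatever hash the directory format actually uses (that substitution is purely illustrative):

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class HashInputDemo {
        public static void main(String[] args) {
            // Client A created "café" with a Latin-1 locale; client B looks it up as UTF-8
            byte[] createdByA  = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1); // 63 61 66 E9
            byte[] lookedUpByB = "caf\u00E9".getBytes(StandardCharsets.UTF_8);      // 63 61 66 C3 A9

            // Different octet sequences, therefore different hash inputs: the lookup fails
            System.out.println(Arrays.equals(createdByA, lookedUpByB));                       // false
            System.out.println(Arrays.hashCode(createdByA) == Arrays.hashCode(lookedUpByB));  // false here
        }
    }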
Storing file names as opaque octet sequences is broken in other ways. Depending on the character set used on the client, the file name might or might not be representable, since the octet sequence contains no indication whether the sequence is CP437, CP850, CP1252, ISO Latin-1, ISO Latin-9, UTF-7, UTF-8, etc.

So, if we know what sequence we're using...? How do local filesystems handle this? I might very well create a file on an ext3-formatted USB key with my locale set to ISO Latin-1, then try to access it from another box with the charset set to UTF-16 (or something completely different). Or maybe I named the file using some non-Arabic character set?
They do not handle it.
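A Java sketch of what "they do not handle it" means in practice: the four octets a Latin-1 client stores for the name "café" carry no charset tag, so any other client can only guess how to decode them (the charsets below are just the guaranteed-available ones, chosen for illustration):

    import java.nio.charset.StandardCharsets;

    public class OpaqueOctetsDemo {
        public static void main(String[] args) {
            // Octets a Latin-1 client would store for the name "café": 63 61 66 E9
            byte[] name = {0x63, 0x61, 0x66, (byte) 0xE9};

            // Nothing in the octets says which charset produced them
            System.out.println(new String(name, StandardCharsets.ISO_8859_1)); // café
            System.out.println(new String(name, StandardCharsets.UTF_8));      // caf + U+FFFD: 0xE9 is malformed UTF-8
            System.out.println(new String(name, StandardCharsets.UTF_16BE));   // two unrelated CJK characters
        }
    }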
I've read that Java uses a "modified UTF-8" which can produce 6 instead of 4 octets per character... how does this not prevent other applications from accessing the files on the local filesystem?
There is nothing "modified" about a UTF-8 encoding that requires up to 6 octets; the original UTF-8 definition already permitted sequences of up to 6 octets per character.
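For what it's worth, what Java's "modified UTF-8" actually changes is the representation of NUL and of supplementary characters, and as far as I know it only appears in DataOutputStream.writeUTF/readUTF and JNI strings, not in file names handed to the operating system through the normal file APIs. A Java sketch, using U+1F600 as an arbitrary example of a supplementary character:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class ModifiedUtf8Demo {
        public static void main(String[] args) throws IOException {
            // One supplementary (non-BMP) character, held in Java as a surrogate pair
            String s = new String(Character.toChars(0x1F600));

            // Standard UTF-8: a single 4-octet sequence
            System.out.println(s.getBytes(StandardCharsets.UTF_8).length);   // 4

            // Modified UTF-8 (writeUTF): each surrogate is encoded separately, 3 + 3 = 6 octets
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeUTF(s);
            System.out.println(buf.size() - 2);                              // 6, after the 2-octet length prefix
        }
    }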