While implementing Unicode support for the Windows client, it has become clear that Unicode normalization issues have already bitten us. Erik Dalén pointed out to me earlier today that there are interoperability issues between MacOS X and Linux clients. MacOS X clients produce file names using Unicode Normalization Form D (NFD), whereas Linux clients use either Unicode Normalization Form C (NFC) or no normalization at all. The end result is that when a MacOS X client attempts to open a file created by a Linux client, MacOS X will create the resource fork file using NFD and then fail to open the file because the NFD-encoded name does not exist in the AFS directory hash table.
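
To make the mismatch concrete, here is a minimal Python sketch. It is not AFS code: a plain dict keyed by raw UTF-8 bytes stands in for the directory hash table, and the names `directory`, `lookup` and "vnode-1" are invented for illustration.

```python
import unicodedata

# The same visible file name in the two normalization forms.
name = "Dal\u00e9n.txt"                      # e-acute as one code point
nfc = unicodedata.normalize("NFC", name)     # what a Linux client might store
nfd = unicodedata.normalize("NFD", name)     # what a MacOS X client presents

# Hypothetical stand-in for a byte-wise directory lookup: the key is the
# raw UTF-8 byte string, so two spellings of one name are different keys.
directory = {nfc.encode("utf-8"): "vnode-1"}

def lookup(directory, raw_name):
    """Return the entry stored under exactly these bytes, or None."""
    return directory.get(raw_name)

found_nfc = lookup(directory, nfc.encode("utf-8"))
found_nfd = lookup(directory, nfd.encode("utf-8"))
print(found_nfc, found_nfd)
```

The NFC spelling is found and the NFD spelling is not, even though both render identically to the user.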

This is going to be fairly easy for the Windows client to adjust to because it already does not rely upon the AFS directory hash table for file name lookups. The Windows client cannot rely on the hash table because the hash table is case sensitive, whereas Windows lookups are case insensitive with a preference for exact matches. On Windows, the client will perform an NFC conversion on all names in the directory that are valid UTF-8 as part of its B+ tree construction process. It will also perform an NFC conversion on all symlink targets. Finally, all strings provided by the operating system will be NFC converted before performing the directory lookup or creating a symlink.
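
A rough Python sketch of that lookup strategy follows. It is only an illustration under stated assumptions: a dict of case-folded keys stands in for the client's B+ tree, and `nfc_if_utf8`, `fold_key`, `build_index` and `lookup` are hypothetical names, not functions from the Windows client.

```python
import unicodedata

def nfc_if_utf8(raw: bytes) -> bytes:
    """NFC-convert a name if it is valid UTF-8; otherwise leave it untouched."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw
    return unicodedata.normalize("NFC", text).encode("utf-8")

def fold_key(raw: bytes) -> bytes:
    """Case-insensitive key for a name; non-UTF-8 names are their own key."""
    try:
        folded = raw.decode("utf-8").casefold()
    except UnicodeDecodeError:
        return raw
    return unicodedata.normalize("NFC", folded).encode("utf-8")

def build_index(entries):
    """Index directory names (bytes) by case-folded key after NFC conversion."""
    index = {}
    for raw in entries:
        name = nfc_if_utf8(raw)
        index.setdefault(fold_key(name), []).append(name)
    return index

def lookup(index, os_name: str):
    """Case-insensitive lookup of an OS-provided name, preferring an exact match."""
    wanted = unicodedata.normalize("NFC", os_name).encode("utf-8")
    candidates = index.get(fold_key(wanted), [])
    if wanted in candidates:
        return wanted
    return candidates[0] if candidates else None

# Directory holding an NFD name, two names differing only in case,
# and one name that is not valid UTF-8.
entries = [
    unicodedata.normalize("NFD", "R\u00e9sum\u00e9.txt").encode("utf-8"),
    b"README",
    b"readme",
    b"\xff\xfe",
]
index = build_index(entries)
```

With this index, a lookup for "readme" returns the exact entry, while "ReadMe" falls back to the first case-insensitive candidate.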

The question is what to do about the file servers and about the clients on other platforms that do rely on the AFS directory hash table.

We want to use NFC because the resulting strings are shorter, and therefore the effective length of file names can be longer.
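
As a quick illustration of the size difference, the following Python snippet compares the UTF-8 byte lengths of one sample name in both forms. The sample name is arbitrary, and the saving depends on how many accented characters a name contains.

```python
import unicodedata

name = "Erik Dal\u00e9n"  # contains one accented character

# Length in bytes of the UTF-8 encoding under each normalization form.
nfc_len = len(unicodedata.normalize("NFC", name).encode("utf-8"))
nfd_len = len(unicodedata.normalize("NFD", name).encode("utf-8"))

print(nfc_len, nfd_len)  # prints "11 12"
```

In NFC the accented letter is one code point (two bytes in UTF-8); in NFD it is a base letter plus a combining accent (three bytes).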

Whatever we do, we are going to have an interop problem on MacOS X, but since upgrading MacOS X clients is so much easier than upgrading clients on other platforms, I suggest that we bite the bullet there.

Proposal:

  1. MacOS X and Linux clients begin to apply NFC to all UTF-8 strings
     obtained from the operating system whether for directory lookup,
     object creation, or symlink target creation.
  2. Implement NFC conversion in the Salvager.  This will apply to all
     names in directories and will require that directory hash tables
     be fixed when a name is changed to NFC.  It will also have to
     apply to symlink targets.
  3. In the File Server, apply NFC conversion to the names provided in
     CreateFile, Link and Symlink RPCs as well as the targets in the
     Link and Symlink RPCs.
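
The Salvager step in item 2 could look roughly like the Python sketch below. This is a toy model, not Salvager code: a dict from raw name bytes to a vnode number stands in for a directory and its hash table, and `to_nfc` and `salvage_directory` are hypothetical names. The handling of a directory that already contains both forms of a name is my own assumption; the sketch only reports such entries and leaves the policy open.

```python
import unicodedata

def to_nfc(raw: bytes) -> bytes:
    """NFC-convert a name if it is valid UTF-8; otherwise return it unchanged."""
    try:
        return unicodedata.normalize("NFC", raw.decode("utf-8")).encode("utf-8")
    except UnicodeDecodeError:
        return raw

def salvage_directory(entries):
    """Rebuild a directory with NFC names.

    entries maps raw name bytes to a vnode number. Returns a tuple
    (new_entries, renamed, conflicts): the rebuilt mapping, the
    (old, new) name pairs that changed, and entries whose NFC name
    was already taken by a different vnode.
    """
    new_entries, renamed, conflicts = {}, [], []
    # Visit names that are already NFC first so they keep their slot.
    ordered = sorted(entries.items(), key=lambda kv: to_nfc(kv[0]) != kv[0])
    for raw, vnode in ordered:
        name = to_nfc(raw)
        if name in new_entries and new_entries[name] != vnode:
            conflicts.append((raw, vnode))   # assumption: report, do not resolve
            continue
        new_entries[name] = vnode            # re-inserting re-keys the lookup table
        if name != raw:
            renamed.append((raw, name))
    return new_entries, renamed, conflicts

nfd = unicodedata.normalize("NFD", "\u00e9.txt").encode("utf-8")
nfc = unicodedata.normalize("NFC", "\u00e9.txt").encode("utf-8")
rebuilt, renamed, conflicts = salvage_directory({nfd: 1, b"plain": 3, b"\xff": 4})
```

The same `to_nfc` helper illustrates what items 1 and 3 would apply to names and targets before they reach a directory.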

The real problem with this proposal is that once the new file server is deployed and the salvager is run against the volumes, the existing MacOS X clients will no longer be able to read any files in AFS. If anyone has an idea of how to address the Unicode normalization problem going forward that doesn't result in an interop failure for existing clients, please say something.

Jeffrey Altman




