Following up on the bug report I filed yesterday, which added a check for
JFS, I wanted to supply some additional information.

The basic problem here is that the dcache code pulls out inode numbers and
then looks them up later.  In older versions of Linux, this was done with
iget().  In recent Linux 2.6 kernels, it's done by faking up a file handle
with type FILEID_INO32_GEN and using the file system's fh_to_dentry()
function.  The limitation on file systems is now primarily which ones
support FILEID_INO32_GEN and the generation==0 hack.

I've done a full audit of the file systems included in the Linux 2.6.35
source tree, and found:

1) uses FILEID_INO32_GEN (should work):
  efs
  exofs
  ext2/3/4
  jffs2
  jfs
  ufs

2) uses FILEID_INO32_GEN (no generation==0 hack, but trivial to add):
  ntfs
  xfs

3) uses custom file handle format:
  btrfs
  ceph
  fat
  fuse
  gfs2
  isofs
  ocfs2
  reiserfs
  udf

It seems to me that making type 3 FSes work would be as “simple” as making
the AFS module use encode_fh() and store the file handle actually generated
by the file system.  This would take slightly more memory, as we'd have to
store the type and length.  Even in the worst case (btrfs with
connectable==true, which we don't have to use), the maximum file handle size
is 40 bytes, so figure 44 bytes extra per dcache file.  If we decide to use
connectable==false (ceph and fat ignore this, but keep their file handles
within the NFSv2 limit of 20 bytes anyway), then we only need 24 extra bytes
per dcache file.
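
The arithmetic above can be sketched as a pair of hypothetical per-file
records.  I am assuming the type and length each fit in 16 bits, which is
what gives the 4 bytes of bookkeeping implied by the 44 and 24 byte figures;
the struct and macro names are invented for this example and do not exist in
the AFS module.

```c
#include <assert.h>
#include <stdint.h>

#define FH_MAX_CONNECTABLE 40   /* worst case: btrfs, connectable==true */
#define FH_MAX_PLAIN       20   /* NFSv2 limit, enough with connectable==false */

/* Hypothetical record if we allow for the largest connectable handle. */
struct stored_fh_connectable {
    uint16_t type;                      /* value returned by encode_fh() */
    uint16_t len;                       /* handle length actually used */
    uint8_t  data[FH_MAX_CONNECTABLE];  /* opaque handle bytes */
};

/* Hypothetical record if we only ever ask for non-connectable handles. */
struct stored_fh_plain {
    uint16_t type;
    uint16_t len;
    uint8_t  data[FH_MAX_PLAIN];
};

/* Compile-time check of the per-file overhead quoted in the text. */
static_assert(sizeof(struct stored_fh_connectable) == 44,
              "40-byte handle plus type and length");
static_assert(sizeof(struct stored_fh_plain) == 24,
              "20-byte handle plus type and length");
```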

More importantly, this will require quite a few changes throughout the AFS
module code, because it likes to pass around inode numbers.  However, other
systems could also use the change and would no longer depend on a single
file system type for the AFS cache, so this has potentially widespread
benefit.

In any case, I think it would be beneficial to at least do a feature test at
startup time rather than encode specific file system types in afsd as is
currently done.  I propose to do this by calling encode_fh(), checking that
the return type is FILEID_INO32_GEN, setting the generation count to 0, and
calling fh_to_dentry().  If this does not work, we can punt with an error.
This would enable all type 1 FSes to work immediately (which includes at
least one non-integrated port of ZFS), and type 2 FSes to work if/when
patches get integrated.
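
The proposed startup probe could look roughly like the following userspace
model.  This is only a sketch of the control flow: mock_export_ops and the
three mock file systems are stand-ins I invented so the logic can be run
outside the kernel, and their signatures are simplified rather than copied
from the real encode_fh() and fh_to_dentry() prototypes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILEID_INO32_GEN 1      /* kernel value of this handle type */
#define MOCK_CUSTOM_TYPE 0x80   /* arbitrary stand-in for a custom format */
#define FH_MAX_WORDS     10     /* 40 bytes, the largest handle in the audit */

/* Hypothetical, simplified stand-in for the two operations the probe uses. */
struct mock_export_ops {
    int (*encode_fh)(uint32_t ino, uint32_t *fh, int *len_words);
    const void *(*fh_to_dentry)(const uint32_t *fh, int len_words, int type);
};

/* Returns 0 if the cache file system passes the probe, -1 to punt. */
static int probe_cache_fs(const struct mock_export_ops *ops, uint32_t ino)
{
    uint32_t fh[FH_MAX_WORDS] = {0};
    int len_words = FH_MAX_WORDS;

    assert(ops != NULL);
    if (ops->encode_fh == NULL || ops->fh_to_dentry == NULL)
        return -1;
    if (ops->encode_fh(ino, fh, &len_words) != FILEID_INO32_GEN)
        return -1;              /* custom handle format (type 3) */
    if (len_words < 2)
        return -1;
    fh[1] = 0;                  /* set the generation count to 0 */
    if (ops->fh_to_dentry(fh, len_words, FILEID_INO32_GEN) == NULL)
        return -1;              /* no generation==0 hack (type 2) */
    return 0;
}

/* --- mock file systems, for exercising the probe only --- */
static const int mock_dentry = 1;

static int enc_ino32(uint32_t ino, uint32_t *fh, int *len_words)
{
    fh[0] = ino;
    fh[1] = 7;                  /* pretend the inode's generation is 7 */
    *len_words = 2;
    return FILEID_INO32_GEN;
}

static int enc_custom(uint32_t ino, uint32_t *fh, int *len_words)
{
    fh[0] = ino; fh[1] = 0; fh[2] = 0;
    *len_words = 3;
    return MOCK_CUSTOM_TYPE;
}

/* Accepts the real generation, or 0 as a wildcard. */
static const void *dec_gen_optional(const uint32_t *fh, int len_words, int type)
{
    if (type != FILEID_INO32_GEN || len_words < 2)
        return NULL;
    return (fh[1] == 0 || fh[1] == 7) ? &mock_dentry : NULL;
}

/* Insists on the real generation, so generation 0 is rejected. */
static const void *dec_gen_strict(const uint32_t *fh, int len_words, int type)
{
    if (type != FILEID_INO32_GEN || len_words < 2)
        return NULL;
    return (fh[1] == 7) ? &mock_dentry : NULL;
}

static const struct mock_export_ops mock_type1_ops = { enc_ino32, dec_gen_optional };
static const struct mock_export_ops mock_type2_ops = { enc_ino32, dec_gen_strict };
static const struct mock_export_ops mock_type3_ops = { enc_custom, dec_gen_optional };
```

With these mocks, the probe accepts the type 1 stand-in and punts on the
type 2 and type 3 stand-ins, which matches the grouping in the audit above.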

Any thoughts?
