Jeffrey Hutzelman wrote:



On Wednesday, February 23, 2005 05:13:52 PM -0800 Mike Fedyk <[EMAIL PROTECTED]> wrote:

Jeffrey Hutzelman wrote:

AFS does copy-on-write at the per-vnode layer.  Each vnode has
metadata which is kept in the volume's vnode indices; among other
things, this includes the identifier of the physical file which
contains the vnode's contents (for the inode fileserver, this is an
inode number; for namei it's a 64-bit "virtual inode number" which can
be used to derive the filename). The underlying inode has a link count
(in the filesystem for inode; in the link table for namei) which
reflects how many vnodes have references to that inode.  When you
write to a vnode whose underlying inode has more than one reference,
the fileserver allocates a new one for the vnode you're writing to,
and copies the contents.
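
To make the copy-on-write step above concrete, here is a minimal sketch in Python. It is a toy model, not fileserver code: the `Store` class, `clone_vnode`, and `write_vnode` are hypothetical names, the "inodes" are plain dictionary entries, and a vnode index is modelled as a dict from vnode number to inode id.

```python
class Store:
    """Toy backing store: maps an inode id to its bytes and its link count."""

    def __init__(self):
        self.data = {}    # inode id -> file contents
        self.links = {}   # inode id -> number of vnodes referencing it
        self.next_id = 1

    def alloc(self, contents=b""):
        """Allocate a new inode holding a copy of `contents`, link count 1."""
        ino = self.next_id
        self.next_id += 1
        self.data[ino] = bytes(contents)
        self.links[ino] = 1
        return ino

    def unref(self, ino):
        """Drop one reference; free the inode when the count reaches zero."""
        self.links[ino] -= 1
        if self.links[ino] == 0:
            del self.links[ino]
            del self.data[ino]


def clone_vnode(store, ino):
    """Cloning shares the inode: only the link count changes."""
    store.links[ino] += 1
    return ino


def write_vnode(store, vnode_index, vnode, offset, buf):
    """Write `buf` at `offset`, copying the whole file first if it is shared."""
    ino = vnode_index[vnode]
    if store.links[ino] > 1:
        # Shared inode: allocate a private copy for this vnode.
        new_ino = store.alloc(store.data[ino])
        store.unref(ino)
        vnode_index[vnode] = new_ino
        ino = new_ino
    old = store.data[ino].ljust(offset, b"\0")  # zero-fill if writing past EOF
    store.data[ino] = old[:offset] + buf + old[offset + len(buf):]


store = Store()
rw = {1: store.alloc(b"hello world")}       # read-write volume's vnode index
backup = {1: clone_vnode(store, rw[1])}     # clone shares the same inode
write_vnode(store, rw, 1, 0, b"J")          # triggers the copy
```

After the write, the read-write vnode points at a fresh inode while the clone still sees the original contents, and the original inode's link count has dropped back to one.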


OK, I get it now.  An inode fileserver uses the link count on the
underlying filesystem (ext3 for instance), and a namei server uses a
large file (or possibly block device) with an AFS specific filesystem
format.  Is that right?


Not quite. Both inode and namei fileservers store their data in individual files on the local filesystem. Each local file corresponds to the contents of one vnode (file, directory, or symlink) in the AFS filesystem, or to some particular kind of per-volume metadata (a volume header or vnode index). The difference between the two backends lies largely in how those files are located by the fileserver.

In an inode fileserver (the traditional model), the vnode index contains the inode numbers of the underlying files for each vnode; the inode numbers of the indices themselves are stored in the volume header (the Vxxx.vol files at the top level of each vice partition). These inodes have no regular directory entries which point to them; they are manipulated via a set of special system calls provided by the AFS kernel module. In this model, the link count on each underlying inode reflects the number of vnodes referring to it; when the link count is decremented to zero, the inode is automatically freed by the normal kernel filesystem code.

In a namei fileserver, the underlying files are normal files in the filesystem. The vnode indices contain virtual "inode numbers" which are used to compute the file's actual filename; we then open the files by name. Since these are normal files on an unmodified local filesystem, their link counts in the underlying filesystem represent the number of actual links to them, which is always 1. Information about how many vnodes are using that file is stored in the "link table", which is an additional per-volume metadata file. This is the only backend currently available on Linux.
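
As a rough illustration of the namei model just described, the sketch below keeps a per-volume link table separate from the filesystem's own link counts. Everything here is assumed for illustration: `path_for` uses a made-up naming scheme (it is not the real namei directory layout), and `NameiVolume` with its in-memory dict stands in for the on-disk link table file.

```python
import os
import tempfile


def path_for(root, vino):
    """Derive a filename from a virtual inode number.

    HYPOTHETICAL scheme for illustration only; the real namei layout differs.
    """
    name = f"{vino:016x}"
    return os.path.join(root, name[-2:], name)


class NameiVolume:
    """Toy volume: ordinary files plus a separate link table."""

    def __init__(self, root):
        self.root = root
        self.link_table = {}  # virtual inode number -> count of vnodes using it

    def create(self, vino, contents):
        """Create the backing file by name and record one reference."""
        path = path_for(self.root, vino)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(contents)
        self.link_table[vino] = 1

    def inc(self, vino):
        """Another vnode (e.g. in a clone) now refers to this file."""
        self.link_table[vino] += 1

    def dec(self, vino):
        """Drop one reference; unlink the file when nothing refers to it."""
        self.link_table[vino] -= 1
        if self.link_table[vino] == 0:
            del self.link_table[vino]
            os.unlink(path_for(self.root, vino))


def demo():
    """Create a file, share it with a clone, then drop both references."""
    with tempfile.TemporaryDirectory() as root:
        vol = NameiVolume(root)
        vol.create(0x2A, b"data")
        vol.inc(0x2A)                      # a clone now shares the file
        path = path_for(root, 0x2A)
        shared_count = vol.link_table[0x2A]
        vol.dec(0x2A)
        exists_after_first = os.path.exists(path)
        vol.dec(0x2A)
        exists_after_second = os.path.exists(path)
        return shared_count, exists_after_first, exists_after_second
```

In this sketch the file on disk has exactly one directory entry for its whole life, so on an ordinary local filesystem its `st_nlink` would stay at 1; only the link table knows that two vnodes share it.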

It looks like namei fileservers can make clones much faster since they only need to update the link table, which is in one file and not spread over the filesystem, but it's not as clean since there is duplicated functionality at the AFS and underlying filesystem level. Though it would work well on systems that don't have hard links (FAT16/32 for example -- though I wouldn't recommend using that configuration...)


Also, this cuts much of the usefulness of a clone for fine-grained backups, since the COW is done at the vnode/file level instead of at some internal block level (which I thought AFS had, but it turns out it doesn't).


There is no fileserver backend which stores data in a large file or directly to a block device, and there never has been. Such a thing would be possible, but it's not clear that it would be superior to the existing backends.

Never mind. I saw so much talk from the Coda people (Coda forked from AFS a while ago) about using a separate partition (which turned out to hold only a large metadata file) that I presumed it also contained the data.


Mike
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
