Re: [OpenAFS] afs semantics

Jeffrey Hutzelman Sat, 10 Jun 2006 18:38:17 -0700

On Saturday, June 10, 2006 07:40:25 AM -0400 Jeffrey Altman<[EMAIL PROTECTED]> wrote:

Adam Megacz wrote:

The people who write darcs (an incredibly powerful/flexible version
control system) are looking into making sure that it works properly on
AFS, and were looking for an authoritative, official statement of
exactly how AFS file semantics differ from UNIX semantics:

  http://bugs.darcs.net/issue117

Specifically, can anybody comment on these points?

  1. If two processes on different clients both attempt to
     open(O_CREAT|O_EXCL), does AFS guarantee that no more than one of
     them will succeed?


It should.  There have been recent reports that this may not be true
on some platforms because of a bug.  However, insufficient data has
been collected to determine whether this is in fact a bug.  If it is,
I suspect it is a bug in the client on some but not all platforms.

Yes; modulo bugs, two simultaneous exclusive creates of the same name in the same directory will not both succeed. Actually, I think the bug to which Jeff is referring is not a violation of this guarantee -- it results in _neither_ of the creates succeeding.
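
For illustration only, here is a minimal Python sketch of how an application might rely on the exclusive-create guarantee described above. The helper name try_exclusive_create and the lock file name are hypothetical, and the demo runs against a local temporary directory rather than AFS.

```python
import os
import tempfile


def try_exclusive_create(path):
    """Return an open file descriptor if this call created `path`,
    or None if the name already existed (EEXIST).

    Any other OSError is propagated; a caller should treat it as
    "did not create the file", not as proof that someone else did.
    """
    try:
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None


# Demo: the second attempt on the same name must not succeed.
demo_dir = tempfile.mkdtemp()
lock_path = os.path.join(demo_dir, "example.lock")  # hypothetical name
first = try_exclusive_create(lock_path)
second = try_exclusive_create(lock_path)
if first is not None:
    os.close(first)
```

Because the reported bug results in neither create succeeding, a caller that gets an error should not assume that a competing process holds the file.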

  2. If two processes both attempt to rename() the same [source] file,
     does AFS guarantee that exactly one of them succeeds?


It should.  Again, if this is not true it would be a bug in the client.

Well, it guarantees that _at most_ one succeeds. Of course, it is possible for both operations to fail for reasons having nothing to do with the race. However, assuming one succeeds, the other should fail with ENOENT.

Note that this only works when renaming a file within a volume, and only if the rename would not result in a single file having links in more than one directory. A rename() call that would violate these constraints will instead return EXDEV.
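
As a rough sketch of how a caller could tell these outcomes apart, the Python below maps the rename() results described above onto three labels. The function name claim_by_rename and the labels are made up for this example, and the demo uses a local temporary directory, so the EXDEV branch is not exercised.

```python
import errno
import os
import tempfile


def claim_by_rename(src, dst):
    """Try to rename src to dst and classify the outcome.

    "won":  this call performed the rename.
    "lost": src no longer exists (ENOENT), e.g. another process
            renamed it first.
    "exdev": the rename was refused with EXDEV.
    Any other error is re-raised unchanged.
    """
    try:
        os.rename(src, dst)
        return "won"
    except FileNotFoundError:
        return "lost"
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            return "exdev"
        raise


# Demo: two attempts to rename the same source file.
demo_dir = tempfile.mkdtemp()
source = os.path.join(demo_dir, "pending")
with open(source, "w") as handle:
    handle.write("data\n")
first = claim_by_rename(source, os.path.join(demo_dir, "claimed-1"))
second = claim_by_rename(source, os.path.join(demo_dir, "claimed-2"))
```

Note that "lost" only means the source was gone when this call ran; both callers can still fail for unrelated reasons.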

  3. If client "A" makes two inode-level changes (creat, remove,
     rename, etc), is it ever possible for client "B" to see the
     second change before the first one?


Not possible.  AFS does not distribute changes to clients.  It simply
notifies clients that the known state of the object has changed.  The
client could find out about the first change or the combination of the
first and second changes, but never the second and not the first.

I'm not sure what you mean here by "inode-level"; the system calls mentioned are characterized by the fact that they result in changes to directory contents. For operations with this property, and with respect to a single directory, Jeff's analysis is correct. Changes to the authoritative copy of a directory are always performed at the fileserver, never by clients, and these changes are serialized. Clients never receive partial directory updates from the fileserver; if a directory changes, the client must fetch a complete new copy of the directory. This fetch is always done in a single RPC, so the client always receives a complete, self-consistent copy of the directory. If a single client makes two changes in some particular order, other clients will always see the changes in the same order, because no version of the directory ever existed which contains only the second change.

Note that this guarantee applies only with respect to any one directory. For changes to multiple directories, a considerably more complex analysis is required, and depending on the situation, changes might become visible to other clients in the "wrong" order.

It is possible to test whether or not a path is located within AFS by
using "fs whichcell <path>".  If it returns successfully, you have an
AFS path.  If not, then not.  Darcs might want to test this to determine
whether or not alternative behaviors should be used.

Of course, it's also possible to do this using the corresponding pioctl, if you are willing to grow a dependency on AFS libraries or libkafs.
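
A minimal sketch of the command-line variant of this check, assuming the "fs" binary is on PATH and that its exit status is zero exactly when the path is in AFS, as described above. The function name is_afs_path and the fs_command parameter are hypothetical; the pioctl route is not shown.

```python
import subprocess


def is_afs_path(path, fs_command="fs"):
    """Return True if `fs whichcell <path>` exits successfully.

    A missing or non-executable fs binary is treated as "not AFS",
    which is an assumption of this sketch, not AFS behaviour.
    """
    try:
        result = subprocess.run(
            [fs_command, "whichcell", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except (FileNotFoundError, PermissionError):
        return False
    return result.returncode == 0


# Demo: a command name that does not exist yields False.
missing_tool_result = is_afs_path("/", fs_command="no-such-fs-binary-for-demo")
```

An application could run this once per repository and switch to alternative behaviors only when it returns True.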





Some more direct responses to the questions Juliusz is actually asking:

Thanks, although I'd prefer authoritative docs to a FAQ entry.

There is no authoritative documentation at this level, and there never has been. The FAQ is the closest you're going to get, but if you ask precise questions on openafs-devel, you're likely to get authoritative answers.

Note that the afs3-standardization list is about the AFS 3 _protocol_, and in fact is primarily about extending that protocol and resolving ambiguities in a consistent way, so as to maintain interoperability. While a complete protocol specification would be nice, writing it does not seem to be high on anyone's to-do list. What this list is explicitly _not_ about is defining the behavior and semantics of any particular implementation, including OpenAFS. So, the semantics of the UNIX system call interface with respect to AFS are out of scope.

Hard links:                                             [ User ]

      In AFS, hard links (eg: ln old new) are only valid within a
      directory.


This will definitely break ``darcs get'' and ``optimize --relink''
(anything else?).  We can work around the issue, but you'll have to
tell us in what way link(2) fails when the above constraint is
violated.

Attempts to create hard links between files in different AFS directories will fail with EXDEV. For the case of links in different volumes, this check is done early, in the client (though of course, it also fails if the client fails to perform the check). For attempts to create a link in a different directory in the same volume, the check is done fairly late, and so you're more likely to get errors like EACCES or EISDIR, if those apply.

Note that you can rename a file from one directory to another within the same volume, as long as the file does not have more than one existing link. Attempts to rename a file with multiple links, or to rename a file into a different volume, will fail with EXDEV.
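
One possible workaround, sketched in Python under the assumption that a plain copy is an acceptable substitute when link(2) is refused with EXDEV. The helper name link_or_copy is hypothetical and this is not how darcs actually handles the case. Other errors such as EACCES or EISDIR are deliberately left to propagate.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link src to dst; fall back to copying on EXDEV.

    Returns "linked" or "copied". Errors other than EXDEV are
    re-raised so that real permission problems stay visible.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    shutil.copy2(src, dst)
    return "copied"


# Demo on a local temporary directory (not AFS).
demo_dir = tempfile.mkdtemp()
original = os.path.join(demo_dir, "original")
with open(original, "w") as handle:
    handle.write("patch contents\n")
duplicate = os.path.join(demo_dir, "duplicate")
outcome = link_or_copy(original, duplicate)
```

The copy costs disk space that a hard link would have saved, so a caller may want to report which path was taken.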

 - how does open(O_CREAT | O_EXCL) work?

It works as advertised - if the specified file already exists, the operation fails with EEXIST.

  - is link(2) consistent w.r.t. link and open?
  - is rename(2) consistent w.r.t. rename and open?

I'm not sure what is meant here by "consistent". Where I come from, an operation that is "consistent" is one that never transitions from a valid state to an invalid one. All of open, rename, and link have this property, and all of them obey the same rules with respect to what are considered valid states of the filesystem.

      AFS does not support byte-range locking within a file,
      although lockf() and fcntl() calls will return 0 (success).

This is careless.  I fully agree that SVR4-style locks are brain-
damaged beyond hope, but fcntl(2) over AFS should fail with ENOSYS
rather than returning success!

We try fairly hard to support whatever locking interfaces are available on any given platform. Regardless of the interface used, AFS supports whole-file locking, both between processes on the same client system and between client systems. It does not implement partial-file locking at all, because applications which actually _rely_ on fine-grained locking also tend to rely on such locks to act as fine-grained data consistency barriers, and such semantics would be quite difficult for AFS to support between clients.
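
Given that only whole-file locks are implemented, an application can avoid depending on byte ranges by always locking the entire file. The Python sketch below does that through the fcntl interface; the helper names are hypothetical, and the demo runs on a local temporary file, so it says nothing about behaviour between AFS clients.

```python
import errno
import fcntl
import os
import tempfile


def try_lock_whole_file(fd):
    """Try to take an exclusive lock covering the whole file.

    len=0 with start=0 from SEEK_SET means "from the beginning to
    the end of the file". Returns False if the lock is already held
    elsewhere (EACCES or EAGAIN); other errors are re-raised.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 0, 0, os.SEEK_SET)
        return True
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise


def unlock_whole_file(fd):
    """Release the whole-file lock taken above."""
    fcntl.lockf(fd, fcntl.LOCK_UN, 0, 0, os.SEEK_SET)


# Demo: lock and unlock a scratch file opened for writing.
demo_dir = tempfile.mkdtemp()
scratch = os.path.join(demo_dir, "scratch")
fd = os.open(scratch, os.O_CREAT | os.O_RDWR, 0o644)
got_lock = try_lock_whole_file(fd)
unlock_whole_file(fd)
os.close(fd)
```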

Adam, I really need authoritative documentation on

  - consistency properties of AFS;
  - restrictions of Unix system calls on AFS.

There is no complete, authoritative documentation on these issues. As I mentioned above, someone asking specific, well-defined questions on the openafs-devel list ([email protected]) would be likely to get authoritative answers.




-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]>
  Sr. Research Systems Programmer
  School of Computer Science - Research Computing Facility
  Carnegie Mellon University - Pittsburgh, PA

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
