On Saturday, June 10, 2006 07:40:25 AM -0400 Jeffrey Altman <[EMAIL PROTECTED]> wrote:

Adam Megacz wrote:
The people who write darcs (an incredibly powerful/flexible version
control system) are looking into making sure that it works properly on
AFS, and were looking for an authoritative, official statement of
exactly how AFS file semantics differ from UNIX semantics:

  http://bugs.darcs.net/issue117

Specifically, can anybody comment on these points?

  1. If two processes on different clients both attempt to
     open(O_CREAT|O_EXCL), does AFS guarantee that no more than one of
     them will succeed?

It should.  There have been recent reports that this may not be true
on some platforms because of a bug.  However, insufficient data has
been collected to determine whether it is in fact a bug.  If it is,
I suspect it is a bug in the client on some but not all platforms.

Yes; modulo bugs, two simultaneous exclusive creates of the same name in the same directory will not both succeed. Actually, I think the bug to which Jeff is referring is not a violation of this guarantee -- it results in _neither_ of the creates succeeding.
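
As an illustration only (this is not darcs code, and the helper name try_create_exclusive is made up for the example), an application relying on this guarantee might wrap the exclusive create roughly as follows. It assumes the usual POSIX behavior that the losing create fails with EEXIST, which Python reports as FileExistsError:

```python
import os

def try_create_exclusive(path, mode=0o644):
    """Hypothetical helper: return an open fd if this call created `path`,
    or None if the name already existed (EEXIST)."""
    try:
        # O_CREAT | O_EXCL asks for "create, and fail if the name exists".
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
    except FileExistsError:
        # Some other process (possibly on another client) created it first.
        return None
```

Any other OSError is deliberately left to propagate, since a failure that is not EEXIST says nothing about who won the race.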


  2. If two processes both attempt to rename() the same [source] file,
     does AFS guarantee that exactly one of them succeeds?

It should.  Again, if this is not true it would be a bug in the client.

Well, it guarantees that _at most_ one succeeds. Of course, it is possible for both operations to fail for reasons having nothing to do with the race. However, assuming one succeeds, the other should fail with ENOENT.

Note that this only works when renaming a file within a volume, and only if the rename would not result in a single file having links in more than one directory. A rename() call that would violate these constraints will instead return EXDEV.
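
A minimal sketch of how a caller might tell these outcomes apart, assuming the error codes described above (ENOENT for the loser of the race, EXDEV when the rename constraints are violated). The function name claim_by_rename and its return strings are invented for the example:

```python
import errno
import os

def claim_by_rename(src, dst):
    """Hypothetical helper: try to rename src to dst.

    Returns "won" if this call performed the rename, "lost" if the source
    no longer exists, and "unsupported" if the rename is refused with EXDEV.
    """
    try:
        os.rename(src, dst)
        return "won"
    except FileNotFoundError:
        # ENOENT: the source is gone, e.g. another process renamed it first.
        return "lost"
    except OSError as e:
        if e.errno == errno.EXDEV:
            # Cross-volume rename, or the file has more than one link.
            return "unsupported"
        raise
```

Note that ENOENT can also mean the source never existed or the destination directory is missing, so "lost" is only meaningful if the caller knows the source was there beforehand.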


  3. If client "A" makes two inode-level changes (creat, remove,
     rename, etc), is it ever possible for client "B" to see the
     second change before the first one?

Not possible.  AFS does not distribute changes to clients.  It simply
notifies clients that the known state of the object has changed.  The
client could find out about the first change or the combination of the
first and second changes, but never the second and not the first.

I'm not sure what you mean here by "inode-level"; the system calls mentioned are characterized by the fact that they result in changes to directory contents. For operations with this property, and with respect to a single directory, Jeff's analysis is correct. Changes to the authoritative copy of a directory are always performed at the fileserver, never by clients, and these changes are serialized. Clients never receive partial directory updates from the fileserver; if a directory changes, the client must fetch a complete new copy of the directory. This fetch is always done in a single RPC, so the client always receives a complete, self-consistent copy of the directory. If a single client makes two changes in some particular order, other clients will always see the changes in the same order, because no version of the directory ever existed which contains only the second change.

Note that this guarantee applies only with respect to any one directory. For changes to multiple directories, a considerably more complex analysis is required, and depending on the situation, changes might become visible to other clients in the "wrong" order.


It is possible to test whether or not a path is located within AFS by
using "fs whichcell <path>".  If it returns successfully, you have an
AFS path.  If not, then not.  Darcs might want to test this to determine
whether or not alternative behaviors should be used.

Of course, it's also possible to do this using the corresponding pioctl, if you are willing to grow a dependency on AFS libraries or libkafs.
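
For the command-line approach, a rough sketch might look like the following. It assumes the "fs" binary from an AFS client installation is on PATH and that a zero exit status means the path is in AFS, as described above; the function name is_afs_path is made up for the example:

```python
import subprocess

def is_afs_path(path):
    """Hypothetical helper: True if `fs whichcell <path>` exits successfully."""
    try:
        result = subprocess.run(
            ["fs", "whichcell", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # No usable `fs` binary; assume there is no AFS client, hence not AFS.
        return False
    return result.returncode == 0
```

Treating a missing "fs" command as "not AFS" is a design choice of this sketch, not something the command itself promises.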




Some more direct responses to the questions Juliusz is actually asking:


Thanks, although I'd prefer authoritative docs to a FAQ entry.

There is no authoritative documentation at this level, and there never has been. The FAQ is the closest you're going to get, but if you ask precise questions on openafs-devel, you're likely to get authoritative answers.

Note that the afs3-standardization list is about the AFS 3 _protocol_, and in fact is primarily about extending that protocol and resolving ambiguities in a consistent way, so as to maintain interoperability. While a complete protocol specification would be nice, writing it does not seem to be high on anyone's to-do list. What this list is explicitly _not_ about is defining the behavior and semantics of any particular implementation, including OpenAFS. So, the semantics of the UNIX system call interface with respect to AFS are out of scope.



Hard links:                                             [ User ]

      In AFS, hard links (eg: ln old new) are only valid within a
      directory.

This will definitely break ``darcs get'' and ``optimize --relink''
(anything else?).  We can work around the issue, but you'll have to
tell us in what way link(2) fails when the above constraint is
violated.

Attempts to create hard links between files in different AFS directories will fail with EXDEV. For the case of links in different volumes, this check is done early, in the client (though of course, it also fails if the client fails to perform the check). For attempts to create a link in a different directory in the same volume, the check is done fairly late, and so you're more likely to get errors like EACCES or EISDIR, if those apply.

Note that you can rename a file from one directory to another within the same volume, as long as the file does not have more than one existing link. Attempts to rename a file with multiple links, or to rename a file into a different volume, will fail with EXDEV.
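
One possible workaround, sketched here under the assumption that a plain copy is an acceptable substitute for a hard link, is to fall back to copying only when link(2) reports EXDEV. The name link_or_copy is hypothetical and this is not taken from darcs:

```python
import errno
import os
import shutil

def link_or_copy(src, dst):
    """Hypothetical helper: hard-link src to dst, copying instead on EXDEV.

    Returns "linked" or "copied" so the caller can tell which happened.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as e:
        if e.errno != errno.EXDEV:
            # EACCES, EISDIR, EEXIST, etc. are real errors, not a
            # "links not allowed here" signal, so let them propagate.
            raise
    shutil.copy2(src, dst)  # copies file contents and metadata
    return "copied"
```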


 - how does open(O_CREAT | O_EXCL) work?

It works as advertised - if the specified file already exists, the operation fails with EEXIST.


  - is link(2) consistent w.r.t. link and open?
  - is rename(2) consistent w.r.t. rename and open?

I'm not sure what is meant here by "consistent". Where I come from, an operation that is "consistent" is one that never transitions from a valid state to an invalid one. All of open, rename, and link have this property, and all of them obey the same rules with respect to what are considered valid states of the filesystem.


      AFS does not support byte-range locking within a file,
      although lockf() and fcntl() calls will return 0 (success).

This is careless.  I fully agree that SVR4-style locks are brain-
damaged beyond hope, but fcntl(2) over AFS should fail with ENOSYS
rather than returning success!

We try fairly hard to support whatever locking interfaces are available on any given platform. Regardless of the interface used, AFS supports whole-file locking, both between processes on the same client system and between client systems. It does not implement partial-file locking at all, because applications which actually _rely_ on fine-grained locking also tend to rely on such locks to act as fine-grained data consistency barriers, and such semantics would be quite difficult for AFS to support between clients.
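
For applications that only need a whole-file lock, a minimal sketch using flock() might look like this. It assumes flock() is one of the locking interfaces available on the platform in question; how any given AFS client maps it is platform-dependent and is not verified here. The helper name try_lock_whole_file is made up:

```python
import fcntl

def try_lock_whole_file(fd):
    """Hypothetical helper: try to take an exclusive whole-file lock on fd.

    Returns True on success, False if the lock is held elsewhere.
    """
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
```

The lock is released when the descriptor is closed or when fcntl.flock(fd, fcntl.LOCK_UN) is called.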


Adam, I really need authoritative documentation on

  - consistency properties of AFS;
  - restrictions of Unix system calls on AFS.

There is no complete, authoritative documentation on these issues. As I mentioned above, someone asking specific, well-defined questions on the openafs-devel list ([email protected]) would be likely to get authoritative answers.



-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]>
  Sr. Research Systems Programmer
  School of Computer Science - Research Computing Facility
  Carnegie Mellon University - Pittsburgh, PA
