I'm in the process of updating the Michigan disconnected operation code for the Unix tree, so here are my thoughts on what I'm doing there. Bear in mind that none of this has been accepted into the tree yet! Sorry for polluting the Windows list with Unix comments.

(I've set followups to openafs-devel)

Jeffrey Altman wrote:

> Disconnected operations should not be a global setting.  That is
> acceptable for a research project that demonstrates the capability but
> it is not acceptable for real world environments in which some servers
> or cells may not be accessible while others remain accessible.

I guess this depends on what you're trying to achieve through providing disconnected operation, and the quality of the user experience you can provide when performing re-integration. Looking at other disconnected systems, one of the usability challenges of Coda is that clients can go disconnected without the user's knowledge, and so the user can end up having to resolve integration conflicts which aren't of their making, and which they were completely unaware of. This tends to score badly for usability, as it violates some of the user's fundamental assumptions. Providing a system which requires an explicit 'go disconnected' step has the advantage that the user is aware both of when they disconnected from the network, and when they reconnected. This allows them to rationalise any conflict resolution steps that they have to perform.

That's not to say that 'opportunistic' disconnection (as I'm christening the solution you outline - where the cache manager continues to serve files for which it had a valid callback when the file server disappeared, without any user interaction) doesn't have real uses - I just think that the usability challenges are far higher.

> (1) how do you ensure that you have all of the data for all of the files
> and directories that the user wishes to access in the cache?   AFS
> caches arbitrary blocks, not whole files or directories.

I'll add to this:

1a) How do you ensure that the data you have in the cache is sufficiently recent to be of use to the client?

The naive mechanism, as implemented by the Michigan code, just serves whatever happens to be in the cache back to the user. The problem is that, depending on the size of your cache relative to your normal working set, you might get files that are months out of date. The normal AFS way of resolving this is to hold callbacks for these files. You could extend this to disconnected operation by adding 'pinning' functionality, where a user indicates to the cache manager that they want a particular file to be available offline, and the cache manager ensures that it's always up to date on the client. However, if you attempt to hold callbacks for every file in a user's offline set, then you're likely to cause severe callback storms with the fileserver (multiple clients hold more than the fileserver's maximum number of callbacks; the fileserver starts breaking older callbacks; clients see the callback breaks and attempt to update their pinned files; the fileserver creates new callbacks for these; and round and round we go).

The question of how we ensure acceptable recency, without making fileserver changes, is a tricky one.
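To make the recency idea concrete, here's a minimal sketch of how a cache manager might decide whether a pinned file is still worth serving offline. All of the names here are invented for illustration; nothing in this snippet corresponds to actual OpenAFS code, and it assumes the cache manager records when it last held a valid callback for each pinned file.

```python
# Hypothetical sketch: serve a pinned file while disconnected only if
# it was validated against the fileserver recently enough.  All names
# are invented; this is not real OpenAFS code.
from dataclasses import dataclass

@dataclass
class PinnedFile:
    fid: int            # stand-in for an AFS file identifier
    data_version: int   # data version last fetched from the fileserver
    last_verified: int  # seconds-since-epoch when a callback was last held

def serveable_offline(pf: PinnedFile, disconnect_time: int, max_age: int) -> bool:
    """Serve a pinned file offline only if it was verified within
    max_age seconds of the moment the client went disconnected."""
    return (disconnect_time - pf.last_verified) <= max_age

fresh = PinnedFile(fid=1, data_version=7, last_verified=1000)
stale = PinnedFile(fid=2, data_version=3, last_verified=100)
went_offline = 1500

print(serveable_offline(fresh, went_offline, max_age=600))  # True: 500s old
print(serveable_offline(stale, went_offline, max_age=600))  # False: 1400s old
```

The interesting policy question is what `max_age` should be, and whether the user or the administrator gets to set it.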

> (2) how do you synchronize read and write locks when the file server is
> not accessible?

It's relatively easy to maintain a list of the locks granted by the cache manager whilst in disconnected mode, and you can ensure that the locking protects processes running on the same machine from each other. The issue is what you do when reconnecting. The cache manager replays the list of locally granted locks to the fileserver, and all is well if it grants them. But what happens if the fileserver refuses a lock? You can't recall locks which have already been issued, so you can have a situation where there's a process happily writing to a file, under what it believes is a write lock, whilst it actually has no lock at all on the server. As I see it, there are three options: 1) ignore the problem; 2) fail reads and writes to that file descriptor as soon as the lock fails; 3) 'defer' reintegration of that file until it is closed, and deal with the problem then.

This is a much bigger issue on Windows than Unix, though.
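The lock replay step and the three options could be sketched like this. Everything here is invented for illustration (the fileserver is just modelled as a set of locks it refuses); it's not a real OpenAFS interface.

```python
# Hypothetical sketch of replaying locks granted locally during
# disconnection.  The fileserver is modelled as a set of fids whose
# lock requests it refuses; invented names, not real OpenAFS API.
from enum import Enum

class Policy(Enum):
    IGNORE = 1      # option 1: pretend the lock was granted anyway
    FAIL_FD = 2     # option 2: fail reads/writes on that descriptor
    DEFER = 3       # option 3: reintegrate the file only at close()

def replay_locks(local_locks, server_refuses, policy):
    """Return (granted, failed_fds, deferred) after reconnection."""
    granted, failed, deferred = [], [], []
    for fid in local_locks:
        if fid not in server_refuses:
            granted.append(fid)
        elif policy is Policy.FAIL_FD:
            failed.append(fid)
        elif policy is Policy.DEFER:
            deferred.append(fid)
        else:  # Policy.IGNORE: carry on as if nothing happened
            granted.append(fid)
    return granted, failed, deferred

granted, failed, deferred = replay_locks([10, 11, 12], {11}, Policy.FAIL_FD)
print(granted, failed, deferred)  # [10, 12] [11] []
```

Option 2 at least makes the failure visible to the process immediately, which is why it's the one shown here.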

> (3) how do you interact with the end user to notify them of collisions
> and what do you do when there are collisions?

I'm currently implementing a collision resolution policy of "last closer wins". Whilst this does have the potential to cause significant data loss, it has the big advantage over more complex resolution policies that it's explainable to, and understandable by, the user. At the moment, collisions get logged in the system log. It would be possible to take advantage of some of the new desktop technologies appearing for Unix to get those messages closer to the user (although, on multi-user machines, desktop-based notifications break down).

> (5) how do you address access control issues for files that are offline?

The Michigan code simply disables access control when a machine goes offline. With the Unix model, this is more acceptable - machines only go offline with an explicit command, which can only be issued by the super user. The super user has access to the cache contents, anyway. However, this doesn't help with people who have implemented access controls to protect themselves from silly mistakes.

I've got a provisional implementation of 'local' tokens which can be used to convey CPS information from userland to the cache manager, but which won't be usable in a connected environment. My eventual plan is that it will be possible to 'stash' access data for a particular userid to a file, from where it can be reloaded while the cache manager is offline. However, as soon as you start using these you run into ...
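One way the stash step might look, purely as a sketch (this is not the provisional implementation described above, and the file format and names are invented): the CPS, the list of identities the user belongs to, is written out while connected and reloaded while offline.

```python
# Hypothetical sketch of stashing per-uid CPS data to a file so it can
# be reloaded while the cache manager is offline.  Invented format and
# names; not the provisional OpenAFS implementation.
import json
import os
import tempfile

def stash_cps(path, uid, cps):
    """Write the uid and its CPS (list of identities) while connected."""
    with open(path, "w") as f:
        json.dump({"uid": uid, "cps": cps}, f)

def reload_cps(path):
    """Reload stashed access data for use while disconnected."""
    with open(path) as f:
        data = json.load(f)
    return data["uid"], data["cps"]

path = os.path.join(tempfile.mkdtemp(), "stash")
stash_cps(path, 1000, ["simon", "system:authuser", "staff"])
uid, cps = reload_cps(path)
print(uid, cps)  # 1000 ['simon', 'system:authuser', 'staff']
```

A real version would obviously need the stash file protected at least as well as the cache itself, since it is effectively a credential.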

> (6) how do you ensure that the files are synchronized back to the file
> server with the same user credentials that were intended to be used
> when the files were modified?

This is tricky. I don't (yet) have a good answer to this one. At the moment, all replays have to come from a single identity (and their token had better be valid when reintegration starts).

Cheers,

Simon.
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel