I'm in the process of updating the Michigan disconnected operation code for the Unix tree, so here are my thoughts on what I'm doing there. Bear in mind that none of this has been accepted into the tree yet! Sorry for polluting the Windows list with Unix comments.

(I've set followups to openafs-devel)

Jeffrey Altman wrote:

> Disconnected operations should not be a global setting.  That is
> acceptable for a research project that demonstrates the capability but
> it is not acceptable for real world environments in which some servers
> or cells may not be accessible while others remain accessible.

I guess this depends on what you're trying to achieve through providing disconnected operation, and the quality of the user experience you can provide when performing re-integration. Looking at other disconnected systems, one of the usability challenges of Coda is that clients can go disconnected without the user's knowledge, and so the user can end up having to resolve integration conflicts which aren't of their making, and which they were completely unaware of. This tends to score badly for usability, as it violates some of the user's fundamental assumptions. Providing a system which requires an explicit 'go disconnected' step has the advantage that the user is aware both of when they disconnected from the network, and when they reconnected. This allows them to rationalise any conflict resolution steps that they have to perform.

That's not to say that 'opportunistic' disconnection (as I'm christening the solution you outline - where the cache manager continues to serve files for which it had a valid callback when the file server disappeared, without any user interaction) doesn't have real uses - I just think that the usability challenges are far higher.

> (1) how do you ensure that you have all of the data for all of the files
> and directories that the user wishes to access in the cache?   AFS
> caches arbitrary blocks, not whole files or directories.

I'll add to this:

1a) How do you ensure that the data you have in the cache is sufficiently recent to be of use to the client?

The naive mechanism, as implemented by the Michigan code, just serves whatever happens to be in the cache back to the user. The problem is that, depending on the size of your cache relative to your normal working set, you might get files that are months out of date. The normal AFS way of resolving this is to hold callbacks for these files. You could extend this to disconnected operation by adding 'pinning' functionality, where a user indicates to the cache manager that they want a particular file to be available offline, and the cache manager ensures that it's always up to date on the client. However, if you attempt to hold callbacks for every file in a user's offline set, then you're likely to cause severe callback storms with the fileserver (multiple clients hold more than the fileserver's maximum number of callbacks; the fileserver starts breaking older callbacks; clients see the callback breaks and attempt to update their pinned files; the fileserver creates new callbacks for these; and round and round we go).

The question of how we ensure acceptable recency, without making fileserver changes, is a tricky one.
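To make the recency idea concrete, here's a minimal sketch of how a cache manager might decide whether a pinned file is still worth serving offline. All of the names here are invented for illustration; nothing in this snippet corresponds to actual OpenAFS code, and it assumes the cache manager records when it last held a valid callback for each pinned file.

```python
# Hypothetical sketch: serve a pinned file while disconnected only if
# it was validated against the fileserver recently enough.  All names
# are invented; this is not real OpenAFS code.
from dataclasses import dataclass

@dataclass
class PinnedFile:
    fid: int            # stand-in for an AFS file identifier
    data_version: int   # data version last fetched from the fileserver
    last_verified: int  # seconds-since-epoch when a callback was last held

def serveable_offline(pf: PinnedFile, disconnect_time: int, max_age: int) -> bool:
    """Serve a pinned file offline only if it was verified within
    max_age seconds of the moment the client went disconnected."""
    return (disconnect_time - pf.last_verified) <= max_age

fresh = PinnedFile(fid=1, data_version=7, last_verified=1000)
stale = PinnedFile(fid=2, data_version=3, last_verified=100)
went_offline = 1500

print(serveable_offline(fresh, went_offline, max_age=600))  # True: 500s old
print(serveable_offline(stale, went_offline, max_age=600))  # False: 1400s old
```

The interesting policy question is what `max_age` should be, and whether the user or the administrator gets to set it.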

> (2) how do you synchronize read and write locks when the file server is
> not accessible?

It's relatively easy to maintain a list of the locks granted by the cache manager whilst in disconnected mode, and you can ensure that the locking protects processes running on the same machine from each other. The issue is what you do when reconnecting. The cache manager replays the list of locally granted locks to the fileserver, and all is well if it grants them. But what happens if the fileserver refuses a lock? You can't recall locks which have already been issued, so you can have a situation where there's a process happily writing to a file, under what it believes is a write lock, whilst it actually has no lock at all on the server. As I see it, there are three options: 1) ignore the problem; 2) fail reads and writes to that file descriptor as soon as the lock fails; 3) 'defer' reintegration of that file until it is closed, and deal with the problem then.

This is a much bigger issue on Windows than Unix, though.
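The lock replay step and the three options could be sketched like this. Everything here is invented for illustration (the fileserver is just modelled as a set of locks it refuses); it's not a real OpenAFS interface.

```python
# Hypothetical sketch of replaying locks granted locally during
# disconnection.  The fileserver is modelled as a set of fids whose
# lock requests it refuses; invented names, not real OpenAFS API.
from enum import Enum

class Policy(Enum):
    IGNORE = 1      # option 1: pretend the lock was granted anyway
    FAIL_FD = 2     # option 2: fail reads/writes on that descriptor
    DEFER = 3       # option 3: reintegrate the file only at close()

def replay_locks(local_locks, server_refuses, policy):
    """Return (granted, failed_fds, deferred) after reconnection."""
    granted, failed, deferred = [], [], []
    for fid in local_locks:
        if fid not in server_refuses:
            granted.append(fid)
        elif policy is Policy.FAIL_FD:
            failed.append(fid)
        elif policy is Policy.DEFER:
            deferred.append(fid)
        else:  # Policy.IGNORE: carry on as if nothing happened
            granted.append(fid)
    return granted, failed, deferred

granted, failed, deferred = replay_locks([10, 11, 12], {11}, Policy.FAIL_FD)
print(granted, failed, deferred)  # [10, 12] [11] []
```

Option 2 at least makes the failure visible to the process immediately, which is why it's the one shown here.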

> (3) how do you interact with the end user to notify them of collisions
> and what do you do when there are collisions?

I'm currently implementing a collision resolution policy of "last closer wins". Whilst this does have the potential to cause significant data loss, it has the big advantage over more complex resolution policies that it's explainable to, and understandable by, the user. At the moment, collisions get logged in the system log. It would be possible to take advantage of some of the new desktop technologies appearing for Unix to get those messages closer to the user (although, on multi-user machines, desktop-based notifications break down).

> (5) how do you address access control issues for files that are offline?

The Michigan code simply disables access control when a machine goes offline. With the Unix model, this is more acceptable - machines only go offline with an explicit command, which can only be issued by the super user. The super user has access to the cache contents, anyway. However, this doesn't help with people who have implemented access controls to protect themselves from silly mistakes.

I've got a provisional implementation of 'local' tokens which can be used to convey CPS information from userland to the cache manager, but which won't be usable in a connected environment. My eventual plan is that it will be possible to 'stash' access data for a particular userid to a file, from where it can be reloaded while the cache manager is offline. However, as soon as you start using these you run into ...
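One way the stash step might look, purely as a sketch (this is not the provisional implementation described above, and the file format and names are invented): the CPS, the list of identities the user belongs to, is written out while connected and reloaded while offline.

```python
# Hypothetical sketch of stashing per-uid CPS data to a file so it can
# be reloaded while the cache manager is offline.  Invented format and
# names; not the provisional OpenAFS implementation.
import json
import os
import tempfile

def stash_cps(path, uid, cps):
    """Write the uid and its CPS (list of identities) while connected."""
    with open(path, "w") as f:
        json.dump({"uid": uid, "cps": cps}, f)

def reload_cps(path):
    """Reload stashed access data for use while disconnected."""
    with open(path) as f:
        data = json.load(f)
    return data["uid"], data["cps"]

path = os.path.join(tempfile.mkdtemp(), "stash")
stash_cps(path, 1000, ["simon", "system:authuser", "staff"])
uid, cps = reload_cps(path)
print(uid, cps)  # 1000 ['simon', 'system:authuser', 'staff']
```

A real version would obviously need the stash file protected at least as well as the cache itself, since it is effectively a credential.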

> (6) how do you ensure that the files are synchronized back to the file
> server with the same user credentials that were intended to be used
> when the files were modified?

This is tricky. I don't (yet) have a good answer to this one. At the moment, all replays have to come from a single identity (and their token had better be valid when reintegration starts).

Cheers,

Simon.
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel