On Thu, 3 Jul 2014 21:45:47 +0000 Mark Vitale <mvit...@sinenomine.net> wrote:
> (cross-posted to openafs-devel and port-solaris)

I don't think this has anything to do with Solaris, but I'll keep it on
the cc list for now so this doesn't just look unanswered. (And in case
this came from some Solaris historical artifact or something.)

> The AFS Unix cache manager has two ways to shutdown:
>
> - 'afsd -shutdown' requires that /afs already be umounted; it then
>   sets afs_cold_shutdown=1 and calls afs_shutdown().
>
> - 'umount /afs' also calls afs_shutdown() on most platforms.  Some of
>   them set afs_cold_shutdown=1 first, while others do not.
>
> If afs_shutdown() is called "warm" (afs_cold_shutdown==0), the
> shutdown logic skips the clearing and releasing of some resources.  I
> see no rhyme or reason to which resources AFS leaves unreleased.  Nor
> do I understand the (possibly historical) reason for why there is a
> distinction between cold and warm shutdown.

My understanding is that a WARM shutdown is via 'umount' (i.e., the
normal way), and a COLD shutdown is via 'afsd -shutdown'. That is the
definition of those terms; that's all they mean (or maybe the defining
factor is "if /afs is mounted" or some other similar-yet-technically-
different distinction). Note that this (and below) just comes from my
own experience with the code, not from any properly documented or
otherwise authoritative source.

Normally you can never shut down via 'afsd -shutdown' while /afs is
mounted. The reason 'afsd -shutdown', and thus the COLD shutdown
procedure, exists at all is:

 - If mounting /afs fails in the middle of initialization, you can't
   umount /afs. So you can 'afsd -shutdown' instead to de-initialize
   some things that came up before /afs was mounted.

 - You can run some of the client daemons without actually mounting
   /afs in "normal" operation. I'm not sure if I've ever done this, but
   I have a vague recollection of some code comment or post somewhere
   referencing running the nfs xlator PAG manager this way. Maybe there
   are other reasons, other ways.

The reason that some cleanup procedures are different is that during
the normal WARM shutdown procedure, we assume some daemons or other
things will clean up after themselves; but during a COLD procedure we're
not sure if they're really there or functioning properly, so there's
some extra cleanup along the way. I may not be remembering that properly
and there may be mistakes; you'll have to be more specific if you want
more info.

The reason why I think Russ associates COLD shutdown procedures with
brokenness is probably because that would happen if something broke
during initialization. Init scripts usually try both ways of shutting
down on 'stop':

    umount /afs
    afsd -shutdown

Just to handle both cases. If AFS started successfully, the
'afsd -shutdown' would do nothing, because either the 'umount' would
succeed and we'd already be shut down, or the 'umount' failed and
'afsd -shutdown' would refuse to run because /afs was still mounted.
Conversely, if AFS did not start successfully, the 'umount' would fail
because /afs is not mounted, and we'd try the 'afsd -shutdown'.

For unsuccessful starts, in the past, our handling of errors during init
wasn't great (and still isn't now, but it used to be worse), so things
would be left in an inconsistent state, but wouldn't break in a panic or
whatever until we tried to shut down. So a lot of the time on Linux in
the past, whenever you saw a COLD shutdown, it was because something
broke during startup in a way we did not handle, and we panicked on
shutdown. For WARM shutdowns, we started up successfully so there were
no such issues.

If you look at the platforms that set COLD shutdown during umount, you
just see AIX, obsd, and nbsd, and the 'COLD' shutdown was enabled when
support was added for a certain platform version. I would guess that
doing so is "incorrect", and someone just set it to work around some
problem and never mentioned it, and possibly didn't really know why it
was there.
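
As a rough illustration of the init-script 'stop' sequence described
above (try the 'umount', then try 'afsd -shutdown' regardless of the
result), here is a minimal shell sketch. It is not taken from any
shipped OpenAFS init script; the function name afs_stop and the
AFS_MOUNT, UMOUNT and AFSD variables are hypothetical, introduced only
so the paths and commands can be substituted. Real init scripts differ
per platform and typically do more (stopping server processes,
unloading the kernel module, and so on).

```shell
#!/bin/sh
# Hypothetical sketch of an init script 'stop' action.
# AFS_MOUNT, UMOUNT and AFSD are illustrative knobs, not OpenAFS settings.
AFS_MOUNT="${AFS_MOUNT:-/afs}"
UMOUNT="${UMOUNT:-umount}"
AFSD="${AFSD:-afsd}"

afs_stop() {
    # WARM path: only succeeds if /afs is mounted and can be unmounted.
    "$UMOUNT" "$AFS_MOUNT" ||
        echo "umount $AFS_MOUNT failed (perhaps not mounted)" >&2

    # COLD path: per the description above, this is expected to do
    # nothing useful if the client is already shut down, and to refuse
    # to run if /afs is still mounted.
    "$AFSD" -shutdown ||
        echo "afsd -shutdown did nothing or refused to run" >&2

    # Both attempts are best-effort; 'stop' itself does not fail here.
    return 0
}
```

An init script would call afs_stop from its 'stop' case. Running both
commands unconditionally, rather than choosing one, means the script
does not have to work out whether the earlier start succeeded.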

(Of course, I'm not one to talk, since most of what I'm saying here is
also just guessing.)

> Then there is the question of when it is safe to rmmod/modunload the
> libafs kernel module.  Does warm or cold shutdown affect the answer to
> this question?

It's supposed to be safe after either shutdown is finished. But there
can always be bugs with that, and the ability to stop the client at all
on Solaris is only a few years old (which is why some older mentions of
AFS on Solaris say you can't do it), and there has been a bug or two
already in its shutdown support in that time. And since COLD shutdowns
are rarer (from my understanding above), of course that code path is
less exercised and more prone to bugs.

-- 
Andrew Deason
adea...@sinenomine.net