On Fri, 2013-11-08 at 11:32 -0500, Benjamin Kaduk wrote:
> Looking at the viced, for example, vl_Initialize() calls ClientAuth and
> shortly thereafter loops over the vlservers and calls rx_NewConnection on
> them to pass to ubik_ClientInit.  We could probably throw a probe RPC in
> there and fall back to the previous key if we get the "bad key" error.
> This is a layer where we can conveniently log, so we should be sure to do
> so if we fall back to an old key.

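For concreteness, here is a rough sketch of the probe-and-fall-back idea quoted above, as a small Python simulation rather than real OpenAFS code. Every name in it (choose_keys, probe, BadKeyError, ProbeTimeout, PROBE_TIMEOUT_S, the db1/db2/db3 servers) is hypothetical, and the 60-second timeout is an arbitrary placeholder, not an actual Rx setting. It assumes keys are tried newest first and that an unreachable server costs one full timeout before the caller can move on.

```python
from typing import Callable, Dict, List, Optional

PROBE_TIMEOUT_S = 60.0  # assumption: placeholder value, not a real Rx timeout


class BadKeyError(Exception):
    """Hypothetical stand-in for the "bad key" error from a probe RPC."""


class ProbeTimeout(Exception):
    """Hypothetical stand-in for a probe that never gets an answer."""


def choose_keys(
    servers: List[str],
    keys: List[str],  # ordered newest first
    probe: Callable[[str, str], None],
    log: Callable[[str], None],
) -> Dict[str, Optional[str]]:
    """Pick a key per server, falling back to older keys on BadKeyError."""
    chosen: Dict[str, Optional[str]] = {}
    for server in servers:
        for i, key in enumerate(keys):
            try:
                probe(server, key)
            except BadKeyError:
                continue  # server rejected this key; try the next older one
            except ProbeTimeout:
                chosen[server] = None  # unreachable: no way to tell which key
                break
            else:
                if i > 0:
                    log(f"{server}: fell back to older key {key!r}")
                chosen[server] = key
                break
        else:
            chosen[server] = None  # every key was rejected
    return chosen


def demo():
    # Simulated cell: db1 has both keys, db2 only the old one, db3 is down.
    accepts = {"db1": {"new", "old"}, "db2": {"old"}, "db3": None}
    waited = [0.0]  # total simulated seconds spent waiting on timeouts

    def probe(server: str, key: str) -> None:
        accepted = accepts[server]
        if accepted is None:
            waited[0] += PROBE_TIMEOUT_S
            raise ProbeTimeout(server)
        if key not in accepted:
            raise BadKeyError(server)

    logs: List[str] = []
    chosen = choose_keys(["db1", "db2", "db3"], ["new", "old"], probe, logs.append)
    return chosen, logs, waited[0]


if __name__ == "__main__":
    print(demo())
```

In this simulation the one server that is down adds a full PROBE_TIMEOUT_S to startup, and the caller still does not learn which key that server would accept.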
... and suddenly restarting a fileserver takes several minutes instead of a couple of seconds, if a dbserver happens to be down. That's the difference between an essentially invisible outage and a visible one. Multiply this by the number of servers, since you have to probe all of them in order to know whether you can use the new key (or at least, which key to use for which connections).

This isn't a price/risk that comes into play only when upgrading, either -- it happens any time you restart a fileserver. Of course, it's going to be worse at the times when the fallback capability is most important, such as when I've been able to upgrade all of my servers except the one that's broken waiting on a part that won't be in until next week.

Now, what about volservers? A volserver has to be able to do RPCs to any other volserver. It doesn't even know what servers _exist_ when it starts up, so it has to do them as it discovers them.

I also really dislike the notion that rekeying is such an exceptional situation that it requires an administrator to manually keep track of things and restart servers in a particular order relative to when the key has been changed and to which servers it has been distributed. That makes it hard to deploy a mechanism that rekeys automatically.

If we're going to solve this problem at all, let's figure out how to solve it right, please.

-- Jeff

_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-devel