On 12/14/2010 11:33 AM, Gary Buhrmaster wrote:
> On Tue, Dec 14, 2010 at 07:47, Derrick Brashear <[email protected]> wrote:
>
>>> c) Just state that 1.4.5 is "too old" to bother
>>
>> possibly that being today.
>
> While I tend to be of the opinion that at some point you
> just have to throw away the bath water (regardless of
> the baby squid that has been living in it for a few years,
> and has now grown into a full fledged unmanaged monster).
> The problem for this case is that use of the RPC will
> crash the server.

1. May crash the server if a race is lost.

2. There are client versions in the wild that already make this call.

3. There are other bugs in the vulnerable versions that were fixed
   post 1.4.5 which can:

   a. deadlock the server
   b. crash the server
   c. give bogus data to the clients

> And it seems likely that if a site
> is still running older servers it means that site is not
> actively managing (and by that I mean managing
> at all) their infrastructure. An OpenAFS server that
> crashes (repeatedly) may be an excuse for someone
> to just blame OpenAFS for being a POS, remove it
> from their environment, and bad mouth it. I do not
> think we want that, even though I would be tempted
> to just have calamari and call it a day.

Those servers may already be experiencing similar problems.

> I think the only pragmatic solution is to hold ones nose
> and use the "implied" capability by checking for the
> other (GetStatistics64) RPC. And vow that this is the
> absolute last time (until the next time :-). And, for
> this type of problem, we actually have a plan for
> the future with the capabilities RPC.

The Capabilities RPC does not help us. We either throw out the RPC
and start over with a new RPC that does the same thing, or we make
use of it. A capability bit that says "I implement GiveUpAllCallBacks
without crashing" is only good until the next AFS Server
implementation is shipped that advertises the capability but gets it
wrong. The same is true for treating the existence of one RPC as a
test for the correct implementation of another.

At the moment there are two AFS server implementations. In the
future there may be more. (I sure hope so.) Something based on
versions permits clients to construct a peer bug list, but rx version
strings are not structured enough and come from the wrong software
layer. I can upgrade the file server without replacing the rx library.
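
To make the version-based idea concrete, here is a minimal sketch of what a client-side peer bug list could look like. It is an illustration only: it assumes the client can somehow learn a structured (implementation, version) pair for the file server, which is exactly what rx version strings do not provide today. The names `PeerInfo`, `PEER_BUGS`, and `rpc_is_safe`, and the choice of 1.4.5 as the last version treated as unsafe, are hypothetical and not taken from any OpenAFS source.

```python
from typing import NamedTuple, Optional, Tuple

Version = Tuple[int, ...]


class PeerInfo(NamedTuple):
    """Hypothetical structured identity of a file server peer."""
    implementation: str  # e.g. "openafs" (assumed label)
    version: Version     # e.g. (1, 4, 5)


# Hypothetical bug list: RPC name -> {implementation: last version treated as unsafe}.
# The entry below is illustrative, not an authoritative list of affected releases.
PEER_BUGS = {
    "GiveUpAllCallBacks": {"openafs": (1, 4, 5)},
}


def rpc_is_safe(rpc: str, peer: Optional[PeerInfo]) -> bool:
    """Return True if the client may issue `rpc` to `peer`."""
    if peer is None:
        # Peer identity unknown: this sketch chooses to be conservative.
        return False
    last_bad = PEER_BUGS.get(rpc, {}).get(peer.implementation)
    if last_bad is None:
        # No known bug recorded for this implementation.
        return True
    # Tuples compare element by element, so (1, 4, 6) > (1, 4, 5).
    return peer.version > last_bad
```

Unlike a capability bit, the list lives on the client, so a server that advertises a feature but gets it wrong can be added to the list later without changing the protocol. Unlike probing for another RPC, the check is keyed on the implementation that actually contains the bug.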

Are we really going to throw out an RPC and force re-implementation
every time that someone implements it incorrectly?

During the four years that this bug was being actively shipped we
heard once about a crash report. That crash report was the result of
a site that shut down all of their clients at the same time for a
regular workstation refresh. We have not received a report of a crash
for this problem since.

While I am sensitive to the bad publicity argument, if anything I
believe that forcing the owners of an un-managed cell to communicate
with the community is not a bad thing. If the cell is really
un-managed, the owners are already lost to us. If they communicate
with us, then we can help them improve the service they provide to
their end users by encouraging them to update.

Jeffrey Altman
