On 12/14/2010 11:33 AM, Gary Buhrmaster wrote:
> On Tue, Dec 14, 2010 at 07:47, Derrick Brashear <[email protected]> wrote:
> 
>>> c) Just state that 1.4.5 is "too old" to bother
>>
>> possibly that being today.
> 
> While I tend to be of the opinion that at some point you
> just have to throw away the bath water (regardless of
> the baby squid that has been living in it for a few years,
> and has now grown into a full fledged unmanaged monster).
> The problem for this case is that use of the RPC will
> crash the server. 

1. May crash the server if a race is lost.

2. There are client versions in the wild that already make
   this call.

3. There are other bugs in the vulnerable versions that
   were fixed post 1.4.5 which can:

   a. deadlock the server

   b. crash the server

   c. give bogus data to the clients

> And it seems likely that if a site
> is still running older servers it means that site is not
> actively managing (and by that I mean managing
> at all) their infrastructure.  An OpenAFS server that
> crashes (repeatedly) may be an excuse for someone
> to just blame OpenAFS for being a POS, remove it
> from their environment, and bad mouth it.  I do not
> think we want that, even though I would be tempted
> to just have calamari and call it a day.

Those servers may already be experiencing similar problems.

> I think the only pragmatic solution is to hold ones nose
> and use the "implied" capability by checking for the
> other (GetStatistics64) RPC.  And vow that this is the
> absolute last time (until the next time :-).  And, for
> this type of problem, we actually have a plan for
> the future with the capabilities RPC.

The Capabilities RPC does not help us.  We either throw out the RPC and
start over with a new RPC that does the same thing or we make use of it.
A capability bit that says "I implement GiveUpAllCallBacks without
crashing" is only good until the next AFS Server implementation is
shipped that advertises the capability but gets it wrong.  The same is
true for treating the existence of one RPC as a test for the correct
implementation of another.
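
To make the weakness concrete, here is a minimal sketch of the "implied
capability" inference. Everything in it is hypothetical: FakeServer and
implied_capability_check are illustration-only names, not OpenAFS or rx
APIs, and the two boolean fields simply model "answers GetStatistics64"
and "handles GiveUpAllCallBacks correctly".

```python
from dataclasses import dataclass


@dataclass
class FakeServer:
    """Hypothetical stand-in for a file server; not an OpenAFS API."""
    implements_get_statistics64: bool
    give_up_all_callbacks_is_safe: bool


def implied_capability_check(server: FakeServer) -> bool:
    """Guess that GiveUpAllCallBacks is safe because GetStatistics64 exists.

    This is the heuristic under discussion: the presence of one RPC is
    treated as evidence that another RPC is implemented correctly.
    """
    return server.implements_get_statistics64


# Three illustrative servers (assumed behaviour, for the example only).
old_buggy = FakeServer(implements_get_statistics64=False,
                       give_up_all_callbacks_is_safe=False)
fixed = FakeServer(implements_get_statistics64=True,
                   give_up_all_callbacks_is_safe=True)
new_but_wrong = FakeServer(implements_get_statistics64=True,
                           give_up_all_callbacks_is_safe=False)

for name, server in [("old_buggy", old_buggy),
                     ("fixed", fixed),
                     ("new_but_wrong", new_but_wrong)]:
    guess = implied_capability_check(server)
    print(name, "client would call RPC:", guess,
          "| actually safe:", server.give_up_all_callbacks_is_safe)
```

The third server passes the check and is still unsafe to call, which is
the failure mode described above. A capability bit has the same shape:
replace the GetStatistics64 field with an advertised flag and nothing
else changes.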

At the moment there are two AFS server implementations.  In the future
there may be more.  (I sure hope so.)  Something based on versions
would permit clients to construct a peer bug list, but rx version
strings are not structured enough and come from the wrong software
layer: I can upgrade the file server without replacing the rx library.
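
As a rough sketch of what a version-based peer bug list could look like
on the client side, assuming the client can somehow learn a structured
(implementation, version) pair for the file server itself, which is
exactly what rx version strings do not provide today: the names BugEntry,
PEER_BUGS, and rpc_is_safe, the implementation label "impl-a", and the
version range in the table are placeholders for illustration, not real
release data.

```python
from typing import NamedTuple, Tuple

Version = Tuple[int, ...]


class BugEntry(NamedTuple):
    """One known-bad range for one RPC in one server implementation."""
    implementation: str
    first_bad: Version
    last_bad: Version
    rpc: str


# Placeholder data: "impl-a" and the range below are illustrative only.
PEER_BUGS = [
    BugEntry("impl-a", (1, 0, 0), (1, 4, 5), "GiveUpAllCallBacks"),
]


def parse_version(text: str) -> Version:
    """Turn a dotted version such as '1.4.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def rpc_is_safe(implementation: str, version_text: str, rpc: str) -> bool:
    """Return False if the peer falls inside a known-bad range for rpc."""
    version = parse_version(version_text)
    for bug in PEER_BUGS:
        if (bug.implementation == implementation
                and bug.rpc == rpc
                and bug.first_bad <= version <= bug.last_bad):
            return False
    return True


print(rpc_is_safe("impl-a", "1.4.5", "GiveUpAllCallBacks"))
print(rpc_is_safe("impl-a", "1.4.6", "GiveUpAllCallBacks"))
```

The point of keying on the implementation as well as the version is that
a second server implementation with its own numbering does not collide
with the first, and a new bug means a new table entry rather than a new
RPC.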

Are we really going to throw out an RPC and force re-implementation
every time that someone implements it incorrectly?

During the four years that this bug was being actively shipped we
received exactly one crash report.  That report came from a site that
shut down all of their clients at the same time for a regular
workstation refresh.  We have not received a report of a crash for this
problem since.

While I am sensitive to the bad publicity argument, if anything I
believe that forcing the owners of an un-managed cell to communicate
with the community is not a bad thing.  If the cell is really un-managed,
the owners are already lost to us.  If they communicate with us, then we
can help them improve the service they provide to their end users by
encouraging them to update.

Jeffrey Altman
