On Feb 20, 2014, at 10:06, Jeffrey Hutzelman wrote:

> On Wed, 2014-02-19 at 19:33 -0600, Andrew Deason wrote:
>> On Wed, 19 Feb 2014 20:25:14 -0500
>> [email protected] wrote:
>> 
>>> I'm sorry, but I didn't notice this topic come up before.  What
>>> problems would be seen when these clients connect/disconnect to those
>>> ancient versions of file servers?  I'm not asking that the change be
>>> skipped, but just wondering what behavior would be seen.
>> 
>> "Undefined behavior". In theory, anything could happen, but the most
>> likely result is that the fileserver just crashes (SIGSEGV, SIGBUS,
>> SIGABRT, etc). If I recall correctly, the busier the server is, the more
>> likely it will have a problem.
> 
> Pushing out a client change that causes fileservers -- especially
> pre-DAFS fileservers -- to mysteriously crash is kind of poor.
> Announcements to people who are actively following things and likely to
> install new clients won't help server operators whose fileservers
> suddenly start crashing with little or no warning.  I certainly wouldn't
> want to be forced into a "surprise" upgrade.

Who would? But then, upgrading your 1.4.<=5 fileservers to at least 1.4.6 isn't 
that much of an adventure. I ran 1.4.7 servers for a long time, and they were 
pretty good. And if you're running 1.3 or 1.5 servers, you should like 
surprises.

And: such sites have been ignoring an OpenAFS security advisory for more than 
six years, and have been at risk for almost seven, because the Windows client 
has been giving up callbacks since then. All it takes for the problem to strike 
is introducing more Windows clients, or teaching the existing ones new tricks - 
like having a sizable cluster run maintenance scripts from AFS and then reboot 
every night, which is how the problem was initially found. Presumably, 
introducing YFS clients (including the iOS one) would trigger it as well. And 
I'm told Arla clients will bite you too.

I figure quite a few sites with such old servers may migrate from Windows XP 
and old clients to new ones which do give up callbacks in the near future. 
Those sites are in for a surprise, clearly with no warning. I'm not convinced 
that "protecting" them so far was doing them a favor, nor that continuing to do 
so would be one in the future.

But I admit it's a tough decision.

> It seems like the right way to handle this is to define a capability
> flag to indicate that RXAFS_GiveUpAllCallBacks() is safe, and make the
> call only when the fileserver advertises that flag.  Of course, ideally
> the flag would have been introduced back when the bug was fixed, but
> that ship sailed years ago.

Yes, but it may still be an option. Any estimate of what it would take to 
implement it, and to maintain it forever?
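
For illustration only, here is a minimal sketch in C of the kind of check such 
a flag would allow. Everything prefixed hypo_ or HYPO_ is hypothetical: this 
thread defines no flag name, bit value, or capability layout, and the sketch 
does not use OpenAFS's actual interfaces. It only shows the decision logic: the 
client would call RXAFS_GiveUpAllCallBacks() when the server explicitly 
advertises the flag, and treat a server that advertises nothing as unsafe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL capability bit: name and value are assumptions made for
 * this sketch, not something defined in the thread or in OpenAFS. */
#define HYPO_CAPABILITY_SAFE_GIVEUPALL 0x00010000u

/* Capability words as a fileserver might advertise them (assumed layout). */
struct hypo_server_caps {
    const uint32_t *words;  /* NULL if the server advertised nothing */
    size_t nwords;
};

/* What the client should do with its callbacks at shutdown. */
enum hypo_action {
    HYPO_SKIP_GIVEUP,       /* keep today's behavior: do not make the call */
    HYPO_CALL_GIVEUP_ALL    /* safe to call RXAFS_GiveUpAllCallBacks() */
};

/* True only if the server explicitly advertises the flag. A server that
 * reports no capabilities at all is treated as unsafe. */
static bool hypo_giveup_all_is_safe(const struct hypo_server_caps *caps)
{
    if (caps == NULL || caps->words == NULL || caps->nwords == 0)
        return false;
    return (caps->words[0] & HYPO_CAPABILITY_SAFE_GIVEUPALL) != 0;
}

/* Decide per server; the default for unknown servers is to skip the call. */
static enum hypo_action hypo_shutdown_action(const struct hypo_server_caps *caps)
{
    return hypo_giveup_all_is_safe(caps) ? HYPO_CALL_GIVEUP_ALL
                                         : HYPO_SKIP_GIVEUP;
}
```

The point of the design is that it fails closed: old fileservers never 
advertise the bit, so they never receive the call, and only fixed servers 
would need to be changed to set it.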

> I'm also a little concerned at the insistence on introducing a
> potentially disruptive, backward-incompatible behavior into what's
> supposed to be a stable release series with no mechanism to turn it off.
> Did we become GNOME when I wasn't looking?

With my site admin hat on: I'd like to have this feature. And I wouldn't like 
to wait another couple of years. I do sympathize with admins bitten by it. But 
not enough not to want it, especially for the reasons outlined above.

Changing hats. As the "stable series" release manager: This has been a 
controversial issue for years, and that's not going to change. We can postpone 
it once more, but we will then face the same decision and the same 
discussion, with no solution that makes everyone happy, when we create the 1.9 
branch and/or when we promote 1.9 to "stable" (1.10). We'll still have those 
who want the feature, those who accept it if there's a knob, those who want the 
knob to default to on, those who want it to default to off, those who are 
willing to implement a knob, but only a more complex one allowing per-site 
configuration rather than a simple on/off switch (and that's not feasible 
anytime soon), and those who object to one more knob altogether (especially if 
it's complex).

What I do insist on in my release manager role is to get such issues off the 
table, one way or the other, and not let such deadlocks happen. Postponing such 
decisions is fine as long as there's hope for a much better solution. But once 
it's clear that the situation is not going to change, any decision is better 
than none (once again).

And when in doubt, I'm for progress rather than stagnation.

If this discussion, or the one following the planned announcements, turns up a 
killer argument for not introducing the feature ever, fine. In that case, IMO 
it should be removed from the master and 1.7 branches too. If not, I believe 
having it in a stable release after due announcement is the right thing to do. 

-- Stephan


_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
