Jeff:
On Tue, 30 Jan 2007, Jeffrey Hutzelman wrote:
>> I think that the botched version of the CVS delta:
>> volume-dont-artificially-untimeout-vlservers-20061218
>> is crashing some of our AFS clients. I noticed that the fixed version of
>> this patch made it into CVS yesterday.
>
> This doesn't surprise me much; I suspected that this might cause issues for
> any client which actually saw a vlserver go down.

We saw kernel crashes with a backtrace like this:

    (crash)
    InstallVolumeEntry()
    afs_SetupVolume()
    afs_NewVolumeByName()
    ...
    ...

I think what happened to us was that when the defective code inside
afs_NewVolumeByName() ran, it left garbage inside the newly allocated
struct {,n,u}vldbentry. Later, the following loop inside
InstallVolumeEntry() iterates up to a billion times, or whatever garbage
value was in the nServers field of the structure:

    /* Step through the VLDB entry making sure each server listed is there */
    for (i = 0, j = 0; i < ve->nServers; i++) {
        if (((ve->serverFlags[i] & mask) == 0)
            || (ve->serverFlags[i] & VLSF_DONTUSE)) {
            continue;   /* wrong volume or don't use this volume */
        }

While executing code inside that loop, a kernel watchdog would eventually
trigger. This, unhelpfully, just made the whole machine hang hard.
I think the watchdog timer just triggered due to the time spent looping in
the kernel.
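
To illustrate the failure mode described above, here is a minimal sketch of a
loop that refuses to iterate when the server count is out of range. It is not
the OpenAFS code: the struct, the function name, the array bound of 13, and
the flag value are all stand-ins I made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in the OpenAFS headers. */
#define SKETCH_MAX_SERVERS 13      /* assumed upper bound on nServers */
#define SKETCH_DONTUSE     0x20    /* illustrative flag value only */

struct sketch_vldbentry {
    int nServers;
    int serverFlags[SKETCH_MAX_SERVERS];
};

/*
 * Count the servers that match `mask` and are not marked don't-use.
 * Returns -1 instead of looping when nServers is outside the array
 * bounds, which is what an uninitialized entry could look like.
 */
int sketch_count_usable(const struct sketch_vldbentry *ve, int mask)
{
    int i, usable = 0;

    assert(ve != NULL);
    if (ve->nServers < 0 || ve->nServers > SKETCH_MAX_SERVERS)
        return -1;              /* garbage count: refuse to iterate */

    for (i = 0; i < ve->nServers; i++) {
        if ((ve->serverFlags[i] & mask) == 0
            || (ve->serverFlags[i] & SKETCH_DONTUSE))
            continue;           /* wrong volume type or don't use */
        usable++;
    }
    return usable;
}
```

A check like this only limits the damage from a garbage count; it does not
make the rest of an uninitialized entry trustworthy.
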

> Fortunately, we had the
> chance to fix it before 1.4.3 final. I'm glad there are people out there
> deploying release candidates.

The way I see it, if anything goes wrong with the code I'm using, chances
are I'd just be asked to upgrade to the next release (candidate) anyway.
I do browse through the CVS frequently (mainly via the openafs.org web
interface), and I try to read the details on which deltas have gone in
before deciding what to deploy.

>> My question is, why doesn't the delta name change in this case?
>
> Because the gatekeepers chose to treat it as part of the same delta.
> Personally, I wish they wouldn't do this, especially when there's a release
> in between. It also confuses wdelta and some other tools if there happen to
> have been any other commits to the affected files between the two parts of
> the delta.

Ok.

> Currently, wdelta's sort-by-date uses the timestamp that is part of the delta
> name. Most of the time, this works fine. Actually using the timestamp of
> the last commit would be harder, because we'd have to inspect the CVS data
> for each affected file to find the timestamps. I'll look into it, but no
> promises at this point.

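
For reference, sorting by the date embedded in the delta name could look
roughly like the sketch below. This is not wdelta's code; the function names
are invented, and it assumes every delta name ends in a dash followed by an
eight-digit YYYYMMDD stamp, as in the delta named above. It also shows why a
later fix committed under the same name would still sort at the original date.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return the trailing YYYYMMDD stamp of a delta name as a number,
 * or -1 if the name does not end in "-" followed by eight digits.
 */
long sketch_delta_date(const char *name)
{
    const char *dash;
    long date = 0;
    int i;

    assert(name != NULL);
    dash = strrchr(name, '-');
    if (dash == NULL || strlen(dash + 1) != 8)
        return -1;
    for (i = 1; i <= 8; i++) {
        if (!isdigit((unsigned char)dash[i]))
            return -1;
        date = date * 10 + (dash[i] - '0');
    }
    return date;
}

/* qsort-style comparator: order delta names by their embedded date. */
int sketch_delta_cmp(const void *a, const void *b)
{
    long da = sketch_delta_date(*(const char *const *)a);
    long db = sketch_delta_date(*(const char *const *)b);

    return (da > db) - (da < db);
}
```

The comparator can be passed to qsort() over an array of name strings; names
without a valid stamp sort first because they map to -1.
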
I understand. Thanks for looking into this,
Chris Wing