Jeff:

On Tue, 30 Jan 2007, Jeffrey Hutzelman wrote:

I think that the botched version of the CVS delta:

        volume-dont-artificially-untimeout-vlservers-20061218

is crashing some of our AFS clients.  I noticed that the fixed version of
this patch made it into CVS yesterday.

This doesn't surprise me much; I suspected that this might cause issues for any client which actually saw a vlserver go down.

We saw kernel crashes with a backtrace like this:

        (crash)
        InstallVolumeEntry()
        afs_SetupVolume()
        afs_NewVolumeByName()
        ...
        ...

I think what happened to us was that when the defective code inside afs_NewVolumeByName() ran, it left garbage inside the newly allocated struct {,n,u}vldbentry. Later, the following code inside InstallVolumeEntry() loops up to a billion times, or however large the garbage value in the nServers field of the structure happened to be:

    /* Step through the VLDB entry making sure each server listed is there */
    for (i = 0, j = 0; i < ve->nServers; i++) {
        if (((ve->serverFlags[i] & mask) == 0)
            || (ve->serverFlags[i] & VLSF_DONTUSE)) {
            continue;           /* wrong volume or don't use this volume */
        }
        /* ... rest of loop body elided ... */
    }


While executing code inside that loop, a kernel watchdog would eventually trigger. Unhelpfully, this simply hard-hung the whole machine.

I think the watchdog timer just triggered due to the time spent looping in the kernel.
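For illustration, here is a minimal sketch of the two defenses that would keep garbage in nServers from turning into a near-endless kernel loop: zero the entry when it is allocated, and refuse any nServers larger than the server array. The struct, the array size of 13, and the flag values used with it are hypothetical stand-ins chosen for the example, not the real OpenAFS definitions or the actual fix that went into CVS.

```c
#include <string.h>

/* Hypothetical stand-in for {,n,u}vldbentry with only the fields the loop
 * touches. The array size is an assumption for illustration. */
#define SKETCH_MAX_SERVERS 13

struct sketch_vldbentry {
    int nServers;
    int serverFlags[SKETCH_MAX_SERVERS];
};

/* Zero the entry up front, so an early failure leaves nServers == 0
 * rather than whatever was in the freshly allocated memory. */
void init_entry(struct sketch_vldbentry *ve)
{
    memset(ve, 0, sizeof(*ve));
}

/* Count servers that match `mask` and are not flagged `dontuse_flag`.
 * Returns -1 if nServers is out of range, instead of walking off the end
 * of serverFlags (or looping a billion times). */
int count_usable_servers(const struct sketch_vldbentry *ve, int mask,
                         int dontuse_flag)
{
    int i, usable = 0;

    if (ve->nServers < 0 || ve->nServers > SKETCH_MAX_SERVERS)
        return -1;

    for (i = 0; i < ve->nServers; i++) {
        if ((ve->serverFlags[i] & mask) == 0
            || (ve->serverFlags[i] & dontuse_flag))
            continue; /* wrong volume type, or marked don't-use */
        usable++;
    }
    return usable;
}
```

With a zeroed entry the loop does no work at all, and an entry whose nServers is, say, 1000000000 is rejected immediately rather than scanned.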

Fortunately, we had the chance to fix it before 1.4.3 final. I'm glad there are people out there deploying release candidates.

The way I see it, if anything goes wrong with the code I'm using, chances are I'd just be asked to upgrade to the next release (candidate) anyway.

I do browse the CVS repository frequently (mainly via the openafs.org web interface), and I try to read the details of which deltas have gone in before deciding what to deploy.

My question is, why doesn't the delta name change in this case?

Because the gatekeepers chose to treat it as part of the same delta. Personally, I wish they wouldn't do this, especially when there's a release in between. It also confuses wdelta and some other tools if any other commits were made to the affected files between the two parts of the delta.

Ok.

Currently, wdelta's sort-by-date uses the timestamp that is part of the delta name. Most of the time, this works fine. Using the timestamp of the last commit instead would be harder, because we'd have to inspect the CVS data for each affected file to find the timestamps. I'll look into it, but no promises at this point.
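To make the limitation concrete, here is a rough sketch of sorting by the date embedded in a delta name. It assumes names end in "-YYYYMMDD", as volume-dont-artificially-untimeout-vlservers-20061218 does; the function names are hypothetical and this is not how wdelta itself is implemented.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Extract the trailing YYYYMMDD stamp from a delta name such as
 * "volume-dont-artificially-untimeout-vlservers-20061218".
 * Returns 0 if the name does not end in '-' followed by 8 digits. */
long delta_date(const char *name)
{
    size_t len = strlen(name);
    size_t i;
    long date = 0;

    if (len < 9 || name[len - 9] != '-')
        return 0;
    for (i = len - 8; i < len; i++) {
        if (!isdigit((unsigned char)name[i]))
            return 0;
        date = date * 10 + (name[i] - '0');
    }
    return date;
}

/* qsort comparator: order by embedded date, then by name as a tie-break. */
static int cmp_delta_by_name_date(const void *a, const void *b)
{
    const char *na = *(const char *const *)a;
    const char *nb = *(const char *const *)b;
    long da = delta_date(na), db = delta_date(nb);

    if (da != db)
        return (da < db) ? -1 : 1;
    return strcmp(na, nb);
}

void sort_deltas(const char **names, size_t n)
{
    qsort(names, n, sizeof(*names), cmp_delta_by_name_date);
}
```

Because the key comes only from the name, a follow-up fix committed under the same delta name still sorts at 20061218, no matter when it was actually committed. Sorting by the real last-commit time would need per-file CVS data instead.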

I understand.  Thanks for looking into this,

Chris Wing
[EMAIL PROTECTED]
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
