On Thu, 16 Feb 2012 11:15:19 +0100 Åsa Andersson <[email protected]> wrote:
> Hello,
>
> We upgraded our file servers from 1.6.0 to 1.6.1pre2 last Sunday (most of
> our clients are running version 1.6.0) and after that we have seen volumes
> going offline and entries in the FileLog indicating they need salvaging.
>
> Typical FileLog-entries look like this:
>
> ----------
> Mon Feb 13 09:38:02 2012 Fid 537126959.344.442066 has inconsistent
> length (index 573440, inode 524288); volume must be salvaged
> ----------

This is a new consistency check. It indicates that the vnode index
(which is consulted when someone on a client does e.g. a stat()) says
the file is 573440 bytes long, but the actual file data on disk only has
524288 bytes. Before this check, the fileserver would just serve the
first 524288 bytes, and client applications tended to just see NULs
after that (though the actual content may be undefined, I'm not sure).

> or like this:
>
> ----------
> Mon Feb 13 10:10:47 2012 fssync: breaking all call backs for volume 537126959
> Mon Feb 13 10:10:47 2012 ReadHeader: Failed to open volume info header file
> (volume=537126959, inode=2306942731429085183); errno=2
> Mon Feb 13 10:10:47 2012 VAttachVolume: Error reading diskDataHandle header
> for vol 537126961; error=101
> Mon Feb 13 10:10:47 2012 VAttachVolume: Error attaching volume
> /vicepa//V0537126961.vol; volume needs salvage; error=101

A special file for a volume just doesn't exist on disk. I think this
would be...

    /vicepX/AFSIDat/j/jUy+U/zzzz521u1+0

I don't know what would cause that. Is this after a salvage? Or a
release/restore/etc? I would guess 537126961 is the BK volume for
537126959?

> offline so far. Running salvage seems to fix the volumes.

I assume you are not running DAFS? (otherwise, these should be salvaged
automatically)

> Is 1.6.1pre2 detecting data corruption brought on by 1.6.0 and this is
> what we're seeing?

For the first one, I think that's possible.
If the CoW corruption results in files getting incorrectly truncated,
that would certainly cause the first, but I'm not sure if the specific
corruption patterns are known. Do you have any idea what file
537126959.344.442066 is?

I don't think that's possible for the second one, though.

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
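
For anyone wanting to locate the on-disk directory for a volume on a namei fileserver, here is a rough Python sketch of how the "j/jUy+U" part of the path above can be derived from volume ID 537126959. The alphabet and the "low byte, then full ID" layout are written from memory of the namei backend and should be treated as assumptions, and the function names are made up for illustration. The special file name itself (zzzz521u1+0) is not derived here.

```python
# Sketch only: the alphabet and directory layout are assumptions recalled
# from the OpenAFS namei backend, not copied from its source.
FLIP_B64 = "+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def flipbase64(value: int) -> str:
    """Encode value 6 bits at a time, least significant group first."""
    if value == 0:
        return FLIP_B64[0]
    out = []
    while value:
        out.append(FLIP_B64[value & 0x3F])
        value >>= 6
    return "".join(out)


def volume_dir(partition: str, volume_id: int) -> str:
    """Hypothetical helper: build <partition>/AFSIDat/<hash>/<volume>."""
    hash_dir = flipbase64(volume_id & 0xFF)  # low byte of the volume ID
    vol_dir = flipbase64(volume_id)          # full volume ID
    return f"{partition}/AFSIDat/{hash_dir}/{vol_dir}"


print(volume_dir("/vicepX", 537126959))
```

For the RW volume ID from the log this gives /vicepX/AFSIDat/j/jUy+U, which agrees with the directory in the path quoted in the reply.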
