On Tue, 27 Mar 2012 14:01:04 +0200 Matthias Gerstner <[email protected]> wrote:
> The situation with the salvage was as follows: The affected volume
> was a pretty large volume containing about 160 gigabytes of data
> spread across 3.5 million files. During the salvage I saw a *lot* of
> log lines similar to this flying by:
>
> '??/??/SomeFile' deleted.
>
> After half an hour of seeing this, the volume was back online with
> less than 10 gigabytes of data remaining. So I figured the top-level
> directory structure somehow got lost. Sorry that I can't provide the
> actual log any more.

Please save the log if it happens again. A directory object merely
being corrupt will not delete its children unless you pass '-orphans
remove' to the salvager. The default, '-orphans ignore', keeps orphaned
data around, but it is effectively invisible until you salvage with
'-orphans attach'.

> Seems I forgot to mention 'pre1':
>
> # strings /usr/sbin/vos | grep built
> @(#) OpenAFS 1.6.1pre1 built 2012-01-24
>
> Is it too risky to use the pre-release? I got used to running the
> unstable openafs packages to be able to keep up with recent Linux
> kernel versions.

That version is known to have data corruption/loss issues, which are
fixed in pre4. I don't know if that's what you're hitting, though. (You
can also run a newer client with older servers just fine.) I assume the
volserver is running the same version? As Kim said:

  rxdebug <server> 7005 -version

> Now that you say it, it really does look like two things are running
> in parallel. But I can't think of how that could be happening. The
> backup script is supposed to dump one volume after another in a
> serial manner. And on this specific server the backup script is the
> only administrative AFS operation that is scheduled at all. Also,
> when I disable the backup job for a night, nothing shows up in the
> log at all.

If you turn on the volser audit log with '-auditlog
/usr/afs/logs/VolserLog.audit' or something, you can see specifically
which operations were run, when, and by whom.
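A hedged sketch of what the version check and the audit-log change
might look like in practice. The hostname is an example, and the paths
assume a typical Transarc-style install; adjust for your deployment:

```shell
# Confirm what the volserver itself is running (port 7005):
rxdebug afs1.example.com 7005 -version

# Inspect the current volserver command line for the fs bnode:
bos status afs1.example.com -long

# Then append
#   -auditlog /usr/afs/logs/VolserLog.audit
# to the volserver line in /usr/afs/local/BosConfig on the server,
# and restart the fs instance so it takes effect:
bos restart afs1.example.com fs -localauth
```

Note that 'bos restart' of the fs instance restarts the fileserver as
well as the volserver, so schedule it for a quiet period.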
Or turn up the debug level with '-d 125 -log', and you'll see a bunch
more information in VolserLog interspersed with everything else.

> However, I'm running two pairs of file and volume server. Each
> machine performs a backup of its volumes, and this happens in
> parallel. But this shouldn't affect a single machine's log.

So, you just have two completely separate servers, and each one is
running a fileserver/volserver? Yeah, that shouldn't matter.

> I'm getting continued weird behaviour during my backups. Last night,
> for example, a dump was aborted with the following error message:
>
> 'consealed data inconsistent'

That's "sealed data inconsistent". You can get this if your tokens
expired sometime during the process (I don't remember / just don't
know what causes that vs an 'expired' message). Do you have the output
of 'vos' running with '-verbose' by any chance? How long are the
backups taking, and are you running this under a tool to refresh
tokens?

> However, the original volume in question remained intact this time.
> I'm attaching the VolserLog of this incident.

Hmm, did you forget to attach this?

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
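Two common ways to keep long backup runs from tripping over expired
tokens, sketched below. The volume name, keytab path, and script path
are assumptions, not from the thread; k5start comes from the separate
'kstart' package:

```shell
# Option 1: on a machine that holds the AFS server key, use the server
# key instead of a user token, so nothing can expire mid-dump:
vos backup vol.example -verbose -localauth

# Option 2: wrap the backup script in k5start, which keeps Kerberos
# credentials fresh from a keytab and (-t) reruns aklog to refresh the
# AFS token while the wrapped command runs:
k5start -U -f /etc/afs-backup.keytab -t -- /usr/local/sbin/afs-backup.sh
```

Running 'vos' with '-verbose' in either case makes it much easier to
see which step a failed dump died in.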
