Dumping the RW volume marks it "busy" for the duration of the dump, which makes the volume unwritable -- and generates "afs: Waiting for busy volume" errors when a write occurs.
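For what it's worth, the in-flight volserver transaction that holds a volume busy during a dump (like the "trans ... older than 300 seconds" entries later in this thread) can be listed with 'vos status'. A dry-run sketch -- 'vos' is stubbed with a shell function so the command can be previewed without a live cell, and the server name is hypothetical:

```shell
#!/bin/sh
# Sketch (dry run): list in-flight volserver transactions -- e.g. the one a
# long-running dump holds on the RW volume. 'vos' is stubbed here so the
# sketch runs without an AFS cell; the server name is hypothetical.
vos() { echo "would run: vos $*"; }

vos status -server afs1.example.com -localauth
```

Replace the stub with the real 'vos' binary to run it against an actual server.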
Dumping the .backup is not just a good practice, in my opinion; it is the only sensible practice if keeping writability is important. Large volumes can take a while to dump.

Identifying the software version that is running is better done with "rxdebug" -- it's a nit, but the binaries are not guaranteed to be the same as what's running -- and the "strings | grep" approach only tells you what version the binary is, not what the running version is ...

It does look like more than one operation was in progress -- a volume delete isn't part of a volume dump.

Kim

On 3/26/2012 11:38 AM, Andrew Deason wrote:
> On Mon, 26 Mar 2012 17:25:04 +0200
> Matthias Gerstner <[email protected]> wrote:
>
>> I'm recently experiencing trouble during my backup of OpenAFS volumes.
>> I perform backups using the
>>
>> 'vos dump -server <server> -partition <partition> -clone -id <vol>'
>
> <vol> I presume is an rw volume?
>
> Just so you know, a more common way of doing this is to use 'vos
> backupsys' and then back up the .backup volumes. Nothing 'wrong' with
> what you're doing, but it's a less common way.
>
>> However some days ago the backup of a specific volume failed with
>> a bad exit code (255). My backup script thus stopped further processing.
>> The affected volume went offline as a result and only showed up in
>> 'vos listvol' as "couldn't attach volume ...".
>
> What did volserver say in VolserLog when that happened? It should give a
> reason as to why it could not attach.
>
>> After running a salvage on the affected volume it was brought back
>> online, but most of the contained data was deleted due to a supposed
>> corruption of the directory structure detected during salvage.
>
> SalvageLog will say specifically why. Or SalsrvLog if you are running
> DAFS; are you running DAFS?
>
>> Attached is the VolserLog from the time when the last of the incidents
>> occurred.
>
> What was the volume id for the volume in question? Possibly 536879790 or
> 536879793?
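The 'vos backupsys' approach Andrew describes (and Kim endorses above) might be sketched roughly like this. The prefix, volume names, dump paths, and the '-localauth' flag are assumptions, not from the thread, and 'vos' is stubbed with a shell function so the dry run works without a cell:

```shell
#!/bin/sh
# Sketch (dry run): refresh the .backup clones, then dump the clones so the
# RW volumes stay online and writable. All names and paths are hypothetical;
# replace the stub with the real 'vos' binary to use this for real.
vos() { echo "would run: vos $*"; }

server=afs1.example.com

# Recreate the .backup clones (each RW is held only briefly for the clone)
vos backupsys -prefix user -server "$server" -localauth

# Dump the clones; a failure (e.g. exit code 255) stops further processing
for vol in user.alice user.bob; do
    vos dump -id "$vol.backup" -file "/dumps/$vol.dump" -localauth
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "dump of $vol.backup failed with exit code $rc" >&2
        exit "$rc"
    fi
done
```

Checking the exit code of each dump separately, as above, also makes it easier to see which volume a nightly run died on.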
>
>> I'm currently running openafs 1.6.1 on Gentoo Linux with kernel
>> version 3.2.1.
>
> 1.6.1 is not a version that exists yet (or at least, certainly did not
> exist on Friday). What version is the volserver, and what version is
> 'vos'? (Running `strings </path/to/bin> | grep built` is a sure way to
> tell.)
>
>> Fri Mar 23 00:10:57 2012 1 Volser: Clone: Cloning volume 536879790 to new volume 536889517
>> Fri Mar 23 00:16:04 2012 1 Volser: Delete: volume 536889517 deleted
>> Fri Mar 23 00:16:04 2012 1 Volser: Clone: Cloning volume 536879793 to new volume 536889518
>> Fri Mar 23 00:16:06 2012 VDestroyVolumeDiskHeader: Couldn't unlink disk header, error = 2
>> Fri Mar 23 00:16:06 2012 VPurgeVolume: Error -1 when destroying volume 536889517 header
>> Fri Mar 23 00:16:06 2012 1 Volser: Delete: volume 536889517 deleted
>> Fri Mar 23 00:16:09 2012 1 Volser: Delete: volume 536889518 deleted
>> Fri Mar 23 00:16:09 2012 VDestroyVolumeDiskHeader: Couldn't unlink disk header, error = 2
>> Fri Mar 23 00:16:09 2012 VPurgeVolume: Error -1 when destroying volume 536889518 header
>> Fri Mar 23 00:16:09 2012 1 Volser: Delete: volume 536889518 deleted
>> Fri Mar 23 00:21:20 2012 trans 69 on volume 536889518 is older than 300 seconds
>> Fri Mar 23 00:21:20 2012 trans 66 on volume 536889517 is older than 300 seconds
>
> Hmm, are you sure 'vos dump' is the only thing you are running at the
> time? (You're running more than one in parallel... how many do you run
> at once?) This sequence of operations does not seem normal for just a
> 'vos dump'.

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
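Kim's rxdebug point above -- asking the running daemon its version rather than inspecting the binary on disk -- could look like this sketch. The server name is hypothetical; 7000 and 7005 are the standard fileserver and volserver ports; 'rxdebug' is stubbed with a shell function so the dry run works without a cell:

```shell
#!/bin/sh
# Sketch (dry run): query the *running* processes for their version, which
# may differ from the binaries on disk. Stubbed so the sketch runs without
# an AFS cell; the server name is hypothetical.
rxdebug() { echo "would run: rxdebug $*"; }

server=afs1.example.com
rxdebug "$server" 7000 -version   # running fileserver version
rxdebug "$server" 7005 -version   # running volserver version
```

With the real 'rxdebug', this reports what is actually serving requests right now, so it catches the upgraded-binary-but-not-restarted case that 'strings | grep' misses.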
