Hi,

We have been running OpenAFS 1.4.10 (specifically 1.4.10+dfsg1-1~bpo50+1 on Debian Lenny) on our fileservers for a month or so now without incident. Yesterday, we moved our VLDB servers from aging Solaris machines running 1.4.2 to Debian Lenny machines running 1.4.10. We also upgraded from kaserver to krb5.
Everything worked fine for approximately an hour, and then things went crazy. After a few hours of unscheduled downtime, the only fix seemed to be to roll back to 1.4.7 (specifically 1.4.7.dfsg1-6+lenny1) on all VLDB/fileserver machines. The problem showed up primarily as the fileservers going into continuous salvage loops any time a volume operation was done, along with some very strange errors in the VolserLog. Rolling everything back to 1.4.7 fixed the issue.

Sadly, I forgot to save the logs before rolling back, and OpenAFS overwrites them almost immediately, so I don't have the error messages from the main fileservers for reference.

I tested this again this morning with a test fileserver, and I still get an error. Specifically, when releasing a volume, I get the following:

This is a complete release of volume 536885604
Cloning RW volume 536885604 to temporary RO...Failed to clone the RW volume 536885604
: Invalid cross-device link
Error in vos release command.
: Invalid cross-device link

The VolserLog on the fileserver contains:

Wed Aug 20 07:38:46 2009 [5] 1 Volser: ListVolumes: Volume 536885602 (V0536885602.vol) will be destroyed on next salvage
Wed Aug 20 07:38:46 2009 [7] 1 Volser: Delete: volume 536885602 deleted
Wed Aug 20 07:38:46 2009 [10] 1 Volser: Clone: Cloning volume 536885601 to new volume 536885602
Wed Aug 20 07:38:46 2009 [3] VAttachVolume: Failed to open /vicepc/V0536885602.vol (errno 2)
Wed Aug 20 07:38:46 2009 [4] 1 Volser: CreateVolume: Unable to create the volume; aborted, error code 18
Wed Aug 20 07:38:46 2009 [4] : Invalid cross-device link
Wed Aug 20 07:42:06 2009 [7] 1 Volser: CreateVolume: volume 536885604 (bethbtest) created

And so the volume release is unsuccessful. A Google search suggests that this is only likely to happen if there are old volume parts lying around. However, this is a brand new volume, and there are no traces of any similar volumes on any partition.
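For anyone reading the two numeric codes in the VolserLog above, here is a minimal sketch that maps them to their symbolic names. It assumes both numbers are plain Linux errno values (the "Invalid cross-device link" text in the log is consistent with that for code 18, but it is still an assumption about how volser reports errors). The LOG_ERRNOS table and the describe() helper are made up for illustration and are not part of OpenAFS.

```python
import errno
import os

# Error numbers copied from the VolserLog lines above.
# Assumption: both are ordinary system errno values on the fileserver.
LOG_ERRNOS = {
    "VAttachVolume: Failed to open": 2,
    "CreateVolume: Unable to create the volume": 18,
}


def describe(code: int) -> str:
    """Return 'number NAME: message' for a system errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} {name}: {os.strerror(code)}"


# Decode each logged error number and print it next to its log message.
decoded = {msg: describe(code) for msg, code in LOG_ERRNOS.items()}
for msg, text in decoded.items():
    print(f"{msg} -> {text}")
```

Read this way, the failed open is ENOENT (the .vol header file does not exist at that path) and the create failure is EXDEV, which the kernel returns when a link or rename would cross filesystems. That fits the RW and RO being on different partitions, though it does not by itself explain why 1.4.10 behaves differently from 1.4.7.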
This happens with the RW on any partition on the fileserver, as long as the RO is on a different partition. (And yes, I know not to do that normally; this is just for testing.)

Does anyone have any ideas? I would really like to get everything onto 1.4.10 for the performance improvements.

Thanks,
-stefan

--
Stefan Strandberg
UNIX group
Computer Aided Engineering - UW Madison
[email protected]
