Ok, an update to the problem I alluded to this morning.

Volume name in question is pub.m.rpmforge. The .backup volume in particular. This volume was backup'd this morning at approx. 0005, with this output from vos backup:

Failed to end the transaction on the rw volume 536873153 ____: server not responding promptly Error in vos backup command. ____: server not responding promptly

That command returned in <5s. I then see this in VolserLog:

Fri Mar  4 00:05:15 2011 1 Volser: Clone: Recloning volume 536873153 to volume 
536873155
Fri Mar  4 00:10:19 2011 trans 13950 on volume 536873153 has been idle for more 
than 300 seconds
Fri Mar  4 00:10:49 2011 trans 13950 on volume 536873153 has been idle for more 
than 330 seconds
Fri Mar  4 00:11:19 2011 trans 13950 on volume 536873153 has been idle for more 
than 360 seconds
Fri Mar  4 00:11:49 2011 trans 13950 on volume 536873153 has been idle for more 
than 390 seconds
Fri Mar  4 00:12:19 2011 trans 13950 on volume 536873153 has been idle for more 
than 420 seconds
Fri Mar  4 00:12:49 2011 trans 13950 on volume 536873153 has been idle for more 
than 450 seconds
Fri Mar  4 00:13:19 2011 trans 13950 on volume 536873153 has been idle for more 
than 480 seconds
Fri Mar  4 00:13:49 2011 trans 13950 on volume 536873153 has been idle for more 
than 510 seconds
Fri Mar  4 00:14:19 2011 trans 13950 on volume 536873153 has been idle for more 
than 540 seconds
Fri Mar  4 00:14:49 2011 trans 13950 on volume 536873153 has been idle for more 
than 570 seconds
Fri Mar  4 00:15:19 2011 trans 13950 on volume 536873153 has been idle for more 
than 600 seconds
Fri Mar  4 00:15:19 2011 trans 13950 on volume 536873153 has timed out

Nothing in any of the other log files, and nothing interesting in FileLog other than:

Mar  4 00:05:15 horvitz fileserver[2236]: VOffline: Volume 536873153 
(pub.m.rpmforge) is now offline (A volume utility is running.)
Mar  4 00:05:15 horvitz fileserver[2236]: fssync: breaking all call backs for 
volume 536873155

(and then tsm goes to access the RW volume, at which point I guess it's brought back online)

Mar  4 01:00:31 horvitz fileserver[2236]: SAFS_FetchStatus,  Fid = 
536873153.1.1, Host 128.118.200.6:7001, Id 117
Mar  4 01:00:31 horvitz fileserver[2236]: VOnline:  volume 536873153 
(pub.m.rpmforge) attached and online

I noticed this when nagios reported that one of the volumes on this server was marked off-line.

Now, interestingly, I just ran another vos backup against the same volume:

$ vos backup pub.m.rpmforge
Created backup volume for pub.m.rpmforge

Fri Mar  4 16:05:14 2011 1 Volser: Clone: Recloning volume 536873153 to volume 
536873155

The pub.m.rpmforge.backup is now on-line.

Subsequent backups seem to be fine.

I'm not sure how related this is to the other issue I saw, where the backup clone was left in a much worse state. The immediate effects of the vos backup are the same, but I'm still not sure what caused the demand salvage of the volume later during the day in that case. In that case, that was a home directory that was very much in use, and something triggered the salvage, there's just nothing in the logs to indicate why.

--andy
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info

Reply via email to