> Every night at midnight, we run 'vos backupsys'. For three > nights in a row, on one of the servers I've upgraded to 1.6.5 > and dafs, I've been getting the following errors, and it > mostly stops being a fileserver.
[ ... ] > Sun Feb 9 00:00:03 2014 SYNC_getCom: error receiving command > Sun Feb 9 00:00:03 2014 FSYNC_com: read failed; dropping connection > (cnt=493489) > Sun Feb 9 00:00:03 2014 _VLockFd: conflicting lock held on fd 225, offset > 538046785 by pid 4129 (locktype=1) > Sun Feb 9 00:00:03 2014 VAttachVolume: another program has vol 538046785 > locked > Sun Feb 9 00:00:03 2014 VPreattachVolumeByVp_r: volume 538046785 not in > quiescent state (state 2 flags 0x18) [ ... ] > Sun Feb 9 00:00:03 2014 1 Volser: Clone: Recloning volume 538046785 to > volume 538046787 > Sun Feb 9 00:00:03 2014 SYNC_ask: length field in response inconsistent on > circuit 'FSSYNC' > Sun Feb 9 00:00:03 2014 SYNC_ask: protocol communications failure on circuit > 'FSSYNC'; attempting reconnect to server [ ... ] That " _VLockFd: conflicting lock held" and "VAttachVolume: another program has vol NNNN locked" looks vaguely familiar, and in a case that I have seen it was because a DB server was offline, and 'vos' took a very very long time to switch to an online one. But this was with 1.4 and supposedly 1.6 should have a shorter timeout. In another case that vaguely resembles this there was a race between creating a clone and registering it in the VLDB: http://rt.central.org/rt/Ticket/Display.html?id=131797 It would be interesting to know what processes 21378, 4129, 7155 were doing and why they held a lock on the RW original. _______________________________________________ OpenAFS-info mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-info
