Hello, I'm struggling to find an angle to debug very long "vos release" times with some of our volumes. e.g. some runtimes for the "vos release" of a fedora mirror volume from [1]
2019-08-22 08:46:58 - 2019-08-22 14:50:31 2019-08-23 16:45:09 - 2019-08-24 00:27:14 2019-08-23 00:44:56 - 2019-08-23 08:12:44 Now a little bit changes between each mirror run, but not enough for 8+hours of incremental copying. Extracting for example the first run logs: --- 2019-08-22T08:46:58,270313531+00:00 Kerberos initialization for service/[email protected] + echo 'rsync completed successfully, running vos release.' rsync completed successfully, running vos release. + k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release -v mirror.fedora mirror.fedora RWrite: 536871006 ROnly: 536871007 number of sites -> 3 server afs01.dfw.openstack.org partition /vicepa RW Site server afs01.dfw.openstack.org partition /vicepa RO Site server afs02.dfw.openstack.org partition /vicepa RO Site This is a complete release of volume 536871006 Re-cloning permanent RO volume 536871007 ... done Getting status of parent volume 536871006... done Starting transaction on RO clone volume 536871007... done Setting volume flags for volume 536871007... done Ending transaction on volume 536871007... done Replacing VLDB entry for mirror.fedora... done Starting transaction on cloned volume 536871007... done Updating existing ro volume 536871007 on afs02.dfw.openstack.org ... Starting ForwardMulti from 536871007 to 536871007 on afs02.dfw.openstack.org (as of Thu Aug 22 02:44:19 2019). updating VLDB ... done Released volume mirror.fedora successfully Kerberos initialization for service/[email protected] + date --iso-8601=ns 2019-08-22T14:50:31,918754893+00:00 --- It seems to me that it must be choosing to remirror the whole RO volume (~450gb) rather than just taking incrementals. I can see nothing of interest in the VolSer; we just start seeing: Thu Aug 22 08:52:21 2019 trans 45925 on volume 536871007 is older than 300 seconds Thu Aug 22 08:52:51 2019 trans 45925 on volume 536871007 is older than 330 seconds which lines up time-wise. There's nothing wrong with the write speeds to vicepa on the server root@afs02:/vicepa# dd if=/dev/zero of=./test.file bs=1G count=1 oflag=dsync 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.5336 s, 164 MB/s The only thing that makes this perhaps odd is a client/server version mismatch; the client is running 1.8.3-1 (built from upstream) and the server is standard Xenial 1.6.15-1ubuntu1-debian It's unfortunately hard to say how long this has been going on; at least since we started exporting the update logs with [1]. [2] is the latest logs I could find from July on an older server (which *was* using the openafs 1.6.3 client) which looks very similar to me. Any and all suggestions welcome! -i [1] http://files.openstack.org/mirror/logs/rsync-mirrors/fedora* [2] http://paste.openstack.org/show/767468/ _______________________________________________ OpenAFS-info mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-info
