On Fri, Aug 30, 2019 at 12:35:09PM +1000, Ian Wienand wrote: > Hello, > > I'm struggling to find an angle to debug very long "vos release" times > with some of our volumes. e.g. some runtimes for the "vos release" of > a fedora mirror volume from [1] > > 2019-08-22 08:46:58 - 2019-08-22 14:50:31 > 2019-08-23 16:45:09 - 2019-08-24 00:27:14 > 2019-08-23 00:44:56 - 2019-08-23 08:12:44 > > Now a little bit changes between each mirror run, but not enough for > 8+hours of incremental copying. Extracting for example the first run > logs: > > --- > 2019-08-22T08:46:58,270313531+00:00 > Kerberos initialization for service/[email protected] > + echo 'rsync completed successfully, running vos release.' > rsync completed successfully, running vos release. > + k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release -v > mirror.fedora > > mirror.fedora > RWrite: 536871006 ROnly: 536871007 > number of sites -> 3 > server afs01.dfw.openstack.org partition /vicepa RW Site > server afs01.dfw.openstack.org partition /vicepa RO Site > server afs02.dfw.openstack.org partition /vicepa RO Site > This is a complete release of volume 536871006 > Re-cloning permanent RO volume 536871007 ... done > Getting status of parent volume 536871006... done > Starting transaction on RO clone volume 536871007... done > Setting volume flags for volume 536871007... done > Ending transaction on volume 536871007... done > Replacing VLDB entry for mirror.fedora... done > Starting transaction on cloned volume 536871007... done > Updating existing ro volume 536871007 on afs02.dfw.openstack.org ... > Starting ForwardMulti from 536871007 to 536871007 on afs02.dfw.openstack.org > (as of Thu Aug 22 02:44:19 2019). > updating VLDB ... done > Released volume mirror.fedora successfully > Kerberos initialization for service/[email protected] > + date --iso-8601=ns > 2019-08-22T14:50:31,918754893+00:00 > --- > > It seems to me that it must be choosing to remirror the whole RO > volume (~450gb) rather than just taking incrementals.
I'm not super-familiar with the volserver code (and the logging infrastructure seems kind of primitive), so I can't provide much insight here. > I can see nothing of interest in the VolSer; we just start seeing: > > Thu Aug 22 08:52:21 2019 trans 45925 on volume 536871007 is older than 300 > seconds > Thu Aug 22 08:52:51 2019 trans 45925 on volume 536871007 is older than 330 > seconds > > which lines up time-wise. There's nothing wrong with the write speeds > to vicepa on the server > > root@afs02:/vicepa# dd if=/dev/zero of=./test.file bs=1G count=1 oflag=dsync > 1+0 records in > 1+0 records out > 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.5336 s, 164 MB/s > > The only thing that makes this perhaps odd is a client/server version > mismatch; the client is running 1.8.3-1 (built from upstream) and the > server is standard Xenial 1.6.15-1ubuntu1-debian I don't think that this kind of version mismatch is going to be relevant, though; this is to large extent "core behavior" that shouldn't be changing between releases. -Ben > It's unfortunately hard to say how long this has been going on; at > least since we started exporting the update logs with [1]. [2] is the > latest logs I could find from July on an older server (which *was* > using the openafs 1.6.3 client) which looks very similar to me. > > Any and all suggestions welcome! > > -i > > [1] http://files.openstack.org/mirror/logs/rsync-mirrors/fedora* > [2] http://paste.openstack.org/show/767468/ > _______________________________________________ > OpenAFS-info mailing list > [email protected] > https://lists.openafs.org/mailman/listinfo/openafs-info _______________________________________________ OpenAFS-info mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-info
