On Wed, Sep 11, 2019 at 10:38:26AM +1000, Ian Wienand wrote:
> On Fri, Aug 30, 2019 at 12:35:09PM +1000, Ian Wienand wrote:
> > I'm struggling to find an angle to debug very long "vos release" times
> > with some of our volumes.
> So some more details ... I've managed to take out any write updates
> from the equation, but the volume with no updates still takes quite a
> long time to release.
>
> Ergo the "vos release" of the volume with no changes has resulted in
> about 50gb of data being sent to the R/O mirror, and consequently
> long release times.

To follow up on this: auristor was very helpful in IRC (thanks,
again!) and indeed this "vos release" *was* transferring an
unexpectedly large amount of data.

The conclusion reached was that, when requesting incremental updates,
the R/O release "backtracks" to 15 minutes before the "Last Updated"
time of the R/W volume, to avoid issues with clock skew across hosts.

In our situation, the last volume update was a very large pull from
the upstream mirror (it happens with new distros, big rebuilds, etc).
Then the *next* vos release (the one I documented trying in prior
mail) does do incremental updates, but from 15 minutes before the
last update -- in our case this would be basically the whole mirror
pull, again.

This means that in our cron jobs we are pulling lots of data, taking
lots of time and hitting timeouts, which abort the release and leave
volumes locked, which in turn creates a vicious cycle of even more
data to pull next time.

Indeed, while successive "vos release" runs would each pull all 50GB,
after touching a file in the root directory and waiting, the next
"vos release" completed in seconds.

The solution suggested is a 15+ minute sleep and then a trivial update
to the volume. This ensures that *next time* you release, you only
backtrack into one trivial update and don't risk pulling much more
data than required. I implemented this in our scripts with [1].

For completeness, I have captured a run of the mirror rsync and
extracted the file server audit logs for that run in [2]. However, I
think rsync touching too many files is a red herring.
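To make the backtrack arithmetic and the workaround concrete, here is
a rough Python sketch. It is only a toy model of the behaviour as it
was explained to me, not how the volume server actually computes
anything. The names (BACKTRACK, incremental_cutoff, bytes_resent,
settle_and_touch) and the 20-minute default wait are my own
assumptions for illustration; the real change is the one in [1].

```python
import os
import time
from datetime import datetime, timedelta

# Assumption: 15 minutes is the backtrack figure quoted in this thread.
BACKTRACK = timedelta(minutes=15)


def incremental_cutoff(last_updated):
    """Toy model: changes newer than this time are sent on release."""
    return last_updated - BACKTRACK


def bytes_resent(writes, last_updated):
    """Sum the sizes of writes that fall at or after the cutoff.

    writes is a list of (timestamp, size_in_bytes) tuples.
    """
    cutoff = incremental_cutoff(last_updated)
    return sum(size for when, size in writes if when >= cutoff)


def settle_and_touch(marker_path, wait_seconds=20 * 60, sleep=time.sleep):
    """Wait out the backtrack window, then make one trivial update.

    marker_path is a hypothetical file in the volume's root directory.
    """
    sleep(wait_seconds)
    with open(marker_path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(marker_path, None)  # bump mtime even if it already existed


if __name__ == "__main__":
    GB = 10**9
    pull_done = datetime(2019, 9, 11, 10, 0)
    # A large mirror pull that wrote 50GB five minutes before finishing.
    writes = [(pull_done - timedelta(minutes=5), 50 * GB)]
    # Releasing now backtracks into the whole pull.
    print(bytes_resent(writes, pull_done))

    # After a 20 minute wait and a trivial touch, "Last Updated" moves
    # on and the backtrack window no longer covers the big pull.
    touched = pull_done + timedelta(minutes=20)
    writes.append((touched, 0))
    print(bytes_resent(writes, touched))
```

The point of waiting longer than 15 minutes before the touch is that
the cutoff for the following release then lands after the big pull,
so only the trivial update falls inside the window.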
The other thing suggested was that timeouts are best worked around by
using "-localauth" to do the vos release somewhere where it won't
time out. remctl was suggested [3] and is apparently commonly used
for this purpose.

Thanks for the input,

-i

[1] https://review.opendev.org/#/c/681367
[2] http://people.redhat.com/~iwienand/fedora-mirror-11-09-2019.tar.gz
[3] https://www.eyrie.org/~eagle/software/remctl/
