We're getting ready for Fall term. Our oncall staff don't like getting paged in the middle of the night to handle fileserver capacity issues. So, we moved a bunch of volumes around this past week to even out the load. We kept track of when the moves started, and looked at the ones that took the longest.
Here's part of the move log for the volume that took the longest to move (starting at 11:25:26, with the previous 4 seconds earlier, and the next one 28:22 minutes later): 2009/08/14 11:25:20: Processing volume 296, 'user.xxxxxx'. [...] 2009/08/14 11:25:26: Processing volume 297, 'user.yyyyyy'. vos move user.yyyyyy afs58.ifs.umich.edu /vicepa localhost a -localauth Volume 1956456697 moved from afs58.ifs.umich.edu /vicepa to localhost /vicepa vos backup user.yyyyyy -localauth Created backup volume for user.yyyyyy 2009/08/14 11:53:48: Processing volume 298, 'user.zzzzzz'. A vos move shouldn't take 28:22 minutes. Here's part of a combined VolserLog, with fileserver afs58 moving to afs61, that shows what's going on. 2009/08/14 11:25:26 afs58 1 Volser: Clone: Cloning volume 1956456697 to new volume 1957504905 2009/08/14 11:25:34 afs61 VAttachVolume: Failed to open /vicepa/V1956456697.vol (errno 2) 2009/08/14 11:25:34 afs61 1 Volser: CreateVolume: volume 1956456697 (user.yyyyyy) created 2009/08/14 11:30:59 afs58 trans 1914273 on volume 1957504905 is older than 300 seconds 2009/08/14 11:31:01 afs61 trans 236047 on volume 1956456697 is older than 300 seconds 2009/08/14 11:31:29 afs58 trans 1914273 on volume 1957504905 is older than 330 seconds 2009/08/14 11:31:31 afs61 trans 236047 on volume 1956456697 is older than 330 seconds [...] 2009/08/14 11:48:30 afs58 trans 1914273 on volume 1957504905 is older than 1350 seconds 2009/08/14 11:48:32 afs61 trans 236047 on volume 1956456697 is older than 1350 seconds 2009/08/14 11:49:00 afs58 trans 1914273 on volume 1957504905 is older than 1380 seconds 2009/08/14 11:49:02 afs61 trans 236047 on volume 1956456697 is older than 1380 seconds 2009/08/14 11:49:22 afs58 1 Volser: Delete: volume 1956456697 deleted 2009/08/14 11:53:44 afs58 1 Volser: Delete: volume 1957504905 deleted 2009/08/14 11:53:44 afs61 1 Volser: Clone: Cloning volume 1956456697 to new volume 1956456699 The transaction seems to get stuck. We're running OpenAFS-1.4.11 on linux 2.6.30.2. The two machine rooms are on opposite sides of town, 6 or 7 hops away, on what I believe is a 10GB network. If anyone else is seeing the '... is older than ...' messages in their VolserLog's, we'd like to hear about it. If anyone has any ideas what may be causing this, we'd really like to hear that, too. Thanks, all. _______________________________________________ OpenAFS-info mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-info
