On 10/29/2014 06:47 PM, Garance A Drosehn wrote:
Hi.
We have AFS db servers on some ancient hardware, and decided to move
them to be virtual machines on much newer hardware. I've moved one
of them already, and the final result seems to be fine. There was
one minor oddity during the physical-to-virtual move which was a
little worrisome, so I thought I'd ask if there was some other step
that I should do.
We have four machines running as AFS DB servers, and we're virtualizing
only one of those per day.
What I did was get a list of running AFS processes via 'bos status'.
I then did a 'bos stop -wait' for each of those processes (kaserver,
buserver, ptserver, vlserver, upclient, etc.). We then did the P2V copy
to make a duplicate of the running system into a virtual-machine.
After checking that copied system image, we disconnected the older
hardware-based image from the network, brought up the VM copy, and
I then 'bos start'-ed all the AFS processes which had been 'stop'-ed
before the copy was done. Once those AFS processes were running in
the VM-based image, everything seems perfectly fine.
The oddity is that during the time that the AFS processes were not
running on either machine, AFS access on many of our AFS clients
was pretty slow. Everything worked, but much slower than normal.
I'm pretty sure the delay was all in the lookup-step, and that if
some AFS client already had a file open in AFS then I/O performance
to that file was fine.
Was there some step I should have done so all AFS clients would know
that the DB server was gone, so they shouldn't wait around for replies
from it?
We went through something similar; here is my understanding (corrections
welcome!):
AFS clients-as-in-the-kernel-module will have a preferred VLserver to
talk to (fs getserverprefs -vlservers), but should figure out after
~60sec that that one is gone and then switch to the next one (and not
come back until they restart, or their newly-preferred DB server also is
unreachable).
AFS clients-as-in-userspace tools (vos exa, pts) will contact a random
DB server each time, so in your case each invocation has a 1/4 chance
of waiting (no "learning" over several invocations).
And indeed once the client has already found a particular volume, they
should not notice the DB server outage.
AFAIK there is no gentle way to pre-announce "this one is going away".
You could push a new CellServDB before every update, and run "fs
setserverprefs -vlservers" to penalize the machine that is going away
(or restart the AFS clients), but in our case we didn't do this.
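For what it's worth, the setserverprefs variant could look roughly like
the sketch below. The host name is a placeholder, and the rank is simply
the largest value fs accepts (higher rank means less preferred). It only
prints the commands unless RUN=1 is set, and the real thing has to run as
root on each client. We have not used this ourselves, so treat it as
untested.

```shell
#!/bin/sh
# Sketch: push one DB server to the bottom of this client's VL server
# preference list before that server is taken down.
DB_SERVER="${DB_SERVER:-afsdb1.example.com}"   # placeholder host name
RANK="${RANK:-65534}"                          # higher rank = less preferred

# Print the command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Give the named VL server the specified (bad) rank.
penalize_vlserver() {
    run fs setserverprefs -vlservers "$1" "$2"
}

# List the current VL server ranks to check the result.
show_vlserver_prefs() {
    run fs getserverprefs -vlservers
}

penalize_vlserver "$DB_SERVER" "$RANK"
show_vlserver_prefs
```

Preferences set this way are not kept across a client restart, so they
would need to be set again after a reboot if the server is still down.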
Cheers
jan
_______________________________________________
OpenAFS-info mailing list
https://lists.openafs.org/mailman/listinfo/openafs-info