On 1/17/2017 3:45 PM, Stephen Joyce wrote:
> I know the current best-practice for changing the IP addresses of AFS
> database servers is don't do it.
>
> But assuming that I want/need to change IPs and have available hardware,
> is the use of clone dbservers the preferred method? I can tolerate short
> service interruptions of up to a few minutes as long as they're planned
> for low-utilization times.
um, not really.

> Initial condition is 3 dbservers ("OLD") located via AFSDB & SRV,

I assume these servers are who, what and when as listed in the CellServDB file distributed from http://www.central.org/csdb.html and included in every OpenAFS distribution.

> running 1.6.x. Desired final condition is 3 dbservers ("NEW") with
> different IP addresses, also running 1.6.x (for now).

The first thing to be aware of is that any entries in the CellServDB file take precedence over information provided via DNS. For recent OpenAFS releases the precedence order is

 * CellServDB file
 * DNS SRV
 * DNS AFSDB

The Unix cache manager only uses the IPv4 addresses that are provided in the CellServDB file, whereas the Windows cache manager only uses the host name and performs a DNS A query on the name to obtain the IP address to use.

The CellServDB file contains entries for physics.unc.edu but not cas.unc.edu, although physics.unc.edu lists the same DB servers as cas.unc.edu.

The second thing to be aware of is that a UBIK quorum is defined by the set of dbservers that share a common configuration. Running OpenAFS UBIK servers with a mixture of configurations can lead to more than one dbserver believing it is the master.

The UBIK clone servers are interesting because they are documented as being non-voting. That isn't exactly true. Every UBIK dbserver must maintain connectivity with every other UBIK dbserver in its configuration. What is special about clones is not that they don't vote but that

 1. they cannot vote for themselves
 2. their votes for other servers are received and then discarded
 3. a clone cannot be the source of the best database.

Many sites have experienced problems with UBIK quorums consisting of more than 3 servers. Some sites have successfully run with as many as 5 servers. It really depends on the number of clients and the average rate of application RPCs (VL, PT, ...).
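The clone rules above can be illustrated with a deliberately simplified toy model. It is not ubik's actual beacon protocol: it assumes the lowest-addressed eligible server is the favourite, uses a plain majority of the non-clone servers (real ubik weights the lowest address slightly differently), and models only rules 1 and 2, not the "best database" rule. The `DbServer` type and `elect_sync_site` function are hypothetical names, and the addresses in the test are made up.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class DbServer:
    addr: str            # IPv4 address listed in the server CellServDB
    clone: bool = False  # True if the entry is marked as a clone
    up: bool = True      # server is running and reachable


def elect_sync_site(servers):
    """Toy election: return the sync site's address, or None if no quorum.

    Simplified illustration only, not the real ubik algorithm.
    """
    # Only non-clone entries count toward the quorum size.
    voters = [s for s in servers if not s.clone]
    live = [s for s in servers if s.up]
    # Rule 1: a clone cannot vote for itself, so only live non-clones
    # are candidates; the lowest address is preferred.
    candidates = sorted((s for s in live if not s.clone),
                        key=lambda s: ipaddress.ip_address(s.addr))
    if not candidates:
        return None
    favourite = candidates[0]
    # Every live server, clone or not, casts a vote for the favourite...
    ballots = list(live)
    # ...but rule 2: votes cast by clones are received and then discarded.
    counted = [s for s in ballots if not s.clone]
    # Plain (strict) majority of the configured non-clone servers.
    if 2 * len(counted) > len(voters):
        return favourite.addr
    return None
```

With the OLD servers marked as clones, the lowest NEW address becomes the sync site even though an OLD address is numerically lower; without the clone marks, the lowest OLD address would win.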
The primary benefit of using clones in OpenAFS is to prevent a server with a low IPv4 address from being elected the coordinator (aka sync site).

> I'm roughing out a procedure, but my current thinking involves..
>
> add 3 NEW dbservers as r/o clones (restarting db procs)

I don't believe that using clones at this stage is helpful. Also, you should leave all of the DB servers shut down for at least 90 seconds when modifying the configuration.

> modify DNS to show all 6 IPs.
> 'fs newcell' or restart all afsd's (including on servers)

You will also need to update the configuration and restart the fileservers. The fileservers are clients of the PT and VL servers but use the server CellServDB file for their server info.

> swap clone/non-clone roles so that NEW dbservers are r/w and OLD
> dbservers are r/o clones (restarting db procs). At this point, sync must
> be a non-clone, r/w "NEW" server.

Using clones to prevent the old servers from becoming coordinator is the proper use. You might want to consider leaving only one of the old servers running at this point. Be sure to shut down all dbservers when the configuration is changed.

> Verify with udebug. Any client afsd's
> not restarted/newcell'ed won't be able to make pt/vl changes.

The fileservers modify their VL entry when they start. If their CellServDB files are not updated as well, then they won't be able to register.

> modify DNS to show only 3 NEW IPs
> 'fs newcell' or restart of all afsd's (including on servers)
>
> remove 3 OLD dbservers which must be r/o clones (restarting db procs).
> Any client afsd's not restarted/newcell'ed won't be able to query
> pt/vlservers.

correct.

> Because it could take some time to restart/newcell all clients, I'm
> thinking of doing the clone addition/dns steps then waiting some time
> (week+) before doing the role swap and second dns change. Then waiting
> another period of time (week+) before doing the last removal.
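For the role-swap step, the advice to shut down every dbserver, keep them down for at least 90 seconds, change the configuration and then restart could be scripted roughly as below. This is a sketch under stated assumptions: the BosConfig instance names `ptserver` and `vlserver`, the hostnames in the test, and the helper names `swap_plan`/`run_plan` are hypothetical; it assumes your `bos addhost` accepts `-clone` (check `bos help addhost` on your build) and that you hold admin tokens. `dbservers` should list every dbserver whose server CellServDB must change. Fileserver CellServDB updates and restarts are not covered. It defaults to a dry run that only prints the commands.

```python
import subprocess
import time

# Assumption: the DB instances are named like this in BosConfig.
DB_INSTANCES = ["ptserver", "vlserver"]
# Keep every DB server down at least this long before restarting.
QUIET_SECONDS = 90


def swap_plan(dbservers, old_hosts):
    """Build bos commands that re-add the OLD hosts as clones everywhere."""
    shutdown = [["bos", "shutdown", "-server", s, "-instance", *DB_INSTANCES, "-wait"]
                for s in dbservers]
    reconfigure = []
    for s in dbservers:
        # Drop the voting entries for the OLD hosts...
        reconfigure.append(["bos", "removehost", "-server", s, "-host", *old_hosts])
        # ...and add them back marked as clones.
        reconfigure.append(["bos", "addhost", "-server", s, "-host", *old_hosts, "-clone"])
    startup = [["bos", "startup", "-server", s, "-instance", *DB_INSTANCES]
               for s in dbservers]
    return shutdown, reconfigure, startup


def run_plan(plan, dry_run=True):
    """Execute (or just print) the plan; return the commands in order."""
    shutdown, reconfigure, startup = plan
    executed = []

    def run(cmd):
        executed.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

    # Stop every DB server first, then rewrite the server CellServDBs.
    for cmd in shutdown + reconfigure:
        run(cmd)
    # Everything is down: wait before any DB server comes back.
    if not dry_run:
        time.sleep(QUIET_SECONDS)
    for cmd in startup:
        run(cmd)
    return executed
```

Keeping the plan as plain argument lists makes it easy to review the exact `bos` invocations in dry-run mode before letting the script touch a production cell.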
>
> I'm assuming that I can use -auditlog (or even a packet sniffer) to see
> what clients might still be using the OLD dbservers prior to the final
> decommissioning.

rxdebug <dbserver> <port> -peer

> Seems a bit too simple. What am I missing?

Good luck.

Jeffrey Altman
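The `rxdebug <dbserver> <port> -peer` suggestion can be turned into a small report of hosts still talking to the OLD dbservers on the ptserver (7002) and vlserver (7003) ports. This sketch assumes peer header lines of the form `Peer at host A.B.C.D, port N`; adjust the regular expression if your rxdebug prints something different. The function names are hypothetical. Other dbservers also show up as peers because ubik traffic uses the same ports, so pass them in `known_servers`. Peer entries can linger for a while after a client stops talking to the server, so treat the result as a hint rather than proof.

```python
import re
import subprocess

# Assumed format of rxdebug peer header lines, e.g.
# "Peer at host 192.0.2.10, port 7001".
PEER_LINE = re.compile(r"^\s*Peer at host (\d{1,3}(?:\.\d{1,3}){3}), port (\d+)",
                       re.MULTILINE)


def peer_hosts(rxdebug_output):
    """Return the set of peer IPv4 addresses found in rxdebug output."""
    return {m.group(1) for m in PEER_LINE.finditer(rxdebug_output)}


def fetch_peers(dbserver, port):
    """Run rxdebug against one dbserver port and return its stdout."""
    result = subprocess.run(["rxdebug", dbserver, str(port), "-peers"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def lingering_clients(outputs, known_servers):
    """Peers seen in any of the outputs, minus hosts expected to be there."""
    seen = set()
    for text in outputs:
        seen |= peer_hosts(text)
    return sorted(seen - set(known_servers))
```

For example, collecting `fetch_peers(host, port)` for each OLD dbserver and both ports 7002 and 7003, then passing the outputs and the dbserver addresses to `lingering_clients`, gives a list of candidate clients that may still need `fs newcell` or an afsd restart.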