On 6/11/2018 11:44 AM, Andreas Ladanyi wrote:
> Hi,
> 
> I read this page http://docs.openafs.org/QuickStartUnix/HDRWQ114.html
> 
> In my case the new database server has lowest ip and has no database
> content.
> 
> So how is the database synchronised from the old database (old ubik)
> servers which are currently running and with up to date database content ?
> 
> Do I have to back up / restore the DB0 files to the new ubik coordinator
> once? Or should I not worry about it because ubik will do all the magic
> for me?

When a coordinator (aka sync site) is elected it enters recovery mode.

The first step in recovery mode is "find the latest database".  In
OpenAFS, the coordinator queries all of the non-clone peers for their
database version.  It then decides whether it has the latest database or
another peer does.

At this point the coordinator is in recovery state "Found DB".

If another peer does, then it fetches the more recent database.

At this point the coordinator is in recovery state "Have DB".

The coordinator then ensures that all peers receive the current DB.

At this point the recovery state is "Sent DB".

The coordinator can now begin processing write requests.  After the
first write request the recovery state becomes "Modified DB".
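
The recovery sequence above can be sketched roughly as follows.  This is
only an illustration, not the OpenAFS code: the function and class names
are made up, database versions are modelled as plain comparable tuples
(an assumption), and clone handling and network failures are ignored.

```python
from enum import Enum


class RecoveryState(Enum):
    # State names follow the ones quoted in the text above.
    FOUND_DB = "Found DB"
    HAVE_DB = "Have DB"
    SENT_DB = "Sent DB"


def recover(coordinator, versions):
    """Simulate recovery on a newly elected coordinator.

    `versions` maps a server name to its database version, modelled
    here as a comparable tuple (hypothetical representation).
    Returns the recovery states reached, in order.
    """
    states = []

    # Step 1: find out who holds the latest database.
    latest_holder = max(versions, key=versions.get)
    states.append(RecoveryState.FOUND_DB)

    # Step 2: if a peer is ahead, fetch its database.
    if versions[latest_holder] > versions[coordinator]:
        versions[coordinator] = versions[latest_holder]
    states.append(RecoveryState.HAVE_DB)

    # Step 3: make sure every peer has the current database.
    for peer in versions:
        versions[peer] = versions[coordinator]
    states.append(RecoveryState.SENT_DB)

    return states


# A new, empty server A is elected while B, C and D hold the real data.
versions = {"A": (0, 0), "B": (100, 42), "C": (100, 42), "D": (100, 41)}
states = recover("A", versions)
```

In this toy run the empty coordinator A ends up with the database from B,
and D is brought up to date as well, which is why no manual copy is
needed.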

Although there isn't any need to copy the database to the new ubik
server before it is started, it is critically important that the new
ubik server be added to the server CellServDB on all of the existing
ubik servers before the new ubik server is started.

Imagine that your existing ubik servers are B, C and D.  Since the
lowest ranked server receives an extra half vote it is extremely
important that there be agreement on which server is the lowest ranked
server.

In the current configuration all of the servers are in agreement that
the ranking order from lowest to highest is:

  B < C < D

Therefore, B gets the extra half vote and there are a total of 3.5
votes.  To be elected coordinator requires a minimum of 2 votes.

But what happens if you add server A?

  A < B < C < D

In this configuration A gets the extra half vote and there are a total
of 4.5 votes.  To be elected coordinator requires a minimum of 2.5 votes.
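
The arithmetic behind those numbers can be checked with a few lines of
Python.  This is a sketch that assumes the rule is "strictly more than
half of the total votes", which matches the 2 of 3.5 and 2.5 of 4.5
figures above; the function name is invented for the example.

```python
def quorum(num_servers):
    """Return (total_votes, minimum_votes_to_win) for a cell.

    Counts in half-vote units to avoid rounding problems: every server
    has 2 halves and the lowest ranked server has 1 extra half.
    Assumes a strict majority of the total is required.
    """
    total_halves = 2 * num_servers + 1
    min_halves = total_halves // 2 + 1  # smallest count above half
    return total_halves / 2, min_halves / 2


print(quorum(3))  # B, C, D
print(quorum(4))  # A, B, C, D
```

For three servers this gives a total of 3.5 and a minimum of 2.0; for
four servers a total of 4.5 and a minimum of 2.5.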

When adding a new server without shutting down the cell, there will be
some servers that are running with the old configuration and some with
the new one.  For example, if the new configuration is known by servers
A and D but not B and C, then A will receive 2.5 votes and it will be
elected coordinator.  However, B will receive at least 2 votes (from B
and C) and, under the old configuration, believe that it has been
elected coordinator.  This results in two servers accepting write
transactions, which will result in data loss.
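
To make the failure concrete, here is a toy model of that mixed
configuration.  It is a deliberate simplification and not how the
OpenAFS voting code is written: it assumes every server simply votes for
the lowest ranked server in its own CellServDB, and that each candidate
counts votes against its own view of the cell.  All names are
hypothetical.

```python
def believes_elected(candidate, views):
    """Return True if `candidate` thinks it has won the election.

    `views` maps each running server to the ranked list (lowest first)
    of servers in that server's own CellServDB.
    """
    my_view = views[candidate]
    # Assumption: a server votes for the lowest ranked server it knows.
    voters = [s for s in my_view if s in views and views[s][0] == candidate]
    # Tally in half votes; the lowest server in the candidate's own
    # view carries the extra half vote.
    halves = sum(3 if s == my_view[0] else 2 for s in voters)
    total_halves = 2 * len(my_view) + 1
    return 2 * halves > total_halves  # strict majority of the total


new_cfg = ["A", "B", "C", "D"]
old_cfg = ["B", "C", "D"]

# A and D know about the new server; B and C do not.
mixed = {"A": new_cfg, "B": old_cfg, "C": old_cfg, "D": new_cfg}
winners_mixed = {s for s in mixed if believes_elected(s, mixed)}

# Every server has the new configuration.
consistent = {s: new_cfg for s in new_cfg}
winners_consistent = {s for s in consistent if believes_elected(s, consistent)}
```

With the mixed configuration both A and B consider themselves elected;
with a consistent configuration only A does.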

The underlying problem is that the ubik protocol variant implemented by
OpenAFS has no method of verifying which servers share the same
configuration, and so cannot restrict votes to being cast for and
accepted from only those servers that share the same configuration.

In order to avoid the risk of database forking I recommend the following
procedure:

1. Update the client CellServDB and DNS SRV/AFSDB records to add the
   new server

2. Update the server CellServDB on all of the fileservers to add the
   new server and restart them (only restart if not also ubik servers)

3. Update the server CellServDB on all of the ubik servers to add the
   new server but do not restart.

4. In order from highest rank to lowest rank (D, C, B):

 a. Stop server  (bos stop <server> -all)

 b. Wait three minutes (to ensure that all other servers notice this
    server shutdown which is necessary to avoid bugs in most OpenAFS
    versions that can lead to database corruption.)

 c. Start server (bos start <server> -all)

 d. Repeat for the next lower ranked server.

5. Start the new server

This order will ensure that there is never any confusion for clients or
ubik servers.
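
As a planning aid, the ordering in steps 4 and 5 can be written down as a
small dry-run helper.  It only builds a list of actions and does not run
any command itself; the function name and the action labels are made up
for the example, and the 180 second wait is the three minutes from step
4b.

```python
def restart_plan(existing_low_to_high, new_server, wait_seconds=180):
    """Build the ordered action list for steps 4 and 5.

    `existing_low_to_high` is the current ranking of the ubik servers,
    lowest first (for example ["B", "C", "D"]).
    """
    plan = []
    # Step 4: walk the existing servers from highest rank to lowest.
    for server in reversed(existing_low_to_high):
        plan.append(("stop", server))
        plan.append(("wait", wait_seconds))
        plan.append(("start", server))
    # Step 5: only now start the new server.
    plan.append(("start", new_server))
    return plan


for action, target in restart_plan(["B", "C", "D"], "A"):
    print(action, target)
```

The stop and start actions correspond to the bos commands given in steps
4a and 4c.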


Jeffrey Altman
