On 6/18/2018 9:07 AM, Andreas Ladanyi wrote:
>>
>> The ubik clients do not rank servers based upon IP address.  What they
>> do is:
> OK. Then maybe I misunderstood the documentation
> (http://docs.openafs.org/QuickStartUnix/HDRWQ114.html), which tells me
> the machine with the lowest IP is "usually" elected as the ubik coordinator.

The algorithm used to elect the coordinator is specific to the ubik
servers that maintain a synchronized database.  The clients (vos, pts,
cache managers, backup, aklog, pam_afs_session, etc) do not speak ubik;
they speak the application specific protocols (VL, PR, BUDB, etc.).  The
clients do not have any visibility into which ubik instances are
electable, which instances have network connectivity to elicit
sufficient votes, nor what algorithm is used to rank (order) the ubik
instances for election purposes.

AuriStorFS ubik, for example, permits arbitrary ranking of servers based
upon configuration.  Just because a server has a smaller numeric IPv4
address doesn't mean that it is the best server to hold the read/write
copy of the database.

> I followed the instructions in that document to add a new db server
> machine with the lowest IP.
>>
>> 1. compute the length of the ordered server list
>>
>>   A B C D
>>
>> 2. then generate a random number from 0..<length - 1>
>>
>> 3. use that number as an index into the list to decide which is first
>>
>> 4. and reorder the list as if it were a circular queue.  So if the
>> random number selected was 2, then the list would become
>>
>>   C D A B
>>
>> The only time the coordinator must be contacted is for a write
>> transaction.  All read transactions are processed by the first server
>> contacted.
> ok. thanks for explanation.
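
The four steps quoted above can be sketched in a few lines.  This is only
an illustration of the circular-queue reordering as described in this
thread, not the actual client code; the function name reorder_servers and
its optional start parameter are hypothetical, added so the A B C D
example can be reproduced deterministically.

```python
import random


def reorder_servers(servers, start=None):
    """Rotate an ordered server list as if it were a circular queue.

    `start` is a hypothetical parameter used to pin the index for
    demonstration; when omitted, a random index in 0..length-1 is drawn.
    """
    servers = list(servers)
    if not servers:
        return []
    if start is None:
        # Step 2: random number from 0 .. length - 1
        start = random.randrange(len(servers))
    # Steps 3 and 4: the chosen entry goes first, the rest wrap around
    return servers[start:] + servers[:start]


if __name__ == "__main__":
    # With index 2 the list A B C D becomes C D A B
    print(reorder_servers(["A", "B", "C", "D"], start=2))
```

Because the starting index is random, a client that lists an offline
server will try it first only some of the time, which matches a delay
that shows up on some requests and not on others.
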
>>
>> My conclusion is that there is something about your cell configuration
>> that results in a write transaction for each token requested.  For example:
> I straced aklog for some tests and could see that aklog sometimes asks
> the new db server (which is offline) and then waits for a timeout (it
> hangs for about 15 sec), and sometimes asks the old online db servers
> from CellServDB, in which case there is no timeout and no hang.
> 
> This seems to cause the hanging ssh login, because pam debug shows a
> hang of about 15 sec when pam_afs calls aklog.
> 
> So in summary it seems better to first add the new db server to the
> CellServDB of all db servers (bos addhost), then bos restart the pt/vl
> instances so that the servers hold a ubik coordinator election, and
> only then update the clients' CellServDB.

That depends on whether the clients need to be able to find a writable
copy of the database.  If the clients must be able to find the
coordinator and the coordinator is a server that is not present in the
client's configuration, then the client won't simply experience a
random timeout but a failure.

> The documentation says to first update the clients' CellServDB (when the
> new db server has the lowest IP) and then bring up the new db server.
>>
>>  1. cell name:               example.com
> no, cellname a.b.c
>>
>>  2. One of the following is true:
>>
>>     a. realm name:           AD.EXAMPLE.COM
> no AD
> 
> REALM = A.B.C, MIT Kerberos
>>
>>     b. CellServDB's zeroth ubik server host domain:
>>
>>                              subnet.example.com
> I don't understand this example.


If the cell name is

   foo.example.com

and the Kerberos realm is

   FOO.EXAMPLE.COM

and the host names of the ubik servers are

   afsdb1.bar.example.com
   afsdb2.bar.example.com
   afsdb3.bar.example.com

then the default host to realm mapping of afsdb1.bar.example.com will be
to realm BAR.EXAMPLE.COM, not FOO.EXAMPLE.COM.  Since BAR.EXAMPLE.COM !=
FOO.EXAMPLE.COM, a foreign cell registration will be attempted.  However,
that doesn't appear to be the source of the delay.  If it were, the
tracing would show aklog attempting to access every protection server
until the coordinator was discovered.

>>  3. auto-registration of foreign PTS IDs enabled:
>>
>>     a. pam_afs_session configuration doesn't disable it
>>
>>     b. aklog executed without -noprdb
> yes, pam_afs_session calls aklog without -noprdb


