On 2/4/2018 7:54 AM, Dirk Heinrichs wrote:
> Am 04.02.2018 um 13:29 schrieb Jose M Calhariz:
> 
>> The core of my infra-structure are 4 afsdb
> 
> Wasn't it so that it's better to have an odd number of DB servers (with
> a max. of 5)?

The maximum number of ubik servers in an AFS3 cell is 20.  This is a
protocol constraint.  However, due to performance characteristics it is
unlikely that anyone could run that number of servers in a production
cell.  As the server count increases, so does the number of messages
that must be exchanged to conduct an election, complete database
synchronization recovery, maintain quorum, and complete remote
transactions.  These messages compete with the application-level
requests arriving from clients.  As the application-level calls (vl,
pt, ...) increase, the risk of delayed processing of disk and vote
calls increases, which can lead to loss of quorum or remote transaction
failures.
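As a rough illustration of that growth (a deliberately simplified
message-count model with assumed traffic patterns, not the actual AFS3
ubik wire accounting):

```python
def beacon_messages(n: int) -> int:
    """Per beacon round: the coordinator polls each of the other n-1
    sites for its vote and each site replies (assumed pattern)."""
    return 2 * (n - 1)

def write_messages(n: int) -> int:
    """Per write transaction: the change is pushed to each of the other
    n-1 sites and acknowledged (assumed pattern)."""
    return 2 * (n - 1)

def election_messages(n: int) -> int:
    """Worst-case election chatter: every site polls every other site."""
    return n * (n - 1)

for n in (3, 5, 10, 20):
    print(n, beacon_messages(n), write_messages(n), election_messages(n))
```

Even in this toy model, steady-state traffic grows linearly and
worst-case election traffic quadratically with the server count, all of
it contending with client calls for the same service.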

The reason that odd numbers of servers are preferred is because of the
failover properties.

one server - single point of failure.  outage leads to read and write
failures.

two servers - single point of failure for writes.  only the lowest ipv4
address server can be elected coordinator.  if it fails, writes are
blocked.  If it fails during a write transaction, read transactions on
the second server are blocked until the first server recovers.

three or four servers - either the first or second lowest ipv4 address
servers can be elected coordinator.  any one server can fail without
loss of write or read.

five or six servers - any of the first three lowest ipv4 address servers
can be elected coordinator.  any two servers can fail without loss of
write or read.

Although adding a fourth server increases the number of servers that can
satisfy read requests, the lack of improved resiliency to failure and
the increased risk of quorum loss make it less desirable.
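The failover properties above follow from the ubik vote arithmetic: the
lowest-IP server's vote is worth 1.5 (the tie-breaker) and a quorum
requires strictly more than half of all votes.  A minimal sketch of that
arithmetic (the helper names are mine, not OpenAFS code):

```python
def has_quorum(n: int, survivors: int, includes_lowest: bool) -> bool:
    """True if `survivors` of `n` ubik sites still hold a quorum.

    AFS3 ubik gives the lowest-IP site a 1.5 vote as a tie-breaker;
    quorum requires strictly more than half of all votes.
    """
    total_votes = n + 0.5
    votes = survivors + (0.5 if includes_lowest else 0.0)
    return votes > total_votes / 2

def tolerated_failures(n: int) -> int:
    """Worst-case failures survivable, assuming the lowest-IP
    (tie-breaking) site is among the first to fail."""
    f = 0
    while has_quorum(n, n - (f + 1), includes_lowest=False):
        f += 1
    return f

for n in range(1, 7):
    print(f"{n} servers: any {tolerated_failures(n)} may fail")
# 3 and 4 servers both tolerate one failure; 5 and 6 both tolerate two,
# which is why the even counts add servers without adding resiliency.
```

The two-server case also shows why the lowest-IP server is a single
point of failure for writes: `has_quorum(2, 1, includes_lowest=True)` is
true (the low server can continue alone), while
`has_quorum(2, 1, includes_lowest=False)` is false.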


The original poster indicated that his ubik servers are virtual
machines.  The OpenAFS Rx stack throughput is limited by the clock speed
of a single processor core.  The 1.6 ubik stack is further limited by
the need to share a single processor core with all of the vote, disk and
application call processing.  As a result, anything that increases the
overhead also increases the risk of quorum failures.

This includes virtualization as well as the overhead imposed as a result
of Meltdown and Spectre fixes.  Meltdown and Spectre can deliver a
double whammy as a result of increased overhead both within the virtual
machine and within the host's virtualization layer.

AuriStor's UBIK variant does not suffer the scaling problems of AFS3
UBIK.  AuriStor's UBIK has been successfully tested with 80 ubik servers
in a cell. This is possible because of a more efficient protocol that
is incompatible with AFS3 UBIK and the efficiencies in AuriStor's Rx
implementation.

Jeffrey Altman
AuriStor, Inc.
