On 2/4/2018 7:54 AM, Dirk Heinrichs wrote:
> On 04.02.2018 at 13:29, Jose M Calhariz wrote:
>
>> The core of my infra-structure are 4 afsdb
>
> Wasn't it so that it's better to have an odd number of DB servers (with
> a max. of 5)?

The maximum number of ubik servers in an AFS3 cell is 20. This is a protocol constraint. However, due to performance characteristics it is unlikely that anyone could run that number of servers in a production cell.

As the server count increases, so does the number of messages that must be exchanged to conduct an election, complete database synchronization recovery, maintain quorum, and complete remote transactions. These messages compete with the application level requests arriving from clients. As the application level calls (vl, pt, ...) increase, the risk of delayed processing of disk and vote calls increases, which can lead to loss of quorum or remote transaction failures.

The reason that odd numbers of servers are preferred is the failover properties:

  one server - single point of failure. An outage leads to read and write failures.

  two servers - single point of failure for writes. Only the server with the lowest IPv4 address can be elected coordinator. If it fails, writes are blocked. If it fails during a write transaction, read transactions on the second server are blocked until the first server recovers.

  three or four servers - either of the two servers with the lowest IPv4 addresses can be elected coordinator. Any one server can fail without loss of write or read.

  five or six servers - any of the three servers with the lowest IPv4 addresses can be elected coordinator. Any two servers can fail without loss of write or read.

Although adding a fourth server increases the number of servers that can satisfy read requests, the lack of improved resiliency to failure and the increased risk of quorum loss make it less desirable.

The original poster indicated that his ubik servers are virtual machines. The OpenAFS Rx stack throughput is limited by the clock speed of a single processor core. The 1.6 ubik stack is further limited by the need to share a single processor core with all of the vote, disk and application call processing.

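The server-count list above can be condensed into a small sketch. This is only an illustration: the function name `failover_properties` is hypothetical, and the two formulas are assumptions chosen because they reproduce the figures stated in the list for one through six servers. They do not model ubik's actual election logic, vote timing, or message load.

```python
def failover_properties(server_count):
    """Return (coordinator_candidates, tolerated_failures) for a cell size.

    Assumption: these closed-form expressions were picked only because they
    match the one-to-six server figures given in the message; they are not
    taken from the ubik implementation.
    """
    if server_count < 1:
        raise ValueError("a cell needs at least one database server")
    # Number of arbitrary servers that can fail without losing writes.
    tolerated_failures = (server_count - 1) // 2
    # Number of lowest-address servers that could end up as coordinator.
    coordinator_candidates = tolerated_failures + 1
    return coordinator_candidates, tolerated_failures


if __name__ == "__main__":
    for n in range(1, 7):
        candidates, failures = failover_properties(n)
        print(f"{n} server(s): {candidates} possible coordinator(s), "
              f"{failures} tolerated failure(s)")
```

Running it shows that each even count gives the same two numbers as the odd count just below it, which is the reason a fourth or sixth server adds no resiliency.
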
Because of that single-core limit, anything that adds overhead increases the risk of quorum failures. This includes virtualization as well as the overhead imposed by the Meltdown and Spectre fixes. Meltdown and Spectre can deliver a double whammy because the fixes add overhead both within the virtual machine and within the host's virtualization layer.

AuriStor's UBIK variant does not suffer the scaling problems of AFS3 UBIK. AuriStor's UBIK has been successfully tested with 80 ubik servers in a cell. This is possible because of a more efficient protocol that is incompatible with AFS3 UBIK and the efficiencies in AuriStor's Rx implementation.

Jeffrey Altman
AuriStor, Inc.
