corosync makes use of several timeouts, in particular the token and
consensus timeouts. The sum of these two timeouts yields the minimum
time a cluster needs to reestablish a membership after a token loss
due to a complete node failure.

By default, corosync sets the timeouts based on the cluster size [1]:

    token timeout = token + (#nodes - 2) * token_coefficient
    consensus timeout = 1.2 * token timeout

token defaults to 3000ms, token_coefficient defaults to 650ms.
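
As an illustration only (not part of this patch), the formulas above can
be written out as a small Python sketch. The function names are
hypothetical, and clamping the node term at zero for clusters with
fewer than three nodes is an assumption made here to keep the result
from dropping below token.

```python
# Sketch of the default corosync timeout formulas quoted above.
# All names are hypothetical; values are in milliseconds.

def token_timeout_ms(nodes, token=3000, token_coefficient=650):
    # Assumption: the coefficient only contributes for nodes beyond the second.
    extra_nodes = max(nodes - 2, 0)
    return token + extra_nodes * token_coefficient


def consensus_timeout_ms(nodes, token=3000, token_coefficient=650):
    # 1.2 * token timeout, kept in integer arithmetic to avoid float rounding.
    return token_timeout_ms(nodes, token, token_coefficient) * 12 // 10


def membership_floor_ms(nodes, token=3000, token_coefficient=650):
    # Sum of both timeouts: the minimum time to reestablish a membership.
    return (token_timeout_ms(nodes, token, token_coefficient)
            + consensus_timeout_ms(nodes, token, token_coefficient))


if __name__ == "__main__":
    for nodes in (3, 16, 30, 40):
        print(nodes, token_timeout_ms(nodes), consensus_timeout_ms(nodes),
              membership_floor_ms(nodes))
```

With the defaults, 30 nodes give a token timeout of 21200ms and a
consensus timeout of 25440ms, 46640ms in total.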

With more than ~30 nodes in the default settings, the sum of token and
consensus timeouts gets close to or exceeds 50-60s. As a result, after
a token loss due to a complete node failure in an HA cluster, the
watchdog may fence nodes because it takes too long to reestablish a
new membership and quorum.

One way to avoid this is to lower the sum of the token and consensus
timeouts. The consensus timeout is intentionally slightly larger than
the token timeout [2], so the definition of the consensus timeout in
terms of the token timeout should be preserved. Since it does make
sense to define both timeouts in terms of the cluster size, the most
viable option to lower the timeouts appears to be to adjust the
token_coefficient. Experiments suggest that the default 650ms is
overly conservative considering the low-latency network requirements
postulated in the admin guide [3].

Hence, create new clusters with a default token coefficient of 125ms.
This keeps the sum of token and consensus timeouts well below 50s for
realistic cluster sizes. Users who prefer a larger token coefficient
can manually override the token coefficient when creating a cluster
via pvecm create. The token coefficient can also be changed for an
existing cluster; this will be documented separately.
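
To make the effect of the lower coefficient concrete, the following
Python sketch (illustrative only, with hypothetical helper names)
compares the sum of token and consensus timeouts for both coefficients
and searches for the largest cluster size that stays below 50s. It
assumes the default token of 3000ms and the formulas given earlier in
this message.

```python
# Compare the timeout sum for the old and new token coefficient.
# Helper names are hypothetical; values are in milliseconds.

def timeout_sum_ms(nodes, token_coefficient, token=3000):
    token_timeout = token + max(nodes - 2, 0) * token_coefficient
    consensus_timeout = token_timeout * 12 // 10  # 1.2 * token timeout
    return token_timeout + consensus_timeout


def largest_cluster_below(limit_ms, token_coefficient):
    # Assumes token_coefficient > 0, otherwise the sum never grows
    # and this loop would not terminate.
    nodes = 2
    while timeout_sum_ms(nodes + 1, token_coefficient) < limit_ms:
        nodes += 1
    return nodes


if __name__ == "__main__":
    for coefficient in (650, 125):
        print(coefficient,
              timeout_sum_ms(30, coefficient),
              largest_cluster_below(50_000, coefficient))
```

Under these assumptions, a 30-node cluster has a sum of 46640ms with a
coefficient of 650ms and 14300ms with 125ms. The sum stays below 50s up
to 32 nodes with 650ms and up to 159 nodes with 125ms.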

Note that knet_ping_interval and knet_ping_timeout are derived from
the token timeout, so a lower token coefficient will result in more
frequent kronosnet pings and shorter ping timeouts.

With this change, newly created clusters will always set an explicit
token_coefficient in their corosync.conf.

[1] https://manpages.debian.org/trixie/corosync/corosync.conf.5.en.html#token_coefficient
[2] https://github.com/corosync/corosync/commit/b3e19b29058eafc3e808ded7f4c2440c3f957392
[3] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_network_requirements

Signed-off-by: Friedrich Weber <[email protected]>
---
 src/PVE/API2/ClusterConfig.pm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/PVE/API2/ClusterConfig.pm b/src/PVE/API2/ClusterConfig.pm
index 1bc7bcf..8df257a 100644
--- a/src/PVE/API2/ClusterConfig.pm
+++ b/src/PVE/API2/ClusterConfig.pm
@@ -111,12 +111,21 @@ __PACKAGE__->register_method({
                 minimum => 1,
                 optional => 1,
             },
+            'token-coefficient' => {
+                type => 'integer',
+                description => "Token coefficient to set in the corosync configuration.",
+                default => 125,
+                minimum => 0,
+                optional => 1,
+            },
         }),
     },
     returns => { type => 'string' },
     code => sub {
         my ($param) = @_;
 
+        $param->{'token-coefficient'} //= 125;
+
         die "cluster config '$clusterconf' already exists\n" if -f $clusterconf;
 
         my $rpcenv = PVE::RPCEnvironment::get();
-- 
2.47.3



