Friedrich Weber <[email protected]> writes:

A comment below.

> corosync makes use of several timeouts, in particular the token and
> consensus timeouts. The sum of these two timeouts yields the minimum
> time a cluster needs to reestablish a membership after a token loss
> due to a complete node failure.
>
> By default, corosync sets the timeouts based on the cluster size [1]:
>
>   token timeout = token + (#nodes - 2) * token_coefficient
>   consensus timeout = 1.2 * token timeout
>
> token defaults to 3000ms, token_coefficient defaults to 650ms.
>
> With more than ~30 nodes in the default settings, the sum of token and
> consensus timeouts gets close to or exceeds 50-60s. As a result, after
> a token loss due to a complete node failure in an HA cluster, the
> watchdog may fence nodes because it takes too long to reestablish a
> new membership and quorum.
>
> One way to avoid this is to lower the sum of the token and consensus
> timeouts. The consensus timeout is intentionally slightly larger than
> the token timeout [2], so the definition of the consensus timeout in
> terms of the token timeout should be preserved. Since it does make
> sense to define both timeouts in terms of the cluster size, the most
> viable option to lower the timeouts appears to be to adjust the
> token_coefficient. Experiments suggest that the default 650ms is
> overly conservative considering the low-latency network requirements
> postulated in the admin guide [3].
>
> Hence, create new clusters with a default token coefficient of 125ms.
> This keeps the sum of token and consensus timeouts well below 50s for
> realistic cluster sizes. Users who prefer a larger token coefficient
> can manually override the token coefficient when creating a cluster
> via pvecm create. The token coefficient can also be changed for an
> existing cluster, this will be documented separately.
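
For reference, the arithmetic quoted above can be sketched as follows. This is only an illustration of the two formulas from the commit message, not corosync's actual code: the function names are made up, the consensus timeout is rounded down to whole milliseconds, and clamping the node term at zero for clusters with fewer than two nodes is an assumption.

```python
# Illustrative sketch of the timeout formulas quoted above; not corosync code.
# Function names and the clamp for fewer than 2 nodes are assumptions.

def token_timeout_ms(nodes, token=3000, token_coefficient=650):
    """token + (#nodes - 2) * token_coefficient, in milliseconds."""
    return token + max(nodes - 2, 0) * token_coefficient


def consensus_timeout_ms(token_timeout):
    """1.2 * token timeout, kept in integer milliseconds (rounded down)."""
    return token_timeout * 6 // 5


def membership_floor_ms(nodes, token_coefficient):
    """Sum of token and consensus timeouts for a given cluster size."""
    t = token_timeout_ms(nodes, token_coefficient=token_coefficient)
    return t + consensus_timeout_ms(t)


if __name__ == "__main__":
    # Compare the corosync default (650) with the proposed default (125).
    for n in (16, 32, 48):
        print(n, membership_floor_ms(n, 650), membership_floor_ms(n, 125))
```

Under these assumptions, a 32-node cluster comes out at 49500 ms with a coefficient of 650 and at 14850 ms with a coefficient of 125.
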
>
> Note that knet_ping_interval and knet_ping_timeout are derived from
> the token timeout, hence, a lower token coefficient will result in
> more frequent kronosnet pings and shorter ping timeouts.
>
> With this change, newly created clusters will always set an explicit
> token_coefficient in their corosync.conf.
>
> [1]
> https://manpages.debian.org/trixie/corosync/corosync.conf.5.en.html#token_coefficient
> [2]
> https://github.com/corosync/corosync/commit/b3e19b29058eafc3e808ded7f4c2440c3f957392
> [3]
> https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_network_requirements
>
> Signed-off-by: Friedrich Weber <[email protected]>
> ---
>  src/PVE/API2/ClusterConfig.pm | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/PVE/API2/ClusterConfig.pm b/src/PVE/API2/ClusterConfig.pm
> index 1bc7bcf..8df257a 100644
> --- a/src/PVE/API2/ClusterConfig.pm
> +++ b/src/PVE/API2/ClusterConfig.pm
> @@ -111,12 +111,21 @@ __PACKAGE__->register_method({
>                  minimum => 1,
>                  optional => 1,
>              },
> +            'token-coefficient' => {
> +                type => 'integer',
> +                description => "Token coefficient to set in the corosync configuration.",
> +                default => 125,
> +                minimum => 0,

From man 5 corosync.conf's token_coefficient documentation: "This value
can be set to 0 resulting in effective removal of this feature." If we
want to expose setting this to 0, I would document that it has a special
meaning and what this entails. I would personally feel more comfortable
setting `minimum => 1` for now instead.

> +                optional => 1,
> +            },
>          }),
>      },
>      returns => { type => 'string' },
>      code => sub {
>          my ($param) = @_;
>
> +        $param->{'token-coefficient'} //= 125;
> +
>          die "cluster config '$clusterconf' already exists\n" if -f $clusterconf;
>
>          my $rpcenv = PVE::RPCEnvironment::get();

-- 
Maximiliano
