Friedrich Weber <[email protected]> writes:

> corosync makes use of several timeouts, in particular the token and
> consensus timeouts. The sum of these two timeouts yields the minimum
> time a cluster needs to reestablish a membership after a token loss
> due to a complete node failure.
>
> By default, corosync sets the timeouts based on the cluster size [1]:
>
>     token timeout = token + (#nodes - 2) * token_coefficient
>     consensus timeout = 1.2 * token timeout
>
> token defaults to 3000ms, token_coefficient defaults to 650ms.
>
> With more than ~30 nodes in the default settings, the sum of token and
> consensus timeouts gets close to or exceeds 50-60s. As a result, after
> a token loss due to a complete node failure in an HA cluster, the
> watchdog may fence nodes because it takes too long to reestablish a
> new membership and quorum.
>
> One way to avoid this is to lower the sum of the token and consensus
> timeouts. The consensus timeout is intentionally slightly larger than
> the token timeout [2], so the definition of the consensus timeout in
> terms of the token timeout should be preserved. Since it does make
> sense to define both timeouts in terms of the cluster size, the most
> viable option to lower the timeouts appears to be to adjust the
> token_coefficient. Experiments suggest that the default 650ms is
> overly conservative considering the low-latency network requirements
> postulated in the admin guide [3].
>
> Hence, create new clusters with a default token coefficient of 125ms.
> This keeps the sum of token and consensus timeouts well below 50s for
> realistic cluster sizes. Users who prefer a larger token coefficient
> can manually override the token coefficient when creating a cluster
> via pvecm create. The token coefficient can also be changed for an
> existing cluster, this will be documented separately.
>
> Note that knet_ping_interval and knet_ping_timeout are derived from
> the token timeout, hence, a lower token coefficient will result in
> more frequent kronosnet pings and shorter ping timeouts.
>
> With this change, newly created clusters will always set an explicit
> token_coefficient in their corosync.conf.
>
> [1] https://manpages.debian.org/trixie/corosync/corosync.conf.5.en.html#token_coefficient
> [2] https://github.com/corosync/corosync/commit/b3e19b29058eafc3e808ded7f4c2440c3f957392
> [3] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_network_requirements
>
> Signed-off-by: Friedrich Weber <[email protected]>
> ---
>  src/PVE/API2/ClusterConfig.pm | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/PVE/API2/ClusterConfig.pm b/src/PVE/API2/ClusterConfig.pm
> index 1bc7bcf..8df257a 100644
> --- a/src/PVE/API2/ClusterConfig.pm
> +++ b/src/PVE/API2/ClusterConfig.pm
> @@ -111,12 +111,21 @@ __PACKAGE__->register_method({
>                  minimum => 1,
>                  optional => 1,
>              },
> +            'token-coefficient' => {
> +                type => 'integer',
> +                description => "Token coefficient to set in the corosync configuration.",

This description does not explain what the option does; it says no more
than the name already does. It would perhaps be preferable to say
something along the lines of:

"Coefficient used to determine Corosync's token timeout. See the
corosync.conf(5) manual for more details."
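
For anyone reading the description without the commit message at hand,
the effect of the coefficient can be sketched with the formulas quoted
above. This is only an illustration, not code from the patch: the
function names are made up, the clamp for clusters with fewer than three
nodes is an assumption, and the 1.2 factor is done in integer
arithmetic to keep the numbers exact.

```python
# Illustrative sketch of the timeout formulas quoted in the commit message.
# Function names are hypothetical; this is not corosync or PVE code.

def token_timeout_ms(nodes, token=3000, token_coefficient=650):
    # token timeout = token + (#nodes - 2) * token_coefficient
    # Assumption: no extra time is added for fewer than three nodes.
    extra_nodes = max(nodes - 2, 0)
    return token + extra_nodes * token_coefficient

def consensus_timeout_ms(token_timeout):
    # consensus timeout = 1.2 * token timeout (integer math avoids float noise)
    return token_timeout * 12 // 10

def recovery_floor_ms(nodes, token_coefficient):
    # Sum of both timeouts: the minimum time needed to reestablish a
    # membership after a token loss, per the commit message.
    t = token_timeout_ms(nodes, token_coefficient=token_coefficient)
    return t + consensus_timeout_ms(t)

for coefficient in (650, 125):
    print(coefficient, recovery_floor_ms(30, coefficient))
# 650 46640
# 125 14300
```

So for a 30-node cluster the sum drops from roughly 47s to roughly 14s,
which is the kind of context a pointer to corosync.conf(5) would give
users.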


> +                default => 125,
> +                minimum => 0,
> +                optional => 1,
> +            },
>          }),
>      },
>      returns => { type => 'string' },
>      code => sub {
>          my ($param) = @_;
>  
> +        $param->{'token-coefficient'} //= 125;
> +
>          die "cluster config '$clusterconf' already exists\n" if -f $clusterconf;
>  
>          my $rpcenv = PVE::RPCEnvironment::get();

-- 
Maximiliano


