On 4/26/24 18:54, Ihar Hrachyshka wrote:
> Remove the notion of cluster/leave --force since it was never
> implemented. Instead of these instructions, document how a broken
> cluster can be re-initialized with the old database contents.
> 
> Signed-off-by: Ihar Hrachyshka <[email protected]>
> 
> ---
> 
> v1: initial version.
> v2: remove --force mentioned in ovsdb-server(1).
> v3: multiple language and markup changes suggested by Ilya.

Thanks, Ihar!  This version looks good to me in general.
I have a couple of minor nits below.  If you agree, I can
fold those in while applying the change.

Let me know what you think.

Best regards, Ilya Maximets.

> 
> ---
>  Documentation/ref/ovsdb.7.rst | 44 ++++++++++++++++++++++++++++-------
>  ovsdb/ovsdb-server.1.in       |  3 +--
>  2 files changed, 37 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
> index 46ed13e61..5766e64b9 100644
> --- a/Documentation/ref/ovsdb.7.rst
> +++ b/Documentation/ref/ovsdb.7.rst
> @@ -315,16 +315,11 @@ The above methods for adding and removing servers only work for healthy
>  clusters, that is, for clusters with no more failures than their maximum
>  tolerance.  For example, in a 3-server cluster, the failure of 2 servers
>  prevents servers joining or leaving the cluster (as well as database access).
> +
>  To prevent data loss or inconsistency, the preferred solution to this problem
>  is to bring up enough of the failed servers to make the cluster healthy again,
> -then if necessary remove any remaining failed servers and add new ones.  If
> -this cannot be done, though, use ``ovs-appctl`` to invoke ``cluster/leave
> ---force`` on a running server.  This command forces the server to which it is
> -directed to leave its cluster and form a new single-node cluster that contains
> -only itself.  The data in the new cluster may be inconsistent with the former
> -cluster: transactions not yet replicated to the server will be lost, and
> -transactions not yet applied to the cluster may be committed.  Afterward, any
> -servers in its former cluster will regard the server to have failed.
> +then if necessary remove any remaining failed servers and add new ones. If this

Nit:  2 spaces between sentences.

> +is not an option, see the next section for `Manual cluster recovery`_.
>  
>  Once a server leaves a cluster, it may never rejoin it.  Instead, create a new
>  server and join it to the cluster.
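
As an aside, not something for this patch: since the forced variant is gone from
the text, readers may wonder what the graceful path looks like.  A rough sketch,
where the control socket path and the database name are only examples and depend
on the deployment:

  # Ask this ovsdb-server to gracefully leave the cluster of the
  # OVN_Southbound database.
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/leave OVN_Southbound

  # The removal only finishes once the rest of the cluster commits it;
  # cluster/status can be used to check the progress.
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
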
> @@ -362,6 +357,39 @@ Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
>  ones when they work with schemas for clustered databases.  Future versions of
>  OVSDB might add support for this feature.
>  
> +Manual cluster recovery
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +.. important::

Nit: An empty line here would be nice to be consistent at least
     within this document.

> +   The procedure below will result in ``cid`` and ``sid`` change. A *new*

Nit:  2 spaces between sentences.

> +   cluster will be initialized.
> +
> +To recover a clustered database after a failure:
> +
> +1. Stop *all* old cluster ``ovsdb-server`` instances before proceeding.
> +
> +2. Pick one of the old members which will serve as a bootstrap member of the
> +   to-be-recovered cluster.
> +
> +3. Convert its database file to the standalone format using ``ovsdb-tool
> +   cluster-to-standalone``.
> +
> +4. Backup the standalone database file.
> +
> +5. Create a new single-node cluster with ``ovsdb-tool create-cluster``
> +   using the previously saved standalone database file, then start
> +   ``ovsdb-server``.
> +
> +Once the single-node cluster is up and running and serves the restored data,
> +new members should be created and join the new cluster, as usual (``ovsdb-tool
> +join-cluster``).

I'm having a hard time reading 'new members should be created and join', as
my brain wants to relate 'should be' to both 'created' and 'join', and
'should be join' is not a correct construct.

How about: "new members should be created and added to the cluster, as usual,
with ``ovsdb-tool join-cluster``."  ?

Also, should it be a step 6?
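
Another thought, not a blocker: a short command transcript under the list might
help readers map the steps to actual tools.  Only a sketch; the file names, the
paths, the addresses and the OVN_Southbound database below are made up for
illustration and will differ per deployment:

  # Steps 1-2: stop every old ovsdb-server instance, then work on the database
  # file of the member chosen as the bootstrap one.

  # Step 3: convert the chosen clustered file to the standalone format.
  ovsdb-tool cluster-to-standalone /tmp/sb_standalone.db /var/lib/ovn/ovnsb_db.db

  # Step 4: keep a backup of the standalone file.
  cp /tmp/sb_standalone.db /tmp/sb_standalone.db.backup

  # Step 5: move the old clustered file out of the way, initialize a new
  # single-node cluster from the standalone data, then start ovsdb-server.
  mv /var/lib/ovn/ovnsb_db.db /var/lib/ovn/ovnsb_db.db.broken
  ovsdb-tool create-cluster /var/lib/ovn/ovnsb_db.db /tmp/sb_standalone.db \
      tcp:10.0.0.1:6644

  # Additional members then join the new cluster as usual, pointing at the
  # bootstrap member (their old database files also have to go away first).
  ovsdb-tool join-cluster /var/lib/ovn/ovnsb_db.db OVN_Southbound \
      tcp:10.0.0.2:6644 tcp:10.0.0.1:6644

That is probably material for a separate patch though, if it is wanted at all.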

> +
> +.. note::
> +
> +   The data in the new cluster may be inconsistent with the former cluster:
> +   transactions not yet replicated to the server chosen in step 2 will be lost,
> +   and transactions not yet applied to the cluster may be committed.
> +
>  Upgrading from version 2.14 and earlier to 2.15 and later
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
> index 9fabf2d67..23b8e6e9c 100644
> --- a/ovsdb/ovsdb-server.1.in
> +++ b/ovsdb/ovsdb-server.1.in
> @@ -461,8 +461,7 @@ This does not result in a three server cluster that lacks quorum.
>  .
>  .IP "\fBcluster/kick \fIdb server\fR"
>  Start graceful removal of \fIserver\fR from \fIdb\fR's cluster, like
> -\fBcluster/leave\fR (without \fB\-\-force\fR) except that it can
> -remove any server, not just this one.
> +\fBcluster/leave\fR, except that it can remove any server, not just this one.
>  .IP
>  \fIserver\fR may be a server ID, as printed by \fBcluster/sid\fR, or
>  the server's local network address as passed to \fBovsdb-tool\fR's
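
One more aside while looking at this hunk: at some point it might be nice to
show an invocation example near this paragraph, roughly like the following
(the control socket path and the server ID are placeholders):

  # Remove the member identified by the given server ID from the cluster of
  # the OVN_Southbound database, from any member that is still running.
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/kick OVN_Southbound <server-id>

But that is clearly out of scope for this change.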
