On Wed, Dec 17, 2025 at 7:14 AM Ilya Maximets <[email protected]> wrote:
>
>  Section 4.2.3 "Disruptive Servers" says:
>
>     ...if a server receives a RequestVote request within the minimum
>     election timeout of hearing from a current leader, it does not
>     update its term or grant its vote.  It can either drop the request,
>     reply with a vote denial, or delay the request; the result is
>     essentially the same...
>
> However, while logic in the code does make it skip the term update, it
> still proceeds to reply with the vote, because when the suppression
> check returns true, we'll not enter the 'if' block and will proceed to
> the RPC 'switch' instead.
>
> It will reply with a vote for the candidate it already voted for on
> this term for an actual vote and it will vote for the requester on
> the pre-vote.  This effectively bypasses the pre-vote scheme as the
> disruptive server will win the pre-vote and proceed to the regular
> election.

Thanks Ilya for fixing this.

This was surprising to me. I checked in more detail and now I see there is
another defect that actually lets the pre-vote succeed. In
raft_handle_vote_request__(), it returns true early
because "!uuid_is_zero(&raft->vote)" before actually comparing which log
end is more up-to-date (usually the disruptive server's log is behind and
should fail the prevote). So I think if rq->is_prevote we should compare
the logs first. Of course this should be a separate patch.

> And while the disruptor will not win the actual vote, it
> has a term +1 and will reply with a "stale term" to the next append
> request, forcing the current leader to step down.
>
> Fix that by actually not responding to the disruptive vote requests,
> i.e. by taking the "drop the request" route.
>
> The test is updated with a new failure model where vote requests are
> actually getting sent out.  There is a value in testing the full RPC
> stop as well, so keeping the current checks as they are.
>
> The FT_FORCE_ELECTION enum entry is not actually needed, but the code
> feels awkward without it.

This is not very important but I think it may be better to avoid this enum
because when the failure_test stays FT_FORCE_ELECTION, some would assume
the behavior would keep forcing elections until it is set back to
FT_NO_TEST, but it is actually a one-time test. Tests such as
FT_TRANSFER_LEADERSHIP automatically sets failure_test back to FT_NO_TEST,
but this one doesn't. It would be consistent if the code sets it back to
FT_NO_TEST but then again the FT_FORCE_ELECTION is useless. So, probably we
should just set failure_test to FT_NO_TEST directly, and it is also clear
from the code that the test is carried out directly. But I don't have a
strong opinion if you want to keep it as is.

Acked-by: Han Zhou <[email protected]>

>
> Also removing the duplicate term check in the vote reply handler,
> as it is now clear that the term check will not be skipped.
>
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered
databases.")
> Signed-off-by: Ilya Maximets <[email protected]>
> ---
>  ovsdb/raft.c           | 23 ++++++++++++++++-------
>  tests/ovsdb-cluster.at | 12 ++++++++++++
>  2 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> index 32349c8af..a7e631b2c 100644
> --- a/ovsdb/raft.c
> +++ b/ovsdb/raft.c
> @@ -80,6 +80,7 @@ enum raft_failure_test {
>      FT_CRASH_BEFORE_SEND_SNAPSHOT_REP,
>      FT_DELAY_ELECTION,
>      FT_DONT_SEND_VOTE_REQUEST,
> +    FT_FORCE_ELECTION,
>      FT_STOP_RAFT_RPC,
>      FT_TRANSFER_LEADERSHIP,
>      FT_TRANSFER_LEADERSHIP_AFTER_SEND_APPEND_REQ,
> @@ -3975,10 +3976,6 @@ static void
>  raft_handle_vote_reply(struct raft *raft,
>                         const struct raft_vote_reply *rpy)
>  {
> -    if (!raft_receive_term__(raft, &rpy->common, rpy->term)) {
> -        return;
> -    }
> -
>      if (raft->role != RAFT_CANDIDATE) {
>          return;
>      }
> @@ -4740,10 +4737,12 @@ raft_handle_rpc(struct raft *raft, const union
raft_rpc *rpc)
>          s->last_msg_ts = time_msec();
>      }
>
> +    if (raft_should_suppress_disruptive_server(raft, rpc)) {
> +        return;
> +    }
> +
>      uint64_t term = raft_rpc_get_term(rpc);
> -    if (term
> -        && !raft_should_suppress_disruptive_server(raft, rpc)
> -        && !raft_receive_term__(raft, &rpc->common, term)) {
> +    if (term && !raft_receive_term__(raft, &rpc->common, term)) {
>          if (rpc->type == RAFT_RPC_APPEND_REQUEST) {
>              /* Section 3.3: "If a server receives a request with a stale
term
>               * number, it rejects the request." */
> @@ -5267,6 +5266,7 @@ raft_unixctl_failure_test(struct unixctl_conn *conn
OVS_UNUSED,
>          failure_test = FT_CRASH_BEFORE_SEND_SNAPSHOT_REP;
>      } else if (!strcmp(test, "delay-election")) {
>          failure_test = FT_DELAY_ELECTION;
> +
>          struct raft *raft;
>          HMAP_FOR_EACH (raft, hmap_node, &all_rafts) {
>              if (raft->role == RAFT_FOLLOWER) {
> @@ -5275,6 +5275,15 @@ raft_unixctl_failure_test(struct unixctl_conn
*conn OVS_UNUSED,
>          }
>      } else if (!strcmp(test, "dont-send-vote-request")) {
>          failure_test = FT_DONT_SEND_VOTE_REQUEST;
> +    } else if (!strcmp(test, "force-election")) {
> +        failure_test = FT_FORCE_ELECTION;
> +
> +        struct raft *raft;
> +        HMAP_FOR_EACH (raft, hmap_node, &all_rafts) {
> +            if (raft->role != RAFT_LEADER) {
> +                raft_start_election(raft, true, false);
> +            }
> +        }
>      } else if (!strcmp(test, "stop-raft-rpc")) {
>          failure_test = FT_STOP_RAFT_RPC;
>      } else if (!strcmp(test,
> diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
> index 91a76cb81..ff18ec556 100644
> --- a/tests/ovsdb-cluster.at
> +++ b/tests/ovsdb-cluster.at
> @@ -1005,6 +1005,18 @@ for i in $(seq $n); do
>      AT_CHECK([ovs-appctl -t $(pwd)/s$i cluster/status $schema_name |
grep "Term: 1"], [0], [ignore])
>  done
>
> +# Now force election without pausing RPC, so the requests are actually
sent.
> +AT_CHECK([ovs-appctl -t $(pwd)/s2 cluster/failure-test force-election],
[0], [ignore])
> +
> +# The server transitions into a candidate role while engaging the test
> +# and shortly after it should step back to be a follower.
> +OVS_WAIT_UNTIL([ovs-appctl -t $(pwd)/s2 cluster/status $schema_name |
grep "Role: follower"])
> +
> +# No term change.
> +for i in $(seq $n); do
> +    AT_CHECK([ovs-appctl -t $(pwd)/s$i cluster/status $schema_name |
grep "Term: 1"], [0], [ignore])
> +done
> +
>  for i in $(seq $n); do
>      OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid])
>  done
> --
> 2.52.0
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to