On 31 Jan 2026, at 2:32, Ilya Maximets wrote:

> We recently fixed a couple issues with actually ignoring a disruptive
> server as well as ignoring pre-vote replies during the actual vote:
>
>  c34c21bb0184 ("ovsdb: raft: Actually suppress the disruptive server.")
>  5f12cd410acf ("ovsdb: raft: Discard pre-vote replies during the actual election.")
>
> Without both of these fixes, the following scenario was possible:
>
>   1. A cluster with 3 servers: A (leader), B and C.  Term: X.
>   2. C goes down.
>   3. A and B commit extra data --> C now has an outdated log.
>   4. A transfers leadership and goes down.
>   5. B initiates the election increasing the term to X+1 (no pre-vote).
>   6. B goes down.
>   7. Now the whole cluster is down with database files on A, B and C
>      containing terms X, X+1, and X, respectively.  The log on C is
>      behind.
>   8. All servers go back up.
>   9. C initiates pre-vote on term X.
>  10. A sends pre-vote for C on term X (we do not compare the log yet).
>  11. B sends pre-vote for C on term X+1, because in the absence of
>      commit c34c21bb0184, we send a reply even if the term doesn't
>      match as long as the request is considered disruptive.
>  12. C receives pre-vote for itself from A on term X.
>  13. C now has 2 out of 3 pre-votes (self and A).
>  14. C immediately initiates the actual vote on term X+1.
>  15. C receives pre-vote for itself from B on term X+1.
>  16. In the absence of commit 5f12cd410acf, C treats the pre-vote
>      from B as an actual vote.
>  17. C now thinks that it has 2 out of 3 actual votes and declares
>      itself a leader for term X+1.
>  18. A doesn't send an actual vote, because C has an outdated log.
>  19. B sends an actual vote reply voting for itself, because it
>      already voted on term X+1 for itself at step 5.
>  20. C, as a leader, ignores the extra vote from B.
>  21. C sends append requests to A and B with an outdated log.
>  22. A and B acknowledge a new leader.
>  23. A and B attempt to truncate their logs below the commit index.
>  24. A and B crash on assertion failure and can't recover, because
>      the illegal truncation is part of their logs now.
>
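Steps 21-24 can be sketched with a toy follower. This is a generic Raft-style model, not the OVS code: the `Follower` class, its fields, and the assumption that C's first append request carries one new entry from term X+1 are all hypothetical, and the previous-entry term check is left out for brevity.

```python
class Follower:
    """Toy follower (hypothetical model, not the OVS implementation)."""

    def __init__(self, log_terms, commit_index):
        self.log = list(log_terms)        # term of each log entry, in order
        self.commit_index = commit_index  # number of committed entries

    def truncate(self, new_length):
        # Committed entries must never be removed (the invariant hit in step 24).
        assert new_length >= self.commit_index, "truncation below commit index"
        del self.log[new_length:]

    def append_request(self, prev_index, entries):
        """Apply entries that follow the first prev_index entries of our log."""
        for offset, term in enumerate(entries):
            pos = prev_index + offset
            if pos < len(self.log) and self.log[pos] != term:
                self.truncate(pos)        # conflict: drop our own suffix
            if pos >= len(self.log):
                self.log.append(term)


def replay_steps_21_to_24(x=5):
    """A holds 4 committed term-X entries; C only has 2 and leads term X+1."""
    a = Follower([x, x, x, x], commit_index=4)
    try:
        a.append_request(prev_index=2, entries=[x + 1])
    except AssertionError:
        return "assertion failure"
    return "accepted"
```

With everything on A committed, `replay_steps_21_to_24()` ends in the assertion failure, since the stale leader's entry conflicts with an entry that A already considers committed.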
> In this situation it may also be possible to have two leaders
> elected at the same time, in case A or B elect themselves before
> C sends the first append request, as they never voted for C, so
> can vote for each other.
>
> Either one of the fixes above breaks the scenario.  With them, B
> wouldn't send a pre-vote on a mismatching term and C wouldn't treat
> it as an actual vote.
>
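To check my reading that either fix alone is enough, here is a minimal replay of steps 9-17 from C's point of view. It is only a sketch under stated assumptions: the function name, the two flags and the reply tuples are hypothetical stand-ins for the behavior of c34c21bb0184 and 5f12cd410acf, not the actual OVS logic.

```python
def c_becomes_leader(suppress_mismatched_reply, discard_prevote_in_election, x=5):
    """Replay steps 9-17; return True if C declares itself leader.

    suppress_mismatched_reply   -- models B staying silent on a term mismatch
    discard_prevote_in_election -- models C ignoring pre-vote replies once the
                                   actual election has started
    """
    majority = 3 // 2 + 1
    c_term, in_election, votes = x, False, {"C"}

    # Replies in arrival order, as (sender, term, is_prevote).
    replies = [("A", x, True)]                  # step 10
    if not suppress_mismatched_reply:
        replies.append(("B", x + 1, True))      # step 11: B is on term X+1

    for sender, term, is_prevote in replies:
        if term != c_term:
            continue                            # reply for some other term
        if in_election and is_prevote and discard_prevote_in_election:
            continue                            # pre-vote is not a real vote
        votes.add(sender)
        if len(votes) >= majority:
            if in_election:
                return True                     # step 17
            # Step 14: pre-vote succeeded, start the actual election.
            c_term, in_election, votes = c_term + 1, True, {"C"}
    return False
```

Only the combination with both flags off lets C win; turning on either one leaves C in the election with just its own vote.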
> Adding a test that reproduces it, to have better coverage, as we
> thought that pre-vote replies during the actual vote should not be
> possible.  They still should not, but only since we got the other
> fix in place.
>
> Additionally, step 10 can also be improved in the future by actually
> comparing the log length on a pre-vote (not in this patch).
>
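For the possible step 10 improvement, the usual Raft rule for actual votes compares the last log term first and the last log index second. The sketch below shows that rule only; the function name and arguments are hypothetical, and whether a pre-vote check in OVS would take this exact form is an assumption on my side.

```python
def candidate_log_is_up_to_date(cand_last_term, cand_last_index,
                                my_last_term, my_last_index):
    """Standard Raft comparison: higher last term wins, then longer log."""
    # Tuple comparison is lexicographic: term first, then index.
    return (cand_last_term, cand_last_index) >= (my_last_term, my_last_index)
```

In the scenario, A's log is longer than C's within the same term, so a check like `candidate_log_is_up_to_date(5, 2, 5, 4)` would return False and A would withhold its pre-vote.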
> The described scenario actually happened in the ovn-kubernetes CI,
> as it used an older OVS 3.4.1 Fedora package that doesn't have the
> aforementioned fixes.
>
> Signed-off-by: Ilya Maximets <[email protected]>

Thanks for the patch, Ilya. The changes make sense to me given your
description.

Acked-by: Eelco Chaudron <[email protected]>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
