On 31 Jan 2026, at 2:32, Ilya Maximets wrote:
> We recently fixed a couple of issues with actually ignoring a disruptive
> server as well as ignoring pre-vote replies during the actual vote:
>
> c34c21bb0184 ("ovsdb: raft: Actually suppress the disruptive server.")
> 5f12cd410acf ("ovsdb: raft: Discard pre-vote replies during the actual
> election.")
>
> Without both of these fixes, the following scenario was possible:
>
> 1. A cluster with 3 servers: A (leader), B and C. Term: X.
> 2. C goes down.
> 3. A and B commit extra data --> C now has an outdated log.
> 4. A transfers leadership and goes down.
> 5. B initiates the election increasing the term to X+1 (no pre-vote).
> 6. B goes down.
> 7. Now the whole cluster is down with database files containing terms
> X, X+1, and X on A, B, and C, respectively. The log on C is behind.
> 8. All servers go back up.
> 9. C initiates pre-vote on term X.
> 10. A sends pre-vote for C on term X (we do not compare the log yet).
> 11. B sends pre-vote for C on term X+1, because in the absence of
> commit c34c21bb0184, we send a reply even if the term doesn't
> match as long as the request is considered disruptive.
> 12. C receives pre-vote for itself from A on term X.
> 13. C now has 2 out of 3 pre-votes (self and A).
> 14. C immediately initiates the actual vote on term X+1.
> 15. C receives pre-vote for itself from B on term X+1.
> 16. In the absence of commit 5f12cd410acf, C treats the pre-vote
> from B as an actual vote.
> 17. C now thinks that it has 2 out of 3 actual votes and declares
> itself a leader for term X+1.
> 18. A doesn't send an actual vote, because C has an outdated log.
> 19. B sends an actual vote reply voting for itself, because it
> already voted on term X+1 for itself at step 5.
> 20. C, as a leader, ignores the extra vote from B.
> 21. C sends append requests to A and B with an outdated log.
> 22. A and B acknowledge a new leader.
> 23. A and B attempt to truncate their logs below the commit index.
> 24. A and B crash on an assertion failure and can't recover, because
> the illegal truncation is part of their logs now.
>
> In this situation it may also be possible to have two leaders
> elected at the same time, in case A or B elect themselves before
> C sends the first append request, since they never voted for C and
> so can still vote for each other.
>
> Either one of the fixes above breaks the scenario. With them, B
> wouldn't send a pre-vote on a mismatching term and C wouldn't treat
> it as an actual vote.
>
> Adding a test that reproduces it, to have better coverage, as we
> thought that pre-vote replies during the actual vote should not be
> possible. They still should not, but only since we got the other
> fix in place.
>
> Additionally, step 10 can also be improved in the future by actually
> comparing the log length on a pre-vote (not in this patch).
>
> The described scenario actually happened in the ovn-kubernetes CI,
> as it used an older OVS 3.4.1 Fedora package that doesn't have the
> aforementioned fixes.
>
> Signed-off-by: Ilya Maximets <[email protected]>

Thanks for the patch, Ilya. The changes make sense to me based on your
description.

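
For reference, the vote counting in steps 12-17 of the quoted scenario can be sketched as a toy model. This is only an illustration under simplifying assumptions, not the OVS raft code: the names `VoteReply`, `Candidate`, `discard_stale_prevotes` and `run_scenario` are hypothetical, the term X is set to an arbitrary 5, and replies with a mismatching term are simply dropped instead of going through the real term handling.

```python
from dataclasses import dataclass, field


@dataclass
class VoteReply:
    sender: str
    term: int
    vote_for: str
    is_prevote: bool


@dataclass
class Candidate:
    name: str
    term: int
    cluster_size: int
    prevote_phase: bool = True
    # Hypothetical switch modelling the effect of commit 5f12cd410acf.
    discard_stale_prevotes: bool = True
    votes: set = field(default_factory=set)

    def __post_init__(self):
        self.votes.add(self.name)  # A candidate always votes for itself.

    def has_majority(self):
        return len(self.votes) > self.cluster_size // 2

    def start_real_election(self):
        self.term += 1
        self.prevote_phase = False
        self.votes = {self.name}

    def handle_reply(self, reply):
        """Return "prevote-won", "leader", or None if nothing changed."""
        if (reply.is_prevote and not self.prevote_phase
                and self.discard_stale_prevotes):
            return None  # Pre-vote reply during the actual election.
        if reply.term != self.term or reply.vote_for != self.name:
            return None  # Simplification: ignore anything that doesn't match.
        self.votes.add(reply.sender)
        if not self.has_majority():
            return None
        if self.prevote_phase:
            self.start_real_election()
            return "prevote-won"
        return "leader"


def run_scenario(discard_stale_prevotes):
    x = 5  # Arbitrary value standing in for term X.
    c = Candidate("C", term=x, cluster_size=3,
                  discard_stale_prevotes=discard_stale_prevotes)
    return [
        # Step 12: pre-vote from A on term X.
        c.handle_reply(VoteReply("A", x, "C", is_prevote=True)),
        # Step 15: late pre-vote from B on term X+1.
        c.handle_reply(VoteReply("B", x + 1, "C", is_prevote=True)),
    ]
```

With `discard_stale_prevotes=False` the late pre-vote from B is counted as an actual vote and C becomes the leader. With the default `True`, the same reply is dropped and C stays a candidate.
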
Acked-by: Eelco Chaudron <[email protected]>