On 5/25/22 02:11, Han Zhou wrote: > > > On Tue, May 24, 2022 at 1:18 PM Ilya Maximets <[email protected] > <mailto:[email protected]>> wrote: >> >> While becoming a follower, the leader aborts all the current >> 'in-flight' commands, so the higher layers can re-try corresponding >> transactions when the new leader is elected. However, most of these >> commands are already sent to followers as append requests, hence they >> will actually be committed by the majority of the cluster members, >> i.e. will be treated as committed by the new leader, unless there is >> an actual network problem between servers. However, the old leader >> will decline append replies, since it's not the leader anymore and >> commands are already completed with RAFT_CMD_LOST_LEADERSHIP status. >> >> New leader will replicate the commit index back to the old leader. >> Old leader will re-try the previously "failed" transaction, because >> "cluster error"s are temporary. >> >> If a transaction had some prerequisites that didn't allow double >> committing or there are other database constraints (like indexes) that >> will not allow a transaction to be committed twice, the server will >> reply to the client with a false-negative transaction result. >> >> If there are no prerequisites or additional database constraints, >> the server will execute the same transaction again, but as a follower. >> >> E.g. in the OVN case, this may result in creation of duplicated logical >> switches / routers / load balancers. I.e. resources with the same >> non-indexed name. That may cause issues later where ovn-nbctl will >> not be able to add ports to these switches / routers. >> >> Suggested solution is to not complete (abort) the commands, but allow >> them to be completed with the commit index update from a new leader. >> It the similar behavior to what we do in order to complete commands >> in a backward scenario when the follower becomes a leader. That >> scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated >> transaction execution when leader failover."). >> >> Code paths for leader and follower inside the raft_update_commit_index >> were very similar, so they were refactored into one, since we also >> needed an ability to complete more than one command for a follower. >> >> Failure test added to exercise scenario of a leadership transfer. >> >> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered >> databases.") >> Fixes: 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.") >> Reported-at: https://bugzilla.redhat.com/2046340 >> <https://bugzilla.redhat.com/2046340> >> Signed-off-by: Ilya Maximets <[email protected] <mailto:[email protected]>> >> --- >> >> Version 2: >> - Updated the commit mesage clarifying the word 'completed'. [Han] >> - Dropped the log reset part since it's not really necessary >> and a bit misleading. [Han] >> > Thanks Ilya! > > Acked-by: Han Zhou <[email protected] <mailto:[email protected]>>
Thanks for review! Applied and backported down to 2.13. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
