On 5/5/22 14:24, Ilya Maximets wrote:
> While becoming a follower, the leader completes all the current
> 'in-flight' commands, so the higher layers can re-try corresponding
> transactions when the new leader is elected.  However, most of these
> commands are already sent to followers as append requests, hence they
> will actually be committed by the majority of the cluster members,
> i.e. will be treated as committed by the new leader, unless there is
> an actual network problem between servers.  However, the old leader
> will decline append replies, since it's not the leader anymore and
> commands are already completed with RAFT_CMD_LOST_LEADERSHIP status.
> 
> New leader will replicate the commit index back to the old leader.
> Old leader will re-try the previously "failed" transaction, because
> "cluster error"s are temporary.
> 
> If a transaction had some prerequisites that didn't allow double
> committing or there are other database constraints (like indexes) that
> will not allow a transaction to be committed twice, the server will
> reply to the client with a false-negative transaction result.
> 
> If there are no prerequisites or additional database constraints,
> the server will execute the same transaction again, but as a follower.
> 
> E.g. in the OVN case, this may result in creation of duplicated logical
> switches / routers / load balancers.  I.e. resources with the same
> non-indexed name.  That may cause issues later where ovn-nbctl will
> not be able to add ports to these switches / routers.
> 
> Suggested solution is to not complete the commands, but allow them to
> be completed with the commit index update from a new leader.
> It the similar behavior to what we do in order to complete commands
> in a backward scenario when the follower becomes a leader.  That
> scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated
> transaction execution when leader failover.").
> 
> Additionally, truncating the current log down to commit index in order
> to avoid possible log inconsistencies with a new leader.  Followers
> should not have uncommitted entries in the log.  New leader will send
> them back, if they were actually committed.
> 
> Code paths for leader and follower inside the raft_update_commit_index
> were very similar, so they were refactored into one, since we also
> needed an ability to complete more than one command for a follower.
> 
> Failure test added to exercise scenario of a leadership transfer.
> 
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered 
> databases.")
> Fixes: 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.")
> Reported-at: https://bugzilla.redhat.com/2046340
> Signed-off-by: Ilya Maximets <[email protected]>
> ---

Hi Ilya,

This change looks OK to me.  However, I think it's best to wait for
Han's review too before applying this.

Acked-by: Dumitru Ceara <[email protected]>

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to