On 5/25/22 02:11, Han Zhou wrote:
> 
> 
> On Tue, May 24, 2022 at 1:18 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
>>
>> While becoming a follower, the leader aborts all the current
>> 'in-flight' commands, so the higher layers can re-try corresponding
>> transactions when the new leader is elected.  However, most of these
>> commands are already sent to followers as append requests, hence they
>> will actually be committed by the majority of the cluster members,
>> i.e. will be treated as committed by the new leader, unless there is
>> an actual network problem between servers.  However, the old leader
>> will decline append replies, since it's not the leader anymore and
>> commands are already completed with RAFT_CMD_LOST_LEADERSHIP status.
>>
>> New leader will replicate the commit index back to the old leader.
>> Old leader will re-try the previously "failed" transaction, because
>> "cluster error"s are temporary.
>>
>> If a transaction had some prerequisites that didn't allow double
>> committing or there are other database constraints (like indexes) that
>> will not allow a transaction to be committed twice, the server will
>> reply to the client with a false-negative transaction result.
>>
>> If there are no prerequisites or additional database constraints,
>> the server will execute the same transaction again, but as a follower.
>>
>> E.g. in the OVN case, this may result in creation of duplicated logical
>> switches / routers / load balancers.  I.e. resources with the same
>> non-indexed name.  That may cause issues later where ovn-nbctl will
>> not be able to add ports to these switches / routers.
>>
>> Suggested solution is to not complete (abort) the commands, but allow
>> them to be completed with the commit index update from a new leader.
>> It the similar behavior to what we do in order to complete commands
>> in a backward scenario when the follower becomes a leader.  That
>> scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated
>> transaction execution when leader failover.").
>>
>> Code paths for leader and follower inside the raft_update_commit_index
>> were very similar, so they were refactored into one, since we also
>> needed an ability to complete more than one command for a follower.
>>
>> Failure test added to exercise scenario of a leadership transfer.
>>
>> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered 
>> databases.")
>> Fixes: 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.")
>> Reported-at: https://bugzilla.redhat.com/2046340 
>> <https://bugzilla.redhat.com/2046340>
>> Signed-off-by: Ilya Maximets <[email protected] <mailto:[email protected]>>
>> ---
>>
>> Version 2:
>>   - Updated the commit mesage clarifying the word 'completed'.  [Han]
>>   - Dropped the log reset part since it's not really necessary
>>     and a bit misleading.  [Han]
>>
> Thanks Ilya!
> 
> Acked-by: Han Zhou <[email protected] <mailto:[email protected]>>

Thanks for review!  Applied and backported down to 2.13.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to