On 5/23/22 01:58, Han Zhou wrote:
> 
> 
> Thanks Ilya for addressing this problem! Please see some comments below:
> On Thu, May 5, 2022 at 5:24 AM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
>>
>> While becoming a follower, the leader completes all the current
> nit: The word "completes" here may be a little confusing. We are talking 
> about complete with error instead of SUCCESS, so better to say "cancels" or 
> "aborts".
> 
>> 'in-flight' commands, so the higher layers can re-try corresponding
>> transactions when the new leader is elected.  However, most of these
>> commands are already sent to followers as append requests, hence they
>> will actually be committed by the majority of the cluster members,
>> i.e. will be treated as committed by the new leader, unless there is
>> an actual network problem between servers.  However, the old leader
>> will decline append replies, since it's not the leader anymore and
>> commands are already completed with RAFT_CMD_LOST_LEADERSHIP status.
>>
>> New leader will replicate the commit index back to the old leader.
>> Old leader will re-try the previously "failed" transaction, because
>> "cluster error"s are temporary.
>>
>> If a transaction had some prerequisites that didn't allow double
>> committing or there are other database constraints (like indexes) that
>> will not allow a transaction to be committed twice, the server will
>> reply to the client with a false-negative transaction result.
>>
>> If there are no prerequisites or additional database constraints,
>> the server will execute the same transaction again, but as a follower.
>>
>> E.g. in the OVN case, this may result in creation of duplicated logical
>> switches / routers / load balancers.  I.e. resources with the same
>> non-indexed name.  That may cause issues later where ovn-nbctl will
>> not be able to add ports to these switches / routers.
>>
>> Suggested solution is to not complete the commands, but allow them to
>> be completed with the commit index update from a new leader.
>> It the similar behavior to what we do in order to complete commands
>> in a backward scenario when the follower becomes a leader.  That
>> scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated
>> transaction execution when leader failover.").
>>
>> Additionally, truncating the current log down to commit index in order
>> to avoid possible log inconsistencies with a new leader.  Followers
>> should not have uncommitted entries in the log.  New leader will send
>> them back, if they were actually committed.
> 
> This is something I am a little concerned about. It may still be correct but 
> I feel it is unnecessary and may be less efficient.
> - If the old leader has sent append requests before it stepped down, the new 
> leader and the old leader would both have the entry, so it will be committed 
> when the new leader finishes the first round of heart beat and the command 
> would be completed. If we truncate the entry from the old leader, it would 
> take another round trip from the new leader to the old leader to append the 
> entry (when the new leader notices that the old leader's log is behind from 
> an append-reply from the old leader).
> - If the old leader hasn't sent append requests to any servers for the 
> command before stepping down, but has the entry in it's raft log, then there 
> will be a conflict detected at the old leader when receiving append-request 
> from the new leader, and in raft_handle_append_entries() it should resolve 
> the conflict by discarding its own entry - so, it is not harmful to have 
> uncommitted entries in the follower (old leader)'s log.
> 
> Did you observe any problem during testing if not truncating the log?

No.  I didn't.  In fact, I didn't have this part in the first
version of the fix, but added later because it seemed logical
to me to not have uncommitted entries in the log.  More safe,
I guess.

But I think you're right.  And actually, thinking more about
this, it's the same condition as if follower received some
entries from the leader, which are not yet fully committed, i.e.:

1. Leader sends uncommitted entries to a follower.
2. Follower adds them to the log.
3. Leadership change happens before entries are written by other
   followers, so not yet committed.
4. New leader without these entries commits something else.

This should be a fully legit scenario where uncommitted entries
are in the follower's log.  And, yes, this follower will discard
them once the update received from the new leader.

I'll drop this part of the patch and send v2.  Thanks!

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to