On Fri, May 14, 2021 at 7:05 AM Ilya Maximets <[email protected]> wrote:
>
> On 5/10/21 3:16 PM, Dumitru Ceara wrote:
> > On 5/6/21 2:47 PM, Ilya Maximets wrote:
> >
> > [...]
> >
> > Oops, I meant to send this along with the acked-by:
> >
> >>
> >> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> >> index d2ff643b2..658a016b2 100644
> >> --- a/ovsdb/raft.c
> >> +++ b/ovsdb/raft.c
> >> @@ -4160,9 +4160,24 @@ raft_may_snapshot(const struct raft *raft)
> >>              && !raft->leaving
> >>              && !raft->left
> >>              && !raft->failed
> >> +            && raft->role != RAFT_LEADER
> >>              && raft->last_applied >= raft->log_start);
> >>  }
> >>
> >> +/* Prepares for soon snapshotting.  */
> >
> > Nit: one space too many.  And I'd probably rephrase this to "Prepare for
> > snapshotting soon.", but I'm not a native speaker, I'm not sure if
> > that's better.
> >
>
> Thanks!  I'm not sure if this variant is better and I wanted to keep
> comments uniform with other functions.  So, I removed the extra space,
> but kept the text as is.
>
> Applied to master and backported down to 2.13 as this change is important
> for large scale setups.
>
> Best regards, Ilya Maximets.

Hi Ilya, thanks for fixing the problem, and sorry for my late response. This
approach looks reasonable and the code looks good, but I still have some
concerns that I'm not sure you have already thought about or tested.

This patch increases the chance that the leader is responsive, so the
cluster can keep processing new transactions. (I suppose the leadership
transfer completes almost immediately, but I didn't test.)

However, it also increases the frequency of leader elections: every
snapshot on the leader node would trigger a leader election. We set the
leader election timer large precisely so that, in large-scale deployments,
leader elections are not triggered unnecessarily, because they introduce
churn in the cluster. For example, ovn-northd (and probably many CMS client
workers that write to the database and use the leader-only option) would
need to reconnect to the new leader. Also, in-flight transactions
(including those initiated through followers) would all get disturbed and
re-propagated through the new leader (I'm not sure whether there can be
failures in untested corner cases). If the snapshot time is within the
response-time requirement of a production deployment, I'd rather let the
leader keep its role and finish the snapshot instead of introducing this
churn (and, of course, set the election timer long enough for the snapshot
to complete). If the snapshot time exceeds the requirement, transferring
the leader role sounds reasonable, but I'm not really sure whether frequent
leader elections in a large-scale deployment would increase the chance of
hitting corner-case problems, given that we don't have enough test cases
for raft leadership transfer and its impact on in-flight transactions.

There may be even more complex scenarios to consider if we do this. What if
other nodes are snapshotting at the same time, or at very close times?
Since all the nodes' logs are synced, they are likely to snapshot close to
each other. Suppose the leader tries to snapshot first: then one round of
snapshots may trigger a leadership transfer 3 times (in a 3-node cluster).
And if a follower node is already snapshotting in a 3-node cluster,
transferring leadership would not help the response time, because for the
next couple of seconds only the new leader is not snapshotting, while any
transaction still needs at least one of the followers to replicate the
change and respond to the leader. In addition, is there any chance the
install_snapshot procedure kicks in, and what would be the impact?

These are just my random thoughts, and maybe some of them aren't real
problems at all, but I wanted to put them here to see whether they raise
any cautions or have already been ruled out.

Thanks,
Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
