On Mon, Mar 04, 2019 at 01:53:38PM -0800, Han Zhou wrote:
> On Mon, Mar 4, 2019 at 1:31 PM Ben Pfaff <[email protected]> wrote:
> >
> > On Fri, Mar 01, 2019 at 10:56:37AM -0800, Han Zhou wrote:
> > > From: Han Zhou <[email protected]>
> > >
> > > In the current OVSDB Raft design, when multiple transactions are
> > > pending, whether from the same server node or from different nodes
> > > in the cluster, only the first one can succeed at a time. The
> > > following ones fail the prerequisite check on the leader node,
> > > because the first one updates the expected prerequisite eid on the
> > > leader, and the prerequisite used for proposing a commit has to be
> > > a committed eid. A node therefore cannot use the latest
> > > prerequisite expected by the leader to propose a commit until the
> > > latest transaction has been committed by the leader and the
> > > committed_index has been updated on that node.
> > >
> > > The current implementation proposes the commit as soon as the
> > > transaction is requested by the client, which results in continuous
> > > retries that cause high CPU load and waste.
> > >
> > > In particular, even if all clients use leader_only to connect only
> > > to the leader, the prereq check failure still happens a lot when a
> > > batch of transactions is pending on the leader node. The leader
> > > proposes a batch of commits using the same committed eid as the
> > > prerequisite, and it updates the expected prereq as soon as the
> > > first one is in progress. It then needs time to append to followers
> > > and wait until a majority replies before it updates the
> > > committed_index, which results in continuous useless retries of the
> > > following transactions proposed by the leader itself.
> > >
> > > This patch doesn't change the design but simply pre-checks whether
> > > the current eid is the same as the prereq before proposing the
> > > commit, to avoid wasting CPU cycles on both leader and followers.
> > > When clients use leader_only mode, this patch completely eliminates
> > > the prereq check failures.
> > >
> > > In a scale test of OVN with 1k HVs, creating and binding 10k lports,
> > > the patch resulted in a 90% CPU cost reduction on the leader and a
> > > >80% CPU cost reduction on followers. (The test was run with the
> > > leader election base time set to 10000ms, because otherwise frequent
> > > leader re-election prevented the test from completing.)
> > >
> > > This is just one of the related performance problems of the prereq
> > > checking mechanism discussed at:
> > >
> > > https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048243.html
> > > Signed-off-by: Han Zhou <[email protected]>
> >
> > I *think* that this patch is going to be unreliable.  It appears to me
> > that what it does is wait until the current eid presented by the raft
> > storage is the one that we want.  But I don't think it's guaranteed that
> > that will ever happen.  What if we lose the raft connection, reconnect,
> > and skip past that particular eid?  I think in that kind of a case we'd
> > keep the trigger around forever and never discard it.
> 
> The function ovsdb_txn_precheck_prereq() compares the db->prereq with
> the current eid from raft storage. Both values can change from
> iteration to iteration. If raft reconnected and skipped the previous
> eid, it shouldn't matter because the function checks the new prereq
> against the new *current* eid.
> 
> In fact, prereq is the last applied entry, so eventually it should
> catch up with the current eid, unless new changes are always appended
> to the log before the current node catches up. In that case, even
> without this change, the current node cannot propose any commit
> successfully because it will encounter a prereq check failure. This
> commit just avoids the waste of CPU and bandwidth in that same
> situation, when the current node cannot catch up with the latest
> appended entry.

After reading more code, and your explanation, I now understand.

I applied this to master.
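
For readers following the thread, the effect described in the commit
message can be illustrated with a toy model. This is only a minimal
sketch in Python, not OVSDB code: the Leader class, simulate(), and the
tick-based commit latency are hypothetical simplifications. Only the
idea of comparing the last applied eid with the current eid before
proposing, which is what ovsdb_txn_precheck_prereq() is described as
doing above, comes from the discussion.

```python
from dataclasses import dataclass


@dataclass
class Leader:
    """Toy single-node view of the leader (hypothetical, not OVSDB code)."""
    commit_latency: int       # ticks needed to replicate and commit an entry
    applied_eid: int = 0      # last applied entry; used as prereq by proposers
    current_eid: int = 0      # newest appended entry; the expected prereq
    in_flight_ticks: int = 0  # ticks left until the in-flight entry commits

    def propose(self, prereq: int) -> bool:
        # Prereq check: fails if another entry was appended in the meantime.
        if prereq != self.current_eid:
            return False
        self.current_eid += 1
        self.in_flight_ticks = self.commit_latency
        return True

    def tick(self) -> None:
        # Advance replication; on commit, the applied eid catches up.
        if self.in_flight_ticks > 0:
            self.in_flight_ticks -= 1
            if self.in_flight_ticks == 0:
                self.applied_eid = self.current_eid


def simulate(n_txns: int, commit_latency: int, precheck: bool):
    """Return (attempts, failures) for a batch of pending transactions."""
    leader = Leader(commit_latency=commit_latency)
    pending = n_txns
    attempts = failures = 0
    while pending > 0:
        for _ in range(pending):  # each pending txn gets one try per tick
            if precheck and leader.applied_eid != leader.current_eid:
                continue  # pre-check: skip the proposal, it would fail anyway
            attempts += 1
            if leader.propose(leader.applied_eid):
                pending -= 1
            else:
                failures += 1
        leader.tick()
    return attempts, failures


if __name__ == "__main__":
    print("without pre-check:", simulate(3, 2, precheck=False))
    print("with pre-check:   ", simulate(3, 2, precheck=True))
```

In this model every proposal made while an entry is still in flight is
rejected, so without the pre-check the rejected attempts grow with both
the batch size and the commit latency, while with the pre-check each
transaction is proposed exactly once. The real savings depend on details
the model leaves out, such as followers and network round trips.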