On Mon, Mar 04, 2019 at 01:53:38PM -0800, Han Zhou wrote:
> On Mon, Mar 4, 2019 at 1:31 PM Ben Pfaff <[email protected]> wrote:
> >
> > On Fri, Mar 01, 2019 at 10:56:37AM -0800, Han Zhou wrote:
> > > From: Han Zhou <[email protected]>
> > >
> > > In the current OVSDB Raft design, when multiple transactions are
> > > pending, either from the same server node or from different nodes in
> > > the cluster, only the first one can succeed at a time. The following
> > > ones fail the prerequisite check on the leader node, because the
> > > first one updates the expected prerequisite eid on the leader, and
> > > the prerequisite used for proposing a commit has to be a committed
> > > eid. So a node cannot use the latest prerequisite expected by the
> > > leader to propose a commit until the latest transaction has been
> > > committed by the leader and the committed_index on the node has been
> > > updated.
> > >
> > > The current implementation proposes the commit as soon as the client
> > > requests the transaction, which results in continuous retries that
> > > cause high CPU load and waste.
> > >
> > > In particular, even if all clients use leader_only to connect only to
> > > the leader, prereq check failures still happen a lot when a batch of
> > > transactions is pending on the leader node. The leader proposes the
> > > whole batch of commits using the same committed eid as the
> > > prerequisite, and it updates the expected prereq as soon as the first
> > > one is in progress. It then needs time to append the entry to the
> > > followers and wait until a majority replies before it updates the
> > > committed_index, so the following transactions proposed by the leader
> > > itself keep retrying uselessly.
> > >
> > > This patch doesn't change the design; it simply pre-checks whether
> > > the current eid is the same as the prereq before proposing the commit,
> > > to avoid wasting CPU cycles on both the leader and the followers. When
> > > clients use leader_only mode, this patch completely eliminates the
> > > prereq check failures.
> > >
> > > In a scale test of OVN with 1k HVs, creating and binding 10k lports,
> > > the patch reduced CPU cost by 90% on the leader and by more than 80%
> > > on the followers. (The test was run with the leader election base
> > > time set to 10000 ms, because otherwise it could not complete due to
> > > frequent leader re-elections.)
> > >
> > > This is just one of the performance problems related to the prereq
> > > checking mechanism discussed at:
> > >
> > > https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048243.html
> > > Signed-off-by: Han Zhou <[email protected]>
> >
> > I *think* that this patch is going to be unreliable. It appears to me
> > that what it does is wait until the current eid presented by the raft
> > storage is the one that we want. But I don't think it's guaranteed that
> > that will ever happen. What if we lose the raft connection, reconnect,
> > and skip past that particular eid? I think in that kind of a case we'd
> > keep the trigger around forever and never discard it.
>
> The function ovsdb_txn_precheck_prereq() compares db->prereq with
> the current eid from raft storage. Both values can change from
> iteration to iteration. If raft reconnects and skips the previous
> eid, it shouldn't matter, because the function checks the new prereq
> against the new *current* eid.
>
> In fact, prereq is the last applied entry, so eventually it should
> catch up with the current eid, unless new changes are always
> appended to the log before the current node catches up. In that case,
> even without this change, the current node cannot propose any commit
> successfully, because it will hit a prereq check failure. This
> commit just avoids wasting CPU and bandwidth in that same
> situation, i.e. when the current node cannot catch up with the latest
> appended entry.
After reading more code, and your explanation, I now understand. I
applied this to master.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
