On Mon, Mar 18, 2019 at 2:49 PM Ben Pfaff <[email protected]> wrote:
>
> On Fri, Mar 15, 2019 at 04:17:35PM -0700, Han Zhou wrote:
> > From: Han Zhou <[email protected]>
> >
> > When update is requested from follower, the leader sends AppendRequest
> > to all followers and wait until AppendReply received from majority, and
> > then it will update commit index - the new entry is regarded as committed
> > in raft log. However, this commit will not be notified to followers
> > (including the one initiated the request) until next heartbeat (ping
> > timeout), if no other pending requests. This results in long latency
> > for updates made through followers, especially when a batch of updates
> > are requested through the same follower.
> >
> > $ time for i in `seq 1 100`; do ovn-nbctl ls-add ls$i; done
> >
> > real 0m34.154s
> > user 0m0.083s
> > sys 0m0.250s
> >
> > This patch solves the problem by sending heartbeat as soon as the commit
> > index is updated in leader. It also avoids unnessary heartbeat by resetting
> > the ping timer whenever AppendRequest is broadcasted. With this patch
> > the performance is improved more than 50 times in same test:
> >
> > $ time for i in `seq 1 100`; do ovn-nbctl ls-add ls$i; done
> >
> > real 0m0.564s
> > user 0m0.080s
> > sys 0m0.199s
> >
> > Some sleep is added in torture test cases because of the improved
> > performance, otherwise the tests will all be skipped.
> >
> > Signed-off-by: Han Zhou <[email protected]>
> > ---
> >
> > Notes:
> >     v1->v2: adjust torture test case so that it passes without overload CPU.
>
> With this patch, on my laptop, running test 2525 seems to always skip
> it, with results similar to the following:
>
> ## ------------------------------- ##
> ## openvswitch 2.11.90 test suite. ##
> ## ------------------------------- ##
> 2525: OVSDB 3-server torture test - kill/restart leader skipped
> (ovsdb-cluster.at:198)
>
> ## ------------- ##
> ## Test results. ##
> ## ------------- ##
>
> 0 tests were successful.
> 1 test was skipped.
> make[3]: Leaving directory '/home/blp/nicira/ovs/_build'
> make[2]: Leaving directory '/home/blp/nicira/ovs/_build'
> make[1]: Leaving directory '/home/blp/nicira/ovs/_build'
>
> real 0m9.194s
> user 0m3.693s
> sys 0m1.658s
> blp@sigill:~/nicira/ovs/_build(0)$
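
For anyone following the thread, the latency effect described in the quoted
commit message can be illustrated with a toy model. This is only a sketch and
not the OVS raft code: the function name batch_time_ms, the 2 ms round trip
and the 340 ms ping interval are assumptions picked for illustration, and the
model ignores the ping-timer reset on AppendRequest broadcast. It simulates a
client issuing updates one after another through a follower, where each
update returns only once the follower learns that the entry is committed.

```python
def batch_time_ms(n_updates, rtt_ms, ping_ms, eager_heartbeat):
    """Toy model: total time for n sequential updates sent via one follower.

    All parameters are hypothetical illustration values, not OVS constants.
    """
    now = 0
    for _ in range(n_updates):
        # Leader broadcasts AppendRequest and sees a majority of AppendReply.
        commit = now + rtt_ms
        if eager_heartbeat:
            # Patched behaviour: heartbeat goes out as soon as the commit
            # index advances.
            notify = commit
        else:
            # Unpatched behaviour: the commit is only announced at the next
            # scheduled heartbeat (fixed schedule at multiples of ping_ms).
            notify = (commit // ping_ms + 1) * ping_ms
        # One-way delay until the follower learns the new commit index.
        now = notify + rtt_ms // 2
    return now


if __name__ == "__main__":
    for eager in (False, True):
        total = batch_time_ms(100, rtt_ms=2, ping_ms=340, eager_heartbeat=eager)
        print(f"eager_heartbeat={eager}: {total} ms")
```

In this model the unpatched case costs roughly one ping interval per update,
while the patched case costs only network delay per update, which matches the
shape of the before and after timings quoted above.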

Sorry to hear that :(. It was pretty stable on my laptop; maybe my laptop is
slower than yours :).

I just sent v3 to make the test case more stable. I reduced the interval of
the checking loop so that it detects phase changes and triggers the
operations as soon as possible. I ran all torture tests with -j1, -j5 and
-j10, and every case passed without being skipped. I hope it is stable on
your laptop, too. Could you try again?

Thanks,
Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
