Re: [ovs-dev] [ovs-dev, v4] ovsdb raft: Sync commit index to followers without delay.

Ben Pfaff Wed, 20 Mar 2019 11:10:16 -0700

On Wed, Mar 20, 2019 at 10:44:55AM -0700, Han Zhou wrote:
> On Wed, Mar 20, 2019 at 9:56 AM Ben Pfaff <[email protected]> wrote:
> >
> > On Wed, Mar 20, 2019 at 08:28:50AM -0700, Han Zhou wrote:
> > > On Wed, Mar 20, 2019 at 5:02 AM Ilya Maximets <[email protected]> wrote:
> > > >
> > > > On 20.03.2019 8:56, Han Zhou wrote:
> > > > > From: Han Zhou <[email protected]>
> > > > >
> > > > > When an update is requested from a follower, the leader sends
> > > > > AppendRequest to all followers and waits until AppendReply is received
> > > > > from a majority, and then it updates the commit index: the new entry is
> > > > > regarded as committed in the raft log. However, followers (including
> > > > > the one that initiated the request) are not notified of this commit
> > > > > until the next heartbeat (ping timeout) if there are no other pending
> > > > > requests. This results in long latency for updates made through
> > > > > followers, especially when a batch of updates is requested through the
> > > > > same follower.
> > > > >
> > > > > $ time for i in `seq 1 100`; do ovn-nbctl ls-add ls$i; done
> > > > >
> > > > > real    0m34.154s
> > > > > user    0m0.083s
> > > > > sys     0m0.250s
> > > > >
> > > > > This patch solves the problem by sending a heartbeat as soon as the
> > > > > commit index is updated in the leader. It also avoids unnecessary
> > > > > heartbeats by resetting the ping timer whenever an AppendRequest is
> > > > > broadcast. With this patch the performance is improved more than 50
> > > > > times in the same test:
> > > > >
> > > > > $ time for i in `seq 1 100`; do ovn-nbctl ls-add ls$i; done
> > > > >
> > > > > real    0m0.564s
> > > > > user    0m0.080s
> > > > > sys     0m0.199s
> > > > >
> > > > > Torture test cases are also updated because otherwise the tests will
> > > > > all be skipped because of the improved performance.
> > > > >
> > > > > Signed-off-by: Han Zhou <[email protected]>
> > > > > ---
> > > > >
> > > > > Notes:
> > > > >     v3->v4: Update torture tests again. Instead of sleeping, the size
> > > > >     of each client's transaction is increased to slow down the
> > > > >     execution, so that the chance of parallel execution is not reduced.
> > > > >
> > > >
> > > > Unfortunately, this patch fails all the testsuite runs on TravisCI:
> > > >
> > > >   https://travis-ci.org/ovsrobot/ovs/builds/508777615
> > > >
> > > > And some on CirrusCI too:
> > > >
> > > >   https://cirrus-ci.com/build/5201766546145280
> > > >
> > > > Best regards, Ilya Maximets.
> > > >
> > >
> > > Does the CI retry failed tests? The failed ones are some of the
> > > torture tests in ovsdb-cluster.at, which was discussed here:
> > > https://mail.openvswitch.org/pipermail/ovs-dev/2019-March/357373.html
> > >
> > > Basically, the failures are real bugs that are not caused by this
> > > patch code itself, but triggered by the test case change in this
> > > patch.
> > >
> > > The test cases are improved in this patch so that they can now find a
> > > bug that was not found before. To avoid CI failure, we can either merge
> > > V3 (where the tests were less effective), or wait until the bug is fixed.
> >
> > Both of these do retry failed tests.  You can see the details from the
> > logs at the URLs that Ilya cited.
> 
> Yes, checking the log again, I saw that the failed torture tests are
> retried once, and some of them failed again when retried, which makes
> me more confident in the effectiveness of the updated test cases. I
> may be distracted today but I will continue debugging tomorrow. I am
> pretty confident that the bug is not related to the current patch,
> because it is easy to reproduce the failures, such as tests 2528 and
> 2533, on current master with only the torture test case change applied.


By the way, I totally support this effort and I'm really looking forward
to applying the fixes when we figure out how to make the tests both
effective and pass in the normal case.
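
For anyone following the thread, the mechanism described in the quoted
commit message can be sketched roughly as below. This is only a toy model,
not the code in the patch: the class, its methods, and the 100 ms ping
timeout are all made up for illustration and do not correspond to anything
in OVS's raft implementation. It shows the two ideas from the description:
broadcast a heartbeat as soon as the commit index advances, and push back
the ping timer on every broadcast so the next scheduled heartbeat is not
redundant.

```python
# Toy model only: every name here is hypothetical, not OVS's raft API.
class ToyLeader:
    def __init__(self, n_servers, ping_timeout_ms=100):
        self.n_servers = n_servers            # cluster size, leader included
        self.ping_timeout_ms = ping_timeout_ms
        self.log_len = 0                      # index of the last log entry
        self.commit_index = 0
        self.acks = {}                        # log index -> servers holding it
        self.next_ping_ms = ping_timeout_ms
        self.sent = []                        # (time_ms, kind, commit_index)

    def broadcast(self, now_ms, kind):
        # Every broadcast carries the current commit index, so it also
        # serves as a heartbeat; reset the ping timer accordingly.
        self.sent.append((now_ms, kind, self.commit_index))
        self.next_ping_ms = now_ms + self.ping_timeout_ms

    def append(self, now_ms):
        self.log_len += 1
        self.acks[self.log_len] = 1           # the leader holds the entry
        self.broadcast(now_ms, "append_request")
        return self.log_len

    def on_append_reply(self, now_ms, index):
        self.acks[index] += 1
        majority = self.acks[index] * 2 > self.n_servers
        if index > self.commit_index and majority:
            self.commit_index = index
            # Tell followers right away instead of waiting for the ping timer.
            self.broadcast(now_ms, "heartbeat")

    def on_timer(self, now_ms):
        if now_ms >= self.next_ping_ms:
            self.broadcast(now_ms, "heartbeat")


leader = ToyLeader(n_servers=3)
idx = leader.append(now_ms=0)
leader.on_append_reply(now_ms=5, index=idx)   # majority reached at t=5
leader.on_timer(now_ms=100)                   # no ping: timer was reset at t=5
print(leader.sent)
# [(0, 'append_request', 0), (5, 'heartbeat', 1)]
```

In this model the follower that originated the request would learn about
the commit at t=5 rather than at the t=100 ping.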
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
