On Mon, Sep 12, 2022 at 9:57 AM Numan Siddique <[email protected]> wrote:
>
> On Fri, Sep 9, 2022 at 2:25 AM Han Zhou <[email protected]> wrote:
> >
> > On Thu, Sep 8, 2022 at 8:46 AM Ilya Maximets <[email protected]> wrote:
> > >
> > > northd doesn't process changes incrementally, so it makes sense to
> > > accumulate more database updates and process them in bulk, so we can
> > > cover everything in a single recompute.
> > >
> > > ovsdb-server has a mechanism to start accumulating changes if the
> > > client doesn't receive them fast enough, but it relies on the receive
> > > buffer size, which is a few hundreds of KB on a typical system.
> > > Unfortunately, that is enough to queue up several hundreds of small
> > > updates, and it takes northd a lot of time to process them if poll
> > > intervals are large, receiving at most 50 messages on each iteration
> > > (half of which are updates for a _Server database).
> > >
> > > Calling ovsdb_idl_run() as long as something changes.  This allows to
> > > quickly process large bursts of database updates.  For example, it
> > > takes only 30-40 seconds for 'ovn-nbctl --wait=hv sync' to finish on
> > > a 500-node cluster after the startup phase of the density-heavy
> > > ovn-heater test, instead of 6-8 minutes without this change.
> > >
> > > 500 ms seems like a reasonable hard limit to avoid spinning for too
> > > long if the database is changed constantly at a fast pace.
> > >
> > > Very long polling is also logged at INFO level to notify users.
> > > Not using WARN or higher because it may happen under normal
conditions,
> > > e.g. on the initial connection to a large database or another type
> > > of a single large update.  Other notable polling attempts are logged
> > > at debug level.
> > >
> > > Signed-off-by: Ilya Maximets <[email protected]>
> > > ---
> > >  northd/ovn-northd.c | 38 ++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 34 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > index bd35802ed..96f17f15f 100644
> > > --- a/northd/ovn-northd.c
> > > +++ b/northd/ovn-northd.c
> > > @@ -685,6 +685,36 @@ get_probe_interval(const char *db, const struct
> > nbrec_nb_global *nb)
> > >      return interval;
> > >  }
> > >
> > > +static struct ovsdb_idl_txn *
> > > +run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name)
> > > +{
> > > +    unsigned long long duration, start = time_msec();
> > > +    unsigned int seqno = UINT_MAX;
> > > +    struct ovsdb_idl_txn *txn;
> > > +    int n = 0;
> > > +
> > > +    /* Accumulate database changes as long as there are some,
> > > +     * but no longer than half a second. */
> > > +    while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
> > > +           && time_msec() - start < 500) {
> > > +        seqno = ovsdb_idl_get_seqno(idl_loop->idl);
> > > +        ovsdb_idl_run(idl_loop->idl);
> > > +        n++;
> > > +    }
> > > +
> > > +    txn = ovsdb_idl_loop_run(idl_loop);
> > > +
> > > +    duration = time_msec() - start;
> > > +    /* ovsdb_idl_run() is called at least 2 times.  Once directly and
> > > +     * once in the ovsdb_idl_loop_run().  n > 2 means that we
received
> > > +     * data on at least 2 subsequent calls. */
> > > +    if (n > 2 || duration > 100) {
> > > +        VLOG(duration > 500 ? VLL_INFO : VLL_DBG,
> > > +             "%s IDL run: %d iterations in %lld ms", name, n + 1,
> > duration);
> > > +    }
> > > +    return txn;
> > > +}
> > > +
> > >  int
> > >  main(int argc, char *argv[])
> > >  {
> > > @@ -821,8 +851,8 @@ main(int argc, char *argv[])
> > >                  ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
> > >              }
> > >
> > > -            struct ovsdb_idl_txn *ovnnb_txn =
> > > -                        ovsdb_idl_loop_run(&ovnnb_idl_loop);
> > > +            struct ovsdb_idl_txn *ovnnb_txn =
> > run_idl_loop(&ovnnb_idl_loop,
> > > +
> > "OVN_Northbound");
> > >              unsigned int new_ovnnb_cond_seqno =
> > >
> >  ovsdb_idl_get_condition_seqno(ovnnb_idl_loop.idl);
> > >              if (new_ovnnb_cond_seqno != ovnnb_cond_seqno) {
> > > @@ -833,8 +863,8 @@ main(int argc, char *argv[])
> > >                  ovnnb_cond_seqno = new_ovnnb_cond_seqno;
> > >              }
> > >
> > > -            struct ovsdb_idl_txn *ovnsb_txn =
> > > -                        ovsdb_idl_loop_run(&ovnsb_idl_loop);
> > > +            struct ovsdb_idl_txn *ovnsb_txn =
> > run_idl_loop(&ovnsb_idl_loop,
> > > +
> > "OVN_Southbound");
> > >              unsigned int new_ovnsb_cond_seqno =
> > >
> >  ovsdb_idl_get_condition_seqno(ovnsb_idl_loop.idl);
> > >              if (new_ovnsb_cond_seqno != ovnsb_cond_seqno) {
> > > --
> > > 2.34.3
> > >
> >
> > Thanks Ilya for the great improvement! Applied to main.
> > It is not a bug fix but it seems a good candidate for backporting to
22.09
> > given that the change is small and we haven't released yet. @Mark
Michelson
> > <[email protected]> @Numan Siddique <[email protected]> what do you
think?
>
> +1 from me.
>
> Thanks
> Numan
>
Thanks Numan. I backported it to 22.09.

Han

> >
> > Han
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to