On Tue, Jun 6, 2023 at 3:08 PM Ilya Maximets <[email protected]> wrote:
>
> On 6/6/23 22:12, Han Zhou wrote:
> >
> >
> > On Tue, Jun 6, 2023 at 12:10 PM Ilya Maximets <[email protected]
<mailto:[email protected]>> wrote:
> >>
> >> On 3/27/23 21:43, Ilya Maximets wrote:
> >> > Change sets in OVSDB monitor are storing all the changes that
happened
> >> > between a particular transaction ID and now.  Initial change set
> >> > basically contains all the data.
> >> >
> >> > On each monitor request a new initial change set is created by
creating
> >> > an empty change set and adding all the database rows.  Then it is
> >> > converted into JSON reply and immediately untracked and destroyed.
> >> >
> >> > This is causing significant performance issues if many clients are
> >> > requesting new monitors at the same time.  For example, that is
> >> > happening after database schema conversion, because conversion
triggers
> >> > cancellation of all monitors.  After cancellation, every client sends
> >> > a new monitor request.  The server then creates a new initial change
> >> > set, sends a reply, destroys initial change set and repeats that for
> >> > each client.  On a system with 200 MB database and 500 clients,
> >> > cluster of 3 servers spends 20 minutes replying to all the clients
> >> > (200 MB x 500 = 100 GB):
> >> >
> >> >   timeval|WARN|Unreasonably long 1201525ms poll interval
> >> >
> >> > Of course, all the clients are already disconnected due to inactivity
> >> > at this point.  When they are re-connecting back, server accepts new
> >> > connections one at a time, so inactivity probes will not be triggered
> >> > anymore, but it still takes another 20 minutes to handle all the
> >> > incoming connections.
> >> >
> >> > Let's keep the initial change set around for as long as the monitor
> >> > itself exists.  This will allow us to not construct a new change set
> >> > on each new monitor request and even utilize the JSON cache in some
> >> > cases.  All that at a relatively small maintenance cost, since we'll
> >> > need to commit changes to one extra change set on every transaction.
> >> > Measured memory usage increase due to keeping around a shallow copy
> >> > of a database is about 10%.  Measured CPU usage difference during
> >> > normal operation is negligible.
> >> >
> >> > With this change it takes only 30 seconds to send out all the monitor
> >> > replies in the example above.  So, it's a 40x performance
improvement.
> >> > On a more reasonable setup with 250 nodes, the process takes up to
> >> > 8-10 seconds instead of 4-5 minutes.
> >> >
> >> > Conditional monitoring will benefit from this change as well, however
> >> > results might be less impressive due to lack of JSON cache.
> >> >
> >> > Signed-off-by: Ilya Maximets <[email protected] <mailto:
[email protected]>>
> >> > ---
> >> >  ovsdb/monitor.c | 5 ++++-
> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
> >> > index 191befcae..3cdd03b20 100644
> >> > --- a/ovsdb/monitor.c
> >> > +++ b/ovsdb/monitor.c
> >> > @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct
ovsdb_monitor *dbmon,
> >> >      ovs_assert(mcs);
> >> >      if (--mcs->n_refs == 0) {
> >> >          if (mcs == dbmon->init_change_set) {
> >> > -            dbmon->init_change_set = NULL;
> >> > +            /* The initial change set should exist as long as the
> >> > +             * monitor itself. */
> >> > +            mcs->n_refs++;
> >> > +            return;
> >> >          } else if (mcs == dbmon->new_change_set) {
> >> >              dbmon->new_change_set = NULL;
> >> >          }
> >>
> >> Hi.  I'm thinking about backporting this one patch to 3.1.  It's our
latest
> >> stable version and it's starting to get used at scale.  Without the
change
> >> it's really hard to manage simultaneous re-connections in large (250+
nodes)
> >> OVN clusters.
> >>
> >> The change itself is fairly simple, so shouldn't cause any issues in
3.1.
> >> We're running all our scale tests with it since it was accepted without
> >> any issues.
> >>
> >> Backporting to 2.17 is probably not advisable, because 2.17 doesn't
have
> >> lazy-copy mechanism for database objects.
> >>
> >> Any thoughts on this?
> >>
> >> Best regards, Ilya Maximets.
> >
> > Sorry for missing this in the original code review. I think there is a
problem in this patch. (It was one of my TODOs years ago, and this is what
I found from my notes :))
> >
> > It may cause problems because clients monitoring the same tables and
columns could have different conditions, and if a new client comes with a
column in the condition that doesn't exist in the monitor. The new column
will be added to the monitor in ovsdb_monitor_condition_bind(). However,
the initial change set generated before that would not have such columns.
It would crash when evaluating conditions for the new client against the
initial set. The solution is to always include all columns for initial
change set if we want to keep it. Maybe we should have a test case for this.
>
>
> Hmm.  So, it's a case where new monitor has a condition on a column
> that it doesn't monitor.  Right?
>
> It should not be hard to write a test for this scenario.
>
> I guess, another solution will be to destroy initial change set whenever
> a new column is added to the monitor.  It should not really happen in
> the real world applications, as I don't think OVN, for example, is having
> conditions on columns that it doesn't monitor.  So, destruction will
> not be triggered in normal cases and we will not have any side effects
> like increased memory/CPU usage if we will add all the columns.
>
Probably having conditions on columns that it doesn't monitor is not
abnormal, although I don't recall immediately any such use cases in OVN.
However, I like your approach because when it comes to scale, a large
number of clients should have condition columns in common and the
destruction should happen in very rare cases, even if many of them have
conditions on columns that they don't monitor.

> Maybe something like this (not tested):
>
> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
> index 04dcd2298..247f6f615 100644
> --- a/ovsdb/monitor.c
> +++ b/ovsdb/monitor.c
> @@ -478,6 +478,7 @@ ovsdb_monitor_add_column(struct ovsdb_monitor *dbmon,
>                           enum ovsdb_monitor_selection select,
>                           bool monitored)
>  {
> +    struct ovsdb_monitor_change_set *mcs;
>      struct ovsdb_monitor_table *mt;
>      struct ovsdb_monitor_column *c;
>
> @@ -488,6 +489,18 @@ ovsdb_monitor_add_column(struct ovsdb_monitor *dbmon,
>          return column->name;
>      }
>
> +    mcs = dbmon->init_change_set;
> +    if (mcs) {
> +        /* A new column is going to be added to the monitor.  Existing
> +         * initial change set doesn't have it, so can no longer be used.
> +         * Initial change set is never used by more than one session at
> +         * the same time, so it's safe to destroy it here. */
> +        ovs_assert(mcs->n_refs == 1);
> +        ovsdb_monitor_json_cache_destroy(dbmon, mcs);
> +        ovsdb_monitor_change_set_destroy(mcs);
> +        dbmon->init_change_set = NULL;
> +    }
> +
>      if (mt->n_columns >= mt->allocated_columns) {
>          mt->columns = x2nrealloc(mt->columns, &mt->allocated_columns,
>                                   sizeof *mt->columns);
> ---
>
> What do you think?
>

At a quick glance, it looks good to me.

Thanks,
Han

> >
> > With this addressed, I have no objection for backporting to 3.1.
> >
> > Thanks,
> > Han
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to