On 3/27/23 21:43, Ilya Maximets wrote:
> Change sets in the OVSDB monitor store all the changes that happened
> between a particular transaction ID and now.  The initial change set
> basically contains all the data.
> 
> On each monitor request, a new initial change set is created by
> creating an empty change set and adding all the database rows.  Then
> it is converted into a JSON reply and immediately untracked and
> destroyed.
> 
> This causes significant performance issues if many clients request
> new monitors at the same time.  For example, that happens after a
> database schema conversion, because the conversion triggers
> cancellation of all monitors.  After cancellation, every client sends
> a new monitor request.  The server then creates a new initial change
> set, sends a reply, destroys the initial change set, and repeats that
> for each client.  On a system with a 200 MB database and 500 clients,
> a cluster of 3 servers spends 20 minutes replying to all the clients
> (200 MB x 500 = 100 GB):
> 
>   timeval|WARN|Unreasonably long 1201525ms poll interval
> 
> Of course, all the clients are already disconnected due to inactivity
> at this point.  When they re-connect, the server accepts new
> connections one at a time, so inactivity probes will not be triggered
> anymore, but it still takes another 20 minutes to handle all the
> incoming connections.
> 
> Let's keep the initial change set around for as long as the monitor
> itself exists.  This allows us to avoid constructing a new change set
> on each new monitor request and even to utilize the JSON cache in
> some cases.  All that comes at a relatively small maintenance cost,
> since we'll need to commit changes to one extra change set on every
> transaction.  The measured memory usage increase due to keeping
> around a shallow copy of the database is about 10%.  The measured CPU
> usage difference during normal operation is negligible.
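
If I'm reading the idea right, the new-monitor path conceptually
becomes something like the sketch below.  This is only my shorthand to
check my understanding; get_initial_change_set() and
build_initial_change_set() are made-up names, not the actual functions:

    /* Hypothetical sketch: build the initial change set once and keep
     * reusing it for every subsequent monitor request. */
    static struct ovsdb_monitor_change_set *
    get_initial_change_set(struct ovsdb_monitor *dbmon)
    {
        if (!dbmon->init_change_set) {
            /* First monitor request: add all database rows once. */
            dbmon->init_change_set = build_initial_change_set(dbmon);
        }
        /* Later requests reuse it, so the reply can also come from the
         * JSON cache instead of being regenerated for every client. */
        return dbmon->init_change_set;
    }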

I'm sure I'm missing something, but I'm seeing this when triggering a
schema conversion in an OVN sandbox (clustered NB):

parse_txn()
-> update_schema()
   -> ovsdb_replace()
      -> ovsdb_monitor_prereplace_db()
         // Here we iterate and delete each jsonrpc_monitor
         // associated with the db.
         // Deleting the last one will trigger the monitor to be
         // deleted too:
         -> ovsdb_jsonrpc_monitor_destroy()
            -> ovsdb_monitor_remove_jsonrpc_monitor()
               -> ovsdb_monitor_destroy()

At this point the initial change set gets destroyed too.  And, AFAICT,
this happens before the clients have reconnected.  This confuses me, I
guess; how come your change works? :)
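
In other words, as far as I can tell the removal path boils down to
something like the snippet below (paraphrased from memory, field names
approximate, not the literal code):

    /* Paraphrase of ovsdb_monitor_remove_jsonrpc_monitor(): dropping
     * the last jsonrpc monitor destroys the whole ovsdb_monitor,
     * initial change set included. */
    ovs_list_remove(&jm_node->node);
    if (ovs_list_is_empty(&dbmon->jsonrpc_monitors)) {
        ovsdb_monitor_destroy(dbmon);   /* Frees all change sets. */
    }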

> 
> With this change it takes only 30 seconds to send out all the monitor
> replies in the example above.  So, it's a 40x performance improvement.
> On a more reasonable setup with 250 nodes, the process takes up to
> 8-10 seconds instead of 4-5 minutes.
> 
> Conditional monitoring will benefit from this change as well;
> however, the results might be less impressive due to the lack of a
> JSON cache.
> 
> Signed-off-by: Ilya Maximets <[email protected]>
> ---
>  ovsdb/monitor.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
> index 191befcae..3cdd03b20 100644
> --- a/ovsdb/monitor.c
> +++ b/ovsdb/monitor.c
> @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct ovsdb_monitor *dbmon,
>      ovs_assert(mcs);
>      if (--mcs->n_refs == 0) {
>          if (mcs == dbmon->init_change_set) {
> -            dbmon->init_change_set = NULL;
> +            /* The initial change set should exist as long as the
> +             * monitor itself. */
> +            mcs->n_refs++;
> +            return;
>          } else if (mcs == dbmon->new_change_set) {
>              dbmon->new_change_set = NULL;
>          }
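
So, IIUC, once the initial change set has been handed out it
permanently holds one extra reference, ovsdb_monitor_untrack_change_set()
never frees it, and only destroying the monitor itself does.  A tiny
standalone illustration of that refcounting pattern, with made-up
names, just to make sure I read the hunk right:

    #include <stdbool.h>
    #include <stdio.h>

    struct cs {
        int n_refs;
        bool is_initial;
    };

    /* Mirrors the hunk above: returns true if the caller should free
     * 'cs', false if it is still referenced or stays pinned. */
    static bool
    cs_unref(struct cs *cs)
    {
        if (--cs->n_refs == 0) {
            if (cs->is_initial) {
                /* Re-add the dropped reference so the initial change
                 * set lives as long as the monitor. */
                cs->n_refs++;
                return false;
            }
            return true;
        }
        return false;
    }

    int
    main(void)
    {
        struct cs init = { .n_refs = 1, .is_initial = true };
        bool should_free = cs_unref(&init);

        /* Prints "free? 0, n_refs=1": the initial set stays pinned. */
        printf("free? %d, n_refs=%d\n", should_free, init.n_refs);
        return 0;
    }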
