On 3/27/23 21:43, Ilya Maximets wrote:
> Change sets in the OVSDB monitor store all the changes that happened
> between a particular transaction ID and now. The initial change set
> basically contains all the data.
>
> On each monitor request, a new initial change set is built by creating
> an empty change set and adding all the database rows to it. It is then
> converted into a JSON reply and immediately untracked and destroyed.
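
Restating that flow as a toy example, just to make sure I read it
correctly (none of these names come from ovsdb/monitor.c; it is only a
sketch of the pattern):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_row { int id; const char *data; };

struct toy_change_set {
    struct toy_row *rows;
    size_t n_rows;
};

/* "Initial change set": an empty change set plus a copy of every row. */
static struct toy_change_set *
toy_initial_change_set(const struct toy_row *db, size_t n)
{
    struct toy_change_set *cs = malloc(sizeof *cs);

    cs->rows = malloc(n * sizeof *cs->rows);
    memcpy(cs->rows, db, n * sizeof *cs->rows);
    cs->n_rows = n;
    return cs;
}

/* Serialize the reply, then immediately untrack and destroy. */
static void
toy_reply_and_destroy(struct toy_change_set *cs)
{
    for (size_t i = 0; i < cs->n_rows; i++) {
        printf("{\"id\": %d, \"data\": \"%s\"}\n",
               cs->rows[i].id, cs->rows[i].data);
    }
    free(cs->rows);
    free(cs);
}

int
main(void)
{
    const struct toy_row db[] = { { 1, "foo" }, { 2, "bar" } };

    /* Every monitor request repeats the full copy and serialization. */
    for (int client = 0; client < 3; client++) {
        toy_reply_and_destroy(toy_initial_change_set(db, 2));
    }
    return 0;
}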
>
> This causes significant performance issues if many clients request
> new monitors at the same time. For example, that happens after a
> database schema conversion, because conversion triggers cancellation
> of all monitors. After cancellation, every client sends a new monitor
> request. The server then creates a new initial change set, sends a
> reply, destroys the initial change set, and repeats that for each
> client. On a system with a 200 MB database and 500 clients, a cluster
> of 3 servers spends 20 minutes replying to all the clients
> (200 MB x 500 = 100 GB):
>
> timeval|WARN|Unreasonably long 1201525ms poll interval
>
> Of course, all the clients are already disconnected due to inactivity
> at this point. When they re-connect, the server accepts new
> connections one at a time, so inactivity probes will not be triggered
> anymore, but it still takes another 20 minutes to handle all the
> incoming connections.
>
> Let's keep the initial change set around for as long as the monitor
> itself exists. This will allow us to avoid constructing a new change
> set on each new monitor request and even to utilize the JSON cache in
> some cases. All that comes at a relatively small maintenance cost,
> since we'll need to commit changes to one extra change set on every
> transaction. The measured memory usage increase due to keeping around
> a shallow copy of the database is about 10%. The measured CPU usage
> difference during normal operation is negligible.
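
If I read the idea correctly, it boils down to the monitor keeping one
reference on the initial change set, so untracking it after a reply no
longer frees it. A toy sketch of that reading (made-up names again, not
the actual code):

#include <stdio.h>
#include <stdlib.h>

struct toy_cs {
    int n_refs;
};

struct toy_monitor {
    struct toy_cs *init_cs;   /* lives as long as the monitor itself */
};

static void
toy_untrack(struct toy_monitor *m, struct toy_cs *cs)
{
    if (--cs->n_refs == 0) {
        if (cs == m->init_cs) {
            /* Keep the initial change set for the next monitor
             * request instead of destroying it. */
            cs->n_refs++;
            return;
        }
        free(cs);
    }
}

int
main(void)
{
    struct toy_monitor m;

    m.init_cs = malloc(sizeof *m.init_cs);
    m.init_cs->n_refs = 1;        /* referenced by one pending reply */

    toy_untrack(&m, m.init_cs);   /* reply sent, change set untracked */
    printf("init change set still alive, n_refs=%d\n", m.init_cs->n_refs);

    free(m.init_cs);              /* finally freed with the monitor */
    return 0;
}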
I'm sure I'm missing something but I'm seeing this when triggering a
schema conversion in an OVN sandbox (clustered NB):
parse_txn()
-> update_schema()
   -> ovsdb_replace()
      -> ovsdb_monitor_prereplace_db()
         // Here we iterate and delete each jsonrpc_monitor
         // associated with the db.
         // Deleting the last one will trigger the monitor to be
         // deleted too:
         -> ovsdb_jsonrpc_monitor_destroy()
            -> ovsdb_monitor_remove_jsonrpc_monitor()
               -> ovsdb_monitor_destroy()
At this point the initial change set gets destroyed too. And, AFAICT,
this happens before the clients have reconnected. This confuses me, I
guess; how come your change works? :)
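
In other words, my mental model of the ownership is roughly the toy
below (paraphrasing, these are not the actual structure definitions),
which is why I'd expect the kept initial change set to be gone before
any client reconnects:

#include <stdlib.h>

struct toy_cs {
    int n_refs;
};

/* The monitor owns its jsonrpc monitors and its initial change set. */
struct toy_monitor {
    int n_jsonrpc_monitors;
    struct toy_cs *init_cs;
};

static void
toy_remove_jsonrpc_monitor(struct toy_monitor *m)
{
    if (--m->n_jsonrpc_monitors == 0) {
        /* The last jsonrpc monitor is gone, so the monitor itself is
         * destroyed and the kept initial change set goes with it. */
        free(m->init_cs);
        free(m);
    }
}

int
main(void)
{
    struct toy_monitor *m = malloc(sizeof *m);

    m->n_jsonrpc_monitors = 2;
    m->init_cs = malloc(sizeof *m->init_cs);
    m->init_cs->n_refs = 1;

    toy_remove_jsonrpc_monitor(m);   /* one client's monitor is canceled */
    toy_remove_jsonrpc_monitor(m);   /* the last one: everything is freed */
    return 0;
}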
>
> With this change, it takes only 30 seconds to send out all the monitor
> replies in the example above, i.e. a 40x performance improvement. On a
> more reasonable setup with 250 nodes, the process takes up to 8-10
> seconds instead of 4-5 minutes.
>
> Conditional monitoring will benefit from this change as well; however,
> the results might be less impressive due to the lack of a JSON cache.
>
> Signed-off-by: Ilya Maximets <[email protected]>
> ---
> ovsdb/monitor.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/ovsdb/monitor.c b/ovsdb/monitor.c
> index 191befcae..3cdd03b20 100644
> --- a/ovsdb/monitor.c
> +++ b/ovsdb/monitor.c
> @@ -609,7 +609,10 @@ ovsdb_monitor_untrack_change_set(struct ovsdb_monitor *dbmon,
>      ovs_assert(mcs);
>      if (--mcs->n_refs == 0) {
>          if (mcs == dbmon->init_change_set) {
> -            dbmon->init_change_set = NULL;
> +            /* The initial change set should exist as long as the
> +             * monitor itself. */
> +            mcs->n_refs++;
> +            return;
>          } else if (mcs == dbmon->new_change_set) {
>              dbmon->new_change_set = NULL;
>          }