On 7/1/22 17:45, Ilya Maximets wrote: > On 7/1/22 17:31, Dumitru Ceara wrote: >> On 7/1/22 01:34, Ilya Maximets wrote: >>> Conversion of the database data into JSON object, serialization >>> and destruction of that object are the most heavy operations >>> during the database compaction. If these operations are moved >>> to a separate thread, the main thread can continue processing >>> database requests in the meantime. >>> >> >> Hi Ilya, >> >>> With this change, the compaction is split in 3 phases: >>> >>> 1. Initialization: >>> - Create a copy of the database. >>> - Remember current database index. >>> - Start a separate thread to convert a copy of the database >>> into serialized JSON object. >>> >>> 2. Wait: >>> - Continue normal operation until compaction thread is done. >>> - Meanwhile, compaction thread: >>> * Convert database copy to JSON. >>> * Serialize resulted JSON. >>> * Destroy original JSON object. >>> >>> 3. Finish: >>> - Destroy the database copy. >>> - Take the snapshot created by the thread. >>> - Write on disk. >>> >> >> The approach sounds good to me. >> >>> The key for this schema to be fast is the ability to create >>> a shallow copy of the database. This doesn't take too much >>> time allowing the thread to do most of work. >>> >>> Database copy is created and destroyed only by the main thread, >>> so there is no need for synchronization. >>> >>> Such solution allows to reduce the time main thread is blocked >>> by compaction by 80-90%. For example, in ovn-heater tests >>> with 120 node density-heavy scenario, where compaction normally >>> takes 5-6 seconds at the end of a test, measured compaction >>> times was all below 1 second with the change applied. Also, >>> note that these measured times are the sum of phases 1 and 3, >>> so actual poll intervals are about half a second in this case. >> >> Nice! >> >>> >>> Only implemented for raft storage for now. The implementation >>> for standalone databases can be added later by using a file >>> offset as a database index and copying newly added changes >>> from the old file to a new one during ovsdb_log_replace(). >>> >>> Reported-at: https://bugzilla.redhat.com/2069108 >>> Signed-off-by: Ilya Maximets <[email protected]> >>> --- >>> ovsdb/ovsdb-server.c | 18 +++++- >>> ovsdb/ovsdb.c | 143 +++++++++++++++++++++++++++++++++++++++---- >>> ovsdb/ovsdb.h | 24 ++++++++ >>> ovsdb/raft.c | 8 ++- >>> ovsdb/raft.h | 3 +- >>> ovsdb/row.c | 17 +++++ >>> ovsdb/row.h | 1 + >>> ovsdb/storage.c | 11 ++-- >>> ovsdb/storage.h | 3 +- >>> 9 files changed, 204 insertions(+), 24 deletions(-) >>> >>> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c >>> index 5549b4e3a..eae2f6679 100644 >>> --- a/ovsdb/ovsdb-server.c >>> +++ b/ovsdb/ovsdb-server.c >>> @@ -252,7 +252,9 @@ main_loop(struct server_config *config, >>> remove_db(config, node, >>> xasprintf("removing database %s because storage " >>> "disconnected permanently", >>> node->name)); >>> - } else if (ovsdb_storage_should_snapshot(db->db->storage)) { >>> + } else if (!ovsdb_snapshot_in_progress(db->db) >>> + && (ovsdb_storage_should_snapshot(db->db->storage) >>> || >>> + ovsdb_snapshot_ready(db->db))) { >>> log_and_free_error(ovsdb_snapshot(db->db, trim_memory)); >>> } >>> } >>> @@ -287,6 +289,7 @@ main_loop(struct server_config *config, >>> ovsdb_trigger_wait(db->db, time_msec()); >>> ovsdb_storage_wait(db->db->storage); >>> ovsdb_storage_read_wait(db->db->storage); >>> + ovsdb_snapshot_wait(db->db); >>> } >>> if (run_process) { >>> process_wait(run_process); >>> @@ -1552,11 +1555,20 @@ ovsdb_server_compact(struct unixctl_conn *conn, int >>> argc, >>> ? !strcmp(node->name, db_name) >>> : node->name[0] != '_') { >>> if (db->db) { >>> + struct ovsdb_error *error = NULL; >>> + >>> VLOG_INFO("compacting %s database by user request", >>> node->name); >>> >>> - struct ovsdb_error *error = ovsdb_snapshot(db->db, >>> - trim_memory); >>> + error = ovsdb_snapshot(db->db, trim_memory); >>> + if (!error && ovsdb_snapshot_in_progress(db->db)) { >>> + while (ovsdb_snapshot_in_progress(db->db)) { >>> + ovsdb_snapshot_wait(db->db); >>> + poll_block(); >>> + } >>> + error = ovsdb_snapshot(db->db, trim_memory); >>> + } >>> + >>> if (error) { >>> char *s = ovsdb_error_to_string(error); >>> ds_put_format(&reply, "%s\n", s); >>> diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c >>> index 91b4a01af..8cbefbe3d 100644 >>> --- a/ovsdb/ovsdb.c >>> +++ b/ovsdb/ovsdb.c >>> @@ -25,9 +25,13 @@ >>> #include "file.h" >>> #include "monitor.h" >>> #include "openvswitch/json.h" >>> +#include "openvswitch/poll-loop.h" >>> +#include "ovs-thread.h" >>> #include "ovsdb-error.h" >>> #include "ovsdb-parser.h" >>> #include "ovsdb-types.h" >>> +#include "row.h" >>> +#include "seq.h" >>> #include "simap.h" >>> #include "storage.h" >>> #include "table.h" >>> @@ -461,6 +465,21 @@ ovsdb_destroy(struct ovsdb *db) >>> if (db) { >>> struct shash_node *node; >>> >>> + /* Need to wait for compaction thread to finish the work. */ >>> + while (ovsdb_snapshot_in_progress(db)) { >>> + ovsdb_snapshot_wait(db); >>> + poll_block(); >>> + } >>> + if (ovsdb_snapshot_ready(db)) { >>> + struct ovsdb_error *error = ovsdb_snapshot(db, false); >>> + >>> + if (error) { >>> + char *s = ovsdb_error_to_string_free(error); >>> + VLOG_INFO("%s: %s", db->name, s); >>> + free(s); >> >> Nit: this is very similar to log_and_free_error(). > > That's because I copied this piece from there. :)
I didn't want to make it sound like that. Or, at least, not explicitly. :) > I wasn't sure if it worth re-naming and moving the > function to some common header. It seems simple > enough to just re-implement. Also, we can print > a bit more information like a database name if not > limited by only having a error message. > Ok, let's leave it as-is. >> >> For now the rest of the patch looks good to me. > > Thanks for review! > >> I still want to test it some more though. > > Sure. > I left a few more comments on the original patch. >> >> Thanks, >> Dumitru >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
