On 7/1/22 17:31, Dumitru Ceara wrote: > On 7/1/22 01:34, Ilya Maximets wrote: >> Conversion of the database data into JSON object, serialization >> and destruction of that object are the most heavy operations >> during the database compaction. If these operations are moved >> to a separate thread, the main thread can continue processing >> database requests in the meantime. >> > > Hi Ilya, > >> With this change, the compaction is split in 3 phases: >> >> 1. Initialization: >> - Create a copy of the database. >> - Remember current database index. >> - Start a separate thread to convert a copy of the database >> into serialized JSON object. >> >> 2. Wait: >> - Continue normal operation until compaction thread is done. >> - Meanwhile, compaction thread: >> * Convert database copy to JSON. >> * Serialize resulted JSON. >> * Destroy original JSON object. >> >> 3. Finish: >> - Destroy the database copy. >> - Take the snapshot created by the thread. >> - Write on disk. >> > > The approach sounds good to me. > >> The key for this schema to be fast is the ability to create >> a shallow copy of the database. This doesn't take too much >> time allowing the thread to do most of work. >> >> Database copy is created and destroyed only by the main thread, >> so there is no need for synchronization. >> >> Such solution allows to reduce the time main thread is blocked >> by compaction by 80-90%. For example, in ovn-heater tests >> with 120 node density-heavy scenario, where compaction normally >> takes 5-6 seconds at the end of a test, measured compaction >> times was all below 1 second with the change applied. Also, >> note that these measured times are the sum of phases 1 and 3, >> so actual poll intervals are about half a second in this case. > > Nice! > >> >> Only implemented for raft storage for now. The implementation >> for standalone databases can be added later by using a file >> offset as a database index and copying newly added changes >> from the old file to a new one during ovsdb_log_replace(). >> >> Reported-at: https://bugzilla.redhat.com/2069108 >> Signed-off-by: Ilya Maximets <[email protected]> >> --- >> ovsdb/ovsdb-server.c | 18 +++++- >> ovsdb/ovsdb.c | 143 +++++++++++++++++++++++++++++++++++++++---- >> ovsdb/ovsdb.h | 24 ++++++++ >> ovsdb/raft.c | 8 ++- >> ovsdb/raft.h | 3 +- >> ovsdb/row.c | 17 +++++ >> ovsdb/row.h | 1 + >> ovsdb/storage.c | 11 ++-- >> ovsdb/storage.h | 3 +- >> 9 files changed, 204 insertions(+), 24 deletions(-) >> >> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c >> index 5549b4e3a..eae2f6679 100644 >> --- a/ovsdb/ovsdb-server.c >> +++ b/ovsdb/ovsdb-server.c >> @@ -252,7 +252,9 @@ main_loop(struct server_config *config, >> remove_db(config, node, >> xasprintf("removing database %s because storage " >> "disconnected permanently", >> node->name)); >> - } else if (ovsdb_storage_should_snapshot(db->db->storage)) { >> + } else if (!ovsdb_snapshot_in_progress(db->db) >> + && (ovsdb_storage_should_snapshot(db->db->storage) || >> + ovsdb_snapshot_ready(db->db))) { >> log_and_free_error(ovsdb_snapshot(db->db, trim_memory)); >> } >> } >> @@ -287,6 +289,7 @@ main_loop(struct server_config *config, >> ovsdb_trigger_wait(db->db, time_msec()); >> ovsdb_storage_wait(db->db->storage); >> ovsdb_storage_read_wait(db->db->storage); >> + ovsdb_snapshot_wait(db->db); >> } >> if (run_process) { >> process_wait(run_process); >> @@ -1552,11 +1555,20 @@ ovsdb_server_compact(struct unixctl_conn *conn, int >> argc, >> ? !strcmp(node->name, db_name) >> : node->name[0] != '_') { >> if (db->db) { >> + struct ovsdb_error *error = NULL; >> + >> VLOG_INFO("compacting %s database by user request", >> node->name); >> >> - struct ovsdb_error *error = ovsdb_snapshot(db->db, >> - trim_memory); >> + error = ovsdb_snapshot(db->db, trim_memory); >> + if (!error && ovsdb_snapshot_in_progress(db->db)) { >> + while (ovsdb_snapshot_in_progress(db->db)) { >> + ovsdb_snapshot_wait(db->db); >> + poll_block(); >> + } >> + error = ovsdb_snapshot(db->db, trim_memory); >> + } >> + >> if (error) { >> char *s = ovsdb_error_to_string(error); >> ds_put_format(&reply, "%s\n", s); >> diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c >> index 91b4a01af..8cbefbe3d 100644 >> --- a/ovsdb/ovsdb.c >> +++ b/ovsdb/ovsdb.c >> @@ -25,9 +25,13 @@ >> #include "file.h" >> #include "monitor.h" >> #include "openvswitch/json.h" >> +#include "openvswitch/poll-loop.h" >> +#include "ovs-thread.h" >> #include "ovsdb-error.h" >> #include "ovsdb-parser.h" >> #include "ovsdb-types.h" >> +#include "row.h" >> +#include "seq.h" >> #include "simap.h" >> #include "storage.h" >> #include "table.h" >> @@ -461,6 +465,21 @@ ovsdb_destroy(struct ovsdb *db) >> if (db) { >> struct shash_node *node; >> >> + /* Need to wait for compaction thread to finish the work. */ >> + while (ovsdb_snapshot_in_progress(db)) { >> + ovsdb_snapshot_wait(db); >> + poll_block(); >> + } >> + if (ovsdb_snapshot_ready(db)) { >> + struct ovsdb_error *error = ovsdb_snapshot(db, false); >> + >> + if (error) { >> + char *s = ovsdb_error_to_string_free(error); >> + VLOG_INFO("%s: %s", db->name, s); >> + free(s); > > Nit: this is very similar to log_and_free_error().
That's because I copied this piece from there. :) I wasn't sure if it worth re-naming and moving the function to some common header. It seems simple enough to just re-implement. Also, we can print a bit more information like a database name if not limited by only having a error message. > > For now the rest of the patch looks good to me. Thanks for review! > I still want to test it some more though. Sure. > > Thanks, > Dumitru > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
