On 7/1/22 17:31, Dumitru Ceara wrote:
> On 7/1/22 01:34, Ilya Maximets wrote:
>> Conversion of the database data into JSON object, serialization
>> and destruction of that object are the most heavy operations
>> during the database compaction.  If these operations are moved
>> to a separate thread, the main thread can continue processing
>> database requests in the meantime.
>>
> 
> Hi Ilya,
> 
>> With this change, the compaction is split in 3 phases:
>>
>> 1. Initialization:
>>    - Create a copy of the database.
>>    - Remember current database index.
>>    - Start a separate thread to convert a copy of the database
>>      into serialized JSON object.
>>
>> 2. Wait:
>>    - Continue normal operation until compaction thread is done.
>>    - Meanwhile, compaction thread:
>>      * Convert database copy to JSON.
>>      * Serialize resulted JSON.
>>      * Destroy original JSON object.
>>
>> 3. Finish:
>>    - Destroy the database copy.
>>    - Take the snapshot created by the thread.
>>    - Write on disk.
>>
> 
> The approach sounds good to me.
> 
>> The key for this schema to be fast is the ability to create
>> a shallow copy of the database.  This doesn't take too much
>> time allowing the thread to do most of work.
>>
>> Database copy is created and destroyed only by the main thread,
>> so there is no need for synchronization.
>>
>> Such solution allows to reduce the time main thread is blocked
>> by compaction by 80-90%.  For example, in ovn-heater tests
>> with 120 node density-heavy scenario, where compaction normally
>> takes 5-6 seconds at the end of a test, measured compaction
>> times was all below 1 second with the change applied.  Also,
>> note that these measured times are the sum of phases 1 and 3,
>> so actual poll intervals are about half a second in this case.
> 
> Nice!
> 
>>
>> Only implemented for raft storage for now.  The implementation
>> for standalone databases can be added later by using a file
>> offset as a database index and copying newly added changes
>> from the old file to a new one during ovsdb_log_replace().
>>
>> Reported-at: https://bugzilla.redhat.com/2069108
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
>>  ovsdb/ovsdb-server.c |  18 +++++-
>>  ovsdb/ovsdb.c        | 143 +++++++++++++++++++++++++++++++++++++++----
>>  ovsdb/ovsdb.h        |  24 ++++++++
>>  ovsdb/raft.c         |   8 ++-
>>  ovsdb/raft.h         |   3 +-
>>  ovsdb/row.c          |  17 +++++
>>  ovsdb/row.h          |   1 +
>>  ovsdb/storage.c      |  11 ++--
>>  ovsdb/storage.h      |   3 +-
>>  9 files changed, 204 insertions(+), 24 deletions(-)
>>
>> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
>> index 5549b4e3a..eae2f6679 100644
>> --- a/ovsdb/ovsdb-server.c
>> +++ b/ovsdb/ovsdb-server.c
>> @@ -252,7 +252,9 @@ main_loop(struct server_config *config,
>>                  remove_db(config, node,
>>                            xasprintf("removing database %s because storage "
>>                                      "disconnected permanently", 
>> node->name));
>> -            } else if (ovsdb_storage_should_snapshot(db->db->storage)) {
>> +            } else if (!ovsdb_snapshot_in_progress(db->db)
>> +                       && (ovsdb_storage_should_snapshot(db->db->storage) ||
>> +                           ovsdb_snapshot_ready(db->db))) {
>>                  log_and_free_error(ovsdb_snapshot(db->db, trim_memory));
>>              }
>>          }
>> @@ -287,6 +289,7 @@ main_loop(struct server_config *config,
>>              ovsdb_trigger_wait(db->db, time_msec());
>>              ovsdb_storage_wait(db->db->storage);
>>              ovsdb_storage_read_wait(db->db->storage);
>> +            ovsdb_snapshot_wait(db->db);
>>          }
>>          if (run_process) {
>>              process_wait(run_process);
>> @@ -1552,11 +1555,20 @@ ovsdb_server_compact(struct unixctl_conn *conn, int 
>> argc,
>>              ? !strcmp(node->name, db_name)
>>              : node->name[0] != '_') {
>>              if (db->db) {
>> +                struct ovsdb_error *error = NULL;
>> +
>>                  VLOG_INFO("compacting %s database by user request",
>>                            node->name);
>>  
>> -                struct ovsdb_error *error = ovsdb_snapshot(db->db,
>> -                                                           trim_memory);
>> +                error = ovsdb_snapshot(db->db, trim_memory);
>> +                if (!error && ovsdb_snapshot_in_progress(db->db)) {
>> +                    while (ovsdb_snapshot_in_progress(db->db)) {
>> +                        ovsdb_snapshot_wait(db->db);
>> +                        poll_block();
>> +                    }
>> +                    error = ovsdb_snapshot(db->db, trim_memory);
>> +                }
>> +
>>                  if (error) {
>>                      char *s = ovsdb_error_to_string(error);
>>                      ds_put_format(&reply, "%s\n", s);
>> diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
>> index 91b4a01af..8cbefbe3d 100644
>> --- a/ovsdb/ovsdb.c
>> +++ b/ovsdb/ovsdb.c
>> @@ -25,9 +25,13 @@
>>  #include "file.h"
>>  #include "monitor.h"
>>  #include "openvswitch/json.h"
>> +#include "openvswitch/poll-loop.h"
>> +#include "ovs-thread.h"
>>  #include "ovsdb-error.h"
>>  #include "ovsdb-parser.h"
>>  #include "ovsdb-types.h"
>> +#include "row.h"
>> +#include "seq.h"
>>  #include "simap.h"
>>  #include "storage.h"
>>  #include "table.h"
>> @@ -461,6 +465,21 @@ ovsdb_destroy(struct ovsdb *db)
>>      if (db) {
>>          struct shash_node *node;
>>  
>> +        /* Need to wait for compaction thread to finish the work. */
>> +        while (ovsdb_snapshot_in_progress(db)) {
>> +            ovsdb_snapshot_wait(db);
>> +            poll_block();
>> +        }
>> +        if (ovsdb_snapshot_ready(db)) {
>> +            struct ovsdb_error *error = ovsdb_snapshot(db, false);
>> +
>> +            if (error) {
>> +                char *s = ovsdb_error_to_string_free(error);
>> +                VLOG_INFO("%s: %s", db->name, s);
>> +                free(s);
> 
> Nit: this is very similar to log_and_free_error().

That's because I copied this piece from there. :)
I wasn't sure if it worth re-naming and moving the
function to some common header.  It seems simple
enough to just re-implement.  Also, we can print
a bit more information like a database name if not
limited by only having a error message.

> 
> For now the rest of the patch looks good to me.

Thanks for review!

> I still want to test it some more though.

Sure.

> 
> Thanks,
> Dumitru
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to