On 7/1/22 17:45, Ilya Maximets wrote:
> On 7/1/22 17:31, Dumitru Ceara wrote:
>> On 7/1/22 01:34, Ilya Maximets wrote:
>>> Conversion of the database data into JSON object, serialization
>>> and destruction of that object are the most heavy operations
>>> during the database compaction.  If these operations are moved
>>> to a separate thread, the main thread can continue processing
>>> database requests in the meantime.
>>>
>>
>> Hi Ilya,
>>
>>> With this change, the compaction is split in 3 phases:
>>>
>>> 1. Initialization:
>>>    - Create a copy of the database.
>>>    - Remember current database index.
>>>    - Start a separate thread to convert a copy of the database
>>>      into serialized JSON object.
>>>
>>> 2. Wait:
>>>    - Continue normal operation until compaction thread is done.
>>>    - Meanwhile, compaction thread:
>>>      * Convert database copy to JSON.
>>>      * Serialize resulted JSON.
>>>      * Destroy original JSON object.
>>>
>>> 3. Finish:
>>>    - Destroy the database copy.
>>>    - Take the snapshot created by the thread.
>>>    - Write on disk.
>>>
>>
>> The approach sounds good to me.
>>
>>> The key for this schema to be fast is the ability to create
>>> a shallow copy of the database.  This doesn't take too much
>>> time allowing the thread to do most of work.
>>>
>>> Database copy is created and destroyed only by the main thread,
>>> so there is no need for synchronization.
>>>
>>> Such solution allows to reduce the time main thread is blocked
>>> by compaction by 80-90%.  For example, in ovn-heater tests
>>> with 120 node density-heavy scenario, where compaction normally
>>> takes 5-6 seconds at the end of a test, measured compaction
>>> times was all below 1 second with the change applied.  Also,
>>> note that these measured times are the sum of phases 1 and 3,
>>> so actual poll intervals are about half a second in this case.
>>
>> Nice!
>>
>>>
>>> Only implemented for raft storage for now.  The implementation
>>> for standalone databases can be added later by using a file
>>> offset as a database index and copying newly added changes
>>> from the old file to a new one during ovsdb_log_replace().
>>>
>>> Reported-at: https://bugzilla.redhat.com/2069108
>>> Signed-off-by: Ilya Maximets <[email protected]>
>>> ---
>>>  ovsdb/ovsdb-server.c |  18 +++++-
>>>  ovsdb/ovsdb.c        | 143 +++++++++++++++++++++++++++++++++++++++----
>>>  ovsdb/ovsdb.h        |  24 ++++++++
>>>  ovsdb/raft.c         |   8 ++-
>>>  ovsdb/raft.h         |   3 +-
>>>  ovsdb/row.c          |  17 +++++
>>>  ovsdb/row.h          |   1 +
>>>  ovsdb/storage.c      |  11 ++--
>>>  ovsdb/storage.h      |   3 +-
>>>  9 files changed, 204 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
>>> index 5549b4e3a..eae2f6679 100644
>>> --- a/ovsdb/ovsdb-server.c
>>> +++ b/ovsdb/ovsdb-server.c
>>> @@ -252,7 +252,9 @@ main_loop(struct server_config *config,
>>>                  remove_db(config, node,
>>>                            xasprintf("removing database %s because storage "
>>>                                      "disconnected permanently", 
>>> node->name));
>>> -            } else if (ovsdb_storage_should_snapshot(db->db->storage)) {
>>> +            } else if (!ovsdb_snapshot_in_progress(db->db)
>>> +                       && (ovsdb_storage_should_snapshot(db->db->storage) 
>>> ||
>>> +                           ovsdb_snapshot_ready(db->db))) {
>>>                  log_and_free_error(ovsdb_snapshot(db->db, trim_memory));
>>>              }
>>>          }
>>> @@ -287,6 +289,7 @@ main_loop(struct server_config *config,
>>>              ovsdb_trigger_wait(db->db, time_msec());
>>>              ovsdb_storage_wait(db->db->storage);
>>>              ovsdb_storage_read_wait(db->db->storage);
>>> +            ovsdb_snapshot_wait(db->db);
>>>          }
>>>          if (run_process) {
>>>              process_wait(run_process);
>>> @@ -1552,11 +1555,20 @@ ovsdb_server_compact(struct unixctl_conn *conn, int 
>>> argc,
>>>              ? !strcmp(node->name, db_name)
>>>              : node->name[0] != '_') {
>>>              if (db->db) {
>>> +                struct ovsdb_error *error = NULL;
>>> +
>>>                  VLOG_INFO("compacting %s database by user request",
>>>                            node->name);
>>>  
>>> -                struct ovsdb_error *error = ovsdb_snapshot(db->db,
>>> -                                                           trim_memory);
>>> +                error = ovsdb_snapshot(db->db, trim_memory);
>>> +                if (!error && ovsdb_snapshot_in_progress(db->db)) {
>>> +                    while (ovsdb_snapshot_in_progress(db->db)) {
>>> +                        ovsdb_snapshot_wait(db->db);
>>> +                        poll_block();
>>> +                    }
>>> +                    error = ovsdb_snapshot(db->db, trim_memory);
>>> +                }
>>> +
>>>                  if (error) {
>>>                      char *s = ovsdb_error_to_string(error);
>>>                      ds_put_format(&reply, "%s\n", s);
>>> diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
>>> index 91b4a01af..8cbefbe3d 100644
>>> --- a/ovsdb/ovsdb.c
>>> +++ b/ovsdb/ovsdb.c
>>> @@ -25,9 +25,13 @@
>>>  #include "file.h"
>>>  #include "monitor.h"
>>>  #include "openvswitch/json.h"
>>> +#include "openvswitch/poll-loop.h"
>>> +#include "ovs-thread.h"
>>>  #include "ovsdb-error.h"
>>>  #include "ovsdb-parser.h"
>>>  #include "ovsdb-types.h"
>>> +#include "row.h"
>>> +#include "seq.h"
>>>  #include "simap.h"
>>>  #include "storage.h"
>>>  #include "table.h"
>>> @@ -461,6 +465,21 @@ ovsdb_destroy(struct ovsdb *db)
>>>      if (db) {
>>>          struct shash_node *node;
>>>  
>>> +        /* Need to wait for compaction thread to finish the work. */
>>> +        while (ovsdb_snapshot_in_progress(db)) {
>>> +            ovsdb_snapshot_wait(db);
>>> +            poll_block();
>>> +        }
>>> +        if (ovsdb_snapshot_ready(db)) {
>>> +            struct ovsdb_error *error = ovsdb_snapshot(db, false);
>>> +
>>> +            if (error) {
>>> +                char *s = ovsdb_error_to_string_free(error);
>>> +                VLOG_INFO("%s: %s", db->name, s);
>>> +                free(s);
>>
>> Nit: this is very similar to log_and_free_error().
> 
> That's because I copied this piece from there. :)

I didn't want to make it sound like that.  Or, at least, not explicitly. :)

> I wasn't sure if it worth re-naming and moving the
> function to some common header.  It seems simple
> enough to just re-implement.  Also, we can print
> a bit more information like a database name if not
> limited by only having a error message.
> 

Ok, let's leave it as-is.

>>
>> For now the rest of the patch looks good to me.
> 
> Thanks for review!
> 
>> I still want to test it some more though.
> 
> Sure.
> 

I left a few more comments on the original patch.

>>
>> Thanks,
>> Dumitru
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to