On Tue, Aug 31, 2021 at 9:12 AM Ilya Maximets <[email protected]> wrote:
>
> On 8/31/21 2:18 AM, Han Zhou wrote:
> >
> >
> > On Tue, Aug 24, 2021 at 12:00 PM Ilya Maximets <[email protected]
<mailto:[email protected]>> wrote:
> >>
> >> Raft log entries (and raft database snapshot) contains json objects
> >> of the data.  Follower receives append requests with data that gets
> >> parsed and added to the raft log.  Leader receives execution requests,
> >> parses data out of them and adds to the log.  In both cases, later
> >> ovsdb-server reads the log with ovsdb_storage_read(), constructs
> >> transaction and updates the database.  On followers these json objects
> >> in common case are never used again.  Leader may use them to send
> >> append requests or snapshot installation requests to followers.
> >> However, all these operations (except for ovsdb_storage_read()) are
> >> just serializing the json in order to send it over the network.
> >>
> >> Json objects are significantly larger than their serialized string
> >> representation.  For example, the snapshot of the database from one of
> >> the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB as
> >> a json object from the total 3.8 GB consumed by ovsdb-server process.
> >>
> >
> > Thanks Ilya for this optimization!
> > I am surprised that the serialized string is so much smaller than the
json object.
>
> Yeah. There seems to be a lot of overhead from the memory allocator.
> JSON object consists of millions of small chunks of malloc'ed memory.
>
> >
> >> ovsdb_storage_read() for a given raft entry happens only once in a
> >> lifetime, so after this call, we can serialize the json object, store
> >> the string representation and free the actual json object that ovsdb
> >> will never need again.  This can save a lot of memory and can also
> >> save serialization time, because each raft entry for append requests
> >> and snapshot installation requests serialized only once instead of
> >> doing that every time such request needs to be sent.
> >>
> >> JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
> >> pre-serialized data into raft_header and similar json objects.
> >>
> >> One major special case is creation of a database snapshot.
> >> Snapshot installation request received over the network will be parsed
> >> and read by ovsdb-server just like any other raft log entry.  However,
> >> snapshots created locally with raft_store_snapshot() will never be
> >> read back, because they reflect the current state of the database,
> >> hence already applied.  For this case we can free the json object
> >> right after writing snapshot on disk.
> >>
> >> Tests performed with ovn-heater on 60 node density-light scenario,
> >> where on-disk database goes up to 97 MB, shows average memory
> >> consumption of ovsdb-server Southbound DB processes decreased by 58%
> >> (from 602 MB to 256 MB per process) and peak memory consumption
> >> decreased by 40% (from 1288 MB to 771 MB).
> >>
> >> Test with 120 nodes on density-heavy scenario with 270 MB on-disk
> >> database shows 1.5 GB memory consumption decrease as expected.
> >
> > Is the 1.5 GB decrease from the original 3.8 GB total?
>
> Yep.  This are average numbers thoug, so absolute values at particular
> moments may vary.  I have a data for the ther test where memory
> consumption in absolute value at the end of a test goes down form 4.3
> to 2.6 GB.
>
> > This is a great optimization. Please see my minor comments/questions
below.
> >
> >> Also, total CPU time consumed by the Southbound DB process reduced
> >> from 296 to 256 minutes.  Number of unreasonably long poll intervals
> >> reduced from 2896 down to 1934.
> >>
> >> Deserialization is also implemented just in case.  I didn't see this
> >> function being invoked in practice.
> >>
> >> Signed-off-by: Ilya Maximets <[email protected] <mailto:
[email protected]>>
> >> ---
> >>  ovsdb/ovsdb-tool.c   |  10 ++--
> >>  ovsdb/raft-private.c | 111 ++++++++++++++++++++++++++++++++++++++-----
> >>  ovsdb/raft-private.h |  12 ++++-
> >>  ovsdb/raft.c         |  94 ++++++++++++++++++++----------------
> >>  ovsdb/raft.h         |   3 +-
> >>  ovsdb/storage.c      |   4 +-
> >>  6 files changed, 170 insertions(+), 64 deletions(-)
> >>
> >> diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
> >> index 05a0223e7..903b2ebc9 100644
> >> --- a/ovsdb/ovsdb-tool.c
> >> +++ b/ovsdb/ovsdb-tool.c
> >> @@ -919,7 +919,7 @@ print_raft_header(const struct raft_header *h,
> >>          if (!uuid_is_zero(&h->snap.eid)) {
> >>              printf(" prev_eid: %04x\n", uuid_prefix(&h->snap.eid, 4));
> >>          }
> >> -        print_data("prev_", h->snap.data, schemap, names);
> >> +        print_data("prev_", raft_entry_get_data(&h->snap), schemap,
names);
> >>      }
> >>  }
> >>
> >> @@ -973,11 +973,13 @@ raft_header_to_standalone_log(const struct
raft_header *h,
> >>                                struct ovsdb_log *db_log_data)
> >>  {
> >>      if (h->snap_index) {
> >> -        if (!h->snap.data || json_array(h->snap.data)->n != 2) {
> >> +        const struct json *data = raft_entry_get_data(&h->snap);
> >> +
> >> +        if (!data || json_array(data)->n != 2) {
> >>              ovs_fatal(0, "Incorrect raft header data array length");
> >>          }
> >>
> >> -        struct json_array *pa = json_array(h->snap.data);
> >> +        struct json_array *pa = json_array(data);
> >>          struct json *schema_json = pa->elems[0];
> >>          struct ovsdb_error *error = NULL;
> >>
> >> @@ -1373,7 +1375,7 @@ do_check_cluster(struct ovs_cmdl_context *ctx)
> >>                  }
> >>                  struct raft_entry *e = &s->entries[log_idx];
> >>                  e->term = r->term;
> >> -                e->data = r->entry.data;
> >> +                raft_entry_set_data_nocopy(e, r->entry.data);
> >>                  e->eid = r->entry.eid;
> >>                  e->servers = r->entry.servers;
> >>                  break;
> >> diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c
> >> index 26d39a087..33c20e71b 100644
> >> --- a/ovsdb/raft-private.c
> >> +++ b/ovsdb/raft-private.c
> >> @@ -18,11 +18,16 @@
> >>
> >>  #include "raft-private.h"
> >>
> >> +#include "coverage.h"
> >>  #include "openvswitch/dynamic-string.h"
> >>  #include "ovsdb-error.h"
> >>  #include "ovsdb-parser.h"
> >>  #include "socket-util.h"
> >>  #include "sset.h"
> >> +
> >> +COVERAGE_DEFINE(raft_entry_deserialize);
> >> +COVERAGE_DEFINE(raft_entry_serialize);
> >> +
> >>
> >>  /* Addresses of Raft servers. */
> >>
> >> @@ -281,7 +286,8 @@ void
> >>  raft_entry_clone(struct raft_entry *dst, const struct raft_entry *src)
> >>  {
> >>      dst->term = src->term;
> >> -    dst->data = json_nullable_clone(src->data);
> >> +    dst->data.full_json = json_nullable_clone(src->data.full_json);
> >> +    dst->data.serialized = json_nullable_clone(src->data.serialized);
> >>      dst->eid = src->eid;
> >>      dst->servers = json_nullable_clone(src->servers);
> >>      dst->election_timer = src->election_timer;
> >> @@ -291,7 +297,8 @@ void
> >>  raft_entry_uninit(struct raft_entry *e)
> >>  {
> >>      if (e) {
> >> -        json_destroy(e->data);
> >> +        json_destroy(e->data.full_json);
> >> +        json_destroy(e->data.serialized);
> >>          json_destroy(e->servers);
> >>      }
> >>  }
> >> @@ -301,8 +308,9 @@ raft_entry_to_json(const struct raft_entry *e)
> >>  {
> >>      struct json *json = json_object_create();
> >>      raft_put_uint64(json, "term", e->term);
> >> -    if (e->data) {
> >> -        json_object_put(json, "data", json_clone(e->data));
> >> +    if (raft_entry_has_data(e)) {
> >> +        json_object_put(json, "data",
> >> +
 json_clone(raft_entry_get_serialized_data(e)));
> >>          json_object_put_format(json, "eid", UUID_FMT,
UUID_ARGS(&e->eid));
> >>      }
> >>      if (e->servers) {
> >> @@ -323,9 +331,10 @@ raft_entry_from_json(struct json *json, struct
raft_entry *e)
> >>      struct ovsdb_parser p;
> >>      ovsdb_parser_init(&p, json, "raft log entry");
> >>      e->term = raft_parse_required_uint64(&p, "term");
> >> -    e->data = json_nullable_clone(
> >> +    raft_entry_set_data(e,
> >>          ovsdb_parser_member(&p, "data", OP_OBJECT | OP_ARRAY |
OP_OPTIONAL));
> >> -    e->eid = e->data ? raft_parse_required_uuid(&p, "eid") :
UUID_ZERO;
> >> +    e->eid = raft_entry_has_data(e)
> >> +             ? raft_parse_required_uuid(&p, "eid") : UUID_ZERO;
> >>      e->servers = json_nullable_clone(
> >>          ovsdb_parser_member(&p, "servers", OP_OBJECT | OP_OPTIONAL));
> >>      if (e->servers) {
> >> @@ -344,9 +353,86 @@ bool
> >>  raft_entry_equals(const struct raft_entry *a, const struct raft_entry
*b)
> >>  {
> >>      return (a->term == b->term
> >> -            && json_equal(a->data, b->data)
> >>              && uuid_equals(&a->eid, &b->eid)
> >> -            && json_equal(a->servers, b->servers));
> >> +            && json_equal(a->servers, b->servers)
> >> +            && json_equal(raft_entry_get_data(a),
raft_entry_get_data(b)));
> >> +}
> >> +
> >> +bool
> >> +raft_entry_has_data(const struct raft_entry *e)
> >> +{
> >> +    return e->data.full_json || e->data.serialized;
> >> +}
> >> +
> >> +/* Triggers generation of a serialized copy of the data. */
> >> +void
> >> +raft_entry_data_serialize(struct raft_entry *e)
> >> +{
> >> +    if (!raft_entry_has_data(e) || e->data.serialized) {
> >> +        return;
> >> +    }
> >> +    COVERAGE_INC(raft_entry_serialize);
> >> +    e->data.serialized =
json_serialized_object_create(e->data.full_json);
> >> +}
> >> +
> >> +static void
> >> +raft_entry_data_deserialize(struct raft_entry *e)
> >> +{
> >> +    if (!raft_entry_has_data(e) || e->data.full_json) {
> >> +        return;
> >> +    }
> >> +    COVERAGE_INC(raft_entry_deserialize);
> >> +    e->data.full_json =
json_from_serialized_object(e->data.serialized);
> >> +}
> >> +
> >> +
> >> +void
> >> +raft_entry_set_data_nocopy(struct raft_entry *e, struct json *json)
> >> +{
> >> +    ovs_assert(!json || json->type != JSON_SERIALIZED_OBJECT);
> >> +    e->data.full_json = json;
> >> +    e->data.serialized = NULL;
> >
> > It may be more clear to just name the function
raft_entry_set_parsed_data_nocopy() and rename the below function to
raft_entry_set_parsed_data(), making it clear that the function only sets
the parsed data.
>
> Yeah, it makes sense.  Dumitru had a similar comment with a bit different
> suggestion.  Renaming makes sense in general, I'll do that.
>
> > If it is assumed that this function should only be called when e->data
is empty, it is probably better to do an assertion for serialized == NULL,
instead of setting serialized to NULL.
>
> Unfortuantely, it's hard to do.  I had assertion in the initial
> version of a patch, but it turned out that callers typically do
> no have a clean memory.  Entry allocated on stack or re-used entry
> passed here most of the time, so we can't actually check the current
> value, because it's not NULL, but not a valid pointer either.
>
> >
> >> +}
> >> +
> >> +void
> >> +raft_entry_set_data(struct raft_entry *e, const struct json *json)
> >> +{
> >> +    raft_entry_set_data_nocopy(e, json_nullable_clone(json));
> >> +}
> >> +
> >> +/* Returns a pointer to the fully parsed json object of the data.
> >> + * Caller takes the ownership of the result.
> >> + *
> >> + * Performance notice: Subsequent call of this function for the same
data
> >> + * object will lead to re-parsing of the serialized json, unless
updated
> >> + * with raft_entry_set_data(). */
> >> +struct json *
> >> +raft_entry_steal_parsed_data(struct raft_entry *e)
> >> +{
> >> +    /* Ensure that both versions exists. */
> >> +    raft_entry_data_serialize(e);
> >> +    raft_entry_data_deserialize(e);
> >> +
> >> +    struct json *json = e->data.full_json;
> >> +    e->data.full_json = NULL;
> >> +
> >> +    return json;
> >> +}
> >> +
> >> +/* Returns a pointer to the fully parsed json object of the data. */
> >> +const struct json *
> >> +raft_entry_get_data(const struct raft_entry *e)
> >
> > Better to rename to raft_entry_get_parsed_data() to be consistent with
the raft_entry_steal_parsed_data().
>
> OK.
>
> >
> >> +{
> >> +    raft_entry_data_deserialize(CONST_CAST(struct raft_entry *, e));
> >
> > This suggests that this get() function could implicitly introduce a
cost of deserialization, but in the implementation I didn't see any place
this get() function is called against a raft_entry that doesn't have
full_json already, which means the raft_entry_data_deserialize() function
is never called. If that's the case, would it be more clear just remove
this line and the function raft_entry_data_deserialize()? It makes the life
cycle of a raft_entry object more clear: it is always constructed with a
full_json, then serialized, and then at some point with the full_json
stolen. Did I misunderstand?
>
> You're right, it's never called.  I added deserialization just in case.
> With renaming of this function it will make sense to just add assertion
> here.  I'll remove deserialization and add assertion.
>
> >
> >> +    return e->data.full_json;
> >> +}
> >> +
> >> +/* Returns a pointer to the JSON_SERIALIZED_OBJECT of the data. */
> >> +const struct json *
> >> +raft_entry_get_serialized_data(const struct raft_entry *e)
> >> +{
> >> +    raft_entry_data_serialize(CONST_CAST(struct raft_entry *, e));
> >> +    return e->data.serialized;
> >>  }
> >>
> >>  void
> >> @@ -402,8 +488,8 @@ raft_header_from_json__(struct raft_header *h,
struct ovsdb_parser *p)
> >>           * present, all of them must be. */
> >>          h->snap_index = raft_parse_optional_uint64(p, "prev_index");
> >>          if (h->snap_index) {
> >> -            h->snap.data = json_nullable_clone(
> >> -                ovsdb_parser_member(p, "prev_data", OP_ANY));
> >> +            raft_entry_set_data(
> >> +                &h->snap, ovsdb_parser_member(p, "prev_data",
OP_ANY));
> >>              h->snap.eid = raft_parse_required_uuid(p, "prev_eid");
> >>              h->snap.term = raft_parse_required_uint64(p, "prev_term");
> >>              h->snap.election_timer = raft_parse_optional_uint64(
> >> @@ -455,8 +541,9 @@ raft_header_to_json(const struct raft_header *h)
> >>      if (h->snap_index) {
> >>          raft_put_uint64(json, "prev_index", h->snap_index);
> >>          raft_put_uint64(json, "prev_term", h->snap.term);
> >> -        if (h->snap.data) {
> >> -            json_object_put(json, "prev_data",
json_clone(h->snap.data));
> >> +        if (raft_entry_has_data(&h->snap)) {
> >> +            json_object_put(json, "prev_data",
> >> +                json_clone(raft_entry_get_serialized_data(&h->snap)));
> >>          }
> >>          json_object_put_format(json, "prev_eid",
> >>                                 UUID_FMT, UUID_ARGS(&h->snap.eid));
> >> diff --git a/ovsdb/raft-private.h b/ovsdb/raft-private.h
> >> index a69e37e5c..c39842593 100644
> >> --- a/ovsdb/raft-private.h
> >> +++ b/ovsdb/raft-private.h
> >> @@ -118,7 +118,10 @@ void raft_servers_format(const struct hmap
*servers, struct ds *ds);
> >>   * entry.  */
> >>  struct raft_entry {
> >>      uint64_t term;
> >> -    struct json *data;
> >> +    struct {
> >> +        struct json *full_json;   /* Fully parsed JSON object. */
> >> +        struct json *serialized;  /* JSON_SERIALIZED_OBJECT version
of data. */
> >> +    } data;
> >>      struct uuid eid;
> >>      struct json *servers;
> >>      uint64_t election_timer;
> >> @@ -130,6 +133,13 @@ struct json *raft_entry_to_json(const struct
raft_entry *);
> >>  struct ovsdb_error *raft_entry_from_json(struct json *, struct
raft_entry *)
> >>      OVS_WARN_UNUSED_RESULT;
> >>  bool raft_entry_equals(const struct raft_entry *, const struct
raft_entry *);
> >> +bool raft_entry_has_data(const struct raft_entry *);
> >> +void raft_entry_set_data(struct raft_entry *, const struct json *);
> >> +void raft_entry_set_data_nocopy(struct raft_entry *, struct json *);
> >> +void raft_entry_data_serialize(struct raft_entry *);
> >> +struct json *raft_entry_steal_parsed_data(struct raft_entry *);
> >> +const struct json *raft_entry_get_data(const struct raft_entry *);
> >> +const struct json *raft_entry_get_serialized_data(const struct
raft_entry *);
> >>
> >>  /* On disk data serialization and deserialization. */
> >>
> >> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> >> index 2fb515651..cca551496 100644
> >> --- a/ovsdb/raft.c
> >> +++ b/ovsdb/raft.c
> >> @@ -494,11 +494,11 @@ raft_create_cluster(const char *file_name, const
char *name,
> >>          .snap_index = index++,
> >>          .snap = {
> >>              .term = term,
> >> -            .data = json_nullable_clone(data),
> >>              .eid = uuid_random(),
> >>              .servers = json_object_create(),
> >>          },
> >>      };
> >> +    raft_entry_set_data(&h.snap, data);
> >>      shash_add_nocopy(json_object(h.snap.servers),
> >>                       xasprintf(UUID_FMT, UUID_ARGS(&h.sid)),
> >>                       json_string_create(local_address));
> >> @@ -727,10 +727,10 @@ raft_add_entry(struct raft *raft,
> >>      uint64_t index = raft->log_end++;
> >>      struct raft_entry *entry = &raft->entries[index -
raft->log_start];
> >>      entry->term = term;
> >> -    entry->data = data;
> >>      entry->eid = eid ? *eid : UUID_ZERO;
> >>      entry->servers = servers;
> >>      entry->election_timer = election_timer;
> >> +    raft_entry_set_data_nocopy(entry, data);
> >>      return index;
> >>  }
> >>
> >> @@ -741,13 +741,16 @@ raft_write_entry(struct raft *raft, uint64_t
term, struct json *data,
> >>                   const struct uuid *eid, struct json *servers,
> >>                   uint64_t election_timer)
> >>  {
> >> +    uint64_t index = raft_add_entry(raft, term, data, eid, servers,
> >> +                                    election_timer);
> >> +    const struct json *entry_data = raft_entry_get_serialized_data(
> >> +                                      &raft->entries[index -
raft->log_start]);
> >>      struct raft_record r = {
> >>          .type = RAFT_REC_ENTRY,
> >>          .term = term,
> >>          .entry = {
> >> -            .index = raft_add_entry(raft, term, data, eid, servers,
> >> -                                    election_timer),
> >> -            .data = data,
> >> +            .index = index,
> >> +            .data = CONST_CAST(struct json *, entry_data),
> >>              .servers = servers,
> >>              .election_timer = election_timer,
> >>              .eid = eid ? *eid : UUID_ZERO,
> >> @@ -2161,7 +2164,7 @@ raft_get_eid(const struct raft *raft, uint64_t
index)
> >>  {
> >>      for (; index >= raft->log_start; index--) {
> >>          const struct raft_entry *e = raft_get_entry(raft, index);
> >> -        if (e->data) {
> >> +        if (raft_entry_has_data(e)) {
> >>              return &e->eid;
> >>          }
> >>      }
> >> @@ -2826,8 +2829,8 @@ raft_truncate(struct raft *raft, uint64_t
new_end)
> >>      return servers_changed;
> >>  }
> >>
> >> -static const struct json *
> >> -raft_peek_next_entry(struct raft *raft, struct uuid *eid)
> >> +static const struct raft_entry *
> >> +raft_peek_next_entry(struct raft *raft)
> >>  {
> >>      /* Invariant: log_start - 2 <= last_applied <= commit_index <
log_end. */
> >>      ovs_assert(raft->log_start <= raft->last_applied + 2);
> >> @@ -2839,32 +2842,20 @@ raft_peek_next_entry(struct raft *raft, struct
uuid *eid)
> >>      }
> >>
> >>      if (raft->log_start == raft->last_applied + 2) {
> >> -        *eid = raft->snap.eid;
> >> -        return raft->snap.data;
> >> +        return &raft->snap;
> >>      }
> >>
> >>      while (raft->last_applied < raft->commit_index) {
> >>          const struct raft_entry *e = raft_get_entry(raft,
> >>
 raft->last_applied + 1);
> >> -        if (e->data) {
> >> -            *eid = e->eid;
> >> -            return e->data;
> >> +        if (raft_entry_has_data(e)) {
> >> +            return e;
> >>          }
> >>          raft->last_applied++;
> >>      }
> >>      return NULL;
> >>  }
> >>
> >> -static const struct json *
> >> -raft_get_next_entry(struct raft *raft, struct uuid *eid)
> >> -{
> >> -    const struct json *data = raft_peek_next_entry(raft, eid);
> >> -    if (data) {
> >> -        raft->last_applied++;
> >> -    }
> >> -    return data;
> >> -}
> >> -
> >>  /* Updates commit index in raft log. If commit index is already
up-to-date
> >>   * it does nothing and return false, otherwise, returns true. */
> >>  static bool
> >> @@ -2878,7 +2869,7 @@ raft_update_commit_index(struct raft *raft,
uint64_t new_commit_index)
> >>          while (raft->commit_index < new_commit_index) {
> >>              uint64_t index = ++raft->commit_index;
> >>              const struct raft_entry *e = raft_get_entry(raft, index);
> >> -            if (e->data) {
> >> +            if (raft_entry_has_data(e)) {
> >>                  struct raft_command *cmd
> >>                      = raft_find_command_by_eid(raft, &e->eid);
> >>                  if (cmd) {
> >> @@ -3059,7 +3050,8 @@ raft_handle_append_entries(struct raft *raft,
> >>      for (; i < n_entries; i++) {
> >>          const struct raft_entry *e = &entries[i];
> >>          error = raft_write_entry(raft, e->term,
> >> -                                 json_nullable_clone(e->data),
&e->eid,
> >> +
json_nullable_clone(raft_entry_get_data(e)),
> >> +                                 &e->eid,
> >>                                   json_nullable_clone(e->servers),
> >>                                   e->election_timer);
> >>          if (error) {
> >> @@ -3314,20 +3306,26 @@ bool
> >>  raft_has_next_entry(const struct raft *raft_)
> >>  {
> >>      struct raft *raft = CONST_CAST(struct raft *, raft_);
> >> -    struct uuid eid;
> >> -    return raft_peek_next_entry(raft, &eid) != NULL;
> >> +    return raft_peek_next_entry(raft) != NULL;
> >>  }
> >>
> >>  /* Returns the next log entry or snapshot from 'raft', or NULL if
there are
> >> - * none left to read.  Stores the entry ID of the log entry in
'*eid'.  Stores
> >> - * true in '*is_snapshot' if the returned data is a snapshot, false
if it is a
> >> - * log entry. */
> >> -const struct json *
> >> -raft_next_entry(struct raft *raft, struct uuid *eid, bool
*is_snapshot)
> >> + * none left to read.  Stores the entry ID of the log entry in '*eid'.
> >> + *
> >> + * The caller takes ownership of the result. */
> >> +struct json *
> >> +raft_next_entry(struct raft *raft, struct uuid *eid)
> >>  {
> >> -    const struct json *data = raft_get_next_entry(raft, eid);
> >> -    *is_snapshot = data == raft->snap.data;
> >> -    return data;
> >> +    const struct raft_entry *e = raft_peek_next_entry(raft);
> >> +
> >> +    if (!e) {
> >> +        return NULL;
> >> +    }
> >> +
> >> +    raft->last_applied++;
> >> +    *eid = e->eid;
> >> +
> >> +    return raft_entry_steal_parsed_data(CONST_CAST(struct raft_entry
*, e));
> >
> > It is better add comments here why steal the parsed data, i.e. the
considerations mentioned in the commit message. It would help readers to
get the bigger picture.
>
> OK.
>
> >
> >>  }
> >>
> >>  /* Returns the log index of the last-read snapshot or log entry. */
> >> @@ -3420,6 +3418,7 @@ raft_send_install_snapshot_request(struct raft
*raft,
> >>                                     const struct raft_server *s,
> >>                                     const char *comment)
> >>  {
> >> +    const struct json *data =
raft_entry_get_serialized_data(&raft->snap);
> >>      union raft_rpc rpc = {
> >>          .install_snapshot_request = {
> >>              .common = {
> >> @@ -3432,7 +3431,7 @@ raft_send_install_snapshot_request(struct raft
*raft,
> >>              .last_term = raft->snap.term,
> >>              .last_servers = raft->snap.servers,
> >>              .last_eid = raft->snap.eid,
> >> -            .data = raft->snap.data,
> >> +            .data = CONST_CAST(struct json *, data),
> >>              .election_timer = raft->election_timer, /* use latest
value */
> >>          }
> >>      };
> >> @@ -3998,12 +3997,13 @@ raft_write_snapshot(struct raft *raft, struct
ovsdb_log *log,
> >>      /* Write log records. */
> >>      for (uint64_t index = new_log_start; index < raft->log_end;
index++) {
> >>          const struct raft_entry *e = &raft->entries[index -
raft->log_start];
> >> +        const struct json *log_data =
raft_entry_get_serialized_data(e);
> >>          struct raft_record r = {
> >>              .type = RAFT_REC_ENTRY,
> >>              .term = e->term,
> >>              .entry = {
> >>                  .index = index,
> >> -                .data = e->data,
> >> +                .data = CONST_CAST(struct json *, log_data),
> >>                  .servers = e->servers,
> >>                  .election_timer = e->election_timer,
> >>                  .eid = e->eid,
> >> @@ -4093,19 +4093,23 @@ raft_handle_install_snapshot_request__(
> >>
> >>      /* Case 3: The new snapshot starts past the end of our current
log, so
> >>       * discard all of our current log. */
> >> -    const struct raft_entry new_snapshot = {
> >> +    struct raft_entry new_snapshot = {
> >>          .term = rq->last_term,
> >> -        .data = rq->data,
> >>          .eid = rq->last_eid,
> >> -        .servers = rq->last_servers,
> >> +        .servers = json_clone(rq->last_servers),
> >>          .election_timer = rq->election_timer,
> >>      };
> >> +    raft_entry_set_data(&new_snapshot, rq->data);
> >> +    /* Ensure that new snapshot contains serialized data object. */
> >> +    raft_entry_data_serialize(&new_snapshot);
> >> +
> >>      struct ovsdb_error *error = raft_save_snapshot(raft,
new_log_start,
> >>                                                     &new_snapshot);
> >>      if (error) {
> >>          char *error_s = ovsdb_error_to_string_free(error);
> >>          VLOG_WARN("could not save snapshot: %s", error_s);
> >>          free(error_s);
> >> +        raft_entry_uninit(&new_snapshot);
> >>          return false;
> >>      }
> >>
> >> @@ -4120,7 +4124,7 @@ raft_handle_install_snapshot_request__(
> >>      }
> >>
> >>      raft_entry_uninit(&raft->snap);
> >> -    raft_entry_clone(&raft->snap, &new_snapshot);
> >> +    raft->snap = new_snapshot;
> >>
> >>      raft_get_servers_from_log(raft, VLL_INFO);
> >>      raft_get_election_timer_from_log(raft);
> >> @@ -4265,11 +4269,14 @@ raft_store_snapshot(struct raft *raft, const
struct json *new_snapshot_data)
> >>      uint64_t new_log_start = raft->last_applied + 1;
> >>      struct raft_entry new_snapshot = {
> >>          .term = raft_get_term(raft, new_log_start - 1),
> >> -        .data = json_clone(new_snapshot_data),
> >>          .eid = *raft_get_eid(raft, new_log_start - 1),
> >>          .servers = json_clone(raft_servers_for_index(raft,
new_log_start - 1)),
> >>          .election_timer = raft->election_timer,
> >>      };
> >> +    raft_entry_set_data(&new_snapshot, new_snapshot_data);
> >> +    /* Ensure that new snapshot contains serialized data object. */
> >> +    raft_entry_data_serialize(&new_snapshot);
> >
> > The subsequent raft_entry_get_serialized_data() call would do the
serialization, so why call the serialize function explicitly here? Should
raft_entry_data_serialize() be just a static function? Or maybe just remove
the call to raft_entry_data_serialize() in the
raft_entry_get_serialized_data(), if we want to call
raft_entry_data_serialize() explicitly.
>
> Here is a tricky part.  raft_write_snapshot() does this:
>
>     struct raft_header h = {
>         .snap = *new_snapshot.
>     }
>
> And later this .snap will be serialized beofre writing on disk.
> If new_snapshot had no serialized data, it will be created and
> stored inside the .snap filed, and it will be leaked later when
> on-stack raft header 'h' will be destroyed.  We could replace
> this initilaization with a field-by-filed copy, but we will
> still need that serialized object to be in new_snapshot anyway,
> so it's better to make sure that it's serialized beforehand.
>
> I see that it's a bit confusing.  It might be better to remove
> the direct call to raft_entry_data_serialize() here and do
> something like this:
>
> static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> raft_write_snapshot(struct raft *raft, struct ovsdb_log *log,
>                     uint64_t new_log_start,
>                     const struct raft_entry *new_snapshot)
> {
>     /* Ensure that new snapshot contains serialized data object, so it
will
>      * not be allocated while serializing the on-stack raft header
object. */
>     ovs_assert(raft_entry_get_serialized_data(new_snapshot));
>
>     struct raft_header h = {
>         .sid = raft->sid,
>         .cid = raft->cid,
>         .name = raft->name,
>         .local_address = raft->local_address,
>         .snap_index = new_log_start - 1,
>         .snap = *new_snapshot,
>     };
>     ...
>
> What do you think?
>

OK, thanks for explaining. The new approach looks better, and the comment
does help for understanding the tricks behind. So consider my ACK for this:

Acked-by: Han Zhou <[email protected]>

> >
> > Thanks,
> > Han
> >
> >
> >> +
> >>      struct ovsdb_error *error = raft_save_snapshot(raft,
new_log_start,
> >>                                                     &new_snapshot);
> >>      if (error) {
> >> @@ -4286,6 +4293,9 @@ raft_store_snapshot(struct raft *raft, const
struct json *new_snapshot_data)
> >>      memmove(&raft->entries[0], &raft->entries[new_log_start -
raft->log_start],
> >>              (raft->log_end - new_log_start) * sizeof *raft->entries);
> >>      raft->log_start = new_log_start;
> >> +    /* It's a snapshot of a current database state, ovsdb-server will
not
> >> +     * read it back.  Destroying the parsed json object to not waste
memory. */
> >> +    json_destroy(raft_entry_steal_parsed_data(&raft->snap));
> >>      return NULL;
> >>  }
> >>
> >> diff --git a/ovsdb/raft.h b/ovsdb/raft.h
> >> index 3545c41c2..710862f94 100644
> >> --- a/ovsdb/raft.h
> >> +++ b/ovsdb/raft.h
> >> @@ -132,8 +132,7 @@ bool raft_left(const struct raft *);
> >>  bool raft_failed(const struct raft *);
> >>
> >>  /* Reading snapshots and log entries. */
> >> -const struct json *raft_next_entry(struct raft *, struct uuid *eid,
> >> -                                   bool *is_snapshot);
> >> +struct json *raft_next_entry(struct raft *, struct uuid *eid);
> >>  bool raft_has_next_entry(const struct raft *);
> >>
> >>  uint64_t raft_get_applied_index(const struct raft *);
> >> diff --git a/ovsdb/storage.c b/ovsdb/storage.c
> >> index d727b1eac..9e32efe58 100644
> >> --- a/ovsdb/storage.c
> >> +++ b/ovsdb/storage.c
> >> @@ -268,9 +268,7 @@ ovsdb_storage_read(struct ovsdb_storage *storage,
> >>      struct json *schema_json = NULL;
> >>      struct json *txn_json = NULL;
> >>      if (storage->raft) {
> >> -        bool is_snapshot;
> >> -        json = json_nullable_clone(
> >> -            raft_next_entry(storage->raft, txnid, &is_snapshot));
> >> +        json = raft_next_entry(storage->raft, txnid);
> >>          if (!json) {
> >>              return NULL;
> >>          } else if (json->type != JSON_ARRAY || json->array.n != 2) {
> >> --
> >> 2.31.1
> >>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to