On Fri, Sep 16, 2022 at 11:59 AM Numan Siddique <[email protected]> wrote:
>
> On Tue, Sep 13, 2022 at 7:34 AM Ilya Maximets <[email protected]> wrote:
> >
> > Every time a new logical flow is created, ovn-northd creates an
> > exclusive hash map to collect all the datapaths this logical flow is
> > applicable to. While adding a new datapath to an existing logical
> > flow, the hmapx will perform a lookup by hash and then add a new entry
> > by allocating a piece of memory and putting it into the hash map. When
> > all the logical flows are created and all the datapath groups are
> > ready, northd collects all the unique groups by comparing them
> > to each other, destroying duplicates and replacing them with a pointer
> > to the one true group. After that, northd analyzes all the existing
> > flows in the Sb DB and compares their groups to the ones that each
> > flow should have. At the end, all the remaining exclusive hash maps
> > containing datapath groups are destroyed.
> >
> > The process is fairly inefficient. The number of unique datapath
> > groups is typically much smaller than the number of logical flows.
> > So, in high-scale environments we may end up generating hundreds
> > of thousands of equal groups just to discard them later.
> > Every element in a group is a separately allocated piece of memory
> > added to a hash map. This imposes a significant cost for allocating
> > them and freeing them later. Comparison of two hash maps to check for
> > equality involves looking up every element of one hash map in the other,
> > which is not a very fast operation either. Also, hash maps are
> > inherently cache-unfriendly, so every access during iteration
> > likely results in a cache miss.
> >
> > As a result, all these datapath group manipulations end up taking
> > most of the processing time in northd.
> >
> > Let's enumerate all the logical datapaths we have, assigning each a
> > unique index from zero to the total number of datapaths, and create
> > an array of datapaths for quick access by their index. Having that,
> > instead of collecting datapath pointers in hash maps, we can use
> > bitmaps and set the bits that correspond to the datapaths relevant
> > to a particular group. Later we can reconstruct the group by
> > iterating over the bitmap and taking the required datapaths from
> > the array.
> >
> > Advantages:
> >
> > - Bitmaps are relatively small. Only 125 bytes are needed to
> > accommodate a setup with 1000 datapaths. At the same time, an hmapx
> > with just one element will end up with 2 separate memory allocations
> > for at least 4 buckets and a hmapx_node. In total that can
> > be less than a bitmap with only one datapath, but only by 2-3x, and
> > the comparison becomes irrelevant as soon as there is at least one
> > big datapath group.
> >
> > - Bitmaps are trivial to compare. Simple memcmp does the trick.
> > And memcmp will have one of the most optimized implementations for
> > the actual hardware in use.
> >
> > - Flipping a bit or checking whether a certain bit is set takes
> > constant time. Asymptotically the same as with hash maps, but the
> > constant is much lower.
> >
> > - Cache-friendly. Depending on the system and the scale, the whole map
> > can fit into a single cache line or two. All the data is sequential
> > in memory.
> >
> > - Reduced number of memory allocations/deallocations. A bitmap is
> > allocated and freed only once, as a single piece; an hmapx allocates
> > memory for every datapath.
> >
> > Disadvantage:
> >
> > - Counting the number of set bits in a bitmap is a bit heavier than
> > just reading hmapx_count(), but we're not doing that very frequently,
> > so it doesn't seem to impact performance in any meaningful way.
> >
> > Testing with ovn-heater shows up to 3x performance improvement for
> > 500-node density-heavy scenario. In this case northd poll intervals
> > went down from 10 to 3 seconds, driving the test maximum latency from
> > 21 down to 6.5 seconds. And overall test time went down from 2400 to
> > 720 seconds. Memory consumption for the northd process also reduced
> > from 3.9 GB to 1.6 GB RSS.
> >
> > One design consideration was whether to make the 'datapaths_array'
> > and 'n_datapaths' variables global. Logically, from the perspective of
> > the I-P engine, they should be part of the northd-generated 'data'.
> > And it's not a problem for the array itself, since it is only used in
> > a couple of places. However, 'n_datapaths' is used practically
> > everywhere, including ovn_lflow_add(), so we'll have to pass it around
> > in most of the functions in northd as an argument. Taking that into
> > account, I decided to make them both global to avoid touching half of
> > the northd code.
> > We also have a couple of other global variables already (mainly for
> > parallel processing) that should technically be part of the I-P data,
> > but it is way simpler to keep them global.
> >
> > Signed-off-by: Ilya Maximets <[email protected]>
>
> Thanks Ilya for these huge improvements.
>
> The patch LGTM.
>
> Acked-by: Numan Siddique <[email protected]>
@Mark Michelson @Ilya Maximets If you want to take a look at this
patch before the 22.09 release, that would be great.
Numan
>
> Numan
>
> > ---
> > northd/northd.c | 158 ++++++++++++++++++++++++++----------------------
> > northd/northd.h | 3 +
> > 2 files changed, 89 insertions(+), 72 deletions(-)
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 346c0c7c8..eab62b8fc 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -1358,6 +1358,10 @@ ovn_datapath_assign_requested_tnl_id(struct northd_input *input_data,
> > }
> > }
> >
> > +/* Array of all datapaths, with 'od->index' being their index in the array. */
> > +static struct ovn_datapath **datapaths_array = NULL;
> > +static size_t n_datapaths = 0; /* Size of the 'datapaths_array'. */
> > +
> > /* Updates the southbound Datapath_Binding table so that it contains the
> > * logical switches and routers specified by the northbound database.
> > *
> > @@ -1422,6 +1426,19 @@ build_datapaths(struct northd_input *input_data,
> > sbrec_datapath_binding_delete(od->sb);
> > ovn_datapath_destroy(datapaths, od);
> > }
> > +
> > + /* Assign unique sequential indexes to all datapaths. These are not
> > + * visible outside of the northd loop, so, unlike the tunnel keys, it
> > + * doesn't matter if they are different on every iteration. */
> > + size_t index = 0;
> > +
> > + n_datapaths = hmap_count(datapaths);
> > + datapaths_array = xrealloc(datapaths_array,
> > + n_datapaths * sizeof *datapaths_array);
> > + HMAP_FOR_EACH (od, key_node, datapaths) {
> > + od->index = index;
> > + datapaths_array[index++] = od;
> > + }
> > }
> >
> > /* Structure representing logical router port
> > @@ -4134,19 +4151,19 @@ build_lb_port_related_data(struct hmap *datapaths, struct hmap *ports,
> >
> >
> > struct ovn_dp_group {
> > - struct hmapx map;
> > + unsigned long *bitmap;
> > struct sbrec_logical_dp_group *dp_group;
> > struct hmap_node node;
> > };
> >
> > static struct ovn_dp_group *
> > ovn_dp_group_find(const struct hmap *dp_groups,
> > - const struct hmapx *od, uint32_t hash)
> > + const unsigned long *dpg_bitmap, uint32_t hash)
> > {
> > struct ovn_dp_group *dpg;
> >
> > HMAP_FOR_EACH_WITH_HASH (dpg, node, hash, dp_groups) {
> > - if (hmapx_equals(&dpg->map, od)) {
> > + if (bitmap_equal(dpg->bitmap, dpg_bitmap, n_datapaths)) {
> > return dpg;
> > }
> > }
> > @@ -4155,16 +4172,15 @@ ovn_dp_group_find(const struct hmap *dp_groups,
> >
> > static struct sbrec_logical_dp_group *
> > ovn_sb_insert_logical_dp_group(struct ovsdb_idl_txn *ovnsb_txn,
> > - const struct hmapx *od)
> > + const unsigned long *dpg_bitmap)
> > {
> > struct sbrec_logical_dp_group *dp_group;
> > const struct sbrec_datapath_binding **sb;
> > - const struct hmapx_node *node;
> > - int n = 0;
> > + size_t n = 0, index;
> >
> > - sb = xmalloc(hmapx_count(od) * sizeof *sb);
> > - HMAPX_FOR_EACH (node, od) {
> > - sb[n++] = ((struct ovn_datapath *) node->data)->sb;
> > + sb = xmalloc(bitmap_count1(dpg_bitmap, n_datapaths) * sizeof *sb);
> > + BITMAP_FOR_EACH_1 (index, n_datapaths, dpg_bitmap) {
> > + sb[n++] = datapaths_array[index]->sb;
> > }
> > dp_group = sbrec_logical_dp_group_insert(ovnsb_txn);
> > sbrec_logical_dp_group_set_datapaths(
> > @@ -4223,9 +4239,9 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
> > }
> >
> > struct ovn_dp_group *dpg = xzalloc(sizeof *dpg);
> > - size_t i;
> > + size_t i, n = 0;
> >
> > - hmapx_init(&dpg->map);
> > + dpg->bitmap = bitmap_allocate(n_datapaths);
> > for (i = 0; i < dp_group->n_datapaths; i++) {
> > struct ovn_datapath *datapath_od;
> >
> > @@ -4234,18 +4250,19 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
> > if (!datapath_od || ovn_datapath_is_stale(datapath_od)) {
> > break;
> > }
> > - hmapx_add(&dpg->map, datapath_od);
> > + bitmap_set1(dpg->bitmap, datapath_od->index);
> > + n++;
> > }
> > if (i == dp_group->n_datapaths) {
> > - uint32_t hash = hash_int(hmapx_count(&dpg->map), 0);
> > + uint32_t hash = hash_int(n, 0);
> >
> > - if (!ovn_dp_group_find(&dp_groups, &dpg->map, hash)) {
> > + if (!ovn_dp_group_find(&dp_groups, dpg->bitmap, hash)) {
> > dpg->dp_group = dp_group;
> > hmap_insert(&dp_groups, &dpg->node, hash);
> > continue;
> > }
> > }
> > - hmapx_destroy(&dpg->map);
> > + bitmap_free(dpg->bitmap);
> > free(dpg);
> > }
> > hmapx_destroy(&existing_lbs);
> > @@ -4278,23 +4295,25 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
> > }
> >
> > /* Find datapath group for this load balancer. */
> > - struct hmapx lb_dps = HMAPX_INITIALIZER(&lb_dps);
> > + unsigned long *lb_dps_bitmap;
> > struct ovn_dp_group *dpg;
> > uint32_t hash;
> >
> > + lb_dps_bitmap = bitmap_allocate(n_datapaths);
> > for (size_t i = 0; i < lb->n_nb_ls; i++) {
> > - hmapx_add(&lb_dps, lb->nb_ls[i]);
> > + bitmap_set1(lb_dps_bitmap, lb->nb_ls[i]->index);
> > }
> >
> > - hash = hash_int(hmapx_count(&lb_dps), 0);
> > - dpg = ovn_dp_group_find(&dp_groups, &lb_dps, hash);
> > + hash = hash_int(bitmap_count1(lb_dps_bitmap, n_datapaths), 0);
> > + dpg = ovn_dp_group_find(&dp_groups, lb_dps_bitmap, hash);
> > if (!dpg) {
> > dpg = xzalloc(sizeof *dpg);
> > - dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &lb_dps);
> > - hmapx_clone(&dpg->map, &lb_dps);
> > + dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn,
> > + lb_dps_bitmap);
> > + dpg->bitmap = bitmap_clone(lb_dps_bitmap, n_datapaths);
> > hmap_insert(&dp_groups, &dpg->node, hash);
> > }
> > - hmapx_destroy(&lb_dps);
> > + bitmap_free(lb_dps_bitmap);
> >
> > /* Update columns. */
> > sbrec_load_balancer_set_name(lb->slb, lb->nlb->name);
> > @@ -4309,7 +4328,7 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
> >
> > struct ovn_dp_group *dpg;
> > HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
> > - hmapx_destroy(&dpg->map);
> > + bitmap_free(dpg->bitmap);
> > free(dpg);
> > }
> > hmap_destroy(&dp_groups);
> > @@ -4863,8 +4882,8 @@ struct ovn_lflow {
> > struct hmap_node hmap_node;
> >
> > struct ovn_datapath *od; /* 'logical_datapath' in SB schema. */
> > - struct ovs_mutex odg_lock; /* Lock guarding access to od_group */
> > - struct hmapx od_group; /* Hash map of 'struct ovn_datapath *'. */
> > + struct ovs_mutex dpg_lock; /* Lock guarding access to 'dpg_bitmap' */
> > + unsigned long *dpg_bitmap; /* Bitmap of all datapaths by their 'index'.*/
> > enum ovn_stage stage;
> > uint16_t priority;
> > char *match;
> > @@ -4919,7 +4938,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
> > char *match, char *actions, char *io_port, char *ctrl_meter,
> > char *stage_hint, const char *where)
> > {
> > - hmapx_init(&lflow->od_group);
> > + lflow->dpg_bitmap = bitmap_allocate(n_datapaths);
> > lflow->od = od;
> > lflow->stage = stage;
> > lflow->priority = priority;
> > @@ -4931,7 +4950,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
> > lflow->dpg = NULL;
> > lflow->where = where;
> > if (parallelization_state != STATE_NULL) {
> > - ovs_mutex_init(&lflow->odg_lock);
> > + ovs_mutex_init(&lflow->dpg_lock);
> > }
> > }
> >
> > @@ -4945,11 +4964,11 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
> > }
> >
> > if (parallelization_state == STATE_USE_PARALLELIZATION) {
> > - ovs_mutex_lock(&lflow_ref->odg_lock);
> > - hmapx_add(&lflow_ref->od_group, od);
> > - ovs_mutex_unlock(&lflow_ref->odg_lock);
> > + ovs_mutex_lock(&lflow_ref->dpg_lock);
> > + bitmap_set1(lflow_ref->dpg_bitmap, od->index);
> > + ovs_mutex_unlock(&lflow_ref->dpg_lock);
> > } else {
> > - hmapx_add(&lflow_ref->od_group, od);
> > + bitmap_set1(lflow_ref->dpg_bitmap, od->index);
> > }
> >
> > return true;
> > @@ -5035,7 +5054,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
> > io_port ? xstrdup(io_port) : NULL,
> > nullable_xstrdup(ctrl_meter),
> > ovn_lflow_hint(stage_hint), where);
> > - hmapx_add(&lflow->od_group, od);
> > + bitmap_set1(lflow->dpg_bitmap, od->index);
> > if (parallelization_state != STATE_USE_PARALLELIZATION) {
> > hmap_insert(lflow_map, &lflow->hmap_node, hash);
> > } else {
> > @@ -5166,12 +5185,12 @@ ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow)
> > {
> > if (lflow) {
> > if (parallelization_state != STATE_NULL) {
> > - ovs_mutex_destroy(&lflow->odg_lock);
> > + ovs_mutex_destroy(&lflow->dpg_lock);
> > }
> > if (lflows) {
> > hmap_remove(lflows, &lflow->hmap_node);
> > }
> > - hmapx_destroy(&lflow->od_group);
> > + bitmap_free(lflow->dpg_bitmap);
> > free(lflow->match);
> > free(lflow->actions);
> > free(lflow->io_port);
> > @@ -14172,23 +14191,25 @@ ovn_sb_set_lflow_logical_dp_group(
> > struct ovsdb_idl_txn *ovnsb_txn,
> > struct hmap *dp_groups,
> > const struct sbrec_logical_flow *sbflow,
> > - const struct hmapx *od_group)
> > + const unsigned long *dpg_bitmap)
> > {
> > struct ovn_dp_group *dpg;
> > + size_t n_ods;
> >
> > - if (!hmapx_count(od_group)) {
> > + n_ods = bitmap_count1(dpg_bitmap, n_datapaths);
> > +
> > + if (!n_ods) {
> > sbrec_logical_flow_set_logical_dp_group(sbflow, NULL);
> > return;
> > }
> >
> > - ovs_assert(hmapx_count(od_group) != 1);
> > + ovs_assert(n_ods != 1);
> >
> > - dpg = ovn_dp_group_find(dp_groups, od_group,
> > - hash_int(hmapx_count(od_group), 0));
> > + dpg = ovn_dp_group_find(dp_groups, dpg_bitmap, hash_int(n_ods, 0));
> > ovs_assert(dpg != NULL);
> >
> > if (!dpg->dp_group) {
> > - dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &dpg->map);
> > + dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, dpg->bitmap);
> > }
> > sbrec_logical_flow_set_logical_dp_group(sbflow, dpg->dp_group);
> > }
> > @@ -14277,21 +14298,22 @@ void build_lflows(struct lflow_input *input_data,
> > fast_hmap_size_for(&single_dp_lflows, max_seen_lflow_size);
> >
> > struct ovn_lflow *lflow;
> > - struct hmapx_node *node;
> > HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflows) {
> > - uint32_t hash;
> > struct ovn_dp_group *dpg;
> > + uint32_t hash, n_ods;
> >
> > - ovs_assert(hmapx_count(&lflow->od_group));
> > + n_ods = bitmap_count1(lflow->dpg_bitmap, n_datapaths);
> >
> > - if (hmapx_count(&lflow->od_group) == 1) {
> > + ovs_assert(n_ods);
> > +
> > + if (n_ods == 1) {
> > /* There is only one datapath, so it should be moved out of the
> > * group to a single 'od'. */
> > - HMAPX_FOR_EACH (node, &lflow->od_group) {
> > - lflow->od = node->data;
> > - break;
> > - }
> > - hmapx_clear(&lflow->od_group);
> > + size_t index = bitmap_scan(lflow->dpg_bitmap, true, 0,
> > + n_datapaths);
> > +
> > + bitmap_set0(lflow->dpg_bitmap, index);
> > + lflow->od = datapaths_array[index];
> >
> > /* Logical flow should be re-hashed to allow lookups. */
> > hash = hmap_node_hash(&lflow->hmap_node);
> > @@ -14304,11 +14326,11 @@ void build_lflows(struct lflow_input *input_data,
> > continue;
> > }
> >
> > - hash = hash_int(hmapx_count(&lflow->od_group), 0);
> > - dpg = ovn_dp_group_find(&dp_groups, &lflow->od_group, hash);
> > + hash = hash_int(n_ods, 0);
> > + dpg = ovn_dp_group_find(&dp_groups, lflow->dpg_bitmap, hash);
> > if (!dpg) {
> > dpg = xzalloc(sizeof *dpg);
> > - hmapx_clone(&dpg->map, &lflow->od_group);
> > + dpg->bitmap = bitmap_clone(lflow->dpg_bitmap, n_datapaths);
> > hmap_insert(&dp_groups, &dpg->node, hash);
> > }
> > lflow->dpg = dpg;
> > @@ -14409,36 +14431,28 @@ void build_lflows(struct lflow_input *input_data,
> > } else {
> > /* There is a datapath group and we need to perform
> > * a full comparison. */
> > - struct ovn_datapath **od;
> > - int n_datapaths = 0;
> > + unsigned long *dpg_bitmap;
> > + struct ovn_datapath *od;
> >
> > - od = xmalloc(dp_group->n_datapaths * sizeof *od);
> > + dpg_bitmap = bitmap_allocate(n_datapaths);
> > /* Check all logical datapaths from the group. */
> > for (i = 0; i < dp_group->n_datapaths; i++) {
> > - od[n_datapaths] = ovn_datapath_from_sbrec(
> > + od = ovn_datapath_from_sbrec(
> > input_data->datapaths, dp_group->datapaths[i]);
> > - if (!od[n_datapaths]
> > - || ovn_datapath_is_stale(od[n_datapaths])) {
> > + if (!od || ovn_datapath_is_stale(od)) {
> > continue;
> > }
> > - n_datapaths++;
> > + bitmap_set1(dpg_bitmap, od->index);
> > }
> >
> > - if (n_datapaths != hmapx_count(&lflow->od_group)) {
> > - update_dp_group = true;
> > - }
> > - /* Have to compare datapath groups in full. */
> > - for (i = 0; !update_dp_group && i < n_datapaths; i++) {
> > - if (od[i] && !hmapx_contains(&lflow->od_group, od[i])) {
> > - update_dp_group = true;
> > - }
> > - }
> > - free(od);
> > + update_dp_group = !bitmap_equal(dpg_bitmap, lflow->dpg_bitmap,
> > + n_datapaths);
> > + bitmap_free(dpg_bitmap);
> > }
> >
> > if (update_dp_group) {
> > ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
> > - sbflow, &lflow->od_group);
> > + sbflow, lflow->dpg_bitmap);
> > } else if (lflow->dpg && !lflow->dpg->dp_group) {
> > /* Setting relation between unique datapath group and
> > * Sb DB datapath goup. */
> > @@ -14462,7 +14476,7 @@ void build_lflows(struct lflow_input *input_data,
> > sbrec_logical_flow_set_logical_datapath(sbflow, lflow->od->sb);
> > }
> > ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
> > - sbflow, &lflow->od_group);
> > + sbflow, lflow->dpg_bitmap);
> > sbrec_logical_flow_set_pipeline(sbflow, pipeline);
> > sbrec_logical_flow_set_table_id(sbflow, table);
> > sbrec_logical_flow_set_priority(sbflow, lflow->priority);
> > @@ -14503,7 +14517,7 @@ void build_lflows(struct lflow_input *input_data,
> >
> > struct ovn_dp_group *dpg;
> > HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
> > - hmapx_destroy(&dpg->map);
> > + bitmap_free(dpg->bitmap);
> > free(dpg);
> > }
> > hmap_destroy(&dp_groups);
> > diff --git a/northd/northd.h b/northd/northd.h
> > index 8d299864f..fb903102b 100644
> > --- a/northd/northd.h
> > +++ b/northd/northd.h
> > @@ -180,6 +180,9 @@ struct ovn_datapath {
> > struct hmap_node key_node; /* Index on 'key'. */
> > struct uuid key; /* (nbs/nbr)->header_.uuid. */
> >
> > + size_t index; /* A unique index across all datapaths.
> > + * Datapath indexes are sequential and start from zero. */
> > +
> > const struct nbrec_logical_switch *nbs; /* May be NULL. */
> > const struct nbrec_logical_router *nbr; /* May be NULL. */
> > const struct sbrec_datapath_binding *sb; /* May be NULL. */
> > --
> > 2.34.3
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >