On Tue, Sep 13, 2022 at 7:34 AM Ilya Maximets <[email protected]> wrote:
>
> Every time a new logical flow is created, ovn-northd creates an
> exclusive hash map (hmapx) to collect all the datapaths this logical
> flow is applicable to.  While adding a new datapath to an existing
> logical flow, the hmapx performs a lookup by hash and then adds a new
> entry by allocating a piece of memory and putting it into the hash
> map.  Once all the logical flows are created and all the datapath
> groups are ready, northd collects all the unique groups by comparing
> them to each other, destroying duplicates and replacing them with a
> pointer to the one true group.  After that, northd analyzes all the
> existing flows in the Sb DB and compares their groups to the ones
> that each flow should have.  At the end, all the remaining exclusive
> hash maps containing datapath groups are destroyed.
>
> The process is fairly inefficient.  The number of unique datapath
> groups is typically much smaller than the number of logical flows.
> So, in high-scale environments we may end up generating hundreds of
> thousands of equal groups just to discard them later.
> Every element in a group is a separately allocated piece of memory
> added to a hash map.  This imposes a significant cost of allocating
> them and freeing them later.  Comparing two hash maps for equality
> involves looking up every element of one hash map in the other,
> which is not a very fast operation either.  Hash maps are also
> inherently cache-unfriendly, so every access during iteration is
> likely to result in a cache miss.
>
> As a result, all these datapath group manipulations end up taking
> most of the processing time in northd.
>
> Let's enumerate all the logical datapaths we have, assigning each a
> unique index from zero to the total number of datapaths, and create
> an array of datapaths for quick access by index.
> Having that, instead of collecting datapath pointers in hash maps,
> we can use bitmaps and turn on the bits that correspond to the
> datapaths relevant to a particular group.  Later we can reconstruct
> the group by iterating over the bitmap and taking the required
> datapaths from the array.
>
> Advantages:
>
> - Bitmaps are relatively small.  Only 125 bytes are needed to
>   accommodate a setup with 1000 datapaths.  At the same time, an
>   hmapx with just one element ends up with 2 separate memory
>   allocations: at least 4 buckets and a hmapx_node.  In total that
>   can be less than a bitmap with only one datapath, but only by
>   2-3x, and the comparison becomes irrelevant as soon as there is at
>   least one big datapath group.
>
> - Bitmaps are trivial to compare.  A simple memcmp does the trick,
>   and memcmp has one of the most optimized implementations for the
>   actual hardware in use.
>
> - Flipping bits, or checking whether a certain bit is set, takes
>   constant time.  Asymptotically the same as in hash maps, but the
>   constant is much lower.
>
> - Cache-friendly.  Depending on the system and the scale, the whole
>   map can fit into a single cache line or two.  All the data is
>   sequential in memory.
>
> - Reduced number of memory allocations/deallocations.  A bitmap is
>   allocated and freed only once, as a single piece.  hmapx allocates
>   memory for every datapath.
>
> Disadvantage:
>
> - Counting the number of bits in a bitmap is a bit heavier than just
>   calling hmapx_count(), but we're not doing that very frequently,
>   so it doesn't seem to impact performance in any meaningful way.
>
> Testing with ovn-heater shows up to 3x performance improvement for a
> 500-node density-heavy scenario.  In this case, northd poll
> intervals went down from 10 to 3 seconds, driving the test's maximum
> latency from 21 down to 6.5 seconds, and the overall test time went
> down from 2400 to 720 seconds.  Memory consumption for the northd
> process also dropped from 3.9 GB to 1.6 GB RSS.
>
> One design consideration was whether to make the 'datapaths_array'
> and 'n_datapaths' variables global.  Logically, from the perspective
> of the I-P engine, they should be part of the northd-generated
> 'data'.  That's not a problem for the array itself, since it is only
> used in a couple of places.  However, 'n_datapaths' is used
> practically everywhere, including ovn_lflow_add(), so we would have
> to pass it around as an argument to most of the functions in northd.
> Taking that into account, I decided to make them both global to
> avoid touching half of the northd code.
> We also have a couple of other global variables already (mainly for
> parallel processing) that should technically be part of the I-P
> data, but it is way simpler to keep them global.
>
> Signed-off-by: Ilya Maximets <[email protected]>
Thanks Ilya for these huge improvements.  The patch LGTM.

Acked-by: Numan Siddique <[email protected]>

Numan

> ---
>  northd/northd.c | 158 ++++++++++++++++++++++++++----------------------
>  northd/northd.h |   3 +
>  2 files changed, 89 insertions(+), 72 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 346c0c7c8..eab62b8fc 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -1358,6 +1358,10 @@ ovn_datapath_assign_requested_tnl_id(struct northd_input *input_data,
>      }
>  }
>
> +/* Array of all datapaths, with 'od->index' being their index in the array. */
> +static struct ovn_datapath **datapaths_array = NULL;
> +static size_t n_datapaths = 0; /* Size of the 'datapaths_array'. */
> +
>  /* Updates the southbound Datapath_Binding table so that it contains the
>   * logical switches and routers specified by the northbound database.
>   *
> @@ -1422,6 +1426,19 @@ build_datapaths(struct northd_input *input_data,
>          sbrec_datapath_binding_delete(od->sb);
>          ovn_datapath_destroy(datapaths, od);
>      }
> +
> +    /* Assign unique sequential indexes to all datapaths.  These are not
> +     * visible outside of the northd loop, so, unlike the tunnel keys, it
> +     * doesn't matter if they are different on every iteration. */
> +    size_t index = 0;
> +
> +    n_datapaths = hmap_count(datapaths);
> +    datapaths_array = xrealloc(datapaths_array,
> +                               n_datapaths * sizeof *datapaths_array);
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        od->index = index;
> +        datapaths_array[index++] = od;
> +    }
>  }
>
>  /* Structure representing logical router port
> @@ -4134,19 +4151,19 @@ build_lb_port_related_data(struct hmap *datapaths, struct hmap *ports,
>
>
>  struct ovn_dp_group {
> -    struct hmapx map;
> +    unsigned long *bitmap;
>      struct sbrec_logical_dp_group *dp_group;
>      struct hmap_node node;
>  };
>
>  static struct ovn_dp_group *
>  ovn_dp_group_find(const struct hmap *dp_groups,
> -                  const struct hmapx *od, uint32_t hash)
> +                  const unsigned long *dpg_bitmap, uint32_t hash)
>  {
>      struct ovn_dp_group *dpg;
>
>      HMAP_FOR_EACH_WITH_HASH (dpg, node, hash, dp_groups) {
> -        if (hmapx_equals(&dpg->map, od)) {
> +        if (bitmap_equal(dpg->bitmap, dpg_bitmap, n_datapaths)) {
>              return dpg;
>          }
>      }
> @@ -4155,16 +4172,15 @@ ovn_dp_group_find(const struct hmap *dp_groups,
>
>  static struct sbrec_logical_dp_group *
>  ovn_sb_insert_logical_dp_group(struct ovsdb_idl_txn *ovnsb_txn,
> -                               const struct hmapx *od)
> +                               const unsigned long *dpg_bitmap)
>  {
>      struct sbrec_logical_dp_group *dp_group;
>      const struct sbrec_datapath_binding **sb;
> -    const struct hmapx_node *node;
> -    int n = 0;
> +    size_t n = 0, index;
>
> -    sb = xmalloc(hmapx_count(od) * sizeof *sb);
> -    HMAPX_FOR_EACH (node, od) {
> -        sb[n++] = ((struct ovn_datapath *) node->data)->sb;
> +    sb = xmalloc(bitmap_count1(dpg_bitmap, n_datapaths) * sizeof *sb);
> +    BITMAP_FOR_EACH_1 (index, n_datapaths, dpg_bitmap) {
> +        sb[n++] = datapaths_array[index]->sb;
>      }
>      dp_group = sbrec_logical_dp_group_insert(ovnsb_txn);
>      sbrec_logical_dp_group_set_datapaths(
> @@ -4223,9 +4239,9 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>          }
>
>          struct ovn_dp_group *dpg = xzalloc(sizeof *dpg);
> -        size_t i;
> +        size_t i, n = 0;
>
> -        hmapx_init(&dpg->map);
> +        dpg->bitmap = bitmap_allocate(n_datapaths);
>          for (i = 0; i < dp_group->n_datapaths; i++) {
>              struct ovn_datapath *datapath_od;
>
> @@ -4234,18 +4250,19 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>              if (!datapath_od || ovn_datapath_is_stale(datapath_od)) {
>                  break;
>              }
> -            hmapx_add(&dpg->map, datapath_od);
> +            bitmap_set1(dpg->bitmap, datapath_od->index);
> +            n++;
>          }
>          if (i == dp_group->n_datapaths) {
> -            uint32_t hash = hash_int(hmapx_count(&dpg->map), 0);
> +            uint32_t hash = hash_int(n, 0);
>
> -            if (!ovn_dp_group_find(&dp_groups, &dpg->map, hash)) {
> +            if (!ovn_dp_group_find(&dp_groups, dpg->bitmap, hash)) {
>                  dpg->dp_group = dp_group;
>                  hmap_insert(&dp_groups, &dpg->node, hash);
>                  continue;
>              }
>          }
> -        hmapx_destroy(&dpg->map);
> +        bitmap_free(dpg->bitmap);
>          free(dpg);
>      }
>      hmapx_destroy(&existing_lbs);
> @@ -4278,23 +4295,25 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>          }
>
>          /* Find datapath group for this load balancer. */
> -        struct hmapx lb_dps = HMAPX_INITIALIZER(&lb_dps);
> +        unsigned long *lb_dps_bitmap;
>          struct ovn_dp_group *dpg;
>          uint32_t hash;
>
> +        lb_dps_bitmap = bitmap_allocate(n_datapaths);
>          for (size_t i = 0; i < lb->n_nb_ls; i++) {
> -            hmapx_add(&lb_dps, lb->nb_ls[i]);
> +            bitmap_set1(lb_dps_bitmap, lb->nb_ls[i]->index);
>          }
>
> -        hash = hash_int(hmapx_count(&lb_dps), 0);
> -        dpg = ovn_dp_group_find(&dp_groups, &lb_dps, hash);
> +        hash = hash_int(bitmap_count1(lb_dps_bitmap, n_datapaths), 0);
> +        dpg = ovn_dp_group_find(&dp_groups, lb_dps_bitmap, hash);
>          if (!dpg) {
>              dpg = xzalloc(sizeof *dpg);
> -            dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &lb_dps);
> -            hmapx_clone(&dpg->map, &lb_dps);
> +            dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn,
> +                                                           lb_dps_bitmap);
> +            dpg->bitmap = bitmap_clone(lb_dps_bitmap, n_datapaths);
>              hmap_insert(&dp_groups, &dpg->node, hash);
>          }
> -        hmapx_destroy(&lb_dps);
> +        bitmap_free(lb_dps_bitmap);
>
>          /* Update columns. */
>          sbrec_load_balancer_set_name(lb->slb, lb->nlb->name);
> @@ -4309,7 +4328,7 @@ sync_lbs(struct northd_input *input_data, struct ovsdb_idl_txn *ovnsb_txn,
>
>      struct ovn_dp_group *dpg;
>      HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
> -        hmapx_destroy(&dpg->map);
> +        bitmap_free(dpg->bitmap);
>          free(dpg);
>      }
>      hmap_destroy(&dp_groups);
> @@ -4863,8 +4882,8 @@ struct ovn_lflow {
>      struct hmap_node hmap_node;
>
>      struct ovn_datapath *od;     /* 'logical_datapath' in SB schema. */
> -    struct ovs_mutex odg_lock;   /* Lock guarding access to od_group */
> -    struct hmapx od_group;       /* Hash map of 'struct ovn_datapath *'. */
> +    struct ovs_mutex dpg_lock;   /* Lock guarding access to 'dpg_bitmap' */
> +    unsigned long *dpg_bitmap;   /* Bitmap of all datapaths by their 'index'.*/
>      enum ovn_stage stage;
>      uint16_t priority;
>      char *match;
> @@ -4919,7 +4938,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
>                 char *match, char *actions, char *io_port, char *ctrl_meter,
>                 char *stage_hint, const char *where)
>  {
> -    hmapx_init(&lflow->od_group);
> +    lflow->dpg_bitmap = bitmap_allocate(n_datapaths);
>      lflow->od = od;
>      lflow->stage = stage;
>      lflow->priority = priority;
> @@ -4931,7 +4950,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
>      lflow->dpg = NULL;
>      lflow->where = where;
>      if (parallelization_state != STATE_NULL) {
> -        ovs_mutex_init(&lflow->odg_lock);
> +        ovs_mutex_init(&lflow->dpg_lock);
>      }
>  }
>
> @@ -4945,11 +4964,11 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
>      }
>
>      if (parallelization_state == STATE_USE_PARALLELIZATION) {
> -        ovs_mutex_lock(&lflow_ref->odg_lock);
> -        hmapx_add(&lflow_ref->od_group, od);
> -        ovs_mutex_unlock(&lflow_ref->odg_lock);
> +        ovs_mutex_lock(&lflow_ref->dpg_lock);
> +        bitmap_set1(lflow_ref->dpg_bitmap, od->index);
> +        ovs_mutex_unlock(&lflow_ref->dpg_lock);
>      } else {
> -        hmapx_add(&lflow_ref->od_group, od);
> +        bitmap_set1(lflow_ref->dpg_bitmap, od->index);
>      }
>
>      return true;
> @@ -5035,7 +5054,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
>                     io_port ? xstrdup(io_port) : NULL,
>                     nullable_xstrdup(ctrl_meter),
>                     ovn_lflow_hint(stage_hint), where);
> -    hmapx_add(&lflow->od_group, od);
> +    bitmap_set1(lflow->dpg_bitmap, od->index);
>      if (parallelization_state != STATE_USE_PARALLELIZATION) {
>          hmap_insert(lflow_map, &lflow->hmap_node, hash);
>      } else {
> @@ -5166,12 +5185,12 @@ ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow)
>  {
>      if (lflow) {
>          if (parallelization_state != STATE_NULL) {
> -            ovs_mutex_destroy(&lflow->odg_lock);
> +            ovs_mutex_destroy(&lflow->dpg_lock);
>          }
>          if (lflows) {
>              hmap_remove(lflows, &lflow->hmap_node);
>          }
> -        hmapx_destroy(&lflow->od_group);
> +        bitmap_free(lflow->dpg_bitmap);
>          free(lflow->match);
>          free(lflow->actions);
>          free(lflow->io_port);
> @@ -14172,23 +14191,25 @@ ovn_sb_set_lflow_logical_dp_group(
>      struct ovsdb_idl_txn *ovnsb_txn,
>      struct hmap *dp_groups,
>      const struct sbrec_logical_flow *sbflow,
> -    const struct hmapx *od_group)
> +    const unsigned long *dpg_bitmap)
>  {
>      struct ovn_dp_group *dpg;
> +    size_t n_ods;
>
> -    if (!hmapx_count(od_group)) {
> +    n_ods = bitmap_count1(dpg_bitmap, n_datapaths);
> +
> +    if (!n_ods) {
>          sbrec_logical_flow_set_logical_dp_group(sbflow, NULL);
>          return;
>      }
>
> -    ovs_assert(hmapx_count(od_group) != 1);
> +    ovs_assert(n_ods != 1);
>
> -    dpg = ovn_dp_group_find(dp_groups, od_group,
> -                            hash_int(hmapx_count(od_group), 0));
> +    dpg = ovn_dp_group_find(dp_groups, dpg_bitmap, hash_int(n_ods, 0));
>      ovs_assert(dpg != NULL);
>
>      if (!dpg->dp_group) {
> -        dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn, &dpg->map);
> +        dpg->dp_group = ovn_sb_insert_logical_dp_group(ovnsb_txn,
> +                                                       dpg->bitmap);
>      }
>      sbrec_logical_flow_set_logical_dp_group(sbflow, dpg->dp_group);
>  }
> @@ -14277,21 +14298,22 @@ void build_lflows(struct lflow_input *input_data,
>      fast_hmap_size_for(&single_dp_lflows, max_seen_lflow_size);
>
>      struct ovn_lflow *lflow;
> -    struct hmapx_node *node;
>      HMAP_FOR_EACH_SAFE (lflow, hmap_node, &lflows) {
> -        uint32_t hash;
>          struct ovn_dp_group *dpg;
> +        uint32_t hash, n_ods;
>
> -        ovs_assert(hmapx_count(&lflow->od_group));
> +        n_ods = bitmap_count1(lflow->dpg_bitmap, n_datapaths);
>
> -        if (hmapx_count(&lflow->od_group) == 1) {
> +        ovs_assert(n_ods);
> +
> +        if (n_ods == 1) {
>              /* There is only one datapath, so it should be moved out of the
>               * group to a single 'od'. */
> -            HMAPX_FOR_EACH (node, &lflow->od_group) {
> -                lflow->od = node->data;
> -                break;
> -            }
> -            hmapx_clear(&lflow->od_group);
> +            size_t index = bitmap_scan(lflow->dpg_bitmap, true, 0,
> +                                       n_datapaths);
> +
> +            bitmap_set0(lflow->dpg_bitmap, index);
> +            lflow->od = datapaths_array[index];
>
>              /* Logical flow should be re-hashed to allow lookups. */
>              hash = hmap_node_hash(&lflow->hmap_node);
> @@ -14304,11 +14326,11 @@ void build_lflows(struct lflow_input *input_data,
>              continue;
>          }
>
> -        hash = hash_int(hmapx_count(&lflow->od_group), 0);
> -        dpg = ovn_dp_group_find(&dp_groups, &lflow->od_group, hash);
> +        hash = hash_int(n_ods, 0);
> +        dpg = ovn_dp_group_find(&dp_groups, lflow->dpg_bitmap, hash);
>          if (!dpg) {
>              dpg = xzalloc(sizeof *dpg);
> -            hmapx_clone(&dpg->map, &lflow->od_group);
> +            dpg->bitmap = bitmap_clone(lflow->dpg_bitmap, n_datapaths);
>              hmap_insert(&dp_groups, &dpg->node, hash);
>          }
>          lflow->dpg = dpg;
> @@ -14409,36 +14431,28 @@ void build_lflows(struct lflow_input *input_data,
>              } else {
>                  /* There is a datapath group and we need to perform
>                   * a full comparison. */
> -                struct ovn_datapath **od;
> -                int n_datapaths = 0;
> +                unsigned long *dpg_bitmap;
> +                struct ovn_datapath *od;
>
> -                od = xmalloc(dp_group->n_datapaths * sizeof *od);
> +                dpg_bitmap = bitmap_allocate(n_datapaths);
>                  /* Check all logical datapaths from the group. */
>                  for (i = 0; i < dp_group->n_datapaths; i++) {
> -                    od[n_datapaths] = ovn_datapath_from_sbrec(
> +                    od = ovn_datapath_from_sbrec(
>                          input_data->datapaths, dp_group->datapaths[i]);
> -                    if (!od[n_datapaths]
> -                        || ovn_datapath_is_stale(od[n_datapaths])) {
> +                    if (!od || ovn_datapath_is_stale(od)) {
>                          continue;
>                      }
> -                    n_datapaths++;
> +                    bitmap_set1(dpg_bitmap, od->index);
>                  }
>
> -                if (n_datapaths != hmapx_count(&lflow->od_group)) {
> -                    update_dp_group = true;
> -                }
> -                /* Have to compare datapath groups in full. */
> -                for (i = 0; !update_dp_group && i < n_datapaths; i++) {
> -                    if (od[i] && !hmapx_contains(&lflow->od_group, od[i])) {
> -                        update_dp_group = true;
> -                    }
> -                }
> -                free(od);
> +                update_dp_group = !bitmap_equal(dpg_bitmap, lflow->dpg_bitmap,
> +                                                n_datapaths);
> +                bitmap_free(dpg_bitmap);
>              }
>
>              if (update_dp_group) {
>                  ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
> -                                                  sbflow, &lflow->od_group);
> +                                                  sbflow, lflow->dpg_bitmap);
>              } else if (lflow->dpg && !lflow->dpg->dp_group) {
>                  /* Setting relation between unique datapath group and
>                   * Sb DB datapath goup. */
> @@ -14462,7 +14476,7 @@ void build_lflows(struct lflow_input *input_data,
>                  sbrec_logical_flow_set_logical_datapath(sbflow, lflow->od->sb);
>              }
>              ovn_sb_set_lflow_logical_dp_group(ovnsb_txn, &dp_groups,
> -                                              sbflow, &lflow->od_group);
> +                                              sbflow, lflow->dpg_bitmap);
>              sbrec_logical_flow_set_pipeline(sbflow, pipeline);
>              sbrec_logical_flow_set_table_id(sbflow, table);
>              sbrec_logical_flow_set_priority(sbflow, lflow->priority);
> @@ -14503,7 +14517,7 @@ void build_lflows(struct lflow_input *input_data,
>
>      struct ovn_dp_group *dpg;
>      HMAP_FOR_EACH_POP (dpg, node, &dp_groups) {
> -        hmapx_destroy(&dpg->map);
> +        bitmap_free(dpg->bitmap);
>          free(dpg);
>      }
>      hmap_destroy(&dp_groups);
> diff --git a/northd/northd.h b/northd/northd.h
> index 8d299864f..fb903102b 100644
> --- a/northd/northd.h
> +++ b/northd/northd.h
> @@ -180,6 +180,9 @@ struct ovn_datapath {
>      struct hmap_node key_node;  /* Index on 'key'. */
>      struct uuid key;            /* (nbs/nbr)->header_.uuid. */
>
> +    size_t index;   /* A unique index across all datapaths.
> +                     * Datapath indexes are sequential and start from zero. */
> +
>      const struct nbrec_logical_switch *nbs;  /* May be NULL. */
>      const struct nbrec_logical_router *nbr;  /* May be NULL. */
>      const struct sbrec_datapath_binding *sb; /* May be NULL. */
> --
> 2.34.3
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
