On 2/13/24 08:19, Han Zhou wrote:
> On Thu, Feb 8, 2024 at 1:50 PM <[email protected]> wrote:
>>
>> From: Ilya Maximets <[email protected]>
>>
>> Currently, whenever a new logical flow is created, northd allocates
>> a reference counter for every datapath in the datapath group for that
>> logical flow.  Those reference counters are necessary so that, during
>> incremental processing, a datapath is not removed from a logical flow
>> that was created from two different objects for the same datapath
>> when only one of those objects is removed.
>>
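>> For reference, these counters live in a small per-lflow hash map keyed
>> by the datapath index.  A simplified sketch of the structure (field
>> names as used in lflow-mgr.c; exact types are approximate):
>>
>>     struct dp_refcnt {
>>         struct hmap_node key_node;  /* Node in ovn_lflow's 'dp_refcnts_map',
>>                                      * hashed on 'dp_index'. */
>>         size_t dp_index;            /* Index of the datapath in the group. */
>>         size_t refcnt;              /* Number of users of this datapath
>>                                      * within the logical flow. */
>>     };
>>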
>> However, that creates a serious scalability problem.  In large scale
>> setups with tens of thousands of logical flows applied to large
>> datapath groups, we may have hundreds of millions of these reference
>> counters allocated, which is many GB of RSS just for that purpose.
>>
>> For example, in ovn-heater's cluster-density 500 node test, ovn-northd
>> started to consume up to 8.5 GB of RAM.  In the same test before the
>> introduction of reference counting, ovn-northd consumed only 2.5 GB.
>> All those memory allocations also increased CPU usage: the re-compute
>> time went up from 1.5 seconds to 6 seconds in the same test.  In the
>> end, this is about a 4x increase in both CPU and memory usage.
>>
>> Running under valgrind --tool=massif shows:
>>
>>   -------------------------------------------------------------
>>     total(B)       useful-heap(B)   extra-heap(B)    stacks(B)
>>   -------------------------------------------------------------
>>    9,416,438,200   7,401,971,556    2,014,466,644      0
>>
>>   78.61% (7,401 MB) (heap allocation functions) malloc
>>   ->45.78% (4,311 MB) xcalloc__ (util.c:124)
>>   | ->45.68% (4,301 MB) xzalloc__ (util.c:134)
>>   | | ->45.68% (4,301 MB) xzalloc (util.c:168)
>>   | |   ->40.97% (3,857 MB) dp_refcnt_use (lflow-mgr.c:1348)
>>   | |   | ->40.89% (3,850 MB) lflow_table_add_lflow (lflow-mgr.c:696)
>>   | |   | | ->12.27% (1,155 MB) build_lb_rules_pre_stateful (northd.c:7180)
>>   | |   | | ->12.27% (1,155 MB) build_lb_rules (northd.c:7658)
>>   | |   | | ->12.27% (1,155 MB) build_gw_lrouter_nat_flows_for_lb (northd.c:11054)
>>   ->28.62% (2,694 MB) xmalloc__ (util.c:140)
>>   | ->28.62% (2,694 MB) xmalloc (util.c:175)
>>   |   ->06.71% (631 MB) resize (hmap.c:100)
>>   |   | ->06.01% (565 MB) hmap_expand_at (hmap.c:175)
>>   |   | | ->05.24% (492 MB) hmap_insert_at (hmap.h:309)
>>   |   | | | ->05.24% (492 MB) dp_refcnt_use (lflow-mgr.c:1351)
>>
>> 45% of all the memory is allocated for the reference counters themselves
>> and another 7% is taken by the hash maps that hold them.  Also, more
>> than 20% of the total memory is allocation overhead (extra-heap), since
>> all these allocated objects are very small (32 B each).
>>
>> This test allocates 120 M reference counters total.
>>
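>> (Rough cross-check: 120 M counters at 32 B each is about 3.8 GB of
>> useful heap, which lines up with the ~3,857 MB attributed to
>> dp_refcnt_use() above, before counting the hash map buckets and the
>> per-allocation extra-heap overhead.)
>>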
>> However, the vast majority of all the reference counters always have
>> a value of 1, i.e. these datapaths are not used more than once.
>>
>> Defer allocation of reference counters until the datapath is used
>> for the same logical flow for the second time.  We can do that by
>> checking the current datapath group bitmap.
>>
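>> In essence, each call site in lflow_table_add_lflow() changes from
>> unconditionally taking a counter to checking the group bitmap first;
>> a simplified sketch based on the diff below ('index' stands for the
>> datapath index coming from either the bitmap iteration or
>> 'lrn->dp_index'):
>>
>>     /* Before: allocate/take a reference counter for every datapath. */
>>     dp_refcnt_use(&lflow->dp_refcnts_map, index);
>>
>>     /* After: the datapath group bitmap already records the first use,
>>      * so a counter is allocated only when the datapath is seen again. */
>>     if (bitmap_is_set(lflow->dpg_bitmap, index)) {
>>         dp_refcnt_use(&lflow->dp_refcnts_map, index);
>>     }
>>
>> Since ovn_dp_group_add_with_reference(), which sets the bitmap bits, is
>> moved to after this check (still under the hash lock), the bitmap only
>> reflects previous uses at the time the check runs.
>>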
>> With this change, the number of allocated reference counters goes
>> down from 120 M to just 12 M.  Memory consumption is reduced from
>> 8.5 GB to 2.67 GB, and the northd re-compute time drops from 6 to
>> 2.1 seconds.
>>
>> It is still a little higher than the resource usage before the
>> introduction of incremental processing for logical flows, but it is
>> fairly manageable.  Also, CPU and memory usage may be further improved
>> by reducing the number of cases where northd attempts to create
>> logical flows for the same datapaths multiple times.
>>
>> Note: the cluster-density test in ovn-heater creates new port groups
>> on every iteration and ovn-northd doesn't handle this incrementally,
>> so it always re-computes.  That's why there is no benefit from
>> northd I-P for CPU usage in this scenario.
>>
>> Fixes: a623606052ea ("northd: Refactor lflow management into a separate module.")
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
>>  northd/lflow-mgr.c | 39 ++++++++++++++++++++-------------------
>>  1 file changed, 20 insertions(+), 19 deletions(-)
>>
>> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
>> index df62cd6ab4..f70896cce7 100644
>> --- a/northd/lflow-mgr.c
>> +++ b/northd/lflow-mgr.c
>> @@ -47,8 +47,7 @@ static void ovn_lflow_destroy(struct lflow_table *lflow_table,
>>  static char *ovn_lflow_hint(const struct ovsdb_idl_row *row);
>>
>>  static struct ovn_lflow *do_ovn_lflow_add(
>> -    struct lflow_table *, const struct ovn_datapath *,
>> -    const unsigned long *dp_bitmap, size_t dp_bitmap_len, uint32_t hash,
>> +    struct lflow_table *, size_t dp_bitmap_len, uint32_t hash,
>>      enum ovn_stage stage, uint16_t priority, const char *match,
>>      const char *actions, const char *io_port,
>>      const char *ctrl_meter,
>> @@ -674,9 +673,9 @@ lflow_table_add_lflow(struct lflow_table *lflow_table,
>>
>>      hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
>>      struct ovn_lflow *lflow =
>> -        do_ovn_lflow_add(lflow_table, od, dp_bitmap,
>> -                         dp_bitmap_len, hash, stage,
>> -                         priority, match, actions,
>> +        do_ovn_lflow_add(lflow_table,
>> +                         od ? ods_size(od->datapaths) : dp_bitmap_len,
>> +                         hash, stage, priority, match, actions,
>>                           io_port, ctrl_meter, stage_hint, where);
>>
>>      if (lflow_ref) {
>> @@ -702,17 +701,24 @@ lflow_table_add_lflow(struct lflow_table *lflow_table,
>>                  ovs_assert(lrn->dpgrp_bitmap_len == dp_bitmap_len);
>>                  size_t index;
>>                  BITMAP_FOR_EACH_1 (index, dp_bitmap_len, dp_bitmap) {
>> -                    dp_refcnt_use(&lflow->dp_refcnts_map, index);
>> +                    /* Allocate a reference counter only if already used. */
>> +                    if (bitmap_is_set(lflow->dpg_bitmap, index)) {
>> +                        dp_refcnt_use(&lflow->dp_refcnts_map, index);
>> +                    }
>>                  }
>>              } else {
>> -                dp_refcnt_use(&lflow->dp_refcnts_map, lrn->dp_index);
>> +                /* Allocate a reference counter only if already used. */
>> +                if (bitmap_is_set(lflow->dpg_bitmap, lrn->dp_index)) {
>> +                    dp_refcnt_use(&lflow->dp_refcnts_map, lrn->dp_index);
>> +                }
>>              }
>>          }
>>          lrn->linked = true;
>>      }
>>
>> -    lflow_hash_unlock(hash_lock);
>> +    ovn_dp_group_add_with_reference(lflow, od, dp_bitmap, dp_bitmap_len);
>>
>> +    lflow_hash_unlock(hash_lock);
>>  }
>>
>>  void
>> @@ -946,9 +952,7 @@ ovn_lflow_destroy(struct lflow_table *lflow_table, struct ovn_lflow *lflow)
>>  }
>>
>>  static struct ovn_lflow *
>> -do_ovn_lflow_add(struct lflow_table *lflow_table,
>> -                 const struct ovn_datapath *od,
>> -                 const unsigned long *dp_bitmap, size_t dp_bitmap_len,
>> +do_ovn_lflow_add(struct lflow_table *lflow_table, size_t dp_bitmap_len,
>>                   uint32_t hash, enum ovn_stage stage, uint16_t priority,
>>                   const char *match, const char *actions,
>>                   const char *io_port, const char *ctrl_meter,
>> @@ -959,14 +963,11 @@ do_ovn_lflow_add(struct lflow_table *lflow_table,
>>      struct ovn_lflow *old_lflow;
>>      struct ovn_lflow *lflow;
>>
>> -    size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
>> -    ovs_assert(bitmap_len);
>> +    ovs_assert(dp_bitmap_len);
>>
>>      old_lflow = ovn_lflow_find(&lflow_table->entries, stage,
>>                                 priority, match, actions, ctrl_meter, hash);
>>      if (old_lflow) {
>> -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
>> -                                        bitmap_len);
>>          return old_lflow;
>>      }
>>
>> @@ -974,14 +975,12 @@ do_ovn_lflow_add(struct lflow_table *lflow_table,
>>      /* While adding new logical flows we're not setting single datapath, but
>>       * collecting a group.  'od' will be updated later for all flows with only
>>       * one datapath in a group, so it could be hashed correctly. */
>> -    ovn_lflow_init(lflow, NULL, bitmap_len, stage, priority,
>> +    ovn_lflow_init(lflow, NULL, dp_bitmap_len, stage, priority,
>>                     xstrdup(match), xstrdup(actions),
>>                     io_port ? xstrdup(io_port) : NULL,
>>                     nullable_xstrdup(ctrl_meter),
>>                     ovn_lflow_hint(stage_hint), where);
>>
>> -    ovn_dp_group_add_with_reference(lflow, od, dp_bitmap, bitmap_len);
>> -
>>      if (parallelization_state != STATE_USE_PARALLELIZATION) {
>>          hmap_insert(&lflow_table->entries, &lflow->hmap_node, hash);
>>      } else {
>> @@ -1350,8 +1349,10 @@ dp_refcnt_use(struct hmap *dp_refcnts_map, size_t dp_index)
>>      struct dp_refcnt *dp_refcnt = dp_refcnt_find(dp_refcnts_map, dp_index);
>>
>>      if (!dp_refcnt) {
>> -        dp_refcnt = xzalloc(sizeof *dp_refcnt);
>> +        dp_refcnt = xmalloc(sizeof *dp_refcnt);
>>          dp_refcnt->dp_index = dp_index;
>> +        /* Allocation is happening on the second (!) use. */
>> +        dp_refcnt->refcnt = 1;
>>
>>          hmap_insert(dp_refcnts_map, &dp_refcnt->key_node, dp_index);
>>      }
>> --
>> 2.43.0
>>
> 
> Thanks Ilya and Numan.
> Acked-by: Han Zhou <[email protected]>

Looks good to me too:

Acked-by: Dumitru Ceara <[email protected]>

Thanks!

