On 10/9/23 19:23, Ilya Maximets wrote:
> On 10/9/23 17:39, Ilya Maximets wrote:
>> On 10/4/23 18:33, Dumitru Ceara wrote:
>>> OVS actually supports way more.  Use the real numbers instead.
>>> To avoid preallocating huge bitmaps for the group/meter IDs, switch to
>>> id-pool instead (as suggested by Ilya).
>>>
>>> Reported-at: https://issues.redhat.com/browse/FDP-70
>>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org>
>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>>> ---
>>> V2:
>>> - Addressed Ilya's comment: fixed the way id_pool_create() is called.
>>> - renamed max_size to max_id, it's more accurate.
>>> ---
>>>  controller/ovn-controller.c |  4 ++--
>>>  lib/extend-table.c          | 28 ++++++++++++----------------
>>>  lib/extend-table.h          |  7 +++----
>>>  tests/test-ovn.c            |  4 ++--
>>>  4 files changed, 19 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index 9a81f1a80f..50a101f0e8 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -3936,8 +3936,8 @@ en_lflow_output_init(struct engine_node *node 
>>> OVS_UNUSED,
>>>  {
>>>      struct ed_type_lflow_output *data = xzalloc(sizeof *data);
>>>      ovn_desired_flow_table_init(&data->flow_table);
>>> -    ovn_extend_table_init(&data->group_table);
>>> -    ovn_extend_table_init(&data->meter_table);
>>> +    ovn_extend_table_init(&data->group_table, OFPG_MAX);
>>> +    ovn_extend_table_init(&data->meter_table, OFPM13_MAX);
>>>      objdep_mgr_init(&data->lflow_deps_mgr);
>>>      lflow_conj_ids_init(&data->conj_ids);
>>>      uuidset_init(&data->objs_processed);
>>> diff --git a/lib/extend-table.c b/lib/extend-table.c
>>> index ebb1a054cb..90a1cf92b2 100644
>>> --- a/lib/extend-table.c
>>> +++ b/lib/extend-table.c
>>> @@ -17,7 +17,6 @@
>>>  #include <config.h>
>>>  #include <string.h>
>>>  
>>> -#include "bitmap.h"
>>>  #include "extend-table.h"
>>>  #include "hash.h"
>>>  #include "lib/uuid.h"
>>> @@ -30,10 +29,10 @@ ovn_extend_table_delete_desired(struct ovn_extend_table 
>>> *table,
>>>                                  struct ovn_extend_table_lflow_to_desired 
>>> *l);
>>>  
>>>  void
>>> -ovn_extend_table_init(struct ovn_extend_table *table)
>>> +ovn_extend_table_init(struct ovn_extend_table *table, uint32_t max_id)
>>>  {
>>> -    table->table_ids = bitmap_allocate(MAX_EXT_TABLE_ID);
>>> -    bitmap_set1(table->table_ids, 0); /* table id 0 is invalid. */
>>> +    /* Table id 0 is invalid, set id-pool base to 1. */
>>> +    table->table_ids = id_pool_create(1, max_id);
>>>      hmap_init(&table->desired);
>>>      hmap_init(&table->lflow_to_desired);
>>>      hmap_init(&table->existing);
>>> @@ -192,7 +191,7 @@ ovn_extend_table_clear(struct ovn_extend_table *table, 
>>> bool existing)
>>>              g->peer->peer = NULL;
>>>          } else {
>>>              /* Unset the bitmap because the peer is deleted already. */
>>
>> bitmap?
>>

Oops, I was too much in a hurry I guess.

>>> -            bitmap_set0(table->table_ids, g->table_id);
>>> +            id_pool_free_id(table->table_ids, g->table_id);
>>
>> Just to double check.  This is called only once per id?
>> Asking because it's fine to clear the same bit multiple
>> times, but it's not OK to free the same id-pool ID twice.
> 
> Hmm.  This is not a problem for id-pool.  I mixed up things with id-fpool.
> Basic id-pool handles double free case just fine.
> 

+1

>>
>>>          }
>>>          ovn_extend_table_info_destroy(g);
>>>      }
>>> @@ -206,7 +205,7 @@ ovn_extend_table_destroy(struct ovn_extend_table *table)
>>>      hmap_destroy(&table->lflow_to_desired);
>>>      ovn_extend_table_clear(table, true);
>>>      hmap_destroy(&table->existing);
>>> -    bitmap_free(table->table_ids);
>>> +    id_pool_destroy(table->table_ids);
>>>  }
>>>  
>>>  /* Remove an entry from existing table */
>>> @@ -221,7 +220,7 @@ ovn_extend_table_remove_existing(struct 
>>> ovn_extend_table *table,
>>>          existing->peer->peer = NULL;
>>>      } else {
>>>          /* Dealloc the ID. */
>>> -        bitmap_set0(table->table_ids, existing->table_id);
>>> +        id_pool_free_id(table->table_ids, existing->table_id);
>>>      }
>>>      ovn_extend_table_info_destroy(existing);
>>>  }
>>> @@ -242,7 +241,7 @@ ovn_extend_table_delete_desired(struct ovn_extend_table 
>>> *table,
>>>              if (e->peer) {
>>>                  e->peer->peer = NULL;
>>>              } else {
>>> -                bitmap_set0(table->table_ids, e->table_id);
>>> +                id_pool_free_id(table->table_ids, e->table_id);
>>>              }
>>>              ovn_extend_table_info_destroy(e);
>>>          }
>>> @@ -320,15 +319,12 @@ ovn_extend_table_assign_id(struct ovn_extend_table 
>>> *table, const char *name,
>>
>> Comment for this function is also talking about bitmap.
>>

Yep, I'll change that too.

>>>  
>>>      if (!existing_info) {
>>>          /* Reserve a new id. */
>>> -        table_id = bitmap_scan(table->table_ids, 0, 1, MAX_EXT_TABLE_ID + 
>>> 1);
>>> -    }
>>> -
>>> -    if (table_id == MAX_EXT_TABLE_ID + 1) {
>>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>> -        VLOG_ERR_RL(&rl, "%"PRIu32" out of table ids.", table_id);
>>> -        return EXT_TABLE_ID_INVALID;
>>> +        if (!id_pool_alloc_id(table->table_ids, &table_id)) {
>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>
>> Maybe an empty line here?
>>

Sure, will do.

>>> +            VLOG_ERR_RL(&rl, "%"PRIu32" out of table ids.", table_id);
>>
>> Table ID is always zero at this point, so the error message
>> is not very informative.  Not that it was particularly useful
>> before...
>>
>>> +            return EXT_TABLE_ID_INVALID;
>>> +        }
>>>      }
>>> -    bitmap_set1(table->table_ids, table_id);
>>>  
>>>      table_info = ovn_extend_table_info_alloc(name, table_id, existing_info,
>>>                                               hash);
>>
>> So, we're creating a new hash map entry here with pre-existing ID.
>> Wouldn't this cause a double free for that ID later?
> 
> Again, not a problem for id-pool.
> 

:)

>>
>>> diff --git a/lib/extend-table.h b/lib/extend-table.h
>>> index b43a146b4f..56d54afe93 100644
>>> --- a/lib/extend-table.h
>>> +++ b/lib/extend-table.h
>>> @@ -17,18 +17,17 @@
>>>  #ifndef EXTEND_TABLE_H
>>>  #define EXTEND_TABLE_H 1
>>>  
>>> -#define MAX_EXT_TABLE_ID 65535
>>>  #define EXT_TABLE_ID_INVALID 0
>>>  
>>>  #include "openvswitch/hmap.h"
>>>  #include "openvswitch/list.h"
>>>  #include "openvswitch/uuid.h"
>>> +#include "id-pool.h"
>>
>> I'd say the include should be in the .c file.  In the header we should
>> only have:
>>
>> struct id_pool;
>>

I normally don't like to do this but in the code base we have both ways.
 I'll change it in the next revision.

>>>  
>>>  /* Used to manage expansion tables associated with Flow table,
>>>   * such as the Group Table or Meter Table. */
>>>  struct ovn_extend_table {
>>> -    unsigned long *table_ids;  /* Used as a bitmap with value set
>>> -                                * for allocated ids in either desired or
>>> +    struct id_pool *table_ids; /* Used to allocated ids in either desired 
>>> or
>>
>> s/allocated/allocate/
>>

OK.

>>>                                  * existing (or both).  If the same "name"
>>>                                  * exists in both desired and existing 
>>> tables,
>>>                                  * they must share the same ID.  The "peer"
>>> @@ -81,7 +80,7 @@ struct ovn_extend_table_lflow_ref {
>>>      struct ovn_extend_table_info *desired;
>>>  };
>>>  
>>> -void ovn_extend_table_init(struct ovn_extend_table *);
>>> +void ovn_extend_table_init(struct ovn_extend_table *, uint32_t max_id);
>>>  
>>>  void ovn_extend_table_destroy(struct ovn_extend_table *);
>>>  
>>> diff --git a/tests/test-ovn.c b/tests/test-ovn.c
>>> index 1f1e27b51f..e4f03806a3 100644
>>> --- a/tests/test-ovn.c
>>> +++ b/tests/test-ovn.c
>>> @@ -1300,11 +1300,11 @@ test_parse_actions(struct ovs_cmdl_context *ctx 
>>> OVS_UNUSED)
>>>  
>>>      /* Initialize group ids. */
>>>      struct ovn_extend_table group_table;
>>> -    ovn_extend_table_init(&group_table);
>>> +    ovn_extend_table_init(&group_table, OFPG_MAX);
>>>  
>>>      /* Initialize meter ids for QoS. */
>>>      struct ovn_extend_table meter_table;
>>> -    ovn_extend_table_init(&meter_table);
>>> +    ovn_extend_table_init(&meter_table, OFPM13_MAX);
>>
>> I did a bit more digging in this topic and instead of using these
>> constants we should technically ask OVS about real limits
>> with OFPMP13_METER_FEATURES_REQUEST and OFPST12_GROUP_FEATURES_REQUEST
>> respectively.  They will report how many meters and groups are actually
>> supported.
>>
>> At the same time OVN may not do that at the expense of getting and
>> OpenFlow error reply on attempt to create more meters/groups than
>> the implementation supports.  For example, userspace datapath supprts
>> 2^18 meters, and kernel allows creating meters up to filling 3% of
>> physical memory. :)
>>

To be honest, I tend to think that this is a very theoretical problem
and that in real life we will hit other harder to solve issues when
trying to scale to more than a few tens of thousand meters and a few
hundred thousand groups.. but OK.

We already have a dedicated openflow connection in ovn-controller for
probing for meature features support.  I'll extend that one to probe for
group support (and limits) too.

>> Currently OVS always reports OFPG_MAX as the maximum number of groups
>> though anyway.
>>
>> Does it matter if we fail a table ID allocation vs the OF request
>> failure later?
>>

We ignore table ID allocation failures but since 82b90fa5c006
("controller: Handle OpenFlow errors.") we don't ignore OF request
failures: ovn-controller re-connects to br-int, clears
flows/groups/meters, does a full recompute and then tries to re-install
flows/groups/meters.

In short, for this specific case, I think the outcome is the same:
unusable group/meter, except that OF request failure will generate a
re-connect loop.  So it seems "better" to fail the table ID allocation.

>> Best regards, Ilya Maximets.
> 

Thanks for the thorough review, I'll work on v3 and post it in the near
future.

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to