On Mon, Aug 15, 2022 at 11:29 PM Ales Musil <amu...@redhat.com> wrote:
>
>
>
> On Mon, Aug 15, 2022 at 6:51 PM Han Zhou <hz...@ovn.org> wrote:
>>
>>
>>
>> On Thu, Aug 11, 2022 at 7:20 AM Ales Musil <amu...@redhat.com> wrote:
>> >
>> > Add a configuration option called "mac_binding_removal_limit"
>> > to the NB global table, defaulting to 0. This option limits the
>> > number of MAC bindings that can be removed by the aging mechanism
>> > in a single transaction; 0 means the limit is disabled (i.e.,
>> > unlimited). If the limit is reached, the next removal is delayed
>> > by 10 ms. With the limit set there is a theoretical downside that
>> > the removal might never finish, but in practice this is unlikely,
>> > considering that not all routers will have aging enabled and those
>> > that do will use a reasonable threshold.
>> >
>> > Reported-at: https://bugzilla.redhat.com/2084668
>> > Signed-off-by: Ales Musil <amu...@redhat.com>
>> > ---
>> > v4: Rebase on top of current main.
>> >     Address comment from Numan.
>> > v5: Rebase on top of current main.
>> >     Address comment from Dumitru.
>> > ---
>> >  northd/inc-proc-northd.c   |  1 +
>> >  northd/mac-binding-aging.c | 33 +++++++++++++++++++++++++++++++--
>> >  ovn-nb.xml                 |  8 ++++++++
>> >  3 files changed, 40 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
>> > index fc0d9e670..54e0ad3b0 100644
>> > --- a/northd/inc-proc-northd.c
>> > +++ b/northd/inc-proc-northd.c
>> > @@ -215,6 +215,7 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
>> >      engine_add_input(&en_northd, &en_sb_load_balancer, NULL);
>> >      engine_add_input(&en_northd, &en_sb_fdb, NULL);
>> >      engine_add_input(&en_northd, &en_sb_static_mac_binding, NULL);
>> > +    engine_add_input(&en_mac_binding_aging, &en_nb_nb_global, NULL);
>> >      engine_add_input(&en_mac_binding_aging, &en_sb_mac_binding, NULL);
>> >      engine_add_input(&en_mac_binding_aging, &en_northd, NULL);
>> >      engine_add_input(&en_mac_binding_aging, &en_mac_binding_aging_waker, NULL);
>> > diff --git a/northd/mac-binding-aging.c b/northd/mac-binding-aging.c
>> > index 3859c050b..36d0a6fd7 100644
>> > --- a/northd/mac-binding-aging.c
>> > +++ b/northd/mac-binding-aging.c
>> > @@ -28,6 +28,8 @@
>> >
>> >  VLOG_DEFINE_THIS_MODULE(mac_binding_aging);
>> >
>> > +#define MAC_BINDING_BULK_REMOVAL_DELAY_MSEC 10
>>
>
> Hi Han,
>
>>
>> Thanks Ales. I just have a concern about this delay value; 10 ms seems
>> really too short. If I understand correctly, the delay is to avoid too
>> many MAC binding deletions at the same time triggering lots of
>> MAC-binding learning and repopulation back to the SB (please correct me
>> if I am wrong, and it would be better to add a comment explaining the
>> motivation in the code, too).
>
>
> Your understanding is right; yes, we want to avoid too much pressure on
> the SB and controllers if we remove everything at once and repopulate it
> again.
>
>>
>> If we think about control plane latency at scale, it is easy to see
>> latencies larger than hundreds of ms or even seconds. So waiting 10 ms is
>> nothing, and the real pacing depends on how long it takes ovn-northd to
>> execute an iteration. The whole feature is just to keep MAC-binding
>> entries from growing forever, so deleting some entries with a delay of
>> seconds or even minutes is not going to impact anything. If the purpose
>> here is to avoid a scaling problem, shall we set the delay to at least
>> the 5 s level?
>
>
> Actually this can still happen only once per iteration; we can raise it,
> but if the loop takes a few seconds to run, it will be delayed by that
> value.

So do you mean it is purposely set to a small value, hoping that the
slowness of the northd loop itself acts as the real delay? I doubt this
serves the purpose. If there are no other changes in the NB/SB, northd's
I-P engine would avoid recompute and the loop should complete very fast,
which would end up with the mac-aging node executed again for the next
batch in just 10 ms, right? Considering that the SB may take much longer
to propagate the changes to all nodes (depending on the number of nodes),
the 10 ms delay would actually add more pressure on the SB (the same
amount of record changes spread over multiple transactions instead of a
single combined one).
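
To make that concrete (the numbers below are only illustrative): suppose
the CMS sets the limit to 1000, e.g. with something along the lines of

    ovn-nbctl set NB_Global . options:mac_binding_removal_limit=1000

and 100k MAC bindings expire at once. With a 10 ms delay, northd would
issue the ~100 delete transactions within roughly a second (plus whatever
the iterations themselves take), while a 5 s delay would spread the same
deletions over ~8 minutes, which is much gentler on the SB and the
ovn-controllers and still perfectly acceptable for an aging feature.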

> However, it is actually hard to find the right balance. Given that this
> can still happen only once per iteration, it almost feels like another
> config option would be useful, but I don't want to introduce yet another
> config option blindly.

I don't think we need another config option either, but I am suggesting a
much longer hardcoded delay. The whole purpose of the mac-binding aging
feature is to keep mac-binding entries from growing indefinitely, so I
really don't think a 5 s or even longer delay would do any harm to this
feature. Remember that we also talked about delaying the deletion further
if an entry is refreshed by an ARP (or another mechanism). I am not sure
whether 5 s is long enough, but anything below 1 s seems too short for
this. What do you think?
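
Something as simple as the following sketch would already address my
concern (5000 is just an example of the order of magnitude I have in mind,
not a tested recommendation):

    /* Pace bulk removals so that a mass expiry does not flood the SB and
     * the ovn-controllers with delete + re-learn churn in a short window. */
    #define MAC_BINDING_BULK_REMOVAL_DELAY_MSEC 5000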

Thanks,
Han

>
> Thanks,
> Ales
>
>
>>
>>
>> Thanks,
>> Han
>>
>> > +
>> >  struct mac_binding_waker {
>> >      bool should_schedule;
>> >      long long next_wake_msec;
>> > @@ -37,7 +39,8 @@ static void
>> >  mac_binding_aging_run_for_datapath(const struct sbrec_datapath_binding *dp,
>> >                                     const struct nbrec_logical_router *nbr,
>> >                                     struct ovsdb_idl_index *mb_by_datapath,
>> > -                                   int64_t now, int64_t *wake_delay)
>> > +                                   int64_t now, int64_t *wake_delay,
>> > +                                   uint32_t removal_limit, uint32_t *removed_n)
>> >  {
>> >      uint64_t threshold = smap_get_uint(&nbr->options,
>> >                                         "mac_binding_age_threshold",
>> > @@ -58,6 +61,10 @@ mac_binding_aging_run_for_datapath(const struct sbrec_datapath_binding *dp,
>> >              continue;
>> >          } else if (elapsed >= threshold) {
>> >              sbrec_mac_binding_delete(mb);
>> > +            (*removed_n)++;
>> > +            if (removal_limit && *removed_n == removal_limit) {
>> > +                break;
>> > +            }
>> >          } else {
>> >              *wake_delay = MIN(*wake_delay, threshold - elapsed);
>> >          }
>> > @@ -65,6 +72,20 @@ mac_binding_aging_run_for_datapath(const struct sbrec_datapath_binding *dp,
>> >      sbrec_mac_binding_index_destroy_row(mb_index_row);
>> >  }
>> >
>> > +static uint32_t
>> > +get_removal_limit(struct engine_node *node)
>> > +{
>> > +    const struct nbrec_nb_global_table *nb_global_table =
>> > +        EN_OVSDB_GET(engine_get_input("NB_nb_global", node));
>> > +    const struct nbrec_nb_global *nb =
>> > +        nbrec_nb_global_table_first(nb_global_table);
>> > +    if (!nb) {
>> > +       return 0;
>> > +    }
>> > +
>> > +    return smap_get_uint(&nb->options, "mac_binding_removal_limit", 0);
>> > +}
>> > +
>> >  void
>> >  en_mac_binding_aging_run(struct engine_node *node, void *data OVS_UNUSED)
>> >  {
>> > @@ -76,6 +97,8 @@ en_mac_binding_aging_run(struct engine_node *node, void *data OVS_UNUSED)
>> >
>> >      int64_t next_expire_msec = INT64_MAX;
>> >      int64_t now = time_wall_msec();
>> > +    uint32_t removal_limit = get_removal_limit(node);
>> > +    uint32_t removed_n = 0;
>> >      struct northd_data *northd_data = engine_get_input_data("northd", node);
>> >      struct mac_binding_waker *waker =
>> >          engine_get_input_data("mac_binding_aging_waker", node);
>> > @@ -88,7 +111,13 @@ en_mac_binding_aging_run(struct engine_node *node, void *data OVS_UNUSED)
>> >          if (od->sb && od->nbr) {
>> >              mac_binding_aging_run_for_datapath(od->sb, od->nbr,
>> >                                                sbrec_mac_binding_by_datapath,
>> > -                                               now, &next_expire_msec);
>> > +                                               now, &next_expire_msec,
>> > +                                               removal_limit, &removed_n);
>> > +            if (removal_limit && removed_n == removal_limit) {
>> > +                /* Schedule the next run after specified delay. */
>> > +                next_expire_msec = MAC_BINDING_BULK_REMOVAL_DELAY_MSEC;
>> > +                break;
>> > +            }
>> >          }
>> >      }
>> >
>> > diff --git a/ovn-nb.xml b/ovn-nb.xml
>> > index 57c7d6174..bed1a3aa1 100644
>> > --- a/ovn-nb.xml
>> > +++ b/ovn-nb.xml
>> > @@ -162,6 +162,14 @@
>> >          dynamically assigned, e.g. <code>00:11:22</code>
>> >        </column>
>> >
>> > +      <column name="options" key="mac_binding_removal_limit"
>> > +              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
>> > +        MAC binding aging bulk removal limit. This limits how many
>> > +        rows can expire in a single transaction. The default value is
>> > +        0, which means no limit. When the limit is reached, the next
>> > +        batch removal is delayed by 10 ms.
>> > +      </column>
>> > +
>> >        <column name="options" key="controller_event" type='{"type": "boolean"}'>
>> >          Value set by the CMS to enable/disable ovn-controller event
reporting.
>> >          Traffic into OVS can raise a 'controller' event that results in a
>> > --
>> > 2.37.1
>> >
>
>
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA
>
> amu...@redhat.com    IM: amusil
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
