On Mon, Aug 15, 2022 at 11:29 PM Ales Musil <amu...@redhat.com> wrote:
>
>
> On Mon, Aug 15, 2022 at 6:51 PM Han Zhou <hz...@ovn.org> wrote:
>>
>>
>> On Thu, Aug 11, 2022 at 7:20 AM Ales Musil <amu...@redhat.com> wrote:
>> >
>> > Add a configuration option called "mac_binding_removal_limit" to the
>> > NB global table, defaulting to 0. This option limits the number of
>> > MAC bindings that can be removed by the aging mechanism in a single
>> > transaction; 0 means that the limit is disabled. When the limit is
>> > reached, the next removal is delayed by 10 ms. Setting this option
>> > has the downside that in theory the removal might never finish; in
>> > practice that is unlikely, since not all routers will have aging
>> > enabled, and those that do will use a reasonable threshold.
>> >
>> > Reported-at: https://bugzilla.redhat.com/2084668
>> > Signed-off-by: Ales Musil <amu...@redhat.com>
>> > ---
>> > v4: Rebase on top of current main.
>> >     Address comment from Numan.
>> > v5: Rebase on top of current main.
>> >     Address comment from Dumitru.
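[Editor's note: before the diff itself, a minimal self-contained sketch of the limit logic described above may help. This is a hypothetical simplified model, not the actual northd code: an array of entry ages stands in for the SB MAC_Binding rows, and the delete is just a counter.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one aging pass: "delete" every entry whose age has
 * reached 'threshold_msec', but stop after 'removal_limit' deletions
 * (0 = unlimited).  Sets *hit_limit so the caller knows it must schedule
 * another pass after a short delay.  Returns the number of entries removed. */
static uint32_t
prune_aged_entries(const int64_t *age_msec, size_t n_entries,
                   int64_t threshold_msec, uint32_t removal_limit,
                   bool *hit_limit)
{
    uint32_t removed = 0;

    *hit_limit = false;
    for (size_t i = 0; i < n_entries; i++) {
        if (age_msec[i] >= threshold_msec) {
            /* In the real patch this is sbrec_mac_binding_delete(mb). */
            removed++;
            if (removal_limit && removed == removal_limit) {
                *hit_limit = true;
                break;
            }
        }
    }
    return removed;
}
```

For example, with ages {200, 50, 300, 400} ms, a threshold of 150 ms and a limit of 2, only the first two expired entries are removed and the limit flag is set, so the caller reschedules the remainder for a later pass.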
>> > ---
>> >  northd/inc-proc-northd.c   |  1 +
>> >  northd/mac-binding-aging.c | 33 +++++++++++++++++++++++++++++++--
>> >  ovn-nb.xml                 |  8 ++++++++
>> >  3 files changed, 40 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
>> > index fc0d9e670..54e0ad3b0 100644
>> > --- a/northd/inc-proc-northd.c
>> > +++ b/northd/inc-proc-northd.c
>> > @@ -215,6 +215,7 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
>> >      engine_add_input(&en_northd, &en_sb_load_balancer, NULL);
>> >      engine_add_input(&en_northd, &en_sb_fdb, NULL);
>> >      engine_add_input(&en_northd, &en_sb_static_mac_binding, NULL);
>> > +    engine_add_input(&en_mac_binding_aging, &en_nb_nb_global, NULL);
>> >      engine_add_input(&en_mac_binding_aging, &en_sb_mac_binding, NULL);
>> >      engine_add_input(&en_mac_binding_aging, &en_northd, NULL);
>> >      engine_add_input(&en_mac_binding_aging, &en_mac_binding_aging_waker, NULL);
>> > diff --git a/northd/mac-binding-aging.c b/northd/mac-binding-aging.c
>> > index 3859c050b..36d0a6fd7 100644
>> > --- a/northd/mac-binding-aging.c
>> > +++ b/northd/mac-binding-aging.c
>> > @@ -28,6 +28,8 @@
>> >
>> >  VLOG_DEFINE_THIS_MODULE(mac_binding_aging);
>> >
>> > +#define MAC_BINDING_BULK_REMOVAL_DELAY_MSEC 10
>>
> Hi Han,
>
>> Thanks Ales. I just have a concern about this delay value. 10 ms seems
>> really too short. If I understand correctly, the delay is to avoid too
>> many MAC-binding deletions at the same time triggering lots of
>> MAC-binding learning and population back to the SB (please correct me
>> if I am wrong, and better add a comment to explain the motivation in
>> the code, too).
>
>
> Your understanding is right; yes, we want to avoid too much pressure on
> the SB and controllers if we remove everything at once and repopulate it
> again.
>
>>
>> If we think about the control plane latency at scale, it is easy to
>> have a latency larger than hundreds of ms or even seconds.
>> So waiting for 10 ms is nothing, and it really depends on how long it
>> takes ovn-northd to execute an iteration. I think the whole feature is
>> just to prevent MAC-binding entries from growing forever, so deleting
>> some entries with a delay of seconds or even minutes is not going to
>> impact anything. If the purpose here is to avoid a scaling problem,
>> shall we set the delay to at least the 5 s level?
>
>
> Actually this can still happen only once per iteration; we can raise
> the value, but if the loop takes a few seconds to run, removal will
> already be delayed by that much.

So do you mean it is purposely set to a small value, relying on the
slowness of the northd loop itself to act as the real delay? I doubt this
serves the purpose. If there are no other changes in the NB/SB, northd's
I-P engine would avoid recompute and the loop should complete very fast,
which would end up with mac-aging executed again for the next batch in
just 10 ms, right? Considering that the SB may take a much longer time to
propagate the changes to all nodes (depending on the number of nodes),
the 10 ms delay would actually add more pressure on the SB (the same
amount of record changes in multiple transactions instead of a combined
one).
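[Editor's note: a back-of-the-envelope model of the trade-off under discussion (my own hypothetical arithmetic, not code from the patch): with a per-transaction limit L and an inter-batch delay D, and assuming each northd iteration is much faster than D, deleting N stale entries takes ceil(N/L) transactions spread over at least (batches - 1) * D milliseconds.]

```c
#include <assert.h>
#include <stdint.h>

/* Minimum time to delete 'n_entries' stale rows given a per-transaction
 * 'limit' (0 = unlimited) and an inter-batch 'delay_msec', assuming the
 * northd iteration itself takes negligible time.  Also reports the number
 * of transactions needed. */
static int64_t
min_bulk_removal_msec(uint32_t n_entries, uint32_t limit,
                      int64_t delay_msec, uint32_t *n_batches)
{
    if (!limit || n_entries <= limit) {
        *n_batches = n_entries ? 1 : 0;  /* Everything fits in one pass. */
        return 0;
    }
    *n_batches = (n_entries + limit - 1) / limit;   /* ceil(N / L) */
    return (int64_t) (*n_batches - 1) * delay_msec; /* Gaps between batches. */
}
```

With 100,000 stale bindings and a limit of 1,000, the patch's 10 ms delay spreads the 100 transactions over under a second; a 5 s delay stretches the same 100 transactions over roughly eight minutes, which illustrates the latency-versus-SB-pressure trade-off being debated.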
> However, it is actually hard to find the right balance, the fact being
> that it can still happen only once per iteration. It almost feels like
> another config option would be useful, but I don't want to introduce yet
> another config option blindly.

I don't think we need another config either, but I am suggesting a much
longer hardcoded time. The whole purpose of the mac-binding aging feature
is to prevent mac-binding entries from growing indefinitely, so I really
don't think a 5 s or even longer delay would do any harm to this feature.
Remember that we also talked about delaying the deletion further if an
entry is refreshed by an ARP (or other mechanisms). I am not sure if 5 s
is long enough, but anything below 1 s seems too short for this. What do
you think?

Thanks,
Han

>
> Thanks,
> Ales
>
>
>>
>> Thanks,
>> Han
>>
>> > +
>> >  struct mac_binding_waker {
>> >      bool should_schedule;
>> >      long long next_wake_msec;
>> > @@ -37,7 +39,8 @@ static void
>> >  mac_binding_aging_run_for_datapath(const struct sbrec_datapath_binding *dp,
>> >                                     const struct nbrec_logical_router *nbr,
>> >                                     struct ovsdb_idl_index *mb_by_datapath,
>> > -                                   int64_t now, int64_t *wake_delay)
>> > +                                   int64_t now, int64_t *wake_delay,
>> > +                                   uint32_t removal_limit, uint32_t *removed_n)
>> >  {
>> >      uint64_t threshold = smap_get_uint(&nbr->options,
>> >                                         "mac_binding_age_threshold",
>> > @@ -58,6 +61,10 @@ mac_binding_aging_run_for_datapath(const struct sbrec_datapath_binding *dp,
>> >              continue;
>> >          } else if (elapsed >= threshold) {
>> >              sbrec_mac_binding_delete(mb);
>> > +            (*removed_n)++;
>> > +            if (removal_limit && *removed_n == removal_limit) {
>> > +                break;
>> > +            }
>> >          } else {
>> >              *wake_delay = MIN(*wake_delay, threshold - elapsed);
>> >          }
>> > @@ -65,6 +72,20 @@ mac_binding_aging_run_for_datapath(const struct sbrec_datapath_binding *dp,
>> >      sbrec_mac_binding_index_destroy_row(mb_index_row);
>> >  }
>> >
>> > +static uint32_t
>> > +get_removal_limit(struct engine_node *node)
>> > +{
>> > +    const struct nbrec_nb_global_table *nb_global_table =
>> > +        EN_OVSDB_GET(engine_get_input("NB_nb_global", node));
>> > +    const struct nbrec_nb_global *nb =
>> > +        nbrec_nb_global_table_first(nb_global_table);
>> > +    if (!nb) {
>> > +        return 0;
>> > +    }
>> > +
>> > +    return smap_get_uint(&nb->options, "mac_binding_removal_limit", 0);
>> > +}
>> > +
>> >  void
>> >  en_mac_binding_aging_run(struct engine_node *node, void *data OVS_UNUSED)
>> >  {
>> > @@ -76,6 +97,8 @@ en_mac_binding_aging_run(struct engine_node *node, void *data OVS_UNUSED)
>> >
>> >      int64_t next_expire_msec = INT64_MAX;
>> >      int64_t now = time_wall_msec();
>> > +    uint32_t removal_limit = get_removal_limit(node);
>> > +    uint32_t removed_n = 0;
>> >      struct northd_data *northd_data = engine_get_input_data("northd", node);
>> >      struct mac_binding_waker *waker =
>> >          engine_get_input_data("mac_binding_aging_waker", node);
>> > @@ -88,7 +111,13 @@ en_mac_binding_aging_run(struct engine_node *node, void *data OVS_UNUSED)
>> >          if (od->sb && od->nbr) {
>> >              mac_binding_aging_run_for_datapath(od->sb, od->nbr,
>> >                                                 sbrec_mac_binding_by_datapath,
>> > -                                               now, &next_expire_msec);
>> > +                                               now, &next_expire_msec,
>> > +                                               removal_limit, &removed_n);
>> > +            if (removal_limit && removed_n == removal_limit) {
>> > +                /* Schedule the next run after the specified delay. */
>> > +                next_expire_msec = MAC_BINDING_BULK_REMOVAL_DELAY_MSEC;
>> > +                break;
>> > +            }
>> >          }
>> >      }
>> >
>> > diff --git a/ovn-nb.xml b/ovn-nb.xml
>> > index 57c7d6174..bed1a3aa1 100644
>> > --- a/ovn-nb.xml
>> > +++ b/ovn-nb.xml
>> > @@ -162,6 +162,14 @@
>> >        dynamically assigned, e.g. <code>00:11:22</code>
>> >      </column>
>> >
>> > +    <column name="options" key="mac_binding_removal_limit"
>> > +            type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
>> > +      MAC binding aging bulk removal limit. This limits how many rows
>> > +      can expire in a single transaction. The default value is 0, which
>> > +      means unlimited.
>> > +      When we hit the limit, the next batch removal is delayed by
>> > +      10 ms.
>> > +    </column>
>> > +
>> >      <column name="options" key="controller_event" type='{"type": "boolean"}'>
>> >        Value set by the CMS to enable/disable ovn-controller event reporting.
>> >        Traffic into OVS can raise a 'controller' event that results in a
>> > --
>> > 2.37.1
>> >
>
>
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA
>
> amu...@redhat.com    IM: amusil

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev