On Mon, Aug 15, 2022 at 11:34 PM Ales Musil <[email protected]> wrote: > > > > On Tue, Aug 16, 2022 at 4:14 AM Numan Siddique <[email protected]> wrote: >> >> On Tue, Aug 16, 2022 at 3:07 AM Han Zhou <[email protected]> wrote: >> > >> > On Mon, Aug 15, 2022 at 1:23 AM Dumitru Ceara <[email protected]> wrote: >> > > >> > > On 8/15/22 08:46, Numan Siddique wrote: >> > > > " >> > > > >> > > > On Mon, Aug 15, 2022 at 3:34 PM Han Zhou <[email protected]> wrote: >> > > >> >> > > >> On Sun, Aug 14, 2022 at 8:30 PM Numan Siddique <[email protected]> wrote: >> > > >>> >> > > >>> On Fri, Aug 12, 2022 at 12:21 AM Ales Musil <[email protected]> wrote: >> > > >>>> >> > > >>>> On Thu, Aug 11, 2022 at 2:45 PM Dumitru Ceara <[email protected] > >> > wrote: >> > > >>>> >> > > >>>>> On 8/10/22 18:37, Han Zhou wrote: >> > > >>>>>> On Tue, Aug 9, 2022 at 11:22 PM Ales Musil <[email protected]> >> > > >> wrote: >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> On Wed, Aug 10, 2022 at 8:04 AM Han Zhou <[email protected]> wrote: >> > > >>>>>>> >> > > >>>>>>> Hi Han, >> > > >>>>>>> >> > > >>>>>>> thank you for the review, see my reply inline below. >> > > >>>>>>> >> > > >>>>>>>> On Tue, Aug 9, 2022 at 4:05 AM Ales Musil <[email protected] > >> > > >> wrote: >> > > >>>>>>>>> >> > > >>>>>>>>> Add MAC binding aging mechanism, that utilizes >> > > >>>>>>>>> the timestamp column of MAC_Binding table. >> > > >>>>>>>>> When the MAC binding exceeds the threshold it is >> > > >>>>>>>>> removed from SB DB, this is postponed only in case >> > > >>>>>>>>> we receive update ARP with update to MAC address. >> > > >>>>>>>>> >> > > >>>>>>>>> The threshold is configurable via option >> > > >>>>>>>>> "mac_binding_age_threshold" that can be specified >> > > >>>>>>>>> for each logical router. The option is defaulting to >> > > >>>>>>>>> 0 which means that by default the aging is disabled >> > > >>>>>>>>> and the MAC binding rows will be persisted the same >> > > >>>>>>>>> way as before. >> > > >>>>>>>>> >> > > >>>>>>>>> Reported-at: https://bugzilla.redhat.com/2084668 >> > > >>>>>>>>> Signed-off-by: Ales Musil <[email protected]> >> > > >>>>>>>>> --- >> > > >>>>>>>>> v3: Rebase on top of current main. >> > > >>>>>>>>> Update according to the final conclusion. >> > > >>>>>>>>> v4: Rebase on top of current main. >> > > >>>>>>>>> Skip one row instead of all if time elapsed is negative. >> > > >>>>>>>>> Add note to NEWS. >> > > >>>>>>>>> --- >> > > >>>>>>>>> NEWS | 3 + >> > > >>>>>>>>> northd/automake.mk | 2 + >> > > >>>>>>>>> northd/inc-proc-northd.c | 13 ++++ >> > > >>>>>>>>> northd/mac-binding-aging.c | 151 >> > > >>>>> +++++++++++++++++++++++++++++++++++++ >> > > >>>>>>>>> northd/mac-binding-aging.h | 33 ++++++++ >> > > >>>>>>>>> ovn-nb.xml | 7 ++ >> > > >>>>>>>>> 6 files changed, 209 insertions(+) >> > > >>>>>>>>> create mode 100644 northd/mac-binding-aging.c >> > > >>>>>>>>> create mode 100644 northd/mac-binding-aging.h >> > > >>>>>>>>> >> > > >>>>>>>>> diff --git a/NEWS b/NEWS >> > > >>>>>>>>> index 20cea579e..eff8e472a 100644 >> > > >>>>>>>>> --- a/NEWS >> > > >>>>>>>>> +++ b/NEWS >> > > >>>>>>>>> @@ -14,6 +14,9 @@ Post v22.06.0 >> > > >>>>>>>>> NAT-T UDP encapsulation. Requires OVS support for IPsec >> > > >> custom >> > > >>>>>> tunnel >> > > >>>>>>>>> options (which will be available in OVS 2.18). >> > > >>>>>>>>> - Removed possibility of disabling logical datapath groups. >> > > >>>>>>>>> + - Added MAC binding aging mechanism, that is disabled by >> > > >> default. >> > > >>>>>>>>> + It can be enabled per logical router with option >> > > >>>>>>>>> + "mac_binding_age_threshold". >> > > >>>>>>>>> >> > > >>>>>>>>> OVN v22.06.0 - 03 Jun 2022 >> > > >>>>>>>>> -------------------------- >> > > >>>>>>>>> diff --git a/northd/automake.mk b/northd/automake.mk >> > > >>>>>>>>> index 4862ec7b7..81582867d 100644 >> > > >>>>>>>>> --- a/northd/automake.mk >> > > >>>>>>>>> +++ b/northd/automake.mk >> > > >>>>>>>>> @@ -1,6 +1,8 @@ >> > > >>>>>>>>> # ovn-northd >> > > >>>>>>>>> bin_PROGRAMS += northd/ovn-northd >> > > >>>>>>>>> northd_ovn_northd_SOURCES = \ >> > > >>>>>>>>> + northd/mac-binding-aging.c \ >> > > >>>>>>>>> + northd/mac-binding-aging.h \ >> > > >>>>>>>>> northd/northd.c \ >> > > >>>>>>>>> northd/northd.h \ >> > > >>>>>>>>> northd/ovn-northd.c \ >> > > >>>>>>>>> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c >> > > >>>>>>>>> index 43093cb5a..4a3699106 100644 >> > > >>>>>>>>> --- a/northd/inc-proc-northd.c >> > > >>>>>>>>> +++ b/northd/inc-proc-northd.c >> > > >>>>>>>>> @@ -22,9 +22,11 @@ >> > > >>>>>>>>> #include "ip-mcast-index.h" >> > > >>>>>>>>> #include "static-mac-binding-index.h" >> > > >>>>>>>>> #include "lib/inc-proc-eng.h" >> > > >>>>>>>>> +#include "lib/mac-binding-index.h" >> > > >>>>>>>>> #include "lib/ovn-nb-idl.h" >> > > >>>>>>>>> #include "lib/ovn-sb-idl.h" >> > > >>>>>>>>> #include "mcast-group-index.h" >> > > >>>>>>>>> +#include "northd/mac-binding-aging.h" >> > > >>>>>>>>> #include "openvswitch/poll-loop.h" >> > > >>>>>>>>> #include "openvswitch/vlog.h" >> > > >>>>>>>>> #include "inc-proc-northd.h" >> > > >>>>>>>>> @@ -149,6 +151,8 @@ enum sb_engine_node { >> > > >>>>>>>>> * avoid sparse errors. */ >> > > >>>>>>>>> static ENGINE_NODE(northd, "northd"); >> > > >>>>>>>>> static ENGINE_NODE(lflow, "lflow"); >> > > >>>>>>>>> +static ENGINE_NODE(mac_binding_aging, "mac_binding_aging"); >> > > >>>>>>>>> +static ENGINE_NODE(mac_binding_aging_waker, >> > > >>>>>> "mac_binding_aging_waker"); >> > > >>>>>>>>> >> > > >>>>>>>>> void inc_proc_northd_init(struct ovsdb_idl_loop *nb, >> > > >>>>>>>>> struct ovsdb_idl_loop *sb) >> > > >>>>>>>>> @@ -211,12 +215,16 @@ void inc_proc_northd_init(struct >> > > >> ovsdb_idl_loop >> > > >>>>>> *nb, >> > > >>>>>>>>> engine_add_input(&en_northd, &en_sb_load_balancer, NULL); >> > > >>>>>>>>> engine_add_input(&en_northd, &en_sb_fdb, NULL); >> > > >>>>>>>>> engine_add_input(&en_northd, &en_sb_static_mac_binding, >> > > >> NULL); >> > > >>>>>>>>> + engine_add_input(&en_mac_binding_aging, &en_sb_mac_binding, >> > > >>>>> NULL); >> > > >>>>>>>>> + engine_add_input(&en_mac_binding_aging, &en_northd, NULL); >> > > >>>>>>>>> + engine_add_input(&en_mac_binding_aging, >> > > >>>>>> &en_mac_binding_aging_waker, NULL); >> > > >>>>>>>>> engine_add_input(&en_lflow, &en_nb_bfd, NULL); >> > > >>>>>>>>> engine_add_input(&en_lflow, &en_sb_bfd, NULL); >> > > >>>>>>>>> engine_add_input(&en_lflow, &en_sb_logical_flow, NULL); >> > > >>>>>>>>> engine_add_input(&en_lflow, &en_sb_multicast_group, NULL); >> > > >>>>>>>>> engine_add_input(&en_lflow, &en_sb_igmp_group, NULL); >> > > >>>>>>>>> engine_add_input(&en_lflow, &en_northd, NULL); >> > > >>>>>>>>> + engine_add_input(&en_lflow, &en_mac_binding_aging, NULL); >> > > >>>>>>>>> >> > > >>>>>>>>> struct engine_arg engine_arg = { >> > > >>>>>>>>> .nb_idl = nb->idl, >> > > >>>>>>>>> @@ -235,6 +243,8 @@ void inc_proc_northd_init(struct >> > > >> ovsdb_idl_loop >> > > >>>>>> *nb, >> > > >>>>>>>>> chassis_hostname_index_create(sb->idl); >> > > >>>>>>>>> struct ovsdb_idl_index >> > *sbrec_static_mac_binding_by_lport_ip >> > > >>>>>>>>> = static_mac_binding_index_create(sb->idl); >> > > >>>>>>>>> + struct ovsdb_idl_index *sbrec_mac_binding_by_datapath >> > > >>>>>>>>> + = mac_binding_by_datapath_index_create(sb->idl); >> > > >>>>>>>>> >> > > >>>>>>>>> engine_init(&en_lflow, &engine_arg); >> > > >>>>>>>>> >> > > >>>>>>>>> @@ -256,6 +266,9 @@ void inc_proc_northd_init(struct >> > > >> ovsdb_idl_loop >> > > >>>>>> *nb, >> > > >>>>>>>>> engine_ovsdb_node_add_index(&en_sb_static_mac_binding, >> > > >>>>>>>>> >> > > >>>>>> "sbrec_static_mac_binding_by_lport_ip", >> > > >>>>>>>>> >> > > >>>>> sbrec_static_mac_binding_by_lport_ip); >> > > >>>>>>>>> + engine_ovsdb_node_add_index(&en_sb_mac_binding, >> > > >>>>>>>>> + >> > "sbrec_mac_binding_by_datapath", >> > > >>>>>>>>> + sbrec_mac_binding_by_datapath); >> > > >>>>>>>>> } >> > > >>>>>>>>> >> > > >>>>>>>>> void inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn, >> > > >>>>>>>>> diff --git a/northd/mac-binding-aging.c >> > > >> b/northd/mac-binding-aging.c >> > > >>>>>>>>> new file mode 100644 >> > > >>>>>>>>> index 000000000..e8217e8bc >> > > >>>>>>>>> --- /dev/null >> > > >>>>>>>>> +++ b/northd/mac-binding-aging.c >> > > >>>>>>>>> @@ -0,0 +1,151 @@ >> > > >>>>>>>>> +/* Copyright (c) 2022, Red Hat, Inc. >> > > >>>>>>>>> + * >> > > >>>>>>>>> + * Licensed under the Apache License, Version 2.0 (the >> > > >> "License"); >> > > >>>>>>>>> + * you may not use this file except in compliance with the >> > > >> License. >> > > >>>>>>>>> + * You may obtain a copy of the License at: >> > > >>>>>>>>> + * >> > > >>>>>>>>> + * http://www.apache.org/licenses/LICENSE-2.0 >> > > >>>>>>>>> + * >> > > >>>>>>>>> + * Unless required by applicable law or agreed to in writing, >> > > >>>>> software >> > > >>>>>>>>> + * distributed under the License is distributed on an "AS IS" >> > > >> BASIS, >> > > >>>>>>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express >> > > >> or >> > > >>>>>> implied. >> > > >>>>>>>>> + * See the License for the specific language governing >> > > >> permissions >> > > >>>>> and >> > > >>>>>>>>> + * limitations under the License. >> > > >>>>>>>>> + */ >> > > >>>>>>>>> + >> > > >>>>>>>>> +#include <config.h> >> > > >>>>>>>>> + >> > > >>>>>>>>> +#include "lib/inc-proc-eng.h" >> > > >>>>>>>>> +#include "lib/ovn-nb-idl.h" >> > > >>>>>>>>> +#include "lib/ovn-sb-idl.h" >> > > >>>>>>>>> +#include "lib/timeval.h" >> > > >>>>>>>>> +#include "northd/mac-binding-aging.h" >> > > >>>>>>>>> +#include "northd/northd.h" >> > > >>>>>>>>> +#include "openvswitch/hmap.h" >> > > >>>>>>>>> +#include "openvswitch/poll-loop.h" >> > > >>>>>>>>> +#include "openvswitch/util.h" >> > > >>>>>>>>> +#include "openvswitch/vlog.h" >> > > >>>>>>>>> + >> > > >>>>>>>>> +VLOG_DEFINE_THIS_MODULE(mac_binding_aging); >> > > >>>>>>>>> + >> > > >>>>>>>>> +struct mac_binding_waker { >> > > >>>>>>>>> + bool should_schedule; >> > > >>>>>>>>> + long long next_wake_msec; >> > > >>>>>>>>> +}; >> > > >>>>>>>>> + >> > > >>>>>>>>> +static struct mac_binding_waker waker; >> > > >>>>>>>>> + >> > > >>>>>>>>> +static void >> > > >>>>>>>>> +mac_binding_aging_run_for_datapath(const struct >> > > >>>>>> sbrec_datapath_binding *dp, >> > > >>>>>>>>> + const struct >> > > >> nbrec_logical_router >> > > >>>>>> *nbr, >> > > >>>>>>>>> + struct ovsdb_idl_index >> > > >>>>>> *mb_by_datapath, >> > > >>>>>>>>> + int64_t now, int64_t >> > > >> *wake_delay) >> > > >>>>>>>>> +{ >> > > >>>>>>>>> + uint64_t threshold = smap_get_uint(&nbr->options, >> > > >>>>>>>>> + >> > > >> "mac_binding_age_threshold", >> > > >>>>>>>>> + 0) * 1000; >> > > >>>>> >> > > >>>>> Nit: indentation. >> > > >>>>> >> > > >>>>>>>>> + if (!threshold) { >> > > >>>>>>>>> + return; >> > > >>>>>>>>> + } >> > > >>>>>>>>> + >> > > >>>>>>>>> + struct sbrec_mac_binding *mb_index_row = >> > > >>>>>>>>> + sbrec_mac_binding_index_init_row(mb_by_datapath); >> > > >>>>>>>>> + sbrec_mac_binding_index_set_datapath(mb_index_row, dp); >> > > >>>>>>>>> + >> > > >>>>>>>>> + const struct sbrec_mac_binding *mb; >> > > >>>>>>>>> + SBREC_MAC_BINDING_FOR_EACH_EQUAL (mb, mb_index_row, >> > > >>>>>> mb_by_datapath) { >> > > >>>>>>>>> + int64_t elapsed = now - mb->timestamp; >> > > >>>>>>>>> + >> > > >>>>>>>>> + if (elapsed < 0) { >> > > >>>>>>>>> + continue; >> > > >>>>>>>>> + } else if (elapsed >= threshold) { >> > > >>>>>>>>> + sbrec_mac_binding_delete(mb); >> > > >>>>>>>>> + } else { >> > > >>>>>>>>> + *wake_delay = MIN(*wake_delay, threshold - >> > elapsed); >> > > >>>>>>>>> + } >> > > >>>>>>>>> + } >> > > >>>>>>>>> + sbrec_mac_binding_index_destroy_row(mb_index_row); >> > > >>>>>>>>> +} >> > > >>>>>>>>> + >> > > >>>>>>>>> +void >> > > >>>>>>>>> +en_mac_binding_aging_run(struct engine_node *node, void *data >> > > >>>>>> OVS_UNUSED) >> > > >>>>>>>>> +{ >> > > >>>>>>>>> + const struct engine_context *eng_ctx = >> > engine_get_context(); >> > > >>>>>>>>> + >> > > >>>>>>>>> + if (!eng_ctx->ovnsb_idl_txn) { >> > > >>>>>>>>> + return; >> > > >>>>>>>>> + } >> > > >>>>>>>>> + >> > > >>>>>>>>> + int64_t next_expire_msec = INT64_MAX; >> > > >>>>>>>>> + int64_t now = time_wall_msec(); >> > > >>>>>>>>> + struct northd_data *northd_data = >> > > >> engine_get_input_data("northd", >> > > >>>>>> node); >> > > >>>>>>>>> + struct ovsdb_idl_index *sbrec_mac_binding_by_datapath = >> > > >>>>>>>>> + >> > > >>>>>> engine_ovsdb_node_get_index(engine_get_input("SB_mac_binding", >> > > >> node), >> > > >>>>>>>>> + >> > > >> "sbrec_mac_binding_by_datapath"); >> > > >>>>>>>>> + >> > > >>>>>>>>> + struct ovn_datapath *od; >> > > >>>>>>>>> + HMAP_FOR_EACH (od, key_node, &northd_data->datapaths) { >> > > >>>>>>>>> + if (od->sb && od->nbr) { >> > > >>>>>>>>> + mac_binding_aging_run_for_datapath(od->sb, od->nbr, >> > > >>>>>>>>> + >> > > >>>>>> sbrec_mac_binding_by_datapath, >> > > >>>>>>>>> + now, >> > > >>>>>> &next_expire_msec); >> > > >>>>>>>>> + } >> > > >>>>>>>>> + } >> > > >>>>>>>>> + >> > > >>>>>>>>> + if (next_expire_msec < INT64_MAX) { >> > > >>>>>>>>> + waker.should_schedule = true; >> > > >>>>>>>>> + waker.next_wake_msec = time_msec() + next_expire_msec; >> > > >>>>>>>>> + /* Run the engine right after so the waker can reflect >> > > >> on the >> > > >>>>>> new wake >> > > >>>>>>>>> + * time. */ >> > > >>>>>>>>> + poll_immediate_wake(); >> > > >>>>>>>> >> > > >>>>>>>> Why would we immediately wake up? Wouldn't it cause busy-loop? >> > > >> Shall >> > > >>>>> we >> > > >>>>>> use poll_timer_wait_until? Sorry if I missed anything here and >> > > >> please >> > > >>>>>> correct me. >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> That's because of the dependency of the nodes, we need to run the >> > > >> waker >> > > >>>>>> that will schedule the sleep. >> > > >>>>>>> We could probably use poll_timer_wait_until as well and it should >> > > >> have >> > > >>>>>> the same effect. >> > > >>>>>>> >> > > >>>>>> Thanks, I think I did misunderstand a little and now I got the idea >> > > >> of >> > > >>>>> the >> > > >>>>>> waker node. It is an input node just to trigger the timer, which >> > > >> wakes up >> > > >>>>>> the aging processing when needed and at the same time avoids >> > > >> running the >> > > >>>>>> aging scanning when the process is woken up for other reasons. So >> > > >> the >> > > >>>>>> poll_immediate_wake() here won't cause busy-loop. However, it does >> > > >> wake >> > > >>>>> up >> > > >>>>>> the process at least one extra time for each run, although the cost >> > > >> may >> > > >>>>> be >> > > >>>>>> small. I'd still use poll_timer_wait_until() here since we already >> > > >> know >> > > >>>>> the >> > > >>>>>> next_wake_msec. >> > > >>>>>> >> > > >>>>>> In addition, it is a little weird from I-P engine's point of view >> > > >> that >> > > >>>>> the >> > > >>>>>> output node execution updates the input node's data, but for this >> > > >> timer >> > > >>>>> use >> > > >>>>>> case I think it should work and I haven't come up with anything >> > > >> better. >> > > >>>>> It >> > > >>>>>> would be helpful to leave some comments for the waker node to >> > > >> explain the >> > > >>>>>> motivation and make this part easier to understand. >> > > >>>>>> >> > > >>>>> >> > > >>>>> +1 for both points (comment + poll_timer_wait_until()). >> > > >>>>> >> > > >>>>> Also, can we at least use engine_get_input_data() to get the input >> > > >>>>> node's data? This will make it more clear that we're using data >> > > >> that's >> > > >>>>> related to the waker node. The 'waker' variable would have to be >> > > >> moved >> > > >>>>> inside the en_mac_binding_aging_waker data. >> > > >>>>> >> > > >>>>>> Thanks, >> > > >>>>>> Han >> > > >>>>>> >> > > >>>>>>>> >> > > >>>>>>>> >> > > >>>>>>>>> + } else { >> > > >>>>>>>>> + waker.should_schedule = false; >> > > >>>>>>>>> + } >> > > >>>>>>>>> + >> > > >>>>>>>>> + /* This node is part of lflow, but lflow does not depend on >> > > >> it. >> > > >>>>>> Setting >> > > >>>>>>>>> + * state as unchanged does not trigger lflow node when it >> > > >> is not >> > > >>>>>> needed. */ >> > > >>>>>>>> >> > > >>>>>>>> This should work, but it is like a hack. Ideally we should allow >> > > >> the >> > > >>>>> I-P >> > > >>>>>> engine to have multiple "root" nodes. This can be a future >> > > >> improvement >> > > >>>>>> though. Maybe add a "XXX" comment here (a TODO item). >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> Yes unfortunately this is not the cleanest solution, but I did not >> > > >> come >> > > >>>>>> with a better one. I will add a TODO comment here about moving it >> > > >> once we >> > > >>>>>> allow multiple root nodes. >> > > >>>>>>> >> > > >>>>>>>> >> > > >>>>>>>> >> > > >>>>>>>> Thanks, >> > > >>>>>>>> Han >> > > >>>>>>>> >> > > >>>>>>> >> > > >>>>>>> Thanks, >> > > >>>>>>> Ales >> > > >>>>>>> >> > > >>>>>>>> >> > > >>>>>>>> >> > > >>>>>>>>> + engine_set_node_state(node, EN_UNCHANGED); >> > > >>>>>>>>> +} >> > > >>>>>>>>> + >> > > >>>>>>>>> +void * >> > > >>>>>>>>> +en_mac_binding_aging_init(struct engine_node *node OVS_UNUSED, >> > > >>>>>>>>> + struct engine_arg *arg OVS_UNUSED) >> > > >>>>>>>>> +{ >> > > >>>>>>>>> + return NULL; >> > > >>>>>>>>> +} >> > > >>>>>>>>> + >> > > >>>>>>>>> +void >> > > >>>>>>>>> +en_mac_binding_aging_cleanup(void *data OVS_UNUSED) >> > > >>>>>>>>> +{ >> > > >>>>>>>>> +} >> > > >>>>>>>>> + >> > > >>>>>>>>> +void >> > > >>>>>>>>> +en_mac_binding_aging_waker_run(struct engine_node *node, void >> > > >> *data >> > > >>>>>> OVS_UNUSED) >> > > >>>>>>>>> +{ >> > > >>>>>>>>> + if (!waker.should_schedule) { >> > > >>>>>>>>> + return; >> > > >>>>>>>>> + } >> > > >>>>>>>>> + >> > > >>>>>>>>> + if (time_msec() >= waker.next_wake_msec) { >> > > >>>>>>>>> + waker.should_schedule = false; >> > > >>>>>>>>> + engine_set_node_state(node, EN_UPDATED); >> > > >>>>>>>>> + return; >> > > >>>>>>>>> + } >> > > >>>>>>>>> + >> > > >>>>>>>>> + engine_set_node_state(node, EN_UNCHANGED); >> > > >>>>>>>>> + poll_timer_wait_until(waker.next_wake_msec); >> > > >>>>>>>>> +} >> > > >>>>>>>>> + >> > > >>>>>>>>> +void * >> > > >>>>>>>>> +en_mac_binding_aging_waker_init(struct engine_node *node >> > > >> OVS_UNUSED, >> > > >>>>>>>>> + struct engine_arg *arg >> > > >> OVS_UNUSED) >> > > >>>>>>>>> +{ >> > > >>>>>>>>> + waker.should_schedule = false; >> > > >>>>>>>>> + waker.next_wake_msec = 0; >> > > >>>>>>>>> + return NULL; >> > > >>>>>>>>> +} >> > > >>>>>>>>> + >> > > >>>>>>>>> +void >> > > >>>>>>>>> +en_mac_binding_aging_waker_cleanup(void *data OVS_UNUSED) >> > > >>>>>>>>> +{ >> > > >>>>>>>>> +} >> > > >>>>>>>>> diff --git a/northd/mac-binding-aging.h >> > > >> b/northd/mac-binding-aging.h >> > > >>>>>>>>> new file mode 100644 >> > > >>>>>>>>> index 000000000..296a7ab38 >> > > >>>>>>>>> --- /dev/null >> > > >>>>>>>>> +++ b/northd/mac-binding-aging.h >> > > >>>>>>>>> @@ -0,0 +1,33 @@ >> > > >>>>>>>>> +/* Copyright (c) 2022, Red Hat, Inc. >> > > >>>>>>>>> + * >> > > >>>>>>>>> + * Licensed under the Apache License, Version 2.0 (the >> > > >> "License"); >> > > >>>>>>>>> + * you may not use this file except in compliance with the >> > > >> License. >> > > >>>>>>>>> + * You may obtain a copy of the License at: >> > > >>>>>>>>> + * >> > > >>>>>>>>> + * http://www.apache.org/licenses/LICENSE-2.0 >> > > >>>>>>>>> + * >> > > >>>>>>>>> + * Unless required by applicable law or agreed to in writing, >> > > >>>>> software >> > > >>>>>>>>> + * distributed under the License is distributed on an "AS IS" >> > > >> BASIS, >> > > >>>>>>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express >> > > >> or >> > > >>>>>> implied. >> > > >>>>>>>>> + * See the License for the specific language governing >> > > >> permissions >> > > >>>>> and >> > > >>>>>>>>> + * limitations under the License. >> > > >>>>>>>>> + */ >> > > >>>>>>>>> + >> > > >>>>>>>>> +#ifndef MAC_BINDING_AGING_H >> > > >>>>>>>>> +#define MAC_BINDING_AGING_H 1 >> > > >>>>>>>>> + >> > > >>>>>>>>> +#include "lib/inc-proc-eng.h" >> > > >>>>>>>>> + >> > > >>>>>>>>> +/* The MAC binding aging node functions. */ >> > > >>>>>>>>> +void en_mac_binding_aging_run(struct engine_node *node, void >> > > >> *data); >> > > >>>>>>>>> +void *en_mac_binding_aging_init(struct engine_node *node, >> > > >>>>>>>>> + struct engine_arg *arg); >> > > >>>>>>>>> +void en_mac_binding_aging_cleanup(void *data); >> > > >>>>>>>>> + >> > > >>>>>>>>> +/* The MAC binding aging waker node functions. */ >> > > >>>>>>>>> +void en_mac_binding_aging_waker_run(struct engine_node *node, >> > > >> void >> > > >>>>>> *data); >> > > >>>>>>>>> +void *en_mac_binding_aging_waker_init(struct engine_node *node, >> > > >>>>>>>>> + struct engine_arg *arg); >> > > >>>>>>>>> +void en_mac_binding_aging_waker_cleanup(void *data); >> > > >>>>>>>>> + >> > > >>>>>>>>> +#endif /* northd/mac-binding-aging.h */ >> > > >>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >> > > >>>>>>>>> index e26afd83c..d3fba9bdc 100644 >> > > >>>>>>>>> --- a/ovn-nb.xml >> > > >>>>>>>>> +++ b/ovn-nb.xml >> > > >>>>>>>>> @@ -2392,6 +2392,13 @@ >> > > >>>>>>>>> and other sources. This way, OVN and the other sources >> > > >> can >> > > >>>>>> make use of >> > > >>>>>>>>> the same conntrack zone. >> > > >>>>>>>>> </column> >> > > >>>>>>>>> + >> > > >>>>>>>>> + <column name="options" key="mac_binding_age_threshold" >> > > >>>>>>>>> + type='{"type": "integer", "minInteger": 0, >> > > >>>>>> "maxInteger": 4294967295}'> >> > > >>>>>>>>> + MAC binding aging <code>threshold</code> value in secs. >> > > >> MAC >> > > >>>>>> binding >> > > >>>>> >> > > >>>>> Nit: I'd use 'seconds'. >> > > >>>>> >> > > >>>>> Thanks, >> > > >>>>> Dumitru >> > > >>>>> >> > > >>>>>>>>> + exceeding this timeout will be automatically removed. >> > > >> The >> > > >>>>>> value >> > > >>>>>>>>> + defaults to 0, which means disabled. >> > > >>>>>>>>> + </column> >> > > >>>>>>>>> </group> >> > > >>>>>>>>> >> > > >>>>>>>>> <group title="Common Columns"> >> > > >>>>>>>>> -- >> > > >>>>>>>>> 2.37.1 >> > > >>>>>>>>> >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> -- >> > > >>>>>>> >> > > >>>>>>> Ales Musil >> > > >>>>>>> >> > > >>>>>>> Senior Software Engineer - OVN Core >> > > >>>>>>> >> > > >>>>>>> Red Hat EMEA >> > > >>>>>>> >> > > >>>>>>> [email protected] IM: amusil >> > > >>>>>> >> > > >>>>> >> > > >>>>> >> > > >>>> Thank you for the reviews. >> > > >>>> >> > > >>>> v5 should address most of the comments except the open one from Numan >> > > >> about >> > > >>>> upgrade. >> > > >>>> >> > > >>> >> > > >>> IMO upgrading one is very important. As I mentioned earlier, >> > > >>> openstack neutron or ovn-kubernetes would not know >> > > >>> which OVN version is being run and if all the ovn-controllers are >> > > >>> upgraded to the version which has the mac_binding timestamp >> > > >>> support. This config option is not something which an administrator >> > > >>> can enable or disable depending on the upgrade status. >> > > >>> >> > > >>> There have been a few upgrade issues with OVN and because of which >> > > >>> openstack CMS had to do special handling. >> > > >>> This patch series can potentially affect the existing datapath traffic >> > > >>> after partial upgrades. So I think its better we handle this >> > > >>> properly. >> > > >>> >> > > >> >> > > >> Hi Numan, I agree with you that upgrading is important. But for this >> > case, >> > > >> maybe I didn't understand the problem. The feature is disabled by >> > default, >> > > >> so as long as the CMS doesn't enable the aging setting for any LR >> > before >> > > >> the upgrade is completed, it should be fine, right? This in practice >> > just >> > > >> means two separate changes to the production environment: >> > > >> 1. upgrade the OVN version, both central and controller components >> > > >> 2. change the aging setting to desired value for some (or all) LRs >> > > > >> > > > In my opinion, the step (2) you mentioned above is something which an >> > > > operator/admin cannot set >> > > > and it is most likely be set by openstack neutron or ovn-kubernetes or >> > > > any other CMS programmatically. >> > > > In the case of Openstack, neutron is part of a bigger >> > > > system/deployment and it wont be so easy to do >> > > > Step 1 first and then step 2. Tripleo installer for example will >> > > > upgrade OVN Northd services and Neutron >> > > > first before upgrading ovn-controllers on the compute nodes. >> > > > >> > > >> > > As far as I understand, OpenStack has recently added support to ensure >> > > that all ovn-controllers are upgraded before the central components are >> > > upgraded. Adding Jakub to this thread. >> > > >> > > > This feature will most likely be available in OVN 22.09. Lets say >> > > > openstack neutron >> > > > wants to use this feature and adds the necessary code to enable ageing >> > > > in some neutron >> > > > routers. I don't know on what basis or what logic it would use to set >> > > > the option >> > > > "mac_binding_age_threshold" for a logical router. Let's assume that >> > > > neutron queries the >> > > > OVN Southbound schema to check for the presence of the column >> > > > "timestamp" in MAC_Binding table >> > > > before setting the logical router option - mac_binding_age_threshold. >> > > > And lets say neutron uses this feature >> > > > in the openstack release Zed (scheduled to release in October). My >> > > > concern is how would neutron know >> > > > if all ovn-controllers are upgraded with this feature or not ? As I >> > > > mentioned above, It's possible for the openstack >> > > > installer/upgrade tool to upgrade OVN northd services first. Until a >> > > > compute node 'N' is not upgraded with the >> > > > proper ovn-controller version, ovn-northd will immediately delete the >> > > > mac binding entries learnt by >> > > > ovn-controller in node 'N'. >> > > > >> > > > Let me know your thoughts. >> > > > >> > > >> > > I think that if neutron/tripleo/OpenStack indeed take care of performing >> > > upgrades in the OVN supported way (all ovn-controllers before >> > > SB/ovn-northd) then we should be OK with the current patch. >> > > >> >> There are other OpenStack installers like Kolla Ansible. And I'm not >> sure how they handle >> upgrades. >> >> >> > >> > Thanks Numan and Dumitru, and now I understand the problem. Suppose Neutron >> > has a way to make sure ovn-controller are upgraded before ovn-central, >> > probably ovn-k8s still faces the problem? >> > For a simpler solution, I'd support the idea that don't delete the entry if >> > the timestamp is 0. Anything populated before the upgrade can be deleted as >> > a one-time cleanup manually at any time by the operator if desired. >> > The solution of checking features of all controllers is good, too, but just >> > a little complexity and an extra external-ids in the chassis table just for >> > the one-time use case. > > > This unfortunately still suffers from the problem that CMS might delete those prematurely. > Even if we don't remove anything with timestamp=0, there is still a chance that after > cleanup we might end up with the MAC binding again with timestamp=0. Which to me > IMO is error prone. >
Just to clarify, I didn't mean to add any automation in CMS to do that during upgrading. I meant the one-time cleanup can be done, if desired, by operators using OVSDB commands. Or, just leave them forever, until at some point probably after one more release of OVN we update the logic to allow deletion for timestamp==0. But I am ok with the current approach in v6/v7, with things we need to keep in mind (see below). >> >> >> If you see this commit [1], it is actually quite simple. In northd, >> there is already a >> "struct chassis_features" and function build_chassis_features() to handle this. >> And it should not add much complexity IMO. >> >> I'm fine with the other options too. But I'm slightly in favour of >> handling this in OVN itself instead >> of the extra config option or one-time cleanup script. > Yes, the code is not complex. I am concerned more about: 1. We need to be more clear about upgrade strategy support. Officially we support the order of upgrading ovn-controllers before the central components, and recently we had the new feature change within a release because of a backport (ct_label/ct_mark), which required support for upgrading central components first. Now it seems we are asking for support for arbitrary upgrade orders even between releases. It is good to be more flexible in upgradability support, but we'd better define it more clearly based on requirement, with more details such as, whether we support upgrading across multiple releases, downgrades, etc. (+ @Mark Michelson <[email protected]> ) 2. Probing the features seems flexible, but it also means we need to maintain the features whenever it relates to both central and ovn-controller. The list can grow and the if/else branches will also grow in the feature implementation code. Some may be simple (like this one), some may be more complex. These code are mostly for one-time upgrade use, but we may need to maintain them forever, or, we can remove/cleanup them after every other release, assuming that we only support upgrading between adjacent releases. 3. There may be security/operational concerns for this approach. At any time a misbehaving ovn-controller can revert any features and impact/degrade the whole environment. Assume that there are features A,B,C,D,E. A compute node may delete some or all of the features from its chassis:other_config, and ovn-northd would revert those features just because of the bad compute node. This can happen easily in an operational problem - some "bad" servers that were down for quite some time suddenly "wake up" with an old release running. Or it can be a compromised server controlled by an attacker. It is also possible that a problematic server fails in upgrading blocks the whole control plane to enable the new feature, which may be even more common but less harmful than the former cases of "reverting features". I think we may need to add a safeguard to avoid reverting already applied features in northd. Thanks, Han > > I went ahead and pushed v6 with additional commit (as everything else was acked, except the delay discussion on 5th commit) > that adds the feature check, we can always push only the first part if we decide that the > feature is not needed. > > > Thanks, > Ales > >> >> Thanks >> Numan >> >> A 3rd option depends on how CMS would integrate this feature. If the CMS >> > can implement a global option that enables/disables the feature, that would >> > work for the 2 steps I mentioned earlier: step1: upgrade OVN, step2: turn >> > on the CMS option. >> > >> >> [1] - https://github.com/ovn-org/ovn/commit/3fd4db6324ec9fe8a8ddedab3e91f7251f56de9d >> >> > > Regards, >> > > Dumitru >> > > >> > _______________________________________________ >> > dev mailing list >> > [email protected] >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA > > [email protected] IM: amusil _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
