On Fri, Aug 12, 2022 at 12:21 AM Ales Musil <[email protected]> wrote: > > On Thu, Aug 11, 2022 at 2:45 PM Dumitru Ceara <[email protected]> wrote: > > > On 8/10/22 18:37, Han Zhou wrote: > > > On Tue, Aug 9, 2022 at 11:22 PM Ales Musil <[email protected]> wrote: > > >> > > >> > > >> > > >> On Wed, Aug 10, 2022 at 8:04 AM Han Zhou <[email protected]> wrote: > > >> > > >> Hi Han, > > >> > > >> thank you for the review, see my reply inline below. > > >> > > >>> On Tue, Aug 9, 2022 at 4:05 AM Ales Musil <[email protected]> wrote: > > >>>> > > >>>> Add MAC binding aging mechanism, that utilizes > > >>>> the timestamp column of MAC_Binding table. > > >>>> When the MAC binding exceeds the threshold it is > > >>>> removed from SB DB, this is postponed only in case > > >>>> we receive update ARP with update to MAC address. > > >>>> > > >>>> The threshold is configurable via option > > >>>> "mac_binding_age_threshold" that can be specified > > >>>> for each logical router. The option is defaulting to > > >>>> 0 which means that by default the aging is disabled > > >>>> and the MAC binding rows will be persisted the same > > >>>> way as before. > > >>>> > > >>>> Reported-at: https://bugzilla.redhat.com/2084668 > > >>>> Signed-off-by: Ales Musil <[email protected]> > > >>>> --- > > >>>> v3: Rebase on top of current main. > > >>>> Update according to the final conclusion. > > >>>> v4: Rebase on top of current main. > > >>>> Skip one row instead of all if time elapsed is negative. > > >>>> Add note to NEWS. > > >>>> --- > > >>>> NEWS | 3 + > > >>>> northd/automake.mk | 2 + > > >>>> northd/inc-proc-northd.c | 13 ++++ > > >>>> northd/mac-binding-aging.c | 151 > > +++++++++++++++++++++++++++++++++++++ > > >>>> northd/mac-binding-aging.h | 33 ++++++++ > > >>>> ovn-nb.xml | 7 ++ > > >>>> 6 files changed, 209 insertions(+) > > >>>> create mode 100644 northd/mac-binding-aging.c > > >>>> create mode 100644 northd/mac-binding-aging.h > > >>>> > > >>>> diff --git a/NEWS b/NEWS > > >>>> index 20cea579e..eff8e472a 100644 > > >>>> --- a/NEWS > > >>>> +++ b/NEWS > > >>>> @@ -14,6 +14,9 @@ Post v22.06.0 > > >>>> NAT-T UDP encapsulation. Requires OVS support for IPsec custom > > > tunnel > > >>>> options (which will be available in OVS 2.18). > > >>>> - Removed possibility of disabling logical datapath groups. > > >>>> + - Added MAC binding aging mechanism, that is disabled by default. > > >>>> + It can be enabled per logical router with option > > >>>> + "mac_binding_age_threshold". > > >>>> > > >>>> OVN v22.06.0 - 03 Jun 2022 > > >>>> -------------------------- > > >>>> diff --git a/northd/automake.mk b/northd/automake.mk > > >>>> index 4862ec7b7..81582867d 100644 > > >>>> --- a/northd/automake.mk > > >>>> +++ b/northd/automake.mk > > >>>> @@ -1,6 +1,8 @@ > > >>>> # ovn-northd > > >>>> bin_PROGRAMS += northd/ovn-northd > > >>>> northd_ovn_northd_SOURCES = \ > > >>>> + northd/mac-binding-aging.c \ > > >>>> + northd/mac-binding-aging.h \ > > >>>> northd/northd.c \ > > >>>> northd/northd.h \ > > >>>> northd/ovn-northd.c \ > > >>>> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c > > >>>> index 43093cb5a..4a3699106 100644 > > >>>> --- a/northd/inc-proc-northd.c > > >>>> +++ b/northd/inc-proc-northd.c > > >>>> @@ -22,9 +22,11 @@ > > >>>> #include "ip-mcast-index.h" > > >>>> #include "static-mac-binding-index.h" > > >>>> #include "lib/inc-proc-eng.h" > > >>>> +#include "lib/mac-binding-index.h" > > >>>> #include "lib/ovn-nb-idl.h" > > >>>> #include "lib/ovn-sb-idl.h" > > >>>> #include "mcast-group-index.h" > > >>>> +#include "northd/mac-binding-aging.h" > > >>>> #include "openvswitch/poll-loop.h" > > >>>> #include "openvswitch/vlog.h" > > >>>> #include "inc-proc-northd.h" > > >>>> @@ -149,6 +151,8 @@ enum sb_engine_node { > > >>>> * avoid sparse errors. */ > > >>>> static ENGINE_NODE(northd, "northd"); > > >>>> static ENGINE_NODE(lflow, "lflow"); > > >>>> +static ENGINE_NODE(mac_binding_aging, "mac_binding_aging"); > > >>>> +static ENGINE_NODE(mac_binding_aging_waker, > > > "mac_binding_aging_waker"); > > >>>> > > >>>> void inc_proc_northd_init(struct ovsdb_idl_loop *nb, > > >>>> struct ovsdb_idl_loop *sb) > > >>>> @@ -211,12 +215,16 @@ void inc_proc_northd_init(struct ovsdb_idl_loop > > > *nb, > > >>>> engine_add_input(&en_northd, &en_sb_load_balancer, NULL); > > >>>> engine_add_input(&en_northd, &en_sb_fdb, NULL); > > >>>> engine_add_input(&en_northd, &en_sb_static_mac_binding, NULL); > > >>>> + engine_add_input(&en_mac_binding_aging, &en_sb_mac_binding, > > NULL); > > >>>> + engine_add_input(&en_mac_binding_aging, &en_northd, NULL); > > >>>> + engine_add_input(&en_mac_binding_aging, > > > &en_mac_binding_aging_waker, NULL); > > >>>> engine_add_input(&en_lflow, &en_nb_bfd, NULL); > > >>>> engine_add_input(&en_lflow, &en_sb_bfd, NULL); > > >>>> engine_add_input(&en_lflow, &en_sb_logical_flow, NULL); > > >>>> engine_add_input(&en_lflow, &en_sb_multicast_group, NULL); > > >>>> engine_add_input(&en_lflow, &en_sb_igmp_group, NULL); > > >>>> engine_add_input(&en_lflow, &en_northd, NULL); > > >>>> + engine_add_input(&en_lflow, &en_mac_binding_aging, NULL); > > >>>> > > >>>> struct engine_arg engine_arg = { > > >>>> .nb_idl = nb->idl, > > >>>> @@ -235,6 +243,8 @@ void inc_proc_northd_init(struct ovsdb_idl_loop > > > *nb, > > >>>> chassis_hostname_index_create(sb->idl); > > >>>> struct ovsdb_idl_index *sbrec_static_mac_binding_by_lport_ip > > >>>> = static_mac_binding_index_create(sb->idl); > > >>>> + struct ovsdb_idl_index *sbrec_mac_binding_by_datapath > > >>>> + = mac_binding_by_datapath_index_create(sb->idl); > > >>>> > > >>>> engine_init(&en_lflow, &engine_arg); > > >>>> > > >>>> @@ -256,6 +266,9 @@ void inc_proc_northd_init(struct ovsdb_idl_loop > > > *nb, > > >>>> engine_ovsdb_node_add_index(&en_sb_static_mac_binding, > > >>>> > > > "sbrec_static_mac_binding_by_lport_ip", > > >>>> > > sbrec_static_mac_binding_by_lport_ip); > > >>>> + engine_ovsdb_node_add_index(&en_sb_mac_binding, > > >>>> + "sbrec_mac_binding_by_datapath", > > >>>> + sbrec_mac_binding_by_datapath); > > >>>> } > > >>>> > > >>>> void inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn, > > >>>> diff --git a/northd/mac-binding-aging.c b/northd/mac-binding-aging.c > > >>>> new file mode 100644 > > >>>> index 000000000..e8217e8bc > > >>>> --- /dev/null > > >>>> +++ b/northd/mac-binding-aging.c > > >>>> @@ -0,0 +1,151 @@ > > >>>> +/* Copyright (c) 2022, Red Hat, Inc. > > >>>> + * > > >>>> + * Licensed under the Apache License, Version 2.0 (the "License"); > > >>>> + * you may not use this file except in compliance with the License. > > >>>> + * You may obtain a copy of the License at: > > >>>> + * > > >>>> + * http://www.apache.org/licenses/LICENSE-2.0 > > >>>> + * > > >>>> + * Unless required by applicable law or agreed to in writing, > > software > > >>>> + * distributed under the License is distributed on an "AS IS" BASIS, > > >>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > > > implied. > > >>>> + * See the License for the specific language governing permissions > > and > > >>>> + * limitations under the License. > > >>>> + */ > > >>>> + > > >>>> +#include <config.h> > > >>>> + > > >>>> +#include "lib/inc-proc-eng.h" > > >>>> +#include "lib/ovn-nb-idl.h" > > >>>> +#include "lib/ovn-sb-idl.h" > > >>>> +#include "lib/timeval.h" > > >>>> +#include "northd/mac-binding-aging.h" > > >>>> +#include "northd/northd.h" > > >>>> +#include "openvswitch/hmap.h" > > >>>> +#include "openvswitch/poll-loop.h" > > >>>> +#include "openvswitch/util.h" > > >>>> +#include "openvswitch/vlog.h" > > >>>> + > > >>>> +VLOG_DEFINE_THIS_MODULE(mac_binding_aging); > > >>>> + > > >>>> +struct mac_binding_waker { > > >>>> + bool should_schedule; > > >>>> + long long next_wake_msec; > > >>>> +}; > > >>>> + > > >>>> +static struct mac_binding_waker waker; > > >>>> + > > >>>> +static void > > >>>> +mac_binding_aging_run_for_datapath(const struct > > > sbrec_datapath_binding *dp, > > >>>> + const struct nbrec_logical_router > > > *nbr, > > >>>> + struct ovsdb_idl_index > > > *mb_by_datapath, > > >>>> + int64_t now, int64_t *wake_delay) > > >>>> +{ > > >>>> + uint64_t threshold = smap_get_uint(&nbr->options, > > >>>> + "mac_binding_age_threshold", > > >>>> + 0) * 1000; > > > > Nit: indentation. > > > > >>>> + if (!threshold) { > > >>>> + return; > > >>>> + } > > >>>> + > > >>>> + struct sbrec_mac_binding *mb_index_row = > > >>>> + sbrec_mac_binding_index_init_row(mb_by_datapath); > > >>>> + sbrec_mac_binding_index_set_datapath(mb_index_row, dp); > > >>>> + > > >>>> + const struct sbrec_mac_binding *mb; > > >>>> + SBREC_MAC_BINDING_FOR_EACH_EQUAL (mb, mb_index_row, > > > mb_by_datapath) { > > >>>> + int64_t elapsed = now - mb->timestamp; > > >>>> + > > >>>> + if (elapsed < 0) { > > >>>> + continue; > > >>>> + } else if (elapsed >= threshold) { > > >>>> + sbrec_mac_binding_delete(mb); > > >>>> + } else { > > >>>> + *wake_delay = MIN(*wake_delay, threshold - elapsed); > > >>>> + } > > >>>> + } > > >>>> + sbrec_mac_binding_index_destroy_row(mb_index_row); > > >>>> +} > > >>>> + > > >>>> +void > > >>>> +en_mac_binding_aging_run(struct engine_node *node, void *data > > > OVS_UNUSED) > > >>>> +{ > > >>>> + const struct engine_context *eng_ctx = engine_get_context(); > > >>>> + > > >>>> + if (!eng_ctx->ovnsb_idl_txn) { > > >>>> + return; > > >>>> + } > > >>>> + > > >>>> + int64_t next_expire_msec = INT64_MAX; > > >>>> + int64_t now = time_wall_msec(); > > >>>> + struct northd_data *northd_data = engine_get_input_data("northd", > > > node); > > >>>> + struct ovsdb_idl_index *sbrec_mac_binding_by_datapath = > > >>>> + > > > engine_ovsdb_node_get_index(engine_get_input("SB_mac_binding", node), > > >>>> + "sbrec_mac_binding_by_datapath"); > > >>>> + > > >>>> + struct ovn_datapath *od; > > >>>> + HMAP_FOR_EACH (od, key_node, &northd_data->datapaths) { > > >>>> + if (od->sb && od->nbr) { > > >>>> + mac_binding_aging_run_for_datapath(od->sb, od->nbr, > > >>>> + > > > sbrec_mac_binding_by_datapath, > > >>>> + now, > > > &next_expire_msec); > > >>>> + } > > >>>> + } > > >>>> + > > >>>> + if (next_expire_msec < INT64_MAX) { > > >>>> + waker.should_schedule = true; > > >>>> + waker.next_wake_msec = time_msec() + next_expire_msec; > > >>>> + /* Run the engine right after so the waker can reflect on the > > > new wake > > >>>> + * time. */ > > >>>> + poll_immediate_wake(); > > >>> > > >>> Why would we immediately wake up? Wouldn't it cause busy-loop? Shall > > we > > > use poll_timer_wait_until? Sorry if I missed anything here and please > > > correct me. > > >> > > >> > > >> That's because of the dependency of the nodes, we need to run the waker > > > that will schedule the sleep. > > >> We could probably use poll_timer_wait_until as well and it should have > > > the same effect. > > >> > > > Thanks, I think I did misunderstand a little and now I got the idea of > > the > > > waker node. It is an input node just to trigger the timer, which wakes up > > > the aging processing when needed and at the same time avoids running the > > > aging scanning when the process is woken up for other reasons. So the > > > poll_immediate_wake() here won't cause busy-loop. However, it does wake > > up > > > the process at least one extra time for each run, although the cost may > > be > > > small. I'd still use poll_timer_wait_until() here since we already know > > the > > > next_wake_msec. > > > > > > In addition, it is a little weird from I-P engine's point of view that > > the > > > output node execution updates the input node's data, but for this timer > > use > > > case I think it should work and I haven't come up with anything better. > > It > > > would be helpful to leave some comments for the waker node to explain the > > > motivation and make this part easier to understand. > > > > > > > +1 for both points (comment + poll_timer_wait_until()). > > > > Also, can we at least use engine_get_input_data() to get the input > > node's data? This will make it more clear that we're using data that's > > related to the waker node. The 'waker' variable would have to be moved > > inside the en_mac_binding_aging_waker data. > > > > > Thanks, > > > Han > > > > > >>> > > >>> > > >>>> + } else { > > >>>> + waker.should_schedule = false; > > >>>> + } > > >>>> + > > >>>> + /* This node is part of lflow, but lflow does not depend on it. > > > Setting > > >>>> + * state as unchanged does not trigger lflow node when it is not > > > needed. */ > > >>> > > >>> This should work, but it is like a hack. Ideally we should allow the > > I-P > > > engine to have multiple "root" nodes. This can be a future improvement > > > though. Maybe add a "XXX" comment here (a TODO item). > > >> > > >> > > >> Yes unfortunately this is not the cleanest solution, but I did not come > > > with a better one. I will add a TODO comment here about moving it once we > > > allow multiple root nodes. > > >> > > >>> > > >>> > > >>> Thanks, > > >>> Han > > >>> > > >> > > >> Thanks, > > >> Ales > > >> > > >>> > > >>> > > >>>> + engine_set_node_state(node, EN_UNCHANGED); > > >>>> +} > > >>>> + > > >>>> +void * > > >>>> +en_mac_binding_aging_init(struct engine_node *node OVS_UNUSED, > > >>>> + struct engine_arg *arg OVS_UNUSED) > > >>>> +{ > > >>>> + return NULL; > > >>>> +} > > >>>> + > > >>>> +void > > >>>> +en_mac_binding_aging_cleanup(void *data OVS_UNUSED) > > >>>> +{ > > >>>> +} > > >>>> + > > >>>> +void > > >>>> +en_mac_binding_aging_waker_run(struct engine_node *node, void *data > > > OVS_UNUSED) > > >>>> +{ > > >>>> + if (!waker.should_schedule) { > > >>>> + return; > > >>>> + } > > >>>> + > > >>>> + if (time_msec() >= waker.next_wake_msec) { > > >>>> + waker.should_schedule = false; > > >>>> + engine_set_node_state(node, EN_UPDATED); > > >>>> + return; > > >>>> + } > > >>>> + > > >>>> + engine_set_node_state(node, EN_UNCHANGED); > > >>>> + poll_timer_wait_until(waker.next_wake_msec); > > >>>> +} > > >>>> + > > >>>> +void * > > >>>> +en_mac_binding_aging_waker_init(struct engine_node *node OVS_UNUSED, > > >>>> + struct engine_arg *arg OVS_UNUSED) > > >>>> +{ > > >>>> + waker.should_schedule = false; > > >>>> + waker.next_wake_msec = 0; > > >>>> + return NULL; > > >>>> +} > > >>>> + > > >>>> +void > > >>>> +en_mac_binding_aging_waker_cleanup(void *data OVS_UNUSED) > > >>>> +{ > > >>>> +} > > >>>> diff --git a/northd/mac-binding-aging.h b/northd/mac-binding-aging.h > > >>>> new file mode 100644 > > >>>> index 000000000..296a7ab38 > > >>>> --- /dev/null > > >>>> +++ b/northd/mac-binding-aging.h > > >>>> @@ -0,0 +1,33 @@ > > >>>> +/* Copyright (c) 2022, Red Hat, Inc. > > >>>> + * > > >>>> + * Licensed under the Apache License, Version 2.0 (the "License"); > > >>>> + * you may not use this file except in compliance with the License. > > >>>> + * You may obtain a copy of the License at: > > >>>> + * > > >>>> + * http://www.apache.org/licenses/LICENSE-2.0 > > >>>> + * > > >>>> + * Unless required by applicable law or agreed to in writing, > > software > > >>>> + * distributed under the License is distributed on an "AS IS" BASIS, > > >>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > > > implied. > > >>>> + * See the License for the specific language governing permissions > > and > > >>>> + * limitations under the License. > > >>>> + */ > > >>>> + > > >>>> +#ifndef MAC_BINDING_AGING_H > > >>>> +#define MAC_BINDING_AGING_H 1 > > >>>> + > > >>>> +#include "lib/inc-proc-eng.h" > > >>>> + > > >>>> +/* The MAC binding aging node functions. */ > > >>>> +void en_mac_binding_aging_run(struct engine_node *node, void *data); > > >>>> +void *en_mac_binding_aging_init(struct engine_node *node, > > >>>> + struct engine_arg *arg); > > >>>> +void en_mac_binding_aging_cleanup(void *data); > > >>>> + > > >>>> +/* The MAC binding aging waker node functions. */ > > >>>> +void en_mac_binding_aging_waker_run(struct engine_node *node, void > > > *data); > > >>>> +void *en_mac_binding_aging_waker_init(struct engine_node *node, > > >>>> + struct engine_arg *arg); > > >>>> +void en_mac_binding_aging_waker_cleanup(void *data); > > >>>> + > > >>>> +#endif /* northd/mac-binding-aging.h */ > > >>>> diff --git a/ovn-nb.xml b/ovn-nb.xml > > >>>> index e26afd83c..d3fba9bdc 100644 > > >>>> --- a/ovn-nb.xml > > >>>> +++ b/ovn-nb.xml > > >>>> @@ -2392,6 +2392,13 @@ > > >>>> and other sources. This way, OVN and the other sources can > > > make use of > > >>>> the same conntrack zone. > > >>>> </column> > > >>>> + > > >>>> + <column name="options" key="mac_binding_age_threshold" > > >>>> + type='{"type": "integer", "minInteger": 0, > > > "maxInteger": 4294967295}'> > > >>>> + MAC binding aging <code>threshold</code> value in secs. MAC > > > binding > > > > Nit: I'd use 'seconds'. > > > > Thanks, > > Dumitru > > > > >>>> + exceeding this timeout will be automatically removed. The > > > value > > >>>> + defaults to 0, which means disabled. > > >>>> + </column> > > >>>> </group> > > >>>> > > >>>> <group title="Common Columns"> > > >>>> -- > > >>>> 2.37.1 > > >>>> > > >> > > >> > > >> > > >> -- > > >> > > >> Ales Musil > > >> > > >> Senior Software Engineer - OVN Core > > >> > > >> Red Hat EMEA > > >> > > >> [email protected] IM: amusil > > > > > > > > Thank you for the reviews. > > v5 should address most of the comments except the open one from Numan about > upgrade. >
IMO upgrading one is very important. As I mentioned earlier, openstack neutron or ovn-kubernetes would not know which OVN version is being run and if all the ovn-controllers are upgraded to the version which has the mac_binding timestamp support. This config option is not something which an administrator can enable or disable depending on the upgrade status. There have been a few upgrade issues with OVN and because of which openstack CMS had to do special handling. This patch series can potentially affect the existing datapath traffic after partial upgrades. So I think its better we handle this properly. Numan > Thanks, > Ales > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > [email protected] IM: amusil > <https://red.ht/sig> > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
