On Mon, Aug 15, 2022 at 11:34 PM Ales Musil <[email protected]> wrote:
>
>
>
> On Tue, Aug 16, 2022 at 4:14 AM Numan Siddique <[email protected]> wrote:
>>
>> On Tue, Aug 16, 2022 at 3:07 AM Han Zhou <[email protected]> wrote:
>> >
>> > On Mon, Aug 15, 2022 at 1:23 AM Dumitru Ceara <[email protected]>
wrote:
>> > >
>> > > On 8/15/22 08:46, Numan Siddique wrote:
>> > > > "
>> > > >
>> > > > On Mon, Aug 15, 2022 at 3:34 PM Han Zhou <[email protected]> wrote:
>> > > >>
>> > > >> On Sun, Aug 14, 2022 at 8:30 PM Numan Siddique <[email protected]>
wrote:
>> > > >>>
>> > > >>> On Fri, Aug 12, 2022 at 12:21 AM Ales Musil <[email protected]>
wrote:
>> > > >>>>
>> > > >>>> On Thu, Aug 11, 2022 at 2:45 PM Dumitru Ceara <[email protected]
>
>> > wrote:
>> > > >>>>
>> > > >>>>> On 8/10/22 18:37, Han Zhou wrote:
>> > > >>>>>> On Tue, Aug 9, 2022 at 11:22 PM Ales Musil <[email protected]>
>> > > >> wrote:
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> On Wed, Aug 10, 2022 at 8:04 AM Han Zhou <[email protected]>
wrote:
>> > > >>>>>>>
>> > > >>>>>>> Hi Han,
>> > > >>>>>>>
>> > > >>>>>>> thank you for the review, see my reply inline below.
>> > > >>>>>>>
>> > > >>>>>>>> On Tue, Aug 9, 2022 at 4:05 AM Ales Musil <[email protected]
>
>> > > >> wrote:
>> > > >>>>>>>>>
>> > > >>>>>>>>> Add MAC binding aging mechanism, that utilizes
>> > > >>>>>>>>> the timestamp column of MAC_Binding table.
>> > > >>>>>>>>> When the MAC binding exceeds the threshold it is
>> > > >>>>>>>>> removed from SB DB, this is postponed only in case
>> > > >>>>>>>>> we receive update ARP with update to MAC address.
>> > > >>>>>>>>>
>> > > >>>>>>>>> The threshold is configurable via option
>> > > >>>>>>>>> "mac_binding_age_threshold" that can be specified
>> > > >>>>>>>>> for each logical router. The option is defaulting to
>> > > >>>>>>>>> 0 which means that by default the aging is disabled
>> > > >>>>>>>>> and the MAC binding rows will be persisted the same
>> > > >>>>>>>>> way as before.
>> > > >>>>>>>>>
>> > > >>>>>>>>> Reported-at: https://bugzilla.redhat.com/2084668
>> > > >>>>>>>>> Signed-off-by: Ales Musil <[email protected]>
>> > > >>>>>>>>> ---
>> > > >>>>>>>>> v3: Rebase on top of current main.
>> > > >>>>>>>>>     Update according to the final conclusion.
>> > > >>>>>>>>> v4: Rebase on top of current main.
>> > > >>>>>>>>>     Skip one row instead of all if time elapsed is
negative.
>> > > >>>>>>>>>     Add note to NEWS.
>> > > >>>>>>>>> ---
>> > > >>>>>>>>>  NEWS                       |   3 +
>> > > >>>>>>>>>  northd/automake.mk         |   2 +
>> > > >>>>>>>>>  northd/inc-proc-northd.c   |  13 ++++
>> > > >>>>>>>>>  northd/mac-binding-aging.c | 151
>> > > >>>>> +++++++++++++++++++++++++++++++++++++
>> > > >>>>>>>>>  northd/mac-binding-aging.h |  33 ++++++++
>> > > >>>>>>>>>  ovn-nb.xml                 |   7 ++
>> > > >>>>>>>>>  6 files changed, 209 insertions(+)
>> > > >>>>>>>>>  create mode 100644 northd/mac-binding-aging.c
>> > > >>>>>>>>>  create mode 100644 northd/mac-binding-aging.h
>> > > >>>>>>>>>
>> > > >>>>>>>>> diff --git a/NEWS b/NEWS
>> > > >>>>>>>>> index 20cea579e..eff8e472a 100644
>> > > >>>>>>>>> --- a/NEWS
>> > > >>>>>>>>> +++ b/NEWS
>> > > >>>>>>>>> @@ -14,6 +14,9 @@ Post v22.06.0
>> > > >>>>>>>>>      NAT-T UDP encapsulation. Requires OVS support for
IPsec
>> > > >> custom
>> > > >>>>>> tunnel
>> > > >>>>>>>>>      options (which will be available in OVS 2.18).
>> > > >>>>>>>>>    - Removed possibility of disabling logical datapath
groups.
>> > > >>>>>>>>> +  - Added MAC binding aging mechanism, that is disabled by
>> > > >> default.
>> > > >>>>>>>>> +    It can be enabled per logical router with option
>> > > >>>>>>>>> +    "mac_binding_age_threshold".
>> > > >>>>>>>>>
>> > > >>>>>>>>>  OVN v22.06.0 - 03 Jun 2022
>> > > >>>>>>>>>  --------------------------
>> > > >>>>>>>>> diff --git a/northd/automake.mk b/northd/automake.mk
>> > > >>>>>>>>> index 4862ec7b7..81582867d 100644
>> > > >>>>>>>>> --- a/northd/automake.mk
>> > > >>>>>>>>> +++ b/northd/automake.mk
>> > > >>>>>>>>> @@ -1,6 +1,8 @@
>> > > >>>>>>>>>  # ovn-northd
>> > > >>>>>>>>>  bin_PROGRAMS += northd/ovn-northd
>> > > >>>>>>>>>  northd_ovn_northd_SOURCES = \
>> > > >>>>>>>>> +       northd/mac-binding-aging.c \
>> > > >>>>>>>>> +       northd/mac-binding-aging.h \
>> > > >>>>>>>>>         northd/northd.c \
>> > > >>>>>>>>>         northd/northd.h \
>> > > >>>>>>>>>         northd/ovn-northd.c \
>> > > >>>>>>>>> diff --git a/northd/inc-proc-northd.c
b/northd/inc-proc-northd.c
>> > > >>>>>>>>> index 43093cb5a..4a3699106 100644
>> > > >>>>>>>>> --- a/northd/inc-proc-northd.c
>> > > >>>>>>>>> +++ b/northd/inc-proc-northd.c
>> > > >>>>>>>>> @@ -22,9 +22,11 @@
>> > > >>>>>>>>>  #include "ip-mcast-index.h"
>> > > >>>>>>>>>  #include "static-mac-binding-index.h"
>> > > >>>>>>>>>  #include "lib/inc-proc-eng.h"
>> > > >>>>>>>>> +#include "lib/mac-binding-index.h"
>> > > >>>>>>>>>  #include "lib/ovn-nb-idl.h"
>> > > >>>>>>>>>  #include "lib/ovn-sb-idl.h"
>> > > >>>>>>>>>  #include "mcast-group-index.h"
>> > > >>>>>>>>> +#include "northd/mac-binding-aging.h"
>> > > >>>>>>>>>  #include "openvswitch/poll-loop.h"
>> > > >>>>>>>>>  #include "openvswitch/vlog.h"
>> > > >>>>>>>>>  #include "inc-proc-northd.h"
>> > > >>>>>>>>> @@ -149,6 +151,8 @@ enum sb_engine_node {
>> > > >>>>>>>>>   * avoid sparse errors. */
>> > > >>>>>>>>>  static ENGINE_NODE(northd, "northd");
>> > > >>>>>>>>>  static ENGINE_NODE(lflow, "lflow");
>> > > >>>>>>>>> +static ENGINE_NODE(mac_binding_aging,
"mac_binding_aging");
>> > > >>>>>>>>> +static ENGINE_NODE(mac_binding_aging_waker,
>> > > >>>>>> "mac_binding_aging_waker");
>> > > >>>>>>>>>
>> > > >>>>>>>>>  void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
>> > > >>>>>>>>>                            struct ovsdb_idl_loop *sb)
>> > > >>>>>>>>> @@ -211,12 +215,16 @@ void inc_proc_northd_init(struct
>> > > >> ovsdb_idl_loop
>> > > >>>>>> *nb,
>> > > >>>>>>>>>      engine_add_input(&en_northd, &en_sb_load_balancer,
NULL);
>> > > >>>>>>>>>      engine_add_input(&en_northd, &en_sb_fdb, NULL);
>> > > >>>>>>>>>      engine_add_input(&en_northd,
&en_sb_static_mac_binding,
>> > > >> NULL);
>> > > >>>>>>>>> +    engine_add_input(&en_mac_binding_aging,
&en_sb_mac_binding,
>> > > >>>>> NULL);
>> > > >>>>>>>>> +    engine_add_input(&en_mac_binding_aging, &en_northd,
NULL);
>> > > >>>>>>>>> +    engine_add_input(&en_mac_binding_aging,
>> > > >>>>>> &en_mac_binding_aging_waker, NULL);
>> > > >>>>>>>>>      engine_add_input(&en_lflow, &en_nb_bfd, NULL);
>> > > >>>>>>>>>      engine_add_input(&en_lflow, &en_sb_bfd, NULL);
>> > > >>>>>>>>>      engine_add_input(&en_lflow, &en_sb_logical_flow,
NULL);
>> > > >>>>>>>>>      engine_add_input(&en_lflow, &en_sb_multicast_group,
NULL);
>> > > >>>>>>>>>      engine_add_input(&en_lflow, &en_sb_igmp_group, NULL);
>> > > >>>>>>>>>      engine_add_input(&en_lflow, &en_northd, NULL);
>> > > >>>>>>>>> +    engine_add_input(&en_lflow, &en_mac_binding_aging,
NULL);
>> > > >>>>>>>>>
>> > > >>>>>>>>>      struct engine_arg engine_arg = {
>> > > >>>>>>>>>          .nb_idl = nb->idl,
>> > > >>>>>>>>> @@ -235,6 +243,8 @@ void inc_proc_northd_init(struct
>> > > >> ovsdb_idl_loop
>> > > >>>>>> *nb,
>> > > >>>>>>>>>          chassis_hostname_index_create(sb->idl);
>> > > >>>>>>>>>      struct ovsdb_idl_index
>> > *sbrec_static_mac_binding_by_lport_ip
>> > > >>>>>>>>>          = static_mac_binding_index_create(sb->idl);
>> > > >>>>>>>>> +    struct ovsdb_idl_index *sbrec_mac_binding_by_datapath
>> > > >>>>>>>>> +        = mac_binding_by_datapath_index_create(sb->idl);
>> > > >>>>>>>>>
>> > > >>>>>>>>>      engine_init(&en_lflow, &engine_arg);
>> > > >>>>>>>>>
>> > > >>>>>>>>> @@ -256,6 +266,9 @@ void inc_proc_northd_init(struct
>> > > >> ovsdb_idl_loop
>> > > >>>>>> *nb,
>> > > >>>>>>>>>      engine_ovsdb_node_add_index(&en_sb_static_mac_binding,
>> > > >>>>>>>>>
>> > > >>>>>>  "sbrec_static_mac_binding_by_lport_ip",
>> > > >>>>>>>>>
>> > > >>>>> sbrec_static_mac_binding_by_lport_ip);
>> > > >>>>>>>>> +    engine_ovsdb_node_add_index(&en_sb_mac_binding,
>> > > >>>>>>>>> +
>> >  "sbrec_mac_binding_by_datapath",
>> > > >>>>>>>>> +
 sbrec_mac_binding_by_datapath);
>> > > >>>>>>>>>  }
>> > > >>>>>>>>>
>> > > >>>>>>>>>  void inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn,
>> > > >>>>>>>>> diff --git a/northd/mac-binding-aging.c
>> > > >> b/northd/mac-binding-aging.c
>> > > >>>>>>>>> new file mode 100644
>> > > >>>>>>>>> index 000000000..e8217e8bc
>> > > >>>>>>>>> --- /dev/null
>> > > >>>>>>>>> +++ b/northd/mac-binding-aging.c
>> > > >>>>>>>>> @@ -0,0 +1,151 @@
>> > > >>>>>>>>> +/* Copyright (c) 2022, Red Hat, Inc.
>> > > >>>>>>>>> + *
>> > > >>>>>>>>> + * Licensed under the Apache License, Version 2.0 (the
>> > > >> "License");
>> > > >>>>>>>>> + * you may not use this file except in compliance with the
>> > > >> License.
>> > > >>>>>>>>> + * You may obtain a copy of the License at:
>> > > >>>>>>>>> + *
>> > > >>>>>>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> > > >>>>>>>>> + *
>> > > >>>>>>>>> + * Unless required by applicable law or agreed to in
writing,
>> > > >>>>> software
>> > > >>>>>>>>> + * distributed under the License is distributed on an "AS
IS"
>> > > >> BASIS,
>> > > >>>>>>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express
>> > > >> or
>> > > >>>>>> implied.
>> > > >>>>>>>>> + * See the License for the specific language governing
>> > > >> permissions
>> > > >>>>> and
>> > > >>>>>>>>> + * limitations under the License.
>> > > >>>>>>>>> + */
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +#include <config.h>
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +#include "lib/inc-proc-eng.h"
>> > > >>>>>>>>> +#include "lib/ovn-nb-idl.h"
>> > > >>>>>>>>> +#include "lib/ovn-sb-idl.h"
>> > > >>>>>>>>> +#include "lib/timeval.h"
>> > > >>>>>>>>> +#include "northd/mac-binding-aging.h"
>> > > >>>>>>>>> +#include "northd/northd.h"
>> > > >>>>>>>>> +#include "openvswitch/hmap.h"
>> > > >>>>>>>>> +#include "openvswitch/poll-loop.h"
>> > > >>>>>>>>> +#include "openvswitch/util.h"
>> > > >>>>>>>>> +#include "openvswitch/vlog.h"
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +VLOG_DEFINE_THIS_MODULE(mac_binding_aging);
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +struct mac_binding_waker {
>> > > >>>>>>>>> +    bool should_schedule;
>> > > >>>>>>>>> +    long long next_wake_msec;
>> > > >>>>>>>>> +};
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +static struct mac_binding_waker waker;
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +static void
>> > > >>>>>>>>> +mac_binding_aging_run_for_datapath(const struct
>> > > >>>>>> sbrec_datapath_binding *dp,
>> > > >>>>>>>>> +                                   const struct
>> > > >> nbrec_logical_router
>> > > >>>>>> *nbr,
>> > > >>>>>>>>> +                                   struct ovsdb_idl_index
>> > > >>>>>> *mb_by_datapath,
>> > > >>>>>>>>> +                                   int64_t now, int64_t
>> > > >> *wake_delay)
>> > > >>>>>>>>> +{
>> > > >>>>>>>>> +    uint64_t threshold = smap_get_uint(&nbr->options,
>> > > >>>>>>>>> +
>> > > >> "mac_binding_age_threshold",
>> > > >>>>>>>>> +                                         0) * 1000;
>> > > >>>>>
>> > > >>>>> Nit: indentation.
>> > > >>>>>
>> > > >>>>>>>>> +    if (!threshold) {
>> > > >>>>>>>>> +        return;
>> > > >>>>>>>>> +    }
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    struct sbrec_mac_binding *mb_index_row =
>> > > >>>>>>>>> +        sbrec_mac_binding_index_init_row(mb_by_datapath);
>> > > >>>>>>>>> +    sbrec_mac_binding_index_set_datapath(mb_index_row,
dp);
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    const struct sbrec_mac_binding *mb;
>> > > >>>>>>>>> +    SBREC_MAC_BINDING_FOR_EACH_EQUAL (mb, mb_index_row,
>> > > >>>>>> mb_by_datapath) {
>> > > >>>>>>>>> +        int64_t elapsed = now - mb->timestamp;
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +        if (elapsed < 0) {
>> > > >>>>>>>>> +            continue;
>> > > >>>>>>>>> +        } else if (elapsed >= threshold) {
>> > > >>>>>>>>> +            sbrec_mac_binding_delete(mb);
>> > > >>>>>>>>> +        } else {
>> > > >>>>>>>>> +            *wake_delay = MIN(*wake_delay, threshold -
>> > elapsed);
>> > > >>>>>>>>> +        }
>> > > >>>>>>>>> +    }
>> > > >>>>>>>>> +    sbrec_mac_binding_index_destroy_row(mb_index_row);
>> > > >>>>>>>>> +}
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +void
>> > > >>>>>>>>> +en_mac_binding_aging_run(struct engine_node *node, void
*data
>> > > >>>>>> OVS_UNUSED)
>> > > >>>>>>>>> +{
>> > > >>>>>>>>> +    const struct engine_context *eng_ctx =
>> > engine_get_context();
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    if (!eng_ctx->ovnsb_idl_txn) {
>> > > >>>>>>>>> +        return;
>> > > >>>>>>>>> +    }
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    int64_t next_expire_msec = INT64_MAX;
>> > > >>>>>>>>> +    int64_t now = time_wall_msec();
>> > > >>>>>>>>> +    struct northd_data *northd_data =
>> > > >> engine_get_input_data("northd",
>> > > >>>>>> node);
>> > > >>>>>>>>> +    struct ovsdb_idl_index *sbrec_mac_binding_by_datapath
=
>> > > >>>>>>>>> +
>> > > >>>>>>
 engine_ovsdb_node_get_index(engine_get_input("SB_mac_binding",
>> > > >> node),
>> > > >>>>>>>>> +
>> > > >>  "sbrec_mac_binding_by_datapath");
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    struct ovn_datapath *od;
>> > > >>>>>>>>> +    HMAP_FOR_EACH (od, key_node, &northd_data->datapaths)
{
>> > > >>>>>>>>> +        if (od->sb && od->nbr) {
>> > > >>>>>>>>> +            mac_binding_aging_run_for_datapath(od->sb,
od->nbr,
>> > > >>>>>>>>> +
>> > > >>>>>> sbrec_mac_binding_by_datapath,
>> > > >>>>>>>>> +                                               now,
>> > > >>>>>> &next_expire_msec);
>> > > >>>>>>>>> +        }
>> > > >>>>>>>>> +    }
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    if (next_expire_msec < INT64_MAX) {
>> > > >>>>>>>>> +        waker.should_schedule = true;
>> > > >>>>>>>>> +        waker.next_wake_msec = time_msec() +
next_expire_msec;
>> > > >>>>>>>>> +        /* Run the engine right after so the waker can
reflect
>> > > >> on the
>> > > >>>>>> new wake
>> > > >>>>>>>>> +         * time. */
>> > > >>>>>>>>> +        poll_immediate_wake();
>> > > >>>>>>>>
>> > > >>>>>>>> Why would we immediately wake up? Wouldn't it cause
busy-loop?
>> > > >> Shall
>> > > >>>>> we
>> > > >>>>>> use poll_timer_wait_until? Sorry if I missed anything here and
>> > > >> please
>> > > >>>>>> correct me.
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> That's because of the dependency of the nodes, we need to
run the
>> > > >> waker
>> > > >>>>>> that will schedule the sleep.
>> > > >>>>>>> We could probably use poll_timer_wait_until as well and it
should
>> > > >> have
>> > > >>>>>> the same effect.
>> > > >>>>>>>
>> > > >>>>>> Thanks, I think I did misunderstand a little and now I got
the idea
>> > > >> of
>> > > >>>>> the
>> > > >>>>>> waker node. It is an input node just to trigger the timer,
which
>> > > >> wakes up
>> > > >>>>>> the aging processing when needed and at the same time avoids
>> > > >> running the
>> > > >>>>>> aging scanning when the process is woken up for other
reasons. So
>> > > >> the
>> > > >>>>>> poll_immediate_wake() here won't cause busy-loop. However, it
does
>> > > >> wake
>> > > >>>>> up
>> > > >>>>>> the process at least one extra time for each run, although
the cost
>> > > >> may
>> > > >>>>> be
>> > > >>>>>> small. I'd still use poll_timer_wait_until() here since we
already
>> > > >> know
>> > > >>>>> the
>> > > >>>>>> next_wake_msec.
>> > > >>>>>>
>> > > >>>>>> In addition, it is a little weird from I-P engine's point of
view
>> > > >> that
>> > > >>>>> the
>> > > >>>>>> output node execution updates the input node's data, but for
this
>> > > >> timer
>> > > >>>>> use
>> > > >>>>>> case I think it should work and I haven't come up with
anything
>> > > >> better.
>> > > >>>>> It
>> > > >>>>>> would be helpful to leave some comments for the waker node to
>> > > >> explain the
>> > > >>>>>> motivation and make this part easier to understand.
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>> +1 for both points (comment + poll_timer_wait_until()).
>> > > >>>>>
>> > > >>>>> Also, can we at least use engine_get_input_data() to get the
input
>> > > >>>>> node's data?  This will make it more clear that we're using
data
>> > > >> that's
>> > > >>>>> related to the waker node.  The 'waker' variable would have to
be
>> > > >> moved
>> > > >>>>> inside the en_mac_binding_aging_waker data.
>> > > >>>>>
>> > > >>>>>> Thanks,
>> > > >>>>>> Han
>> > > >>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>> +    } else {
>> > > >>>>>>>>> +        waker.should_schedule = false;
>> > > >>>>>>>>> +    }
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    /* This node is part of lflow, but lflow does not
depend on
>> > > >> it.
>> > > >>>>>> Setting
>> > > >>>>>>>>> +     * state as unchanged does not trigger lflow node
when it
>> > > >> is not
>> > > >>>>>> needed. */
>> > > >>>>>>>>
>> > > >>>>>>>> This should work, but it is like a hack. Ideally we should
allow
>> > > >> the
>> > > >>>>> I-P
>> > > >>>>>> engine to have multiple "root" nodes. This can be a future
>> > > >> improvement
>> > > >>>>>> though. Maybe add a "XXX" comment here (a TODO item).
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> Yes unfortunately this is not the cleanest solution, but I
did not
>> > > >> come
>> > > >>>>>> with a better one. I will add a TODO comment here about
moving it
>> > > >> once we
>> > > >>>>>> allow multiple root nodes.
>> > > >>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> Thanks,
>> > > >>>>>>>> Han
>> > > >>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> Thanks,
>> > > >>>>>>> Ales
>> > > >>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>> +    engine_set_node_state(node, EN_UNCHANGED);
>> > > >>>>>>>>> +}
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +void *
>> > > >>>>>>>>> +en_mac_binding_aging_init(struct engine_node *node
OVS_UNUSED,
>> > > >>>>>>>>> +                          struct engine_arg *arg
OVS_UNUSED)
>> > > >>>>>>>>> +{
>> > > >>>>>>>>> +    return NULL;
>> > > >>>>>>>>> +}
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +void
>> > > >>>>>>>>> +en_mac_binding_aging_cleanup(void *data OVS_UNUSED)
>> > > >>>>>>>>> +{
>> > > >>>>>>>>> +}
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +void
>> > > >>>>>>>>> +en_mac_binding_aging_waker_run(struct engine_node *node,
void
>> > > >> *data
>> > > >>>>>> OVS_UNUSED)
>> > > >>>>>>>>> +{
>> > > >>>>>>>>> +    if (!waker.should_schedule) {
>> > > >>>>>>>>> +        return;
>> > > >>>>>>>>> +    }
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    if (time_msec() >= waker.next_wake_msec) {
>> > > >>>>>>>>> +        waker.should_schedule = false;
>> > > >>>>>>>>> +        engine_set_node_state(node, EN_UPDATED);
>> > > >>>>>>>>> +        return;
>> > > >>>>>>>>> +    }
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +    engine_set_node_state(node, EN_UNCHANGED);
>> > > >>>>>>>>> +    poll_timer_wait_until(waker.next_wake_msec);
>> > > >>>>>>>>> +}
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +void *
>> > > >>>>>>>>> +en_mac_binding_aging_waker_init(struct engine_node *node
>> > > >> OVS_UNUSED,
>> > > >>>>>>>>> +                                struct engine_arg *arg
>> > > >> OVS_UNUSED)
>> > > >>>>>>>>> +{
>> > > >>>>>>>>> +    waker.should_schedule = false;
>> > > >>>>>>>>> +    waker.next_wake_msec = 0;
>> > > >>>>>>>>> +    return NULL;
>> > > >>>>>>>>> +}
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +void
>> > > >>>>>>>>> +en_mac_binding_aging_waker_cleanup(void *data OVS_UNUSED)
>> > > >>>>>>>>> +{
>> > > >>>>>>>>> +}
>> > > >>>>>>>>> diff --git a/northd/mac-binding-aging.h
>> > > >> b/northd/mac-binding-aging.h
>> > > >>>>>>>>> new file mode 100644
>> > > >>>>>>>>> index 000000000..296a7ab38
>> > > >>>>>>>>> --- /dev/null
>> > > >>>>>>>>> +++ b/northd/mac-binding-aging.h
>> > > >>>>>>>>> @@ -0,0 +1,33 @@
>> > > >>>>>>>>> +/* Copyright (c) 2022, Red Hat, Inc.
>> > > >>>>>>>>> + *
>> > > >>>>>>>>> + * Licensed under the Apache License, Version 2.0 (the
>> > > >> "License");
>> > > >>>>>>>>> + * you may not use this file except in compliance with the
>> > > >> License.
>> > > >>>>>>>>> + * You may obtain a copy of the License at:
>> > > >>>>>>>>> + *
>> > > >>>>>>>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>> > > >>>>>>>>> + *
>> > > >>>>>>>>> + * Unless required by applicable law or agreed to in
writing,
>> > > >>>>> software
>> > > >>>>>>>>> + * distributed under the License is distributed on an "AS
IS"
>> > > >> BASIS,
>> > > >>>>>>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express
>> > > >> or
>> > > >>>>>> implied.
>> > > >>>>>>>>> + * See the License for the specific language governing
>> > > >> permissions
>> > > >>>>> and
>> > > >>>>>>>>> + * limitations under the License.
>> > > >>>>>>>>> + */
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +#ifndef MAC_BINDING_AGING_H
>> > > >>>>>>>>> +#define MAC_BINDING_AGING_H 1
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +#include "lib/inc-proc-eng.h"
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +/* The MAC binding aging node functions. */
>> > > >>>>>>>>> +void en_mac_binding_aging_run(struct engine_node *node,
void
>> > > >> *data);
>> > > >>>>>>>>> +void *en_mac_binding_aging_init(struct engine_node *node,
>> > > >>>>>>>>> +                                struct engine_arg *arg);
>> > > >>>>>>>>> +void en_mac_binding_aging_cleanup(void *data);
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +/* The MAC binding aging waker node functions. */
>> > > >>>>>>>>> +void en_mac_binding_aging_waker_run(struct engine_node
*node,
>> > > >> void
>> > > >>>>>> *data);
>> > > >>>>>>>>> +void *en_mac_binding_aging_waker_init(struct engine_node
*node,
>> > > >>>>>>>>> +                                      struct engine_arg
*arg);
>> > > >>>>>>>>> +void en_mac_binding_aging_waker_cleanup(void *data);
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +#endif /* northd/mac-binding-aging.h */
>> > > >>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>> > > >>>>>>>>> index e26afd83c..d3fba9bdc 100644
>> > > >>>>>>>>> --- a/ovn-nb.xml
>> > > >>>>>>>>> +++ b/ovn-nb.xml
>> > > >>>>>>>>> @@ -2392,6 +2392,13 @@
>> > > >>>>>>>>>          and other sources. This way, OVN and the other
sources
>> > > >> can
>> > > >>>>>> make use of
>> > > >>>>>>>>>          the same conntrack zone.
>> > > >>>>>>>>>        </column>
>> > > >>>>>>>>> +
>> > > >>>>>>>>> +      <column name="options"
key="mac_binding_age_threshold"
>> > > >>>>>>>>> +              type='{"type": "integer", "minInteger": 0,
>> > > >>>>>> "maxInteger": 4294967295}'>
>> > > >>>>>>>>> +        MAC binding aging <code>threshold</code> value in
secs.
>> > > >> MAC
>> > > >>>>>> binding
>> > > >>>>>
>> > > >>>>> Nit: I'd use 'seconds'.
>> > > >>>>>
>> > > >>>>> Thanks,
>> > > >>>>> Dumitru
>> > > >>>>>
>> > > >>>>>>>>> +        exceeding this timeout will be automatically
removed.
>> > > >> The
>> > > >>>>>> value
>> > > >>>>>>>>> +        defaults to 0, which means disabled.
>> > > >>>>>>>>> +      </column>
>> > > >>>>>>>>>      </group>
>> > > >>>>>>>>>
>> > > >>>>>>>>>      <group title="Common Columns">
>> > > >>>>>>>>> --
>> > > >>>>>>>>> 2.37.1
>> > > >>>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> --
>> > > >>>>>>>
>> > > >>>>>>> Ales Musil
>> > > >>>>>>>
>> > > >>>>>>> Senior Software Engineer - OVN Core
>> > > >>>>>>>
>> > > >>>>>>> Red Hat EMEA
>> > > >>>>>>>
>> > > >>>>>>> [email protected]    IM: amusil
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>> Thank you for the reviews.
>> > > >>>>
>> > > >>>> v5 should address most of the comments except the open one from
Numan
>> > > >> about
>> > > >>>> upgrade.
>> > > >>>>
>> > > >>>
>> > > >>> IMO upgrading one is very important.  As I mentioned earlier,
>> > > >>> openstack neutron or ovn-kubernetes would not know
>> > > >>> which OVN version is being run and if all the ovn-controllers are
>> > > >>> upgraded to the version which has the mac_binding timestamp
>> > > >>> support.  This config option is not something which an
administrator
>> > > >>> can enable or disable depending on the upgrade status.
>> > > >>>
>> > > >>> There have been a few  upgrade issues with OVN and because of
which
>> > > >>> openstack CMS had to do special handling.
>> > > >>> This patch series can potentially affect the existing datapath
traffic
>> > > >>> after partial upgrades.  So I think its better we handle this
>> > > >>> properly.
>> > > >>>
>> > > >>
>> > > >> Hi Numan, I agree with you that upgrading is important. But for
this
>> > case,
>> > > >> maybe I didn't understand the problem. The feature is disabled by
>> > default,
>> > > >> so as long as the CMS doesn't enable the aging setting for any LR
>> > before
>> > > >> the upgrade is completed, it should be fine, right?  This in
practice
>> > just
>> > > >> means two separate changes to the production environment:
>> > > >> 1. upgrade the OVN version, both central and controller components
>> > > >> 2. change the aging setting to desired value for some (or all) LRs
>> > > >
>> > > > In my opinion, the step (2) you mentioned above is something which
an
>> > > > operator/admin cannot set
>> > > > and it is most likely be set by openstack neutron or
ovn-kubernetes or
>> > > > any other CMS programmatically.
>> > > > In the case of Openstack,  neutron is part of a bigger
>> > > > system/deployment and it wont be so easy to do
>> > > > Step 1 first and then step 2.  Tripleo installer for example will
>> > > > upgrade OVN Northd services and Neutron
>> > > > first before upgrading ovn-controllers on the compute nodes.
>> > > >
>> > >
>> > > As far as I understand, OpenStack has recently added support to
ensure
>> > > that all ovn-controllers are upgraded before the central components
are
>> > > upgraded.  Adding Jakub to this thread.
>> > >
>> > > > This feature will most likely be available in OVN 22.09.  Lets say
>> > > > openstack neutron
>> > > > wants to use this feature and adds the necessary code to enable
ageing
>> > > > in some neutron
>> > > > routers.  I don't know on what basis or what logic it would use to
set
>> > > > the option
>> > > > "mac_binding_age_threshold" for a logical router.   Let's assume
that
>> > > > neutron  queries the
>> > > > OVN Southbound schema to check for the presence of the column
>> > > > "timestamp" in MAC_Binding table
>> > > > before setting the logical router option -
mac_binding_age_threshold.
>> > > > And lets say neutron uses this feature
>> > > > in the openstack release Zed (scheduled to release in October).  My
>> > > > concern is how would neutron know
>> > > > if all ovn-controllers are upgraded with this feature or not ? As I
>> > > > mentioned above, It's possible for the openstack
>> > > > installer/upgrade tool to upgrade OVN northd services first.
Until a
>> > > > compute node 'N' is not upgraded with the
>> > > > proper ovn-controller version, ovn-northd will immediately delete
the
>> > > > mac binding entries learnt by
>> > > > ovn-controller in node 'N'.
>> > > >
>> > > > Let me know your thoughts.
>> > > >
>> > >
>> > > I think that if neutron/tripleo/OpenStack indeed take care of
performing
>> > > upgrades in the OVN supported way (all ovn-controllers before
>> > > SB/ovn-northd) then we should be OK with the current patch.
>> > >
>>
>> There are other OpenStack installers like Kolla Ansible. And I'm not
>> sure how they handle
>> upgrades.
>>
>>
>> >
>> > Thanks Numan and Dumitru, and now I understand the problem. Suppose
Neutron
>> > has a way to make sure ovn-controller are upgraded before ovn-central,
>> > probably ovn-k8s still faces the problem?
>> > For a simpler solution, I'd support the idea that don't delete the
entry if
>> > the timestamp is 0. Anything populated before the upgrade can be
deleted as
>> > a one-time cleanup manually at any time by the operator if desired.
>> > The solution of checking features of all controllers is good, too, but
just
>> > a little complexity and an extra external-ids in the chassis table
just for
>> > the one-time use case.
>
>
> This unfortunately still suffers from the problem that CMS might delete
those prematurely.
> Even if we don't remove anything with timestamp=0, there is still a
chance that after
> cleanup we might end up with the MAC binding again with timestamp=0.
Which to me
> IMO is error prone.
>

Just to clarify, I didn't mean to add any automation in CMS to do that
during upgrading. I meant the one-time cleanup can be done, if desired, by
operators using OVSDB commands. Or, just leave them forever, until at some
point probably after one more release of OVN we update the logic to allow
deletion for timestamp==0.
But I am ok with the current approach in v6/v7, with things we need to keep
in mind (see below).

>>
>>
>> If you see this commit [1],  it is actually quite simple. In northd,
>> there is already a
>> "struct chassis_features" and function build_chassis_features() to
handle this.
>> And it should not add much complexity IMO.
>>
>> I'm fine with the other options too.  But I'm slightly in favour of
>> handling this in OVN itself instead
>> of the extra config option or one-time cleanup script.
>
Yes, the code is not complex. I am concerned more about:
1. We need to be more clear about upgrade strategy support. Officially we
support the order of upgrading ovn-controllers before the central
components, and recently we had the new feature change within a release
because of a backport (ct_label/ct_mark), which required support for
upgrading central components first. Now it seems we are asking for support
for arbitrary upgrade orders even between releases. It is good to be more
flexible in upgradability support, but we'd better define it more clearly
based on requirement, with more details such as, whether we support
upgrading across multiple releases, downgrades, etc. (+ @Mark Michelson
<[email protected]> )
2. Probing the features seems flexible, but it also means we need to
maintain the features whenever it relates to both central and
ovn-controller. The list can grow and the if/else branches will also grow
in the feature implementation code. Some may be simple (like this one),
some may be more complex. These code are mostly for one-time upgrade use,
but we may need to maintain them forever, or, we can remove/cleanup them
after every other release, assuming that we only support upgrading between
adjacent releases.
3. There may be security/operational concerns for this approach. At any
time a misbehaving ovn-controller can revert any features and
impact/degrade the whole environment. Assume that there are features
A,B,C,D,E. A compute node may delete some or all of the features from its
chassis:other_config, and ovn-northd would revert those features just
because of the bad compute node. This can happen easily in an operational
problem - some "bad" servers that were down for quite some time suddenly
"wake up" with an old release running. Or it can be a compromised server
controlled by an attacker. It is also possible that a problematic server
fails in upgrading blocks the whole control plane to enable the new
feature, which may be even more common but less harmful than the former
cases of "reverting features". I think we may need to add a safeguard to
avoid reverting already applied features in northd.

Thanks,
Han

>
> I went ahead and pushed v6 with additional commit (as everything else was
acked, except the delay discussion on 5th commit)
> that adds the feature check, we can always push only the first part if we
decide that the
> feature is not needed.
>
>
> Thanks,
> Ales
>
>>
>> Thanks
>> Numan
>>
>>  A 3rd option depends on how CMS would integrate this feature. If the CMS
>> > can implement a global option that enables/disables the feature, that
would
>> > work for the 2 steps I mentioned earlier: step1: upgrade OVN, step2:
turn
>> > on the CMS option.
>> >
>>
>> [1] -
https://github.com/ovn-org/ovn/commit/3fd4db6324ec9fe8a8ddedab3e91f7251f56de9d
>>
>> > > Regards,
>> > > Dumitru
>> > >
>> > _______________________________________________
>> > dev mailing list
>> > [email protected]
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> >
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA
>
> [email protected]    IM: amusil
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to