On Fri, Jun 13, 2025 at 4:18 PM Dumitru Ceara <dce...@redhat.com> wrote:

> On 6/13/25 3:50 PM, Ales Musil wrote:
> > On Fri, Jun 13, 2025 at 11:08 AM Dumitru Ceara <dce...@redhat.com>
> wrote:
> >
> >> On 6/10/25 2:03 PM, Felix Huettner via dev wrote:
> >>> Previously on each call to pinctrl_run all addresses that need to be
> >>> announced via garps or rarps where computed newly. On larger external
> >>> networks that might take significant time.
> >>> However the content of the data can only change in a run of the I+P
> >>> engine. So we move the whole logic to an engine node and only handle
> the
> >>> minimal needed set in pinctrl_run.
> >>>
> >>> Acked-by: Lorenzo Bianconi <lorenzo.bianc...@redhat.com>
> >>> Co-Authored-by: Max Lamprecht <max.lamprecht@mail.schwarz>
> >>> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz>
> >>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud>
> >>> ---
> >>
> >> Hi Felix, Max,
> >>
> >> Thanks for the new revision!  I have a couple of small comments below
> but
> >> I think whoever is applying this patch can take care of them so it's
> >> probably fine to not send a new version.
> >>
> >
> > Hi Felix, Max and Dumitru,
> >
>
> Hi Ales,
>

Hi Dumitru,


>
> > I wanted to apply the series, but I discovered a serious issue during
> > testing. There is use-after-free happening during controller exit, the
> > engine data is freed while the pinctrl thread is still running so we
> > need a way to ensure this won't happen.
> >
>
> Good catch!
>
> > We could return NULL from garp_rarp_get_data() after engine_cleanup
> > and have extra check in pinctrl, it seems the we are only accessing
> > the cmap in pinctrl so maybe the function can return pointer to cmap
> > instead.
> >
> > I was thinking about adding atomic_bool into inc-proc-eng to indicate
> > that the engine data are safe to access, we could use that in the
> > garp_rarp and possibly other nodes to check if it is safe to access
> > those data in pinctrl or not.
> >
> > Let me know what are your thoughts, if that sounds feasible I will
> > post a patch then we could rebase this one on top of it.
> >
>
> What if we delay engine_cleanup() until after pinctrl_destroy()?  That
> would wait (pthread_join()) for the pinctrl thread to finish.
>
> Isn't that enough?
>
>
Good point, that should be enough, yes.


> > There is also missing cmap_destroy() in garp_rarp_cleanup().
> >
>
> Yes, let's add that to v13 once we fix the race (as you point out
> offline, it's an already existing problem that we could hit with DNS too).
>

If the above is the case then the patch should be minimal and
I could still use the amended v12 that I had prepared, let's wait
for the use-after-free fix. we might not need v13 afterall.


>
> Regards,
> Dumitru
>


Thanks,
Ales


>
> >
> >>> v11->v12: rebased
> >>> v10->v11:
> >>>   * fix race condition in announce_time and backoff
> >>>   * addressed a few nits
> >>> v8->v9:
> >>>   * remove need for sync by using atomics
> >>> v7->v8:
> >>>   * addressed review comments
> >>>   * changed to cmap for data storage
> >>> v6->v7: added ack
> >>>
> >>>  controller/automake.mk      |   4 +-
> >>>  controller/garp_rarp.c      | 556 ++++++++++++++++++++++++++++++
> >>>  controller/garp_rarp.h      |  79 +++++
> >>>  controller/ovn-controller.c | 209 +++++++++++-
> >>>  controller/pinctrl.c        | 660 +++---------------------------------
> >>>  controller/pinctrl.h        |   3 +-
> >>>  lib/mac-binding-index.c     |  45 +++
> >>>  lib/mac-binding-index.h     |   6 +
> >>>  8 files changed, 943 insertions(+), 619 deletions(-)
> >>>  create mode 100644 controller/garp_rarp.c
> >>>  create mode 100644 controller/garp_rarp.h
> >>>
> >>
> >>> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c
> >>> new file mode 100644
> >>> index 000000000..c17ba6b0e
> >>> --- /dev/null
> >>> +++ b/controller/garp_rarp.c
> >>> @@ -0,0 +1,556 @@
> >>> +/*
> >>> + * Copyright (c) 2025, STACKIT GmbH & Co. KG
> >>> + *
> >>> + * Licensed under the Apache License, Version 2.0 (the "License");
> >>> + * you may not use this file except in compliance with the License.
> >>> + * You may obtain a copy of the License at:
> >>> + *
> >>> + *     http://www.apache.org/licenses/LICENSE-2.0
> >>> + *
> >>> + * Unless required by applicable law or agreed to in writing, software
> >>> + * distributed under the License is distributed on an "AS IS" BASIS,
> >>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> >> implied.
> >>> + * See the License for the specific language governing permissions and
> >>> + * limitations under the License.
> >>> + */
> >>> +
> >>> +#include <config.h>
> >>> +
> >>> +#include <net/if.h>
> >>> +#include <stdbool.h>
> >>> +#include <stddef.h>
> >>> +
> >>> +#include "controller/encaps.h"
> >>> +#include "controller/local_data.h"
> >>> +#include "controller/lport.h"
> >>> +#include "hash.h"
> >>> +#include "mac-binding-index.h"
> >>> +#include "openvswitch/hmap.h"
> >>> +#include "openvswitch/vlog.h"
> >>> +#include "ovn/lex.h"
> >>> +#include "packets.h"
> >>> +#include "smap.h"
> >>> +#include "sset.h"
> >>> +#include "openvswitch/shash.h"
> >>> +#include "garp_rarp.h"
> >>> +#include "ovn-sb-idl.h"
> >>
> >> Nit: Most of these are not really needed anymore.  We get a trimmed
> >> version with the following incremental change:
> >>
> >> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c
> >> index c17ba6b0e4..7c2f4814fc 100644
> >> --- a/controller/garp_rarp.c
> >> +++ b/controller/garp_rarp.c
> >> @@ -16,22 +16,11 @@
> >>
> >>  #include <config.h>
> >>
> >> -#include <net/if.h>
> >> -#include <stdbool.h>
> >> -#include <stddef.h>
> >> -
> >> -#include "controller/encaps.h"
> >>  #include "controller/local_data.h"
> >> -#include "controller/lport.h"
> >> -#include "hash.h"
> >>  #include "mac-binding-index.h"
> >>  #include "openvswitch/hmap.h"
> >>  #include "openvswitch/vlog.h"
> >>  #include "ovn/lex.h"
> >> -#include "packets.h"
> >> -#include "smap.h"
> >> -#include "sset.h"
> >> -#include "openvswitch/shash.h"
> >>  #include "garp_rarp.h"
> >>  #include "ovn-sb-idl.h"
> >>
> >>> +
> >>> +VLOG_DEFINE_THIS_MODULE(garp_rarp);
> >>> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> >>> +
> >>> +#define GARP_RARP_DEF_MAX_TIMEOUT    16000
> >>> +
> >>> +static bool garp_rarp_data_has_changed = false;
> >>> +static struct garp_rarp_data garp_rarp_data;
> >>> +
> >>> +/* Get localnet vifs, local l3gw ports and ofport for localnet patch
> >> ports. */
> >>> +static void
> >>> +get_localnet_vifs_l3gwports(
> >>> +    struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
> >>> +    const struct sbrec_chassis *chassis,
> >>> +    const struct hmap *local_datapaths,
> >>> +    struct sset *localnet_vifs,
> >>> +    struct sset *local_l3gw_ports)
> >>> +{
> >>> +    struct sbrec_port_binding *target =
> >> sbrec_port_binding_index_init_row(
> >>> +        sbrec_port_binding_by_datapath);
> >>> +
> >>> +    const struct local_datapath *ld;
> >>> +    HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
> >>> +        const struct sbrec_port_binding *pb;
> >>> +
> >>> +        if (!ld->localnet_port) {
> >>> +            continue;
> >>> +        }
> >>> +
> >>> +        sbrec_port_binding_index_set_datapath(target, ld->datapath);
> >>> +        SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target,
> >>> +
> >>  sbrec_port_binding_by_datapath) {
> >>> +            /* Get l3gw ports. Consider port bindings with type
> >> "l3gateway"
> >>> +             * that connect to gateway routers (if local), and
> consider
> >> port
> >>> +             * bindings of type "patch" since they might connect to
> >>> +             * distributed gateway ports with NAT addresses. */
> >>> +            if ((!strcmp(pb->type, "l3gateway") && pb->chassis ==
> >> chassis)
> >>> +                || !strcmp(pb->type, "patch")) {
> >>> +                sset_add(local_l3gw_ports, pb->logical_port);
> >>> +            }
> >>> +
> >>> +            /* Get all vifs that are directly connected to a localnet
> >> port. */
> >>> +            if (!strcmp(pb->type, "") && pb->chassis == chassis) {
> >>> +                sset_add(localnet_vifs, pb->logical_port);
> >>> +            }
> >>> +        }
> >>> +    }
> >>> +    sbrec_port_binding_index_destroy_row(target);
> >>> +}
> >>> +
> >>> +
> >>
> >> Nit: too many new lines.
> >>
> >>> +/* Extracts the mac, IPv4 and IPv6 addresses, and logical port from
> >>> + * 'addresses' which should be of the format 'MAC [IP1 IP2 ..]
> >>> + * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid
> >> IPv4
> >>> + * or IPv6 address, and stores them in the 'ipv4_addrs' and
> 'ipv6_addrs'
> >>> + * fields of 'laddrs'.  The logical port name is stored in 'lport'.
> >>> + *
> >>> + * Returns true if at least 'MAC' is found in 'address', false
> >> otherwise.
> >>> + *
> >>> + * The caller must call destroy_lport_addresses() and free(*lport). */
> >>> +static bool
> >>> +extract_addresses_with_port(const char *addresses,
> >>> +                            struct lport_addresses *laddrs,
> >>> +                            char **lport)
> >>> +{
> >>> +    int ofs;
> >>> +    if (!extract_addresses(addresses, laddrs, &ofs)) {
> >>> +        return false;
> >>> +    } else if (!addresses[ofs]) {
> >>> +        return true;
> >>> +    }
> >>> +
> >>> +    struct lexer lexer;
> >>> +    lexer_init(&lexer, addresses + ofs);
> >>> +    lexer_get(&lexer);
> >>> +
> >>> +    if (lexer.error || lexer.token.type != LEX_T_ID
> >>> +        || !lexer_match_id(&lexer, "is_chassis_resident")) {
> >>> +        VLOG_INFO_RL(&rl, "invalid syntax '%s' in address",
> addresses);
> >>> +        lexer_destroy(&lexer);
> >>> +        return true;
> >>> +    }
> >>> +
> >>> +    if (!lexer_match(&lexer, LEX_T_LPAREN)) {
> >>> +        VLOG_INFO_RL(&rl, "Syntax error: expecting '(' after "
> >>> +                          "'is_chassis_resident' in address '%s'",
> >> addresses);
> >>> +        lexer_destroy(&lexer);
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    if (lexer.token.type != LEX_T_STRING) {
> >>> +        VLOG_INFO_RL(&rl,
> >>> +                    "Syntax error: expecting quoted string after "
> >>> +                    "'is_chassis_resident' in address '%s'",
> addresses);
> >>> +        lexer_destroy(&lexer);
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    *lport = xstrdup(lexer.token.s);
> >>> +
> >>> +    lexer_get(&lexer);
> >>> +    if (!lexer_match(&lexer, LEX_T_RPAREN)) {
> >>> +        VLOG_INFO_RL(&rl, "Syntax error: expecting ')' after quoted
> >> string in "
> >>> +                          "'is_chassis_resident()' in address '%s'",
> >>> +                          addresses);
> >>> +        lexer_destroy(&lexer);
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    lexer_destroy(&lexer);
> >>> +    return true;
> >>> +}
> >>> +
> >>
> >> [...]
> >>
> >>> @@ -5370,12 +5092,19 @@ send_self_originated_neigh_packet(struct rconn
> >> *swconn,
> >>>
> >>>  /* Called with in the pinctrl_handler thread context. */
> >>>  static long long int
> >>> -send_garp_rarp(struct rconn *swconn, struct garp_rarp_data *garp_rarp,
> >>> -               long long int current_time)
> >>> -    OVS_REQUIRES(pinctrl_mutex)
> >>> +send_garp_rarp(struct rconn *swconn, struct garp_rarp_node *garp_rarp,
> >>> +               long long int current_time, long long int max_timeout,
> >>> +               bool continuous)
> >>>  {
> >>> -    if (current_time < garp_rarp->announce_time) {
> >>> -        return garp_rarp->announce_time;
> >>> +    long long int announce_time, old_announce_time;
> >>> +    int backoff, old_backoff;
> >>> +
> >>> +    atomic_read(&garp_rarp->announce_time, &announce_time);
> >>> +    atomic_read(&garp_rarp->backoff, &backoff);
> >>> +    old_announce_time = announce_time;
> >>> +    old_backoff = backoff;
> >>> +    if (current_time < announce_time) {
> >>> +        return announce_time;
> >>>      }
> >>>
> >>>      /* Compose and inject a GARP request packet. */
> >>> @@ -5398,14 +5127,19 @@ send_garp_rarp(struct rconn *swconn, struct
> >> garp_rarp_data *garp_rarp,
> >>>      /* Set the next announcement.  At most 5 announcements are sent
> for
> >> a
> >>>       * vif if garp_rarp_max_timeout is not specified otherwise cap the
> >> max
> >>>       * timeout to garp_rarp_max_timeout. */
> >>> -    if (garp_rarp_continuous || garp_rarp->backoff <
> >> garp_rarp_max_timeout) {
> >>> -        garp_rarp->announce_time = current_time + garp_rarp->backoff;
> >>> +    if (continuous || backoff < max_timeout) {
> >>> +        announce_time = current_time + backoff;
> >>>      } else {
> >>> -        garp_rarp->announce_time = LLONG_MAX;
> >>> +        announce_time = LLONG_MAX;
> >>>      }
> >>> -    garp_rarp->backoff = MIN(garp_rarp_max_timeout, garp_rarp->backoff
> >> * 2);
> >>> +    backoff = MIN(max_timeout, backoff * 2);
> >>>
> >>> -    return garp_rarp->announce_time;
> >>> +    atomic_compare_exchange_strong(&garp_rarp->announce_time,
> >>> +                                   &old_announce_time, announce_time);
> >>> +    atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff,
> >>> +                                   backoff);
> >>> +
> >>> +    return announce_time;
> >>
> >> I was discussing about this with Ales offline.  It's not exactly
> correct.
> >>
> >> If atomic_compare_exchange_strong() returns false it means
> 'announce_time'
> >> was not written to 'garp_rarp->announce_time'.  'old_announce_time' is
> >> however updated in that case.  We should return it if cmpxchg() failed.
> >>
> >> But I think Ales can probably fix this when applying the patch.
> >>
> >
> >
> > Yes, this area needs the diff below:
> >
> > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > index b6b6aa9f9..545f1b174 100644
> > --- a/controller/pinctrl.c
> > +++ b/controller/pinctrl.c
> > @@ -5134,12 +5134,13 @@ send_garp_rarp(struct rconn *swconn, struct
> > garp_rarp_node *garp_rarp,
> >      }
> >      backoff = MIN(max_timeout, backoff * 2);
> >
> > -    atomic_compare_exchange_strong(&garp_rarp->announce_time,
> > -                                   &old_announce_time, announce_time);
> > +    bool cmp = atomic_compare_exchange_strong(&garp_rarp->announce_time,
> > +                                              &old_announce_time,
> > +                                              announce_time);
> >      atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff,
> >                                     backoff);
> >
> > -    return announce_time;
> > +    return cmp ? announce_time : old_announce_time;
> >  }
> >
> >
> >
> >
> >>
> >>>  }
> >>>
> >>>  static void
> >>
> >> With these small things fixed:
> >> Acked-by: Dumitru Ceara <dce...@redhat.com>
> >>
> >> Regards,
> >> Dumitru
> >>
> >>
> > Thanks,
> > Ales
> >
>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to