On 6/13/25 3:50 PM, Ales Musil wrote:
> On Fri, Jun 13, 2025 at 11:08 AM Dumitru Ceara <dce...@redhat.com> wrote:
> 
>> On 6/10/25 2:03 PM, Felix Huettner via dev wrote:
>>> Previously on each call to pinctrl_run all addresses that need to be
>>> announced via garps or rarps where computed newly. On larger external
>>> networks that might take significant time.
>>> However the content of the data can only change in a run of the I+P
>>> engine. So we move the whole logic to an engine node and only handle the
>>> minimal needed set in pinctrl_run.
>>>
>>> Acked-by: Lorenzo Bianconi <lorenzo.bianc...@redhat.com>
>>> Co-Authored-by: Max Lamprecht <max.lamprecht@mail.schwarz>
>>> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz>
>>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud>
>>> ---
>>
>> Hi Felix, Max,
>>
>> Thanks for the new revision!  I have a couple of small comments below but
>> I think whoever is applying this patch can take care of them so it's
>> probably fine to not send a new version.
>>
> 
> Hi Felix, Max and Dumitru,
> 

Hi Ales,

> I wanted to apply the series, but I discovered a serious issue during
> testing. There is use-after-free happening during controller exit, the
> engine data is freed while the pinctrl thread is still running so we
> need a way to ensure this won't happen.
> 

Good catch!

> We could return NULL from garp_rarp_get_data() after engine_cleanup
> and have extra check in pinctrl, it seems the we are only accessing
> the cmap in pinctrl so maybe the function can return pointer to cmap
> instead.
> 
> I was thinking about adding atomic_bool into inc-proc-eng to indicate
> that the engine data are safe to access, we could use that in the
> garp_rarp and possibly other nodes to check if it is safe to access
> those data in pinctrl or not.
> 
> Let me know what are your thoughts, if that sounds feasible I will
> post a patch then we could rebase this one on top of it.
> 

What if we delay engine_cleanup() until after pinctrl_destroy()?  That
would wait (pthread_join()) for the pinctrl thread to finish.

Isn't that enough?

> There is also missing cmap_destroy() in garp_rarp_cleanup().
> 

Yes, let's add that to v13 once we fix the race (as you point out
offline, it's an already existing problem that we could hit with DNS too).

Regards,
Dumitru

> 
>>> v11->v12: rebased
>>> v10->v11:
>>>   * fix race condition in announce_time and backoff
>>>   * addressed a few nits
>>> v8->v9:
>>>   * remove need for sync by using atomics
>>> v7->v8:
>>>   * addressed review comments
>>>   * changed to cmap for data storage
>>> v6->v7: added ack
>>>
>>>  controller/automake.mk      |   4 +-
>>>  controller/garp_rarp.c      | 556 ++++++++++++++++++++++++++++++
>>>  controller/garp_rarp.h      |  79 +++++
>>>  controller/ovn-controller.c | 209 +++++++++++-
>>>  controller/pinctrl.c        | 660 +++---------------------------------
>>>  controller/pinctrl.h        |   3 +-
>>>  lib/mac-binding-index.c     |  45 +++
>>>  lib/mac-binding-index.h     |   6 +
>>>  8 files changed, 943 insertions(+), 619 deletions(-)
>>>  create mode 100644 controller/garp_rarp.c
>>>  create mode 100644 controller/garp_rarp.h
>>>
>>
>>> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c
>>> new file mode 100644
>>> index 000000000..c17ba6b0e
>>> --- /dev/null
>>> +++ b/controller/garp_rarp.c
>>> @@ -0,0 +1,556 @@
>>> +/*
>>> + * Copyright (c) 2025, STACKIT GmbH & Co. KG
>>> + *
>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>> + * you may not use this file except in compliance with the License.
>>> + * You may obtain a copy of the License at:
>>> + *
>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>> + *
>>> + * Unless required by applicable law or agreed to in writing, software
>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>> implied.
>>> + * See the License for the specific language governing permissions and
>>> + * limitations under the License.
>>> + */
>>> +
>>> +#include <config.h>
>>> +
>>> +#include <net/if.h>
>>> +#include <stdbool.h>
>>> +#include <stddef.h>
>>> +
>>> +#include "controller/encaps.h"
>>> +#include "controller/local_data.h"
>>> +#include "controller/lport.h"
>>> +#include "hash.h"
>>> +#include "mac-binding-index.h"
>>> +#include "openvswitch/hmap.h"
>>> +#include "openvswitch/vlog.h"
>>> +#include "ovn/lex.h"
>>> +#include "packets.h"
>>> +#include "smap.h"
>>> +#include "sset.h"
>>> +#include "openvswitch/shash.h"
>>> +#include "garp_rarp.h"
>>> +#include "ovn-sb-idl.h"
>>
>> Nit: Most of these are not really needed anymore.  We get a trimmed
>> version with the following incremental change:
>>
>> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c
>> index c17ba6b0e4..7c2f4814fc 100644
>> --- a/controller/garp_rarp.c
>> +++ b/controller/garp_rarp.c
>> @@ -16,22 +16,11 @@
>>
>>  #include <config.h>
>>
>> -#include <net/if.h>
>> -#include <stdbool.h>
>> -#include <stddef.h>
>> -
>> -#include "controller/encaps.h"
>>  #include "controller/local_data.h"
>> -#include "controller/lport.h"
>> -#include "hash.h"
>>  #include "mac-binding-index.h"
>>  #include "openvswitch/hmap.h"
>>  #include "openvswitch/vlog.h"
>>  #include "ovn/lex.h"
>> -#include "packets.h"
>> -#include "smap.h"
>> -#include "sset.h"
>> -#include "openvswitch/shash.h"
>>  #include "garp_rarp.h"
>>  #include "ovn-sb-idl.h"
>>
>>> +
>>> +VLOG_DEFINE_THIS_MODULE(garp_rarp);
>>> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>> +
>>> +#define GARP_RARP_DEF_MAX_TIMEOUT    16000
>>> +
>>> +static bool garp_rarp_data_has_changed = false;
>>> +static struct garp_rarp_data garp_rarp_data;
>>> +
>>> +/* Get localnet vifs, local l3gw ports and ofport for localnet patch
>> ports. */
>>> +static void
>>> +get_localnet_vifs_l3gwports(
>>> +    struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
>>> +    const struct sbrec_chassis *chassis,
>>> +    const struct hmap *local_datapaths,
>>> +    struct sset *localnet_vifs,
>>> +    struct sset *local_l3gw_ports)
>>> +{
>>> +    struct sbrec_port_binding *target =
>> sbrec_port_binding_index_init_row(
>>> +        sbrec_port_binding_by_datapath);
>>> +
>>> +    const struct local_datapath *ld;
>>> +    HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>> +        const struct sbrec_port_binding *pb;
>>> +
>>> +        if (!ld->localnet_port) {
>>> +            continue;
>>> +        }
>>> +
>>> +        sbrec_port_binding_index_set_datapath(target, ld->datapath);
>>> +        SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target,
>>> +
>>  sbrec_port_binding_by_datapath) {
>>> +            /* Get l3gw ports. Consider port bindings with type
>> "l3gateway"
>>> +             * that connect to gateway routers (if local), and consider
>> port
>>> +             * bindings of type "patch" since they might connect to
>>> +             * distributed gateway ports with NAT addresses. */
>>> +            if ((!strcmp(pb->type, "l3gateway") && pb->chassis ==
>> chassis)
>>> +                || !strcmp(pb->type, "patch")) {
>>> +                sset_add(local_l3gw_ports, pb->logical_port);
>>> +            }
>>> +
>>> +            /* Get all vifs that are directly connected to a localnet
>> port. */
>>> +            if (!strcmp(pb->type, "") && pb->chassis == chassis) {
>>> +                sset_add(localnet_vifs, pb->logical_port);
>>> +            }
>>> +        }
>>> +    }
>>> +    sbrec_port_binding_index_destroy_row(target);
>>> +}
>>> +
>>> +
>>
>> Nit: too many new lines.
>>
>>> +/* Extracts the mac, IPv4 and IPv6 addresses, and logical port from
>>> + * 'addresses' which should be of the format 'MAC [IP1 IP2 ..]
>>> + * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid
>> IPv4
>>> + * or IPv6 address, and stores them in the 'ipv4_addrs' and 'ipv6_addrs'
>>> + * fields of 'laddrs'.  The logical port name is stored in 'lport'.
>>> + *
>>> + * Returns true if at least 'MAC' is found in 'address', false
>> otherwise.
>>> + *
>>> + * The caller must call destroy_lport_addresses() and free(*lport). */
>>> +static bool
>>> +extract_addresses_with_port(const char *addresses,
>>> +                            struct lport_addresses *laddrs,
>>> +                            char **lport)
>>> +{
>>> +    int ofs;
>>> +    if (!extract_addresses(addresses, laddrs, &ofs)) {
>>> +        return false;
>>> +    } else if (!addresses[ofs]) {
>>> +        return true;
>>> +    }
>>> +
>>> +    struct lexer lexer;
>>> +    lexer_init(&lexer, addresses + ofs);
>>> +    lexer_get(&lexer);
>>> +
>>> +    if (lexer.error || lexer.token.type != LEX_T_ID
>>> +        || !lexer_match_id(&lexer, "is_chassis_resident")) {
>>> +        VLOG_INFO_RL(&rl, "invalid syntax '%s' in address", addresses);
>>> +        lexer_destroy(&lexer);
>>> +        return true;
>>> +    }
>>> +
>>> +    if (!lexer_match(&lexer, LEX_T_LPAREN)) {
>>> +        VLOG_INFO_RL(&rl, "Syntax error: expecting '(' after "
>>> +                          "'is_chassis_resident' in address '%s'",
>> addresses);
>>> +        lexer_destroy(&lexer);
>>> +        return false;
>>> +    }
>>> +
>>> +    if (lexer.token.type != LEX_T_STRING) {
>>> +        VLOG_INFO_RL(&rl,
>>> +                    "Syntax error: expecting quoted string after "
>>> +                    "'is_chassis_resident' in address '%s'", addresses);
>>> +        lexer_destroy(&lexer);
>>> +        return false;
>>> +    }
>>> +
>>> +    *lport = xstrdup(lexer.token.s);
>>> +
>>> +    lexer_get(&lexer);
>>> +    if (!lexer_match(&lexer, LEX_T_RPAREN)) {
>>> +        VLOG_INFO_RL(&rl, "Syntax error: expecting ')' after quoted
>> string in "
>>> +                          "'is_chassis_resident()' in address '%s'",
>>> +                          addresses);
>>> +        lexer_destroy(&lexer);
>>> +        return false;
>>> +    }
>>> +
>>> +    lexer_destroy(&lexer);
>>> +    return true;
>>> +}
>>> +
>>
>> [...]
>>
>>> @@ -5370,12 +5092,19 @@ send_self_originated_neigh_packet(struct rconn
>> *swconn,
>>>
>>>  /* Called with in the pinctrl_handler thread context. */
>>>  static long long int
>>> -send_garp_rarp(struct rconn *swconn, struct garp_rarp_data *garp_rarp,
>>> -               long long int current_time)
>>> -    OVS_REQUIRES(pinctrl_mutex)
>>> +send_garp_rarp(struct rconn *swconn, struct garp_rarp_node *garp_rarp,
>>> +               long long int current_time, long long int max_timeout,
>>> +               bool continuous)
>>>  {
>>> -    if (current_time < garp_rarp->announce_time) {
>>> -        return garp_rarp->announce_time;
>>> +    long long int announce_time, old_announce_time;
>>> +    int backoff, old_backoff;
>>> +
>>> +    atomic_read(&garp_rarp->announce_time, &announce_time);
>>> +    atomic_read(&garp_rarp->backoff, &backoff);
>>> +    old_announce_time = announce_time;
>>> +    old_backoff = backoff;
>>> +    if (current_time < announce_time) {
>>> +        return announce_time;
>>>      }
>>>
>>>      /* Compose and inject a GARP request packet. */
>>> @@ -5398,14 +5127,19 @@ send_garp_rarp(struct rconn *swconn, struct
>> garp_rarp_data *garp_rarp,
>>>      /* Set the next announcement.  At most 5 announcements are sent for
>> a
>>>       * vif if garp_rarp_max_timeout is not specified otherwise cap the
>> max
>>>       * timeout to garp_rarp_max_timeout. */
>>> -    if (garp_rarp_continuous || garp_rarp->backoff <
>> garp_rarp_max_timeout) {
>>> -        garp_rarp->announce_time = current_time + garp_rarp->backoff;
>>> +    if (continuous || backoff < max_timeout) {
>>> +        announce_time = current_time + backoff;
>>>      } else {
>>> -        garp_rarp->announce_time = LLONG_MAX;
>>> +        announce_time = LLONG_MAX;
>>>      }
>>> -    garp_rarp->backoff = MIN(garp_rarp_max_timeout, garp_rarp->backoff
>> * 2);
>>> +    backoff = MIN(max_timeout, backoff * 2);
>>>
>>> -    return garp_rarp->announce_time;
>>> +    atomic_compare_exchange_strong(&garp_rarp->announce_time,
>>> +                                   &old_announce_time, announce_time);
>>> +    atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff,
>>> +                                   backoff);
>>> +
>>> +    return announce_time;
>>
>> I was discussing about this with Ales offline.  It's not exactly correct.
>>
>> If atomic_compare_exchange_strong() returns false it means 'announce_time'
>> was not written to 'garp_rarp->announce_time'.  'old_announce_time' is
>> however updated in that case.  We should return it if cmpxchg() failed.
>>
>> But I think Ales can probably fix this when applying the patch.
>>
> 
> 
> Yes, this area needs the diff below:
> 
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index b6b6aa9f9..545f1b174 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -5134,12 +5134,13 @@ send_garp_rarp(struct rconn *swconn, struct
> garp_rarp_node *garp_rarp,
>      }
>      backoff = MIN(max_timeout, backoff * 2);
> 
> -    atomic_compare_exchange_strong(&garp_rarp->announce_time,
> -                                   &old_announce_time, announce_time);
> +    bool cmp = atomic_compare_exchange_strong(&garp_rarp->announce_time,
> +                                              &old_announce_time,
> +                                              announce_time);
>      atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff,
>                                     backoff);
> 
> -    return announce_time;
> +    return cmp ? announce_time : old_announce_time;
>  }
> 
> 
> 
> 
>>
>>>  }
>>>
>>>  static void
>>
>> With these small things fixed:
>> Acked-by: Dumitru Ceara <dce...@redhat.com>
>>
>> Regards,
>> Dumitru
>>
>>
> Thanks,
> Ales
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to