On 6/13/25 3:50 PM, Ales Musil wrote: > On Fri, Jun 13, 2025 at 11:08 AM Dumitru Ceara <dce...@redhat.com> wrote: > >> On 6/10/25 2:03 PM, Felix Huettner via dev wrote: >>> Previously on each call to pinctrl_run all addresses that need to be >>> announced via garps or rarps where computed newly. On larger external >>> networks that might take significant time. >>> However the content of the data can only change in a run of the I+P >>> engine. So we move the whole logic to an engine node and only handle the >>> minimal needed set in pinctrl_run. >>> >>> Acked-by: Lorenzo Bianconi <lorenzo.bianc...@redhat.com> >>> Co-Authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud> >>> --- >> >> Hi Felix, Max, >> >> Thanks for the new revision! I have a couple of small comments below but >> I think whoever is applying this patch can take care of them so it's >> probably fine to not send a new version. >> > > Hi Felix, Max and Dumitru, >
Hi Ales, > I wanted to apply the series, but I discovered a serious issue during > testing. There is use-after-free happening during controller exit, the > engine data is freed while the pinctrl thread is still running so we > need a way to ensure this won't happen. > Good catch! > We could return NULL from garp_rarp_get_data() after engine_cleanup > and have extra check in pinctrl, it seems the we are only accessing > the cmap in pinctrl so maybe the function can return pointer to cmap > instead. > > I was thinking about adding atomic_bool into inc-proc-eng to indicate > that the engine data are safe to access, we could use that in the > garp_rarp and possibly other nodes to check if it is safe to access > those data in pinctrl or not. > > Let me know what are your thoughts, if that sounds feasible I will > post a patch then we could rebase this one on top of it. > What if we delay engine_cleanup() until after pinctrl_destroy()? That would wait (pthread_join()) for the pinctrl thread to finish. Isn't that enough? > There is also missing cmap_destroy() in garp_rarp_cleanup(). > Yes, let's add that to v13 once we fix the race (as you point out offline, it's an already existing problem that we could hit with DNS too). Regards, Dumitru > >>> v11->v12: rebased >>> v10->v11: >>> * fix race condition in announce_time and backoff >>> * addressed a few nits >>> v8->v9: >>> * remove need for sync by using atomics >>> v7->v8: >>> * addressed review comments >>> * changed to cmap for data storage >>> v6->v7: added ack >>> >>> controller/automake.mk | 4 +- >>> controller/garp_rarp.c | 556 ++++++++++++++++++++++++++++++ >>> controller/garp_rarp.h | 79 +++++ >>> controller/ovn-controller.c | 209 +++++++++++- >>> controller/pinctrl.c | 660 +++--------------------------------- >>> controller/pinctrl.h | 3 +- >>> lib/mac-binding-index.c | 45 +++ >>> lib/mac-binding-index.h | 6 + >>> 8 files changed, 943 insertions(+), 619 deletions(-) >>> create mode 100644 controller/garp_rarp.c >>> create mode 100644 controller/garp_rarp.h >>> >> >>> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c >>> new file mode 100644 >>> index 000000000..c17ba6b0e >>> --- /dev/null >>> +++ b/controller/garp_rarp.c >>> @@ -0,0 +1,556 @@ >>> +/* >>> + * Copyright (c) 2025, STACKIT GmbH & Co. KG >>> + * >>> + * Licensed under the Apache License, Version 2.0 (the "License"); >>> + * you may not use this file except in compliance with the License. >>> + * You may obtain a copy of the License at: >>> + * >>> + * http://www.apache.org/licenses/LICENSE-2.0 >>> + * >>> + * Unless required by applicable law or agreed to in writing, software >>> + * distributed under the License is distributed on an "AS IS" BASIS, >>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or >> implied. >>> + * See the License for the specific language governing permissions and >>> + * limitations under the License. >>> + */ >>> + >>> +#include <config.h> >>> + >>> +#include <net/if.h> >>> +#include <stdbool.h> >>> +#include <stddef.h> >>> + >>> +#include "controller/encaps.h" >>> +#include "controller/local_data.h" >>> +#include "controller/lport.h" >>> +#include "hash.h" >>> +#include "mac-binding-index.h" >>> +#include "openvswitch/hmap.h" >>> +#include "openvswitch/vlog.h" >>> +#include "ovn/lex.h" >>> +#include "packets.h" >>> +#include "smap.h" >>> +#include "sset.h" >>> +#include "openvswitch/shash.h" >>> +#include "garp_rarp.h" >>> +#include "ovn-sb-idl.h" >> >> Nit: Most of these are not really needed anymore. We get a trimmed >> version with the following incremental change: >> >> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c >> index c17ba6b0e4..7c2f4814fc 100644 >> --- a/controller/garp_rarp.c >> +++ b/controller/garp_rarp.c >> @@ -16,22 +16,11 @@ >> >> #include <config.h> >> >> -#include <net/if.h> >> -#include <stdbool.h> >> -#include <stddef.h> >> - >> -#include "controller/encaps.h" >> #include "controller/local_data.h" >> -#include "controller/lport.h" >> -#include "hash.h" >> #include "mac-binding-index.h" >> #include "openvswitch/hmap.h" >> #include "openvswitch/vlog.h" >> #include "ovn/lex.h" >> -#include "packets.h" >> -#include "smap.h" >> -#include "sset.h" >> -#include "openvswitch/shash.h" >> #include "garp_rarp.h" >> #include "ovn-sb-idl.h" >> >>> + >>> +VLOG_DEFINE_THIS_MODULE(garp_rarp); >>> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); >>> + >>> +#define GARP_RARP_DEF_MAX_TIMEOUT 16000 >>> + >>> +static bool garp_rarp_data_has_changed = false; >>> +static struct garp_rarp_data garp_rarp_data; >>> + >>> +/* Get localnet vifs, local l3gw ports and ofport for localnet patch >> ports. */ >>> +static void >>> +get_localnet_vifs_l3gwports( >>> + struct ovsdb_idl_index *sbrec_port_binding_by_datapath, >>> + const struct sbrec_chassis *chassis, >>> + const struct hmap *local_datapaths, >>> + struct sset *localnet_vifs, >>> + struct sset *local_l3gw_ports) >>> +{ >>> + struct sbrec_port_binding *target = >> sbrec_port_binding_index_init_row( >>> + sbrec_port_binding_by_datapath); >>> + >>> + const struct local_datapath *ld; >>> + HMAP_FOR_EACH (ld, hmap_node, local_datapaths) { >>> + const struct sbrec_port_binding *pb; >>> + >>> + if (!ld->localnet_port) { >>> + continue; >>> + } >>> + >>> + sbrec_port_binding_index_set_datapath(target, ld->datapath); >>> + SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target, >>> + >> sbrec_port_binding_by_datapath) { >>> + /* Get l3gw ports. Consider port bindings with type >> "l3gateway" >>> + * that connect to gateway routers (if local), and consider >> port >>> + * bindings of type "patch" since they might connect to >>> + * distributed gateway ports with NAT addresses. */ >>> + if ((!strcmp(pb->type, "l3gateway") && pb->chassis == >> chassis) >>> + || !strcmp(pb->type, "patch")) { >>> + sset_add(local_l3gw_ports, pb->logical_port); >>> + } >>> + >>> + /* Get all vifs that are directly connected to a localnet >> port. */ >>> + if (!strcmp(pb->type, "") && pb->chassis == chassis) { >>> + sset_add(localnet_vifs, pb->logical_port); >>> + } >>> + } >>> + } >>> + sbrec_port_binding_index_destroy_row(target); >>> +} >>> + >>> + >> >> Nit: too many new lines. >> >>> +/* Extracts the mac, IPv4 and IPv6 addresses, and logical port from >>> + * 'addresses' which should be of the format 'MAC [IP1 IP2 ..] >>> + * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid >> IPv4 >>> + * or IPv6 address, and stores them in the 'ipv4_addrs' and 'ipv6_addrs' >>> + * fields of 'laddrs'. The logical port name is stored in 'lport'. >>> + * >>> + * Returns true if at least 'MAC' is found in 'address', false >> otherwise. >>> + * >>> + * The caller must call destroy_lport_addresses() and free(*lport). */ >>> +static bool >>> +extract_addresses_with_port(const char *addresses, >>> + struct lport_addresses *laddrs, >>> + char **lport) >>> +{ >>> + int ofs; >>> + if (!extract_addresses(addresses, laddrs, &ofs)) { >>> + return false; >>> + } else if (!addresses[ofs]) { >>> + return true; >>> + } >>> + >>> + struct lexer lexer; >>> + lexer_init(&lexer, addresses + ofs); >>> + lexer_get(&lexer); >>> + >>> + if (lexer.error || lexer.token.type != LEX_T_ID >>> + || !lexer_match_id(&lexer, "is_chassis_resident")) { >>> + VLOG_INFO_RL(&rl, "invalid syntax '%s' in address", addresses); >>> + lexer_destroy(&lexer); >>> + return true; >>> + } >>> + >>> + if (!lexer_match(&lexer, LEX_T_LPAREN)) { >>> + VLOG_INFO_RL(&rl, "Syntax error: expecting '(' after " >>> + "'is_chassis_resident' in address '%s'", >> addresses); >>> + lexer_destroy(&lexer); >>> + return false; >>> + } >>> + >>> + if (lexer.token.type != LEX_T_STRING) { >>> + VLOG_INFO_RL(&rl, >>> + "Syntax error: expecting quoted string after " >>> + "'is_chassis_resident' in address '%s'", addresses); >>> + lexer_destroy(&lexer); >>> + return false; >>> + } >>> + >>> + *lport = xstrdup(lexer.token.s); >>> + >>> + lexer_get(&lexer); >>> + if (!lexer_match(&lexer, LEX_T_RPAREN)) { >>> + VLOG_INFO_RL(&rl, "Syntax error: expecting ')' after quoted >> string in " >>> + "'is_chassis_resident()' in address '%s'", >>> + addresses); >>> + lexer_destroy(&lexer); >>> + return false; >>> + } >>> + >>> + lexer_destroy(&lexer); >>> + return true; >>> +} >>> + >> >> [...] >> >>> @@ -5370,12 +5092,19 @@ send_self_originated_neigh_packet(struct rconn >> *swconn, >>> >>> /* Called with in the pinctrl_handler thread context. */ >>> static long long int >>> -send_garp_rarp(struct rconn *swconn, struct garp_rarp_data *garp_rarp, >>> - long long int current_time) >>> - OVS_REQUIRES(pinctrl_mutex) >>> +send_garp_rarp(struct rconn *swconn, struct garp_rarp_node *garp_rarp, >>> + long long int current_time, long long int max_timeout, >>> + bool continuous) >>> { >>> - if (current_time < garp_rarp->announce_time) { >>> - return garp_rarp->announce_time; >>> + long long int announce_time, old_announce_time; >>> + int backoff, old_backoff; >>> + >>> + atomic_read(&garp_rarp->announce_time, &announce_time); >>> + atomic_read(&garp_rarp->backoff, &backoff); >>> + old_announce_time = announce_time; >>> + old_backoff = backoff; >>> + if (current_time < announce_time) { >>> + return announce_time; >>> } >>> >>> /* Compose and inject a GARP request packet. */ >>> @@ -5398,14 +5127,19 @@ send_garp_rarp(struct rconn *swconn, struct >> garp_rarp_data *garp_rarp, >>> /* Set the next announcement. At most 5 announcements are sent for >> a >>> * vif if garp_rarp_max_timeout is not specified otherwise cap the >> max >>> * timeout to garp_rarp_max_timeout. */ >>> - if (garp_rarp_continuous || garp_rarp->backoff < >> garp_rarp_max_timeout) { >>> - garp_rarp->announce_time = current_time + garp_rarp->backoff; >>> + if (continuous || backoff < max_timeout) { >>> + announce_time = current_time + backoff; >>> } else { >>> - garp_rarp->announce_time = LLONG_MAX; >>> + announce_time = LLONG_MAX; >>> } >>> - garp_rarp->backoff = MIN(garp_rarp_max_timeout, garp_rarp->backoff >> * 2); >>> + backoff = MIN(max_timeout, backoff * 2); >>> >>> - return garp_rarp->announce_time; >>> + atomic_compare_exchange_strong(&garp_rarp->announce_time, >>> + &old_announce_time, announce_time); >>> + atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff, >>> + backoff); >>> + >>> + return announce_time; >> >> I was discussing about this with Ales offline. It's not exactly correct. >> >> If atomic_compare_exchange_strong() returns false it means 'announce_time' >> was not written to 'garp_rarp->announce_time'. 'old_announce_time' is >> however updated in that case. We should return it if cmpxchg() failed. >> >> But I think Ales can probably fix this when applying the patch. >> > > > Yes, this area needs the diff below: > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > index b6b6aa9f9..545f1b174 100644 > --- a/controller/pinctrl.c > +++ b/controller/pinctrl.c > @@ -5134,12 +5134,13 @@ send_garp_rarp(struct rconn *swconn, struct > garp_rarp_node *garp_rarp, > } > backoff = MIN(max_timeout, backoff * 2); > > - atomic_compare_exchange_strong(&garp_rarp->announce_time, > - &old_announce_time, announce_time); > + bool cmp = atomic_compare_exchange_strong(&garp_rarp->announce_time, > + &old_announce_time, > + announce_time); > atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff, > backoff); > > - return announce_time; > + return cmp ? announce_time : old_announce_time; > } > > > > >> >>> } >>> >>> static void >> >> With these small things fixed: >> Acked-by: Dumitru Ceara <dce...@redhat.com> >> >> Regards, >> Dumitru >> >> > Thanks, > Ales > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev