On Fri, Jun 13, 2025 at 4:18 PM Dumitru Ceara <dce...@redhat.com> wrote:
> On 6/13/25 3:50 PM, Ales Musil wrote: > > On Fri, Jun 13, 2025 at 11:08 AM Dumitru Ceara <dce...@redhat.com> > wrote: > > > >> On 6/10/25 2:03 PM, Felix Huettner via dev wrote: > >>> Previously on each call to pinctrl_run all addresses that need to be > >>> announced via garps or rarps where computed newly. On larger external > >>> networks that might take significant time. > >>> However the content of the data can only change in a run of the I+P > >>> engine. So we move the whole logic to an engine node and only handle > the > >>> minimal needed set in pinctrl_run. > >>> > >>> Acked-by: Lorenzo Bianconi <lorenzo.bianc...@redhat.com> > >>> Co-Authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> > >>> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> > >>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud> > >>> --- > >> > >> Hi Felix, Max, > >> > >> Thanks for the new revision! I have a couple of small comments below > but > >> I think whoever is applying this patch can take care of them so it's > >> probably fine to not send a new version. > >> > > > > Hi Felix, Max and Dumitru, > > > > Hi Ales, > Hi Dumitru, > > > I wanted to apply the series, but I discovered a serious issue during > > testing. There is use-after-free happening during controller exit, the > > engine data is freed while the pinctrl thread is still running so we > > need a way to ensure this won't happen. > > > > Good catch! > > > We could return NULL from garp_rarp_get_data() after engine_cleanup > > and have extra check in pinctrl, it seems the we are only accessing > > the cmap in pinctrl so maybe the function can return pointer to cmap > > instead. > > > > I was thinking about adding atomic_bool into inc-proc-eng to indicate > > that the engine data are safe to access, we could use that in the > > garp_rarp and possibly other nodes to check if it is safe to access > > those data in pinctrl or not. > > > > Let me know what are your thoughts, if that sounds feasible I will > > post a patch then we could rebase this one on top of it. > > > > What if we delay engine_cleanup() until after pinctrl_destroy()? That > would wait (pthread_join()) for the pinctrl thread to finish. > > Isn't that enough? > > Good point, that should be enough, yes. > > There is also missing cmap_destroy() in garp_rarp_cleanup(). > > > > Yes, let's add that to v13 once we fix the race (as you point out > offline, it's an already existing problem that we could hit with DNS too). > If the above is the case then the patch should be minimal and I could still use the amended v12 that I had prepared, let's wait for the use-after-free fix. we might not need v13 afterall. > > Regards, > Dumitru > Thanks, Ales > > > > >>> v11->v12: rebased > >>> v10->v11: > >>> * fix race condition in announce_time and backoff > >>> * addressed a few nits > >>> v8->v9: > >>> * remove need for sync by using atomics > >>> v7->v8: > >>> * addressed review comments > >>> * changed to cmap for data storage > >>> v6->v7: added ack > >>> > >>> controller/automake.mk | 4 +- > >>> controller/garp_rarp.c | 556 ++++++++++++++++++++++++++++++ > >>> controller/garp_rarp.h | 79 +++++ > >>> controller/ovn-controller.c | 209 +++++++++++- > >>> controller/pinctrl.c | 660 +++--------------------------------- > >>> controller/pinctrl.h | 3 +- > >>> lib/mac-binding-index.c | 45 +++ > >>> lib/mac-binding-index.h | 6 + > >>> 8 files changed, 943 insertions(+), 619 deletions(-) > >>> create mode 100644 controller/garp_rarp.c > >>> create mode 100644 controller/garp_rarp.h > >>> > >> > >>> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c > >>> new file mode 100644 > >>> index 000000000..c17ba6b0e > >>> --- /dev/null > >>> +++ b/controller/garp_rarp.c > >>> @@ -0,0 +1,556 @@ > >>> +/* > >>> + * Copyright (c) 2025, STACKIT GmbH & Co. KG > >>> + * > >>> + * Licensed under the Apache License, Version 2.0 (the "License"); > >>> + * you may not use this file except in compliance with the License. > >>> + * You may obtain a copy of the License at: > >>> + * > >>> + * http://www.apache.org/licenses/LICENSE-2.0 > >>> + * > >>> + * Unless required by applicable law or agreed to in writing, software > >>> + * distributed under the License is distributed on an "AS IS" BASIS, > >>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > >> implied. > >>> + * See the License for the specific language governing permissions and > >>> + * limitations under the License. > >>> + */ > >>> + > >>> +#include <config.h> > >>> + > >>> +#include <net/if.h> > >>> +#include <stdbool.h> > >>> +#include <stddef.h> > >>> + > >>> +#include "controller/encaps.h" > >>> +#include "controller/local_data.h" > >>> +#include "controller/lport.h" > >>> +#include "hash.h" > >>> +#include "mac-binding-index.h" > >>> +#include "openvswitch/hmap.h" > >>> +#include "openvswitch/vlog.h" > >>> +#include "ovn/lex.h" > >>> +#include "packets.h" > >>> +#include "smap.h" > >>> +#include "sset.h" > >>> +#include "openvswitch/shash.h" > >>> +#include "garp_rarp.h" > >>> +#include "ovn-sb-idl.h" > >> > >> Nit: Most of these are not really needed anymore. We get a trimmed > >> version with the following incremental change: > >> > >> diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c > >> index c17ba6b0e4..7c2f4814fc 100644 > >> --- a/controller/garp_rarp.c > >> +++ b/controller/garp_rarp.c > >> @@ -16,22 +16,11 @@ > >> > >> #include <config.h> > >> > >> -#include <net/if.h> > >> -#include <stdbool.h> > >> -#include <stddef.h> > >> - > >> -#include "controller/encaps.h" > >> #include "controller/local_data.h" > >> -#include "controller/lport.h" > >> -#include "hash.h" > >> #include "mac-binding-index.h" > >> #include "openvswitch/hmap.h" > >> #include "openvswitch/vlog.h" > >> #include "ovn/lex.h" > >> -#include "packets.h" > >> -#include "smap.h" > >> -#include "sset.h" > >> -#include "openvswitch/shash.h" > >> #include "garp_rarp.h" > >> #include "ovn-sb-idl.h" > >> > >>> + > >>> +VLOG_DEFINE_THIS_MODULE(garp_rarp); > >>> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); > >>> + > >>> +#define GARP_RARP_DEF_MAX_TIMEOUT 16000 > >>> + > >>> +static bool garp_rarp_data_has_changed = false; > >>> +static struct garp_rarp_data garp_rarp_data; > >>> + > >>> +/* Get localnet vifs, local l3gw ports and ofport for localnet patch > >> ports. */ > >>> +static void > >>> +get_localnet_vifs_l3gwports( > >>> + struct ovsdb_idl_index *sbrec_port_binding_by_datapath, > >>> + const struct sbrec_chassis *chassis, > >>> + const struct hmap *local_datapaths, > >>> + struct sset *localnet_vifs, > >>> + struct sset *local_l3gw_ports) > >>> +{ > >>> + struct sbrec_port_binding *target = > >> sbrec_port_binding_index_init_row( > >>> + sbrec_port_binding_by_datapath); > >>> + > >>> + const struct local_datapath *ld; > >>> + HMAP_FOR_EACH (ld, hmap_node, local_datapaths) { > >>> + const struct sbrec_port_binding *pb; > >>> + > >>> + if (!ld->localnet_port) { > >>> + continue; > >>> + } > >>> + > >>> + sbrec_port_binding_index_set_datapath(target, ld->datapath); > >>> + SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target, > >>> + > >> sbrec_port_binding_by_datapath) { > >>> + /* Get l3gw ports. Consider port bindings with type > >> "l3gateway" > >>> + * that connect to gateway routers (if local), and > consider > >> port > >>> + * bindings of type "patch" since they might connect to > >>> + * distributed gateway ports with NAT addresses. */ > >>> + if ((!strcmp(pb->type, "l3gateway") && pb->chassis == > >> chassis) > >>> + || !strcmp(pb->type, "patch")) { > >>> + sset_add(local_l3gw_ports, pb->logical_port); > >>> + } > >>> + > >>> + /* Get all vifs that are directly connected to a localnet > >> port. */ > >>> + if (!strcmp(pb->type, "") && pb->chassis == chassis) { > >>> + sset_add(localnet_vifs, pb->logical_port); > >>> + } > >>> + } > >>> + } > >>> + sbrec_port_binding_index_destroy_row(target); > >>> +} > >>> + > >>> + > >> > >> Nit: too many new lines. > >> > >>> +/* Extracts the mac, IPv4 and IPv6 addresses, and logical port from > >>> + * 'addresses' which should be of the format 'MAC [IP1 IP2 ..] > >>> + * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid > >> IPv4 > >>> + * or IPv6 address, and stores them in the 'ipv4_addrs' and > 'ipv6_addrs' > >>> + * fields of 'laddrs'. The logical port name is stored in 'lport'. > >>> + * > >>> + * Returns true if at least 'MAC' is found in 'address', false > >> otherwise. > >>> + * > >>> + * The caller must call destroy_lport_addresses() and free(*lport). */ > >>> +static bool > >>> +extract_addresses_with_port(const char *addresses, > >>> + struct lport_addresses *laddrs, > >>> + char **lport) > >>> +{ > >>> + int ofs; > >>> + if (!extract_addresses(addresses, laddrs, &ofs)) { > >>> + return false; > >>> + } else if (!addresses[ofs]) { > >>> + return true; > >>> + } > >>> + > >>> + struct lexer lexer; > >>> + lexer_init(&lexer, addresses + ofs); > >>> + lexer_get(&lexer); > >>> + > >>> + if (lexer.error || lexer.token.type != LEX_T_ID > >>> + || !lexer_match_id(&lexer, "is_chassis_resident")) { > >>> + VLOG_INFO_RL(&rl, "invalid syntax '%s' in address", > addresses); > >>> + lexer_destroy(&lexer); > >>> + return true; > >>> + } > >>> + > >>> + if (!lexer_match(&lexer, LEX_T_LPAREN)) { > >>> + VLOG_INFO_RL(&rl, "Syntax error: expecting '(' after " > >>> + "'is_chassis_resident' in address '%s'", > >> addresses); > >>> + lexer_destroy(&lexer); > >>> + return false; > >>> + } > >>> + > >>> + if (lexer.token.type != LEX_T_STRING) { > >>> + VLOG_INFO_RL(&rl, > >>> + "Syntax error: expecting quoted string after " > >>> + "'is_chassis_resident' in address '%s'", > addresses); > >>> + lexer_destroy(&lexer); > >>> + return false; > >>> + } > >>> + > >>> + *lport = xstrdup(lexer.token.s); > >>> + > >>> + lexer_get(&lexer); > >>> + if (!lexer_match(&lexer, LEX_T_RPAREN)) { > >>> + VLOG_INFO_RL(&rl, "Syntax error: expecting ')' after quoted > >> string in " > >>> + "'is_chassis_resident()' in address '%s'", > >>> + addresses); > >>> + lexer_destroy(&lexer); > >>> + return false; > >>> + } > >>> + > >>> + lexer_destroy(&lexer); > >>> + return true; > >>> +} > >>> + > >> > >> [...] > >> > >>> @@ -5370,12 +5092,19 @@ send_self_originated_neigh_packet(struct rconn > >> *swconn, > >>> > >>> /* Called with in the pinctrl_handler thread context. */ > >>> static long long int > >>> -send_garp_rarp(struct rconn *swconn, struct garp_rarp_data *garp_rarp, > >>> - long long int current_time) > >>> - OVS_REQUIRES(pinctrl_mutex) > >>> +send_garp_rarp(struct rconn *swconn, struct garp_rarp_node *garp_rarp, > >>> + long long int current_time, long long int max_timeout, > >>> + bool continuous) > >>> { > >>> - if (current_time < garp_rarp->announce_time) { > >>> - return garp_rarp->announce_time; > >>> + long long int announce_time, old_announce_time; > >>> + int backoff, old_backoff; > >>> + > >>> + atomic_read(&garp_rarp->announce_time, &announce_time); > >>> + atomic_read(&garp_rarp->backoff, &backoff); > >>> + old_announce_time = announce_time; > >>> + old_backoff = backoff; > >>> + if (current_time < announce_time) { > >>> + return announce_time; > >>> } > >>> > >>> /* Compose and inject a GARP request packet. */ > >>> @@ -5398,14 +5127,19 @@ send_garp_rarp(struct rconn *swconn, struct > >> garp_rarp_data *garp_rarp, > >>> /* Set the next announcement. At most 5 announcements are sent > for > >> a > >>> * vif if garp_rarp_max_timeout is not specified otherwise cap the > >> max > >>> * timeout to garp_rarp_max_timeout. */ > >>> - if (garp_rarp_continuous || garp_rarp->backoff < > >> garp_rarp_max_timeout) { > >>> - garp_rarp->announce_time = current_time + garp_rarp->backoff; > >>> + if (continuous || backoff < max_timeout) { > >>> + announce_time = current_time + backoff; > >>> } else { > >>> - garp_rarp->announce_time = LLONG_MAX; > >>> + announce_time = LLONG_MAX; > >>> } > >>> - garp_rarp->backoff = MIN(garp_rarp_max_timeout, garp_rarp->backoff > >> * 2); > >>> + backoff = MIN(max_timeout, backoff * 2); > >>> > >>> - return garp_rarp->announce_time; > >>> + atomic_compare_exchange_strong(&garp_rarp->announce_time, > >>> + &old_announce_time, announce_time); > >>> + atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff, > >>> + backoff); > >>> + > >>> + return announce_time; > >> > >> I was discussing about this with Ales offline. It's not exactly > correct. > >> > >> If atomic_compare_exchange_strong() returns false it means > 'announce_time' > >> was not written to 'garp_rarp->announce_time'. 'old_announce_time' is > >> however updated in that case. We should return it if cmpxchg() failed. > >> > >> But I think Ales can probably fix this when applying the patch. > >> > > > > > > Yes, this area needs the diff below: > > > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > > index b6b6aa9f9..545f1b174 100644 > > --- a/controller/pinctrl.c > > +++ b/controller/pinctrl.c > > @@ -5134,12 +5134,13 @@ send_garp_rarp(struct rconn *swconn, struct > > garp_rarp_node *garp_rarp, > > } > > backoff = MIN(max_timeout, backoff * 2); > > > > - atomic_compare_exchange_strong(&garp_rarp->announce_time, > > - &old_announce_time, announce_time); > > + bool cmp = atomic_compare_exchange_strong(&garp_rarp->announce_time, > > + &old_announce_time, > > + announce_time); > > atomic_compare_exchange_strong(&garp_rarp->backoff, &old_backoff, > > backoff); > > > > - return announce_time; > > + return cmp ? announce_time : old_announce_time; > > } > > > > > > > > > >> > >>> } > >>> > >>> static void > >> > >> With these small things fixed: > >> Acked-by: Dumitru Ceara <dce...@redhat.com> > >> > >> Regards, > >> Dumitru > >> > >> > > Thanks, > > Ales > > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev