Hi Ales, On 6/18/25 7:33 AM, Ales Musil wrote: > The function behavior would be unexpected in cases when there is > BFD flap between two gateway chassis. This would result in two > chassis claiming that the port binding should be claimed by them.
Nit: "claiming that .. should be claimed by them" sounds a bit weird to me. Maybe "two chassis racing to claim the same port binding"? > This leads to multiple problems like unsent gARPs. The gARP problem > on its own was solved [0], but unfortunately reintroduced [1] in > slightly different form. > > The conclusion from the investigation is that the function in > the current form is easy to misuse. To prevent that use > only SB state to determine which chassis has actually claimed > the port binding. This is the desired behavior in cases where > this function was used previously. > > The code that determines which chassis should claim port binding > based on BFD status remains unchanged, thus the behavior during > failover should be the same with minimizing the impact for BFD > flapping cases. > > [0] 289ec19b01ad ("pinctrl: Fix missing garp.") > [1] 05527bd6ccdb ("controller: Extract garp_rarp to engine node."). > I was initially thinking of asking for a test but (as you pointed out offline) we already have one, in the multinode test suite. It might be worth pointing that out in the commit message too. > Signed-off-by: Ales Musil <amu...@redhat.com> > --- > controller/garp_rarp.c | 8 +++----- > controller/lport.c | 19 +++++++------------ > controller/lport.h | 3 --- > controller/ovn-controller.c | 11 ++--------- > controller/physical.c | 3 +-- > controller/pinctrl.c | 22 ++++++++-------------- > controller/pinctrl.h | 1 - > controller/route.c | 5 +---- > controller/route.h | 2 -- > 9 files changed, 22 insertions(+), 52 deletions(-) > > diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c > index ef377e26b..551d8303f 100644 > --- a/controller/garp_rarp.c > +++ b/controller/garp_rarp.c > @@ -17,6 +17,7 @@ > #include <config.h> > > #include "controller/local_data.h" > +#include "lport.h" > #include "mac-binding-index.h" > #include "openvswitch/hmap.h" > #include "openvswitch/vlog.h" > @@ -147,11 +148,7 @@ consider_nat_address(struct ovsdb_idl_index > *sbrec_port_binding_by_name, > { > struct lport_addresses *laddrs = xmalloc(sizeof *laddrs); > char *lport = NULL; > - const struct sbrec_port_binding *cr_pb = NULL; > bool rc = extract_addresses_with_port(nat_address, laddrs, &lport); > - if (lport) { > - cr_pb = lport_lookup_by_name(sbrec_port_binding_by_name, lport); > - } > if (!rc > || (!lport && !strcmp(pb->type, "patch"))) { > destroy_lport_addresses(laddrs); > @@ -160,7 +157,8 @@ consider_nat_address(struct ovsdb_idl_index > *sbrec_port_binding_by_name, > return; > } > if (lport) { > - if (!cr_pb || (cr_pb->chassis != chassis)) { > + if (!lport_is_chassis_resident(sbrec_port_binding_by_name, > + chassis, lport)) { > sset_add(non_local_lports, lport); > destroy_lport_addresses(laddrs); > free(laddrs); > diff --git a/controller/lport.c b/controller/lport.c > index 92de375b5..1ea57a5ca 100644 > --- a/controller/lport.c > +++ b/controller/lport.c > @@ -65,35 +65,30 @@ lport_lookup_by_key(struct ovsdb_idl_index > *sbrec_datapath_binding_by_key, > > bool > lport_pb_is_chassis_resident(const struct sbrec_chassis *chassis, > - const struct sset *active_tunnels, > const struct sbrec_port_binding *pb) > { > if (!pb || !pb->chassis) { > return false; > } > - if (strcmp(pb->type, "chassisredirect")) { > - return pb->chassis == chassis; > - } else { > - return ha_chassis_group_is_active(pb->ha_chassis_group, > - active_tunnels, chassis); > - } > + > + /* Note we relay on SB to provide the information, this is needed in case > + * of flapping BFD when it's not detected by one side. */ Typo: s/relay/rely/ Nit: the comment makes sense when seen in the diff, together with the removed lines, but in the resulting code it might make readers wonder what it means. Maybe we should change it a bit, e.g.: Note: we rely on SB to provide information about who owns the port binding. In particular, for chassisredirect ports, this avoids issues when the underlying BFD state changes are only detected by some of the chassis in the associated HA_Chassis_Group. > + return pb->chassis == chassis; > } > The rest of the patch looks good to me. So, with the comment and commit message fixed up: Acked-by: Dumitru Ceara <dce...@redhat.com> Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev