On Thu, Jun 19, 2025 at 2:11 PM Dumitru Ceara <dce...@redhat.com> wrote:

> Hi Ales,
>

Thank you Dumitru,


>
> On 6/18/25 7:33 AM, Ales Musil wrote:
> > The function behavior would be unexpected in cases when there is
> > BFD flap between two gateway chassis. This would result in two
> > chassis claiming that the port binding should be claimed by them.
>
> Nit: "claiming that .. should be claimed by them" sounds a bit weird to
> me.  Maybe "two chassis racing to claim the same port binding"?
>


that sounds better, thanks.


>
> > This leads to multiple problems like unsent gARPs. The gARP problem
> > on its own was solved [0], but unfortunately reintroduced [1] in
> > slightly different form.
> >
> > The conclusion from the investigation is that the function in
> > the current form is easy to misuse. To prevent that use
> > only SB state to determine which chassis has actually claimed
> > the port binding. This is the desired behavior in cases where
> > this function was used previously.
> >
> > The code that determines which chassis should claim port binding
> > based on BFD status remains unchanged, thus the behavior during
> > failover should be the same with minimizing the impact for BFD
> > flapping cases.
> >
> > [0] 289ec19b01ad ("pinctrl: Fix missing garp.")
> > [1] 05527bd6ccdb ("controller: Extract garp_rarp to engine node.").
> >
>
> I was initially thinking of asking for a test but (as you pointed out
> offline) we already have one, in the multinode test suite.  It might be
> worth pointing that out in the commit message too.
>
> > Signed-off-by: Ales Musil <amu...@redhat.com>
> > ---
> >  controller/garp_rarp.c      |  8 +++-----
> >  controller/lport.c          | 19 +++++++------------
> >  controller/lport.h          |  3 ---
> >  controller/ovn-controller.c | 11 ++---------
> >  controller/physical.c       |  3 +--
> >  controller/pinctrl.c        | 22 ++++++++--------------
> >  controller/pinctrl.h        |  1 -
> >  controller/route.c          |  5 +----
> >  controller/route.h          |  2 --
> >  9 files changed, 22 insertions(+), 52 deletions(-)
> >
> > diff --git a/controller/garp_rarp.c b/controller/garp_rarp.c
> > index ef377e26b..551d8303f 100644
> > --- a/controller/garp_rarp.c
> > +++ b/controller/garp_rarp.c
> > @@ -17,6 +17,7 @@
> >  #include <config.h>
> >
> >  #include "controller/local_data.h"
> > +#include "lport.h"
> >  #include "mac-binding-index.h"
> >  #include "openvswitch/hmap.h"
> >  #include "openvswitch/vlog.h"
> > @@ -147,11 +148,7 @@ consider_nat_address(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
> >  {
> >      struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
> >      char *lport = NULL;
> > -    const struct sbrec_port_binding *cr_pb = NULL;
> >      bool rc = extract_addresses_with_port(nat_address, laddrs, &lport);
> > -    if (lport) {
> > -        cr_pb = lport_lookup_by_name(sbrec_port_binding_by_name, lport);
> > -    }
> >      if (!rc
> >          || (!lport && !strcmp(pb->type, "patch"))) {
> >          destroy_lport_addresses(laddrs);
> > @@ -160,7 +157,8 @@ consider_nat_address(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
> >          return;
> >      }
> >      if (lport) {
> > -        if (!cr_pb || (cr_pb->chassis != chassis)) {
> > +        if (!lport_is_chassis_resident(sbrec_port_binding_by_name,
> > +                                       chassis, lport)) {
> >              sset_add(non_local_lports, lport);
> >              destroy_lport_addresses(laddrs);
> >              free(laddrs);
> > diff --git a/controller/lport.c b/controller/lport.c
> > index 92de375b5..1ea57a5ca 100644
> > --- a/controller/lport.c
> > +++ b/controller/lport.c
> > @@ -65,35 +65,30 @@ lport_lookup_by_key(struct ovsdb_idl_index
> *sbrec_datapath_binding_by_key,
> >
> >  bool
> >  lport_pb_is_chassis_resident(const struct sbrec_chassis *chassis,
> > -                             const struct sset *active_tunnels,
> >                               const struct sbrec_port_binding *pb)
> >  {
> >      if (!pb || !pb->chassis) {
> >          return false;
> >      }
> > -    if (strcmp(pb->type, "chassisredirect")) {
> > -        return pb->chassis == chassis;
> > -    } else {
> > -        return ha_chassis_group_is_active(pb->ha_chassis_group,
> > -                                          active_tunnels, chassis);
> > -    }
> > +
> > +    /* Note we relay on SB to provide the information, this is needed
> in case
> > +     * of flapping BFD when it's not detected by one side. */
>
> Typo: s/relay/rely/
>
> Nit: the comment makes sense when seen in the diff, together with the
> removed lines, but in the resulting code it might make readers wonder
> what it means.  Maybe we should change it a bit, e.g.:
>
> Note: we rely on SB to provide information about who owns the port
> binding.  In particular, for chassisredirect ports, this avoids issues
> when the underlying BFD state changes are only detected by some of the
> chassis in the associated HA_Chassis_Group.
>


I have adjusted the comment.


>
> > +    return pb->chassis == chassis;
> >  }
> >
>
> The rest of the patch looks good to me.  So, with the comment and commit
> message fixed up:
>
> Acked-by: Dumitru Ceara <dce...@redhat.com>
>
> Regards,
> Dumitru
>
>
>
I have addressed both comments, went ahead and merged this into main.

Regards,
Ales
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to