On Fri, Jun 06, 2025 at 10:12:24AM +0200, Xavier Simonart via dev wrote:
> Hi Felix
> 
> Thanks for your feedback.
> Some comments below.

Hi Xavier,

thanks a lot for the feedback.

> Thanks
> Xavier
> 
> On Thu, Jun 5, 2025 at 1:56 PM Felix Huettner <[email protected]>
> wrote:
> 
> > On Wed, Jun 04, 2025 at 02:49:15PM +0200, Ales Musil via dev wrote:
> > > On Mon, Jun 2, 2025 at 9:42 AM Xavier Simonart via dev <
> > > [email protected]> wrote:
> > >
> > > > In HA, when bfd was going down then up for the higher priority gw,
> > > > new garps were not generated.
> > > >
> > > > Signed-off-by: Xavier Simonart <[email protected]>
> > > > ---
> > > >
> > >
> > > Thank you Xavier,
> > >
> > > there is one small nit below that I took care of during the merge.
> > >
> > >
> > > >  controller/pinctrl.c | 23 +++++++++--------------
> > > >  1 file changed, 9 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > > > index e10ce06cf..b5b2bb7b5 100644
> > > > --- a/controller/pinctrl.c
> > > > +++ b/controller/pinctrl.c
> > > > @@ -242,7 +242,6 @@ static void send_garp_rarp_prepare(
> > > >      const struct ovsrec_bridge *,
> > > >      const struct sbrec_chassis *,
> > > >      const struct hmap *local_datapaths,
> > > > -    const struct sset *active_tunnels,
> > > >      const struct ovsrec_open_vswitch_table *ovs_table)
> > > >      OVS_REQUIRES(pinctrl_mutex);
> > > >  static void send_garp_rarp_run(struct rconn *swconn,
> > > > @@ -4099,8 +4098,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >      send_garp_rarp_prepare(ovnsb_idl_txn, sbrec_port_binding_by_datapath,
> > > >                             sbrec_port_binding_by_name,
> > > >                             sbrec_mac_binding_by_lport_ip, ecmp_nh_table,
> > > > -                           br_int, chassis, local_datapaths, active_tunnels,
> > > > -                           ovs_table);
> > > > +                           br_int, chassis, local_datapaths, ovs_table);
> > > >      prepare_ipv6_ras(local_active_ports_ras, sbrec_port_binding_by_name);
> > > >      prepare_ipv6_prefixd(ovnsb_idl_txn, sbrec_port_binding_by_name,
> > > >                           local_active_ports_ipv6_pd, chassis,
> > > > @@ -6598,16 +6596,18 @@ consider_nat_address(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> > > >                       const struct sbrec_port_binding *pb,
> > > >                       struct sset *nat_address_keys,
> > > >                       const struct sbrec_chassis *chassis,
> > > > -                     const struct sset *active_tunnels,
> > > >                       struct shash *nat_addresses)
> > > >  {
> > > >      struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
> > > >      char *lport = NULL;
> > > > -    if (!extract_addresses_with_port(nat_address, laddrs, &lport)
> > > > +    const struct sbrec_port_binding *cr_pb = NULL;
> > > > +    int rc = extract_addresses_with_port(nat_address, laddrs, &lport);
> > > >
> > >
> > > nit: bool rc
> > >
> > >
> > > > +    if (lport) {
> > > > +        cr_pb = lport_lookup_by_name(sbrec_port_binding_by_name, lport);
> > > > +    }
> > > > +    if (!rc
> > > >          || (!lport && !strcmp(pb->type, "patch"))
> > > > -        || (lport && !lport_is_chassis_resident(
> > > > -                sbrec_port_binding_by_name, chassis,
> > > > -                active_tunnels, lport))) {
> > > > +        || (lport && (!cr_pb || (cr_pb->chassis != chassis)))) {
> >
> > Hi Xavier, Hi Ales,
> >
> > i have concerns over this change.
> >
> > Let me first try to summarize my understanding of the problem this is
> > trying to solve, because I am not fully sure about it yet.
> >
> > I assume the scenario is:
> > * You have 2 gateway chassis connected to the same localnet
> > * There is an LRP with a HA_Chassis_Group that utilizes these two
> >   gateways
> > * When a gateway claims the LRP it sends out garps toward the localnet
> >   port, to let the fabric know where to send traffic
> >
> > The issue this patch should address now (if I got it right):
> > * Let us assume gw1 has a higher priority than gw2, so gw1 sent out garps
> > in the past
> > * Now there is a network partition between gw1 and gw2
> > * Since the bfd session between the gateways will die, both gw1 and gw2
> >   will assume they should be active
> > * Since gw2 was previously not active it sends garps; at least parts of
> >   the fabric now forward traffic to gw2 instead of gw1
> > * The network partition heals and the bfd sessions come back up
> > * gw2 will stop handling the LRP
> > * gw1 will NOT send garps, since it never "lost" the LRP from its
> >   perspective
> > * the network fabric will partially direct packets to the wrong chassis
> >
> Yes, that's correct: inbound traffic will be sent to gw2.
> 
> >
> > My understanding of the patch is now that instead of using the bfd
> > signaling for garps, we just use it to claim ports in the southbound.
> 
> That's correct. Just one slight comment: we were already claiming the port
> in the southbound db. We now use this claim
> for the garp generation instead of using the bfd status directly.
> 
> > We
> > then generate garps based on these southbound ports. That ensures that
> > either gw2 cannot send garps because it does not reach southbound, or,
> > if it claims the LRP, gw1 will notice, claim it back, and this will
> > trigger sending the garps again.
> >
> That's correct. And worth mentioning the more "general" case when bfd goes
> down: if gw1 died, gw2 will claim the port (same as w/o the patch), but now
> relies on the port claim to send the garp.
> The main difference, however, is from gw1's perspective: w/o the patch, gw1
> always "wanted" to send garps, i.e. it only stopped sending garps because of
> the garp backoff (16 seconds/5 garps). The fact that bfd came up or down did
> not change anything from that perspective. The fact that, for some time,
> gw2 claimed the port did not change anything either.
> With the patch, gw1 stops sending garps when it does not claim the port, and
> hence resets the garp "count". And it restarts sending garps (with a count
> /backoff time of 0) when it claims the port.

Ok, thanks for all the clarification.
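
To make sure I read the backoff/reset behaviour correctly, here is a rough,
self-contained sketch of the logic as I understand it now (all names are made
up for illustration; this is not the actual pinctrl.c code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port garp state; the real code keeps something
 * similar, but the fields here are invented. */
struct garp_rarp_state {
    bool port_claimed;      /* does this chassis currently claim the port? */
    int announcements;      /* garps sent since the last reset */
    int64_t next_announce;  /* ms timestamp of the next scheduled garp */
};

#define GARP_MAX_ANNOUNCEMENTS 5      /* "5 garps" from the mail above */
#define GARP_MAX_BACKOFF_MS    16000  /* "16 seconds" from the mail above */

/* With the patch: a fresh claim restarts the announcement cycle with a
 * zero backoff; losing the claim simply stops announcements. */
static void
garp_update_claim(struct garp_rarp_state *s, bool claimed, int64_t now_ms)
{
    if (claimed && !s->port_claimed) {
        s->announcements = 0;
        s->next_announce = now_ms;   /* first garp goes out right away */
    }
    s->port_claimed = claimed;
}

/* Returns true if a garp should be sent now, and schedules the next one
 * with an exponential backoff (1s, 2s, 4s, 8s, capped at 16s). */
static bool
garp_should_send(struct garp_rarp_state *s, int64_t now_ms)
{
    if (!s->port_claimed
        || s->announcements >= GARP_MAX_ANNOUNCEMENTS
        || now_ms < s->next_announce) {
        return false;
    }
    int64_t backoff_ms = 1000LL << s->announcements;
    if (backoff_ms > GARP_MAX_BACKOFF_MS) {
        backoff_ms = GARP_MAX_BACKOFF_MS;
    }
    s->next_announce = now_ms + backoff_ms;
    s->announcements++;
    return true;
}
```

If that matches the intended behaviour, the key point is that before the
patch nothing reset the announcement count when bfd came back up, so gw1
stayed silent after gw2 had announced itself to the fabric.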

> 
> >
> > Now my point of concern is that this means failover of LRPs is bound to
> > the southbound being up. If southbound is not up and gw1 would die, gw2
> > cannot claim the port and will therefore not send garps, basically
> > causing a traffic interruption until southbound is restored.
> >
> I think that, unrelated to this patch, if southbound is down, then, in most
> cases, failover will not work.
> If southbound is not up, it will be seen as "read-only" by gw2's
> ovn-controller as soon as it sends a transaction (e.g. mac binding, FDB
> update ...).
> When southbound is seen as read-only, gw2's ovn-controller will not try to
> claim the port, and the proper flows (e.g. replying to the arp request) will
> not be installed.
> If southbound is down for long enough (based on the inactivity probe) when
> bfd goes down, I think that the same happens: garp is not generated.
> There is one case where it probably worked: if, shortly after southbound
> went down, gw2's ovn-controller had not yet sent any update to southbound,
> then gw2 could claim the port; it would fail to update southbound,
> but the proper flows could be installed.
> 
> >
> > This seems not to be the intended behaviour, however, as otherwise the
> > compute chassis would not need to have bfd sessions to the gateway
> > chassis and could just rely on the southbound port_binding entry.
> >
> That's a very valid remark. But as said earlier, I think that failover w/o
> southbound could only work in a very limited case.

I just tried it out and indeed it does not generally work (aside from
maybe specific cases). I was always under a different impression, but I
never validated it before.

> Also it raises a different question which should be part of a different
> patch set: if bfd goes down only between gw1 and gw2
> (but not between compute1 and gw1 or between compute1 and gw2), then gw2
> would (at least for some time) claim the port (as bfd is down) and send
> garps, but outgoing traffic would still be sent to gw1 by compute1 (as bfd
> between compute1 and gw1 is up).
> Or, if bfd only goes down between compute1 and gw1, then outgoing traffic
> would be sent through gw2 (despite the fact gw1 claims the port).
> I think that this is a different topic (which we are looking at).

Would it make sense to talk about it in one of the community meetings?
Unfortunately I cannot attend the next one, but we could gather
different views and requirements there.

> 
> >
> > If all of the above is not some general misunderstanding and this
> > problem really exists, then I would have the following idea for a
> > solution:
> > * Revert to the previous behaviour using lport_is_chassis_resident
> > * Reset the garp timers if any bfd session that is relevant for the LRP
> >   changed the status from down->up
> >
> This would definitely fix the most important case targeted by this patch:
> w/o any patch, when bfd comes back up, gw1 does not restart generating
> garps, causing a potentially long down time.
> 
> >
> > Noticing when bfd changed from down->up could be done with the following
> > condition, checked for all interfaces that point to chassis that are
> > part of the HA_Chassis_Group:
> > `bfd_status:forwarding==true && bfd_status:flap_count !=
> > previous_flap_count`
> >
> But with your proposal, I think we keep the fact that when bfd between
> gw1 & gw2 is down, gw1 stops generating garps (backoff timeout), while gw2
> might claim the port.
> 
> > With this we have a southbound-free way of signaling when another
> > chassis might have tried to claim the LRP from its local perspective.
> >
> > While I have a new version for
> > https://patchwork.ozlabs.org/project/ovn/list/?series=459489
> > I would not post it until this is clarified.
> >
> > Thanks a lot,
> > Felix
> >
> Thanks for the very detailed discussion!
> Xavier

Thanks also for the new insights. That was really helpful for me.
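
To make the down->up detection I sketched earlier a bit more concrete, this
is roughly the check I had in mind, as a self-contained sketch (the struct is
invented for illustration; the real values would be read from the Interface
bfd_status column):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented mirror of the bits of bfd_status we care about; the real
 * values would come from the OVSDB Interface record of each tunnel. */
struct bfd_iface_state {
    bool forwarding;          /* bfd_status:forwarding */
    uint64_t flap_count;      /* bfd_status:flap_count */
    uint64_t seen_flap_count; /* flap_count at the previous poll */
};

/* Implements the condition from the mail:
 *   forwarding == true && flap_count != previous_flap_count
 * Returns true exactly once per flap, when the session forwards again
 * with a flap count we have not seen yet. */
static bool
bfd_went_back_up(struct bfd_iface_state *iface)
{
    bool flapped = iface->forwarding
                   && iface->flap_count != iface->seen_flap_count;
    iface->seen_flap_count = iface->flap_count;
    return flapped;
}
```

Checked for all interfaces that point to chassis of the HA_Chassis_Group, a
true result would then reset the garp timers.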

Thanks a lot,
Felix

> 
> >
> >
> > > >          destroy_lport_addresses(laddrs);
> > > >          free(laddrs);
> > > >          free(lport);
> > > > @@ -6635,7 +6635,6 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> > > >                             struct sset *nat_address_keys,
> > > >                             struct sset *local_l3gw_ports,
> > > >                             const struct sbrec_chassis *chassis,
> > > > -                           const struct sset *active_tunnels,
> > > >                             struct shash *nat_addresses)
> > > >  {
> > > >      const char *gw_port;
> > > > @@ -6652,7 +6651,6 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> > > >                  consider_nat_address(sbrec_port_binding_by_name,
> > > >                                       pb->nat_addresses[i], pb,
> > > >                                       nat_address_keys, chassis,
> > > > -                                     active_tunnels,
> > > >                                       nat_addresses);
> > > >              }
> > > >          } else {
> > > > @@ -6664,7 +6662,6 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> > > >                  consider_nat_address(sbrec_port_binding_by_name,
> > > >                                       nat_addresses_options, pb,
> > > >                                       nat_address_keys, chassis,
> > > > -                                     active_tunnels,
> > > >                                       nat_addresses);
> > > >              }
> > > >          }
> > > > @@ -6773,7 +6770,6 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >                         const struct ovsrec_bridge *br_int,
> > > >                         const struct sbrec_chassis *chassis,
> > > >                         const struct hmap *local_datapaths,
> > > > -                       const struct sset *active_tunnels,
> > > >                         const struct ovsrec_open_vswitch_table *ovs_table)
> > > >      OVS_REQUIRES(pinctrl_mutex)
> > > >  {
> > > > @@ -6809,8 +6805,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >
> > > >      get_nat_addresses_and_keys(sbrec_port_binding_by_name,
> > > >                                 &nat_ip_keys, &local_l3gw_ports,
> > > > -                               chassis, active_tunnels,
> > > > -                               &nat_addresses);
> > > > +                               chassis, &nat_addresses);
> > > >
> > > >      /* For deleted ports and deleted nat ips, remove from
> > > >       * send_garp_rarp_data. */
> > > > --
> > > > 2.47.1
> > > >
> > > > _______________________________________________
> > > > dev mailing list
> > > > [email protected]
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > > >
> > > Regards,
> > > Ales
> >
> >
