On Fri, Jun 06, 2025 at 10:12:24AM +0200, Xavier Simonart via dev wrote:
> Hi Felix
>
> Thanks for your feedback.
> Some comments below.
Hi Xavier, thanks a lot for the feedback.

> Thanks
> Xavier
>
> On Thu, Jun 5, 2025 at 1:56 PM Felix Huettner <[email protected]>
> wrote:
>
> > On Wed, Jun 04, 2025 at 02:49:15PM +0200, Ales Musil via dev wrote:
> > > On Mon, Jun 2, 2025 at 9:42 AM Xavier Simonart via dev <
> > > [email protected]> wrote:
> > >
> > > > In HA, when bfd was going down then up for the higher priority gw,
> > > > new garps were not generated.
> > > >
> > > > Signed-off-by: Xavier Simonart <[email protected]>
> > > > ---
> > >
> > > Thank you Xavier,
> > >
> > > there is one small nit below that I took care of during the merge.
> > >
> > > >  controller/pinctrl.c | 23 +++++++++--------------
> > > >  1 file changed, 9 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > > > index e10ce06cf..b5b2bb7b5 100644
> > > > --- a/controller/pinctrl.c
> > > > +++ b/controller/pinctrl.c
> > > > @@ -242,7 +242,6 @@ static void send_garp_rarp_prepare(
> > > >      const struct ovsrec_bridge *,
> > > >      const struct sbrec_chassis *,
> > > >      const struct hmap *local_datapaths,
> > > > -    const struct sset *active_tunnels,
> > > >      const struct ovsrec_open_vswitch_table *ovs_table)
> > > >      OVS_REQUIRES(pinctrl_mutex);
> > > >  static void send_garp_rarp_run(struct rconn *swconn,
> > > > @@ -4099,8 +4098,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >      send_garp_rarp_prepare(ovnsb_idl_txn, sbrec_port_binding_by_datapath,
> > > >                             sbrec_port_binding_by_name,
> > > >                             sbrec_mac_binding_by_lport_ip, ecmp_nh_table,
> > > > -                           br_int, chassis, local_datapaths, active_tunnels,
> > > > -                           ovs_table);
> > > > +                           br_int, chassis, local_datapaths, ovs_table);
> > > >      prepare_ipv6_ras(local_active_ports_ras, sbrec_port_binding_by_name);
> > > >      prepare_ipv6_prefixd(ovnsb_idl_txn, sbrec_port_binding_by_name,
> > > >                           local_active_ports_ipv6_pd, chassis,
> > > > @@ -6598,16 +6596,18 @@ consider_nat_address(struct ovsdb_idl_index
> > > > *sbrec_port_binding_by_name,
> > > >                      const struct sbrec_port_binding *pb,
> > > >                      struct sset *nat_address_keys,
> > > >                      const struct sbrec_chassis *chassis,
> > > > -                    const struct sset *active_tunnels,
> > > >                      struct shash *nat_addresses)
> > > >  {
> > > >      struct lport_addresses *laddrs = xmalloc(sizeof *laddrs);
> > > >      char *lport = NULL;
> > > > -    if (!extract_addresses_with_port(nat_address, laddrs, &lport)
> > > > +    const struct sbrec_port_binding *cr_pb = NULL;
> > > > +    int rc = extract_addresses_with_port(nat_address, laddrs, &lport);
> > >
> > > nit: bool rc
> > >
> > > > +    if (lport) {
> > > > +        cr_pb = lport_lookup_by_name(sbrec_port_binding_by_name, lport);
> > > > +    }
> > > > +    if (!rc
> > > >          || (!lport && !strcmp(pb->type, "patch"))
> > > > -        || (lport && !lport_is_chassis_resident(
> > > > -                sbrec_port_binding_by_name, chassis,
> > > > -                active_tunnels, lport))) {
> > > > +        || (lport && (!cr_pb || (cr_pb->chassis != chassis)))) {
> >
> > Hi Xavier, Hi Ales,
> >
> > I have concerns over this change.
> >
> > Let me first try to summarize my understanding of the problem this is
> > trying to solve, because I am not fully sure about that yet.
> >
> > I assume the scenario is:
> > * You have 2 gateway chassis connected to the same localnet
> > * There is an LRP with an HA_Chassis_Group that utilizes these two
> >   gateways
> > * When a gateway claims the LRP it sends out garps towards the localnet
> >   port, to let the fabric know where to send traffic to
> >
> > The issue this patch should address now (if I got it right):
> > * Let us assume gw1 has a higher priority than gw2, so gw1 sent out
> >   garps in the past
> > * Now there is a network partition between gw1 and gw2
> > * Since the bfd session between the gateways will die, both gw1 and gw2
> >   will assume they should be active
> > * Since gw2 was previously not active it sends garps; at least parts of
> >   the fabric now forward traffic to gw2 instead of gw1
> > * The network partition heals and the bfd sessions come back up
> > * gw2 will stop handling the LRP
> > * gw1 will NOT send garps, since it never "lost" the LRP from its
> >   perspective
> > * The network fabric will partially direct packets to the wrong chassis
>
> Yes, that's correct: inbound traffic will be sent to gw2.
>
> > My understanding of the patch is now that instead of using the bfd
> > signaling for garps, we just use it to claim ports in the southbound.
>
> That's correct. Just one slight comment: we were already claiming the port
> in the southbound db. We now use this claim for the garp generation
> instead of using the bfd status directly.
>
> > We then generate garps based on these southbound ports. That ensures
> > that either gw2 cannot send garps because it does not reach southbound,
> > or, if it claims the LRP, gw1 will notice, claim it back, and this will
> > trigger sending the garps again.
>
> That's correct. And worth mentioning, the more "general" case when BFD
> goes down: if gw1 died, gw2 will claim the port (same as w/o the patch)
> but now relies on the port claim to send the garp.
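The claim-based gating described above can be sketched in a few lines. This is a simplified illustration, not the actual ovn-controller API: the struct and function names (`port_binding`, `should_send_garp`) are stand-ins, and the real code compares `sbrec_port_binding` row pointers rather than chassis name strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the behaviour discussed above: with
 * the patch, a chassis announces (garps) an address only while the
 * southbound port_binding for the chassis-resident port points at this
 * chassis. */
struct port_binding {
    const char *chassis;    /* chassis currently claiming the port */
};

static bool
should_send_garp(const struct port_binding *pb, const char *local_chassis)
{
    /* No binding record (e.g. southbound unreachable) -> no garp. */
    return pb && pb->chassis && !strcmp(pb->chassis, local_chassis);
}
```

Under this model, gw1 naturally resumes announcing as soon as it re-claims the port after the partition heals, and gw2 falls silent as soon as it loses the claim.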
> The main difference, however, is from gw1's perspective: w/o the patch,
> gw1 always "wanted" to send garps, i.e. it only stopped sending garps
> because of the garp backoff (16 seconds / 5 garps). The fact that bfd
> came up or went down did not change anything from that perspective. The
> fact that, for some time, gw2 claimed the port did not change anything
> either.
> With the patch, gw1 stops sending garps when it does not claim the port,
> and hence resets the garp "count". And it restarts sending garps (with a
> count / backoff time of 0) when it claims the port.

Ok, thanks for all the clarification.

> > Now my point of concern is that this means failover of LRPs is bound to
> > the southbound being up. If southbound is not up and gw1 would die, gw2
> > cannot claim the port and will therefore not send garps, basically
> > causing a traffic interruption until southbound is restored.
>
> I think that, unrelated to this patch, if southbound is down, then, in
> most cases, failover will not work.
> If southbound is not up, southbound will be seen as "read-only" by the
> gw2 ovn-controller as soon as it has sent a transaction (e.g. mac
> binding, FDB update, ...).
> When southbound is seen as read-only, the gw2 ovn-controller will not try
> to claim the port, and the proper flows (e.g. replying to the arp
> request) will not be installed.
> If southbound is down for long enough (based on the inactivity probe)
> when bfd goes down, I think that the same happens: the garp is not
> generated.
> There is one case where it was probably working: if, after southbound
> went down (since not too long), the gw2 ovn-controller had not yet sent
> any update to southbound, then gw2 could claim the port; it fails to
> update southbound, but the proper flows can be installed.
>
> > This seems to not be the intended behaviour however, otherwise the
> > compute chassis would not need to have bfd sessions to the gateway
> > chassis and could just rely on the southbound port_binding entry.
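The backoff Xavier describes (a burst of announcements until a 16-second cap, then silence, with the count reset on re-claim) roughly corresponds to logic like the following. The constants and names here are assumptions for illustration (a 1 s initial interval doubling to a 16 s cap, yielding 5 garps); they mimic the behaviour discussed, not the actual pinctrl.c implementation.

```c
#include <stdbool.h>

/* Illustrative sketch of a doubling garp backoff: the interval between
 * announcements doubles up to a cap, after which the chassis goes quiet;
 * (re)claiming the port resets the state so announcements restart. */
enum { GARP_INITIAL_MS = 1000, GARP_MAX_MS = 16000 };

struct garp_state {
    long long next_announce_ms;   /* when to send the next garp */
    long long backoff_ms;         /* 0 once the burst has finished */
};

/* Returns true if a garp should be sent now, and schedules the next one. */
static bool
garp_run(struct garp_state *s, long long now_ms)
{
    if (!s->backoff_ms || now_ms < s->next_announce_ms) {
        return false;
    }
    if (s->backoff_ms >= GARP_MAX_MS) {
        s->backoff_ms = 0;        /* burst finished: stop announcing */
    } else {
        s->backoff_ms *= 2;
        s->next_announce_ms = now_ms + s->backoff_ms;
    }
    return true;
}

/* Called when this chassis (re)claims the port: restart the burst. */
static void
garp_reset(struct garp_state *s, long long now_ms)
{
    s->backoff_ms = GARP_INITIAL_MS;
    s->next_announce_ms = now_ms; /* announce immediately */
}
```

The key point of the patch, in these terms, is that losing the claim zeroes the state and re-claiming calls the reset, so a recovered gw1 announces again instead of staying stuck at the end of an old burst.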
> >
> That's a very valid remark. But as said earlier, I think that failover
> w/o southbound could only work in a very limited case.

I just tried it out and yes, it seems to not work in general (aside from
maybe specific cases). I was always under a different impression, but I
never validated it before.

> Also, it raises a different question, which should be part of a different
> patch set: if bfd goes down only between gw1 and gw2 (but not between
> compute1 and gw1 or between compute1 and gw2), then gw2 would (at least
> for some time) claim the port (as bfd is down) and send garps, but
> outgoing traffic would still be sent to gw1 by compute1 (as bfd between
> compute1 and gw1 is up).
> Or, if bfd only goes down between compute1 and gw1, then outgoing traffic
> would be sent through gw2 (despite the fact that gw1 claims the port).
> I think that this is a different topic (which we are looking at).

Would it make sense to talk about it in one of the community meetings?
Unfortunately I cannot attend the next one, but we could gather different
views or requirements there.

> > If all of the above is not some general misunderstanding and this
> > problem really exists, then I would have the following idea for a
> > solution:
> > * Revert to the previous behaviour using lport_is_chassis_resident
> > * Reset the garp timers if any bfd session that is relevant for the LRP
> >   changed its status from down->up
>
> This would definitely fix the most important case targeted by this patch:
> w/o any patch, when bfd comes back up, gw1 does not restart generating
> garps, causing a potentially long down time.
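The garp-timer reset idea above needs a way to notice a down->up transition locally. One way to sketch it is to remember the last seen flap count per tunnel interface and fire once when the session is forwarding again. The struct below only mirrors the relevant `bfd_status` keys (`forwarding`, `flap_count`); it is illustrative, not a real OVS API.

```c
#include <stdbool.h>

/* Hypothetical snapshot of an interface's bfd_status column. */
struct bfd_status {
    bool forwarding;        /* bfd_status:forwarding */
    long long flap_count;   /* bfd_status:flap_count */
};

/* Returns true once per flap, when the session is forwarding again.
 * '*seen_flap_count' holds the flap_count from the previous run. */
static bool
bfd_came_back_up(const struct bfd_status *st, long long *seen_flap_count)
{
    bool flapped = st->forwarding && st->flap_count != *seen_flap_count;
    if (flapped) {
        *seen_flap_count = st->flap_count;
    }
    return flapped;
}
```

Running such a check for every interface pointing at a chassis in the LRP's HA_Chassis_Group would give the southbound-free "another chassis may have claimed this in the meantime" signal described in the follow-up.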
> >
> > Noticing when bfd changed from down->up would be done by the following
> > condition for all interfaces that point to chassis that are part of the
> > HA_Chassis_Group:
> > `bfd_status:forwarding==true && bfd_status:flap_count !=
> > previous_flap_count`
>
> But with your proposal I think that we keep the fact that when bfd
> between gw1 & gw2 is down, gw1 stops generating garps (backoff timeout),
> while gw2 might claim the port.
>
> > With this we have a southbound-free way of signaling when another
> > chassis might have tried to claim the LRP from its local perspective.
> >
> > While I have a new version for
> > https://patchwork.ozlabs.org/project/ovn/list/?series=459489
> > I would not post it until this is clarified.
> >
> > Thanks a lot,
> > Felix
>
> Thanks for the very detailed discussion!
> Xavier

Thanks also for the new insights. That was really helpful for me.

Thanks a lot,
Felix

> > > >      destroy_lport_addresses(laddrs);
> > > >      free(laddrs);
> > > >      free(lport);
> > > > @@ -6635,7 +6635,6 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index
> > > > *sbrec_port_binding_by_name,
> > > >                            struct sset *nat_address_keys,
> > > >                            struct sset *local_l3gw_ports,
> > > >                            const struct sbrec_chassis *chassis,
> > > > -                          const struct sset *active_tunnels,
> > > >                            struct shash *nat_addresses)
> > > >  {
> > > >      const char *gw_port;
> > > > @@ -6652,7 +6651,6 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index
> > > > *sbrec_port_binding_by_name,
> > > >              consider_nat_address(sbrec_port_binding_by_name,
> > > >                                   pb->nat_addresses[i], pb,
> > > >                                   nat_address_keys, chassis,
> > > > -                                 active_tunnels,
> > > >                                   nat_addresses);
> > > >          }
> > > >      } else {
> > > > @@ -6664,7 +6662,6 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index
> > > > *sbrec_port_binding_by_name,
> > > >              consider_nat_address(sbrec_port_binding_by_name,
> > > >                                   nat_addresses_options, pb,
> > > >                                   nat_address_keys, chassis,
> > > > -                                 active_tunnels,
> > > >                                   nat_addresses);
> > > >          }
> > > >      }
> > > > @@ -6773,7 +6770,6 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn
> > > > *ovnsb_idl_txn,
> > > >                        const struct ovsrec_bridge *br_int,
> > > >                        const struct sbrec_chassis *chassis,
> > > >                        const struct hmap *local_datapaths,
> > > > -                      const struct sset *active_tunnels,
> > > >                        const struct ovsrec_open_vswitch_table *ovs_table)
> > > >     OVS_REQUIRES(pinctrl_mutex)
> > > >  {
> > > > @@ -6809,8 +6805,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn
> > > > *ovnsb_idl_txn,
> > > >
> > > >      get_nat_addresses_and_keys(sbrec_port_binding_by_name,
> > > >                                 &nat_ip_keys, &local_l3gw_ports,
> > > > -                               chassis, active_tunnels,
> > > > -                               &nat_addresses);
> > > > +                               chassis, &nat_addresses);
> > > >
> > > >      /* For deleted ports and deleted nat ips, remove from
> > > >       * send_garp_rarp_data. */
> > > > --
> > > > 2.47.1
> > > >
> > > > _______________________________________________
> > > > dev mailing list
> > > > [email protected]
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> > > Regards,
> > > Ales

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
