On Tue, Oct 14, 2025 at 03:44:11PM +0200, Max Lamprecht via discuss wrote:
> Hi everyone,
> 
> In the last OVN Community Meeting we talked about synchronizing connection
> tracking information.

Hi everyone,

I just wanted to add an idea for how this could be done; then we can see
where it takes us.

> 
> Our primary goal is to ensure seamless failover during maintenance
> (preserve stateful connections to reduce disruptions)
> We have identified two key use cases, with the first being our priority:
> - Live Migration of VMs(openstack) with Multi-Chassis Port Bindings in OVN
> - LRP Failover on Gateway Chassis

What would definitely be needed here is some coordinated failover
method. For live migration we already have this with
"activation-strategy". For LRP failovers nothing like this is
available. I would therefore focus on the Live Migration case for now.

The high level approach would need to be:
1. The CMS tells OVN that an LSP should have a secondary requested
   chassis, adds an activation-strategy and some information to activate
   conntrack syncing.
2. The ovn-controller on the secondary chassis binds the port and adds
   itself to the southbound database. In this process it already
   allocates a conntrack zone for the port.
3. The ovn-controller on the secondary chassis would then need to start
   accepting incoming conntrack information in some way (or delegate
   that to some outside tool). That information should be keyed by the LSP
   UUID (or a similar identifier). Incoming information should only be accepted
   from the primary chassis (e.g. via IP filtering).
4. The ovn-controller on the primary chassis would need to start sending
   conntrack information in some way (or delegate it). This needs to be
   reliable in some way, so that we only start sending if the receiving
   side is actually ready (maybe signaled with LSP status).
5. The primary chassis needs to send an initial dump of the conntrack
   information and afterwards send changes for each change in the
   source conntrack zone.
6. At some point on the secondary chassis the activation-strategy
   triggers (e.g. live-migration has finished). The ovn-controller there
   will enable the local port and set itself as primary chassis in the
   Port_Binding.
   At the same time the secondary chassis must stop accepting conntrack
   information from other chassis.
7. The primary chassis can stop sending conntrack information now
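
To make the ordering of steps 3 to 7 a bit more concrete, here is a
minimal Python sketch of the handshake. It is only a toy model: the
`Sender` and `Receiver` classes, the event names and the dict standing in
for a conntrack zone are all made up for illustration and are not
existing OVN or OVS interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Secondary chassis side (steps 3 and 6). Hypothetical model."""
    allowed_sender: str                        # address of the primary chassis
    accepting: bool = False
    table: dict = field(default_factory=dict)  # conntrack key -> state

    def start(self):
        # Step 3: ready to accept conntrack information.
        self.accepting = True

    def handle(self, sender, event, key, state=None):
        # Accept only while syncing is active and only from the primary.
        if not self.accepting or sender != self.allowed_sender:
            return False
        if event == "delete":
            self.table.pop(key, None)
        else:                                  # "new" or "update"
            self.table[key] = state
        return True

    def activate(self):
        # Step 6: the port goes live here, stop accepting remote state.
        self.accepting = False


class Sender:
    """Primary chassis side (steps 4, 5 and 7). Hypothetical model."""

    def __init__(self, address, zone):
        self.address = address
        self.zone = zone                       # content of the source zone
        self.peer = None

    def start(self, peer):
        # Step 4: only start once the receiving side is ready.
        if not peer.accepting:
            return False
        self.peer = peer
        # Step 5a: initial dump of the whole zone.
        for key, state in self.zone.items():
            peer.handle(self.address, "new", key, state)
        return True

    def on_change(self, event, key, state=None):
        # Step 5b: apply the change locally, then forward it.
        if event == "delete":
            self.zone.pop(key, None)
        else:
            self.zone[key] = state
        if self.peer is not None:
            self.peer.handle(self.address, event, key, state)

    def stop(self):
        # Step 7: nothing left to sync.
        self.peer = None
```

In this model the receiver silently drops anything that arrives before
`start()` or after `activate()`, which is the property steps 4 and 6 ask
for. How the "ready" signal actually travels (e.g. via LSP status) is
left open.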


If we need some kind of communication channel between the two chassis that we
can rely on being available, we could use the existing tunnels.
Tunnel-id 0 is afaik already used only for OVN purposes (e.g. BFD), so we
could send conntrack information this way as well.

> 
> As we plan to move to DPDK in the future, the ideal mechanism would support
> the kernel and userspace datapath.
> 
> During our conversation we had the following thoughts:
> - external agent such as conntrackd? how does it work with dpdk?
> - some syncing logic in ovs-vswitchd between nodes

To support DPDK, ovs-vswitchd must in some way be part of the
solution, since no other component would have the conntrack
information available. ovs-vswitchd could also listen to the kernel
conntrack table, but it would not be necessary there.

My feeling is that trying out an implementation in ovs-vswitchd would be
the most direct approach. There we could also do some kind of "rewrite"
for conntrack zones.
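
As a rough illustration of what I mean by "rewrite": the zone id
allocated on the primary chassis will generally differ from the one
allocated on the secondary, so the receiver would map each incoming
entry to its own zone using the shared identifier (e.g. the LSP UUID).
The function name and the dict layout of an entry below are assumptions
made for this sketch, not an existing ovs-vswitchd structure.

```python
def rewrite_zone(entry, lsp_uuid, local_zones):
    """Return a copy of `entry` placed into the local zone for `lsp_uuid`.

    `entry` is a hypothetical dict describing one conntrack entry and
    `local_zones` maps an LSP UUID to the locally allocated zone id.
    Returns None if this chassis has no zone for that LSP.
    """
    local_zone = local_zones.get(lsp_uuid)
    if local_zone is None:
        return None                  # not bound here, drop the entry
    rewritten = dict(entry)          # leave the received entry untouched
    rewritten["zone"] = local_zone
    return rewritten
```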

I would probably extend the Bridge other_config and add
other_config:ct-zone-replicate in the form of:
`<ZONE-ID>,<UID>,<Type>,<Remote>;...`
Where:
* ZONE-ID: the id of the conntrack zone
* UID: some globally unique id that needs to match on source and
  destination
* Type: "Send" or "Receive"
* Remote: Name of the tunnel port to send this over

This could then be the interface that ovn-controller fills.
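
To show how the proposed value could be consumed, here is a small Python
sketch of a parser for the `<ZONE-ID>,<UID>,<Type>,<Remote>;...` format.
The option does not exist today; the function and type names are
hypothetical, and the tunnel port names in the usage note are example
values only. The range check assumes the usual 16-bit conntrack zone id.

```python
from typing import NamedTuple


class ZoneReplication(NamedTuple):
    zone_id: int   # local conntrack zone
    uid: str       # must match on source and destination
    type: str      # "Send" or "Receive"
    remote: str    # name of the tunnel port to use


def parse_ct_zone_replicate(value):
    """Parse the proposed other_config:ct-zone-replicate string."""
    entries = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue                 # tolerate a trailing ';'
        parts = [p.strip() for p in item.split(",")]
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {item!r}")
        zone, uid, type_, remote = parts
        zone_id = int(zone)          # raises ValueError if not a number
        if not 0 <= zone_id <= 65535:
            raise ValueError(f"zone id out of range: {zone_id}")
        if type_ not in ("Send", "Receive"):
            raise ValueError(f"unknown type: {type_!r}")
        entries.append(ZoneReplication(zone_id, uid, type_, remote))
    return entries
```

For example, `parse_ct_zone_replicate("5,abc,Send,ovn-hv2-0;7,def,Receive,ovn-hv1-0")`
would yield one sending and one receiving entry.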

ovs-vswitchd could then use a protocol similar to conntrackd's to sync the
information. It would send it via the tunnel port with tunnel-id 0.

What do people think about that high level idea?

Thanks,
Felix

> 
> Has anyone else encountered this problem or has already a solution for dpdk
> conntrack sync
> Are there any existing or planned OVN/OVS features that might help address
> this?
> What architectural approach or layer do you think would be most suitable
> for this functionality?
> 
> We are reaching out to the ovs community to gather more ideas and to hear
> your thoughts on this.
> 
> -Max

> _______________________________________________
> discuss mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
