On 6/13/26 1:26 AM, [email protected] wrote:
> From: Numan Siddique <[email protected]>
>
> Stateless DNAT in OVN rewrites only the outer IPv4 destination via a
> flow-based 'ip4.dst = <logical_ip>' action. This is fine for normal
> reply traffic, but it leaves the inner payload of an inbound ICMPv4
> error untouched. When such an error reaches the downstream logical
> switch pipeline, conntrack tries to correlate the embedded original
> packet with the tracked outgoing flow. Because that embedded packet
> is the VM's outbound datagram after stateless SNAT, its inner source
> still carries the external (post-NAT) IP, the lookup fails, and the
> packet is marked ct.inv, causing the LS ACL stage to drop it. The VM
> never sees the ICMP error, kernel PMTU discovery (RFC 1191) breaks,
> and TCP/UDP traffic to destinations beyond a smaller-MTU link
> black-holes.
>
> Emit an additional, higher-priority logical flow for each stateless
> NAT entry that matches the external IP plus ICMPv4 type 3 code 4
> (Fragmentation Needed) and uses the new 'icmp4.inner_ip4.src' action
> to un-NAT the embedded inner source back to the logical IP, in
> addition to rewriting the outer destination. After this rewrite,
> conntrack in the LS zone can correlate the error with the tracked
> outgoing flow, the LS ACL stage allows it, and the VM receives a
> well-formed ICMP error whose inner source is its own private address
> - so the kernel's PMTU update path installs a correct route
> exception.
>
> This behavior is gated by a per-NAT option,
> options:stateless_icmp_helper, on the NB_Global NAT entry. It
> defaults to true, so the inner-IP rewrite flow is emitted for every
> stateless NAT entry out of the box and PMTUD works without any extra
> configuration. Operators who do not want the additional flow (for
> example to avoid the pinctrl round-trip for these ICMP errors) can
> opt out by setting options:stateless_icmp_helper=false on the
> individual NAT entry.
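As an illustration only, the inner-source un-NAT of an ICMPv4 Fragmentation Needed error can be sketched in plain C. This is not the pinctrl implementation from the previous patch: `inet_csum()` and `unnat_icmp4_inner_src()` are hypothetical helpers, the buffer is assumed to start at the outer IPv4 header, and addresses are passed in network byte order. The sketch recomputes the embedded IPv4 header checksum and the ICMP checksum in full, and deliberately leaves the 8 bytes of embedded L4 header (and any pseudo-header checksum in them) untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071) over 'len' bytes, returned in host order.
 * Verifying a region that already contains a valid checksum yields 0. */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) ((data[i] << 8) | data[i + 1]);
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;   /* Pad odd trailing byte. */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);     /* Fold carries. */
    }
    return (uint16_t) ~sum;
}

/* Hypothetical helper: if 'pkt' is an ICMPv4 type 3 code 4 error whose
 * embedded IPv4 source equals 'ext_ip', rewrite that inner source to
 * 'logical_ip' and fix up the inner IP and ICMP checksums.
 * Returns 1 if the packet was rewritten, 0 otherwise. */
static int
unnat_icmp4_inner_src(uint8_t *pkt, size_t len,
                      uint32_t ext_ip, uint32_t logical_ip)
{
    if (len < 20 || (pkt[0] >> 4) != 4) {
        return 0;                               /* Not IPv4. */
    }
    size_t ihl = (size_t) (pkt[0] & 0x0f) * 4;
    if (ihl < 20 || len < ihl + 8 + 20 || pkt[9] != 1 /* ICMP */) {
        return 0;
    }

    uint8_t *icmp = pkt + ihl;
    if (icmp[0] != 3 || icmp[1] != 4) {
        return 0;                               /* Not Frag Needed. */
    }

    /* The embedded original datagram follows the 8-byte ICMP header. */
    uint8_t *inner = icmp + 8;
    size_t inner_ihl = (size_t) (inner[0] & 0x0f) * 4;
    if ((inner[0] >> 4) != 4 || inner_ihl < 20
        || len < ihl + 8 + inner_ihl) {
        return 0;
    }

    uint32_t src;
    memcpy(&src, inner + 12, sizeof src);
    if (src != ext_ip) {
        return 0;                               /* Not our NAT entry. */
    }
    memcpy(inner + 12, &logical_ip, sizeof logical_ip);

    /* Recompute the embedded IPv4 header checksum. */
    inner[10] = inner[11] = 0;
    uint16_t c = inet_csum(inner, inner_ihl);
    inner[10] = (uint8_t) (c >> 8);
    inner[11] = (uint8_t) (c & 0xff);

    /* Recompute the ICMP checksum over the whole ICMP message. */
    icmp[2] = icmp[3] = 0;
    c = inet_csum(icmp, len - ihl);
    icmp[2] = (uint8_t) (c >> 8);
    icmp[3] = (uint8_t) (c & 0xff);
    return 1;
}
```

Recomputing both checksums from scratch keeps the sketch easy to check; an incremental update (RFC 1624) would be the natural choice in a datapath.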
>
> The new flow uses priority + 1 so that:
>   - The exempted-ext-ips bypass flow (priority + 2, emits 'next;')
>     still wins for traffic explicitly excluded from NAT.
>   - Non-ICMP traffic falls through to the existing stateless DNAT
>     flow at the original priority.
>
> Only IPv4 is wired up; IPv6 stateless NAT is not currently supported
> in OVN, so no equivalent action is needed for icmp6 Packet Too Big.
>
> The pinctrl-side implementation of icmp4.inner_ip4.src is in the
> previous patch.
>
> Note that this is required when CMS doesn't use the gateway_mtu
> option and an external PE router generates the ICMPv4 error
> message.
>
> Signed-off-by: Numan Siddique <[email protected]>
> Assisted-by: Claude Opus 4.7, Claude Code
> Signed-off-by: Numan Siddique <[email protected]>
> ---
>  Documentation/ref/ovn-logical-flows.7.rst |  29 ++++++
>  NEWS                                      |   7 ++
>  northd/northd.c                           |  33 +++++++
>  ovn-nb.xml                                |  35 +++++++
>  tests/multinode.at                        |  73 ++++++++++++++
>  tests/ovn-northd.at                       |  18 +++-
>  tests/ovn.at                              | 113 +++++++++++++++++++++-
>  7 files changed, 305 insertions(+), 3 deletions(-)
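The priority layering in the quoted patch amounts to "the highest-priority matching flow wins". The sketch below is a hypothetical, heavily simplified model of that lookup, not northd code: `struct pkt_info`, `lookup()` and the match predicates are illustrative names, and it assumes the exempted-ext-ips bypass matches on the packet's source being in the exempted set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a packet for this sketch. */
struct pkt_info {
    bool dst_is_ext_ip;     /* ip4.dst == external IP of the NAT entry. */
    bool src_is_exempted;   /* ip4.src is in the exempted-ext-ips set. */
    bool is_icmp4;
    uint8_t icmp_type;
    uint8_t icmp_code;
};

enum verdict { V_NONE, V_BYPASS, V_ICMP_HELPER, V_STATELESS_DNAT };

struct lflow {
    int prio;
    bool (*match)(const struct pkt_info *);
    enum verdict verdict;
};

static bool
m_bypass(const struct pkt_info *p)
{
    return p->dst_is_ext_ip && p->src_is_exempted;
}

static bool
m_icmp_helper(const struct pkt_info *p)
{
    /* ICMPv4 type 3 code 4: Fragmentation Needed. */
    return p->dst_is_ext_ip && p->is_icmp4
           && p->icmp_type == 3 && p->icmp_code == 4;
}

static bool
m_stateless_dnat(const struct pkt_info *p)
{
    return p->dst_is_ext_ip;
}

/* Returns the verdict of the highest-priority flow that matches 'p'. */
static enum verdict
lookup(const struct pkt_info *p, int base)
{
    const struct lflow flows[] = {
        { base,     m_stateless_dnat, V_STATELESS_DNAT },
        { base + 1, m_icmp_helper,    V_ICMP_HELPER },
        { base + 2, m_bypass,         V_BYPASS },
    };
    enum verdict v = V_NONE;
    bool found = false;
    int best = 0;

    for (size_t i = 0; i < sizeof flows / sizeof flows[0]; i++) {
        if (flows[i].match(p) && (!found || flows[i].prio > best)) {
            found = true;
            best = flows[i].prio;
            v = flows[i].verdict;
        }
    }
    return v;
}
```

With a base priority of 100, an exempted Fragmentation Needed error still hits the bypass at 102, a non-exempted one reaches the helper at 101, and any other packet to the external IP (including ICMP errors with other codes) falls through to stateless DNAT at 100.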
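I did not read this in full, but IMO we should not introduce this without
CoPP (Control Plane Protection) support: every matching ICMP error now
takes a pinctrl round-trip, so it needs a meter like other
controller-bound traffic.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev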
