On Mon, Jun 15, 2026 at 8:38 AM Ilya Maximets <[email protected]> wrote:
>
> On 6/13/26 1:26 AM, [email protected] wrote:
> > From: Numan Siddique <[email protected]>
> >
> > Stateless DNAT in OVN rewrites only the outer IPv4 destination via a
> > flow-based 'ip4.dst = <logical_ip>' action. This is fine for normal
> > reply traffic, but it leaves the inner payload of an inbound ICMPv4
> > error untouched. When such an error reaches the downstream logical
> > switch pipeline, conntrack tries to correlate the embedded original
> > packet with the tracked outgoing flow. Because that embedded packet
> > is the VM's outbound datagram after stateless SNAT, its inner source
> > still carries the external (post-NAT) IP, the lookup fails, and the
> > packet is marked ct.inv, causing the LS ACL stage to drop it. The VM
> > never sees the ICMP error, kernel PMTU discovery (RFC 1191) breaks,
> > and TCP/UDP traffic to destinations beyond a smaller-MTU link
> > black-holes.
> >
> > Emit an additional, higher-priority logical flow for each stateless
> > NAT entry that matches the external IP plus ICMPv4 type 3 code 4
> > (Fragmentation Needed) and uses the new 'icmp4.inner_ip4.src' action
> > to un-NAT the embedded inner source back to the logical IP, in
> > addition to rewriting the outer destination. After this rewrite,
> > conntrack in the LS zone can correlate the error with the tracked
> > outgoing flow, the LS ACL stage allows it, and the VM receives a
> > well-formed ICMP error whose inner source is its own private address
> > - so the kernel's PMTU update path installs a correct route
> > exception.
> >
> > This behavior is gated by a per-NAT option,
> > options:stateless_icmp_helper, on the NB_Global NAT entry. It
> > defaults to true, so the inner-IP rewrite flow is emitted for every
> > stateless NAT entry out of the box and PMTUD works without any extra
> > configuration. Operators who do not want the additional flow (for
> > example to avoid the pinctrl round-trip for these ICMP errors) can
> > opt out by setting options:stateless_icmp_helper=false on the
> > individual NAT entry.
> >
> > The new flow uses priority + 1 so that:
> > - The exempted-ext-ips bypass flow (priority + 2, emits 'next;')
> >   still wins for traffic explicitly excluded from NAT.
> > - Non-ICMP traffic falls through to the existing stateless DNAT
> >   flow at the original priority.
> >
> > Only IPv4 is wired up; IPv6 stateless NAT is not currently supported
> > in OVN, so no equivalent action is needed for icmp6 Packet Too Big.
> >
> > The pinctrl-side implementation of icmp4.inner_ip4.src is in the
> > previous patch.
> >
> > Note that this is required when CMS doesn't use the gateway_mtu
> > option and an external PE router generates the ICMPv4 error
> > message.
> >
> > Signed-off-by: Numan Siddique <[email protected]>
> > Assisted-by: Claude Opus 4.7, Claude Code
> > Signed-off-by: Numan Siddique <[email protected]>
> > ---
> >  Documentation/ref/ovn-logical-flows.7.rst |  29 ++++++
> >  NEWS                                      |   7 ++
> >  northd/northd.c                           |  33 +++++++
> >  ovn-nb.xml                                |  35 +++++++
> >  tests/multinode.at                        |  73 ++++++++++++++
> >  tests/ovn-northd.at                       |  18 +++-
> >  tests/ovn.at                              | 113 +++++++++++++++++++++-
> >  7 files changed, 305 insertions(+), 3 deletions(-)
>
> I did not read this in full, but IMO we should not introduce this without
> CoPP.

Thanks for the comments. I agree. I think it makes sense to use the
existing COPP_ICMP4_ERR for this use case as well.

Thanks
Numan

>
>
> Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
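For readers less familiar with the mechanism discussed in this thread, below is a minimal C sketch of the packet rewrite that the quoted patch describes: an inbound ICMPv4 type 3 code 4 error has its outer destination and its embedded (inner) IPv4 source changed from the external IP back to the logical IP, and the affected checksums are recomputed. This is not OVN's pinctrl code, which lives in the previous patch and is not shown here. The function name unnat_icmp4_frag_needed, the assumption that the buffer starts at the outer IPv4 header (no Ethernet), and the extra check that the inner source equals the external IP are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over 'len' bytes in network byte order.
 * Returns the value to store in a checksum field; over a region whose
 * checksum field is already correct it returns 0. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;   /* pad odd byte with zero */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);     /* fold carries */
    }
    return (uint16_t) ~sum;
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) v;
}

/* Hypothetical helper, not OVN code.  'pkt' starts at the outer IPv4
 * header.  If it is an ICMPv4 type 3 code 4 error addressed to 'ext_ip'
 * whose embedded header carries 'ext_ip' as source, rewrite both addresses
 * to 'logical_ip' and recompute the inner IPv4, ICMP and outer IPv4
 * checksums.  Returns 0 after a rewrite, -1 if the packet does not match. */
int unnat_icmp4_frag_needed(uint8_t *pkt, size_t len,
                            const uint8_t ext_ip[4],
                            const uint8_t logical_ip[4])
{
    assert(pkt && ext_ip && logical_ip);
    if (len < 20 || (pkt[0] >> 4) != 4 || pkt[9] != 1 /* ICMP */) {
        return -1;
    }
    size_t ihl = (size_t) (pkt[0] & 0x0f) * 4;
    size_t tot_len = ((size_t) pkt[2] << 8) | pkt[3];
    if (ihl < 20 || tot_len > len || tot_len < ihl + 8 + 20) {
        return -1;
    }

    uint8_t *icmp = pkt + ihl;
    uint8_t *inner = icmp + 8;                   /* embedded IPv4 header */
    size_t inner_ihl = (size_t) (inner[0] & 0x0f) * 4;
    if (icmp[0] != 3 || icmp[1] != 4             /* Fragmentation Needed */
        || (inner[0] >> 4) != 4 || inner_ihl < 20
        || ihl + 8 + inner_ihl > tot_len
        || memcmp(pkt + 16, ext_ip, 4) != 0      /* outer destination */
        || memcmp(inner + 12, ext_ip, 4) != 0) { /* inner source */
        return -1;
    }

    memcpy(pkt + 16, logical_ip, 4);    /* analogous to ip4.dst = ... */
    memcpy(inner + 12, logical_ip, 4);  /* analogous to icmp4.inner_ip4.src */

    /* Inner header first, because the ICMP checksum covers it.  The inner
     * L4 checksum is deliberately left untouched in this sketch. */
    put_be16(inner + 10, 0);
    put_be16(inner + 10, inet_csum(inner, inner_ihl));
    put_be16(icmp + 2, 0);
    put_be16(icmp + 2, inet_csum(icmp, tot_len - ihl));
    put_be16(pkt + 10, 0);
    put_be16(pkt + 10, inet_csum(pkt, ihl));
    return 0;
}
```

The sketch recomputes each checksum from scratch for clarity; an implementation on a hot path would more likely apply an incremental update (RFC 1624) for the 4-byte address change.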
