On Mon, Jun 15, 2026 at 8:38 AM Ilya Maximets <[email protected]> wrote:
>
> On 6/13/26 1:26 AM, [email protected] wrote:
> > From: Numan Siddique <[email protected]>
> >
> > Stateless DNAT in OVN rewrites only the outer IPv4 destination via a
> > flow-based 'ip4.dst = <logical_ip>' action.  This is fine for normal
> > reply traffic, but it leaves the inner payload of an inbound ICMPv4
> > error untouched.  When such an error reaches the downstream logical
> > switch pipeline, conntrack tries to correlate the embedded original
> > packet with the tracked outgoing flow.  Because that embedded packet
> > is the VM's outbound datagram after stateless SNAT, its inner source
> > still carries the external (post-NAT) IP, the lookup fails, and the
> > packet is marked ct.inv, causing the LS ACL stage to drop it.  The VM
> > never sees the ICMP error, kernel PMTU discovery (RFC 1191) breaks,
> > and TCP/UDP traffic to destinations beyond a smaller-MTU link
> > black-holes.
> >
> > Emit an additional, higher-priority logical flow for each stateless
> > NAT entry that matches the external IP plus ICMPv4 type 3 code 4
> > (Fragmentation Needed) and uses the new 'icmp4.inner_ip4.src' action
> > to un-NAT the embedded inner source back to the logical IP, in
> > addition to rewriting the outer destination.  After this rewrite,
> > conntrack in the LS zone can correlate the error with the tracked
> > outgoing flow, the LS ACL stage allows it, and the VM receives a
> > well-formed ICMP error whose inner source is its own private address
> > - so the kernel's PMTU update path installs a correct route
> > exception.
> >
> > This behavior is gated by a per-NAT option,
> > options:stateless_icmp_helper, on the Northbound NAT entry.  It
> > defaults to true, so the inner-IP rewrite flow is emitted for every
> > stateless NAT entry out of the box and PMTUD works without any extra
> > configuration.  Operators who do not want the additional flow (for
> > example to avoid the pinctrl round-trip for these ICMP errors) can
> > opt out by setting options:stateless_icmp_helper=false on the
> > individual NAT entry.
> >
> > The new flow uses priority + 1 so that:
> >   - The exempted-ext-ips bypass flow (priority + 2, emits 'next;')
> >     still wins for traffic explicitly excluded from NAT.
> >   - Non-ICMP traffic falls through to the existing stateless DNAT
> >     flow at the original priority.
> >
> > Only IPv4 is wired up; IPv6 stateless NAT is not currently supported
> > in OVN, so no equivalent action is needed for icmp6 Packet Too Big.
> >
> > The pinctrl-side implementation of icmp4.inner_ip4.src is in the
> > previous patch.
> >
> > Note that this is required when CMS doesn't use the gateway_mtu
> > option and an external PE router generates the ICMPv4 error
> > message.
> >
> > Signed-off-by: Numan Siddique <[email protected]>
> > Assisted-by: Claude Opus 4.7, Claude Code
> > Signed-off-by: Numan Siddique <[email protected]>
> > ---
> >  Documentation/ref/ovn-logical-flows.7.rst |  29 ++++++
> >  NEWS                                      |   7 ++
> >  northd/northd.c                           |  33 +++++++
> >  ovn-nb.xml                                |  35 +++++++
> >  tests/multinode.at                        |  73 ++++++++++++++
> >  tests/ovn-northd.at                       |  18 +++-
> >  tests/ovn.at                              | 113 +++++++++++++++++++++-
> >  7 files changed, 305 insertions(+), 3 deletions(-)
>
> I did not read this in full, but IMO we should not introduce this
> without CoPP.


Thanks for the comments.  I agree.  I think it makes sense to use the
existing COPP_ICMP4_ERR for this use case as well.

Thanks
Numan
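
For readers following the priority discussion in the quoted commit
message, here is a minimal sketch of how the three flows are meant to
layer.  It is not northd code: BASE_PRIORITY, the Packet fields, the
'exempted' flag and select_flow() are hypothetical names chosen for
illustration, and the action strings are plain descriptions rather
than real OVN logical flow actions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical base priority of the existing stateless DNAT flow.
BASE_PRIORITY = 100

# ICMPv4 Destination Unreachable / Fragmentation Needed (RFC 792, RFC 1191).
ICMP4_DEST_UNREACH = 3
ICMP4_FRAG_NEEDED = 4


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    icmp_type: Optional[int] = None
    icmp_code: Optional[int] = None
    # Assumption: stands in for "matches the exempted-ext-ips bypass flow".
    exempted: bool = False


def select_flow(pkt: Packet, external_ip: str,
                helper_enabled: bool = True) -> Optional[Tuple[int, str]]:
    """Return (priority, action) of the highest-priority matching flow."""
    if pkt.dst_ip != external_ip:
        return None  # None of the stateless NAT flows match.

    # Existing stateless DNAT flow at the original priority.
    candidates = [(BASE_PRIORITY, "rewrite outer dst")]

    # New flow: only ICMPv4 type 3 code 4, and only if the helper is on.
    if helper_enabled and (pkt.icmp_type, pkt.icmp_code) == (
            ICMP4_DEST_UNREACH, ICMP4_FRAG_NEEDED):
        candidates.append(
            (BASE_PRIORITY + 1, "rewrite inner src and outer dst"))

    # Bypass flow for traffic explicitly excluded from NAT.
    if pkt.exempted:
        candidates.append((BASE_PRIORITY + 2, "next;"))

    return max(candidates, key=lambda flow: flow[0])
```

With this layering, a Fragmentation Needed error addressed to the
external IP picks the priority + 1 flow, any other traffic falls
through to the base flow, and exempted traffic still hits the
priority + 2 bypass.  Passing helper_enabled=False models setting
options:stateless_icmp_helper=false on the NAT entry.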


>
>
> Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
