On 6/13/26 1:26 AM, [email protected] wrote:
> From: Numan Siddique <[email protected]>
> 
> Stateless DNAT in OVN rewrites only the outer IPv4 destination via a
> flow-based 'ip4.dst = <logical_ip>' action.  This is fine for normal
> reply traffic, but it leaves the inner payload of an inbound ICMPv4
> error untouched.  When such an error reaches the downstream logical
> switch pipeline, conntrack tries to correlate the embedded original
> packet with the tracked outgoing flow.  Because that embedded packet
> is the VM's outbound datagram after stateless SNAT, its inner source
> still carries the external (post-NAT) IP, the lookup fails, and the
> packet is marked ct.inv, causing the LS ACL stage to drop it.  The VM
> never sees the ICMP error, kernel PMTU discovery (RFC 1191) breaks,
> and TCP/UDP traffic to destinations beyond a smaller-MTU link
> black-holes.
> 
> Emit an additional, higher-priority logical flow for each stateless
> NAT entry that matches the external IP plus ICMPv4 type 3 code 4
> (Fragmentation Needed) and uses the new 'icmp4.inner_ip4.src' action
> to un-NAT the embedded inner source back to the logical IP, in
> addition to rewriting the outer destination.  After this rewrite,
> conntrack in the LS zone can correlate the error with the tracked
> outgoing flow, the LS ACL stage allows it, and the VM receives a
> well-formed ICMP error whose inner source is its own private address
> - so the kernel's PMTU update path installs a correct route
> exception.
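
For readers following along, here is a minimal Python sketch of the rewrite the paragraph above describes: on an inbound ICMPv4 Fragmentation Needed error, change the outer destination and the embedded inner source from the external IP to the logical IP, then recompute the affected checksums. The function names (`checksum`, `fix_ip_checksum`, `un_nat_frag_needed`) are hypothetical and this is not the pinctrl code from the previous patch; the sketch also leaves the embedded L4 checksum alone, which a real implementation may need to handle.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum (16-bit one's complement sum)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fix_ip_checksum(hdr: bytearray) -> None:
    """Recompute the IPv4 header checksum in place (bytes 10..11)."""
    hdr[10:12] = b"\x00\x00"
    hdr[10:12] = struct.pack("!H", checksum(bytes(hdr)))


def un_nat_frag_needed(pkt: bytes, external_ip: str, logical_ip: str) -> bytes:
    """Hypothetical helper: un-NAT an ICMPv4 type 3 code 4 error.

    pkt starts at the outer IPv4 header. Packets that are not a
    Fragmentation Needed error addressed to external_ip are returned
    unchanged.
    """
    ext = socket.inet_aton(external_ip)
    log = socket.inet_aton(logical_ip)
    buf = bytearray(pkt)
    if len(buf) < 20:
        return pkt
    ihl = (buf[0] & 0x0F) * 4
    icmp = ihl              # offset of the ICMP header
    inner = icmp + 8        # offset of the embedded original IPv4 header
    if len(buf) < inner + 20:
        return pkt
    is_frag_needed = buf[9] == 1 and buf[icmp] == 3 and buf[icmp + 1] == 4
    if not is_frag_needed or bytes(buf[16:20]) != ext:
        return pkt

    # Outer destination: external IP -> logical IP (the existing DNAT step).
    buf[16:20] = log

    # Embedded inner source: external IP -> logical IP (the new step).
    inner_ihl = (buf[inner] & 0x0F) * 4
    if (len(buf) >= inner + inner_ihl
            and bytes(buf[inner + 12:inner + 16]) == ext):
        buf[inner + 12:inner + 16] = log
        inner_hdr = bytearray(buf[inner:inner + inner_ihl])
        fix_ip_checksum(inner_hdr)
        buf[inner:inner + inner_ihl] = inner_hdr

    # The ICMP checksum covers the ICMP header plus the embedded packet.
    buf[icmp + 2:icmp + 4] = b"\x00\x00"
    buf[icmp + 2:icmp + 4] = struct.pack("!H", checksum(bytes(buf[icmp:])))

    outer_hdr = bytearray(buf[:ihl])
    fix_ip_checksum(outer_hdr)
    buf[:ihl] = outer_hdr
    return bytes(buf)
```

After this rewrite the embedded packet's source matches the address the VM actually sent from, which is what a conntrack lookup on the inner packet would need in order to find the tracked outgoing flow.
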
> 
> This behavior is gated by a per-NAT option,
> options:stateless_icmp_helper, on the NB NAT entry.  It
> defaults to true, so the inner-IP rewrite flow is emitted for every
> stateless NAT entry out of the box and PMTUD works without any extra
> configuration.  Operators who do not want the additional flow (for
> example to avoid the pinctrl round-trip for these ICMP errors) can
> opt out by setting options:stateless_icmp_helper=false on the
> individual NAT entry.
> 
> The new flow uses priority + 1 so that:
>   - The exempted-ext-ips bypass flow (priority + 2, emits 'next;')
>     still wins for traffic explicitly excluded from NAT.
>   - Non-ICMP traffic falls through to the existing stateless DNAT
>     flow at the original priority.
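
The priority layout above can be modelled with a small Python sketch: three flows for one stateless NAT entry, with the highest-priority match winning. The base priority, the addresses, and the action labels ("bypass", "icmp-helper", "dnat") are assumptions made for illustration; they are not the actual northd values or logical flow action strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]


@dataclass
class Flow:
    priority: int
    match: Callable[[Packet], bool]
    action: str


def select_action(flows: List[Flow], pkt: Packet) -> Optional[str]:
    """Return the action of the highest-priority matching flow, if any."""
    hits = [f for f in flows if f.match(pkt)]
    if not hits:
        return None
    return max(hits, key=lambda f: f.priority).action


BASE = 100                        # assumed base priority of the DNAT flow
EXTERNAL_IP = "172.16.0.10"       # assumed external IP of the NAT entry
EXEMPTED = {"198.51.100.7"}       # assumed exempted external source IPs

FLOWS = [
    # priority + 2: exempted-ext-ips bypass, skips NAT entirely.
    Flow(BASE + 2,
         lambda p: p["dst"] == EXTERNAL_IP and p["src"] in EXEMPTED,
         "bypass"),
    # priority + 1: ICMPv4 type 3 code 4 gets the inner-source rewrite.
    Flow(BASE + 1,
         lambda p: p["dst"] == EXTERNAL_IP and p.get("icmp") == (3, 4),
         "icmp-helper"),
    # priority: existing stateless DNAT for everything else.
    Flow(BASE,
         lambda p: p["dst"] == EXTERNAL_IP,
         "dnat"),
]
```

With these assumed flows, a Fragmentation Needed error from an exempted source still takes the bypass, one from any other source takes the helper flow, and plain TCP or UDP traffic falls through to the original DNAT flow.
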
> 
> Only IPv4 is wired up; IPv6 stateless NAT is not currently supported
> in OVN, so no equivalent action is needed for icmp6 Packet Too Big.
> 
> The pinctrl-side implementation of icmp4.inner_ip4.src is in the
> previous patch.
> 
> Note that this is required when CMS doesn't use the gateway_mtu
> option and an external PE router generates the ICMPv4 error
> message.
> 
> Signed-off-by: Numan Siddique <[email protected]>
> Assisted-by: Claude Opus 4.7, Claude Code
> ---
>  Documentation/ref/ovn-logical-flows.7.rst |  29 ++++++
>  NEWS                                      |   7 ++
>  northd/northd.c                           |  33 +++++++
>  ovn-nb.xml                                |  35 +++++++
>  tests/multinode.at                        |  73 ++++++++++++++
>  tests/ovn-northd.at                       |  18 +++-
>  tests/ovn.at                              | 113 +++++++++++++++++++++-
>  7 files changed, 305 insertions(+), 3 deletions(-)

I did not read this in full, but IMO we should not introduce this without CoPP.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
