On Tue, Jun 16, 2026 at 4:54 PM <[email protected]> wrote:
>
> From: Numan Siddique <[email protected]>
>
> The stateless_icmp_helper flow added previously matched ICMPv4
> Destination Unreachable errors (type 3), so it fixed Path MTU discovery
> and the other unreachable cases, but left the remaining ICMPv4 error
> types un-NATed. Traceroute, which relies on Time Exceeded (type 11)
> replies from intermediate hops, still hit the ct.inv drop in the
> downstream logical switch: the error quotes the VM's post-SNAT datagram,
> so the embedded inner source is the external IP and conntrack cannot
> correlate it with the tracked outgoing probe.
>
> The inner-source un-NAT performed by icmp4.inner_ip4.src is correct for
> every ICMPv4 error that quotes the original datagram. Broaden the
> helper's match from "icmp4.type == 3" to "icmp4.type == {3, 11, 12}",
> adding Time Exceeded (type 11, traceroute) and Parameter Problem
> (type 12) to the Destination Unreachable errors already handled.
> Redirect (type 5) is intentionally excluded: it has special NAT
> semantics and is not an error that needs flow correlation.
>
> The pinctrl-side action is already ICMP-type-agnostic, so no controller
> change is needed; this is purely a northd match change.
>
> Assisted-by: Claude Opus 4.8, Claude Code
> Signed-off-by: Numan Siddique <[email protected]>
Recheck-request: github-robot-_Build_and_Test
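
For readers following the thread, here is a minimal sketch of the rewrite
the commit message describes, written as standalone Python rather than as
OVN code. It is not the pinctrl implementation; the function names and the
172.16.0.x addresses are made up for illustration, and only the type set
{3, 11, 12} and the "outer ip4.dst plus inner ip4.src" rewrite come from
the patch. The sketch also assumes all three checksums are recomputed from
scratch, which may differ from what the datapath actually does.

```python
import socket
import struct

# ICMPv4 error types the broadened helper matches (from the commit message).
HELPER_ICMP_TYPES = {3, 11, 12}


def inet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src, dst, proto, payload_len, ttl=64):
    """Build a 20-byte IPv4 header (no options) with a valid checksum."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0,
                      ttl, proto, 0,
                      socket.inet_aton(src), socket.inet_aton(dst))
    return hdr[:10] + struct.pack("!H", inet_checksum(hdr)) + hdr[12:]


def icmp_error(icmp_type, code, quoted):
    """Build an ICMPv4 error: 8-byte header plus the quoted datagram."""
    body = struct.pack("!BBHI", icmp_type, code, 0, 0) + quoted
    return body[:2] + struct.pack("!H", inet_checksum(body)) + body[4:]


def unnat_icmp_error(pkt, external_ip, logical_ip):
    """Hypothetical helper: set outer ip4.dst and inner ip4.src to logical_ip.

    Packets that do not match (wrong destination, not ICMP, or an ICMP type
    outside HELPER_ICMP_TYPES) are returned unchanged.
    """
    ext = socket.inet_aton(external_ip)
    log = socket.inet_aton(logical_ip)
    ihl = (pkt[0] & 0x0F) * 4
    if len(pkt) < ihl + 8 + 20:
        return pkt
    if pkt[9] != 1 or pkt[16:20] != ext or pkt[ihl] not in HELPER_ICMP_TYPES:
        return pkt

    # Un-NAT the source of the quoted (inner) IPv4 header.
    inner = bytearray(pkt[ihl + 8:])
    inner_ihl = (inner[0] & 0x0F) * 4
    inner[12:16] = log
    inner[10:12] = b"\x00\x00"
    inner[10:12] = struct.pack("!H", inet_checksum(bytes(inner[:inner_ihl])))

    # The ICMP checksum covers the quoted datagram, so recompute it too.
    icmp = bytearray(pkt[ihl:ihl + 8]) + inner
    icmp[2:4] = b"\x00\x00"
    icmp[2:4] = struct.pack("!H", inet_checksum(bytes(icmp)))

    # Rewrite the outer destination and fix the outer header checksum.
    outer = bytearray(pkt[:ihl])
    outer[16:20] = log
    outer[10:12] = b"\x00\x00"
    outer[10:12] = struct.pack("!H", inet_checksum(bytes(outer)))
    return bytes(outer + icmp)
```

Feeding it a Time Exceeded error whose quoted source is the external IP
returns a packet whose outer destination and quoted source are both the
logical IP, while a Redirect (type 5) comes back untouched.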
> ---
> Documentation/ref/ovn-logical-flows.7.rst | 30 +++++++++++----------
> NEWS | 8 +++---
> northd/northd.c | 20 ++++++++------
> ovn-nb.xml | 32 +++++++++++------------
> tests/ovn.at | 23 ++++++++++++++++
> 5 files changed, 72 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/ref/ovn-logical-flows.7.rst b/Documentation/ref/ovn-logical-flows.7.rst
> index 56e7b3ef00..121791423a 100644
> --- a/Documentation/ref/ovn-logical-flows.7.rst
> +++ b/Documentation/ref/ovn-logical-flows.7.rst
> @@ -2778,16 +2778,17 @@ flows do not get programmed for load balancers with IPv6 *VIPs*.
> For an IPv4 stateless ``dnat_and_snat`` rule that has
> ``options:stateless_icmp_helper`` set to ``true`` (the default), an
> additional flow at priority *P + 1* is added that matches ``ip && ip4.dst
> - == A && icmp4 && icmp4.type == 3`` with the action
> + == A && icmp4 && icmp4.type == {3, 11, 12}`` with the action
> ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``, where *P* is the priority of
> the flow above. This rewrites the outer destination and un-NATs the source
> - embedded in the inbound ICMPv4 Destination Unreachable error payload (from
> - the external IP *A* back to the logical IP *B*) - every type-3 code quotes
> - the original datagram, including ``Fragmentation Needed`` (code 4) - so that
> + embedded in the inbound ICMPv4 error payload (from the external IP *A* back
> + to the logical IP *B*) for every error type that quotes the original
> + datagram - Destination Unreachable (type 3, which includes ``Fragmentation
> + Needed``), Time Exceeded (type 11) and Parameter Problem (type 12) - so that
> conntrack in the downstream logical switch can correlate the error with the
> - tracked outgoing flow and Path MTU discovery (RFC 1191) works end-to-end
> - across stateless NAT. See ``options:stateless_icmp_helper`` in the ``NAT``
> - table of the
> + tracked outgoing flow. This makes Path MTU discovery (RFC 1191) and
> + traceroute work end-to-end across stateless NAT. See
> + ``options:stateless_icmp_helper`` in the ``NAT`` table of the
> ``OVN_Northbound`` database (``ovn-nb`` (5)). The priority is *P + 1* so that
> the ``exempted_ext_ips`` bypass flow (at *P + 2*) still wins for traffic
> excluded from NAT, and non-ICMP traffic falls through to the regular
> @@ -2836,14 +2837,15 @@ the egress pipeline.
> For an IPv4 stateless ``dnat_and_snat`` rule that has
> ``options:stateless_icmp_helper`` set to ``true`` (the default), an
> additional priority-101 flow is added that matches ``ip && ip4.dst == B &&
> - inport == GW && icmp4 && icmp4.type == 3`` with the action
> + inport == GW && icmp4 && icmp4.type == {3, 11, 12}`` with the action
> ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``. This rewrites the outer
> - destination and un-NATs the source embedded in the inbound ICMPv4
> - Destination Unreachable error payload (back to the logical IP *B*) - every
> - type-3 code quotes the original datagram, including ``Fragmentation Needed``
> - (code 4) - so that conntrack in the downstream logical switch can correlate
> - the error with the tracked outgoing flow and Path MTU discovery (RFC 1191)
> - works end-to-end across stateless NAT. See
> + destination and un-NATs the source embedded in the inbound ICMPv4 error
> + payload (back to the logical IP *B*) for every error type that quotes the
> + original datagram - Destination Unreachable (type 3, which includes
> + ``Fragmentation Needed``), Time Exceeded (type 11) and Parameter Problem
> + (type 12) - so that conntrack in the downstream logical switch can correlate
> + the error with the tracked outgoing flow. This makes Path MTU discovery
> + (RFC 1191) and traceroute work end-to-end across stateless NAT. See
> ``options:stateless_icmp_helper`` in the ``NAT`` table of the
> ``OVN_Northbound`` database (``ovn-nb`` (5)).
>
> diff --git a/NEWS b/NEWS
> index 18374dc71b..aeb1cc5c3f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -37,9 +37,11 @@ Post v26.03.0
> IPv4 address embedded in an ICMPv4 error's inner packet. ovn-northd
> uses it for stateless "dnat_and_snat" rules, controlled by the new
> "options:stateless_icmp_helper" NAT option (default true), so that
> - inbound ICMPv4 Destination Unreachable errors (type 3, including the
> - "fragmentation needed" message) generated by an external router are
> - un-NATed correctly and Path MTU discovery works through stateless NAT.
> + inbound ICMPv4 errors that quote the original datagram - Destination
> + Unreachable (type 3, including "fragmentation needed"), Time Exceeded
> + (type 11) and Parameter Problem (type 12) - generated by an external
> + router are un-NATed correctly. This makes Path MTU discovery and
> + traceroute work through stateless NAT.
>
> OVN v26.03.0 - xxx xx xxxx
> --------------------------
> diff --git a/northd/northd.c b/northd/northd.c
> index 3bb9cafaac..e0820dbfb4 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -17669,13 +17669,16 @@ build_lrouter_in_dnat_flow(struct lflow_table *lflows,
> * the inner ip4.src back to the logical IP, that lookup fails and the
> * error is dropped as ct.inv.
> *
> - * Emit a higher-priority flow that matches the same external IP plus any
> - * ICMPv4 Destination Unreachable error (type 3) and rewrites the outer
> - * ip4.dst and the embedded inner ip4.src to the logical IP. Every type-3
> - * code quotes the original datagram (RFC 792), so the inner-source un-NAT
> - * is correct for all of them; this covers Fragmentation Needed (code 4,
> - * for PMTUD per RFC 1191) as well as host/port unreachable and the rest,
> - * so PMTUD works end-to-end through stateless NAT. */
> + * Emit a higher-priority flow that matches the same external IP plus
> + * any ICMPv4 error that quotes the original datagram - Destination
> + * Unreachable (type 3, which includes code 4 Fragmentation Needed for
> + * PMTUD), Time Exceeded (type 11, used by traceroute) and Parameter
> + * Problem (type 12) - and rewrites the outer ip4.dst and the embedded
> + * inner ip4.src to the logical IP. The inner-source un-NAT is correct
> + * for every such error, so PMTUD, traceroute and the other ICMP errors
> + * all work end-to-end through stateless NAT. Redirect (type 5) is
> + * intentionally excluded: it has special NAT semantics and is not an
> + * error that needs flow correlation. */
> if (stateless && !is_v6 &&
> smap_get_bool(&nat_entry->nb->options, "stateless_icmp_helper",
> true)) {
> @@ -17684,7 +17687,8 @@ build_lrouter_in_dnat_flow(struct lflow_table *lflows,
> meter_groups);
> size_t match_len = match->length;
>
> - ds_put_cstr(match, " && icmp4 && icmp4.type == 3");
> + ds_put_cstr(match,
> + " && icmp4 && icmp4.type == {3, 11, 12}");
>
> ds_clear(actions);
> ds_put_format(actions,
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index 2fc8543868..33a6dc6765 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -5470,28 +5470,28 @@ or
> A stateless DNAT rule rewrites only the outer IPv4 destination of
> inbound packets. For an inbound ICMPv4 error (for example a
> <code>Fragmentation Needed</code> message generated for Path MTU
> - discovery, RFC 1191), the original packet embedded in the ICMP
> - payload still carries the external, post-NAT IP as its source.
> - When the error reaches the downstream logical switch, conntrack
> - cannot correlate the embedded tuple with the tracked outgoing
> - flow, the packet is marked <code>ct.inv</code> and dropped by the
> - ACL stage, and PMTU discovery breaks.
> + discovery, RFC 1191, or a <code>Time Exceeded</code> message
> + generated for a traceroute probe), the original packet embedded in
> + the ICMP payload still carries the external, post-NAT IP as its
> + source. When the error reaches the downstream logical switch,
> + conntrack cannot correlate the embedded tuple with the tracked
> + outgoing flow, the packet is marked <code>ct.inv</code> and dropped
> + by the ACL stage, and PMTU discovery and traceroute break.
> </p>
>
> <p>
> When this option is <code>true</code>, <code>ovn-northd</code>
> emits an additional, higher-priority logical flow in the router
> - ingress DNAT stage that matches ICMPv4 Destination Unreachable
> - (type 3) errors destined to the external IP. Every type-3 code
> - quotes the original datagram, so this covers
> - <code>Fragmentation Needed</code> (code 4) for PMTU discovery as
> - well as host/port unreachable and the rest. It rewrites the outer
> - IPv4 destination to the logical IP and, using the
> - <code>icmp4.inner_ip4.src</code> action, un-NATs the embedded
> + ingress DNAT stage that matches the ICMPv4 errors which quote the
> + original datagram - Destination Unreachable (type 3, which includes
> + <code>Fragmentation Needed</code>), Time Exceeded (type 11) and
> + Parameter Problem (type 12) - destined to the external IP. It
> + rewrites the outer IPv4 destination to the logical IP and, using
> + the <code>icmp4.inner_ip4.src</code> action, un-NATs the embedded
> inner IPv4 source from the external IP back to the logical IP, so
> - that conntrack can correlate the error and PMTU discovery works
> - end-to-end. Set it to <code>false</code> to suppress this flow for
> - an individual NAT entry.
> + that conntrack can correlate the error. This makes PMTU discovery
> + and traceroute work end-to-end. Set it to <code>false</code> to
> + suppress this flow for an individual NAT entry.
> </p>
> </column>
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index e43579cdd7..d67ca2ead5 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -22816,6 +22816,29 @@ as hv1 reset_pcap_file hv1-vif1 hv1/vif1
>
> check as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
>
> +# ICMP Time Exceeded error from a transit router to the NAT external IP.
> +# The embedded original packet is the VM's outbound probe after stateless
> +# SNAT, so its inner source is the external IP and its inner destination is
> +# the traceroute target (50.0.0.100).
> +packet=$(fmt_pkt "Ether(dst='$rtr_ext_mac', src='$ext1_mac')/ \
> + IP(src='$ext1_ip', dst='$nat_ext_ip', ttl=64)/ \
> + ICMP(type=11, code=0)/ \
> + IP(src='$nat_ext_ip', dst='50.0.0.100', ttl=1, proto=17)/ \
> + bytes.fromhex('$inner_l4')")
> +
> +# Expected packet delivered to vm1: outer destination DNATed to 10.0.0.3,
> +# outer TTL decremented, L2 rewritten, and (via pinctrl) the inner IPv4
> +# source un-NATed to 10.0.0.3. The inner destination (50.0.0.100) is left
> +# unchanged.
> +expected=$(fmt_pkt "Ether(dst='$vm1_mac', src='$rtr_int_mac')/ \
> + IP(src='$ext1_ip', dst='$vm1_ip', ttl=63)/ \
> + ICMP(type=11, code=0)/ \
> + IP(src='$vm1_ip', dst='50.0.0.100', ttl=1, proto=17)/ \
> + bytes.fromhex('$inner_l4')")
> +echo $expected >> vif1.expected
> +
> +check as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
> +
> OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [vif1.expected])
>
> OVN_CLEANUP([hv1])
> --
> 2.54.0
>
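
As a reading aid for the documentation hunks, the helper flow they describe
can be written out as a small Python sketch. This only assembles the
priority, match and action strings quoted in ovn-logical-flows.7.rst for the
non-gateway-port case; the function name, the base priority of 100 and the
172.16.0.10 external IP are made up for illustration and are not taken from
northd.c.

```python
# ICMPv4 error types listed in the updated documentation.
HELPER_ICMP_TYPES = (3, 11, 12)


def stateless_icmp_helper_flow(base_priority, external_ip, logical_ip):
    """Hypothetical helper: return (priority, match, actions) for the flow.

    base_priority is P, the priority of the regular stateless DNAT flow;
    the helper flow sits at P + 1 so the exempted_ext_ips bypass at P + 2
    still wins.
    """
    types = ", ".join(str(t) for t in HELPER_ICMP_TYPES)
    match = (f"ip && ip4.dst == {external_ip} && icmp4 && "
             f"icmp4.type == {{{types}}}")
    actions = (f"ip4.dst = {logical_ip}; "
               f"icmp4.inner_ip4.src = {logical_ip}; next;")
    return base_priority + 1, match, actions


if __name__ == "__main__":
    prio, match, actions = stateless_icmp_helper_flow(
        100, "172.16.0.10", "10.0.0.3")
    print(prio, match, actions, sep="\n")
```

Non-ICMP traffic and ICMP types outside the set do not match this string, so
they fall through to the regular flow at priority P.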