On Tue, Jun 16, 2026 at 4:54 PM <[email protected]> wrote:
>
> From: Numan Siddique <[email protected]>
>
> The stateless_icmp_helper flow added previously matched ICMPv4
> Destination Unreachable errors (type 3), so it fixed Path MTU discovery
> and the other unreachable cases, but left the remaining ICMPv4 error
> types un-NATed.  Traceroute, which relies on Time Exceeded (type 11)
> replies from intermediate hops, still hit the ct.inv drop in the
> downstream logical switch: the error quotes the VM's post-SNAT datagram,
> so the embedded inner source is the external IP and conntrack cannot
> correlate it with the tracked outgoing probe.
>
> The inner-source un-NAT performed by icmp4.inner_ip4.src is correct for
> every ICMPv4 error that quotes the original datagram.  Broaden the
> helper's match from "icmp4.type == 3" to "icmp4.type == {3, 11, 12}",
> adding Time Exceeded (type 11, traceroute) and Parameter Problem
> (type 12) to the Destination Unreachable errors already handled.
> Redirect (type 5) is intentionally excluded: it has special NAT
> semantics and is not an error that needs flow correlation.
>
> The pinctrl-side action is already ICMP-type-agnostic, so no controller
> change is needed; this is purely a northd match change.
>
> Assisted-by: Claude Opus 4.8, Claude Code
> Signed-off-by: Numan Siddique <[email protected]>

Recheck-request: github-robot-_Build_and_Test

> ---
>  Documentation/ref/ovn-logical-flows.7.rst | 30 +++++++++++----------
>  NEWS                                      |  8 +++---
>  northd/northd.c                           | 20 ++++++++------
>  ovn-nb.xml                                | 32 +++++++++++------------
>  tests/ovn.at                              | 23 ++++++++++++++++
>  5 files changed, 72 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/ref/ovn-logical-flows.7.rst 
> b/Documentation/ref/ovn-logical-flows.7.rst
> index 56e7b3ef00..121791423a 100644
> --- a/Documentation/ref/ovn-logical-flows.7.rst
> +++ b/Documentation/ref/ovn-logical-flows.7.rst
> @@ -2778,16 +2778,17 @@ flows do not get programmed for load balancers with 
> IPv6 *VIPs*.
>    For an IPv4 stateless ``dnat_and_snat`` rule that has
>    ``options:stateless_icmp_helper`` set to ``true`` (the default), an
>    additional flow at priority *P + 1* is added that matches ``ip && ip4.dst
> -  == A && icmp4 && icmp4.type == 3`` with the action
> +  == A && icmp4 && icmp4.type == {3, 11, 12}`` with the action
>    ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``, where *P* is the priority 
> of
>    the flow above. This rewrites the outer destination and un-NATs the source
> -  embedded in the inbound ICMPv4 Destination Unreachable error payload (from
> -  the external IP *A* back to the logical IP *B*) - every type-3 code quotes
> -  the original datagram, including ``Fragmentation Needed`` (code 4) - so 
> that
> +  embedded in the inbound ICMPv4 error payload (from the external IP *A* back
> +  to the logical IP *B*) for every error type that quotes the original
> +  datagram - Destination Unreachable (type 3, which includes ``Fragmentation
> +  Needed``), Time Exceeded (type 11) and Parameter Problem (type 12) - so 
> that
>    conntrack in the downstream logical switch can correlate the error with the
> -  tracked outgoing flow and Path MTU discovery (RFC 1191) works end-to-end
> -  across stateless NAT. See ``options:stateless_icmp_helper`` in the ``NAT``
> -  table of the
> +  tracked outgoing flow. This makes Path MTU discovery (RFC 1191) and
> +  traceroute work end-to-end across stateless NAT. See
> +  ``options:stateless_icmp_helper`` in the ``NAT`` table of the
>    ``OVN_Northbound`` database (``ovn-nb`` (5)). The priority is *P + 1* so 
> that
>    the ``exempted_ext_ips`` bypass flow (at *P + 2*) still wins for traffic
>    excluded from NAT, and non-ICMP traffic falls through to the regular
> @@ -2836,14 +2837,15 @@ the egress pipeline.
>    For an IPv4 stateless ``dnat_and_snat`` rule that has
>    ``options:stateless_icmp_helper`` set to ``true`` (the default), an
>    additional priority-101 flow is added that matches ``ip && ip4.dst == B &&
> -  inport == GW && icmp4 && icmp4.type == 3`` with the action
> +  inport == GW && icmp4 && icmp4.type == {3, 11, 12}`` with the action
>    ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``. This rewrites the outer
> -  destination and un-NATs the source embedded in the inbound ICMPv4
> -  Destination Unreachable error payload (back to the logical IP *B*) - every
> -  type-3 code quotes the original datagram, including ``Fragmentation 
> Needed``
> -  (code 4) - so that conntrack in the downstream logical switch can correlate
> -  the error with the tracked outgoing flow and Path MTU discovery (RFC 1191)
> -  works end-to-end across stateless NAT. See
> +  destination and un-NATs the source embedded in the inbound ICMPv4 error
> +  payload (back to the logical IP *B*) for every error type that quotes the
> +  original datagram - Destination Unreachable (type 3, which includes
> +  ``Fragmentation Needed``), Time Exceeded (type 11) and Parameter Problem
> +  (type 12) - so that conntrack in the downstream logical switch can 
> correlate
> +  the error with the tracked outgoing flow. This makes Path MTU discovery
> +  (RFC 1191) and traceroute work end-to-end across stateless NAT. See
>    ``options:stateless_icmp_helper`` in the ``NAT`` table of the
>    ``OVN_Northbound`` database (``ovn-nb`` (5)).
>
> diff --git a/NEWS b/NEWS
> index 18374dc71b..aeb1cc5c3f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -37,9 +37,11 @@ Post v26.03.0
>       IPv4 address embedded in an ICMPv4 error's inner packet.  ovn-northd
>       uses it for stateless "dnat_and_snat" rules, controlled by the new
>       "options:stateless_icmp_helper" NAT option (default true), so that
> -     inbound ICMPv4 Destination Unreachable errors (type 3, including the
> -     "fragmentation needed" message) generated by an external router are
> -     un-NATed correctly and Path MTU discovery works through stateless NAT.
> +     inbound ICMPv4 errors that quote the original datagram - Destination
> +     Unreachable (type 3, including "fragmentation needed"), Time Exceeded
> +     (type 11) and Parameter Problem (type 12) - generated by an external
> +     router are un-NATed correctly.  This makes Path MTU discovery and
> +     traceroute work through stateless NAT.
>
>  OVN v26.03.0 - xxx xx xxxx
>  --------------------------
> diff --git a/northd/northd.c b/northd/northd.c
> index 3bb9cafaac..e0820dbfb4 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -17669,13 +17669,16 @@ build_lrouter_in_dnat_flow(struct lflow_table 
> *lflows,
>       * the inner ip4.src back to the logical IP, that lookup fails and the
>       * error is dropped as ct.inv.
>       *
> -     * Emit a higher-priority flow that matches the same external IP plus any
> -     * ICMPv4 Destination Unreachable error (type 3) and rewrites the outer
> -     * ip4.dst and the embedded inner ip4.src to the logical IP.  Every 
> type-3
> -     * code quotes the original datagram (RFC 792), so the inner-source 
> un-NAT
> -     * is correct for all of them; this covers Fragmentation Needed (code 4,
> -     * for PMTUD per RFC 1191) as well as host/port unreachable and the rest,
> -     * so PMTUD works end-to-end through stateless NAT. */
> +     * Emit a higher-priority flow that matches the same external IP plus
> +     * any ICMPv4 error that quotes the original datagram - Destination
> +     * Unreachable (type 3, which includes code 4 Fragmentation Needed for
> +     * PMTUD), Time Exceeded (type 11, used by traceroute) and Parameter
> +     * Problem (type 12) - and rewrites the outer ip4.dst and the embedded
> +     * inner ip4.src to the logical IP.  The inner-source un-NAT is correct
> +     * for every such error, so PMTUD, traceroute and the other ICMP errors
> +     * all work end-to-end through stateless NAT.  Redirect (type 5) is
> +     * intentionally excluded: it has special NAT semantics and is not an
> +     * error that needs flow correlation. */
>      if (stateless && !is_v6 &&
>          smap_get_bool(&nat_entry->nb->options, "stateless_icmp_helper",
>                        true)) {
> @@ -17684,7 +17687,8 @@ build_lrouter_in_dnat_flow(struct lflow_table *lflows,
>                                                   meter_groups);
>          size_t match_len = match->length;
>
> -        ds_put_cstr(match, " && icmp4 && icmp4.type == 3");
> +        ds_put_cstr(match,
> +                    " && icmp4 && icmp4.type == {3, 11, 12}");
>
>          ds_clear(actions);
>          ds_put_format(actions,
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index 2fc8543868..33a6dc6765 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -5470,28 +5470,28 @@ or
>          A stateless DNAT rule rewrites only the outer IPv4 destination of
>          inbound packets. For an inbound ICMPv4 error (for example a
>          <code>Fragmentation Needed</code> message generated for Path MTU
> -        discovery, RFC 1191), the original packet embedded in the ICMP
> -        payload still carries the external, post-NAT IP as its source.
> -        When the error reaches the downstream logical switch, conntrack
> -        cannot correlate the embedded tuple with the tracked outgoing
> -        flow, the packet is marked <code>ct.inv</code> and dropped by the
> -        ACL stage, and PMTU discovery breaks.
> +        discovery, RFC 1191, or a <code>Time Exceeded</code> message
> +        generated for a traceroute probe), the original packet embedded in
> +        the ICMP payload still carries the external, post-NAT IP as its
> +        source. When the error reaches the downstream logical switch,
> +        conntrack cannot correlate the embedded tuple with the tracked
> +        outgoing flow, the packet is marked <code>ct.inv</code> and dropped
> +        by the ACL stage, and PMTU discovery and traceroute break.
>        </p>
>
>        <p>
>          When this option is <code>true</code>, <code>ovn-northd</code>
>          emits an additional, higher-priority logical flow in the router
> -        ingress DNAT stage that matches ICMPv4 Destination Unreachable
> -        (type 3) errors destined to the external IP. Every type-3 code
> -        quotes the original datagram, so this covers
> -        <code>Fragmentation Needed</code> (code 4) for PMTU discovery as
> -        well as host/port unreachable and the rest. It rewrites the outer
> -        IPv4 destination to the logical IP and, using the
> -        <code>icmp4.inner_ip4.src</code> action, un-NATs the embedded
> +        ingress DNAT stage that matches the ICMPv4 errors which quote the
> +        original datagram - Destination Unreachable (type 3, which includes
> +        <code>Fragmentation Needed</code>), Time Exceeded (type 11) and
> +        Parameter Problem (type 12) - destined to the external IP. It
> +        rewrites the outer IPv4 destination to the logical IP and, using
> +        the <code>icmp4.inner_ip4.src</code> action, un-NATs the embedded
>          inner IPv4 source from the external IP back to the logical IP, so
> -        that conntrack can correlate the error and PMTU discovery works
> -        end-to-end. Set it to <code>false</code> to suppress this flow for
> -        an individual NAT entry.
> +        that conntrack can correlate the error. This makes PMTU discovery
> +        and traceroute work end-to-end. Set it to <code>false</code> to
> +        suppress this flow for an individual NAT entry.
>        </p>
>      </column>
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index e43579cdd7..d67ca2ead5 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -22816,6 +22816,29 @@ as hv1 reset_pcap_file hv1-vif1 hv1/vif1
>
>  check as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
>
> +# ICMP Time Exceeded error from a transit router to the NAT external IP.
> +# The embedded original packet is the VM's outbound probe after stateless
> +# SNAT, so its inner source is the external IP and its inner destination is
> +# the traceroute target (50.0.0.100).
> +packet=$(fmt_pkt "Ether(dst='$rtr_ext_mac', src='$ext1_mac')/ \
> +                  IP(src='$ext1_ip', dst='$nat_ext_ip', ttl=64)/ \
> +                  ICMP(type=11, code=0)/ \
> +                  IP(src='$nat_ext_ip', dst='50.0.0.100', ttl=1, proto=17)/ \
> +                  bytes.fromhex('$inner_l4')")
> +
> +# Expected packet delivered to vm1: outer destination DNATed to 10.0.0.3,
> +# outer TTL decremented, L2 rewritten, and (via pinctrl) the inner IPv4
> +# source un-NATed to 10.0.0.3.  The inner destination (50.0.0.100) is left
> +# unchanged.
> +expected=$(fmt_pkt "Ether(dst='$vm1_mac', src='$rtr_int_mac')/ \
> +                    IP(src='$ext1_ip', dst='$vm1_ip', ttl=63)/ \
> +                    ICMP(type=11, code=0)/ \
> +                    IP(src='$vm1_ip', dst='50.0.0.100', ttl=1, proto=17)/ \
> +                    bytes.fromhex('$inner_l4')")
> +echo $expected >> vif1.expected
> +
> +check as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
> +
>  OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [vif1.expected])
>
>  OVN_CLEANUP([hv1])
> --
> 2.54.0
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to