From: Numan Siddique <[email protected]>
Stateless DNAT in OVN rewrites only the outer IPv4 destination via a
flow-based 'ip4.dst = <logical_ip>' action. This is fine for normal
reply traffic, but it leaves the inner payload of an inbound ICMPv4
error untouched. When such an error reaches the downstream logical
switch pipeline, conntrack tries to correlate the embedded original
packet with the tracked outgoing flow. Because that embedded packet
is the VM's outbound datagram after stateless SNAT, its inner source
still carries the external (post-NAT) IP, the lookup fails, and the
packet is marked ct.inv, causing the LS ACL stage to drop it. The VM
never sees the ICMP error, kernel PMTU discovery (RFC 1191) breaks,
and TCP/UDP traffic to destinations beyond a smaller-MTU link
black-holes.
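A simplified model of why the lookup fails, using the addresses from the multinode test in this patch (VM 10.0.0.3, SNAT address 172.20.0.110, far host 172.20.1.2). This is not the OVS or kernel conntrack code. 'struct tuple', 'ip4()' and 'icmp_error_is_related()' are hypothetical names, the port numbers are made up, and treating "related" as exact equality with the tracked original-direction tuple is a deliberate simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 5-tuple as conntrack would key it (host byte order). */
struct tuple {
    uint32_t src;
    uint32_t dst;
    uint16_t sport;
    uint16_t dport;
    uint8_t proto;
};

/* Builds a host-order IPv4 address from its four octets. */
static uint32_t
ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint32_t) a << 24 | (uint32_t) b << 16 | (uint32_t) c << 8 | d;
}

/* Simplification: an ICMP error counts as related to a tracked connection
 * only if the datagram it quotes carries exactly the tuple of the
 * connection's original direction, as tracked in the LS zone (that is,
 * before any router NAT was applied). */
static bool
icmp_error_is_related(const struct tuple *tracked, const struct tuple *quoted)
{
    assert(tracked && quoted);
    return tracked->src == quoted->src
           && tracked->dst == quoted->dst
           && tracked->sport == quoted->sport
           && tracked->dport == quoted->dport
           && tracked->proto == quoted->proto;
}
```

With the quoted source still at 172.20.0.110 the check fails (the ct.inv case); once the inner source is rewritten to 10.0.0.3 it succeeds.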
Emit an additional, higher-priority logical flow for each IPv4 stateless
NAT entry that matches the external IP plus any ICMPv4 Destination
Unreachable error (type 3). Besides rewriting the outer destination, the
flow uses the new 'icmp4.inner_ip4.src' action to un-NAT the embedded
inner source back to the logical IP. Every type-3 code quotes the
original datagram (RFC 792), so the rewrite is correct for all of them:
it covers Fragmentation Needed (code 4, used by PMTUD) as well as the
host and port unreachable codes. After this rewrite, conntrack in the LS
zone can correlate the error with the tracked outgoing flow, the LS ACL
stage allows it, and the VM receives a well-formed ICMP error whose
inner source is its own private address, so the kernel's PMTU update
path installs a correct route exception.
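A minimal sketch of what un-NATing the inner source involves at the byte level, assuming the buffer starts at the outer IPv4 header. The real action lives in pinctrl (previous patch) and may differ; 'icmp4_rewrite_inner_src()', 'ipv4_csum()' and 'put_be16()' are hypothetical names. Changing the quoted address invalidates both the quoted IPv4 header checksum and the outer ICMP checksum (which covers the quoted datagram), so the sketch recomputes both; like the ovn.at test, it leaves the quoted L4 checksum alone. It only accepts type 3, mirroring the match of the new flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over 'len' bytes of big-endian 16-bit words. */
static uint16_t
ipv4_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t) data[i] << 8 | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

static void
put_be16(uint8_t *p, uint16_t v)
{
    p[0] = v >> 8;
    p[1] = v & 0xff;
}

/* Hypothetical helper: rewrites the source address of the IPv4 datagram
 * quoted inside an ICMPv4 Destination Unreachable error.  'new_src' is in
 * host byte order.  Returns false, leaving 'pkt' untouched, if the packet
 * is not such an error. */
static bool
icmp4_rewrite_inner_src(uint8_t *pkt, size_t len, uint32_t new_src)
{
    assert(pkt != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4 || pkt[9] != 1 /* ICMP */) {
        return false;
    }
    size_t ihl = (size_t) (pkt[0] & 0x0f) * 4;
    size_t tot = (size_t) pkt[2] << 8 | pkt[3];
    if (ihl < 20 || tot > len || tot < ihl + 8 + 20) {
        return false;
    }

    uint8_t *icmp = pkt + ihl;
    uint8_t *inner = icmp + 8;          /* Quoted datagram follows ICMP hdr. */
    size_t inner_ihl = (size_t) (inner[0] & 0x0f) * 4;
    if (icmp[0] != 3 || (inner[0] >> 4) != 4 || inner_ihl < 20
        || ihl + 8 + inner_ihl > tot) {
        return false;
    }

    /* Quoted source address: bytes 12..15 of the inner IPv4 header. */
    inner[12] = new_src >> 24;
    inner[13] = (new_src >> 16) & 0xff;
    inner[14] = (new_src >> 8) & 0xff;
    inner[15] = new_src & 0xff;

    /* Refresh the inner header checksum, then the ICMP checksum, which
     * covers the whole ICMP message including the quoted datagram. */
    put_be16(inner + 10, 0);
    put_be16(inner + 10, ipv4_csum(inner, inner_ihl));
    put_be16(icmp + 2, 0);
    put_be16(icmp + 2, ipv4_csum(icmp, tot - ihl));
    return true;
}
```

The outer IPv4 header is not touched here; the outer destination is rewritten separately by the 'ip4.dst=' part of the flow.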
This behavior is gated by a per-NAT option,
options:stateless_icmp_helper, in the Northbound NAT table. It
defaults to true, so the inner-IP rewrite flow is emitted for every
stateless NAT entry out of the box and PMTUD works without any extra
configuration. Operators who do not want the additional flow (for
example to avoid the pinctrl round-trip for these ICMP errors) can
opt out by setting options:stateless_icmp_helper=false on the
individual NAT entry.
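A sketch of the gating and of the match/action strings the extra flow ends up with. It mirrors the logic added to build_lrouter_in_dnat_flow() but uses no real northd types; 'option_bool()' and 'build_icmp_helper_flow()' are hypothetical. It assumes that OVS's smap_get_bool() with a default of true only turns an option off for the value "false" (the sketch compares case-sensitively for simplicity).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a boolean NAT option lookup; 'value' is NULL
 * when the key is unset.  With def == true, only "false" disables. */
static bool
option_bool(const char *value, bool def)
{
    if (!value) {
        return def;
    }
    return def ? strcmp(value, "false") != 0 : strcmp(value, "true") == 0;
}

/* Hypothetical: fills in the match and actions of the extra flow for one
 * NAT entry, given the match of its regular stateless DNAT flow.  Returns
 * false when no extra flow is emitted (or a buffer is too small). */
static bool
build_icmp_helper_flow(bool stateless, bool is_v6, const char *helper_opt,
                       const char *base_match, const char *logical_ip,
                       char *match, size_t match_sz,
                       char *actions, size_t actions_sz)
{
    assert(base_match && logical_ip && match && actions);
    if (!stateless || is_v6 || !option_bool(helper_opt, true)) {
        return false;
    }

    int n = snprintf(match, match_sz, "%s && icmp4 && icmp4.type == 3",
                     base_match);
    if (n < 0 || (size_t) n >= match_sz) {
        return false;
    }
    /* Same action text as northd.c: outer dst first, then the inner src. */
    n = snprintf(actions, actions_sz,
                 "ip4.dst=%s; icmp4.inner_ip4.src = %s; next;",
                 logical_ip, logical_ip);
    return n >= 0 && (size_t) n < actions_sz;
}
```

Under this assumption an unset option or a value such as "yes" keeps the flow, and only "false" suppresses it.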
The new flow uses priority + 1 so that:
- The exempted-ext-ips bypass flow (priority + 2, emits 'next;')
still wins for traffic explicitly excluded from NAT.
- Non-ICMP traffic falls through to the existing stateless DNAT
flow at the original priority.
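The priority layout can be modelled as a table in which the highest-priority matching flow wins. This sketch assumes P = 100, reduces each match to a predicate over a hypothetical 'struct pkt' whose destination is already known to be the external IP, and ignores allowed_ext_ips; 'classify()' and the 'match_*' helpers are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an inbound packet with ip4.dst == external IP. */
struct pkt {
    bool src_exempted;   /* ip4.src is listed in exempted_ext_ips. */
    bool is_icmp4;
    uint8_t icmp4_type;
};

enum verdict { V_NONE, V_BYPASS, V_ICMP_HELPER, V_STATELESS_DNAT };

struct flow {
    int priority;
    bool (*match)(const struct pkt *);
    enum verdict verdict;
};

/* P + 2: exempted_ext_ips bypass ("next;"). */
static bool
match_exempted(const struct pkt *p)
{
    return p->src_exempted;
}

/* P + 1: stateless_icmp_helper flow ("icmp4 && icmp4.type == 3"). */
static bool
match_icmp4_unreach(const struct pkt *p)
{
    return p->is_icmp4 && p->icmp4_type == 3;
}

/* P: regular stateless DNAT flow (any IP traffic to the external IP). */
static bool
match_any(const struct pkt *p)
{
    (void) p;
    return true;
}

/* The highest-priority matching flow decides what happens to the packet. */
static enum verdict
classify(const struct flow *flows, size_t n, const struct pkt *p)
{
    const struct flow *best = NULL;

    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (flows[i].match(p)
            && (!best || flows[i].priority > best->priority)) {
            best = &flows[i];
        }
    }
    return best ? best->verdict : V_NONE;
}
```

An ICMPv4 type-3 error from an exempted source still takes the bypass, while echo replies and TCP/UDP traffic land on the regular stateless DNAT flow.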
The rewrite flow round-trips the packet through ovn-controller, so it is
emitted with the logical router's icmp4-error CoPP meter (when one is
configured), rate-limiting the punt the same way OVN does for the other
controller-handled ICMPv4 errors.
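To illustrate what such a meter bounds, here is a packets-per-second token bucket with a drop band, parameterized like the meter in the ovn-northd.at test (100 pktps, burst 10). It is a simplified model assumed for illustration, not how OVS implements meters; 'struct pkt_bucket', 'bucket_init()' and 'bucket_admit()' are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packets-per-second token bucket with a drop band.  Tokens
 * are kept in thousandths so the arithmetic stays integral. */
struct pkt_bucket {
    uint64_t rate;          /* Refill rate, packets per second. */
    uint64_t burst;         /* Bucket depth, packets. */
    uint64_t tokens_milli;  /* Current tokens x 1000. */
    uint64_t last_ms;       /* Time of the last refill, milliseconds. */
};

static void
bucket_init(struct pkt_bucket *b, uint64_t rate, uint64_t burst,
            uint64_t now_ms)
{
    assert(b && burst > 0);
    b->rate = rate;
    b->burst = burst;
    b->tokens_milli = burst * 1000;   /* Start full. */
    b->last_ms = now_ms;
}

/* Returns true if a punted packet is admitted, false if it is dropped.
 * 'now_ms' must not go backwards. */
static bool
bucket_admit(struct pkt_bucket *b, uint64_t now_ms)
{
    assert(now_ms >= b->last_ms);
    uint64_t cap = b->burst * 1000;
    /* 'rate' packets/s is exactly 'rate' milli-tokens per millisecond. */
    uint64_t gained = (now_ms - b->last_ms) * b->rate;

    b->last_ms = now_ms;
    b->tokens_milli = gained >= cap - b->tokens_milli
                      ? cap : b->tokens_milli + gained;
    if (b->tokens_milli >= 1000) {
        b->tokens_milli -= 1000;
        return true;
    }
    return false;
}
```

A burst of 15 errors arriving at once lets 10 reach ovn-controller and drops the rest; after that, roughly one error per 10 ms is admitted.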
Only IPv4 is wired up: IPv6 stateless NAT entries do not get this flow,
and no equivalent inner-address rewrite is added for ICMPv6 Packet Too Big.
The pinctrl-side implementation of icmp4.inner_ip4.src is in the
previous patch.
Note that this is needed when the CMS does not set the gateway_mtu
option and an external PE router generates the ICMPv4 error
message.
Assisted-by: Claude Opus 4.7, Claude Code
Signed-off-by: Numan Siddique <[email protected]>
---
Documentation/ref/ovn-logical-flows.7.rst | 32 ++++++
NEWS | 7 ++
northd/northd.c | 48 ++++++++-
ovn-nb.xml | 38 ++++++++
tests/multinode.at | 73 ++++++++++++++
tests/ovn-northd.at | 35 ++++++-
tests/ovn.at | 113 +++++++++++++++++++++-
7 files changed, 340 insertions(+), 6 deletions(-)
diff --git a/Documentation/ref/ovn-logical-flows.7.rst b/Documentation/ref/ovn-logical-flows.7.rst
index ce4dd53559..56e7b3ef00 100644
--- a/Documentation/ref/ovn-logical-flows.7.rst
+++ b/Documentation/ref/ovn-logical-flows.7.rst
@@ -2775,6 +2775,24 @@ flows do not get programmed for load balancers with IPv6
*VIPs*.
rule is of type dnat_and_snat and has ``stateless=true`` in the options, then
the action would be ``ip4/6.dst=(B)``.
+ For an IPv4 stateless ``dnat_and_snat`` rule that has
+ ``options:stateless_icmp_helper`` set to ``true`` (the default), an
+ additional flow at priority *P + 1* is added that matches ``ip && ip4.dst
+ == A && icmp4 && icmp4.type == 3`` with the action
+ ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``, where *P* is the priority of
+ the flow above. This rewrites the outer destination and un-NATs the source
+ embedded in the inbound ICMPv4 Destination Unreachable error payload (from
+ the external IP *A* back to the logical IP *B*) - every type-3 code quotes
+ the original datagram, including ``Fragmentation Needed`` (code 4) - so that
+ conntrack in the downstream logical switch can correlate the error with the
+ tracked outgoing flow and Path MTU discovery (RFC 1191) works end-to-end
+ across stateless NAT. See ``options:stateless_icmp_helper`` in the ``NAT``
+ table of the ``OVN_Northbound`` database (``ovn-nb`` (5)). The priority is
+ *P + 1* so that the ``exempted_ext_ips`` bypass flow (at *P + 2*) still
+ wins for traffic excluded from NAT, while non-ICMP traffic and ICMPv4
+ messages other than type 3 fall through to the regular stateless DNAT
+ flow at priority *P*.
+
If the NAT rule has ``allowed_ext_ips`` configured, then there is an
additional match ``ip4.src == allowed_ext_ips``. Similarly, for IPV6, match
would be ``ip6.src == allowed_ext_ips``.
@@ -2815,6 +2833,20 @@ the egress pipeline.
the IPv6 case. If the NAT rule is of type dnat_and_snat and has
``stateless=true`` in the options, then the action would be
``ip4/6.dst=(B)``.
+ For an IPv4 stateless ``dnat_and_snat`` rule that has
+ ``options:stateless_icmp_helper`` set to ``true`` (the default), an
+ additional priority-101 flow is added that matches ``ip && ip4.dst == A &&
+ inport == GW && icmp4 && icmp4.type == 3`` with the action
+ ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``. This rewrites the outer
+ destination and un-NATs the source embedded in the inbound ICMPv4
+ Destination Unreachable error payload (from *A* back to *B*) - every
+ type-3 code quotes the original datagram, including ``Fragmentation Needed``
+ (code 4) - so that conntrack in the downstream logical switch can correlate
+ the error with the tracked outgoing flow and Path MTU discovery (RFC 1191)
+ works end-to-end across stateless NAT. See
+ ``options:stateless_icmp_helper`` in the ``NAT`` table of the
+ ``OVN_Northbound`` database (``ovn-nb`` (5)).
+
If the NAT rule cannot be handled in a distributed manner, then the
priority-100 flow above is only programmed on the gateway chassis.
diff --git a/NEWS b/NEWS
index 748ae30eb2..18374dc71b 100644
--- a/NEWS
+++ b/NEWS
@@ -33,6 +33,13 @@ Post v26.03.0
The DHCP and unbound-router ARP/ND drop lflows for external
ports were updated to key on the external LSP's inport
accordingly.
+ - Added a new "icmp4.inner_ip4.src" action that rewrites the source
+ IPv4 address embedded in an ICMPv4 error's inner packet. ovn-northd
+ uses it for stateless "dnat_and_snat" rules, controlled by the new
+ "options:stateless_icmp_helper" NAT option (default true), so that
+ inbound ICMPv4 Destination Unreachable errors (type 3, including the
+ "fragmentation needed" message) generated by an external router are
+ un-NATed correctly and Path MTU discovery works through stateless NAT.
OVN v26.03.0 - xxx xx xxxx
--------------------------
diff --git a/northd/northd.c b/northd/northd.c
index f5aa5cca38..3bb9cafaac 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -17584,7 +17584,9 @@ static void
build_lrouter_in_dnat_flow(struct lflow_table *lflows,
const struct ovn_datapath *od,
const struct lr_nat_record *lrnat_rec,
- const struct ovn_nat *nat_entry, struct ds *match,
+ const struct ovn_nat *nat_entry,
+ const struct shash *meter_groups,
+ struct ds *match,
struct ds *actions, bool distributed_nat,
int cidr_bits, bool is_v6,
struct ovn_port *l3dgw_port, bool stateless,
@@ -17657,6 +17659,45 @@ build_lrouter_in_dnat_flow(struct lflow_table *lflows,
ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, priority, ds_cstr(match),
ds_cstr(actions), lflow_ref, WITH_HINT(&nat->header_));
+
+ /* For stateless DNAT, the action above only rewrites the outer IPv4
+ * destination. An inbound ICMPv4 error (RFC 792 / RFC 1191) carries
+ * the original (post-NAT) packet inside its payload, whose source is
+ * the external (post-SNAT) IP. The conntrack-based ACL check in the
+ * downstream logical switch zone uses that inner tuple to match the
+ * reverse direction of the tracked outgoing flow; without un-NATing
+ * the inner ip4.src back to the logical IP, that lookup fails and the
+ * error is dropped as ct.inv.
+ *
+ * Emit a higher-priority flow that matches the same external IP plus any
+ * ICMPv4 Destination Unreachable error (type 3) and rewrites the outer
+ * ip4.dst and the embedded inner ip4.src to the logical IP. Every type-3
+ * code quotes the original datagram (RFC 792), so the inner-source un-NAT
+ * is correct for all of them; this covers Fragmentation Needed (code 4,
+ * for PMTUD per RFC 1191) as well as host/port unreachable and the rest,
+ * so PMTUD works end-to-end through stateless NAT. */
+ if (stateless && !is_v6 &&
+ smap_get_bool(&nat_entry->nb->options, "stateless_icmp_helper",
+ true)) {
+ const char *icmp4_meter = copp_meter_get(COPP_ICMP4_ERR,
+ od->nbr->copp,
+ meter_groups);
+ size_t match_len = match->length;
+
+ ds_put_cstr(match, " && icmp4 && icmp4.type == 3");
+
+ ds_clear(actions);
+ ds_put_format(actions,
+ "ip4.dst=%s; icmp4.inner_ip4.src = %s; next;",
+ nat->logical_ip, nat->logical_ip);
+
+ ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, priority + 1,
+ ds_cstr(match), ds_cstr(actions), lflow_ref,
+ WITH_CTRL_METER(icmp4_meter),
+ WITH_HINT(&nat->header_));
+
+ ds_truncate(match, match_len);
+ }
}
static void
@@ -18404,8 +18445,9 @@ build_lrouter_nat_defrag_and_lb(
lflow_ref);
}
/* S_ROUTER_IN_DNAT */
- build_lrouter_in_dnat_flow(lflows, od, lrnat_rec, nat_entry, match,
- actions, nat_entry->is_distributed,
+ build_lrouter_in_dnat_flow(lflows, od, lrnat_rec, nat_entry,
+ meter_groups, match, actions,
+ nat_entry->is_distributed,
cidr_bits, is_v6, nat_entry->l3dgw_port,
stateless, lflow_ref);
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 15fb1d7e86..2fc8543868 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -5457,6 +5457,44 @@ or
tracking state or not.
</column>
+ <column name="options" key="stateless_icmp_helper"
+ type='{"type": "boolean"}'>
+ <p>
+ Applies only to stateless <code>dnat_and_snat</code> rules (that
+ is, NATs with <ref column="options" key="stateless"/> set to
+ <code>true</code>) on IPv4 addresses. Defaults to
+ <code>true</code>.
+ </p>
+
+ <p>
+ A stateless DNAT rule rewrites only the outer IPv4 destination of
+ inbound packets. For an inbound ICMPv4 error (for example a
+ <code>Fragmentation Needed</code> message generated for Path MTU
+ discovery, RFC 1191), the original packet embedded in the ICMP
+ payload still carries the external, post-NAT IP as its source.
+ When the error reaches the downstream logical switch, conntrack
+ cannot correlate the embedded tuple with the tracked outgoing
+ flow, the packet is marked <code>ct.inv</code> and dropped by the
+ ACL stage, and PMTU discovery breaks.
+ </p>
+
+ <p>
+ When this option is <code>true</code>, <code>ovn-northd</code>
+ emits an additional, higher-priority logical flow in the router
+ ingress DNAT stage that matches ICMPv4 Destination Unreachable
+ (type 3) errors destined to the external IP. Every type-3 code
+ quotes the original datagram, so this covers
+ <code>Fragmentation Needed</code> (code 4) for PMTU discovery as
+ well as host/port unreachable and the rest. It rewrites the outer
+ IPv4 destination to the logical IP and, using the
+ <code>icmp4.inner_ip4.src</code> action, un-NATs the embedded
+ inner IPv4 source from the external IP back to the logical IP, so
+ that conntrack can correlate the error and PMTU discovery works
+ end-to-end. Set it to <code>false</code> to suppress this flow for
+ an individual NAT entry.
+ </p>
+ </column>
+
<column name="options" key="add_route">
If set to <code>true</code>, then neighbor routers will have logical
flows added that will allow for routing to the NAT address. It also will
diff --git a/tests/multinode.at b/tests/multinode.at
index 069f2a677d..37ef523f95 100644
--- a/tests/multinode.at
+++ b/tests/multinode.at
@@ -1041,6 +1041,79 @@ run_ns_traffic
AT_CLEANUP
])
+AT_SETUP([ovn multinode stateless NAT - icmp4 PMTUD inner src un-NAT])
+
+# Check that ovn-fake-multinode setup is up and running
+check_fake_multinode_setup
+
+# Delete the multinode NB and OVS resources before starting the test.
+cleanup_multinode_resources
+
+m_as ovn-chassis-1 ip link del sw0p1-p
+
+# Reset geneve tunnels
+for c in ovn-chassis-1 ovn-gw-1
+do
+ m_as $c ovs-vsctl set open . external-ids:ovn-encap-type=geneve
+done
+
+OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys])
+OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys])
+
+# Internal switch with one VM (10.0.0.3) on ovn-chassis-1.
+check multinode_nbctl ls-add sw0
+check multinode_nbctl lsp-add sw0 sw0-port1
+check multinode_nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:03 10.0.0.3"
+
+m_as ovn-chassis-1 /data/create_fake_vm.sh sw0-port1 sw0p1 50:54:00:00:00:03 1342 10.0.0.3 24 10.0.0.1
+
+# Gateway router pinned to ovn-gw-1.
+check multinode_nbctl lr-add lr0 -- set Logical_Router lr0 options:chassis=ovn-gw-1
+check multinode_nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
+check multinode_nbctl lsp-add-router-port sw0 sw0-lr0 lr0-sw0
+
+# External / provider network.
+check multinode_nbctl ls-add public
+check multinode_nbctl lsp-add-localnet-port public ln-public public
+check multinode_nbctl lrp-add lr0 lr0-public 00:11:22:00:ff:01 172.20.0.100/24
+check multinode_nbctl lsp-add-router-port public public-lr0 lr0-public
+check multinode_nbctl lr-route-add lr0 0.0.0.0/0 172.20.0.1
+
+# Stateless dnat_and_snat for the VM. options:stateless_icmp_helper defaults
+# to true, so ovn-northd emits the extra DNAT flow that, for an inbound ICMPv4
+# "fragmentation needed" error destined to 172.20.0.110, un-NATs the inner
+# source from 172.20.0.110 back to 10.0.0.3 (icmp4.inner_ip4.src). Unlike
+# stateful NAT (where conntrack fixes the embedded header automatically),
+# stateless NAT relies entirely on this flow for PMTUD to work.
+check multinode_nbctl --stateless lr-nat-add lr0 dnat_and_snat 172.20.0.110 10.0.0.3
+
+m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex
+m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex
+
+m_wait_for_ports_up
+
+# ovn-ext0 routes between the OVN public net (172.20.0.0/24) and a downstream
+# net (172.20.1.0/24); ovn-ext2 (172.20.1.2) is the far host behind it.
+m_add_internal_port ovn-gw-1 ovn-ext0 br-ex ext0 172.20.0.1/24
+m_add_internal_port ovn-gw-1 ovn-ext0 br-ex ext1 172.20.1.1/24
+m_add_internal_port ovn-gw-1 ovn-ext2 br-ex ext2 172.20.1.2/24 172.20.1.1
+
+# Baseline: the VM reaches the far host through stateless NAT.
+M_NS_CHECK_EXEC([ovn-chassis-1], [sw0p1], [ping -q -c 3 -i 0.3 -w 2 172.20.1.2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+# Shrink the downstream link MTU so ovn-ext0 emits ICMPv4 "fragmentation
+# needed" (type 3, code 4) towards 172.20.0.110 (the VM's stateless SNAT
+# address) for oversized DF traffic. The error's inner packet carries
+# 172.20.0.110 as its source; the VM only honors the PMTU signal if OVN
+# un-NATs that inner source back to 10.0.0.3.
+M_NS_CHECK_EXEC([ovn-gw-1], [ovn-ext0], [ip link set dev ext1 mtu 1100])
+M_NS_CHECK_EXEC([ovn-chassis-1], [sw0p1], [ping -c 20 -i 0.5 -s 1300 -M do 172.20.1.2 2>&1 | grep -q "mtu = 1100"])
+
+AT_CLEANUP
+
PMTUD_SWITCH_TESTS(["geneve"])
PMTUD_SWITCH_TESTS(["vxlan"])
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 7f4a88d4ec..6c19690b3b 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1087,13 +1087,29 @@ check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
echo
echo "IPv4: stateless"
check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
+dnl Two ip4.dst= flows: the regular stateless DNAT flow plus the default
+dnl stateless_icmp_helper flow that also rewrites the inner ICMPv4 src.
+check_flow_match_sets 2 0 0 2 1 0 0
+dnl stateless_icmp_helper defaults to true, so the inner-IP rewrite flow
+dnl is present.
+check_flow_matches "icmp4.inner_ip4.src" 1
+check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
+
+echo
+echo "IPv4: stateless, stateless_icmp_helper=false"
+check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
+check ovn-nbctl --wait=sb set NAT . options:stateless_icmp_helper=false
+dnl With the helper disabled, only the regular stateless DNAT flow remains
+dnl and the inner-IP rewrite flow is gone.
check_flow_match_sets 2 0 0 1 1 0 0
+check_flow_matches "icmp4.inner_ip4.src" 0
check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
echo
echo "IPv4: stateless with match"
check ovn-nbctl --wait=sb --match="udp" --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
-check_flow_match_sets 2 0 0 1 1 0 0
+dnl As above, the stateless_icmp_helper flow adds a second ip4.dst= flow.
+check_flow_match_sets 2 0 0 2 1 0 0
check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
echo
@@ -1118,6 +1134,23 @@ echo
echo "IPv6: stateless with match"
check ovn-nbctl --wait=sb --match="udp" --stateless lr-nat-add R1 dnat_and_snat fd01::1 fd11::2
check_flow_match_sets 2 0 0 0 0 1 1
+check ovn-nbctl lr-nat-del R1 dnat_and_snat fd01::1
+
+echo
+echo "IPv4: stateless, stateless_icmp_helper rate-limited by CoPP"
+dnl The inner-IP rewrite flow round-trips through ovn-controller, so it is
+dnl emitted with the router's icmp4-error CoPP meter (when configured) to
+dnl rate-limit the punt.
+check ovn-nbctl --wait=sb meter-add m-icmp4-err drop 100 pktps 10
+check ovn-nbctl --wait=sb copp-add copp-r1 icmp4-error m-icmp4-err
+check ovn-nbctl --wait=sb lr-copp-add copp-r1 R1
+check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
+ovn-sbctl dump-flows R1 > r1-flows
+AT_CAPTURE_FILE([r1-flows])
+check_flow_matches "icmp4.inner_ip4.src" 1
+dnl The stateless_icmp_helper flow carries the icmp4-error meter.
+AT_CHECK([ovn-sbctl list logical_flow | grep "icmp4.inner_ip4.src" -A 2 | \
+ grep -q "controller_meter.*m-icmp4-err"], [0], [], [ignore])
OVN_CLEANUP_NORTHD
AT_CLEANUP
diff --git a/tests/ovn.at b/tests/ovn.at
index 59b41bde82..e43579cdd7 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -22714,6 +22714,113 @@ OVN_CLEANUP([hv1])
AT_CLEANUP
])
+AT_SETUP([stateless NAT - icmp4.inner_ip4.src rewrite])
+AT_KEYWORDS([stateless-nat icmp4-inner-ip4-src])
+AT_SKIP_IF([test $HAVE_SCAPY = no])
+ovn_start
+
+# Topology:
+# "external" host (ext1, on the public LS) --- lr0 (gw router) --- vm1 (on sw0)
+#
+# lr0 has a stateless dnat_and_snat rule that maps the external IP
+# 172.168.0.110 to the logical IP 10.0.0.3 (vm1). With
+# options:stateless_icmp_helper defaulting to true, ovn-northd emits a
+# higher-priority DNAT flow that matches inbound ICMPv4 Destination
+# Unreachable errors (type 3, any code) destined to the external IP and
+# applies "ip4.dst=10.0.0.3; icmp4.inner_ip4.src = 10.0.0.3; next;".
+#
+# An inbound ICMP error quotes the VM's original outbound (post-SNAT)
+# datagram, so its inner source is the external IP 172.168.0.110. This test
+# injects a "Fragmentation Needed" (code 4) error from the external side and
+# verifies that the outer destination is DNATed to 10.0.0.3 and that
+# ovn-controller (pinctrl) un-NATs the inner (embedded) IPv4 source back to
+# 10.0.0.3, leaving the inner destination untouched, before delivery to vm1.
+
+vm1_mac=50:54:00:00:00:01
+vm1_ip=10.0.0.3
+rtr_int_mac=00:00:00:00:ff:01
+rtr_ext_mac=00:00:20:20:12:13
+ext1_mac=00:00:00:00:00:99
+ext1_ip=172.168.0.50
+nat_ext_ip=172.168.0.110
+
+# Internal switch with vm1.
+check ovn-nbctl ls-add sw0
+check ovn-nbctl lsp-add sw0 sw0-vm1
+check ovn-nbctl lsp-set-addresses sw0-vm1 "$vm1_mac $vm1_ip"
+
+# Router (gateway router pinned to hv1).
+check ovn-nbctl lr-add lr0
+check ovn-nbctl lrp-add lr0 lr0-sw0 $rtr_int_mac 10.0.0.1/24
+check ovn-nbctl lsp-add-router-port sw0 sw0-lr0 lr0-sw0
+
+# Public switch with the external host.
+check ovn-nbctl ls-add public
+check ovn-nbctl lrp-add lr0 lr0-public $rtr_ext_mac 172.168.0.100/24
+check ovn-nbctl lsp-add-router-port public public-lr0 lr0-public
+check ovn-nbctl lsp-add public ext1
+check ovn-nbctl lsp-set-addresses ext1 "$ext1_mac $ext1_ip"
+
+check ovn-nbctl set logical_router lr0 options:chassis=hv1
+
+# Stateless dnat_and_snat: external 172.168.0.110 <-> logical 10.0.0.3.
+check ovn-nbctl --wait=sb --stateless lr-nat-add lr0 dnat_and_snat \
+ $nat_ext_ip $vm1_ip
+
+net_add n1
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+ set interface hv1-vif1 external-ids:iface-id=sw0-vm1 \
+ options:tx_pcap=hv1/vif1-tx.pcap \
+ options:rxq_pcap=hv1/vif1-rx.pcap \
+ ofport-request=1
+ovs-vsctl -- add-port br-int hv1-vif2 -- \
+ set interface hv1-vif2 external-ids:iface-id=ext1 \
+ options:tx_pcap=hv1/vif2-tx.pcap \
+ options:rxq_pcap=hv1/vif2-rx.pcap \
+ ofport-request=2
+
+wait_for_ports_up
+check ovn-nbctl --wait=hv sync
+
+# The inner packet is the original (post-SNAT) datagram quoted in the ICMP
+# error. Use a raw 8-byte blob for the embedded L4 header so that no inner
+# L4 checksum (which the action does not recompute) is involved.
+inner_l4="0102030405060708"
+
+# ICMP fragmentation-needed error from ext1 to the NAT external IP. The
+# embedded original packet is the VM's outbound datagram after stateless
+# SNAT, so its inner source is the external IP and its inner destination is
+# some far host (50.0.0.100).
+packet=$(fmt_pkt "Ether(dst='$rtr_ext_mac', src='$ext1_mac')/ \
+ IP(src='$ext1_ip', dst='$nat_ext_ip', ttl=64)/ \
+ ICMP(type=3, code=4, nexthopmtu=1400)/ \
+ IP(src='$nat_ext_ip', dst='50.0.0.100', ttl=63, proto=17)/ \
+ bytes.fromhex('$inner_l4')")
+
+# Expected packet delivered to vm1: the router has DNATed the outer
+# destination to 10.0.0.3, decremented the outer TTL, rewritten the L2
+# addresses, and (via pinctrl) un-NATed the inner IPv4 source to 10.0.0.3.
+# The inner destination (50.0.0.100) is left unchanged.
+expected=$(fmt_pkt "Ether(dst='$vm1_mac', src='$rtr_int_mac')/ \
+ IP(src='$ext1_ip', dst='$vm1_ip', ttl=63)/ \
+ ICMP(type=3, code=4, nexthopmtu=1400)/ \
+ IP(src='$vm1_ip', dst='50.0.0.100', ttl=63, proto=17)/ \
+ bytes.fromhex('$inner_l4')")
+echo $expected > vif1.expected
+
+as hv1 reset_pcap_file hv1-vif1 hv1/vif1
+
+check as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
+
+OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [vif1.expected])
+
+OVN_CLEANUP([hv1])
+AT_CLEANUP
+
OVN_FOR_EACH_NORTHD([
AT_SETUP([IP packet buffering])
AT_KEYWORDS([ip-buffering])
@@ -26941,14 +27048,16 @@ test_ip vif11 f00000000011 000001010203 $sip $dip
vif-north
# Confirm that South to North traffic works fine.
OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv4/vif-north-tx.pcap],
[vif-north.expected])
-# Confirm that NATing happened without connection tracker
+# Confirm that NATing happened without connection tracker.
+# Two ip4.dst= flows are expected: the regular stateless DNAT flow plus the
+# default stateless_icmp_helper flow (which also carries icmp4.inner_ip4.src).
ovn-sbctl dump-flows router > sbflows
AT_CAPTURE_FILE([sbflows])
AT_CHECK([for regex in ct_snat ct_dnat ip4.dst= ip4.src=; do
grep -c "$regex" sbflows;
done], [0], [0
0
-1
+2
1
])
--
2.54.0
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev