On Jun 16, Numan Siddique wrote:
> From: Numan Siddique <[email protected]>
>
> Stateless DNAT in OVN rewrites only the outer IPv4 destination via a
> flow-based 'ip4.dst = <logical_ip>' action. This is fine for normal
> reply traffic, but it leaves the inner payload of an inbound ICMPv4
> error untouched. When such an error reaches the downstream logical
> switch pipeline, conntrack tries to correlate the embedded original
> packet with the tracked outgoing flow. Because that embedded packet
> is the VM's outbound datagram after stateless SNAT, its inner source
> still carries the external (post-NAT) IP, the lookup fails, and the
> packet is marked ct.inv, causing the LS ACL stage to drop it. The VM
> never sees the ICMP error, kernel PMTU discovery (RFC 1191) breaks,
> and TCP/UDP traffic to destinations beyond a smaller-MTU link
> black-holes.
>
> Emit an additional, higher-priority logical flow for each stateless
> NAT entry that matches the external IP plus any ICMPv4 Destination
> Unreachable error (type 3) and uses the new 'icmp4.inner_ip4.src'
> action to un-NAT the embedded inner source back to the logical IP, in
> addition to rewriting the outer destination. Every type-3 code quotes
> the original datagram (RFC 792), so this is correct for all of them;
> it covers Fragmentation Needed (code 4, for PMTUD) as well as the
> host/port unreachable codes. After this rewrite,
> conntrack in the LS zone can correlate the error with the tracked
> outgoing flow, the LS ACL stage allows it, and the VM receives a
> well-formed ICMP error whose inner source is its own private address
> - so the kernel's PMTU update path installs a correct route
> exception.
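
For reference, a minimal standalone sketch of the inner-source rewrite described above. It is not the pinctrl implementation from the previous patch; the helper names (inet_checksum, ipv4_header, icmp_frag_needed, rewrite_inner_src) are made up for illustration, and it assumes the quoted inner header has to keep a valid IPv4 checksum and the ICMP checksum has to be refreshed after the rewrite. The addresses are the ones used in the multinode test.

```python
import socket
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over data (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int,
                ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header (no options) with a valid checksum."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0,
                      ttl, proto, 0,
                      socket.inet_aton(src), socket.inet_aton(dst))
    return hdr[:10] + struct.pack("!H", inet_checksum(hdr)) + hdr[12:]


def icmp_frag_needed(inner: bytes, mtu: int) -> bytes:
    """ICMPv4 type 3 / code 4 message that quotes `inner`."""
    body = struct.pack("!BBHHH", 3, 4, 0, 0, mtu) + inner
    return body[:2] + struct.pack("!H", inet_checksum(body)) + body[4:]


def rewrite_inner_src(icmp: bytes, new_src: str) -> bytes:
    """Replace the source of the IPv4 header quoted in an ICMPv4 error,
    then recompute the inner IP header checksum and the ICMP checksum."""
    inner = bytearray(icmp[8:])          # quoted datagram follows 8-byte ICMP header
    ihl = (inner[0] & 0x0F) * 4          # inner header length in bytes
    inner[12:16] = socket.inet_aton(new_src)
    inner[10:12] = b"\x00\x00"
    inner[10:12] = struct.pack("!H", inet_checksum(bytes(inner[:ihl])))
    msg = bytearray(icmp[:8]) + inner
    msg[2:4] = b"\x00\x00"
    msg[2:4] = struct.pack("!H", inet_checksum(bytes(msg)))
    return bytes(msg)


# The VM's post-SNAT datagram (UDP, 8 opaque L4 bytes) as quoted by the router.
inner = ipv4_header("172.20.0.110", "172.20.1.2", 17, 8) + bytes(8)
err = icmp_frag_needed(inner, 1100)
fixed = rewrite_inner_src(err, "10.0.0.3")
```

Only the quoted source changes; the quoted destination and the 8 L4 bytes are left as they were, which matches what the ovn.at test expects to see on vif1.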
>
> This behavior is gated by a per-NAT option,
> options:stateless_icmp_helper, on the Northbound NAT entry. It
> defaults to true, so the inner-IP rewrite flow is emitted for every
> stateless NAT entry out of the box and PMTUD works without any extra
> configuration. Operators who do not want the additional flow (for
> example to avoid the pinctrl round-trip for these ICMP errors) can
> opt out by setting options:stateless_icmp_helper=false on the
> individual NAT entry.
>
> The new flow uses priority + 1 so that:
> - The exempted-ext-ips bypass flow (priority + 2, emits 'next;')
> still wins for traffic explicitly excluded from NAT.
> - Non-ICMP traffic falls through to the existing stateless DNAT
> flow at the original priority.
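
The intended ordering can be checked with a toy model of "highest-priority matching flow wins". This is only a sketch: P = 100, the exempted address 192.0.2.10 and the dict-based packet representation are assumptions made for the example, not values taken from northd.

```python
P = 100                     # illustrative base priority of the regular DNAT flow
EXT_IP = "172.16.1.1"       # external IP of the NAT entry (as in ovn-northd.at)
EXEMPTED = {"192.0.2.10"}   # hypothetical exempted_ext_ips member

# Each flow is a priority, a label and a predicate over a packet dict.
FLOWS = [
    {"priority": P + 2, "name": "exempted bypass",
     "match": lambda p: p["dst"] == EXT_IP and p["src"] in EXEMPTED},
    {"priority": P + 1, "name": "icmp helper",
     "match": lambda p: (p["dst"] == EXT_IP and p["proto"] == 1
                         and p.get("icmp_type") == 3)},
    {"priority": P, "name": "stateless dnat",
     "match": lambda p: p["dst"] == EXT_IP},
]


def pick_flow(packet):
    """Return the name of the highest-priority flow that matches packet."""
    hits = [f for f in FLOWS if f["match"](packet)]
    return max(hits, key=lambda f: f["priority"])["name"] if hits else None
```

With this model, a type-3 error from 192.0.2.10 hits the bypass flow, a type-3 error from any other source hits the helper flow, and UDP or an ICMP echo request falls through to the regular stateless DNAT flow.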
>
> The rewrite flow round-trips the packet through ovn-controller, so it is
> emitted with the logical router's icmp4-error CoPP meter (when one is
> configured), rate-limiting the punt the same way OVN does for the other
> controller-handled ICMPv4 errors.
>
> Only IPv4 is wired up; IPv6 stateless NAT is not currently supported
> in OVN, so no equivalent action is needed for icmp6 Packet Too Big.
>
> The pinctrl-side implementation of icmp4.inner_ip4.src is in the
> previous patch.
>
> Note that this is required when CMS doesn't use the gateway_mtu
> option and an external PE router generates the ICMPv4 error
> message.
>
> Assisted-by: Claude Opus 4.7, Claude Code
> Signed-off-by: Numan Siddique <[email protected]>
> ---
> Documentation/ref/ovn-logical-flows.7.rst | 32 ++++++
> NEWS | 7 ++
> northd/northd.c | 48 ++++++++-
> ovn-nb.xml | 38 ++++++++
> tests/multinode.at | 73 ++++++++++++++
> tests/ovn-northd.at | 35 ++++++-
> tests/ovn.at | 113 +++++++++++++++++++++-
> 7 files changed, 340 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/ref/ovn-logical-flows.7.rst b/Documentation/ref/ovn-logical-flows.7.rst
> index ce4dd53559..56e7b3ef00 100644
> --- a/Documentation/ref/ovn-logical-flows.7.rst
> +++ b/Documentation/ref/ovn-logical-flows.7.rst
> @@ -2775,6 +2775,24 @@ flows do not get programmed for load balancers with IPv6 *VIPs*.
> rule is of type dnat_and_snat and has ``stateless=true`` in the options, then
> the action would be ``ip4/6.dst=(B)``.
>
> + For an IPv4 stateless ``dnat_and_snat`` rule that has
> + ``options:stateless_icmp_helper`` set to ``true`` (the default), an
> + additional flow at priority *P + 1* is added that matches ``ip && ip4.dst
> + == A && icmp4 && icmp4.type == 3`` with the action
> + ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``, where *P* is the priority of
> + the flow above. This rewrites the outer destination and un-NATs the source
> + embedded in the inbound ICMPv4 Destination Unreachable error payload (from
> + the external IP *A* back to the logical IP *B*) - every type-3 code quotes
> + the original datagram, including ``Fragmentation Needed`` (code 4) - so that
> + conntrack in the downstream logical switch can correlate the error with the
> + tracked outgoing flow and Path MTU discovery (RFC 1191) works end-to-end
> + across stateless NAT. See ``options:stateless_icmp_helper`` in the ``NAT``
> + table of the
> + ``OVN_Northbound`` database (``ovn-nb`` (5)). The priority is *P + 1* so that
> + the ``exempted_ext_ips`` bypass flow (at *P + 2*) still wins for traffic
> + excluded from NAT, and non-ICMP traffic falls through to the regular
> + stateless DNAT flow.
> +
> If the NAT rule has ``allowed_ext_ips`` configured, then there is an
> additional match ``ip4.src == allowed_ext_ips``. Similarly, for IPV6, match
> would be ``ip6.src == allowed_ext_ips``.
> @@ -2815,6 +2833,20 @@ the egress pipeline.
> the IPv6 case. If the NAT rule is of type dnat_and_snat and has
> ``stateless=true`` in the options, then the action would be
> ``ip4/6.dst=(B)``.
>
> + For an IPv4 stateless ``dnat_and_snat`` rule that has
> + ``options:stateless_icmp_helper`` set to ``true`` (the default), an
> + additional priority-101 flow is added that matches ``ip && ip4.dst == B &&
> + inport == GW && icmp4 && icmp4.type == 3`` with the action
> + ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``. This rewrites the outer
> + destination and un-NATs the source embedded in the inbound ICMPv4
> + Destination Unreachable error payload (back to the logical IP *B*) - every
> + type-3 code quotes the original datagram, including ``Fragmentation Needed``
> + (code 4) - so that conntrack in the downstream logical switch can correlate
> + the error with the tracked outgoing flow and Path MTU discovery (RFC 1191)
> + works end-to-end across stateless NAT. See
> + ``options:stateless_icmp_helper`` in the ``NAT`` table of the
> + ``OVN_Northbound`` database (``ovn-nb`` (5)).
> +
> If the NAT rule cannot be handled in a distributed manner, then the
> priority-100 flow above is only programmed on the gateway chassis.
>
> diff --git a/NEWS b/NEWS
> index 748ae30eb2..18374dc71b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -33,6 +33,13 @@ Post v26.03.0
> The DHCP and unbound-router ARP/ND drop lflows for external
> ports were updated to key on the external LSP's inport
> accordingly.
> + - Added a new "icmp4.inner_ip4.src" action that rewrites the source
> + IPv4 address embedded in an ICMPv4 error's inner packet. ovn-northd
> + uses it for stateless "dnat_and_snat" rules, controlled by the new
> + "options:stateless_icmp_helper" NAT option (default true), so that
> + inbound ICMPv4 Destination Unreachable errors (type 3, including the
> + "fragmentation needed" message) generated by an external router are
> + un-NATed correctly and Path MTU discovery works through stateless NAT.
>
> OVN v26.03.0 - xxx xx xxxx
> --------------------------
> diff --git a/northd/northd.c b/northd/northd.c
> index f5aa5cca38..3bb9cafaac 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -17584,7 +17584,9 @@ static void
> build_lrouter_in_dnat_flow(struct lflow_table *lflows,
> const struct ovn_datapath *od,
> const struct lr_nat_record *lrnat_rec,
> - const struct ovn_nat *nat_entry, struct ds *match,
> + const struct ovn_nat *nat_entry,
> + const struct shash *meter_groups,
> + struct ds *match,
> struct ds *actions, bool distributed_nat,
> int cidr_bits, bool is_v6,
> struct ovn_port *l3dgw_port, bool stateless,
> @@ -17657,6 +17659,45 @@ build_lrouter_in_dnat_flow(struct lflow_table *lflows,
>
> ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, priority, ds_cstr(match),
> ds_cstr(actions), lflow_ref, WITH_HINT(&nat->header_));
> +
> + /* For stateless DNAT, the action above only rewrites the outer IPv4
> + * destination. An inbound ICMPv4 error (RFC 792 / RFC 1191) carries
> + * the original (post-NAT) packet inside its payload, whose source is
> + * the external (post-SNAT) IP. The conntrack-based ACL check in the
> + * downstream logical switch zone uses that inner tuple to match the
> + * reverse direction of the tracked outgoing flow; without un-NATing
> + * the inner ip4.src back to the logical IP, that lookup fails and the
> + * error is dropped as ct.inv.
> + *
> + * Emit a higher-priority flow that matches the same external IP plus any
> + * ICMPv4 Destination Unreachable error (type 3) and rewrites the outer
> + * ip4.dst and the embedded inner ip4.src to the logical IP. Every type-3
> + * code quotes the original datagram (RFC 792), so the inner-source un-NAT
> + * is correct for all of them; this covers Fragmentation Needed (code 4,
> + * for PMTUD per RFC 1191) as well as host/port unreachable and the rest,
> + * so PMTUD works end-to-end through stateless NAT. */
> + if (stateless && !is_v6 &&
> + smap_get_bool(&nat_entry->nb->options, "stateless_icmp_helper",
> + true)) {
> + const char *icmp4_meter = copp_meter_get(COPP_ICMP4_ERR,
> + od->nbr->copp,
> + meter_groups);
> + size_t match_len = match->length;
> +
> + ds_put_cstr(match, " && icmp4 && icmp4.type == 3");
Based on the NAT entry's match, here we can potentially end up with a
match rule like:
match = ip && ip4.dst == IP && (udp) && icmp4 && icmp4.type == 3
which is always false, right?
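
A quick way to see the contradiction, as a sketch only: it assumes the usual expansion of the OVN predicate symbols (udp requires ip.proto == 17, tcp 6, icmp4 1), models just that subset, and only handles a pure '&&' conjunction, so a user match containing '||' is out of scope. PROTO, required_protos and can_match are names invented for this example.

```python
# Assumed ip.proto requirement of each predicate symbol (subset only).
PROTO = {"udp": 17, "tcp": 6, "icmp4": 1}


def required_protos(match: str) -> set:
    """Collect the ip.proto values demanded by a '&&'-only match string."""
    tokens = match.replace("(", " ").replace(")", " ").split()
    return {PROTO[t] for t in tokens if t in PROTO}


def can_match(match: str) -> bool:
    """A conjunction can match only if it demands at most one ip.proto."""
    return len(required_protos(match)) <= 1


base = "ip && ip4.dst == 172.16.1.1"
helper = " && icmp4 && icmp4.type == 3"

plain_nat = can_match(base + helper)                 # NAT entry without a match
udp_nat = can_match(base + " && (udp)" + helper)     # NAT entry with --match="udp"
```

Here plain_nat is True and udp_nat is False: with --match="udp", as in the "IPv4: stateless with match" northd test, the helper flow asks for ip.proto 17 and 1 at the same time, so it would be installed but could never be hit.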
Regards,
Lorenzo
> +
> + ds_clear(actions);
> + ds_put_format(actions,
> + "ip4.dst=%s; icmp4.inner_ip4.src = %s; next;",
> + nat->logical_ip, nat->logical_ip);
> +
> + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, priority + 1,
> + ds_cstr(match), ds_cstr(actions), lflow_ref,
> + WITH_CTRL_METER(icmp4_meter),
> + WITH_HINT(&nat->header_));
> +
> + ds_truncate(match, match_len);
> + }
> }
>
> static void
> @@ -18404,8 +18445,9 @@ build_lrouter_nat_defrag_and_lb(
> lflow_ref);
> }
> /* S_ROUTER_IN_DNAT */
> - build_lrouter_in_dnat_flow(lflows, od, lrnat_rec, nat_entry, match,
> - actions, nat_entry->is_distributed,
> + build_lrouter_in_dnat_flow(lflows, od, lrnat_rec, nat_entry,
> + meter_groups, match, actions,
> + nat_entry->is_distributed,
> cidr_bits, is_v6, nat_entry->l3dgw_port,
> stateless, lflow_ref);
>
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index 15fb1d7e86..2fc8543868 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -5457,6 +5457,44 @@ or
> tracking state or not.
> </column>
>
> + <column name="options" key="stateless_icmp_helper"
> + type='{"type": "boolean"}'>
> + <p>
> + Applies only to stateless <code>dnat_and_snat</code> rules (that
> + is, NATs with <ref column="options" key="stateless"/> set to
> + <code>true</code>) on IPv4 addresses. Defaults to
> + <code>true</code>.
> + </p>
> +
> + <p>
> + A stateless DNAT rule rewrites only the outer IPv4 destination of
> + inbound packets. For an inbound ICMPv4 error (for example a
> + <code>Fragmentation Needed</code> message generated for Path MTU
> + discovery, RFC 1191), the original packet embedded in the ICMP
> + payload still carries the external, post-NAT IP as its source.
> + When the error reaches the downstream logical switch, conntrack
> + cannot correlate the embedded tuple with the tracked outgoing
> + flow, the packet is marked <code>ct.inv</code> and dropped by the
> + ACL stage, and PMTU discovery breaks.
> + </p>
> +
> + <p>
> + When this option is <code>true</code>, <code>ovn-northd</code>
> + emits an additional, higher-priority logical flow in the router
> + ingress DNAT stage that matches ICMPv4 Destination Unreachable
> + (type 3) errors destined to the external IP. Every type-3 code
> + quotes the original datagram, so this covers
> + <code>Fragmentation Needed</code> (code 4) for PMTU discovery as
> + well as host/port unreachable and the rest. It rewrites the outer
> + IPv4 destination to the logical IP and, using the
> + <code>icmp4.inner_ip4.src</code> action, un-NATs the embedded
> + inner IPv4 source from the external IP back to the logical IP, so
> + that conntrack can correlate the error and PMTU discovery works
> + end-to-end. Set it to <code>false</code> to suppress this flow for
> + an individual NAT entry.
> + </p>
> + </column>
> +
> <column name="options" key="add_route">
> If set to <code>true</code>, then neighbor routers will have logical
> flows added that will allow for routing to the NAT address. It also will
> diff --git a/tests/multinode.at b/tests/multinode.at
> index 069f2a677d..37ef523f95 100644
> --- a/tests/multinode.at
> +++ b/tests/multinode.at
> @@ -1041,6 +1041,79 @@ run_ns_traffic
> AT_CLEANUP
> ])
>
> +AT_SETUP([ovn multinode stateless NAT - icmp4 PMTUD inner src un-NAT])
> +
> +# Check that ovn-fake-multinode setup is up and running
> +check_fake_multinode_setup
> +
> +# Delete the multinode NB and OVS resources before starting the test.
> +cleanup_multinode_resources
> +
> +m_as ovn-chassis-1 ip link del sw0p1-p
> +
> +# Reset geneve tunnels
> +for c in ovn-chassis-1 ovn-gw-1
> +do
> + m_as $c ovs-vsctl set open . external-ids:ovn-encap-type=geneve
> +done
> +
> +OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys])
> +OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys])
> +
> +# Internal switch with one VM (10.0.0.3) on ovn-chassis-1.
> +check multinode_nbctl ls-add sw0
> +check multinode_nbctl lsp-add sw0 sw0-port1
> +check multinode_nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:03 10.0.0.3"
> +
> +m_as ovn-chassis-1 /data/create_fake_vm.sh sw0-port1 sw0p1 50:54:00:00:00:03 1342 10.0.0.3 24 10.0.0.1
> +
> +# Gateway router pinned to ovn-gw-1.
> +check multinode_nbctl lr-add lr0 -- set Logical_Router lr0 options:chassis=ovn-gw-1
> +check multinode_nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> +check multinode_nbctl lsp-add-router-port sw0 sw0-lr0 lr0-sw0
> +
> +# External / provider network.
> +check multinode_nbctl ls-add public
> +check multinode_nbctl lsp-add-localnet-port public ln-public public
> +check multinode_nbctl lrp-add lr0 lr0-public 00:11:22:00:ff:01 172.20.0.100/24
> +check multinode_nbctl lsp-add-router-port public public-lr0 lr0-public
> +check multinode_nbctl lr-route-add lr0 0.0.0.0/0 172.20.0.1
> +
> +# Stateless dnat_and_snat for the VM. options:stateless_icmp_helper defaults
> +# to true, so ovn-northd emits the extra DNAT flow that, for an inbound ICMPv4
> +# "fragmentation needed" error destined to 172.20.0.110, un-NATs the inner
> +# source from 172.20.0.110 back to 10.0.0.3 (icmp4.inner_ip4.src). Unlike
> +# stateful NAT (where conntrack fixes the embedded header automatically),
> +# stateless NAT relies entirely on this flow for PMTUD to work.
> +check multinode_nbctl --stateless lr-nat-add lr0 dnat_and_snat 172.20.0.110 10.0.0.3
> +
> +m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex
> +m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex
> +
> +m_wait_for_ports_up
> +
> +# ovn-ext0 routes between the OVN public net (172.20.0.0/24) and a downstream
> +# net (172.20.1.0/24); ovn-ext2 (172.20.1.2) is the far host behind it.
> +m_add_internal_port ovn-gw-1 ovn-ext0 br-ex ext0 172.20.0.1/24
> +m_add_internal_port ovn-gw-1 ovn-ext0 br-ex ext1 172.20.1.1/24
> +m_add_internal_port ovn-gw-1 ovn-ext2 br-ex ext2 172.20.1.2/24 172.20.1.1
> +
> +# Baseline: the VM reaches the far host through stateless NAT.
> +M_NS_CHECK_EXEC([ovn-chassis-1], [sw0p1], [ping -q -c 3 -i 0.3 -w 2 172.20.1.2 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +# Shrink the downstream link MTU so ovn-ext0 emits ICMPv4 "fragmentation
> +# needed" (type 3, code 4) towards 172.20.0.110 (the VM's stateless SNAT
> +# address) for oversized DF traffic. The error's inner packet carries
> +# 172.20.0.110 as its source; the VM only honors the PMTU signal if OVN
> +# un-NATs that inner source back to 10.0.0.3.
> +M_NS_CHECK_EXEC([ovn-gw-1], [ovn-ext0], [ip link set dev ext1 mtu 1100])
> +M_NS_CHECK_EXEC([ovn-chassis-1], [sw0p1], [ping -c 20 -i 0.5 -s 1300 -M do 172.20.1.2 2>&1 | grep -q "mtu = 1100"])
> +
> +AT_CLEANUP
> +
> PMTUD_SWITCH_TESTS(["geneve"])
> PMTUD_SWITCH_TESTS(["vxlan"])
>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 7f4a88d4ec..6c19690b3b 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -1087,13 +1087,29 @@ check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
> echo
> echo "IPv4: stateless"
> check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
> +dnl Two ip4.dst= flows: the regular stateless DNAT flow plus the default
> +dnl stateless_icmp_helper flow that also rewrites the inner ICMPv4 src.
> +check_flow_match_sets 2 0 0 2 1 0 0
> +dnl stateless_icmp_helper defaults to true, so the inner-IP rewrite flow
> +dnl is present.
> +check_flow_matches "icmp4.inner_ip4.src" 1
> +check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
> +
> +echo
> +echo "IPv4: stateless, stateless_icmp_helper=false"
> +check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
> +check ovn-nbctl --wait=sb set NAT . options:stateless_icmp_helper=false
> +dnl With the helper disabled, only the regular stateless DNAT flow remains
> +dnl and the inner-IP rewrite flow is gone.
> check_flow_match_sets 2 0 0 1 1 0 0
> +check_flow_matches "icmp4.inner_ip4.src" 0
> check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
>
> echo
> echo "IPv4: stateless with match"
> check ovn-nbctl --wait=sb --match="udp" --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
> -check_flow_match_sets 2 0 0 1 1 0 0
> +dnl As above, the stateless_icmp_helper flow adds a second ip4.dst= flow.
> +check_flow_match_sets 2 0 0 2 1 0 0
> check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
>
> echo
> @@ -1118,6 +1134,23 @@ echo
> echo "IPv6: stateless with match"
> check ovn-nbctl --wait=sb --match="udp" --stateless lr-nat-add R1 dnat_and_snat fd01::1 fd11::2
> check_flow_match_sets 2 0 0 0 0 1 1
> +check ovn-nbctl lr-nat-del R1 dnat_and_snat fd01::1
> +
> +echo
> +echo "IPv4: stateless, stateless_icmp_helper rate-limited by CoPP"
> +dnl The inner-IP rewrite flow round-trips through ovn-controller, so it is
> +dnl emitted with the router's icmp4-error CoPP meter (when configured) to
> +dnl rate-limit the punt.
> +check ovn-nbctl --wait=sb meter-add m-icmp4-err drop 100 pktps 10
> +check ovn-nbctl --wait=sb copp-add copp-r1 icmp4-error m-icmp4-err
> +check ovn-nbctl --wait=sb lr-copp-add copp-r1 R1
> +check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat 172.16.1.1 50.0.0.11
> +ovn-sbctl dump-flows R1 > r1-flows
> +AT_CAPTURE_FILE([r1-flows])
> +check_flow_matches "icmp4.inner_ip4.src" 1
> +dnl The stateless_icmp_helper flow carries the icmp4-error meter.
> +AT_CHECK([ovn-sbctl list logical_flow | grep "icmp4.inner_ip4.src" -A 2 | \
> + grep -q "controller_meter.*m-icmp4-err"], [0], [], [ignore])
>
> OVN_CLEANUP_NORTHD
> AT_CLEANUP
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 59b41bde82..e43579cdd7 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -22714,6 +22714,113 @@ OVN_CLEANUP([hv1])
> AT_CLEANUP
> ])
>
> +AT_SETUP([stateless NAT - icmp4.inner_ip4.src rewrite])
> +AT_KEYWORDS([stateless-nat icmp4-inner-ip4-src])
> +AT_SKIP_IF([test $HAVE_SCAPY = no])
> +ovn_start
> +
> +# Topology:
> +# "external" host (ext1, on the public LS) --- lr0 (gw router) --- vm1 (on sw0)
> +#
> +# lr0 has a stateless dnat_and_snat rule that maps the external IP
> +# 172.168.0.110 to the logical IP 10.0.0.3 (vm1). With
> +# options:stateless_icmp_helper defaulting to true, ovn-northd emits a
> +# higher-priority DNAT flow that matches inbound ICMPv4 Destination
> +# Unreachable errors (type 3, which includes "Fragmentation Needed",
> +# code 4) destined to the external IP and applies the action
> +# "ip4.dst = 10.0.0.3; icmp4.inner_ip4.src = 10.0.0.3; next;".
> +#
> +# An inbound ICMP error quotes the VM's original outbound (post-SNAT)
> +# datagram, so its inner source is the external IP 172.168.0.110. This test
> +# injects such an error from the external side and verifies that
> +# ovn-controller (pinctrl) DNATs the outer destination to 10.0.0.3 and
> +# un-NATs the inner (embedded) IPv4 source back to 10.0.0.3 - while leaving
> +# the inner destination untouched - before delivering the packet to vm1.
> +
> +vm1_mac=50:54:00:00:00:01
> +vm1_ip=10.0.0.3
> +rtr_int_mac=00:00:00:00:ff:01
> +rtr_ext_mac=00:00:20:20:12:13
> +ext1_mac=00:00:00:00:00:99
> +ext1_ip=172.168.0.50
> +nat_ext_ip=172.168.0.110
> +
> +# Internal switch with vm1.
> +check ovn-nbctl ls-add sw0
> +check ovn-nbctl lsp-add sw0 sw0-vm1
> +check ovn-nbctl lsp-set-addresses sw0-vm1 "$vm1_mac $vm1_ip"
> +
> +# Router (gateway router pinned to hv1).
> +check ovn-nbctl lr-add lr0
> +check ovn-nbctl lrp-add lr0 lr0-sw0 $rtr_int_mac 10.0.0.1/24
> +check ovn-nbctl lsp-add-router-port sw0 sw0-lr0 lr0-sw0
> +
> +# Public switch with the external host.
> +check ovn-nbctl ls-add public
> +check ovn-nbctl lrp-add lr0 lr0-public $rtr_ext_mac 172.168.0.100/24
> +check ovn-nbctl lsp-add-router-port public public-lr0 lr0-public
> +check ovn-nbctl lsp-add public ext1
> +check ovn-nbctl lsp-set-addresses ext1 "$ext1_mac $ext1_ip"
> +
> +check ovn-nbctl set logical_router lr0 options:chassis=hv1
> +
> +# Stateless dnat_and_snat: external 172.168.0.110 <-> logical 10.0.0.3.
> +check ovn-nbctl --wait=sb --stateless lr-nat-add lr0 dnat_and_snat \
> + $nat_ext_ip $vm1_ip
> +
> +net_add n1
> +sim_add hv1
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> + set interface hv1-vif1 external-ids:iface-id=sw0-vm1 \
> + options:tx_pcap=hv1/vif1-tx.pcap \
> + options:rxq_pcap=hv1/vif1-rx.pcap \
> + ofport-request=1
> +ovs-vsctl -- add-port br-int hv1-vif2 -- \
> + set interface hv1-vif2 external-ids:iface-id=ext1 \
> + options:tx_pcap=hv1/vif2-tx.pcap \
> + options:rxq_pcap=hv1/vif2-rx.pcap \
> + ofport-request=2
> +
> +wait_for_ports_up
> +check ovn-nbctl --wait=hv sync
> +
> +# The inner packet is the original (post-SNAT) datagram quoted in the ICMP
> +# error. Use a raw 8-byte blob for the embedded L4 header so that no inner
> +# L4 checksum (which the action does not recompute) is involved.
> +inner_l4="0102030405060708"
> +
> +# ICMP fragmentation-needed error from ext1 to the NAT external IP. The
> +# embedded original packet is the VM's outbound datagram after stateless
> +# SNAT, so its inner source is the external IP and its inner destination is
> +# some far host (50.0.0.100).
> +packet=$(fmt_pkt "Ether(dst='$rtr_ext_mac', src='$ext1_mac')/ \
> + IP(src='$ext1_ip', dst='$nat_ext_ip', ttl=64)/ \
> + ICMP(type=3, code=4, nexthopmtu=1400)/ \
> + IP(src='$nat_ext_ip', dst='50.0.0.100', ttl=63, proto=17)/ \
> + bytes.fromhex('$inner_l4')")
> +
> +# Expected packet delivered to vm1: the router has DNATed the outer
> +# destination to 10.0.0.3, decremented the outer TTL, rewritten the L2
> +# addresses, and (via pinctrl) un-NATed the inner IPv4 source to 10.0.0.3.
> +# The inner destination (50.0.0.100) is left unchanged.
> +expected=$(fmt_pkt "Ether(dst='$vm1_mac', src='$rtr_int_mac')/ \
> + IP(src='$ext1_ip', dst='$vm1_ip', ttl=63)/ \
> + ICMP(type=3, code=4, nexthopmtu=1400)/ \
> + IP(src='$vm1_ip', dst='50.0.0.100', ttl=63, proto=17)/ \
> + bytes.fromhex('$inner_l4')")
> +echo $expected > vif1.expected
> +
> +as hv1 reset_pcap_file hv1-vif1 hv1/vif1
> +
> +check as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
> +
> +OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [vif1.expected])
> +
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
> +
> OVN_FOR_EACH_NORTHD([
> AT_SETUP([IP packet buffering])
> AT_KEYWORDS([ip-buffering])
> @@ -26941,14 +27048,16 @@ test_ip vif11 f00000000011 000001010203 $sip $dip vif-north
> # Confirm that South to North traffic works fine.
> OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv4/vif-north-tx.pcap], [vif-north.expected])
>
> -# Confirm that NATing happened without connection tracker
> +# Confirm that NATing happened without connection tracker.
> +# Two ip4.dst= flows are expected: the regular stateless DNAT flow plus the
> +# default stateless_icmp_helper flow (which also carries icmp4.inner_ip4.src).
> ovn-sbctl dump-flows router > sbflows
> AT_CAPTURE_FILE([sbflows])
> AT_CHECK([for regex in ct_snat ct_dnat ip4.dst= ip4.src=; do
> grep -c "$regex" sbflows;
> done], [0], [0
> 0
> -1
> +2
> 1
> ])
>
> --
> 2.54.0
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>