From: Numan Siddique <[email protected]>

When a logical router uses stateless NAT (a dnat_and_snat rule with
options:stateless=true, i.e. a "stateless floating IP"), the NAT is a
pure header rewrite with no connection tracking: outbound packets get
their source rewritten to the external IP, and inbound packets get their
destination rewritten back to the logical IP.
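
As an illustration only, the two rewrites can be modelled in a few lines
of C.  The structs and function names below are hypothetical and are not
OVN's data structures.  Addresses are host-order integers.

```c
#include <stdint.h>

/* Hypothetical model of the address fields of one IPv4 header. */
struct ip4_addrs {
    uint32_t src;
    uint32_t dst;
};

/* Hypothetical model of one stateless dnat_and_snat rule: a fixed
 * 1:1 mapping, with no per-connection state. */
struct stateless_nat {
    uint32_t logical_ip;
    uint32_t external_ip;
};

/* Outbound: rewrite the outer source.  Nothing is remembered. */
static void
stateless_snat(const struct stateless_nat *nat, struct ip4_addrs *ip)
{
    if (ip->src == nat->logical_ip) {
        ip->src = nat->external_ip;
    }
}

/* Inbound: rewrite the outer destination back to the logical IP. */
static void
stateless_dnat(const struct stateless_nat *nat, struct ip4_addrs *ip)
{
    if (ip->dst == nat->external_ip) {
        ip->dst = nat->logical_ip;
    }
}
```

Only the outer header is touched in either direction.  Any IP header
quoted inside an ICMP payload is opaque to this kind of rewrite.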

This works for ordinary traffic, but it breaks Path MTU discovery
(RFC 1191) for inbound ICMPv4 "fragmentation needed" (type 3, code 4)
errors.  Such an error quotes the IP header and at least the first 8
bytes of payload of the original datagram that triggered it.  For
traffic the workload sent out, that quoted datagram has already been
SNATed, so its inner source is the external (post-NAT) IP.  The
router's stateless DNAT rewrites only the error's outer destination
back to the logical IP and leaves the quoted header untouched.  When
the error reaches the workload's logical switch, conntrack tries to
correlate the embedded tuple with the tracked outgoing flow.  Because
the inner source is the external IP rather than the logical IP, the
lookup fails, the packet is marked ct.inv, and the ACL stage drops it.
The workload never sees the error, never lowers its PMTU, and
large-packet flows black-hole.
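
A deliberately simplified model of the failed correlation follows.  It
assumes a hypothetical 5-tuple and treats "related" as an exact match
between the embedded tuple and the tuple of the flow as the workload
sent it.  Real conntrack also handles direction inversion and zones,
which are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 5-tuple; not OVS's conntrack structures. */
struct tuple {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t proto;
};

/* Simplified rule: an ICMP error is related to a tracked flow only if
 * the tuple quoted inside the error equals the tuple that was tracked
 * when the workload sent the original packet. */
static bool
icmp_error_is_related(const struct tuple *tracked, const struct tuple *inner)
{
    return tracked->src == inner->src
           && tracked->dst == inner->dst
           && tracked->sport == inner->sport
           && tracked->dport == inner->dport
           && tracked->proto == inner->proto;
}
```

The switch tracks the flow with the logical IP as source.  The quoted
header carries the external IP, so the comparison fails until that inner
source is rewritten back.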

With stateful NAT this "just works", because conntrack rewrites the
embedded header of related ICMP errors automatically.  Stateless NAT has
no such machinery, so the inner header must be fixed up explicitly.

This series adds that fix-up:

  - A new OVN field, icmp4.inner_ip4.src, which logical flow actions can
    assign to rewrite the source address of the IPv4 header embedded in
    an ICMPv4 error.  The assignment also recomputes the inner IPv4
    header checksum and the outer ICMP checksum.  Because it mutates
    bytes inside the ICMP payload, which Open vSwitch flows cannot
    rewrite, it is implemented as a controller (pinctrl) action,
    put_icmp4_inner_ip4_src.

  - For each IPv4 stateless dnat_and_snat rule, ovn-northd emits this
    action in a higher-priority flow in the router ingress DNAT stage.
    That flow matches inbound ICMPv4 type 3 / code 4 errors and un-NATs
    the embedded inner source back to the logical IP.  It is gated by a
    per-NAT option, options:stateless_icmp_helper, which defaults to
    true.
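
To make the byte-level work concrete, here is a sketch of the rewrite on
a raw buffer.  It is not the pinctrl code from this series, and the
function name rewrite_icmp4_inner_src() is hypothetical.  It assumes
'icmp' points at the ICMPv4 header, with the quoted IPv4 header starting
8 bytes in.  Both checksums are recomputed in full for clarity; an
RFC 1624 incremental update would give the same result with less work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over 'len' bytes (big-endian words). */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;   /* Zero-pad odd byte. */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Hypothetical helper: rewrites the source of the IPv4 header quoted in
 * a type 3 / code 4 error.  'len' covers the ICMP header and payload.
 * Returns 0 on success, -1 if the packet is not a usable error. */
static int
rewrite_icmp4_inner_src(uint8_t *icmp, size_t len, const uint8_t new_src[4])
{
    if (len < 8 + 20 || icmp[0] != 3 || icmp[1] != 4) {
        return -1;
    }

    uint8_t *inner = icmp + 8;
    size_t ihl = (size_t) (inner[0] & 0x0f) * 4;
    if ((inner[0] >> 4) != 4 || ihl < 20 || len < 8 + ihl) {
        return -1;
    }

    memcpy(inner + 12, new_src, 4);          /* Inner source address. */

    /* Inner IPv4 header checksum: zero the field, then recompute. */
    inner[10] = inner[11] = 0;
    uint16_t ip_csum = inet_csum(inner, ihl);
    inner[10] = (uint8_t) (ip_csum >> 8);
    inner[11] = (uint8_t) (ip_csum & 0xff);

    /* The ICMP checksum covers the ICMP header and the whole payload,
     * including the quoted header that just changed. */
    icmp[2] = icmp[3] = 0;
    uint16_t icmp_csum = inet_csum(icmp, len);
    icmp[2] = (uint8_t) (icmp_csum >> 8);
    icmp[3] = (uint8_t) (icmp_csum & 0xff);
    return 0;
}
```

Only the inner source, the inner IPv4 checksum and the ICMP checksum
change.  The inner destination and the quoted transport bytes are left
as they were.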

Why this is needed
------------------

OVN already has the gateway_mtu feature, where ovn-controller itself
generates the ICMP "fragmentation needed" error when a packet exceeds the
configured MTU of a logical router port.  In that path the error
originates inside OVN and is built from the packet before SNAT is
applied, so its embedded header already carries the logical IP and no
inner rewrite is required.

This series targets deployments where the CMS does NOT use gateway_mtu
and the ICMP error is instead generated by an external router on the
path, for example a provider-edge (PE) router beyond the OVN gateway.
That error arrives at OVN from the outside, addressed to the stateless NAT
external IP, with its embedded inner source still carrying the external
IP.  Only OVN can un-NAT that inner source back to the logical IP, which
is exactly what icmp4.inner_ip4.src does.  Without it, PMTU discovery is
broken end-to-end for such deployments.
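
For reference, this is roughly what such an external router puts on the
wire, following RFC 792 (quote the IP header plus the first 64 bits of
payload) and RFC 1191 (next-hop MTU in bytes 6-7).  The function
build_frag_needed() is a hypothetical sketch, and 'orig' is assumed to be
the packet as the router saw it, i.e. already SNATed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over 'len' bytes (big-endian words). */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Hypothetical: writes the ICMP part (header + quoted datagram) of a
 * type 3 / code 4 error into 'out'.  Returns the number of bytes
 * written, or 0 if 'orig' is malformed or 'out' is too small. */
static size_t
build_frag_needed(const uint8_t *orig, size_t orig_len,
                  uint16_t next_hop_mtu, uint8_t *out, size_t out_size)
{
    if (orig_len < 20) {
        return 0;
    }
    size_t ihl = (size_t) (orig[0] & 0x0f) * 4;
    if (ihl < 20 || ihl > orig_len) {
        return 0;
    }

    /* Quote the IP header plus up to 8 bytes of payload. */
    size_t quoted = orig_len < ihl + 8 ? orig_len : ihl + 8;
    if (out_size < 8 + quoted) {
        return 0;
    }

    memset(out, 0, 8);
    out[0] = 3;                                 /* Destination unreachable. */
    out[1] = 4;                                 /* Frag needed, DF set. */
    out[6] = (uint8_t) (next_hop_mtu >> 8);     /* RFC 1191 next-hop MTU. */
    out[7] = (uint8_t) (next_hop_mtu & 0xff);
    memcpy(out + 8, orig, quoted);              /* Quoted src = external IP. */

    uint16_t csum = inet_csum(out, 8 + quoted);
    out[2] = (uint8_t) (csum >> 8);
    out[3] = (uint8_t) (csum & 0xff);
    return 8 + quoted;
}
```

The quoted source at offset 8 + 12 is copied verbatim from the SNATed
packet, so it is the external IP, and nothing between that router and
OVN will change it.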

Only IPv4 is wired up; IPv6 stateless NAT is not currently supported in
OVN.

Testing
-------

  - ovn-northd flow-generation tests covering presence/absence of the
    helper flow (options:stateless_icmp_helper) and the resulting flow
    counts.
  - An ovn.at end-to-end test that injects an ICMPv4 error through a
    stateless dnat_and_snat router and verifies the embedded inner source
    is un-NATed (and the inner destination is left untouched).
  - A fake-multinode (tests/multinode.at) test reproducing the full
    PMTUD scenario: an external namespace acting as a router emits the
    frag-needed error, and the workload learns the reduced PMTU only
    because OVN rewrites the inner source.  Verified on a live
    ovn-fake-multinode deployment.

Numan Siddique (3):
  ovn-fields: Add icmp4.inner_ip4.src to rewrite ICMP inner source.
  pinctrl: Implement put_icmp4_inner_ip4_src action.
  northd: Emit inner-IP rewrite flow for stateless DNAT.

 Documentation/ref/ovn-logical-flows.7.rst |  29 ++++++
 NEWS                                      |   7 ++
 controller/pinctrl.c                      |  92 +++++++++++++++++
 include/ovn/actions.h                     |   8 +-
 include/ovn/logical-fields.h              |   9 ++
 lib/actions.c                             |  38 ++++++-
 lib/logical-fields.c                      |  11 +-
 northd/northd.c                           |  33 ++++++
 ovn-nb.xml                                |  35 +++++++
 ovn-sb.xml                                |  27 ++++-
 tests/multinode.at                        |  73 ++++++++++++++
 tests/ovn-northd.at                       |  18 +++-
 tests/ovn.at                              | 116 +++++++++++++++++++++-
 utilities/ovn-trace.c                     |   6 ++
 14 files changed, 493 insertions(+), 9 deletions(-)

-- 
2.54.0

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
