On Mon, Jun 15, 2026 at 8:40 AM Ilya Maximets <[email protected]> wrote:

> On 6/15/26 2:36 PM, Ilya Maximets wrote:
> > On 6/13/26 1:25 AM, [email protected] wrote:
> >> From: Numan Siddique <[email protected]>
> >>
> >> Implement the controller-side handler for the new OVN action
> >> 'icmp4.inner_ip4.src = <value>' introduced in the previous patch.
> >>
> >> When ovn-controller receives a packet with action opcode
> >> ACTION_OPCODE_PUT_ICMP4_INNER_IP4_SRC, it now:
> >
> > Not a full review, but a couple comments below.
>

Thank you Ilya for the valuable comments.

I'll address your comments in v2.

Numan



> >
> > Best regards, Ilya Maximets.
> >
> >>
> >>   - Locates the inner IPv4 header inside the ICMP error payload
> >>     (right after the 8-byte outer ICMP header).
> >>   - Validates that the ICMP payload is long enough to contain
> >>     the inner IPv4 header (respecting its IHL, so IP options
> >>     are handled).
> >>   - Rewrites the inner IPv4 source address to the value
> >>     supplied in the userdata.
> >>   - Recomputes the inner IPv4 header checksum.
> >
> > Why a full recompute instead of incremental recalculation?  Normally,
> > we shouldn't fix incorrect checksums, but the full recalculation will.
> >
> > Also, the inner packet likely includes L4 headers as well and the IP
> > header change will break the L4 checksum.  Not sure how important that
> > is, but should at least be noted in the commit message and in the code
> > that changes the data.
> >
> >>   - Incrementally updates the outer ICMP checksum to reflect
> >>     both the 4-byte inner-src change and the 2-byte inner
> >>     IPv4-csum change (the outer ICMP csum covers the entire
> >>     ICMP payload, including the inner IP header).
> >
> > Do we even need this?  ICMP checksum covers both the inner IP checksum
> and
> > the inner IP source address.  Summing the entire inner header should give
> > you zero or a constant value in case the checksum was initially wrong.
> > So, changes in the inner IP header should not affect the ICMP checksum,
> or
> > do they?
> >
> >>   - Resumes the OpenFlow pipeline with the rewritten packet.
> >>
> >> This is the building block needed to make ICMP fragmentation-
> >> needed errors (RFC 1191 PMTU discovery) work correctly when an
> >> OVN logical router uses stateless NAT.  With stateless NAT, the
> >> inner payload of an incoming ICMP error still carries the
> >> external (post-NAT) IPv4 source, so conntrack in the VM's
> >> logical switch zone fails to match the tracked outgoing flow
> >> and marks the packet as +inv.  By rewriting the inner source
> >> back to the logical IP in the LR pipeline before the LS
> >> pipeline runs, conntrack can correlate the error with the
> >> original flow and the VM can honor the PMTUD signal.
> >>
> >> A follow-up patch will add the corresponding ovn-northd logical
> >> flow that emits this action for each stateless NAT configured
> >> on a logical router.
> >>
> >> Assisted-by: Claude Opus 4.7, Claude Code
> >> Signed-off-by: Numan Siddique <[email protected]>
> >> ---
> >>  controller/pinctrl.c | 92 ++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 92 insertions(+)
> >>
> >> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> >> index bb8d20e7f4..93e8d7417b 100644
> >> --- a/controller/pinctrl.c
> >> +++ b/controller/pinctrl.c
> >> @@ -256,6 +256,10 @@ static void pinctrl_handle_nd_ns(struct rconn
> *swconn,
> >>                                   const struct ofputil_packet_in *pin,
> >>                                   struct ofpbuf *userdata,
> >>                                   const struct ofpbuf *continuation);
> >> +static void pinctrl_handle_put_icmp4_inner_ip4_src(
> >> +    struct rconn *swconn, const struct flow *in_flow,
> >> +    struct dp_packet *pkt_in, struct ofputil_packet_in *pin,
> >> +    struct ofpbuf *userdata, struct ofpbuf *continuation);
> >>  static void
> >>  pinctrl_handle_event(struct ofpbuf *userdata)
> >>      OVS_REQUIRES(pinctrl_mutex);
> >> @@ -3928,6 +3932,10 @@ process_packet_in(struct rconn *swconn, const
> struct ofp_header *msg)
> >>          break;
> >>      }
> >>
> >> +    case ACTION_OPCODE_PUT_ICMP4_INNER_IP4_SRC:
> >> +        pinctrl_handle_put_icmp4_inner_ip4_src(swconn, &headers,
> &packet, &pin,
> >> +                                               &userdata,
> &continuation);
> >> +        break;
> >>      default:
> >>          VLOG_WARN_RL(&rl, "unrecognized packet-in opcode %"PRIu32,
> >>                       ntohl(ah->opcode));
> >> @@ -6654,6 +6662,90 @@ exit:
> >>      dp_packet_uninit(pkt_out_ptr);
> >>  }
> >>
> >> +/* Called with in the pinctrl_handler thread context. */
> >> +static void
> >> +pinctrl_handle_put_icmp4_inner_ip4_src(struct rconn *swconn,
> >> +                                       const struct flow *in_flow,
> >> +                                       struct dp_packet *pkt_in,
> >> +                                       struct ofputil_packet_in *pin,
> >> +                                       struct ofpbuf *userdata,
> >> +                                       struct ofpbuf *continuation)
> >> +{
> >> +    enum ofp_version version = rconn_get_version(swconn);
> >> +    enum ofputil_protocol proto =
> ofputil_protocol_from_ofp_version(version);
> >> +    struct dp_packet *pkt_out = NULL;
> >> +
> >> +    /* This action only works for ICMPv4 packets. */
> >> +    if (!is_icmpv4(in_flow, NULL)) {
> >> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> >> +        VLOG_WARN_RL(&rl,
> >> +                     "put_icmp4_inner_ip4_src action on non-ICMPv4
> packet");
> >> +        goto exit;
> >> +    }
> >> +
> >> +    ovs_be32 *ip4_src = ofpbuf_try_pull(userdata, sizeof *ip4_src);
> >> +    if (!ip4_src) {
> >> +        goto exit;
> >> +    }
> >> +
> >> +    pkt_out = dp_packet_clone(pkt_in);
> >> +    pkt_out->l2_5_ofs = pkt_in->l2_5_ofs;
> >> +    pkt_out->l2_pad_size = pkt_in->l2_pad_size;
> >> +    pkt_out->l3_ofs = pkt_in->l3_ofs;
> >> +    pkt_out->l4_ofs = pkt_in->l4_ofs;
> >
> > Clone copies all of these.
> >
> >> +
> >> +    struct icmp_header *ih = dp_packet_l4(pkt_out);
> >> +
> >> +    /* The inner IPv4 header lives in the ICMP payload, right after the
> >> +     * 8-byte ICMP header. */
> >> +    struct ip_header *inner_ip = ALIGNED_CAST(struct ip_header *,
> >> +                                              (char *) ih + sizeof
> *ih);
> >
> > AFAIR, shifting a pointer outside of the memory block triggers an
> undefined
> > behavior.  I think, all the checks below need to be re-structured in a
> way
> > that we don't have such pointers.
> >
> > E.g. compare tail - known-good-pointer > struct size, instead of
> >              known-good-pointer + struct size > tail.
> >
> > In the current code in this patch the inner_ip is not even a known good
> pointer.
> >
>
> May also want to check that it's not a later fragment, just in case.
>
> >> +    const char *pkt_end = dp_packet_tail(pkt_out);
> >> +    if ((const char *)(inner_ip + 1) > pkt_end) {
> >> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> >> +        VLOG_WARN_RL(&rl, "put_icmp4_inner_ip4_src: ICMP payload too
> short "
> >> +                     "to contain an inner IPv4 header");
> >> +        goto exit;
> >> +    }
> >> +
> >> +    size_t inner_ihl_bytes = IP_IHL(inner_ip->ip_ihl_ver) * 4;
> >> +    if (inner_ihl_bytes < sizeof *inner_ip ||
> >> +        (const char *) inner_ip + inner_ihl_bytes > pkt_end) {
> >> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> >> +        VLOG_WARN_RL(&rl, "put_icmp4_inner_ip4_src: inner IPv4 header "
> >> +                     "length (%"PRIuSIZE") invalid", inner_ihl_bytes);
> >> +        goto exit;
> >> +    }
> >> +
> >> +    /* Save old values so we can incrementally update the outer ICMP
> >> +     * checksum, which covers the entire ICMP payload (including the
> inner
> >> +     * IP header). */
> >> +    ovs_be32 old_inner_ip_src = get_16aligned_be32(&inner_ip->ip_src);
> >> +    ovs_be16 old_inner_ip_csum = inner_ip->ip_csum;
> >> +
> >> +    /* Rewrite the inner source and recompute the inner IP checksum
> >> +     * over its full IHL (including any options). */
> >> +    put_16aligned_be32(&inner_ip->ip_src, *ip4_src);
> >> +    inner_ip->ip_csum = 0;
> >> +    inner_ip->ip_csum = csum(inner_ip, inner_ihl_bytes);
> >> +
> >> +    /* Fold both deltas (inner src, inner csum) into the outer ICMP
> csum. */
> >> +    ih->icmp_csum = recalc_csum32(ih->icmp_csum, old_inner_ip_src,
> *ip4_src);
> >> +    ih->icmp_csum = recalc_csum16(ih->icmp_csum, old_inner_ip_csum,
> >> +                                  inner_ip->ip_csum);
> >> +
> >> +    /* The outer IPv4 header is unchanged, so its checksum stays
> valid. */
> >> +
> >> +    pin->packet = dp_packet_data(pkt_out);
> >> +    pin->packet_len = dp_packet_size(pkt_out);
> >> +
> >> +exit:
> >> +    queue_msg(swconn, ofputil_encode_resume(pin, continuation, proto));
> >> +    if (pkt_out) {
> >> +        dp_packet_delete(pkt_out);
> >> +    }
> >> +}
> >> +
> >>  static void
> >>  wait_controller_event(void)
> >>  {
> >
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to