On 4/4/24 14:46, David Marchand wrote:
> On Wed, Apr 3, 2024 at 8:13 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>> - This patch fixes some misusage of the DPDK API.
>>
>> Hmm, I understand that the driver does something funny when it gets
>> outer flags set without any inner flags, but how is that a misuse
>> of the DPDK API?  Could you point me to the API docs that say that
>> inner flags must always be set in this case or that setting only
>> outer offloads is not allowed?
> 
> Setting the tunnel type (which is set along outer checksum in OVS) is
> described as:
> 
> /**
>  * Bits 45:48 used for the tunnel type.
>  * The tunnel type must be specified for TSO or checksum on the inner part
>  * of tunnel packets.
>  * These flags can be used with RTE_MBUF_F_TX_TCP_SEG for TSO, or
>  * RTE_MBUF_F_TX_xxx_CKSUM.
>  * The mbuf fields for inner and outer header lengths are required:
>  * outer_l2_len, outer_l3_len, l2_len, l3_len, l4_len and tso_segsz for TSO.
>  */
> #define RTE_MBUF_F_TX_TUNNEL_VXLAN   (0x1ULL << 45)
> #define RTE_MBUF_F_TX_TUNNEL_GRE     (0x2ULL << 45)
> #define RTE_MBUF_F_TX_TUNNEL_IPIP    (0x3ULL << 45)
> #define RTE_MBUF_F_TX_TUNNEL_GENEVE  (0x4ULL << 45)
> /** TX packet with MPLS-in-UDP RFC 7510 header. */
> #define RTE_MBUF_F_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
> #define RTE_MBUF_F_TX_TUNNEL_VXLAN_GPE (0x6ULL << 45)
> #define RTE_MBUF_F_TX_TUNNEL_GTP       (0x7ULL << 45)
> #define RTE_MBUF_F_TX_TUNNEL_ESP       (0x8ULL << 45)
> 
> It is not specified what to expect it neither TSO nor inner checksum
> is requested.

I can agree on that part that specifying the tunnel type while not
requesting any inner offloads is kind of undefined.

However, this is counter-intuitive that providing more of correct
information about a packet makes the driver freak out.

While we should work around this, I would still not call it a misuse.

DPDK needs to define the behavior for this case.  Two options are:

a. Document that tunnel type must not be set if inner offloads
   are not requested.
b. Fix the affected drivers to not break the packets and document
   that any valid informational flags can be set in ol_flags.

Option 'a' sounds like a poor API design though.

BTW, why the outer checksums are documented as inner:
 https://doc.dpdk.org/guides/nics/features.html#inner-l3-checksum
 https://doc.dpdk.org/guides/nics/features.html#inner-l4-checksum
?

> 
> In a same way, it is not described what to expect if outer API is
> called with no inner offload.

I disagree on that one.

/**
 * Outer UDP checksum offload flag. This flag is used for enabling
 * outer UDP checksum in PMD. To use outer UDP checksum, the user needs to
 * 1) Enable the following in mbuf,
 * a) Fill outer_l2_len and outer_l3_len in mbuf.
 * b) Set the RTE_MBUF_F_TX_OUTER_UDP_CKSUM flag.
 * c) Set the RTE_MBUF_F_TX_OUTER_IPV4 or RTE_MBUF_F_TX_OUTER_IPV6 flag.
 * 2) Configure RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM offload flag.
 */
#define RTE_MBUF_F_TX_OUTER_UDP_CKSUM     (1ULL << 41)

Reading this and other documentation in the file I would expect the
following two sets of flags to be equivalent on a card that supports
tunnel offloads:

 - RTE_MBUF_F_TX_OUTER_IPV4|RTE_MBUF_F_TX_OUTER_UDP_CKSUM with the
   outer_l2_len and outer_l3_len filled in.

 - RTE_MBUF_F_TX_IPV4|RTE_MBUF_F_TX_UDP_CKSUM with
   l2_len and l3_len filled in.

If these are not equivalent, that sounds like a bug in DPDK.

BTW, the l2_len field description is kind of vague:

            uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
            /**< L2 (MAC) Header Length for non-tunneling pkt.
             * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
             */

Reading other comments, it seems that setting any "outer" bits
in ol_flags constitutes a 'tunneling pkt', but it's not said explicitly
in this particular comment.

> Adding Ferruh and Thomas who may have one opinion.
> 
> 
>>
>> I agree that it seems safer to just downgrade all outer flags to
>> inner ones on OVS side, when no inner offloads are requested, I'm
>> just not sure I agree that it's an API misuse.  Especially since
>> non-Intel cards seem to work fine.
> 
> I suppose you mean mlx5.

Yes, I think it was tested with mlx5.

> Has it been tested on other nics?

I didn't test.  The original reporter doesn't have other cards, IIRC.

>>
>>> Basically, resolving a neighbor with ARP inside tunnels is broken on
>>> Intel nics (even without re-enabling outer udp checksum).
>>> This can be observed with the following debug patch at the end of
>>> netdev_dpdk_prep_hwol_packet():
>>>
>>> +    char buf[256];
>>> +    if (rte_get_tx_ol_flag_list(mbuf->ol_flags, buf, sizeof(buf)) != 0)
>>> +        buf[0] = '\0';
>>> +    VLOG_WARN("len=%u, ol_flags=%s, outer_l2_len=%u, outer_l3_len=%u,
>>> l2_len=%u, l3_len=%u, l4_len=%u, tso_segsz=%u", mbuf->pkt_len, buf,
>>> mbuf->outer_l2_len, mbuf->outer_l3_len, mbuf->l2_len, mbuf->l3_len,
>>> mbuf->l4_len, mbuf->tso_segsz);
>>>
>>> Then doing a "arping" inside the tunnel triggers:
>>> 2024-04-03T16:05:40.920Z|00014|netdev_dpdk(pmd-c03/id:8)|WARN|len=96,
>>> ol_flags=RTE_MBUF_F_TX_L4_NO_CKSUM RTE_MBUF_F_TX_OUTER_IP_CKSUM
>>> RTE_MBUF_F_TX_OUTER_IPV4 RTE_MBUF_F_TX_TUNNEL_VXLAN , outer_l2_len=18,
>>> outer_l3_len=20, l2_len=0, l3_len=0, l4_len=0, tso_segsz=0

The fact that l2_len and l3_len are not set here looks like an OVS
bug though, as AFAIU, these should always be set if any Tx offload
is requested.

>>> 2024-04-03T16:05:40.920Z|00012|dpdk|WARN|ice_interrupt_handler():
>>> OICR: MDD event
>>>
>>> We need this fix in OVS regardless of the outer udp checksum issue.
>>> I'll respin this fix in a new series, without touching UDP checksum capa.
>>>
>>>
>>> - It does seem that X710 nics have no support for outer udp checksum
>>> (according to its datasheet). Some X722 version may have support for
>>> it, but net/i40e does not configure the Tx descriptor accordingly.
>>> On the other hand, E810 ones seem fine (according to its datasheet).
>>>
>>> After more debugging, I managed to get outer udp checksum working.
>>> I understand the DPDK rte_net_intel_cksum_flags_prepare() helper does
>>> not set the pseudo header checksum in the outer udp header.
>>> I proposed a fix in the dpdk bz.
>>>
>>> Waiting for the fix on DPDK side... it is still possible to add the
>>> missing bits in OVS (see the branch I pointed at in the OVS issue).
>>
>> Since this feature never worked with ice in OVS and it is experimental,
>> I tend to think that we should just disable it for ice as well until
>> DPDK is fixed.
>>
>> A little too many fixes for that thing we have already and this one will
>> involve some extra driver-specific logic that we don't have any automated
>> tests for.
> 
> I don't mind waiting for the DPDK fix before re-enabling outer udp and
> other offloads.
Just to be clear, I think we need:
 - a fix to downgrade outer offloads to inner ones in cases where no
   inner offloads requested.
 - keep manually disabling tunnel offloads for Intel NICs, adding iavf
   to the list.

But we should not:
 - add pseudo-header checksum calculations to OVS to workaround the
   driver behavior.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to