Re: [ovs-dev] [PATCH v3 5/6] netdev-dpdk: Use guest TSO segmentation size hint.

2024-05-16 Thread David Marchand
On Wed, May 15, 2024 at 2:09 PM Kevin Traynor  wrote:
>
> On 19/04/2024 15:06, David Marchand wrote:
> > In a typical setup like:
> > guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B
> >
> > TSO packets from guest A are segmented against the OVS A physical port
> > mtu adjusted by the vxlan tunnel header size, regardless of guest A
> > interface mtu.
> >
> > As an example, let's say guest A and guest B mtu are set to 1500 bytes.
> > OVS A and OVS B physical ports mtu are set to 1600 bytes.
> > Guest A will request TCP segmentation for 1448 bytes segments.
> > On the other hand, OVS A will request 1498 bytes segments to the HW.
> > This results in OVS B dropping packets because decapsulated packets
> > are larger than the vhost-user port (serving guest B) mtu.
> >
> > 2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
> >   Too big size 1564 max_packet_len 1518
> >
> > vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
> > Use it as a hint.
> >
> > This may result in segments (on the wire) slightly shorter than the
> > optimal size.
> >
> > Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
> > Signed-off-by: David Marchand 
> > ---
> > Note:
> > As we trust the guest with this change, should we put a lower limit on
> > mbuf->tso_segsz?
> >
>
> There are some checks I looked at (e.g. [0]), but it could be checked
> here for an earlier drop, I suppose... additional comment below
>
> [0]
> https://git.dpdk.org/dpdk/tree/drivers/net/ice/ice_rxtx.c#n3754

I guess you meant
https://git.dpdk.org/dpdk/tree/drivers/net/ice/ice_rxtx.c#n3818
And the same in v23.11; there are checks at the prepare stage:
https://git.dpdk.org/dpdk-stable/tree/drivers/net/ice/ice_rxtx.c?h=23.11#n3681

I had forgotten about those checks.
DPDK does not expose any per-driver limit, so the simplest option for
OVS is to trust the drivers.
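
To illustrate, those prepare-stage checks have roughly the shape below.
This is only a sketch: example_tx_prepare and the min/max values are
stand-ins for the driver's internal logic and limits (e.g. the ice
driver's TSO MSS bounds), not actual DPDK code.

#include <errno.h>
#include <rte_errno.h>
#include <rte_mbuf.h>

/* Sketch of a driver's Tx prepare callback rejecting out-of-range
 * TSO segment sizes; the bounds are illustrative. */
static uint16_t
example_tx_prepare(struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
    const uint16_t min_tso_mss = 64;    /* assumed lower bound */
    const uint16_t max_tso_mss = 9728;  /* assumed upper bound */

    for (uint16_t i = 0; i < nb_pkts; i++) {
        struct rte_mbuf *m = tx_pkts[i];

        if ((m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
            && (m->tso_segsz < min_tso_mss
                || m->tso_segsz > max_tso_mss)) {
            rte_errno = EINVAL;  /* caller stops before packet i */
            return i;
        }
    }
    return nb_pkts;
}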


>
> > ---
> >  lib/netdev-dpdk.c | 11 ---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 661269e4b6..1dad2ef833 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -2671,14 +2671,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
> >
> >  if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> >  struct tcp_header *th = dp_packet_l4(pkt);
> > +uint16_t link_tso_segsz;
> >  int hdr_len;
> >
> >  if (tunnel_type) {
> > -mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
> > -  mbuf->l4_len - mbuf->outer_l3_len;
> > +link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
> > + mbuf->l4_len - mbuf->outer_l3_len;
> >  } else {
> >  mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
> > -mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
> > +link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
> > +}
> > +
> > +if (!mbuf->tso_segsz || mbuf->tso_segsz > link_tso_segsz) {
>
> It seems like something is not right if the flag is set but tso_segsz is
> 0. It is set by the vhost lib from gso_size, but I don't see any
> validation there either.

When I added the check on the 0 value, I thought there was a case
where RTE_MBUF_F_TX_TCP_SEG could be set without a segsz value.
But as you mention, all setters of this flag (whether in vhost or in
OVS) set a segsz too.

So with segsz always set, combined with the drivers' checks, OVS
probably does not need any check on tso_segsz.
I intend to remove this check in the next revision.
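
With the zero check gone, the hunk would reduce to a plain clamp,
something like:

    /* Trust the guest-provided hint; only clamp it down to what the
     * local link mtu allows. */
    if (mbuf->tso_segsz > link_tso_segsz) {
        mbuf->tso_segsz = link_tso_segsz;
    }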


-- 
David Marchand



Re: [ovs-dev] [PATCH v3 5/6] netdev-dpdk: Use guest TSO segmentation size hint.

2024-05-15 Thread Kevin Traynor
On 19/04/2024 15:06, David Marchand wrote:
> In a typical setup like:
> guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B
> 
> TSO packets from guest A are segmented against the OVS A physical port
> mtu adjusted by the vxlan tunnel header size, regardless of guest A
> interface mtu.
> 
> As an example, let's say guest A and guest B mtu are set to 1500 bytes.
> OVS A and OVS B physical ports mtu are set to 1600 bytes.
> Guest A will request TCP segmentation for 1448 bytes segments.
> On the other hand, OVS A will request 1498 bytes segments to the HW.
> This results in OVS B dropping packets because decapsulated packets
> are larger than the vhost-user port (serving guest B) mtu.
> 
> 2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
>   Too big size 1564 max_packet_len 1518
> 
> vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
> Use it as a hint.
> 
> This may result in segments (on the wire) slightly shorter than the
> optimal size.
> 
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
> Signed-off-by: David Marchand 
> ---
> Note:
> As we trust the guest with this change, should we put a lower limit on
> mbuf->tso_segsz?
> 

There are some checks I looked at (e.g. [0]), but it could be checked
here for an earlier drop, I suppose... additional comment below

[0]
https://git.dpdk.org/dpdk/tree/drivers/net/ice/ice_rxtx.c#n3754

> ---
>  lib/netdev-dpdk.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 661269e4b6..1dad2ef833 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2671,14 +2671,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
>  
>  if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
>  struct tcp_header *th = dp_packet_l4(pkt);
> +uint16_t link_tso_segsz;
>  int hdr_len;
>  
>  if (tunnel_type) {
> -mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
> -  mbuf->l4_len - mbuf->outer_l3_len;
> +link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
> + mbuf->l4_len - mbuf->outer_l3_len;
>  } else {
>  mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
> -mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
> +link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
> +}
> +
> +if (!mbuf->tso_segsz || mbuf->tso_segsz > link_tso_segsz) {

It seems like something is not right if the flag is set but tso_segsz is
0. It is set by the vhost lib from gso_size, but I don't see any
validation there either.

So it could be checked and/or adjusted here for obviously incorrect
values. The only question is whether this is the right layer for these
checks, or whether it is more appropriate to do them in the drivers.
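
For reference, the vhost library derives these fields from the
guest-supplied virtio-net header along the lines of the simplified
sketch below (example_dequeue_offload is illustrative, not the actual
DPDK dequeue-offload code):

#include <linux/virtio_net.h>
#include <rte_mbuf.h>

/* Simplified sketch: on dequeue, the guest's virtio-net header GSO
 * fields are copied into mbuf metadata without range validation, so
 * tso_segsz is entirely guest-controlled. */
static void
example_dequeue_offload(const struct virtio_net_hdr *hdr,
                        struct rte_mbuf *m)
{
    switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
    case VIRTIO_NET_HDR_GSO_TCPV4:
    case VIRTIO_NET_HDR_GSO_TCPV6:
        m->ol_flags |= RTE_MBUF_F_TX_TCP_SEG;
        m->tso_segsz = hdr->gso_size;  /* guest value: may be 0 or huge */
        break;
    default:
        break;
    }
}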

> +mbuf->tso_segsz = link_tso_segsz;
>  }
>  
>  hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;



[ovs-dev] [PATCH v3 5/6] netdev-dpdk: Use guest TSO segmentation size hint.

2024-04-19 Thread David Marchand
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
Too big size 1564 max_packet_len 1518
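
Spelling out the arithmetic behind these numbers (assuming IPv4, a TCP
header carrying 12 bytes of options such as timestamps, and that the
extra 4 bytes in max_packet_len are the Ethernet CRC allowance, as in
OVS's MTU_TO_FRAME_LEN):

enum {
    ETHER_HDR = 14, ETHER_CRC = 4, IPV4_HDR = 20, TCP_HDR = 20 + 12,
    /* On the tunnel path, l2_len spans UDP + VXLAN + inner Ethernet. */
    TUNNEL_L2 = 8 + 8 + ETHER_HDR,
};

int guest_mss      = 1500 - IPV4_HDR - TCP_HDR;              /* 1448 */
int ovs_a_segsz    = 1600 - TUNNEL_L2 - IPV4_HDR - TCP_HDR   /* l2/l3/l4 */
                          - IPV4_HDR;                        /* outer l3: 1498 */
int decapped_frame = 1498 + IPV4_HDR + TCP_HDR + ETHER_HDR;  /* 1564 */
int vhost0_max_len = 1500 + ETHER_HDR + ETHER_CRC;           /* 1518 */
/* 1564 > 1518, hence the drop. Honouring the guest hint makes OVS A
 * segment at 1448 instead, ~50 bytes below the 1498 link optimum. */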

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
Signed-off-by: David Marchand 
---
Note:
As we trust the guest with this change, should we put a lower limit on
mbuf->tso_segsz?

---
 lib/netdev-dpdk.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 661269e4b6..1dad2ef833 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2671,14 +2671,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
 
 if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
 struct tcp_header *th = dp_packet_l4(pkt);
+uint16_t link_tso_segsz;
 int hdr_len;
 
 if (tunnel_type) {
-mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
-  mbuf->l4_len - mbuf->outer_l3_len;
+link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
+ mbuf->l4_len - mbuf->outer_l3_len;
 } else {
 mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
-mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+}
+
+if (!mbuf->tso_segsz || mbuf->tso_segsz > link_tso_segsz) {
+mbuf->tso_segsz = link_tso_segsz;
 }
 
 hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
-- 
2.44.0
