Re: [ovs-dev] [PATCH v5 01/14] netdev-dpdk: fix mbuf sizing

2018-07-12 Thread Lam, Tiago
On 12/07/2018 17:07, Ian Stokes wrote:
> On 7/12/2018 4:40 PM, Lam, Tiago wrote:
>> On 12/07/2018 14:37, Ian Stokes wrote:
>>> On 7/11/2018 7:23 PM, Tiago Lam wrote:
 From: Mark Kavanagh 

 There are numerous factors that must be considered when calculating
 the size of an mbuf:
 - the data portion of the mbuf must be sized in accordance with Rx
 buffer alignment (typically 1024B). So, for example, in order to
 successfully receive and capture a 1500B packet, mbufs with a
 data portion of size 2048B must be used.
 - in OvS, the elements that comprise an mbuf are:
 * the dp_packet, which includes a struct rte_mbuf (704B)
 * RTE_PKTMBUF_HEADROOM (128B)
 * packet data (aligned to 1k, as previously described)
 * RTE_PKTMBUF_TAILROOM (typically 0)

 Some PMDs require that the total mbuf size (i.e. the total sum of all
 of the above-listed components' lengths) is cache-aligned. To satisfy
 this requirement, it may be necessary to round up the total mbuf size
 with respect to cacheline size. In doing so, it's possible that the
 dp_packet's data portion is inadvertently increased in size, such that
 it no longer adheres to Rx buffer alignment. Consequently, the
 following property of the mbuf no longer holds true:

   mbuf.data_len == mbuf.buf_len - mbuf.data_off

 This creates a problem in the case of multi-segment mbufs, where that
 property is assumed to hold true for all but the final segment in an
 mbuf chain. Resolve this issue by adjusting the size of the mbuf's
 private data portion, as opposed to the packet data portion, when
 aligning mbuf size to cachelines.
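
To make the arithmetic above concrete, here is a minimal standalone
sketch of the sizing logic. The constants are the typical values quoted
above and are assumptions, not universal; ROUND_UP mirrors OVS's macro:

    #include <stdint.h>
    #include <stdio.h>

    #define ROUND_UP(x, y) ((((x) + (y) - 1) / (y)) * (y))

    /* Typical values from the commit message; real values vary by build. */
    #define DP_PACKET_SIZE    704   /* struct dp_packet, incl. rte_mbuf */
    #define PKTMBUF_HEADROOM  128   /* RTE_PKTMBUF_HEADROOM */
    #define RX_BUF_ALIGN     1024   /* Rx buffer alignment */
    #define CACHE_LINE_SIZE    64

    int
    main(void)
    {
        uint32_t mtu = 1500;
        uint32_t frame_len = mtu + 18; /* Ethernet header + CRC */

        /* Data portion rounded up to Rx buffer alignment: 1518B -> 2048B. */
        uint32_t data_len = ROUND_UP(frame_len, RX_BUF_ALIGN);

        /* Total mbuf size; some PMDs require this to be cache-aligned. */
        uint32_t total = DP_PACKET_SIZE + PKTMBUF_HEADROOM + data_len;
        uint32_t pad = ROUND_UP(total, CACHE_LINE_SIZE) - total;

        /* The fix folds `pad` into the private (dp_packet) area rather
         * than the data portion, so data_len keeps its Rx alignment.
         * (With these example constants pad happens to be 0; the fix
         * matters whenever the sum is not already cache-aligned.) */
        printf("data_len=%uB, pad folded into private area=%uB\n",
               data_len, pad);
        return 0;
    }
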
>>>
>>> Hi Tiago,
>>>
>>> With this patch I still don't see mbuf.data_len == mbuf.buf_len -
>>> mbuf.data_off holding true.
>>>
>>> I've tested with both jumbo frames and non-jumbo packets by examining
>>> the mbufs on both Tx and Rx. mbuf.data_len is always smaller than
>>> mbuf.buf_len - mbuf.data_off.
>>>
>>> Maybe I've missed something here?
>>>
>>
>> Thanks for looking into this, Ian.
>>
>> `mbuf.data_len == mbuf.buf_len - mbuf.data_off` isn't always true.
>> Actually, `mbuf.data_len <= mbuf.buf_len - mbuf.data_off` would be a
>> better representation.
>>
>> If there's a chain of mbufs that are linked together, the expectation is
>> that `mbuf.data_len == mbuf.buf_len - mbuf.data_off` holds true for all
>> of them, except maybe for the last in the chain, since there may not be
>> enough data to fill the whole mbuf.
>>
>> So, for non-jumbo frames I would expect `data_len < buf_len -
>> data_off`, but for jumbo frames I'd expect that to happen only on the
>> last mbuf in the chain, and for the rest we should see `data_len ==
>> buf_len - data_off` hold true. Is that what you're seeing here?
>>
> 
> I had tested this with jumbo frames (MTU 9000) with just this patch
> applied; in that case I was seeing 'data_len < buf_len - data_off'.
> 
> Is this because it's 1 mbuf for the entire 9000B? If it were segmented,
> then the segments would be of equal size apart from the final mbuf.
> 
> Ian
> 

Ah, yes. If you had just this patch applied then you wouldn't have
multi-segment mbufs enabled (that only comes in patch 11/14), in which
case the mbufs are still sized to fit the entire packet. Most likely
the data you're sending is < 9000B, hence `data_len < buf_len -
data_off`.

In other words, this patch doesn't change any functionality: mbufs are
still sized to hold the maximum MTU that is set.
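
For reference, the per-mbuf sizing boils down to dpdk_buf_size(), which
the patch calls when creating the mempool. A simplified sketch, reusing
the macros visible in the patch context (an approximation, not
necessarily the exact tree code):

    /* Size each mbuf's data room for the full MTU, rounded up to the Rx
     * buffer alignment (NETDEV_DPDK_MBUF_ALIGN, 1024B), plus headroom. */
    static uint32_t
    dpdk_buf_size(int mtu)
    {
        return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu), NETDEV_DPDK_MBUF_ALIGN)
               + RTE_PKTMBUF_HEADROOM;
    }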

Tiago.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 01/14] netdev-dpdk: fix mbuf sizing

2018-07-12 Thread Ian Stokes

On 7/12/2018 4:40 PM, Lam, Tiago wrote:

On 12/07/2018 14:37, Ian Stokes wrote:

On 7/11/2018 7:23 PM, Tiago Lam wrote:

From: Mark Kavanagh 

There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx
buffer alignment (typically 1024B). So, for example, in order to
successfully receive and capture a 1500B packet, mbufs with a
data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
* the dp_packet, which includes a struct rte_mbuf (704B)
* RTE_PKTMBUF_HEADROOM (128B)
* packet data (aligned to 1k, as previously described)
* RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:

  mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold true for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion, when
aligning mbuf size to cachelines.


Hi Tiago,

With this patch I still don't see mbuf.data_len == mbuf.buf_len -
mbuf.data_off holding true.

I've tested with both jumbo frames and non-jumbo packets by examining
the mbufs on both Tx and Rx. mbuf.data_len is always smaller than
mbuf.buf_len - mbuf.data_off.

Maybe I've missed something here?



Thanks for looking into this, Ian.

`mbuf.data_len == mbuf.buf_len - mbuf.data_off` isn't always true.
Actually, `mbuf.data_len <= mbuf.buf_len - mbuf.data_off` would be a
better representation.

If there's a chain of mbufs that are linked together, the expectation is
that `mbuf.data_len == mbuf.buf_len - mbuf.data_off` holds true for all
of them, except maybe for the last in the chain, since there may not be
enough data to fill the whole mbuf.

So, for non-jumbo frames I would expect `data_len < buf_len -
data_off`, but for jumbo frames I'd expect that to happen only on the
last mbuf in the chain, and for the rest we should see `data_len ==
buf_len - data_off` hold true. Is that what you're seeing here?



I had tested this with jumbo frames (MTU 9000) with just this patch
applied; in that case I was seeing 'data_len < buf_len - data_off'.


Is this because it's 1 mbuf for the entire 9000B? If it were segmented,
then the segments would be of equal size apart from the final mbuf.


Ian



Regards,
Tiago.



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 01/14] netdev-dpdk: fix mbuf sizing

2018-07-12 Thread Lam, Tiago
On 12/07/2018 14:37, Ian Stokes wrote:
> On 7/11/2018 7:23 PM, Tiago Lam wrote:
>> From: Mark Kavanagh 
>>
>> There are numerous factors that must be considered when calculating
>> the size of an mbuf:
>> - the data portion of the mbuf must be sized in accordance with Rx
>>buffer alignment (typically 1024B). So, for example, in order to
>>successfully receive and capture a 1500B packet, mbufs with a
>>data portion of size 2048B must be used.
>> - in OvS, the elements that comprise an mbuf are:
>>    * the dp_packet, which includes a struct rte_mbuf (704B)
>>* RTE_PKTMBUF_HEADROOM (128B)
>>* packet data (aligned to 1k, as previously described)
>>* RTE_PKTMBUF_TAILROOM (typically 0)
>>
>> Some PMDs require that the total mbuf size (i.e. the total sum of all
>> of the above-listed components' lengths) is cache-aligned. To satisfy
>> this requirement, it may be necessary to round up the total mbuf size
>> with respect to cacheline size. In doing so, it's possible that the
>> dp_packet's data portion is inadvertently increased in size, such that
>> it no longer adheres to Rx buffer alignment. Consequently, the
>> following property of the mbuf no longer holds true:
>>
>>  mbuf.data_len == mbuf.buf_len - mbuf.data_off
>>
>> This creates a problem in the case of multi-segment mbufs, where that
>> property is assumed to hold true for all but the final segment in an
>> mbuf chain. Resolve this issue by adjusting the size of the mbuf's
>> private data portion, as opposed to the packet data portion, when
>> aligning mbuf size to cachelines.
> 
> Hi Tiago,
> 
> With this patch I still don't see mbuf.data_len == mbuf.buf_len -
> mbuf.data_off holding true.
> 
> I've tested with both jumbo frames and non-jumbo packets by examining
> the mbufs on both Tx and Rx. mbuf.data_len is always smaller than
> mbuf.buf_len - mbuf.data_off.
> 
> Maybe I've missed something here?
> 

Thanks for looking into this, Ian.

`mbuf.data_len == mbuf.buf_len - mbuf.data_off` isn't always true.
Actually, `mbuf.data_len <= mbuf.buf_len - mbuf.data_off` would be a
better representation.

If there's a chain of mbufs that are linked together, the expectation is
that `mbuf.data_len == mbuf.buf_len - mbuf.data_off` holds true for all
of them, except maybe for the last in the chain, since there may not be
enough data to fill the whole mbuf.

So, for non-jumbo frames I would expect `data_len < buf_len -
data_off`, but for jumbo frames I'd expect that to happen only on the
last mbuf in the chain, and for the rest we should see `data_len ==
buf_len - data_off` hold true. Is that what you're seeing here?
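
Expressed as code, the property reads as follows: a minimal sketch,
assuming DPDK's struct rte_mbuf fields (data_len, buf_len, data_off,
next), that walks a segment chain and checks that every segment except
possibly the last is filled to capacity:

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /* True if data_len == buf_len - data_off holds for every segment
     * in the chain except possibly the last one. */
    static bool
    mbuf_chain_fully_packed(const struct rte_mbuf *m)
    {
        for (const struct rte_mbuf *seg = m; seg != NULL; seg = seg->next) {
            bool full = seg->data_len == seg->buf_len - seg->data_off;
            if (!full && seg->next != NULL) {
                return false; /* Only the final segment may be short. */
            }
        }
        return true;
    }

For a single-segment mbuf, next is NULL, so the check passes trivially
even when data_len < buf_len - data_off.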

Regards,
Tiago.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 01/14] netdev-dpdk: fix mbuf sizing

2018-07-12 Thread Ian Stokes

On 7/11/2018 7:23 PM, Tiago Lam wrote:

From: Mark Kavanagh 

There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx
   buffer alignment (typically 1024B). So, for example, in order to
   successfully receive and capture a 1500B packet, mbufs with a
   data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
   * the dp_packet, which includes a struct rte_mbuf (704B)
   * RTE_PKTMBUF_HEADROOM (128B)
   * packet data (aligned to 1k, as previously described)
   * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:

 mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold true for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion, when
aligning mbuf size to cachelines.


Hi Tiago,

With this patch I still don't see mbuf.data_len == mbuf.buf_len -
mbuf.data_off holding true.


I've tested with both jumbo frames and non-jumbo packets by examining
the mbufs on both Tx and Rx. mbuf.data_len is always smaller than
mbuf.buf_len - mbuf.data_off.


Maybe I've missed something here?



Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla 
Signed-off-by: Mark Kavanagh 
Acked-by: Santosh Shukla 
Acked-by: Eelco Chaudron 
---
  lib/netdev-dpdk.c | 56 +--
  1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bb4d60f..949b87b 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
  #define FRAME_LEN_TO_MTU(frame_len) ((frame_len)\
   - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu)  ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
- + sizeof(struct dp_packet) \
- + RTE_PKTMBUF_HEADROOM),   \
- RTE_CACHE_LINE_SIZE)
  #define NETDEV_DPDK_MBUF_ALIGN  1024
  #define NETDEV_DPDK_MAX_PKT_LEN 9728
  
@@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)

  char mp_name[RTE_MEMPOOL_NAMESIZE];
  const char *netdev_name = netdev_get_name(&dev->up);
  int socket_id = dev->requested_socket_id;
-uint32_t n_mbufs;
+uint32_t n_mbufs = 0;
+uint32_t mbuf_size = 0;
+uint32_t aligned_mbuf_size = 0;
+uint32_t mbuf_priv_data_len = 0;
+uint32_t pkt_size = 0;
  uint32_t hash = hash_string(netdev_name, 0);
  struct dpdk_mp *dmp = NULL;
  int ret;
@@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
  dmp->mtu = mtu;
  dmp->refcount = 1;
  
+/* Get the size of each mbuf, based on the MTU */
+mbuf_size = dpdk_buf_size(dev->requested_mtu);
+
  n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
  
  do {

@@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
   * so this is not an issue for tasks such as debugging.
   */
  ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
-   "ovs%08x%02d%05d%07u",
-   hash, socket_id, mtu, n_mbufs);
+   "ovs%08x%02d%05d%07u",
+hash, socket_id, mtu, n_mbufs);
  if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
  VLOG_DBG("snprintf returned %d. "
   "Failed to generate a mempool name for \"%s\". "
@@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
  break;
  }
  
-VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
-  "on socket %d for %d Rx and %d Tx queues.",
-  netdev_name, n_mbufs, socket_id,
-  dev->requested_n_rxq, dev->requested_n_txq);
-
-dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
-  MP_CACHE_SZ,
-  sizeof (struct dp_packet)
-
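
The hunk is cut off above. As an illustration of the approach the commit
message describes (a sketch reusing the variables declared earlier in the
patch, not the actual remainder of the hunk), the private area can absorb
the alignment pad along these lines:

    /* The mbuf's private area holds the dp_packet fields that follow
     * the embedded rte_mbuf. */
    mbuf_priv_data_len = sizeof(struct dp_packet) - sizeof(struct rte_mbuf);

    /* Overall per-packet size: dp_packet header plus the data room. */
    pkt_size = sizeof(struct dp_packet) + mbuf_size;
    aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE);

    /* Fold any cacheline pad into the private area, so the data portion
     * keeps its Rx buffer alignment. */
    mbuf_priv_data_len += aligned_mbuf_size - pkt_size;

    dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
                                      mbuf_priv_data_len,
                                      mbuf_size, socket_id);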