Regards _Sugesh
> -----Original Message----- > From: Kavanagh, Mark B > Sent: Tuesday, December 12, 2017 11:52 AM > To: Chandran, Sugesh <[email protected]>; [email protected]; > [email protected] > Cc: Stokes, Ian <[email protected]>; Loftus, Ciara > <[email protected]>; > [email protected] > Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing > > >From: Chandran, Sugesh > >Sent: Friday, December 8, 2017 5:48 PM > >To: Kavanagh, Mark B <[email protected]>; [email protected]; > >[email protected] > >Cc: Stokes, Ian <[email protected]>; Loftus, Ciara > ><[email protected]>; [email protected] > >Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing > > > > > > > >Regards > >_Sugesh > > > > > >> -----Original Message----- > >> From: Kavanagh, Mark B > >> Sent: Friday, December 8, 2017 12:02 PM > >> To: [email protected]; [email protected] > >> Cc: Stokes, Ian <[email protected]>; Loftus, Ciara > ><[email protected]>; > >> [email protected]; Chandran, Sugesh > >> <[email protected]>; Kavanagh, Mark B > >> <[email protected]> > >> Subject: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing > >> > >> There are numerous factors that must be considered when calculating > >> the size of an mbuf: > >> - the data portion of the mbuf must be sized in accordance With Rx > >> buffer alignment (typically 1024B). So, for example, in order to > >> successfully receive and capture a 1500B packet, mbufs with a > >> data portion of size 2048B must be used. > >> - in OvS, the elements that comprise an mbuf are: > >> * the dp packet, which includes a struct rte mbuf (704B) > >> * RTE_PKTMBUF_HEADROOM (128B) > >> * packet data (aligned to 1k, as previously described) > >> * RTE_PKTMBUF_TAILROOM (typically 0) > >> > >> Some PMDs require that the total mbuf size (i.e. the total sum of all > >> of the above-listed components' lengths) is cache-aligned. To satisfy > >> this requirement, it may be necessary to round up the total mbuf size > >> with respect to cacheline size. In doing so, it's possible that the > >> dp_packet's data portion is inadvertently increased in size, such > >> that it no longer adheres to Rx buffer alignment. Consequently, the > >> following property of the mbuf no longer holds true: > >> > >> mbuf.data_len == mbuf.buf_len - mbuf.data_off > >> > >> This creates a problem in the case of multi-segment mbufs, where that > >> assumption is assumed to be true for all but the final segment in an > >> mbuf chain. Resolve this issue by adjusting the size of the mbuf's > >> private data portion, as opposed to the packet data portion when > >> aligning mbuf size to cachelines. > >> > >> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization") > >> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size") > >> CC: Santosh Shukla <[email protected]> > >> Signed-off-by: Mark Kavanagh <[email protected]> > >> --- > >> lib/netdev-dpdk.c | 46 > >> ++++++++++++++++++++++++++++++---------------- > >> 1 file changed, 30 insertions(+), 16 deletions(-) > >> > >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > >> 9715c39..4167497 100644 > >> --- a/lib/netdev-dpdk.c > >> +++ b/lib/netdev-dpdk.c > >> @@ -81,12 +81,6 @@ static struct vlog_rate_limit rl = > >> VLOG_RATE_LIMIT_INIT(5, 20); > >> + (2 * VLAN_HEADER_LEN)) > >> #define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + > >> ETHER_CRC_LEN) > >> #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN) > >> -#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \ > >> - - ETHER_HDR_LEN - ETHER_CRC_LEN) > >> -#define MBUF_SIZE(mtu) > ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) > >> \ > >> - + sizeof(struct dp_packet) \ > >> - + RTE_PKTMBUF_HEADROOM), \ > >> - RTE_CACHE_LINE_SIZE) > >> #define NETDEV_DPDK_MBUF_ALIGN 1024 > >> #define NETDEV_DPDK_MAX_PKT_LEN 9728 > >> > >> @@ -447,7 +441,7 @@ is_dpdk_class(const struct netdev_class *class) > >> * behaviour, which reduces performance. To prevent this, use a buffer > >> size > >> * that is closest to 'mtu', but which satisfies the aforementioned > >criteria. > >> */ > >> -static uint32_t > >> +static uint16_t > >> dpdk_buf_size(int mtu) > >> { > >> return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + > >> RTE_PKTMBUF_HEADROOM), @@ -486,7 +480,7 @@ > >> ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED, > >> * - a new mempool was just created; > >> * - a matching mempool already exists. */ static struct > >> rte_mempool * -dpdk_mp_create(struct netdev_dpdk *dev, int mtu) > >> +dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) > >> { > >> char mp_name[RTE_MEMPOOL_NAMESIZE]; > >> const char *netdev_name = netdev_get_name(&dev->up); @@ -494,6 > >> +488,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) > >> uint32_t n_mbufs; > >> uint32_t hash = hash_string(netdev_name, 0); > >> struct rte_mempool *mp = NULL; > >> + uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len; > >> > >> /* > >> * XXX: rough estimation of number of mbufs required for this port: > >> @@ -513,12 +508,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int > mtu) > >> * longer than RTE_MEMPOOL_NAMESIZE. */ > >> int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, > >> "ovs%08x%02d%05d%07u", > >> - hash, socket_id, mtu, n_mbufs); > >> + hash, socket_id, mbuf_pkt_data_len, > >> + n_mbufs); > >> if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) { > >> VLOG_DBG("snprintf returned %d. " > >> "Failed to generate a mempool name for \"%s\". " > >> - "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.", > >> - ret, netdev_name, hash, socket_id, mtu, n_mbufs); > >> + "Hash:0x%x, socket_id: %d, pkt data room:%d, > >mbufs:%u.", > >> + ret, netdev_name, hash, socket_id, mbuf_pkt_data_len, > >> + n_mbufs); > >> break; > >> } > >> > >> @@ -527,13 +523,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int > mtu) > >> netdev_name, n_mbufs, socket_id, > >> dev->requested_n_rxq, dev->requested_n_txq); > >> > >> + mbuf_priv_data_len = sizeof(struct dp_packet) - > >> + sizeof(struct rte_mbuf); > >> + /* The size of the entire mbuf. */ > >> + mbuf_size = sizeof (struct dp_packet) + > >> + mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM; > >> + /* mbuf size, rounded up to cacheline size. */ > >> + aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE); > >> + /* If there is a size discrepancy, add padding to > >mbuf_priv_data_len. > >> + * This maintains mbuf size cache alignment, while also honoring > >> RX > >> + * buffer alignment in the data portion of the mbuf. If this > >adjustment > >> + * is not made, there is a possibility later on that for an > >> + element > >of > >> + * the mempool, buf, buf->data_len < (buf->buf_len - buf- > >>data_off). > >> + * This is problematic in the case of multi-segment mbufs, > >particularly > >> + * when an mbuf segment needs to be resized (when > >> + [push|popp]ing a > >> VLAN > >> + * header, for example. > >> + */ > >> + mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size); > >> + > >> mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ, > >> - sizeof (struct dp_packet) - sizeof (struct rte_mbuf), > >> - MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id); > >> + mbuf_priv_data_len, > >> + mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM, > >> + socket_id); > >[Sugesh] Is there any chance this will cause any performance drop? > >The earlier implementation made the data to be cache aligned , so the > >the packet data handling is more cache friendly. Now the alignment is > >moved private data, which may > > Hi Sugesh, > > The size adjustment only affects 128B cachelines, in which case it actually > ensures that the packet data within the mbuf is cache aligned. Without this > adjustment, alignment is achieved by adding space to the data portion of the > mbuf, but this introduces a 64B (i.e. half a cacheline) gap between the tail > of the > dp_packet, and the mbuf's headroom. By increasing the size of the private data > instead (by 64B), the start of the mbuf data is once again aligned to a 128B > cacheline. For 64B cachelines it doesn't make a difference, so no performance > degradation is expected, nor have I observed any. > > Thanks, > Mark [Sugesh] Thank you Mark for the confirmation. In fact I did some of the quick testing and didn't find any degradation as such. > > >touch /accessed by CPU > >very rarely. Does it make any difference in performance when running > >different size packet? I don't see any issues when I did some basic > >testing. However I haven't done any testing that involved a lot packet > >data/header handling. > >Do you expect anything like that? > >> > >> if (mp) { > >> VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", > >> - mp_name, n_mbufs); > >> + mp_name, n_mbufs); > >> /* rte_pktmbuf_pool_create has done some initialization of the > >> * rte_mbuf part of each dp_packet. Some OvS specific fields > >> * of the packet still need to be initialized by @@ > >> -582,11 +596,11 @@ static int netdev_dpdk_mempool_configure(struct > >> netdev_dpdk *dev) > >> OVS_REQUIRES(dev->mutex) > >> { > >> - uint32_t buf_size = dpdk_buf_size(dev->requested_mtu); > >> + uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); > >> struct rte_mempool *mp; > >> int ret = 0; > >> > >> - mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size)); > >> + mp = dpdk_mp_create(dev, buf_size); > >> if (!mp) { > >> VLOG_ERR("Failed to create memory pool for netdev " > >> "%s, with MTU %d on socket %d: %s\n", > >> -- > >> 1.9.3 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
