Regards _Sugesh
> -----Original Message----- > From: Chandran, Sugesh > Sent: Tuesday, December 12, 2017 8:42 PM > To: Kavanagh, Mark B <[email protected]>; [email protected]; > [email protected] > Cc: Stokes, Ian <[email protected]>; Loftus, Ciara > <[email protected]>; > [email protected] > Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing > > > > Regards > _Sugesh > > > > -----Original Message----- > > From: Kavanagh, Mark B > > Sent: Tuesday, December 12, 2017 11:52 AM > > To: Chandran, Sugesh <[email protected]>; [email protected]; > > [email protected] > > Cc: Stokes, Ian <[email protected]>; Loftus, Ciara > > <[email protected]>; [email protected] > > Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing > > > > >From: Chandran, Sugesh > > >Sent: Friday, December 8, 2017 5:48 PM > > >To: Kavanagh, Mark B <[email protected]>; > > >[email protected]; [email protected] > > >Cc: Stokes, Ian <[email protected]>; Loftus, Ciara > > ><[email protected]>; [email protected] > > >Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing > > > > > > > > > > > >Regards > > >_Sugesh > > > > > > > > >> -----Original Message----- > > >> From: Kavanagh, Mark B > > >> Sent: Friday, December 8, 2017 12:02 PM > > >> To: [email protected]; [email protected] > > >> Cc: Stokes, Ian <[email protected]>; Loftus, Ciara > > ><[email protected]>; > > >> [email protected]; Chandran, Sugesh > > >> <[email protected]>; Kavanagh, Mark B > > >> <[email protected]> > > >> Subject: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing > > >> > > >> There are numerous factors that must be considered when calculating > > >> the size of an mbuf: > > >> - the data portion of the mbuf must be sized in accordance With Rx > > >> buffer alignment (typically 1024B). So, for example, in order to > > >> successfully receive and capture a 1500B packet, mbufs with a > > >> data portion of size 2048B must be used. > > >> - in OvS, the elements that comprise an mbuf are: > > >> * the dp packet, which includes a struct rte mbuf (704B) > > >> * RTE_PKTMBUF_HEADROOM (128B) > > >> * packet data (aligned to 1k, as previously described) > > >> * RTE_PKTMBUF_TAILROOM (typically 0) > > >> > > >> Some PMDs require that the total mbuf size (i.e. the total sum of > > >> all of the above-listed components' lengths) is cache-aligned. To > > >> satisfy this requirement, it may be necessary to round up the total > > >> mbuf size with respect to cacheline size. In doing so, it's > > >> possible that the dp_packet's data portion is inadvertently > > >> increased in size, such that it no longer adheres to Rx buffer > > >> alignment. Consequently, the following property of the mbuf no longer > holds true: > > >> > > >> mbuf.data_len == mbuf.buf_len - mbuf.data_off > > >> > > >> This creates a problem in the case of multi-segment mbufs, where > > >> that assumption is assumed to be true for all but the final segment > > >> in an mbuf chain. Resolve this issue by adjusting the size of the > > >> mbuf's private data portion, as opposed to the packet data portion > > >> when aligning mbuf size to cachelines. > > >> > > >> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization") > > >> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to > > >> cache_line_size") > > >> CC: Santosh Shukla <[email protected]> > > >> Signed-off-by: Mark Kavanagh <[email protected]> > > >> --- > > >> lib/netdev-dpdk.c | 46 > > >> ++++++++++++++++++++++++++++++---------------- > > >> 1 file changed, 30 insertions(+), 16 deletions(-) > > >> > > >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > > >> 9715c39..4167497 100644 > > >> --- a/lib/netdev-dpdk.c > > >> +++ b/lib/netdev-dpdk.c > > >> @@ -81,12 +81,6 @@ static struct vlog_rate_limit rl = > > >> VLOG_RATE_LIMIT_INIT(5, 20); > > >> + (2 * VLAN_HEADER_LEN)) > > >> #define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + > > >> ETHER_CRC_LEN) > > >> #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + > ETHER_HDR_MAX_LEN) > > >> -#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \ > > >> - - ETHER_HDR_LEN - ETHER_CRC_LEN) > > >> -#define MBUF_SIZE(mtu) > > ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) > > >> \ > > >> - + sizeof(struct dp_packet) > > >> \ > > >> - + RTE_PKTMBUF_HEADROOM), > > >> \ > > >> - RTE_CACHE_LINE_SIZE) > > >> #define NETDEV_DPDK_MBUF_ALIGN 1024 > > >> #define NETDEV_DPDK_MAX_PKT_LEN 9728 > > >> > > >> @@ -447,7 +441,7 @@ is_dpdk_class(const struct netdev_class *class) > > >> * behaviour, which reduces performance. To prevent this, use a buffer > > >> size > > >> * that is closest to 'mtu', but which satisfies the > > >> aforementioned > > >criteria. > > >> */ > > >> -static uint32_t > > >> +static uint16_t > > >> dpdk_buf_size(int mtu) > > >> { > > >> return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + > > >> RTE_PKTMBUF_HEADROOM), @@ -486,7 +480,7 @@ > > >> ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED, > > >> * - a new mempool was just created; > > >> * - a matching mempool already exists. */ static struct > > >> rte_mempool * -dpdk_mp_create(struct netdev_dpdk *dev, int mtu) > > >> +dpdk_mp_create(struct netdev_dpdk *dev, uint16_t > > >> +mbuf_pkt_data_len) > > >> { > > >> char mp_name[RTE_MEMPOOL_NAMESIZE]; > > >> const char *netdev_name = netdev_get_name(&dev->up); @@ -494,6 > > >> +488,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) > > >> uint32_t n_mbufs; > > >> uint32_t hash = hash_string(netdev_name, 0); > > >> struct rte_mempool *mp = NULL; > > >> + uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len; > > >> > > >> /* > > >> * XXX: rough estimation of number of mbufs required for this port: > > >> @@ -513,12 +508,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int > > mtu) > > >> * longer than RTE_MEMPOOL_NAMESIZE. */ > > >> int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, > > >> "ovs%08x%02d%05d%07u", > > >> - hash, socket_id, mtu, n_mbufs); > > >> + hash, socket_id, mbuf_pkt_data_len, > > >> + n_mbufs); > > >> if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) { > > >> VLOG_DBG("snprintf returned %d. " > > >> "Failed to generate a mempool name for \"%s\". " > > >> - "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.", > > >> - ret, netdev_name, hash, socket_id, mtu, n_mbufs); > > >> + "Hash:0x%x, socket_id: %d, pkt data room:%d, > > >mbufs:%u.", > > >> + ret, netdev_name, hash, socket_id, > > >> mbuf_pkt_data_len, > > >> + n_mbufs); > > >> break; > > >> } > > >> > > >> @@ -527,13 +523,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int > > mtu) > > >> netdev_name, n_mbufs, socket_id, > > >> dev->requested_n_rxq, dev->requested_n_txq); > > >> > > >> + mbuf_priv_data_len = sizeof(struct dp_packet) - > > >> + sizeof(struct rte_mbuf); > > >> + /* The size of the entire mbuf. */ > > >> + mbuf_size = sizeof (struct dp_packet) + > > >> + mbuf_pkt_data_len + > > >> RTE_PKTMBUF_HEADROOM; > > >> + /* mbuf size, rounded up to cacheline size. */ > > >> + aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE); > > >> + /* If there is a size discrepancy, add padding to > > >mbuf_priv_data_len. > > >> + * This maintains mbuf size cache alignment, while also > > >> honoring RX > > >> + * buffer alignment in the data portion of the mbuf. If > > >> + this > > >adjustment > > >> + * is not made, there is a possibility later on that for > > >> + an element > > >of > > >> + * the mempool, buf, buf->data_len < (buf->buf_len - buf- > > >>data_off). > > >> + * This is problematic in the case of multi-segment mbufs, > > >particularly > > >> + * when an mbuf segment needs to be resized (when > > >> + [push|popp]ing a > > >> VLAN > > >> + * header, for example. > > >> + */ > > >> + mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size); > > >> + > > >> mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ, > > >> - sizeof (struct dp_packet) - sizeof (struct rte_mbuf), > > >> - MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id); > > >> + mbuf_priv_data_len, > > >> + mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM, > > >> + socket_id); > > >[Sugesh] Is there any chance this will cause any performance drop? > > >The earlier implementation made the data to be cache aligned , so the > > >the packet data handling is more cache friendly. Now the alignment is > > >moved private data, which may > > > > Hi Sugesh, > > > > The size adjustment only affects 128B cachelines, in which case it > > actually ensures that the packet data within the mbuf is cache > > aligned. Without this adjustment, alignment is achieved by adding > > space to the data portion of the mbuf, but this introduces a 64B (i.e. > > half a cacheline) gap between the tail of the dp_packet, and the > > mbuf's headroom. By increasing the size of the private data instead > > (by 64B), the start of the mbuf data is once again aligned to a 128B > > cacheline. For 64B cachelines it doesn't make a difference, so no > > performance > degradation is expected, nor have I observed any. > > > > Thanks, > > Mark > [Sugesh] Thank you Mark for the confirmation. In fact I did some of the quick > testing and didn't find any degradation as such. > [Sugesh] addendum : I will run test cases and validate if there are any issue as such when you release the next version of patches. Thanks! > > > > >touch /accessed by CPU > > >very rarely. Does it make any difference in performance when running > > >different size packet? I don't see any issues when I did some basic > > >testing. However I haven't done any testing that involved a lot > > >packet data/header handling. > > >Do you expect anything like that? > > >> > > >> if (mp) { > > >> VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", > > >> - mp_name, n_mbufs); > > >> + mp_name, n_mbufs); > > >> /* rte_pktmbuf_pool_create has done some initialization of > > >> the > > >> * rte_mbuf part of each dp_packet. Some OvS specific fields > > >> * of the packet still need to be initialized by @@ > > >> -582,11 +596,11 @@ static int netdev_dpdk_mempool_configure(struct > > >> netdev_dpdk *dev) > > >> OVS_REQUIRES(dev->mutex) > > >> { > > >> - uint32_t buf_size = dpdk_buf_size(dev->requested_mtu); > > >> + uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); > > >> struct rte_mempool *mp; > > >> int ret = 0; > > >> > > >> - mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size)); > > >> + mp = dpdk_mp_create(dev, buf_size); > > >> if (!mp) { > > >> VLOG_ERR("Failed to create memory pool for netdev " > > >> "%s, with MTU %d on socket %d: %s\n", > > >> -- > > >> 1.9.3 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
