>From: O Mahony, Billy
>Sent: Thursday, November 23, 2017 11:23 AM
>To: Kavanagh, Mark B <[email protected]>; [email protected];
>[email protected]
>Subject: RE: [ovs-dev] [RFC PATCH v3 8/8] netdev-dpdk: support multi-segment
>jumbo frames
>
>Hi Mark,
>
>Just one comment below.
>
>/Billy
>
>> -----Original Message-----
>> From: [email protected] [mailto:ovs-dev-
>> [email protected]] On Behalf Of Mark Kavanagh
>> Sent: Tuesday, November 21, 2017 6:29 PM
>> To: [email protected]; [email protected]
>> Subject: [ovs-dev] [RFC PATCH v3 8/8] netdev-dpdk: support multi-segment
>> jumbo frames
>>
>> Currently, jumbo frame support for OvS-DPDK is implemented by increasing
>> the size of mbufs within a mempool, such that each mbuf within the pool is
>> large enough to contain an entire jumbo frame of a user-defined size.
>> Typically, for each user-defined MTU, 'requested_mtu', a new mempool is
>> created, containing mbufs of size ~requested_mtu.
>>
>> With the multi-segment approach, a port uses a single mempool (containing
>> standard/default-sized mbufs of ~2k bytes), irrespective of the
>> user-requested MTU value. To accommodate jumbo frames, mbufs are chained
>> together, where each mbuf in the chain stores a portion of the jumbo
>> frame. Each mbuf in the chain is termed a segment, hence the name.
>>
>> == Enabling multi-segment mbufs ==
>> Multi-segment and single-segment mbufs are mutually exclusive, and the
>> user must decide on which approach to adopt on init. The introduction of
>> a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a
>> global boolean value, which determines how jumbo frames are represented
>> across all DPDK ports. In the absence of a user-supplied value,
>> 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must
>> be explicitly enabled / single-segment mbufs remain the default.
>>
>[[BO'M]] Would it be more useful if multi-segment was enabled by default?
>Does enabling multi-segment mbufs result in much of a performance decrease
>when not using jumbo frames? Either because jumbo frames are not coming in
>on the ingress port, or because the MTU is set not to accept jumbo frames.
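
(Background aside, not part of the patch or the thread: the chained-mbuf
layout described in the commit message can be summarised in a few lines of
DPDK code. The helper name below is hypothetical; it simply walks the segment
chain via each mbuf's 'next' pointer and sums the per-segment data_len, which
is expected to equal the head mbuf's pkt_len.)

    #include <rte_mbuf.h>

    /* Illustrative sketch only: total frame length of a multi-segment
     * (chained) mbuf, computed by walking its segments. */
    static uint32_t
    chain_frame_len(const struct rte_mbuf *head)
    {
        const struct rte_mbuf *seg;
        uint32_t len = 0;

        for (seg = head; seg != NULL; seg = seg->next) {
            len += seg->data_len;   /* bytes stored in this segment */
        }
        return len;                 /* should match rte_pktmbuf_pkt_len(head) */
    }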
Hey Billy,

I think that single-segment should remain the default. Enabling
multi-segment implicitly means that non-vectorized DPDK driver Rx and Tx
functions must be used, which are, by nature, not as performant as their
vectorized counterparts.

I don't have comparative figures to hand, but I'll note same in the cover
letter of any subsequent versions of this patchset.

Thanks,
Mark

>
>Obviously not a blocker to this patch-set. Maybe something to be looked at
>in the future.
>
>> Setting the field is identical to setting existing DPDK-specific OVSDB
>> fields:
>>
>> ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
>> ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
>> ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
>> ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
>>
>> Signed-off-by: Mark Kavanagh <[email protected]>
>> ---
>>  NEWS                 |  1 +
>>  lib/dpdk.c           |  7 +++++++
>>  lib/netdev-dpdk.c    | 43 ++++++++++++++++++++++++++++++++++++++++---
>>  lib/netdev-dpdk.h    |  1 +
>>  vswitchd/vswitch.xml | 20 ++++++++++++++++++++
>>  5 files changed, 69 insertions(+), 3 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index c15dc24..657b598 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -15,6 +15,7 @@ Post-v2.8.0
>>     - DPDK:
>>       * Add support for DPDK v17.11
>>       * Add support for vHost IOMMU feature
>> +     * Add support for multi-segment mbufs
>>
>>  v2.8.0 - 31 Aug 2017
>>  --------------------
>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>> index 8da6c32..4c28bd0 100644
>> --- a/lib/dpdk.c
>> +++ b/lib/dpdk.c
>> @@ -450,6 +450,13 @@ dpdk_init__(const struct smap *ovs_other_config)
>>
>>      /* Finally, register the dpdk classes */
>>      netdev_dpdk_register();
>> +
>> +    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
>> +                                        "dpdk-multi-seg-mbufs", false);
>> +    if (multi_seg_mbufs_enable) {
>> +        VLOG_INFO("DPDK multi-segment mbufs enabled\n");
>> +        netdev_dpdk_multi_segment_mbufs_enable();
>> +    }
>>  }
>>
>>  void
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 36275bd..293edad 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -65,6 +65,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>>
>>  VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
>>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>> +static bool dpdk_multi_segment_mbufs = false;
>>
>>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
>>
>> @@ -500,6 +501,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t frame_len)
>>                        + dev->requested_n_txq * dev->requested_txq_size
>>                        + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>>                        + MIN_NB_MBUF;
>> +    /* XXX (RFC) - should n_mbufs be increased if multi-seg mbufs are used? */
>>
>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>      do {
>> @@ -568,7 +570,13 @@ dpdk_mp_free(struct rte_mempool *mp)
>>
>>  /* Tries to allocate a new mempool - or re-use an existing one where
>>   * appropriate - on requested_socket_id with a size determined by
>> - * requested_mtu and requested Rx/Tx queues.
>> + * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
>> + * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
>> + * - if 'true', then the mempool contains standard-sized mbufs that are chained
>> + *   together to accommodate packets of size 'requested_mtu'.
>> + * - if 'false', then the members of the allocated mempool are
>> + *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to fully
>> + *   accommodate packets of size 'requested_mtu'.
>>   * On success - or when re-using an existing mempool - the new configuration
>>   * will be applied.
>>   * On error, device will be left unchanged. */
>> @@ -576,10 +584,18 @@ static int
>>  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>>      OVS_REQUIRES(dev->mutex)
>>  {
>> -    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
>> +    uint16_t buf_size = 0;
>>      struct rte_mempool *mp;
>>      int ret = 0;
>>
>> +    /* Contiguous mbufs in use - permit oversized mbufs */
>> +    if (!dpdk_multi_segment_mbufs) {
>> +        buf_size = dpdk_buf_size(dev->requested_mtu);
>> +    } else {
>> +        /* multi-segment mbufs - use standard mbuf size */
>> +        buf_size = dpdk_buf_size(ETHER_MTU);
>> +    }
>> +
>>      mp = dpdk_mp_create(dev, buf_size);
>>      if (!mp) {
>>          VLOG_ERR("Failed to create memory pool for netdev "
>> @@ -657,6 +673,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>      int diag = 0;
>>      int i;
>>      struct rte_eth_conf conf = port_conf;
>> +    struct rte_eth_txconf txconf;
>>
>>      /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
>>       * enabled. */
>> @@ -690,9 +707,23 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>                  break;
>>              }
>>
>> +        /* DPDK PMDs typically attempt to use simple or vectorized
>> +         * transmit functions, neither of which are compatible with
>> +         * multi-segment mbufs. Ensure that these are disabled when
>> +         * multi-segment mbufs are enabled.
>> +         */
>> +        if (dpdk_multi_segment_mbufs) {
>> +            struct rte_eth_dev_info dev_info;
>> +            rte_eth_dev_info_get(dev->port_id, &dev_info);
>> +            txconf = dev_info.default_txconf;
>> +            txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
>> +        }
>> +
>>          for (i = 0; i < n_txq; i++) {
>>              diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
>> -                                          dev->socket_id, NULL);
>> +                                          dev->socket_id,
>> +                                          dpdk_multi_segment_mbufs ? &txconf
>> +                                                                   : NULL);
>>              if (diag) {
>>                  VLOG_INFO("Interface %s txq(%d) setup error: %s",
>>                            dev->up.name, i, rte_strerror(-diag));
>> @@ -3380,6 +3411,12 @@ unlock:
>>      return err;
>>  }
>>
>> +void
>> +netdev_dpdk_multi_segment_mbufs_enable(void)
>> +{
>> +    dpdk_multi_segment_mbufs = true;
>> +}
>> +
>>  #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \
>>                            SET_CONFIG, SET_TX_MULTIQ, SEND, \
>>                            GET_CARRIER, GET_STATS, \
>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
>> index b7d02a7..a3339fe 100644
>> --- a/lib/netdev-dpdk.h
>> +++ b/lib/netdev-dpdk.h
>> @@ -25,6 +25,7 @@ struct dp_packet;
>>
>>  #ifdef DPDK_NETDEV
>>
>> +void netdev_dpdk_multi_segment_mbufs_enable(void);
>>  void netdev_dpdk_register(void);
>>  void free_dpdk_buf(struct dp_packet *);
>>
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index a633226..2b71c4a 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -331,6 +331,26 @@
>>          </p>
>>        </column>
>>
>> +      <column name="other_config" key="dpdk-multi-seg-mbufs"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          Specifies if DPDK uses multi-segment mbufs for handling jumbo frames.
>> +        </p>
>> +        <p>
>> +          If true, DPDK allocates a single mempool per port, irrespective
>> +          of the ports' requested MTU sizes. The elements of this mempool are
>> +          'standard'-sized mbufs (typically ~2KB), which may be chained
>> +          together to accommodate jumbo frames. In this approach, each mbuf
>> +          typically stores a fragment of the overall jumbo frame.
>> +        </p>
>> +        <p>
>> +          If not specified, defaults to <code>false</code>, in which case
>> +          the size of each mbuf within a DPDK port's mempool will be grown
>> +          to accommodate jumbo frames within a single mbuf.
>> +        </p>
>> +      </column>
>> +
>> +
>>        <column name="other_config" key="vhost-sock-dir"
>>                type='{"type": "string"}'>
>>          <p>
>> --
>> 1.9.3
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
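
(A further background aside, not part of the patch or the thread: the sketch
below shows how a jumbo frame could be laid out as a chain of standard-sized
mbufs drawn from a single per-port mempool, i.e. the representation in effect
when dpdk-multi-seg-mbufs=true. The helper name and error handling are
assumptions; the calls used are the stock DPDK mbuf API, namely
rte_pktmbuf_alloc(), rte_pktmbuf_append() and rte_pktmbuf_chain().)

    #include <string.h>
    #include <rte_common.h>
    #include <rte_mbuf.h>

    /* Illustrative sketch only: copy 'len' bytes of 'payload' into a chain
     * of standard-sized mbufs allocated from 'mp'. Returns the head of the
     * chain, or NULL on failure. */
    static struct rte_mbuf *
    jumbo_to_mbuf_chain(struct rte_mempool *mp, const char *payload,
                        uint32_t len)
    {
        struct rte_mbuf *head = NULL;

        while (len > 0) {
            struct rte_mbuf *seg = rte_pktmbuf_alloc(mp);
            if (!seg) {
                goto err;
            }

            /* Fill this segment up to its data room (~2KB by default). */
            uint16_t chunk = RTE_MIN(len, (uint32_t) rte_pktmbuf_tailroom(seg));
            memcpy(rte_pktmbuf_append(seg, chunk), payload, chunk);
            payload += chunk;
            len -= chunk;

            if (!head) {
                head = seg;
            } else if (rte_pktmbuf_chain(head, seg)) {
                /* Chaining failed: per-packet segment limit exceeded. */
                rte_pktmbuf_free(seg);
                goto err;
            }
        }
        return head;

    err:
        rte_pktmbuf_free(head);   /* NULL-safe; frees the whole chain. */
        return NULL;
    }

rte_pktmbuf_chain() keeps the head mbuf's pkt_len and nb_segs up to date,
which is the per-packet metadata that the non-vectorized Tx paths discussed
above consult when transmitting a chained frame.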
