Hi Mark,

Just one comment below.
/Billy

> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Mark Kavanagh
> Sent: Tuesday, November 21, 2017 6:29 PM
> To: d...@openvswitch.org; qiud...@chinac.com
> Subject: [ovs-dev] [RFC PATCH v3 8/8] netdev-dpdk: support multi-segment
> jumbo frames
>
> Currently, jumbo frame support for OvS-DPDK is implemented by increasing
> the size of mbufs within a mempool, such that each mbuf within the pool is
> large enough to contain an entire jumbo frame of a user-defined size.
> Typically, for each user-defined MTU, 'requested_mtu', a new mempool is
> created, containing mbufs of size ~requested_mtu.
>
> With the multi-segment approach, a port uses a single mempool (containing
> standard/default-sized mbufs of ~2k bytes), irrespective of the
> user-requested MTU value. To accommodate jumbo frames, mbufs are chained
> together, where each mbuf in the chain stores a portion of the jumbo
> frame. Each mbuf in the chain is termed a segment, hence the name.
>
> == Enabling multi-segment mbufs ==
> Multi-segment and single-segment mbufs are mutually exclusive, and the
> user must decide which approach to adopt at init. The introduction of a
> new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a
> global boolean value, which determines how jumbo frames are represented
> across all DPDK ports. In the absence of a user-supplied value,
> 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must
> be explicitly enabled; single-segment mbufs remain the default.

[[BO'M]] Would it be more useful if multi-segment were enabled by default?
Does enabling multi-segment mbufs result in much of a performance decrease
when not using jumbo frames? Either because jumbo frames are not coming in
on the ingress port, or because the MTU is set not to accept jumbo frames.
Obviously not a blocker to this patch set. Maybe something to be looked at
in the future.
> Setting the field is identical to setting existing DPDK-specific OVSDB
> fields:
>
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
> ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
>
> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
> ---
>  NEWS                 |  1 +
>  lib/dpdk.c           |  7 +++++++
>  lib/netdev-dpdk.c    | 43 ++++++++++++++++++++++++++++++++++++++++---
>  lib/netdev-dpdk.h    |  1 +
>  vswitchd/vswitch.xml | 20 ++++++++++++++++++++
>  5 files changed, 69 insertions(+), 3 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index c15dc24..657b598 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -15,6 +15,7 @@ Post-v2.8.0
>     - DPDK:
>       * Add support for DPDK v17.11
>       * Add support for vHost IOMMU feature
> +     * Add support for multi-segment mbufs
>
>  v2.8.0 - 31 Aug 2017
>  --------------------
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 8da6c32..4c28bd0 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -450,6 +450,13 @@ dpdk_init__(const struct smap *ovs_other_config)
>
>      /* Finally, register the dpdk classes */
>      netdev_dpdk_register();
> +
> +    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
> +                                                "dpdk-multi-seg-mbufs", false);
> +    if (multi_seg_mbufs_enable) {
> +        VLOG_INFO("DPDK multi-segment mbufs enabled\n");
> +        netdev_dpdk_multi_segment_mbufs_enable();
> +    }
>  }
>
>  void
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 36275bd..293edad 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -65,6 +65,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>
>  VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> +static bool dpdk_multi_segment_mbufs = false;
>
>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
>
> @@ -500,6 +501,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t frame_len)
>                + dev->requested_n_txq * dev->requested_txq_size
>                + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>                + MIN_NB_MBUF;
> +    /* XXX (RFC) - should n_mbufs be increased if multi-seg mbufs are
> +     * used? */
>
>      ovs_mutex_lock(&dpdk_mp_mutex);
>      do {
> @@ -568,7 +570,13 @@ dpdk_mp_free(struct rte_mempool *mp)
>
>  /* Tries to allocate a new mempool - or re-use an existing one where
>   * appropriate - on requested_socket_id with a size determined by
> - * requested_mtu and requested Rx/Tx queues.
> + * requested_mtu and requested Rx/Tx queues. Some properties of the
> + * mempool's elements are dependent on the value of
> + * 'dpdk_multi_segment_mbufs':
> + * - if 'true', then the mempool contains standard-sized mbufs that are
> + *   chained together to accommodate packets of size 'requested_mtu'.
> + * - if 'false', then the members of the allocated mempool are
> + *   non-standard-sized mbufs. Each mbuf in the mempool is large enough
> + *   to fully accommodate packets of size 'requested_mtu'.
>   * On success - or when re-using an existing mempool - the new configuration
>   * will be applied.
>   * On error, device will be left unchanged. */
> @@ -576,10 +584,18 @@ static int
>  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>      OVS_REQUIRES(dev->mutex)
>  {
> -    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
> +    uint16_t buf_size = 0;
>      struct rte_mempool *mp;
>      int ret = 0;
>
> +    /* Contiguous mbufs in use - permit oversized mbufs */
> +    if (!dpdk_multi_segment_mbufs) {
> +        buf_size = dpdk_buf_size(dev->requested_mtu);
> +    } else {
> +        /* multi-segment mbufs - use standard mbuf size */
> +        buf_size = dpdk_buf_size(ETHER_MTU);
> +    }
> +
>      mp = dpdk_mp_create(dev, buf_size);
>      if (!mp) {
>          VLOG_ERR("Failed to create memory pool for netdev "
> @@ -657,6 +673,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev,
>                           int n_rxq, int n_txq)
>      int diag = 0;
>      int i;
>      struct rte_eth_conf conf = port_conf;
> +    struct rte_eth_txconf txconf;
>
>      /* For some NICs (e.g.
> Niantic), scatter_rx mode needs to be explicitly
>      * enabled. */
> @@ -690,9 +707,23 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev,
>                           int n_rxq, int n_txq)
>          break;
>      }
>
> +    /* DPDK PMDs typically attempt to use simple or vectorized
> +     * transmit functions, neither of which are compatible with
> +     * multi-segment mbufs. Ensure that these are disabled when
> +     * multi-segment mbufs are enabled.
> +     */
> +    if (dpdk_multi_segment_mbufs) {
> +        struct rte_eth_dev_info dev_info;
> +        rte_eth_dev_info_get(dev->port_id, &dev_info);
> +        txconf = dev_info.default_txconf;
> +        txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
> +    }
> +
>      for (i = 0; i < n_txq; i++) {
>          diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
> -                                      dev->socket_id, NULL);
> +                                      dev->socket_id,
> +                                      dpdk_multi_segment_mbufs ? &txconf
> +                                                               : NULL);
>          if (diag) {
>              VLOG_INFO("Interface %s txq(%d) setup error: %s",
>                        dev->up.name, i, rte_strerror(-diag));
> @@ -3380,6 +3411,12 @@
>  unlock:
>      return err;
>  }
>
> +void
> +netdev_dpdk_multi_segment_mbufs_enable(void)
> +{
> +    dpdk_multi_segment_mbufs = true;
> +}
> +
>  #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \
>                            SET_CONFIG, SET_TX_MULTIQ, SEND, \
>                            GET_CARRIER, GET_STATS, \
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index b7d02a7..a3339fe 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -25,6 +25,7 @@ struct dp_packet;
>
>  #ifdef DPDK_NETDEV
>
> +void netdev_dpdk_multi_segment_mbufs_enable(void);
>  void netdev_dpdk_register(void);
>  void free_dpdk_buf(struct dp_packet *);
>
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index a633226..2b71c4a 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -331,6 +331,26 @@
>          </p>
>        </column>
>
> +      <column name="other_config" key="dpdk-multi-seg-mbufs"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Specifies if DPDK uses multi-segment mbufs for handling jumbo
> +          frames.
> +        </p>
> +        <p>
> +          If true, DPDK allocates a single mempool per port, irrespective
> +          of the ports' requested MTU sizes. The elements of this mempool
> +          are 'standard'-sized mbufs (typically ~2k bytes), which may be
> +          chained together to accommodate jumbo frames. In this approach,
> +          each mbuf typically stores a fragment of the overall jumbo frame.
> +        </p>
> +        <p>
> +          If not specified, defaults to <code>false</code>, in which case
> +          the size of each mbuf within a DPDK port's mempool will be grown
> +          to accommodate jumbo frames within a single mbuf.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="vhost-sock-dir"
>                type='{"type": "string"}'>
>          <p>
> --
> 1.9.3
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev