[dpdk-dev] Performance hit - NICs on different CPU sockets
On Thu, Jun 16, 2016 at 10:19 PM, Wiles, Keith wrote, quoting the earlier exchange in the thread:

>> Right now I do not know what the issue is with the system. Could be
>> too many Rx/Tx ring pairs per port limiting the memory in the NICs,
>> which is why you get better performance when you have 8 cores per
>> port. I am not really seeing the whole picture and how DPDK is
>> configured to help more. Sorry.
>
> I doubt that there is a limitation wrt running 16 cores per port vs. 8
> cores per port, as I've tried with two different machines connected
> back to back, each with one X710 port and 16 cores on each of them
> running on that port. In that case our performance doubled as
> expected.
>
>> Maybe seeing the DPDK command line would help.
>
> The command line I use with ports 01:00.3 and 81:00.3 is:
>
> ./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- \
>     --qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00
>
> Our own qmap args allow the user to control exactly how cores are
> split between ports.
In this case we end up with:

warp17> show port map
Port 0[socket: 0]:
   Core 4[socket:0] (Tx: 0, Rx: 0)
   Core 5[socket:0] (Tx: 1, Rx: 1)
   Core 6[socket:0] (Tx: 2, Rx: 2)
   Core 7[socket:0] (Tx: 3, Rx: 3)
   Core 8[socket:0] (Tx: 4, Rx: 4)
   Core 9[socket:0] (Tx: 5, Rx: 5)
   Core 20[socket:0] (Tx: 6, Rx: 6)
   Core 21[socket:0] (Tx: 7, Rx: 7)
   Core 22[socket:0] (Tx: 8, Rx: 8)
   Core 23[socket:0] (Tx: 9, Rx: 9)
   Core 24[socket:0] (Tx: 10, Rx: 10)
   Core 25[socket:0] (Tx: 11, Rx: 11)
   Core 26[socket:0] (Tx: 12, Rx: 12)
   Core 27[socket:0] (Tx: 13, Rx: 13)
   Core 28[socket:0] (Tx: 14, Rx: 14)
   Core 29[socket:0] (Tx: 15, Rx: 15)
Port 1[socket: 1]:
   Core 10[socket:1] (Tx: 0, Rx: 0)
   Core 11[socket:1] (Tx: 1, Rx: 1)
   Core 12[socket:1] (Tx: 2, Rx: 2)
   Core 13[socket:1] (Tx: 3, Rx: 3)
   Core 14[socket:1] (Tx: 4, Rx: 4)
   Core 15[socket:1] (Tx: 5, Rx: 5)
   Core 16[socket:1] (Tx: 6, Rx: 6)
   Core 17[socket:1] (Tx: 7, Rx: 7)
   Core 18[socket:1] (Tx: 8, Rx: 8)
   Core 19[socket:1] (Tx: 9, Rx: 9)
   Core 30[socket:1] (Tx: 10, Rx: 10)
   Core 31[socket:1] (Tx: 11, Rx: 11)
   Core 32[socket:1] (Tx: 12, Rx: 12)
   Core 33[socket:1] (Tx: 13, Rx: 13)
   Core 34[socket:1] (Tx: 14, Rx: 14)
   Core 35[socket:1] (Tx: 15, Rx: 15)

On each socket you have 10 physical cores, or 20 lcores per socket and
40 lcores total.

The above is listing the LCORES (or hyper-threads) and not COREs, which
I understand some like to think are interchangeable. The problem is
that hyper-threads are logically interchangeable, but not
performance-wise. If you have two run-to-completion threads on a single
physical core, each on a different hyper-thread of that core [0,1],
then the second lcore or thread (1) on that physical core will only get
at most about 20-30% of the CPU cycles. Normally it is much less,
unless you tune the code to make sure each thread is not trying to
share the internal execution units; but some internal execution units
are always shared.
To get the best performance when hyper-threading is enabled, do not run
both threads on a single physical core; run only hyper-thread 0.

The table below lists the physical core id and each of the lcore ids
per socket. Use the first lcore per socket for the best performance:

Core 1 [1, 21] [11, 31]

Use lcore 1 or 11 depending on the socket you are on.

The info below is most likely the best performance and utilization of
your system, if I got the values right:

./warp17 -c 0x0FFFe0 -m 32768 -w :81:00.3 -w :01:00.3 -- \
    --qmap 0.0x0003FE --qmap 1.0x0FFE00

Port 0[socket: 0]:
   Core 2[socket:0] (Tx: 0, Rx: 0)
   Core 3[socket:0] (Tx: 1, Rx: 1)
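Keith's "one run-to-completion thread per physical core" advice can be sketched as a helper that builds an EAL-style `-c` coremask from hyper-thread-0 lcores only. This is a minimal sketch under the lcore numbering reported in this thread (physical core N maps to lcores [N, N+20] on this box); the numbering is machine-specific, not something DPDK guarantees.

```c
#include <stdint.h>

/* Build a coremask covering lcores first_core .. first_core+n_cores-1.
 * With the [N, N+20] lcore pairing above, lcore id == core id for
 * hyper-thread 0, so this mask never doubles up on a physical core. */
uint64_t ht0_coremask(int first_core, int n_cores)
{
    uint64_t mask = 0;
    for (int c = first_core; c < first_core + n_cores; c++)
        mask |= 1ULL << c;
    return mask;
}
```

For example, cores 2-9 on socket 0 give the mask 0x3FC, usable both for `-c` and as a per-port qmap.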
[dpdk-dev] [PATCH v10 4/7] ethdev: make get port by name and get name by port public
2016-06-15 15:06, Reshma Pattan:
> Converted rte_eth_dev_get_port_by_name to a public API.
> Converted rte_eth_dev_get_name_by_port to a public API.
> Updated the release notes with the changes.

It is not an API change, just a new API, so no need to reference it in
the release notes.
[dpdk-dev] [PATCH 4/4] doc: add MTU update to feature matrix for enic
Signed-off-by: John Daley
---
 doc/guides/nics/overview.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 29a6163..6b30085 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -92,7 +92,7 @@ Most of these differences are summarized below.
    Queue status event   Y
    Rx interrupt         Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
    Queue start/stop     Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
-   MTU update           Y Y Y Y Y Y Y Y Y Y
+   MTU update           Y Y Y Y Y Y Y Y Y Y Y
    Jumbo frame          Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
    Scattered Rx         Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
    LRO                  Y Y Y Y
-- 
2.7.0
[dpdk-dev] [PATCH 3/4] enic: add an update MTU function for non-Rx scatter mode
Provide an update MTU callback. The function returns -ENOTSUP if Rx
scatter is enabled. Updating the MTU to be greater than the value
configured via the Cisco CIMC/UCSM management interface is allowed,
provided it is still less than the maximum egress packet size allowed
by the NIC.

Signed-off-by: John Daley
---
 drivers/net/enic/enic.h        |  1 +
 drivers/net/enic/enic_ethdev.c | 10 +-
 drivers/net/enic/enic_main.c   | 44 ++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 78f7bd7..8122358 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -245,4 +245,5 @@ uint16_t enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	uint16_t nb_pkts);
 uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
+int enic_set_mtu(struct enic *enic, uint16_t new_mtu);
 #endif /* _ENIC_H_ */

diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 31d9600..9a738c2 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -520,6 +520,14 @@ static void enicpmd_remove_mac_addr(struct rte_eth_dev *eth_dev, __rte_unused ui
 	enic_del_mac_address(enic);
 }

+static int enicpmd_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu)
+{
+	struct enic *enic = pmd_priv(eth_dev);
+
+	ENICPMD_FUNC_TRACE();
+	return enic_set_mtu(enic, mtu);
+}
+
 static const struct eth_dev_ops enicpmd_eth_dev_ops = {
 	.dev_configure        = enicpmd_dev_configure,
 	.dev_start            = enicpmd_dev_start,
@@ -537,7 +545,7 @@ static const struct eth_dev_ops enicpmd_eth_dev_ops = {
 	.queue_stats_mapping_set  = NULL,
 	.dev_infos_get        = enicpmd_dev_info_get,
 	.dev_supported_ptypes_get = enicpmd_dev_supported_ptypes_get,
-	.mtu_set              = NULL,
+	.mtu_set              = enicpmd_mtu_set,
 	.vlan_filter_set      = enicpmd_vlan_filter_set,
 	.vlan_tpid_set        = NULL,
 	.vlan_offload_set     = enicpmd_vlan_offload_set,

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 32ecdae..c23938a 100644
---
a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -854,6 +854,50 @@ int enic_set_vnic_res(struct enic *enic)
 	return rc;
 }

+/* The Cisco NIC can send and receive packets up to a max packet size
+ * determined by the NIC type and firmware. There is also an MTU
+ * configured into the NIC via the CIMC/UCSM management interface
+ * which can be overridden by this function (up to the max packet size).
+ * Depending on the network setup, doing so may cause packet drops
+ * and unexpected behavior.
+ */
+int enic_set_mtu(struct enic *enic, uint16_t new_mtu)
+{
+	uint16_t old_mtu;	/* previous setting */
+	uint16_t config_mtu;	/* Value configured into NIC via CIMC/UCSM */
+	struct rte_eth_dev *eth_dev = enic->rte_dev;
+
+	old_mtu = eth_dev->data->mtu;
+	config_mtu = enic->config.mtu;
+
+	/* only works with Rx scatter disabled */
+	if (enic->rte_dev->data->dev_conf.rxmode.enable_scatter)
+		return -ENOTSUP;
+
+	if (new_mtu > enic->max_mtu) {
+		dev_err(enic,
+			"MTU not updated: requested (%u) greater than max (%u)\n",
+			new_mtu, enic->max_mtu);
+		return -EINVAL;
+	}
+	if (new_mtu < ENIC_MIN_MTU) {
+		dev_info(enic,
+			"MTU not updated: requested (%u) less than min (%u)\n",
+			new_mtu, ENIC_MIN_MTU);
+		return -EINVAL;
+	}
+	if (new_mtu > config_mtu)
+		dev_warning(enic,
+			"MTU (%u) is greater than value configured in NIC (%u)\n",
+			new_mtu, config_mtu);
+
+	/* update the mtu */
+	eth_dev->data->mtu = new_mtu;
+
+	dev_info(enic, "MTU changed from %u to %u\n", old_mtu, new_mtu);
+	return 0;
+}
+
 static int enic_dev_init(struct enic *enic)
 {
 	int err;
-- 
2.7.0
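The check ordering in enic_set_mtu above (Rx scatter first, then the upper and lower bounds, with the CIMC/UCSM value only warned about) can be distilled into a standalone sketch; the function name and the plain int scatter flag are illustrative, not the driver's actual fields.

```c
#include <errno.h>

#define MIN_MTU 68   /* mirrors ENIC_MIN_MTU from the patch */

/* Hypothetical stand-in for enic_set_mtu's validation logic:
 * -ENOTSUP when Rx scatter is on, -EINVAL when out of range, 0 on ok. */
int validate_mtu(int rx_scatter_enabled, unsigned new_mtu, unsigned max_mtu)
{
    if (rx_scatter_enabled)
        return -ENOTSUP;    /* only works with Rx scatter disabled */
    if (new_mtu > max_mtu)
        return -EINVAL;     /* above NIC max egress size */
    if (new_mtu < MIN_MTU)
        return -EINVAL;     /* below global minimum */
    return 0;               /* may still exceed the CIMC/UCSM value */
}
```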
[dpdk-dev] [PATCH 2/4] enic: set the max allowed MTU for the NIC
The max MTU is set to the max egress packet size allowed by the VIC
minus the size of an IPv4 L2 header with .1Q (18 bytes).

Signed-off-by: John Daley
---
 drivers/net/enic/enic.h        |  1 +
 drivers/net/enic/enic_ethdev.c |  3 ++-
 drivers/net/enic/enic_res.c    | 25 +
 drivers/net/enic/enic_res.h    |  4 +++-
 4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 1e6914e..78f7bd7 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -118,6 +118,7 @@ struct enic {
 	u8 ig_vlan_strip_en;
 	int link_status;
 	u8 hw_ip_checksum;
+	u16 max_mtu;
 	unsigned int flags;
 	unsigned int priv_flags;

diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 697ff82..31d9600 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -435,7 +435,8 @@ static void enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->max_rx_queues = enic->rq_count;
 	device_info->max_tx_queues = enic->wq_count;
 	device_info->min_rx_bufsize = ENIC_MIN_MTU;
-	device_info->max_rx_pktlen = enic->config.mtu;
+	device_info->max_rx_pktlen = enic->rte_dev->data->mtu
+				   + ETHER_HDR_LEN + 4;
 	device_info->max_mac_addrs = 1;
 	device_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |

diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c
index ebe379d..e82181f 100644
--- a/drivers/net/enic/enic_res.c
+++ b/drivers/net/enic/enic_res.c
@@ -83,6 +83,20 @@ int enic_get_vnic_config(struct enic *enic)
 	GET_CONFIG(intr_timer_usec);
 	GET_CONFIG(loop_tag);
 	GET_CONFIG(num_arfs);
+	GET_CONFIG(max_pkt_size);
+
+	/* max packet size is only defined in newer VIC firmware
+	 * and will be 0 for legacy firmware and VICs
+	 */
+	if (c->max_pkt_size > ENIC_DEFAULT_MAX_PKT_SIZE)
+		enic->max_mtu = c->max_pkt_size - (ETHER_HDR_LEN + 4);
+	else
+		enic->max_mtu = ENIC_DEFAULT_MAX_PKT_SIZE - (ETHER_HDR_LEN + 4);
+	if (c->mtu == 0)
+		c->mtu = 1500;
+
+	enic->rte_dev->data->mtu = min_t(u16, enic->max_mtu,
+					 max_t(u16, 
ENIC_MIN_MTU, c->mtu));

 	c->wq_desc_count = min_t(u32, ENIC_MAX_WQ_DESCS,
@@ -96,21 +110,16 @@ int enic_get_vnic_config(struct enic *enic)
 			 c->rq_desc_count));
 	c->rq_desc_count &= 0xffe0;	/* must be aligned to groups of 32 */

-	if (c->mtu == 0)
-		c->mtu = 1500;
-	c->mtu = min_t(u16, ENIC_MAX_MTU,
-		       max_t(u16, ENIC_MIN_MTU,
-		       c->mtu));
-
 	c->intr_timer_usec = min_t(u32, c->intr_timer_usec,
 		vnic_dev_get_intr_coal_timer_max(enic->vdev));

 	dev_info(enic_get_dev(enic),
 		"vNIC MAC addr %02x:%02x:%02x:%02x:%02x:%02x "
-		"wq/rq %d/%d mtu %d\n",
+		"wq/rq %d/%d mtu %d, max mtu:%d\n",
 		enic->mac_addr[0], enic->mac_addr[1], enic->mac_addr[2],
 		enic->mac_addr[3], enic->mac_addr[4], enic->mac_addr[5],
-		c->wq_desc_count, c->rq_desc_count, c->mtu);
+		c->wq_desc_count, c->rq_desc_count,
+		enic->rte_dev->data->mtu, enic->max_mtu);
 	dev_info(enic_get_dev(enic), "vNIC csum tx/rx %s/%s "
 		"rss %s intr mode %s type %s timer %d usec "
 		"loopback tag 0x%04x\n",

diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h
index 3c8e303..303530e 100644
--- a/drivers/net/enic/enic_res.h
+++ b/drivers/net/enic/enic_res.h
@@ -46,7 +46,9 @@
 #define ENIC_MAX_RQ_DESCS		4096

 #define ENIC_MIN_MTU			68
-#define ENIC_MAX_MTU			9000
+
+/* Does not include (possible) inserted VLAN tag and FCS */
+#define ENIC_DEFAULT_MAX_PKT_SIZE	9022

 #define ENIC_MULTICAST_PERFECT_FILTERS	32
 #define ENIC_UNICAST_PERFECT_FILTERS	32
-- 
2.7.0
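The init-time computation these hunks introduce is easy to check in isolation: max_mtu is derived from the firmware-reported max packet size (with the 9022-byte default as the fallback for legacy firmware), and the starting MTU is the configured value clamped into [ENIC_MIN_MTU, max_mtu]. A sketch, assuming ETHER_HDR_LEN is 14:

```c
#define ENIC_MIN_MTU              68
#define ENIC_DEFAULT_MAX_PKT_SIZE 9022
#define L2_OVERHEAD               (14 + 4)  /* ETHER_HDR_LEN + .1Q tag */

/* Legacy firmware reports max_pkt_size == 0, so anything not above the
 * default falls back to the default before subtracting L2 overhead. */
unsigned derive_max_mtu(unsigned fw_max_pkt_size)
{
    unsigned pkt = fw_max_pkt_size > ENIC_DEFAULT_MAX_PKT_SIZE
                 ? fw_max_pkt_size : ENIC_DEFAULT_MAX_PKT_SIZE;
    return pkt - L2_OVERHEAD;
}

/* min_t(u16, max_mtu, max_t(u16, ENIC_MIN_MTU, c->mtu)) from the hunk */
unsigned initial_mtu(unsigned config_mtu, unsigned max_mtu)
{
    if (config_mtu == 0)
        config_mtu = 1500;
    if (config_mtu < ENIC_MIN_MTU)
        config_mtu = ENIC_MIN_MTU;
    return config_mtu > max_mtu ? max_mtu : config_mtu;
}
```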
[dpdk-dev] [PATCH 1/4] enic: enable NIC max packet size discovery
Pull in common VNIC code which enables querying for the max egress
packet size.

Signed-off-by: John Daley
---
There are some unrelated fields and defines in this file because it is
shared with other drivers and interfaces to the VIC.

 drivers/net/enic/base/vnic_enet.h | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/enic/base/vnic_enet.h b/drivers/net/enic/base/vnic_enet.h
index cc34998..5062247 100644
--- a/drivers/net/enic/base/vnic_enet.h
+++ b/drivers/net/enic/base/vnic_enet.h
@@ -35,6 +35,10 @@
 #ifndef _VNIC_ENIC_H_
 #define _VNIC_ENIC_H_

+/* Hardware intr coalesce timer is in units of 1.5us */
+#define INTR_COALESCE_USEC_TO_HW(usec) ((usec) * 2 / 3)
+#define INTR_COALESCE_HW_TO_USEC(usec) ((usec) * 3 / 2)
+
 /* Device-specific region: enet configuration */
 struct vnic_enet_config {
 	u32 flags;
@@ -50,6 +54,12 @@ struct vnic_enet_config {
 	u16 vf_rq_count;
 	u16 num_arfs;
 	u64 mem_paddr;
+	u16 rdma_qp_id;
+	u16 rdma_qp_count;
+	u16 rdma_resgrp;
+	u32 rdma_mr_id;
+	u32 rdma_mr_count;
+	u32 max_pkt_size;
 };

 #define VENETF_TSO		0x1	/* TSO enabled */
@@ -64,9 +74,14 @@ struct vnic_enet_config {
 #define VENETF_RSSHASH_IPV6_EX	0x200	/* Hash on IPv6 extended fields */
 #define VENETF_RSSHASH_TCPIPV6_EX 0x400	/* Hash on TCP + IPv6 ext. fields */
 #define VENETF_LOOP		0x800	/* Loopback enabled */
-#define VENETF_VMQ		0x4000	/* using VMQ flag for VMware NETQ */
+#define VENETF_FAILOVER		0x1000	/* Fabric failover enabled */
+#define VENETF_USPACE_NIC	0x2000	/* vHPC enabled */
+#define VENETF_VMQ		0x4000	/* VMQ enabled */
+#define VENETF_ARFS		0x8000	/* ARFS enabled */
 #define VENETF_VXLAN		0x10000	/* VxLAN offload */
 #define VENETF_NVGRE		0x20000	/* NVGRE offload */
+#define VENETF_GRPINTR		0x40000	/* group interrupt */
+
 #define VENET_INTR_TYPE_MIN	0	/* Timer specs min interrupt spacing */
 #define VENET_INTR_TYPE_IDLE	1	/* Timer specs idle time before irq */
-- 
2.7.0
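The hardware timer behind the two coalescing macros above ticks in 1.5 us units, hence the *2/3 and *3/2 conversions. Integer division truncates, so the round trip is not exact for values that are not multiples of 3 us. The macros below are copied from the patch so the behavior can be checked standalone:

```c
/* Hardware intr coalesce timer is in units of 1.5us: usec -> ticks
 * divides by 1.5 (i.e. *2/3), ticks -> usec multiplies by 1.5. */
#define INTR_COALESCE_USEC_TO_HW(usec) ((usec) * 2 / 3)
#define INTR_COALESCE_HW_TO_USEC(usec) ((usec) * 3 / 2)
```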
[dpdk-dev] [PATCH 0/4] enic: enable MTU update callback
This patchset determines the max egress packet size allowed on the NIC
and uses it to set an upper limit for the MTU. An MTU update function
is added, but it only works if Rx scatter is disabled; if Rx scatter is
enabled, -ENOTSUP is returned. Another patch with Rx scatter support
will come later.

These patches should apply cleanly to dpdk-net-next rel_16_07 or on top
of the enic Rx scatter patch:
http://www.dpdk.org/dev/patchwork/patch/13933/

John Daley (4):
  enic: enable NIC max packet size discovery
  enic: set the max allowed MTU for the NIC
  enic: add an update MTU function for non-Rx scatter mode
  doc: add MTU update to feature matrix for enic

 doc/guides/nics/overview.rst      |  2 +-
 drivers/net/enic/base/vnic_enet.h | 17 ++-
 drivers/net/enic/enic.h           |  2 ++
 drivers/net/enic/enic_ethdev.c    | 13 ++--
 drivers/net/enic/enic_main.c      | 44 +++
 drivers/net/enic/enic_res.c       | 25 +++---
 drivers/net/enic/enic_res.h       |  4 +++-
 7 files changed, 94 insertions(+), 13 deletions(-)
-- 
2.7.0
[dpdk-dev] Performance hit - NICs on different CPU sockets
On Thu, Jun 16, 2016 at 9:33 PM, Wiles, Keith wrote, quoting the earlier exchange in the thread:

>> Right now I do not know what the issue is with the system. Could be
>> too many Rx/Tx ring pairs per port limiting the memory in the NICs,
>> which is why you get better performance when you have 8 cores per
>> port. I am not really seeing the whole picture and how DPDK is
>> configured to help more. Sorry.
>
> I doubt that there is a limitation wrt running 16 cores per port vs. 8
> cores per port, as I've tried with two different machines connected
> back to back, each with one X710 port and 16 cores on each of them
> running on that port. In that case our performance doubled as
> expected.
>
>> Maybe seeing the DPDK command line would help.
>
> The command line I use with ports 01:00.3 and 81:00.3 is:
>
> ./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- \
>     --qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00
>
> Our own qmap args allow the user to control exactly how cores are
> split between ports.
> In this case we end up with:
>
> warp17> show port map
> Port 0[socket: 0]:
>    Core 4[socket:0] (Tx: 0, Rx: 0)
>    Core 5[socket:0] (Tx: 1, Rx: 1)
>    Core 6[socket:0] (Tx: 2, Rx: 2)
>    Core 7[socket:0] (Tx: 3, Rx: 3)
>    Core 8[socket:0] (Tx: 4, Rx: 4)
>    Core 9[socket:0] (Tx: 5, Rx: 5)
>    Core 20[socket:0] (Tx: 6, Rx: 6)
>    Core 21[socket:0] (Tx: 7, Rx: 7)
>    Core 22[socket:0] (Tx: 8, Rx: 8)
>    Core 23[socket:0] (Tx: 9, Rx: 9)
>    Core 24[socket:0] (Tx: 10, Rx: 10)
>    Core 25[socket:0] (Tx: 11, Rx: 11)
>    Core 26[socket:0] (Tx: 12, Rx: 12)
>    Core 27[socket:0] (Tx: 13, Rx: 13)
>    Core 28[socket:0] (Tx: 14, Rx: 14)
>    Core 29[socket:0] (Tx: 15, Rx: 15)
>
> Port 1[socket: 1]:
>    Core 10[socket:1] (Tx: 0, Rx: 0)
>    Core 11[socket:1] (Tx: 1, Rx: 1)
>    Core 12[socket:1] (Tx: 2, Rx: 2)
>    Core 13[socket:1] (Tx: 3, Rx: 3)
>    Core 14[socket:1] (Tx: 4, Rx: 4)
>    Core 15[socket:1] (Tx: 5, Rx: 5)
>    Core 16[socket:1] (Tx: 6, Rx: 6)
>    Core 17[socket:1] (Tx: 7, Rx: 7)
>    Core 18[socket:1] (Tx: 8, Rx: 8)
>    Core 19[socket:1] (Tx: 9, Rx: 9)
>    Core 30[socket:1] (Tx: 10, Rx: 10)
>    Core 31[socket:1] (Tx: 11, Rx: 11)
>    Core 32[socket:1] (Tx: 12, Rx: 12)
>    Core 33[socket:1] (Tx: 13, Rx: 13)
>    Core 34[socket:1] (Tx: 14, Rx: 14)
>    Core 35[socket:1] (Tx: 15, Rx: 15)

On each socket you have 10 physical cores, or 20 lcores per socket and
40 lcores total.

The above is listing the LCORES (or hyper-threads) and not COREs, which
I understand some like to think are interchangeable. The problem is
that hyper-threads are logically interchangeable, but not
performance-wise. If you have two run-to-completion threads on a single
physical core, each on a different hyper-thread of that core [0,1],
then the second lcore or thread (1) on that physical core will only get
at most about 20-30% of the CPU cycles. Normally it is much less,
unless you tune the code to make sure each thread is not trying to
share the internal execution units; but some internal execution units
are always shared.
To get the best performance when hyper-threading is enabled, do not run
both threads on a single physical core; run only hyper-thread 0.

The table below lists the physical core id and each of the lcore ids
per socket. Use the first lcore per socket for the best performance:

Core 1 [1, 21] [11, 31]

Use lcore 1 or 11 depending on the socket you are on.

The info below is most likely the best performance and utilization of
your system, if I got the values right:

./warp17 -c 0x0FFFe0 -m 32768 -w :81:00.3 -w :01:00.3 -- \
    --qmap 0.0x0003FE --qmap 1.0x0FFE00

Port 0[socket: 0]:
   Core 2[socket:0] (Tx: 0, Rx: 0)
   Core 3[socket:0] (Tx: 1, Rx: 1)
   Core 4[socket:0] (Tx: 2, Rx: 2)
   Core 5[socket:0] (Tx: 3, Rx: 3)
   Core 6[socket:0] (Tx: 4, Rx: 4)
   Core 7[socket:0] (Tx: 5, Rx: 5)
   Core 8[socket:0] (Tx: 6, Rx: 6)
   Core 9[socket:0] (Tx: 7, Rx: 7)

>> 8 cores on the first socket, leaving 0-1 lcores for Linux.
>
> 9 cores, and leaving the first core or two lcores for Linux.

Port 1[socket: 1]:
   Core 10[socket:1] (Tx: 0, Rx: 0)
   Core 11[socket:1] (Tx: 1, Rx: 1)
   Core 12[socket:1] (Tx: 2, Rx: 2)
   Core 13[socket:1] (Tx: 3, Rx: 3)
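The qmap values in these command lines are plain lcore bitmasks; a small helper (hypothetical, not part of warp17) shows how a core list maps to the hex mask:

```c
#include <stdint.h>

/* OR together one bit per lcore id; e.g. lcores 1..9 -> 0x3FE,
 * matching "--qmap 0.0x0003FE" above. */
uint64_t qmap_mask(const int *lcores, int n)
{
    uint64_t m = 0;
    for (int i = 0; i < n; i++)
        m |= 1ULL << lcores[i];
    return m;
}
```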
[dpdk-dev] [PATCH v4] e1000: configure VLAN TPID
This patch enables configuring the outer TPID for double VLAN. Note
that all other TPID values are read-only.

Signed-off-by: Beilei Xing
---
v4 changes:
 Optimize the code to be more readable.
v3 changes:
 Update commit log and comments.
v2 changes:
 Modify return value. Because the inner TPID is not supported with
 single VLAN, return -ENOTSUP.
 Add return value. If the user wants to set the inner TPID of double
 VLAN or the outer TPID of single VLAN, return -ENOTSUP.

 drivers/net/e1000/igb_ethdev.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index f0921ee..0ed95c8 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -86,6 +86,13 @@
 #define E1000_INCVALUE_82576		(16 << IGB_82576_TSYNC_SHIFT)
 #define E1000_TSAUXC_DISABLE_SYSTIME	0x80000000

+/* External VLAN Enable bit mask */
+#define E1000_CTRL_EXT_EXT_VLAN		(1 << 26)
+
+/* External VLAN Ether Type bit mask and shift */
+#define E1000_VET_VET_EXT		0xFFFF0000
+#define E1000_VET_VET_EXT_SHIFT		16
+
 static int eth_igb_configure(struct rte_eth_dev *dev);
 static int eth_igb_start(struct rte_eth_dev *dev);
 static void eth_igb_stop(struct rte_eth_dev *dev);
@@ -2237,21 +2244,25 @@ eth_igb_vlan_tpid_set(struct rte_eth_dev *dev,
 {
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	uint32_t reg = ETHER_TYPE_VLAN;
-	int ret = 0;
+	uint32_t reg, qinq;
+
+	qinq = E1000_READ_REG(hw, E1000_CTRL_EXT);
+	qinq &= E1000_CTRL_EXT_EXT_VLAN;

-	switch (vlan_type) {
-	case ETH_VLAN_TYPE_INNER:
-		reg |= (tpid << 16);
+	/* only outer TPID of double VLAN can be configured */
+	if (qinq && vlan_type == ETH_VLAN_TYPE_OUTER) {
+		reg = E1000_READ_REG(hw, E1000_VET);
+		reg = (reg & (~E1000_VET_VET_EXT)) |
+			((uint32_t)tpid << E1000_VET_VET_EXT_SHIFT);
 		E1000_WRITE_REG(hw, E1000_VET, reg);
-		break;
-	default:
-		ret = -EINVAL;
-		PMD_DRV_LOG(ERR, "Unsupported vlan type %d\n", vlan_type);
-		break;
+
+		return 0;
 	}

-	return ret;
+	/* all other TPID values are read-only */
+	PMD_DRV_LOG(ERR, "Not supported");
+
+	return -ENOTSUP;
 }

 static void
-- 
2.5.0
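The VET update in this patch is a classic read-modify-write of a 16-bit register field: clear the outer-TPID bits, then OR in the new TPID shifted into place. A generic standalone sketch (set_field16 is illustrative, not an e1000 function):

```c
#include <stdint.h>

/* Replace the 16-bit field at bit offset 'shift' in a 32-bit register
 * value, leaving the other bits untouched. */
uint32_t set_field16(uint32_t reg, uint16_t val, int shift)
{
    uint32_t mask = (uint32_t)0xFFFF << shift;
    return (reg & ~mask) | ((uint32_t)val << shift);
}
```

With shift 16 this is exactly the E1000_VET manipulation: writing, say, 0x88A8 as the outer TPID while keeping the inner 0x8100 intact.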
[dpdk-dev] [PATCH v10 3/7] ethdev: add new fields to ethdev info struct
2016-06-15 15:06, Reshma Pattan:
> The new fields nb_rx_queues and nb_tx_queues are added to the
> rte_eth_dev_info structure.
> Changes to API rte_eth_dev_info_get() are done to update these new
> fields to the rte_eth_dev_info object.

The ABI is changed, not the API.

> Release notes is updated with the changes.
[...]
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -137,4 +137,5 @@ DPDK_16.07 {
> 	global:
>
> 	rte_eth_add_first_rx_callback;
> +	rte_eth_dev_info_get;
> } DPDK_16.04;

Why duplicating this symbol in 16.07? The ABI is broken anyway.
[dpdk-dev] [PATCH 4/4] app/test: typo fixing
Fix typos in the performance tests, for example preftest to perftest.

Signed-off-by: Jain, Deepak K
---
 app/test/test_cryptodev_perf.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
index 6c43a93..903529f 100644
--- a/app/test/test_cryptodev_perf.c
+++ b/app/test/test_cryptodev_perf.c
@@ -208,7 +208,7 @@ setup_test_string(struct rte_mempool *mpool,
 static struct crypto_testsuite_params testsuite_params = { NULL };
 static struct crypto_unittest_params unittest_params;

-static enum rte_cryptodev_type gbl_cryptodev_preftest_devtype;
+static enum rte_cryptodev_type gbl_cryptodev_perftest_devtype;

 static int
 testsuite_setup(void)
@@ -245,7 +245,7 @@ testsuite_setup(void)
 	/* Create 2 AESNI MB devices if required */
-	if (gbl_cryptodev_preftest_devtype == RTE_CRYPTODEV_AESNI_MB_PMD) {
+	if (gbl_cryptodev_perftest_devtype == RTE_CRYPTODEV_AESNI_MB_PMD) {
 		nb_devs = rte_cryptodev_count_devtype(RTE_CRYPTODEV_AESNI_MB_PMD);
 		if (nb_devs < 2) {
 			for (i = nb_devs; i < 2; i++) {
@@ -260,7 +260,7 @@ testsuite_setup(void)
 	/* Create 2 SNOW3G devices if required */
-	if (gbl_cryptodev_preftest_devtype == RTE_CRYPTODEV_SNOW3G_PMD) {
+	if (gbl_cryptodev_perftest_devtype == RTE_CRYPTODEV_SNOW3G_PMD) {
 		nb_devs = rte_cryptodev_count_devtype(RTE_CRYPTODEV_SNOW3G_PMD);
 		if (nb_devs < 2) {
 			for (i = nb_devs; i < 2; i++) {
@@ -283,7 +283,7 @@ testsuite_setup(void)
 	/* Search for the first valid */
 	for (i = 0; i < nb_devs; i++) {
 		rte_cryptodev_info_get(i, &info);
-		if (info.dev_type == gbl_cryptodev_preftest_devtype) {
+		if (info.dev_type == gbl_cryptodev_perftest_devtype) {
 			ts_params->dev_id = i;
 			valid_dev_id = 1;
 			break;
@@ -1956,7 +1956,7 @@ test_perf_crypto_qp_vary_burst_size(uint16_t dev_num)
 	}
 	while (num_received != num_to_submit) {
-		if (gbl_cryptodev_preftest_devtype ==
+		if (gbl_cryptodev_perftest_devtype ==
 				RTE_CRYPTODEV_AESNI_MB_PMD)
 			rte_cryptodev_enqueue_burst(dev_num, 0, NULL, 0);
@@ -2028,7 
@@ test_perf_snow3G_optimise_cyclecount(struct perf_test_params *pparams)
 	printf("\nOn %s dev%u qp%u, %s, cipher algo:%s, auth_algo:%s, "
 		"Packet Size %u bytes",
-		pmd_name(gbl_cryptodev_preftest_devtype),
+		pmd_name(gbl_cryptodev_perftest_devtype),
 		ts_params->dev_id, 0,
 		chain_mode_name(pparams->chain),
 		cipher_algo_name(pparams->cipher_algo),
@@ -2072,7 +2072,7 @@ test_perf_snow3G_optimise_cyclecount(struct perf_test_params *pparams)
 	}
 	while (num_ops_received != num_to_submit) {
-		if (gbl_cryptodev_preftest_devtype ==
+		if (gbl_cryptodev_perftest_devtype ==
 				RTE_CRYPTODEV_AESNI_MB_PMD)
 			rte_cryptodev_enqueue_burst(ts_params->dev_id, 0, NULL, 0);
@@ -2680,7 +2680,7 @@ test_perf_snow3g(uint8_t dev_id, uint16_t queue_id,
 	double cycles_B = cycles_buff / pparams->buf_size;
 	double throughput = (ops_s * pparams->buf_size * 8) / 100;
-	if (gbl_cryptodev_preftest_devtype == RTE_CRYPTODEV_QAT_SYM_PMD) {
+	if (gbl_cryptodev_perftest_devtype == RTE_CRYPTODEV_QAT_SYM_PMD) {
 		/* Cycle count misleading on HW devices for this test, so don't print */
 		printf("%4u\t%6.2f\t%10.2f\t n/a \t\t n/a "
 			"\t\t n/a \t\t%8"PRIu64"\t%8"PRIu64,
@@ -2824,7 +2824,7 @@ test_perf_snow3G_vary_pkt_size(void)
 	for (k = 0; k < RTE_DIM(burst_sizes); k++) {
 		printf("\nOn %s dev%u qp%u, %s, "
 			"cipher algo:%s, auth algo:%s, burst_size: %d ops",
-			pmd_name(gbl_cryptodev_preftest_devtype),
+			pmd_name(gbl_cryptodev_perftest_devtype),
 			testsuite_params.dev_id, 0,
 			chain_mode_name(params_set[i].chain),
 			cipher_algo_name(params_set[i].cipher_algo),
@@ -2893,7 +2893,7 @@ static struct unit_test_suite cryptodev_snow3g_testsuite = {
 static int
 perftest_aesni_mb_cryptodev(void /*argv __rte_unused, int argc __rte_unused*/)
 {
-	gbl_cryptodev_preftest_devtype = RTE_CRYPTODEV_AESNI_MB_PMD;
+
[dpdk-dev] [PATCH 3/4] app/test: updating AES SHA performance test
From: Fiona Trahe

Updating the AES performance test in line with the snow3g performance
test. The output format has been updated to give a better understanding
of the numbers.

Signed-off-by: Fiona Trahe
Signed-off-by: Jain, Deepak K
---
 app/test/test_cryptodev.h      |   2 +
 app/test/test_cryptodev_perf.c | 551 +++--
 2 files changed, 370 insertions(+), 183 deletions(-)

diff --git a/app/test/test_cryptodev.h b/app/test/test_cryptodev.h
index d549eca..382802c 100644
--- a/app/test/test_cryptodev.h
+++ b/app/test/test_cryptodev.h
@@ -64,7 +64,9 @@
 #define AES_XCBC_MAC_KEY_SZ		(16)

 #define TRUNCATED_DIGEST_BYTE_LENGTH_SHA1	(12)
+#define TRUNCATED_DIGEST_BYTE_LENGTH_SHA224	(16)
 #define TRUNCATED_DIGEST_BYTE_LENGTH_SHA256	(16)
+#define TRUNCATED_DIGEST_BYTE_LENGTH_SHA384	(24)
 #define TRUNCATED_DIGEST_BYTE_LENGTH_SHA512	(32)

 #endif /* TEST_CRYPTODEV_H_ */

diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
index 06148d0..6c43a93 100644
--- a/app/test/test_cryptodev_perf.c
+++ b/app/test/test_cryptodev_perf.c
@@ -492,12 +492,11 @@ const char plaintext_quote[] =
 #define CIPHER_KEY_LENGTH_AES_CBC	(16)
 #define CIPHER_IV_LENGTH_AES_CBC	(CIPHER_KEY_LENGTH_AES_CBC)

-static uint8_t aes_cbc_key[] = {
+static uint8_t aes_cbc_128_key[] = {
 	0xE4, 0x23, 0x33, 0x8A, 0x35, 0x64, 0x61, 0xE2,
 	0xF1, 0x35, 0x5C, 0x3B, 0xDD, 0x9A, 0x65, 0xBA };

-static uint8_t aes_cbc_iv[] = {
+static uint8_t aes_cbc_128_iv[] = {
 	0xf5, 0xd3, 0x89, 0x0f, 0x47, 0x00, 0xcb, 0x52,
 	0x42, 0x1a, 0x7d, 0x3d, 0xf5, 0x82, 0x80, 0xf1 };

@@ -1846,7 +1845,7 @@ test_perf_crypto_qp_vary_burst_size(uint16_t dev_num)
 	ut_params->cipher_xform.cipher.algo = RTE_CRYPTO_CIPHER_AES_CBC;
 	ut_params->cipher_xform.cipher.op = RTE_CRYPTO_CIPHER_OP_DECRYPT;
-	ut_params->cipher_xform.cipher.key.data = aes_cbc_key;
+	ut_params->cipher_xform.cipher.key.data = aes_cbc_128_key;
 	ut_params->cipher_xform.cipher.key.length = CIPHER_IV_LENGTH_AES_CBC;

@@ -1902,7 +1901,7 @@ test_perf_crypto_qp_vary_burst_size(uint16_t dev_num)
 	op->sym->cipher.iv.phys_addr = rte_pktmbuf_mtophys(m);
 	op->sym->cipher.iv.length = CIPHER_IV_LENGTH_AES_CBC;

-	rte_memcpy(op->sym->cipher.iv.data, aes_cbc_iv,
+	rte_memcpy(op->sym->cipher.iv.data, aes_cbc_128_iv,
 			CIPHER_IV_LENGTH_AES_CBC);

 	op->sym->cipher.data.offset = CIPHER_IV_LENGTH_AES_CBC;
@@ -1985,169 +1984,6 @@ test_perf_crypto_qp_vary_burst_size(uint16_t dev_num)
 }

 static int
-test_perf_AES_CBC_HMAC_SHA256_encrypt_digest_vary_req_size(uint16_t dev_num)
-{
-	uint16_t index;
-	uint32_t burst_sent, burst_received;
-	uint32_t b, num_sent, num_received;
-	uint64_t failed_polls, retries, start_cycles, end_cycles;
-	const uint64_t mhz = rte_get_tsc_hz()/100;
-	double throughput, mmps;
-
-	struct rte_crypto_op *c_ops[DEFAULT_BURST_SIZE];
-	struct rte_crypto_op *proc_ops[DEFAULT_BURST_SIZE];
-
-	struct crypto_testsuite_params *ts_params = &testsuite_params;
-	struct crypto_unittest_params *ut_params = &unittest_params;
-	struct crypto_data_params *data_params = aes_cbc_hmac_sha256_output;
-
-	if (rte_cryptodev_count() == 0) {
-		printf("\nNo crypto devices available.
Is kernel driver loaded?\n");
-		return TEST_FAILED;
-	}
-
-	/* Setup Cipher Parameters */
-	ut_params->cipher_xform.type = RTE_CRYPTO_SYM_XFORM_CIPHER;
-	ut_params->cipher_xform.next = &ut_params->auth_xform;
-
-	ut_params->cipher_xform.cipher.algo = RTE_CRYPTO_CIPHER_AES_CBC;
-	ut_params->cipher_xform.cipher.op = RTE_CRYPTO_CIPHER_OP_ENCRYPT;
-	ut_params->cipher_xform.cipher.key.data = aes_cbc_key;
-	ut_params->cipher_xform.cipher.key.length = CIPHER_IV_LENGTH_AES_CBC;
-
-	/* Setup HMAC Parameters */
-	ut_params->auth_xform.type = RTE_CRYPTO_SYM_XFORM_AUTH;
-	ut_params->auth_xform.next = NULL;
-
-	ut_params->auth_xform.auth.op = RTE_CRYPTO_AUTH_OP_GENERATE;
-	ut_params->auth_xform.auth.algo = RTE_CRYPTO_AUTH_SHA256_HMAC;
-	ut_params->auth_xform.auth.key.data = hmac_sha256_key;
-	ut_params->auth_xform.auth.key.length = HMAC_KEY_LENGTH_SHA256;
-	ut_params->auth_xform.auth.digest_length = DIGEST_BYTE_LENGTH_SHA256;
-
-	/* Create Crypto session */
-	ut_params->sess = rte_cryptodev_sym_session_create(ts_params->dev_id,
-			&ut_params->cipher_xform);
-
-	TEST_ASSERT_NOT_NULL(ut_params->sess, "Session creation failed");
-
-	printf("\nThroughput test which will continually attempt to send "
-		"AES128_CBC_SHA256_HMAC requests with a constant burst "
-		"size of %u
[dpdk-dev] [PATCH 2/4] app/test: adding Snow3g performance test
From: Fiona Trahe

Adding a performance test for the snow3g wireless algorithm. The
performance test can run over both software and hardware.

Signed-off-by: Fiona Trahe
Signed-off-by: Jain, Deepak K
Signed-off-by: Declan Doherty
---
 app/test/test_cryptodev.h      |   2 +-
 app/test/test_cryptodev_perf.c | 688 -
 2 files changed, 688 insertions(+), 2 deletions(-)

diff --git a/app/test/test_cryptodev.h b/app/test/test_cryptodev.h
index 6059a01..d549eca 100644
--- a/app/test/test_cryptodev.h
+++ b/app/test/test_cryptodev.h
@@ -46,7 +46,7 @@
 #define DEFAULT_BURST_SIZE	(64)
 #define DEFAULT_NUM_XFORMS	(2)
 #define NUM_MBUFS		(8191)
-#define MBUF_CACHE_SIZE		(250)
+#define MBUF_CACHE_SIZE		(256)
 #define MBUF_DATAPAYLOAD_SIZE	(2048 + DIGEST_BYTE_LENGTH_SHA512)
 #define MBUF_SIZE	(sizeof(struct rte_mbuf) + \
 		RTE_PKTMBUF_HEADROOM + MBUF_DATAPAYLOAD_SIZE)

diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
index b3f4fd9..06148d0 100644
--- a/app/test/test_cryptodev_perf.c
+++ b/app/test/test_cryptodev_perf.c
@@ -58,6 +58,25 @@ struct crypto_testsuite_params {
 	uint8_t dev_id;
 };

+enum chain_mode {
+	CIPHER_HASH,
+	HASH_CIPHER,
+	CIPHER_ONLY,
+	HASH_ONLY
+};
+
+struct perf_test_params {
+
+	unsigned total_operations;
+	unsigned burst_size;
+	unsigned buf_size;
+
+	enum chain_mode chain;
+
+	enum rte_crypto_cipher_algorithm cipher_algo;
+	unsigned cipher_key_length;
+	enum rte_crypto_auth_algorithm auth_algo;
+};

 #define MAX_NUM_OF_OPS_PER_UT	(128)

@@ -75,6 +94,98 @@ struct crypto_unittest_params {
 	uint8_t *digest;
 };

+static struct rte_cryptodev_sym_session *
+test_perf_create_snow3g_session(uint8_t dev_id, enum chain_mode chain,
+	enum rte_crypto_cipher_algorithm cipher_algo, unsigned cipher_key_len,
+	enum rte_crypto_auth_algorithm auth_algo);
+static struct rte_mbuf *
+test_perf_create_pktmbuf(struct rte_mempool *mpool, unsigned buf_sz);
+static inline struct rte_crypto_op *
+test_perf_set_crypto_op_snow3g(struct rte_crypto_op *op, struct rte_mbuf *m,
+	struct 
rte_cryptodev_sym_session *sess, unsigned data_len, + unsigned digest_len); +static uint32_t get_auth_digest_length(enum rte_crypto_auth_algorithm algo); + + +static const char *chain_mode_name(enum chain_mode mode) +{ + switch (mode) { + case CIPHER_HASH: return "cipher_hash"; break; + case HASH_CIPHER: return "hash_cipher"; break; + case CIPHER_ONLY: return "cipher_only"; break; + case HASH_ONLY: return "hash_only"; break; + default: return ""; break; + } +} + +static const char *pmd_name(enum rte_cryptodev_type pmd) +{ + switch (pmd) { + case RTE_CRYPTODEV_NULL_PMD: return CRYPTODEV_NAME_NULL_PMD; break; + case RTE_CRYPTODEV_AESNI_GCM_PMD: + return CRYPTODEV_NAME_AESNI_GCM_PMD; + case RTE_CRYPTODEV_AESNI_MB_PMD: + return CRYPTODEV_NAME_AESNI_MB_PMD; + case RTE_CRYPTODEV_QAT_SYM_PMD: + return CRYPTODEV_NAME_QAT_SYM_PMD; + case RTE_CRYPTODEV_SNOW3G_PMD: + return CRYPTODEV_NAME_SNOW3G_PMD; + default: + return ""; + } +} + +static const char *cipher_algo_name(enum rte_crypto_cipher_algorithm cipher_algo) +{ + switch (cipher_algo) { + case RTE_CRYPTO_CIPHER_NULL: return "NULL"; + case RTE_CRYPTO_CIPHER_3DES_CBC: return "3DES_CBC"; + case RTE_CRYPTO_CIPHER_3DES_CTR: return "3DES_CTR"; + case RTE_CRYPTO_CIPHER_3DES_ECB: return "3DES_ECB"; + case RTE_CRYPTO_CIPHER_AES_CBC: return "AES_CBC"; + case RTE_CRYPTO_CIPHER_AES_CCM: return "AES_CCM"; + case RTE_CRYPTO_CIPHER_AES_CTR: return "AES_CTR"; + case RTE_CRYPTO_CIPHER_AES_ECB: return "AES_ECB"; + case RTE_CRYPTO_CIPHER_AES_F8: return "AES_F8"; + case RTE_CRYPTO_CIPHER_AES_GCM: return "AES_GCM"; + case RTE_CRYPTO_CIPHER_AES_XTS: return "AES_XTS"; + case RTE_CRYPTO_CIPHER_ARC4: return "ARC4"; + case RTE_CRYPTO_CIPHER_KASUMI_F8: return "KASUMI_F8"; + case RTE_CRYPTO_CIPHER_SNOW3G_UEA2: return "SNOW3G_UEA2"; + case RTE_CRYPTO_CIPHER_ZUC_EEA3: return "ZUC_EEA3"; + default: return "Another cipher algo"; + } +} + +static const char *auth_algo_name(enum rte_crypto_auth_algorithm auth_algo) +{ + switch (auth_algo) { + case 
RTE_CRYPTO_AUTH_NULL: return "NULL"; break; + case RTE_CRYPTO_AUTH_AES_CBC_MAC: return "AES_CBC_MAC"; break; + case RTE_CRYPTO_AUTH_AES_CCM: return "AES_CCM"; break; + case RTE_CRYPTO_AUTH_AES_CMAC: return "AES_CMAC"; break; + case RTE_CRYPTO_AUTH_AES_GCM: return "AES_GCM"; break; + case RTE_CRYPTO_AUTH_AES_GMAC: return "AES_GMAC"; break; +
[dpdk-dev] [PATCH 1/4] cryptodev: add rte_crypto_op_bulk_free function
From: Declan Doherty

Adding rte_crypto_op_bulk_free to free ops in bulk, which is expected to improve performance.

Signed-off-by: Declan Doherty
---
 lib/librte_cryptodev/rte_crypto.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/lib/librte_cryptodev/rte_crypto.h b/lib/librte_cryptodev/rte_crypto.h
index 5bc3eaa..31abbdc 100644
--- a/lib/librte_cryptodev/rte_crypto.h
+++ b/lib/librte_cryptodev/rte_crypto.h
@@ -328,6 +328,21 @@ rte_crypto_op_free(struct rte_crypto_op *op)
 }

 /**
+ * Free crypto operation structures in bulk.
+ * If the operations have been allocated from a rte_mempool, they will
+ * be returned to the mempool.
+ *
+ * @param mpool	mempool the operations were allocated from
+ * @param ops	array of crypto operations to free
+ * @param nb_ops	number of operations in the array
+ */
+static inline void
+rte_crypto_op_bulk_free(struct rte_mempool *mpool, struct rte_crypto_op **ops,
+		uint16_t nb_ops)
+{
+	if (ops != NULL)
+		rte_mempool_put_bulk(mpool, (void * const *)ops, nb_ops);
+}
+
+/**
  * Allocate a symmetric crypto operation in the private data of an mbuf.
  *
  * @param m	mbuf which is associated with the crypto operation, the
--
2.5.5
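The new helper is a thin wrapper around rte_mempool_put_bulk, so the win over calling rte_crypto_op_free per op is one mempool operation per burst instead of nb_ops. A minimal sketch of the same pattern over a toy array-backed pool (the names toy_pool and toy_op_bulk_free are illustrative, not DPDK API):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for rte_mempool: a stack of free object slots. It only
 * illustrates the bulk-put pattern; the real rte_mempool_put_bulk also
 * batches per-lcore cache updates, which is where the speedup comes from. */
struct toy_pool {
    void *slots[128];
    size_t top;              /* number of free objects currently held */
};

static void toy_put_bulk(struct toy_pool *p, void *const *objs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p->slots[p->top++] = objs[i];
}

/* Mirrors the patch: a single NULL check, then the whole burst is
 * returned to the pool in one call instead of nb_ops separate puts. */
static void toy_op_bulk_free(struct toy_pool *p, void **ops, size_t nb_ops)
{
    if (ops != NULL)
        toy_put_bulk(p, ops, nb_ops);
}
```

The callers in patches 2 and 3 can then hand back an entire completed burst with one call after draining the crypto device.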
[dpdk-dev] [PATCH 0/4] Extending cryptodev Performance tests
Performance tests have been extended in this patchset, which consists of 4 patches:
Patch 1 adds a new function, rte_crypto_op_bulk_free, used in patches 2 and 3.
Patch 2 adds Snow3G performance tests.
Patch 3 updates the existing AES performance test.
Patch 4 fixes typos in the perf test names.

Declan Doherty (1):
  cryptodev: add rte_crypto_op_bulk_free function

Fiona Trahe (2):
  app/test: adding Snow3g performance test
  app/test: updating AES SHA performance test

Jain, Deepak K (1):
  app/test: typo fixing

 app/test/test_cryptodev.h         |    4 +-
 app/test/test_cryptodev_perf.c    | 1153 -
 lib/librte_cryptodev/rte_crypto.h |   15 +
 3 files changed, 1030 insertions(+), 142 deletions(-)
--
2.5.5
[dpdk-dev] [PATCH v3 3/4] bonding: take queue spinlock in rx/tx burst functions
2016-06-16 16:41, Iremonger, Bernard: > Hi Thomas, > > > 2016-06-16 15:32, Bruce Richardson: > > > On Mon, Jun 13, 2016 at 01:28:08PM +0100, Iremonger, Bernard wrote: > > > > > Why does this particular PMD need spinlocks when doing RX and TX, > > > > > while other device types do not? How is adding/removing devices > > > > > from a bonded device different to other control operations that > > > > > can be done on physical PMDs? Is this not similar to say bringing > > > > > down or hotplugging out a physical port just before an RX or TX > > operation takes place? > > > > > For all other PMDs we rely on the app to synchronise control and > > > > > data plane operation - why not here? > > > > > > > > > > /Bruce > > > > This issue arose during VM live migration testing. > > > > For VM live migration it is necessary (while traffic is running) to be > > > > able to > > remove a bonded slave device, stop it, close it and detach it. > > > > If a slave device is removed from a bonded device while traffic is > > > > running, > > a segmentation fault may occur in the rx/tx burst function. The spinlock has > > been added to prevent this occurring. > > > > > > > > The bonding device already uses a spinlock to synchronise between the > > add and remove functionality and the slave_link_status_change_monitor > > code. > > > > > > > > Previously testpmd did not allow stop, close or detach of a PMD while > > > > traffic was running. Testpmd has been modified with the following > > > > patchset: > > > > > > > > http://dpdk.org/dev/patchwork/patch/13472/ > > > > > > > > It now allows stop, close and detach of a PMD provided it is not > > forwarding and is not a slave of a bonded PMD. > > > > > > > I will admit to not being fully convinced, but if nobody else has any > > > serious objections, and since this patch has been reviewed and acked, > > > I'm ok to merge it in. I'll do so shortly. > > > > Please hold on. > > Seeing locks introduced in the Rx/Tx path is an alert.
> > We clearly need a design document to explain where locks can be used and > > what the responsibilities of the control plane are. > > If everybody agrees in this document that DPDK can have some locks in the > > fast path, then OK to merge it. > > > > So I would say NACK for 16.07 and maybe postpone to 16.11. > > Looking at the documentation for the bonding PMD: > > http://dpdk.org/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.html > > In section 10.2 it states the following: > > Bonded devices support the dynamical addition and removal of slave devices > using the rte_eth_bond_slave_add / rte_eth_bond_slave_remove APIs. > > If a slave device is added or removed while traffic is running, there is the > possibility of a segmentation fault in the rx/tx burst functions. This is > most likely to occur in the round robin bonding mode. > > This patch set fixes what appears to be a bug in the bonding PMD. It can be fixed by removing this statement in the doc. One of the design principles of DPDK is to avoid locks. > Performance measurements have been made with this patch set applied and > without the patches applied, using 64 byte packets. > > With the patches applied the following drop in performance was observed: > > % drop for fwd+io: 0.16% > > % drop for fwd+mac: 0.39% > > This patch set has been reviewed and acked, so I think it should be applied > in 16.07 I understand your point of view and I gave mine. Now we need more opinions from others.
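For readers weighing the measured 0.16-0.39% drop: the cost added to the fast path is one lock/unlock pair per burst, not per packet. A minimal sketch of a trylock-guarded burst using a C11 atomic flag (hypothetical names; the actual patch uses DPDK's rte_spinlock, and whether it blocks or skips the burst under contention is the patch's choice, not this sketch's):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One flag per queue; clear means unlocked. */
static atomic_flag queue_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the slave device's real rx burst. */
static uint16_t slave_rx_burst(void)
{
    return 4;   /* pretend 4 packets arrived */
}

/* Guarded path: if a control-plane operation (slave add/remove) holds
 * the lock, skip this poll and return 0 packets rather than touch a
 * half-removed slave. */
static uint16_t guarded_rx_burst(void)
{
    if (atomic_flag_test_and_set_explicit(&queue_lock, memory_order_acquire))
        return 0;                         /* lock busy this poll */
    uint16_t n = slave_rx_burst();
    atomic_flag_clear_explicit(&queue_lock, memory_order_release);
    return n;
}
```

On the uncontended path this is a single atomic RMW plus a store per burst, which is consistent with the sub-half-percent numbers reported above.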
[dpdk-dev] Performance hit - NICs on different CPU sockets
On Thu, Jun 16, 2016 at 6:59 PM, Wiles, Keith wrote: > > On 6/16/16, 11:56 AM, "dev on behalf of Wiles, Keith" dpdk.org on behalf of keith.wiles at intel.com> wrote: > >> >>On 6/16/16, 11:20 AM, "Take Ceara" wrote: >> >>>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith >>>wrote: >>> Right now I do not know what the issue is with the system. Could be too many Rx/Tx ring pairs per port and limiting the memory in the NICs, which is why you get better performance when you have 8 core per port. I am not really seeing the whole picture and how DPDK is configured to help more. Sorry. >>> >>>I doubt that there is a limitation wrt running 16 cores per port vs 8 >>>cores per port as I've tried with two different machines connected >>>back to back each with one X710 port and 16 cores on each of them >>>running on that port. In that case our performance doubled as >>>expected. >>> Maybe seeing the DPDK command line would help. >>> >>>The command line I use with ports 01:00.3 and 81:00.3 is: >>>./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- >>>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00 >>> >>>Our own qmap args allow the user to control exactly how cores are >>>split between ports. 
In this case we end up with: >>> >>>warp17> show port map >>>Port 0[socket: 0]: >>> Core 4[socket:0] (Tx: 0, Rx: 0) >>> Core 5[socket:0] (Tx: 1, Rx: 1) >>> Core 6[socket:0] (Tx: 2, Rx: 2) >>> Core 7[socket:0] (Tx: 3, Rx: 3) >>> Core 8[socket:0] (Tx: 4, Rx: 4) >>> Core 9[socket:0] (Tx: 5, Rx: 5) >>> Core 20[socket:0] (Tx: 6, Rx: 6) >>> Core 21[socket:0] (Tx: 7, Rx: 7) >>> Core 22[socket:0] (Tx: 8, Rx: 8) >>> Core 23[socket:0] (Tx: 9, Rx: 9) >>> Core 24[socket:0] (Tx: 10, Rx: 10) >>> Core 25[socket:0] (Tx: 11, Rx: 11) >>> Core 26[socket:0] (Tx: 12, Rx: 12) >>> Core 27[socket:0] (Tx: 13, Rx: 13) >>> Core 28[socket:0] (Tx: 14, Rx: 14) >>> Core 29[socket:0] (Tx: 15, Rx: 15) >>> >>>Port 1[socket: 1]: >>> Core 10[socket:1] (Tx: 0, Rx: 0) >>> Core 11[socket:1] (Tx: 1, Rx: 1) >>> Core 12[socket:1] (Tx: 2, Rx: 2) >>> Core 13[socket:1] (Tx: 3, Rx: 3) >>> Core 14[socket:1] (Tx: 4, Rx: 4) >>> Core 15[socket:1] (Tx: 5, Rx: 5) >>> Core 16[socket:1] (Tx: 6, Rx: 6) >>> Core 17[socket:1] (Tx: 7, Rx: 7) >>> Core 18[socket:1] (Tx: 8, Rx: 8) >>> Core 19[socket:1] (Tx: 9, Rx: 9) >>> Core 30[socket:1] (Tx: 10, Rx: 10) >>> Core 31[socket:1] (Tx: 11, Rx: 11) >>> Core 32[socket:1] (Tx: 12, Rx: 12) >>> Core 33[socket:1] (Tx: 13, Rx: 13) >>> Core 34[socket:1] (Tx: 14, Rx: 14) >>> Core 35[socket:1] (Tx: 15, Rx: 15) >> >>On each socket you have 10 physical cores or 20 lcores per socket for 40 >>lcores total. >> >>The above is listing the LCORES (or hyper-threads) and not COREs, which I >>understand some like to think they are interchangeable. The problem is the >>hyper-threads are logically interchangeable, but not performance wise. If you >>have two run-to-completion threads on a single physical core each on a >>different hyper-thread of that core [0,1], then the second lcore or thread >>(1) on that physical core will only get at most about 30-20% of the CPU >>cycles. 
Normally it is much less, unless you tune the code to make sure each >>thread is not trying to share the internal execution units, but some internal >>execution units are always shared. >> >>To get the best performance when hyper-threading is enable is to not run both >>threads on a single physical core, but only run one hyper-thread-0. >> >>In the table below the table lists the physical core id and each of the lcore >>ids per socket. Use the first lcore per socket for the best performance: >>Core 1 [1, 21][11, 31] >>Use lcore 1 or 11 depending on the socket you are on. >> >>The info below is most likely the best performance and utilization of your >>system. If I got the values right ? >> >>./warp17 -c 0x0FFFe0 -m 32768 -w :81:00.3 -w :01:00.3 -- >>--qmap 0.0x0003FE --qmap 1.0x0FFE00 >> >>Port 0[socket: 0]: >> Core 2[socket:0] (Tx: 0, Rx: 0) >> Core 3[socket:0] (Tx: 1, Rx: 1) >> Core 4[socket:0] (Tx: 2, Rx: 2) >> Core 5[socket:0] (Tx: 3, Rx: 3) >> Core 6[socket:0] (Tx: 4, Rx: 4) >> Core 7[socket:0] (Tx: 5, Rx: 5) >> Core 8[socket:0] (Tx: 6, Rx: 6) >> Core 9[socket:0] (Tx: 7, Rx: 7) >> >>8 cores on first socket leaving 0-1 lcores for Linux. > > 9 cores and leaving the first core or two lcores for Linux >> >>Port 1[socket: 1]: >> Core 10[socket:1] (Tx: 0, Rx: 0) >> Core 11[socket:1] (Tx: 1, Rx: 1) >> Core 12[socket:1] (Tx: 2, Rx: 2) >> Core 13[socket:1] (Tx: 3, Rx: 3) >> Core 14[socket:1] (Tx: 4, Rx: 4) >> Core 15[socket:1] (Tx: 5, Rx: 5) >> Core 16[socket:1] (Tx: 6, Rx: 6) >> Core 17[socket:1] (Tx: 7, Rx: 7) >> Core 18[socket:1] (Tx: 8, Rx: 8) >> Core 19[socket:1] (Tx: 9, Rx: 9) >> >>All 10 cores on the second socket. The values were almost right :) But that's because we reserve the first two lcores
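Keith's advice of using one hyper-thread per physical core can be turned into a coremask mechanically. A small sketch under the sibling numbering quoted above (on this 20-core, 40-lcore box, physical core c exposes lcores c and c+20; one_thread_per_core_mask is an illustrative helper, not WARP17 or DPDK API):

```c
#include <assert.h>
#include <stdint.h>

/* Build an EAL/qmap-style hex core mask that takes only the first
 * hyper-thread of each physical core in [first_core, first_core + n_cores). */
static uint64_t one_thread_per_core_mask(int first_core, int n_cores)
{
    uint64_t mask = 0;
    for (int c = first_core; c < first_core + n_cores; c++)
        mask |= 1ULL << c;       /* take lcore c, skip its sibling c + 20 */
    return mask;
}
```

For example, cores 2-9 on socket 0 give 0x3FC and cores 10-19 on socket 1 give 0xFFC00, close to the qmap values suggested in the message above.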
[dpdk-dev] Performance hit - NICs on different CPU sockets
On 6/16/16, 3:16 PM, "dev on behalf of Wiles, Keith" wrote: > >On 6/16/16, 3:00 PM, "Take Ceara" wrote: > >>On Thu, Jun 16, 2016 at 9:33 PM, Wiles, Keith >>wrote: >>> On 6/16/16, 1:20 PM, "Take Ceara" wrote: >>> On Thu, Jun 16, 2016 at 6:59 PM, Wiles, Keith wrote: > > On 6/16/16, 11:56 AM, "dev on behalf of Wiles, Keith" dpdk.org on behalf of keith.wiles at intel.com> wrote: > >> >>On 6/16/16, 11:20 AM, "Take Ceara" wrote: >> >>>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith >>>wrote: >>> Right now I do not know what the issue is with the system. Could be too many Rx/Tx ring pairs per port and limiting the memory in the NICs, which is why you get better performance when you have 8 core per port. I am not really seeing the whole picture and how DPDK is configured to help more. Sorry. >>> >>>I doubt that there is a limitation wrt running 16 cores per port vs 8 >>>cores per port as I've tried with two different machines connected >>>back to back each with one X710 port and 16 cores on each of them >>>running on that port. In that case our performance doubled as >>>expected. >>> Maybe seeing the DPDK command line would help. >>> >>>The command line I use with ports 01:00.3 and 81:00.3 is: >>>./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- >>>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00 >>> >>>Our own qmap args allow the user to control exactly how cores are >>>split between ports. 
In this case we end up with: >>> >>>warp17> show port map >>>Port 0[socket: 0]: >>> Core 4[socket:0] (Tx: 0, Rx: 0) >>> Core 5[socket:0] (Tx: 1, Rx: 1) >>> Core 6[socket:0] (Tx: 2, Rx: 2) >>> Core 7[socket:0] (Tx: 3, Rx: 3) >>> Core 8[socket:0] (Tx: 4, Rx: 4) >>> Core 9[socket:0] (Tx: 5, Rx: 5) >>> Core 20[socket:0] (Tx: 6, Rx: 6) >>> Core 21[socket:0] (Tx: 7, Rx: 7) >>> Core 22[socket:0] (Tx: 8, Rx: 8) >>> Core 23[socket:0] (Tx: 9, Rx: 9) >>> Core 24[socket:0] (Tx: 10, Rx: 10) >>> Core 25[socket:0] (Tx: 11, Rx: 11) >>> Core 26[socket:0] (Tx: 12, Rx: 12) >>> Core 27[socket:0] (Tx: 13, Rx: 13) >>> Core 28[socket:0] (Tx: 14, Rx: 14) >>> Core 29[socket:0] (Tx: 15, Rx: 15) >>> >>>Port 1[socket: 1]: >>> Core 10[socket:1] (Tx: 0, Rx: 0) >>> Core 11[socket:1] (Tx: 1, Rx: 1) >>> Core 12[socket:1] (Tx: 2, Rx: 2) >>> Core 13[socket:1] (Tx: 3, Rx: 3) >>> Core 14[socket:1] (Tx: 4, Rx: 4) >>> Core 15[socket:1] (Tx: 5, Rx: 5) >>> Core 16[socket:1] (Tx: 6, Rx: 6) >>> Core 17[socket:1] (Tx: 7, Rx: 7) >>> Core 18[socket:1] (Tx: 8, Rx: 8) >>> Core 19[socket:1] (Tx: 9, Rx: 9) >>> Core 30[socket:1] (Tx: 10, Rx: 10) >>> Core 31[socket:1] (Tx: 11, Rx: 11) >>> Core 32[socket:1] (Tx: 12, Rx: 12) >>> Core 33[socket:1] (Tx: 13, Rx: 13) >>> Core 34[socket:1] (Tx: 14, Rx: 14) >>> Core 35[socket:1] (Tx: 15, Rx: 15) >> >>On each socket you have 10 physical cores or 20 lcores per socket for 40 >>lcores total. >> >>The above is listing the LCORES (or hyper-threads) and not COREs, which I >>understand some like to think they are interchangeable. The problem is >>the hyper-threads are logically interchangeable, but not performance >>wise. If you have two run-to-completion threads on a single physical core >>each on a different hyper-thread of that core [0,1], then the second >>lcore or thread (1) on that physical core will only get at most about >>30-20% of the CPU cycles. 
Normally it is much less, unless you tune the >>code to make sure each thread is not trying to share the internal >>execution units, but some internal execution units are always shared. >> >>To get the best performance when hyper-threading is enable is to not run >>both threads on a single physical core, but only run one hyper-thread-0. >> >>In the table below the table lists the physical core id and each of the >>lcore ids per socket. Use the first lcore per socket for the best >>performance: >>Core 1 [1, 21][11, 31] >>Use lcore 1 or 11 depending on the socket you are on. >> >>The info below is most likely the best performance and utilization of >>your system. If I got the values right ? >> >>./warp17 -c 0x0FFFe0 -m 32768 -w :81:00.3 -w :01:00.3 -- >>--qmap 0.0x0003FE --qmap 1.0x0FFE00 >> >>Port 0[socket: 0]: >> Core 2[socket:0] (Tx: 0, Rx: 0) >> Core 3[socket:0] (Tx: 1, Rx: 1) >> Core 4[socket:0] (Tx: 2, Rx: 2) >> Core 5[socket:0] (Tx: 3, Rx: 3) >> Core 6[socket:0] (Tx: 4, Rx: 4) >> Core 7[socket:0] (Tx: 5, Rx: 5) >> Core 8[socket:0] (Tx: 6, Rx: 6) >> Core 9[socket:0] (Tx:
[dpdk-dev] Performance hit - NICs on different CPU sockets
On 6/16/16, 3:00 PM, "Take Ceara" wrote: >On Thu, Jun 16, 2016 at 9:33 PM, Wiles, Keith wrote: >> On 6/16/16, 1:20 PM, "Take Ceara" wrote: >> >>>On Thu, Jun 16, 2016 at 6:59 PM, Wiles, Keith >>>wrote: On 6/16/16, 11:56 AM, "dev on behalf of Wiles, Keith" >>> dpdk.org on behalf of keith.wiles at intel.com> wrote: > >On 6/16/16, 11:20 AM, "Take Ceara" wrote: > >>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith >>wrote: >> >>> >>> Right now I do not know what the issue is with the system. Could be too >>> many Rx/Tx ring pairs per port and limiting the memory in the NICs, >>> which is why you get better performance when you have 8 core per port. >>> I am not really seeing the whole picture and how DPDK is configured to >>> help more. Sorry. >> >>I doubt that there is a limitation wrt running 16 cores per port vs 8 >>cores per port as I've tried with two different machines connected >>back to back each with one X710 port and 16 cores on each of them >>running on that port. In that case our performance doubled as >>expected. >> >>> >>> Maybe seeing the DPDK command line would help. >> >>The command line I use with ports 01:00.3 and 81:00.3 is: >>./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- >>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00 >> >>Our own qmap args allow the user to control exactly how cores are >>split between ports. 
In this case we end up with: >> >>warp17> show port map >>Port 0[socket: 0]: >> Core 4[socket:0] (Tx: 0, Rx: 0) >> Core 5[socket:0] (Tx: 1, Rx: 1) >> Core 6[socket:0] (Tx: 2, Rx: 2) >> Core 7[socket:0] (Tx: 3, Rx: 3) >> Core 8[socket:0] (Tx: 4, Rx: 4) >> Core 9[socket:0] (Tx: 5, Rx: 5) >> Core 20[socket:0] (Tx: 6, Rx: 6) >> Core 21[socket:0] (Tx: 7, Rx: 7) >> Core 22[socket:0] (Tx: 8, Rx: 8) >> Core 23[socket:0] (Tx: 9, Rx: 9) >> Core 24[socket:0] (Tx: 10, Rx: 10) >> Core 25[socket:0] (Tx: 11, Rx: 11) >> Core 26[socket:0] (Tx: 12, Rx: 12) >> Core 27[socket:0] (Tx: 13, Rx: 13) >> Core 28[socket:0] (Tx: 14, Rx: 14) >> Core 29[socket:0] (Tx: 15, Rx: 15) >> >>Port 1[socket: 1]: >> Core 10[socket:1] (Tx: 0, Rx: 0) >> Core 11[socket:1] (Tx: 1, Rx: 1) >> Core 12[socket:1] (Tx: 2, Rx: 2) >> Core 13[socket:1] (Tx: 3, Rx: 3) >> Core 14[socket:1] (Tx: 4, Rx: 4) >> Core 15[socket:1] (Tx: 5, Rx: 5) >> Core 16[socket:1] (Tx: 6, Rx: 6) >> Core 17[socket:1] (Tx: 7, Rx: 7) >> Core 18[socket:1] (Tx: 8, Rx: 8) >> Core 19[socket:1] (Tx: 9, Rx: 9) >> Core 30[socket:1] (Tx: 10, Rx: 10) >> Core 31[socket:1] (Tx: 11, Rx: 11) >> Core 32[socket:1] (Tx: 12, Rx: 12) >> Core 33[socket:1] (Tx: 13, Rx: 13) >> Core 34[socket:1] (Tx: 14, Rx: 14) >> Core 35[socket:1] (Tx: 15, Rx: 15) > >On each socket you have 10 physical cores or 20 lcores per socket for 40 >lcores total. > >The above is listing the LCORES (or hyper-threads) and not COREs, which I >understand some like to think they are interchangeable. The problem is the >hyper-threads are logically interchangeable, but not performance wise. If >you have two run-to-completion threads on a single physical core each on a >different hyper-thread of that core [0,1], then the second lcore or thread >(1) on that physical core will only get at most about 30-20% of the CPU >cycles. 
Normally it is much less, unless you tune the code to make sure >each thread is not trying to share the internal execution units, but some >internal execution units are always shared. > >To get the best performance when hyper-threading is enable is to not run >both threads on a single physical core, but only run one hyper-thread-0. > >In the table below the table lists the physical core id and each of the >lcore ids per socket. Use the first lcore per socket for the best >performance: >Core 1 [1, 21][11, 31] >Use lcore 1 or 11 depending on the socket you are on. > >The info below is most likely the best performance and utilization of your >system. If I got the values right ? > >./warp17 -c 0x0FFFe0 -m 32768 -w :81:00.3 -w :01:00.3 -- >--qmap 0.0x0003FE --qmap 1.0x0FFE00 > >Port 0[socket: 0]: > Core 2[socket:0] (Tx: 0, Rx: 0) > Core 3[socket:0] (Tx: 1, Rx: 1) > Core 4[socket:0] (Tx: 2, Rx: 2) > Core 5[socket:0] (Tx: 3, Rx: 3) > Core 6[socket:0] (Tx: 4, Rx: 4) > Core 7[socket:0] (Tx: 5, Rx: 5) > Core 8[socket:0] (Tx: 6, Rx: 6) > Core 9[socket:0] (Tx: 7, Rx: 7) > >8 cores on first socket leaving 0-1 lcores for Linux. 9 cores and leaving the first core or two lcores for Linux > >Port 1[socket: 1]: >
[dpdk-dev] [PATCH] ena: Update PMD to cooperate with latest ENA firmware
This patch includes: * Update of ENA communication layer * Fixed memory management issue After allocating memzone it's required to zeroize it as well as freeing memzone with dedicated function. * Added debug area and host information * Disabling readless communication regarding to HW revision * Allocating coherent memory in node-aware way Signed-off-by: Alexander Matushevsky Signed-off-by: Jakub Palider Signed-off-by: Jan Medala --- drivers/net/ena/base/ena_com.c | 254 +++--- drivers/net/ena/base/ena_com.h | 82 +++-- drivers/net/ena/base/ena_defs/ena_admin_defs.h | 110 +- drivers/net/ena/base/ena_defs/ena_eth_io_defs.h | 436 ++-- drivers/net/ena/base/ena_defs/ena_gen_info.h| 4 +- drivers/net/ena/base/ena_eth_com.c | 42 +-- drivers/net/ena/base/ena_eth_com.h | 14 + drivers/net/ena/base/ena_plat_dpdk.h| 42 ++- drivers/net/ena/ena_ethdev.c| 268 ++- drivers/net/ena/ena_ethdev.h| 40 +++ 10 files changed, 674 insertions(+), 618 deletions(-) diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c index a21a951..4431346 100644 --- a/drivers/net/ena/base/ena_com.c +++ b/drivers/net/ena/base/ena_com.c @@ -42,9 +42,6 @@ #define ENA_ASYNC_QUEUE_DEPTH 4 #define ENA_ADMIN_QUEUE_DEPTH 32 -#define ENA_EXTENDED_STAT_GET_FUNCT(_funct_queue) (_funct_queue & 0x) -#define ENA_EXTENDED_STAT_GET_QUEUE(_funct_queue) (_funct_queue >> 16) - #define MIN_ENA_VER (((ENA_COMMON_SPEC_VERSION_MAJOR) << \ ENA_REGS_VERSION_MAJOR_VERSION_SHIFT) \ | (ENA_COMMON_SPEC_VERSION_MINOR)) @@ -201,12 +198,16 @@ static inline void comp_ctxt_release(struct ena_com_admin_queue *queue, static struct ena_comp_ctx *get_comp_ctxt(struct ena_com_admin_queue *queue, u16 command_id, bool capture) { - ENA_ASSERT(command_id < queue->q_depth, - "command id is larger than the queue size. cmd_id: %u queue size %d\n", - command_id, queue->q_depth); + if (unlikely(command_id >= queue->q_depth)) { + ena_trc_err("command id is larger than the queue size. 
cmd_id: %u queue size %d\n", + command_id, queue->q_depth); + return NULL; + } - ENA_ASSERT(!(queue->comp_ctx[command_id].occupied && capture), - "Completion context is occupied"); + if (unlikely(queue->comp_ctx[command_id].occupied && capture)) { + ena_trc_err("Completion context is occupied\n"); + return NULL; + } if (capture) { ATOMIC32_INC(>outstanding_cmds); @@ -290,7 +291,8 @@ static inline int ena_com_init_comp_ctxt(struct ena_com_admin_queue *queue) for (i = 0; i < queue->q_depth; i++) { comp_ctx = get_comp_ctxt(queue, i, false); - ENA_WAIT_EVENT_INIT(comp_ctx->wait_event); + if (comp_ctx) + ENA_WAIT_EVENT_INIT(comp_ctx->wait_event); } return 0; @@ -315,15 +317,21 @@ ena_com_submit_admin_cmd(struct ena_com_admin_queue *admin_queue, cmd_size_in_bytes, comp, comp_size_in_bytes); + if (unlikely(IS_ERR(comp_ctx))) + admin_queue->running_state = false; ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); return comp_ctx; } static int ena_com_init_io_sq(struct ena_com_dev *ena_dev, + struct ena_com_create_io_ctx *ctx, struct ena_com_io_sq *io_sq) { size_t size; + int dev_node; + + ENA_TOUCH(ctx); memset(_sq->desc_addr, 0x0, sizeof(struct ena_com_io_desc_addr)); @@ -334,15 +342,29 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev, size = io_sq->desc_entry_size * io_sq->q_depth; - if (io_sq->mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_HOST) - ENA_MEM_ALLOC_COHERENT(ena_dev->dmadev, - size, - io_sq->desc_addr.virt_addr, - io_sq->desc_addr.phys_addr, - io_sq->desc_addr.mem_handle); - else - io_sq->desc_addr.virt_addr = - ENA_MEM_ALLOC(ena_dev->dmadev, size); + if (io_sq->mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_HOST) { + ENA_MEM_ALLOC_COHERENT_NODE(ena_dev->dmadev, + size, + io_sq->desc_addr.virt_addr, + io_sq->desc_addr.phys_addr, + ctx->numa_node, +
[dpdk-dev] [PATCH v3 17/17] ethdev: get rid of device type
From: David MarchandNow that hotplug has been moved to eal, there is no reason to keep the device type in this layer. Signed-off-by: David Marchand --- app/test/virtual_pmd.c| 2 +- drivers/net/af_packet/rte_eth_af_packet.c | 2 +- drivers/net/bonding/rte_eth_bond_api.c| 2 +- drivers/net/cxgbe/cxgbe_main.c| 2 +- drivers/net/mlx4/mlx4.c | 2 +- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/mpipe/mpipe_tilegx.c | 2 +- drivers/net/null/rte_eth_null.c | 2 +- drivers/net/pcap/rte_eth_pcap.c | 2 +- drivers/net/ring/rte_eth_ring.c | 2 +- drivers/net/vhost/rte_eth_vhost.c | 2 +- drivers/net/xenvirt/rte_eth_xenvirt.c | 2 +- examples/ip_pipeline/init.c | 22 -- lib/librte_ether/rte_ethdev.c | 5 ++--- lib/librte_ether/rte_ethdev.h | 15 +-- 15 files changed, 15 insertions(+), 51 deletions(-) diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c index b4bd2f2..8a1f0d0 100644 --- a/app/test/virtual_pmd.c +++ b/app/test/virtual_pmd.c @@ -581,7 +581,7 @@ virtual_ethdev_create(const char *name, struct ether_addr *mac_addr, goto err; /* reserve an ethdev entry */ - eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI); + eth_dev = rte_eth_dev_allocate(name); if (eth_dev == NULL) goto err; diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c index f17bd7e..36ac102 100644 --- a/drivers/net/af_packet/rte_eth_af_packet.c +++ b/drivers/net/af_packet/rte_eth_af_packet.c @@ -648,7 +648,7 @@ rte_pmd_init_internals(const char *name, } /* reserve an ethdev entry */ - *eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); + *eth_dev = rte_eth_dev_allocate(name); if (*eth_dev == NULL) goto error; diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c index 53df9fe..b858ee1 100644 --- a/drivers/net/bonding/rte_eth_bond_api.c +++ b/drivers/net/bonding/rte_eth_bond_api.c @@ -189,7 +189,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id) } /* reserve an ethdev entry */ - eth_dev = 
rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); + eth_dev = rte_eth_dev_allocate(name); if (eth_dev == NULL) { RTE_BOND_LOG(ERR, "Unable to allocate rte_eth_dev"); goto err; diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c index ceaf5ab..922155b 100644 --- a/drivers/net/cxgbe/cxgbe_main.c +++ b/drivers/net/cxgbe/cxgbe_main.c @@ -1150,7 +1150,7 @@ int cxgbe_probe(struct adapter *adapter) */ /* reserve an ethdev entry */ - pi->eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI); + pi->eth_dev = rte_eth_dev_allocate(name); if (!pi->eth_dev) goto out_free; diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index b594433..ba42c33 100644 --- a/drivers/net/mlx4/mlx4.c +++ b/drivers/net/mlx4/mlx4.c @@ -5715,7 +5715,7 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) snprintf(name, sizeof(name), "%s port %u", ibv_get_device_name(ibv_dev), port); - eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI); + eth_dev = rte_eth_dev_allocate(name); } if (eth_dev == NULL) { ERROR("can not allocate rte ethdev"); diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 1989a37..f6399fc 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -519,7 +519,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) snprintf(name, sizeof(name), "%s port %u", ibv_get_device_name(ibv_dev), port); - eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI); + eth_dev = rte_eth_dev_allocate(name); } if (eth_dev == NULL) { ERROR("can not allocate rte ethdev"); diff --git a/drivers/net/mpipe/mpipe_tilegx.c b/drivers/net/mpipe/mpipe_tilegx.c index 26e1424..9de556e 100644 --- a/drivers/net/mpipe/mpipe_tilegx.c +++ b/drivers/net/mpipe/mpipe_tilegx.c @@ -1587,7 +1587,7 @@ rte_pmd_mpipe_devinit(const char *ifname, return -ENODEV; } - eth_dev = rte_eth_dev_allocate(ifname, RTE_ETH_DEV_VIRTUAL); + eth_dev = rte_eth_dev_allocate(ifname); if (!eth_dev) { RTE_LOG(ERR, PMD, "%s: 
Failed to allocate device.\n", ifname); rte_free(priv); diff --git
[dpdk-dev] [PATCH v3 16/17] ethdev: convert to eal hotplug
From: David MarchandRemove bus logic from ethdev hotplug by using eal for this. Current api is preserved: - the last port that has been created is tracked to return it to the application when attaching, - the internal device name is reused when detaching. We can not get rid of ethdev hotplug yet since we still need some mechanism to inform applications of port creation/removal to substitute for ethdev hotplug api. dev_type field in struct rte_eth_dev and rte_eth_dev_allocate are kept as is, but this information is not needed anymore and is removed in the following commit. Signed-off-by: David Marchand --- lib/librte_ether/rte_ethdev.c | 251 ++ 1 file changed, 33 insertions(+), 218 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index a496521..12d24ff 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -72,6 +72,7 @@ static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data"; struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS]; static struct rte_eth_dev_data *rte_eth_dev_data; +static uint8_t eth_dev_last_created_port; static uint8_t nb_ports; /* spinlock for eth device callbacks */ @@ -210,6 +211,7 @@ rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type) eth_dev->data->port_id = port_id; eth_dev->attached = DEV_ATTACHED; eth_dev->dev_type = type; + eth_dev_last_created_port = port_id; nb_ports++; return eth_dev; } @@ -341,99 +343,6 @@ rte_eth_dev_count(void) return nb_ports; } -static enum rte_eth_dev_type -rte_eth_dev_get_device_type(uint8_t port_id) -{ - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, RTE_ETH_DEV_UNKNOWN); - return rte_eth_devices[port_id].dev_type; -} - -static int -rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr) -{ - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); - - if (addr == NULL) { - RTE_PMD_DEBUG_TRACE("Null pointer is specified\n"); - return -EINVAL; - } - - *addr = rte_eth_devices[port_id].pci_dev->addr; - return 0; -} - -static 
int -rte_eth_dev_get_name_by_port(uint8_t port_id, char *name) -{ - char *tmp; - - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); - - if (name == NULL) { - RTE_PMD_DEBUG_TRACE("Null pointer is specified\n"); - return -EINVAL; - } - - /* shouldn't check 'rte_eth_devices[i].data', -* because it might be overwritten by VDEV PMD */ - tmp = rte_eth_dev_data[port_id].name; - strcpy(name, tmp); - return 0; -} - -static int -rte_eth_dev_get_port_by_name(const char *name, uint8_t *port_id) -{ - int i; - - if (name == NULL) { - RTE_PMD_DEBUG_TRACE("Null pointer is specified\n"); - return -EINVAL; - } - - *port_id = RTE_MAX_ETHPORTS; - - for (i = 0; i < RTE_MAX_ETHPORTS; i++) { - - if (!strncmp(name, - rte_eth_dev_data[i].name, strlen(name))) { - - *port_id = i; - - return 0; - } - } - return -ENODEV; -} - -static int -rte_eth_dev_get_port_by_addr(const struct rte_pci_addr *addr, uint8_t *port_id) -{ - int i; - struct rte_pci_device *pci_dev = NULL; - - if (addr == NULL) { - RTE_PMD_DEBUG_TRACE("Null pointer is specified\n"); - return -EINVAL; - } - - *port_id = RTE_MAX_ETHPORTS; - - for (i = 0; i < RTE_MAX_ETHPORTS; i++) { - - pci_dev = rte_eth_devices[i].pci_dev; - - if (pci_dev && - !rte_eal_compare_pci_addr(_dev->addr, addr)) { - - *port_id = i; - - return 0; - } - } - return -ENODEV; -} - static int rte_eth_dev_is_detachable(uint8_t port_id) { @@ -459,124 +368,45 @@ rte_eth_dev_is_detachable(uint8_t port_id) return 1; } -/* attach the new physical device, then store port_id of the device */ -static int -rte_eth_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id) -{ - /* Invoke probe func of the driver can handle the new device. 
*/ - if (rte_eal_pci_probe_one(addr)) - goto err; - - if (rte_eth_dev_get_port_by_addr(addr, port_id)) - goto err; - - return 0; -err: - return -1; -} - -/* detach the new physical device, then store pci_addr of the device */ -static int -rte_eth_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr) -{ - struct rte_pci_addr freed_addr; - struct rte_pci_addr vp; - - /* get pci address by port id */ - if (rte_eth_dev_get_addr_by_port(port_id, &freed_addr)) - goto err; - - /* Zeroed pci addr means the port comes from virtual device */ - vp.domain = vp.bus =
[dpdk-dev] [PATCH v3 15/17] eal: add hotplug operations for pci and vdev
From: David Marchandhotplug which deals with resources should come from the layer that already handles them, i.e. eal. For both attach and detach operations, 'name' is used to select the bus that will handle the request. Signed-off-by: David Marchand --- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 2 ++ lib/librte_eal/common/eal_common_dev.c | 39 + lib/librte_eal/common/include/rte_dev.h | 25 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++ 4 files changed, 68 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index f8c3dea..e776768 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -156,5 +156,7 @@ DPDK_16.07 { global: pci_get_sysfs_path; + rte_eal_dev_attach; + rte_eal_dev_detach; } DPDK_16.04; diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index a8a4146..59ed3a0 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -150,3 +150,42 @@ rte_eal_vdev_uninit(const char *name) RTE_LOG(ERR, EAL, "no driver found for %s\n", name); return -EINVAL; } + +int rte_eal_dev_attach(const char *name, const char *devargs) +{ + struct rte_pci_addr addr; + int ret = -1; + + if (eal_parse_pci_DomBDF(name, &addr) == 0) { + if (rte_eal_pci_probe_one(&addr) < 0) + goto err; + + } else { + if (rte_eal_vdev_init(name, devargs)) + goto err; + } + + return 0; + +err: + RTE_LOG(ERR, EAL, "Driver, cannot attach the device\n"); + return ret; +} + +int rte_eal_dev_detach(const char *name) +{ + struct rte_pci_addr addr; + + if (eal_parse_pci_DomBDF(name, &addr) == 0) { + if (rte_eal_pci_detach(&addr) < 0) + goto err; + } else { + if (rte_eal_vdev_uninit(name)) + goto err; + } + return 0; + +err: + RTE_LOG(ERR, EAL, "Driver, cannot detach the device\n"); + return -1; +} diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h index 85e48f2..b1c0520
100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -178,6 +178,31 @@ int rte_eal_vdev_init(const char *name, const char *args); */ int rte_eal_vdev_uninit(const char *name); +/** + * Attach a resource to a registered driver. + * + * @param name + * The resource name, that refers to a pci resource or some private + * way of designating a resource for vdev drivers. Based on this + * resource name, eal will identify a driver capable of handling + * this resource and pass this resource to the driver probing + * function. + * @param devargs + * Device arguments to be passed to the driver. + * @return + * 0 on success, negative on error. + */ +int rte_eal_dev_attach(const char *name, const char *devargs); + +/** + * Detach a resource from its driver. + * + * @param name + * Same description as for rte_eal_dev_attach(). + * Here, eal will call the driver detaching function. + */ +int rte_eal_dev_detach(const char *name); + #define PMD_REGISTER_DRIVER(d)\ RTE_INIT(devinitfn_ ##d);\ static void devinitfn_ ##d(void)\ diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map index 3d0ff93..50b774b 100644 --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map @@ -159,5 +159,7 @@ DPDK_16.07 { global: pci_get_sysfs_path; + rte_eal_dev_attach; + rte_eal_dev_detach; } DPDK_16.04; -- 2.7.4
[dpdk-dev] [PATCH v3 14/17] ethdev: do not scan all pci devices on attach
From: David MarchandNo need to scan all devices, we only need to update the device being attached. Signed-off-by: David Marchand --- lib/librte_eal/common/eal_common_pci.c | 11 --- lib/librte_ether/rte_ethdev.c | 3 --- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c index dfd0a8c..d05dda4 100644 --- a/lib/librte_eal/common/eal_common_pci.c +++ b/lib/librte_eal/common/eal_common_pci.c @@ -339,6 +339,11 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr) if (addr == NULL) return -1; + /* update current pci device in global list, kernel bindings might have +* changed since last time we looked at it */ + if (pci_update_device(addr) < 0) + goto err_return; + TAILQ_FOREACH(dev, &pci_device_list, next) { if (rte_eal_compare_pci_addr(&dev->addr, addr)) continue; @@ -351,9 +356,9 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr) return -1; err_return: - RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT - " cannot be used\n", dev->addr.domain, dev->addr.bus, - dev->addr.devid, dev->addr.function); + RTE_LOG(WARNING, EAL, + "Requested device " PCI_PRI_FMT " cannot be used\n", + addr->domain, addr->bus, addr->devid, addr->function); return -1; } diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 5bcf610..a496521 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -463,9 +463,6 @@ rte_eth_dev_is_detachable(uint8_t port_id) static int rte_eth_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id) { - /* re-construct pci_device_list */ - if (rte_eal_pci_scan()) - goto err; /* Invoke probe func of the driver can handle the new device. */ if (rte_eal_pci_probe_one(addr)) goto err; -- 2.7.4
[dpdk-dev] [PATCH v3 12/17] pci: add a helper for device name
From: David Marchandeal is a better place than crypto / ethdev for naming resources. Add a helper in eal and make use of it in crypto / ethdev. Signed-off-by: David Marchand --- lib/librte_cryptodev/rte_cryptodev.c| 27 --- lib/librte_eal/common/include/rte_pci.h | 25 + lib/librte_ether/rte_ethdev.c | 24 3 files changed, 33 insertions(+), 43 deletions(-) diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c index a7cb33a..3b587e4 100644 --- a/lib/librte_cryptodev/rte_cryptodev.c +++ b/lib/librte_cryptodev/rte_cryptodev.c @@ -276,23 +276,6 @@ rte_cryptodev_pmd_allocate(const char *name, int socket_id) return cryptodev; } -static inline int -rte_cryptodev_create_unique_device_name(char *name, size_t size, - struct rte_pci_device *pci_dev) -{ - int ret; - - if ((name == NULL) || (pci_dev == NULL)) - return -EINVAL; - - ret = snprintf(name, size, "%d:%d.%d", - pci_dev->addr.bus, pci_dev->addr.devid, - pci_dev->addr.function); - if (ret < 0) - return ret; - return 0; -} - int rte_cryptodev_pmd_release_device(struct rte_cryptodev *cryptodev) { @@ -355,9 +338,8 @@ rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv, if (cryptodrv == NULL) return -ENODEV; - /* Create unique Crypto device name using PCI address */ - rte_cryptodev_create_unique_device_name(cryptodev_name, - sizeof(cryptodev_name), pci_dev); + rte_eal_pci_device_name(&pci_dev->addr, cryptodev_name, + sizeof(cryptodev_name)); cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name, rte_socket_id()); if (cryptodev == NULL) @@ -412,9 +394,8 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev) if (pci_dev == NULL) return -EINVAL; - /* Create unique device name using PCI address */ - rte_cryptodev_create_unique_device_name(cryptodev_name, - sizeof(cryptodev_name), pci_dev); + rte_eal_pci_device_name(&pci_dev->addr, cryptodev_name, + sizeof(cryptodev_name)); cryptodev = rte_cryptodev_pmd_get_named_dev(cryptodev_name); if (cryptodev == NULL) diff --git
a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index d7df1d9..5e8bd89 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -82,6 +82,7 @@ extern "C" { #include #include +#include #include TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */ @@ -95,6 +96,7 @@ const char *pci_get_sysfs_path(void); /** Formatting string for PCI device identifier: Ex: 0000:00:01.0 */ #define PCI_PRI_FMT "%.4" PRIx16 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8 +#define PCI_PRI_STR_SIZE sizeof("XXXX:XX:XX.X") /** Short formatting string, without domain, for PCI device: Ex: 00:01.0 */ #define PCI_SHORT_PRI_FMT "%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8 @@ -308,6 +310,29 @@ eal_parse_pci_DomBDF(const char *input, struct rte_pci_addr *dev_addr) } #undef GET_PCIADDR_FIELD +/** + * Utility function to write a pci device name, this device name can later be + * used to retrieve the corresponding rte_pci_addr using above functions. + * + * @param addr + * The PCI Bus-Device-Function address + * @param output + * The output buffer string + * @param size + * The output buffer size + * @return + * 0 on success, negative on error. + */ +static inline void +rte_eal_pci_device_name(const struct rte_pci_addr *addr, + char *output, size_t size) +{ + RTE_VERIFY(size >= PCI_PRI_STR_SIZE); + RTE_VERIFY(snprintf(output, size, PCI_PRI_FMT, + addr->domain, addr->bus, + addr->devid, addr->function) >= 0); +} + /* Compare two PCI device addresses. */ /** * Utility function to compare two PCI device addresses.
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 7258062..5bcf610 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -214,20 +214,6 @@ rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type) return eth_dev; } -static int -rte_eth_dev_create_unique_device_name(char *name, size_t size, - struct rte_pci_device *pci_dev) -{ - int ret; - - ret = snprintf(name, size, "%d:%d.%d", - pci_dev->addr.bus, pci_dev->addr.devid, - pci_dev->addr.function); - if (ret < 0) - return ret; - return 0; -} - int rte_eth_dev_release_port(struct rte_eth_dev *eth_dev) { @@ -251,9 +237,8 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv, eth_drv = (struct eth_driver *)pci_drv; -
[dpdk-dev] [PATCH v3 11/17] eal/linux: move back interrupt thread init before setting affinity
From: David MarchandNow that virtio pci driver is initialized in a constructor, iopl() stuff happens early enough so that interrupt thread can be created right after plugin loading. This way, chelsio driver should be happy again [1]. [1] http://dpdk.org/ml/archives/dev/2015-November/028289.html Signed-off-by: David Marchand Tested-by: Rahul Lakkireddy --- lib/librte_eal/linuxapp/eal/eal.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 5ec3d4e..6eca741 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -821,6 +821,9 @@ rte_eal_init(int argc, char **argv) if (eal_plugins_init() < 0) rte_panic("Cannot init plugins\n"); + if (rte_eal_intr_init() < 0) + rte_panic("Cannot init interrupt-handling thread\n"); + eal_thread_init_master(rte_config.master_lcore); ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN); @@ -832,9 +835,6 @@ rte_eal_init(int argc, char **argv) if (rte_eal_dev_init() < 0) rte_panic("Cannot init pmd devices\n"); - if (rte_eal_intr_init() < 0) - rte_panic("Cannot init interrupt-handling thread\n"); - RTE_LCORE_FOREACH_SLAVE(i) { /* -- 2.7.4
[dpdk-dev] [PATCH v3 10/17] ethdev: get rid of eth driver register callback
From: David MarchandNow that all pdev are pci drivers, we don't need to register ethdev drivers through a dedicated channel. Signed-off-by: David Marchand --- lib/librte_ether/rte_ethdev.c | 22 -- lib/librte_ether/rte_ethdev.h | 12 lib/librte_ether/rte_ether_version.map | 1 - 3 files changed, 35 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index d05eada..7258062 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -334,28 +334,6 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev) return 0; } -/** - * Register an Ethernet [Poll Mode] driver. - * - * Function invoked by the initialization function of an Ethernet driver - * to simultaneously register itself as a PCI driver and as an Ethernet - * Poll Mode Driver. - * Invokes the rte_eal_pci_register() function to register the *pci_drv* - * structure embedded in the *eth_drv* structure, after having stored the - * address of the rte_eth_dev_init() function in the *devinit* field of - * the *pci_drv* structure. - * During the PCI probing phase, the rte_eth_dev_init() function is - * invoked for each PCI [Ethernet device] matching the embedded PCI - * identifiers provided by the driver. - */ -void -rte_eth_driver_register(struct eth_driver *eth_drv) -{ - eth_drv->pci_drv.devinit = rte_eth_dev_pci_probe; - eth_drv->pci_drv.devuninit = rte_eth_dev_pci_remove; - rte_eal_pci_register(&eth_drv->pci_drv); -} - int rte_eth_dev_is_valid_port(uint8_t port_id) { diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 6deafa2..64d889e 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -1842,18 +1842,6 @@ struct eth_driver { }; /** - * @internal - * A function invoked by the initialization function of an Ethernet driver - * to simultaneously register itself as a PCI driver and as an Ethernet - * Poll Mode Driver (PMD).
- * - * @param eth_drv - * The pointer to the *eth_driver* structure associated with - * the Ethernet driver. - */ -void rte_eth_driver_register(struct eth_driver *eth_drv); - -/** * Convert a numerical speed in Mbps to a bitmap flag that can be used in * the bitmap link_speeds of the struct rte_eth_conf * diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map index 31017d4..d457b21 100644 --- a/lib/librte_ether/rte_ether_version.map +++ b/lib/librte_ether/rte_ether_version.map @@ -80,7 +80,6 @@ DPDK_2.2 { rte_eth_dev_vlan_filter; rte_eth_dev_wd_timeout_store; rte_eth_dma_zone_reserve; - rte_eth_driver_register; rte_eth_led_off; rte_eth_led_on; rte_eth_link; -- 2.7.4
[dpdk-dev] [PATCH v3 09/17] crypto: get rid of crypto driver register callback
From: David MarchandNow that all pdev are pci drivers, we don't need to register crypto drivers through a dedicated channel. Signed-off-by: David Marchand --- lib/librte_cryptodev/rte_cryptodev.c | 22 --- lib/librte_cryptodev/rte_cryptodev_pmd.h | 30 -- lib/librte_cryptodev/rte_cryptodev_version.map | 1 - 3 files changed, 53 deletions(-) diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c index 65a2e29..a7cb33a 100644 --- a/lib/librte_cryptodev/rte_cryptodev.c +++ b/lib/librte_cryptodev/rte_cryptodev.c @@ -444,28 +444,6 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev) return 0; } -int -rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *cryptodrv, - enum pmd_type type) -{ - /* Call crypto device initialization directly if device is virtual */ - if (type == PMD_VDEV) - return rte_cryptodev_pci_probe((struct rte_pci_driver *)cryptodrv, - NULL); - - /* -* Register PCI driver for physical device intialisation during -* PCI probing -*/ - cryptodrv->pci_drv.devinit = rte_cryptodev_pci_probe; - cryptodrv->pci_drv.devuninit = rte_cryptodev_pci_remove; - - rte_eal_pci_register(&cryptodrv->pci_drv); - - return 0; -} - - uint16_t rte_cryptodev_queue_pair_count(uint8_t dev_id) { diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h index 3fb7c7c..99fd69e 100644 --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h @@ -491,36 +491,6 @@ rte_cryptodev_pmd_virtual_dev_init(const char *name, size_t dev_private_size, extern int rte_cryptodev_pmd_release_device(struct rte_cryptodev *cryptodev); - -/** - * Register a Crypto [Poll Mode] driver.
- * - * Function invoked by the initialization function of a Crypto driver - * to simultaneously register itself as Crypto Poll Mode Driver and to either: - * - * a - register itself as PCI driver if the crypto device is a physical - * device, by invoking the rte_eal_pci_register() function to - * register the *pci_drv* structure embedded in the *crypto_drv* - * structure, after having stored the address of the - * rte_cryptodev_init() function in the *devinit* field of the - * *pci_drv* structure. - * - * During the PCI probing phase, the rte_cryptodev_init() - * function is invoked for each PCI [device] matching the - * embedded PCI identifiers provided by the driver. - * - * b, complete the initialization sequence if the device is a virtual - * device by calling the rte_cryptodev_init() directly passing a - * NULL parameter for the rte_pci_device structure. - * - * @param crypto_drv crypto_driver structure associated with the crypto - * driver. - * @param type pmd type - */ -extern int -rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *crypto_drv, - enum pmd_type type); - /** * Executes all the user application registered callbacks for the specific * device. diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map b/lib/librte_cryptodev/rte_cryptodev_version.map index 8d0edfb..e0a9620 100644 --- a/lib/librte_cryptodev/rte_cryptodev_version.map +++ b/lib/librte_cryptodev/rte_cryptodev_version.map @@ -14,7 +14,6 @@ DPDK_16.04 { rte_cryptodev_info_get; rte_cryptodev_pmd_allocate; rte_cryptodev_pmd_callback_process; - rte_cryptodev_pmd_driver_register; rte_cryptodev_pmd_release_device; rte_cryptodev_pmd_virtual_dev_init; rte_cryptodev_sym_session_create; -- 2.7.4
[dpdk-dev] [PATCH v3 08/17] drivers: convert all pdev drivers as pci drivers
From: David MarchandSimplify crypto and ethdev pci drivers init by using newly introduced init macros and helpers. Those drivers then don't need to register as "rte_driver"s anymore. virtio and mlx* drivers use the general purpose RTE_INIT macro, as they both need some special stuff to be done before registering a pci driver. Signed-off-by: David Marchand --- drivers/crypto/qat/rte_qat_cryptodev.c | 16 +++ drivers/net/bnx2x/bnx2x_ethdev.c| 35 +--- drivers/net/cxgbe/cxgbe_ethdev.c| 24 +++-- drivers/net/e1000/em_ethdev.c | 16 +++ drivers/net/e1000/igb_ethdev.c | 40 +--- drivers/net/ena/ena_ethdev.c| 18 +++-- drivers/net/enic/enic_ethdev.c | 23 +++- drivers/net/fm10k/fm10k_ethdev.c| 23 +++- drivers/net/i40e/i40e_ethdev.c | 26 +++--- drivers/net/i40e/i40e_ethdev_vf.c | 25 +++--- drivers/net/ixgbe/ixgbe_ethdev.c| 47 + drivers/net/mlx4/mlx4.c | 20 +++--- drivers/net/mlx5/mlx5.c | 19 +++-- drivers/net/nfp/nfp_net.c | 21 +++ drivers/net/szedata2/rte_eth_szedata2.c | 25 +++--- drivers/net/virtio/virtio_ethdev.c | 26 +- drivers/net/vmxnet3/vmxnet3_ethdev.c| 23 +++- 17 files changed, 68 insertions(+), 359 deletions(-) diff --git a/drivers/crypto/qat/rte_qat_cryptodev.c b/drivers/crypto/qat/rte_qat_cryptodev.c index 08496ab..54f0c95 100644 --- a/drivers/crypto/qat/rte_qat_cryptodev.c +++ b/drivers/crypto/qat/rte_qat_cryptodev.c @@ -120,21 +120,11 @@ static struct rte_cryptodev_driver rte_qat_pmd = { .name = "rte_qat_pmd", .id_table = pci_id_qat_map, .drv_flags = RTE_PCI_DRV_NEED_MAPPING, + .devinit = rte_cryptodev_pci_probe, + .devuninit = rte_cryptodev_pci_remove, }, .cryptodev_init = crypto_qat_dev_init, .dev_private_size = sizeof(struct qat_pmd_private), }; -static int -rte_qat_pmd_init(const char *name __rte_unused, const char *params __rte_unused) -{ - PMD_INIT_FUNC_TRACE(); - return rte_cryptodev_pmd_driver_register(&rte_qat_pmd, PMD_PDEV); -} - -static struct rte_driver pmd_qat_drv = { - .type = PMD_PDEV, - .init = rte_qat_pmd_init, -}; - -PMD_REGISTER_DRIVER(pmd_qat_drv);
+RTE_EAL_PCI_REGISTER(qat, rte_qat_pmd.pci_drv); diff --git a/drivers/net/bnx2x/bnx2x_ethdev.c b/drivers/net/bnx2x/bnx2x_ethdev.c index 071b44f..ba194b5 100644 --- a/drivers/net/bnx2x/bnx2x_ethdev.c +++ b/drivers/net/bnx2x/bnx2x_ethdev.c @@ -506,11 +506,15 @@ static struct eth_driver rte_bnx2x_pmd = { .name = "rte_bnx2x_pmd", .id_table = pci_id_bnx2x_map, .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, + .devinit = rte_eth_dev_pci_probe, + .devuninit = rte_eth_dev_pci_remove, }, .eth_dev_init = eth_bnx2x_dev_init, .dev_private_size = sizeof(struct bnx2x_softc), }; +RTE_EAL_PCI_REGISTER(bnx2x, rte_bnx2x_pmd.pci_drv); + /* * virtual function driver struct */ @@ -519,36 +523,11 @@ static struct eth_driver rte_bnx2xvf_pmd = { .name = "rte_bnx2xvf_pmd", .id_table = pci_id_bnx2xvf_map, .drv_flags = RTE_PCI_DRV_NEED_MAPPING, + .devinit = rte_eth_dev_pci_probe, + .devuninit = rte_eth_dev_pci_remove, }, .eth_dev_init = eth_bnx2xvf_dev_init, .dev_private_size = sizeof(struct bnx2x_softc), }; -static int rte_bnx2x_pmd_init(const char *name __rte_unused, const char *params __rte_unused) -{ - PMD_INIT_FUNC_TRACE(); - rte_eth_driver_register(&rte_bnx2x_pmd); - - return 0; -} - -static int rte_bnx2xvf_pmd_init(const char *name __rte_unused, const char *params __rte_unused) -{ - PMD_INIT_FUNC_TRACE(); - rte_eth_driver_register(&rte_bnx2xvf_pmd); - - return 0; -} - -static struct rte_driver rte_bnx2x_driver = { - .type = PMD_PDEV, - .init = rte_bnx2x_pmd_init, -}; - -static struct rte_driver rte_bnx2xvf_driver = { - .type = PMD_PDEV, - .init = rte_bnx2xvf_pmd_init, -}; - -PMD_REGISTER_DRIVER(rte_bnx2x_driver); -PMD_REGISTER_DRIVER(rte_bnx2xvf_driver); +RTE_EAL_PCI_REGISTER(bnx2xvf, rte_bnx2xvf_pmd.pci_drv); diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c index 04eddaf..358c240 100644 --- a/drivers/net/cxgbe/cxgbe_ethdev.c +++ b/drivers/net/cxgbe/cxgbe_ethdev.c @@ -869,29 +869,11 @@ static struct eth_driver rte_cxgbe_pmd = { .name =
"rte_cxgbe_pmd", .id_table = cxgb4_pci_tbl, .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, + .devinit =
[dpdk-dev] [PATCH v3 07/17] ethdev: export init/uninit common wrappers for pci drivers
From: David MarchandPreparing for getting rid of eth_drv, here are two wrappers that can be used by pci drivers that assume a 1 to 1 association between pci resource and upper interface. Signed-off-by: David Marchand --- lib/librte_ether/rte_ethdev.c | 14 +++--- lib/librte_ether/rte_ethdev.h | 13 + lib/librte_ether/rte_ether_version.map | 8 3 files changed, 28 insertions(+), 7 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index e148028..d05eada 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -239,9 +239,9 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev) return 0; } -static int -rte_eth_dev_init(struct rte_pci_driver *pci_drv, -struct rte_pci_device *pci_dev) +int +rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv, + struct rte_pci_device *pci_dev) { struct eth_driver*eth_drv; struct rte_eth_dev *eth_dev; @@ -293,8 +293,8 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv, return diag; } -static int -rte_eth_dev_uninit(struct rte_pci_device *pci_dev) +int +rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev) { const struct eth_driver *eth_drv; struct rte_eth_dev *eth_dev; @@ -351,8 +351,8 @@ rte_eth_dev_uninit(struct rte_pci_device *pci_dev) void rte_eth_driver_register(struct eth_driver *eth_drv) { - eth_drv->pci_drv.devinit = rte_eth_dev_init; - eth_drv->pci_drv.devuninit = rte_eth_dev_uninit; + eth_drv->pci_drv.devinit = rte_eth_dev_pci_probe; + eth_drv->pci_drv.devuninit = rte_eth_dev_pci_remove; rte_eal_pci_register(&eth_drv->pci_drv); } diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index e5e91e4..6deafa2 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -4254,6 +4254,19 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id, uint32_t mask, uint8_t en); +/** + * Wrapper for use by pci drivers as a .devinit function to attach to a ethdev + * interface.
+ */ +int rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv, + struct rte_pci_device *pci_dev); + +/** + * Wrapper for use by pci drivers as a .devuninit function to detach a ethdev + * interface. + */ +int rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev); + #ifdef __cplusplus } #endif diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map index 214ecc7..31017d4 100644 --- a/lib/librte_ether/rte_ether_version.map +++ b/lib/librte_ether/rte_ether_version.map @@ -132,3 +132,11 @@ DPDK_16.04 { rte_eth_tx_buffer_set_err_callback; } DPDK_2.2; + +DPDK_16.07 { + global: + + rte_eth_dev_pci_probe; + rte_eth_dev_pci_remove; + +} DPDK_16.04; -- 2.7.4
[dpdk-dev] [PATCH v3 06/17] crypto: export init/uninit common wrappers for pci drivers
From: David MarchandPreparing for getting rid of rte_cryptodev_driver, here are two wrappers that can be used by pci drivers that assume a 1 to 1 association between pci resource and upper interface. Signed-off-by: David Marchand --- lib/librte_cryptodev/rte_cryptodev.c | 16 lib/librte_cryptodev/rte_cryptodev_pmd.h | 12 lib/librte_cryptodev/rte_cryptodev_version.map | 8 3 files changed, 28 insertions(+), 8 deletions(-) diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c index b0d806c..65a2e29 100644 --- a/lib/librte_cryptodev/rte_cryptodev.c +++ b/lib/librte_cryptodev/rte_cryptodev.c @@ -340,9 +340,9 @@ rte_cryptodev_pmd_virtual_dev_init(const char *name, size_t dev_private_size, return cryptodev; } -static int -rte_cryptodev_init(struct rte_pci_driver *pci_drv, - struct rte_pci_device *pci_dev) +int +rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv, + struct rte_pci_device *pci_dev) { struct rte_cryptodev_driver *cryptodrv; struct rte_cryptodev *cryptodev; @@ -401,8 +401,8 @@ rte_cryptodev_init(struct rte_pci_driver *pci_drv, return -ENXIO; } -static int -rte_cryptodev_uninit(struct rte_pci_device *pci_dev) +int +rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev) { const struct rte_cryptodev_driver *cryptodrv; struct rte_cryptodev *cryptodev; @@ -450,15 +450,15 @@ rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *cryptodrv, { /* Call crypto device initialization directly if device is virtual */ if (type == PMD_VDEV) - return rte_cryptodev_init((struct rte_pci_driver *)cryptodrv, + return rte_cryptodev_pci_probe((struct rte_pci_driver *)cryptodrv, NULL); /* * Register PCI driver for physical device intialisation during * PCI probing */ - cryptodrv->pci_drv.devinit = rte_cryptodev_init; - cryptodrv->pci_drv.devuninit = rte_cryptodev_uninit; + cryptodrv->pci_drv.devinit = rte_cryptodev_pci_probe; + cryptodrv->pci_drv.devuninit = rte_cryptodev_pci_remove; rte_eal_pci_register(&cryptodrv->pci_drv); diff --git
a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h index c977c61..3fb7c7c 100644 --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h @@ -534,6 +534,18 @@ rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *crypto_drv, void rte_cryptodev_pmd_callback_process(struct rte_cryptodev *dev, enum rte_cryptodev_event_type event); +/** + * Wrapper for use by pci drivers as a .devinit function to attach to a crypto + * interface. + */ +int rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv, + struct rte_pci_device *pci_dev); + +/** + * Wrapper for use by pci drivers as a .devuninit function to detach a crypto + * interface. + */ +int rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev); #ifdef __cplusplus } diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map b/lib/librte_cryptodev/rte_cryptodev_version.map index 41004e1..8d0edfb 100644 --- a/lib/librte_cryptodev/rte_cryptodev_version.map +++ b/lib/librte_cryptodev/rte_cryptodev_version.map @@ -32,3 +32,11 @@ DPDK_16.04 { local: *; }; + +DPDK_16.07 { + global: + + rte_cryptodev_pci_probe; + rte_cryptodev_pci_remove; + +} DPDK_16.04; -- 2.7.4
[dpdk-dev] [PATCH v3 05/17] eal: introduce init macros
From: David MarchandIntroduce a RTE_INIT macro used to mark an init function as a constructor. Current eal macros have been converted to use this (no functional impact). RTE_EAL_PCI_REGISTER is added as a helper for pci drivers. Suggested-by: Jan Viktorin Signed-off-by: David Marchand --- lib/librte_eal/common/include/rte_dev.h | 4 ++-- lib/librte_eal/common/include/rte_eal.h | 3 +++ lib/librte_eal/common/include/rte_pci.h | 7 +++ lib/librte_eal/common/include/rte_tailq.h | 4 ++-- 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h index f1b5507..85e48f2 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -179,8 +179,8 @@ int rte_eal_vdev_init(const char *name, const char *args); int rte_eal_vdev_uninit(const char *name); #define PMD_REGISTER_DRIVER(d)\ -void devinitfn_ ##d(void);\ -void __attribute__((constructor, used)) devinitfn_ ##d(void)\ +RTE_INIT(devinitfn_ ##d);\ +static void devinitfn_ ##d(void)\ {\ rte_eal_driver_register(&d);\ } diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h index a71d6f5..186f3c6 100644 --- a/lib/librte_eal/common/include/rte_eal.h +++ b/lib/librte_eal/common/include/rte_eal.h @@ -252,6 +252,9 @@ static inline int rte_gettid(void) return RTE_PER_LCORE(_thread_id); } +#define RTE_INIT(func) \ +static void __attribute__((constructor, used)) func(void) + #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index fa74962..d7df1d9 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -470,6 +470,13 @@ void rte_eal_pci_dump(FILE *f); */ void rte_eal_pci_register(struct rte_pci_driver *driver); +#define RTE_EAL_PCI_REGISTER(name, d) \ +RTE_INIT(pciinitfn_ ##name); \ +static void pciinitfn_ ##name(void) \ +{ \ + rte_eal_pci_register(&d); \ +} +
/** * Unregister a PCI driver. * diff --git a/lib/librte_eal/common/include/rte_tailq.h b/lib/librte_eal/common/include/rte_tailq.h index 4a686e6..71ed3bb 100644 --- a/lib/librte_eal/common/include/rte_tailq.h +++ b/lib/librte_eal/common/include/rte_tailq.h @@ -148,8 +148,8 @@ struct rte_tailq_head *rte_eal_tailq_lookup(const char *name); int rte_eal_tailq_register(struct rte_tailq_elem *t); #define EAL_REGISTER_TAILQ(t) \ -void tailqinitfn_ ##t(void); \ -void __attribute__((constructor, used)) tailqinitfn_ ##t(void) \ +RTE_INIT(tailqinitfn_ ##t); \ +static void tailqinitfn_ ##t(void) \ { \ if (rte_eal_tailq_register(&t) < 0) \ rte_panic("Cannot initialize tailq: %s\n", t.name); \ -- 2.7.4
[dpdk-dev] [PATCH v3 04/17] eal: remove duplicate function declaration
From: David Marchandrte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its introduction. This function has been exported in ABI, so remove it from eal_private.h Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices") Signed-off-by: David Marchand --- lib/librte_eal/common/eal_private.h | 7 --- lib/librte_eal/linuxapp/eal/eal.c | 1 + 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 857dc3e..06a68f6 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -259,13 +259,6 @@ int rte_eal_intr_init(void); int rte_eal_alarm_init(void); /** - * This function initialises any virtual devices - * - * This function is private to the EAL. - */ -int rte_eal_dev_init(void); - -/** * Function is to check if the kernel module(like, vfio, vfio_iommu_type1, * etc.) loaded. * diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index bba8fea..5ec3d4e 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -70,6 +70,7 @@ #include #include #include +#include #include #include #include -- 2.7.4
[dpdk-dev] [PATCH v3 03/17] drivers: align pci driver definitions
From: David Marchand

Pure coding style, but it might make it easier later if we want to move fields in rte_cryptodev_driver and eth_driver structures.

Signed-off-by: David Marchand
---
 drivers/crypto/qat/rte_qat_cryptodev.c | 2 +-
 drivers/net/ena/ena_ethdev.c           | 2 +-
 drivers/net/nfp/nfp_net.c              | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/qat/rte_qat_cryptodev.c b/drivers/crypto/qat/rte_qat_cryptodev.c
index a7912f5..08496ab 100644
--- a/drivers/crypto/qat/rte_qat_cryptodev.c
+++ b/drivers/crypto/qat/rte_qat_cryptodev.c
@@ -116,7 +116,7 @@ crypto_qat_dev_init(__attribute__((unused)) struct rte_cryptodev_driver *crypto_
 }

 static struct rte_cryptodev_driver rte_qat_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_qat_pmd",
 		.id_table = pci_id_qat_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index e157587..8d01e9a 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1427,7 +1427,7 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 }

 static struct eth_driver rte_ena_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_ena_pmd",
 		.id_table = pci_id_ena_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 5c9f350..ef7011e 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2463,7 +2463,7 @@ static struct rte_pci_id pci_id_nfp_net_map[] = {
 };

 static struct eth_driver rte_nfp_net_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_nfp_net_pmd",
 		.id_table = pci_id_nfp_net_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
--
2.7.4
[dpdk-dev] [PATCH v3 02/17] crypto: no need for a crypto pmd type
From: David Marchand

This information is not used and just adds noise.

Signed-off-by: David Marchand
---
 lib/librte_cryptodev/rte_cryptodev.c     | 8 +++-
 lib/librte_cryptodev/rte_cryptodev.h     | 2 --
 lib/librte_cryptodev/rte_cryptodev_pmd.h | 3 +--
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 960e2d5..b0d806c 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -230,7 +230,7 @@ rte_cryptodev_find_free_device_index(void)
 }

 struct rte_cryptodev *
-rte_cryptodev_pmd_allocate(const char *name, enum pmd_type type, int socket_id)
+rte_cryptodev_pmd_allocate(const char *name, int socket_id)
 {
 	struct rte_cryptodev *cryptodev;
 	uint8_t dev_id;
@@ -269,7 +269,6 @@ rte_cryptodev_pmd_allocate(const char *name, enum pmd_type type, int socket_id)
 		cryptodev->data->dev_started = 0;

 		cryptodev->attached = RTE_CRYPTODEV_ATTACHED;
-		cryptodev->pmd_type = type;
 		cryptodev_globals.nb_devs++;
 	}
@@ -318,7 +317,7 @@ rte_cryptodev_pmd_virtual_dev_init(const char *name, size_t dev_private_size,
 	struct rte_cryptodev *cryptodev;

 	/* allocate device structure */
-	cryptodev = rte_cryptodev_pmd_allocate(name, PMD_VDEV, socket_id);
+	cryptodev = rte_cryptodev_pmd_allocate(name, socket_id);
 	if (cryptodev == NULL)
 		return NULL;
@@ -360,8 +359,7 @@ rte_cryptodev_init(struct rte_pci_driver *pci_drv,
 	rte_cryptodev_create_unique_device_name(cryptodev_name,
 			sizeof(cryptodev_name), pci_dev);

-	cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name, PMD_PDEV,
-			rte_socket_id());
+	cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name, rte_socket_id());
 	if (cryptodev == NULL)
 		return -ENOMEM;

diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index d47f1e8..2d0b809 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -697,8 +697,6 @@ struct rte_cryptodev {
 	enum rte_cryptodev_type dev_type;
 	/**< Crypto device type */
-	enum pmd_type pmd_type;
-	/**< PMD type - PDEV / VDEV */

 	struct rte_cryptodev_cb_list link_intr_cbs;
 	/**< User application callback for interrupts if present */
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index 7d049ea..c977c61 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -454,13 +454,12 @@ struct rte_cryptodev_ops {
  * to that slot for the driver to use.
  *
  * @param name		Unique identifier name for each device
- * @param type		Device type of this Crypto device
  * @param socket_id	Socket to allocate resources on.
  * @return
  *   - Slot in the rte_dev_devices array for a new device;
  */
 struct rte_cryptodev *
-rte_cryptodev_pmd_allocate(const char *name, enum pmd_type type, int socket_id);
+rte_cryptodev_pmd_allocate(const char *name, int socket_id);

 /**
  * Creates a new virtual crypto device and returns the pointer
--
2.7.4
[dpdk-dev] [PATCH v3 01/17] pci: no need for dynamic tailq init
From: David Marchand

These lists can be initialized once and for all at build time. With this, those lists are only manipulated in a common place (and we could even make them private). A nice side effect is that pci drivers can now register in constructors.

Signed-off-by: David Marchand
Reviewed-by: Jan Viktorin
---
 lib/librte_eal/bsdapp/eal/eal_pci.c    | 3 ---
 lib/librte_eal/common/eal_common_pci.c | 6 --
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 3 ---
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 7fdd6f1..880483d 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -623,9 +623,6 @@ rte_eal_pci_ioport_unmap(struct rte_pci_ioport *p)
 int
 rte_eal_pci_init(void)
 {
-	TAILQ_INIT(&pci_driver_list);
-	TAILQ_INIT(&pci_device_list);
-
 	/* for debug purposes, PCI can be disabled */
 	if (internal_config.no_pci)
 		return 0;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index ba5283d..fee4aa5 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -82,8 +82,10 @@
 #include "eal_private.h"

-struct pci_driver_list pci_driver_list;
-struct pci_device_list pci_device_list;
+struct pci_driver_list pci_driver_list =
+	TAILQ_HEAD_INITIALIZER(pci_driver_list);
+struct pci_device_list pci_device_list =
+	TAILQ_HEAD_INITIALIZER(pci_device_list);

 #define SYSFS_PCI_DEVICES "/sys/bus/pci/devices"
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index f9c3efd..bfc410f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -743,9 +743,6 @@ rte_eal_pci_ioport_unmap(struct rte_pci_ioport *p)
 int
 rte_eal_pci_init(void)
 {
-	TAILQ_INIT(&pci_driver_list);
-	TAILQ_INIT(&pci_device_list);
-
 	/* for debug purposes, PCI can be disabled */
 	if (internal_config.no_pci)
 		return 0;
--
2.7.4
[dpdk-dev] [PATCH v3 00/17] prepare for rte_device / rte_driver
From: David Marchand

* Original patch series is from David Marchand. This is just a rebase over master (d76c19309).

* Following discussions with Jan [1] and some cleanup I started on pci code, here is a patchset that reworks pdev drivers registration and hotplug api. The structure changes mentioned in [1] are still to be done, but at least, I think we are one step closer to it.

Before this patchset, rte_driver .init semantics differed depending on whether it concerned a pdev or a vdev driver:
- for vdev, it actually meant that a devargs is given to the driver so that it creates ethdev / crypto objects, so it was a probing action
- for pdev, it only registered the driver, triggering no ethdev / crypto objects

From my pov, the eal hotplug api introduced in this patchset still needs more work so that it does not need to know about devargs. So a new devargs api is needed.

Changes since v2:
- rebase over HEAD (d76c193)
- Move SYSFS_PCI_DRIVERS macro to rte_pci.h to avoid compilation issue

Changes since v1:
- rebased on HEAD, new drivers should be okay
- patches have been split into smaller pieces
- RTE_INIT macro has been added, but in the end, I am not sure it is useful
- device type has been removed from ethdev, as it was used only by hotplug
- getting rid of pmd type in eal patch (patch 5 of initial series) has been dropped for now, we can do this once vdev drivers have been converted

[1] http://dpdk.org/ml/archives/dev/2016-January/031390.html

David Marchand (17):
  pci: no need for dynamic tailq init
  crypto: no need for a crypto pmd type
  drivers: align pci driver definitions
  eal: remove duplicate function declaration
  eal: introduce init macros
  crypto: export init/uninit common wrappers for pci drivers
  ethdev: export init/uninit common wrappers for pci drivers
  drivers: convert all pdev drivers as pci drivers
  crypto: get rid of crypto driver register callback
  ethdev: get rid of eth driver register callback
  eal/linux: move back interrupt thread init before setting affinity
  pci: add a helper for device name
  pci: add a helper to update a device
  ethdev: do not scan all pci devices on attach
  eal: add hotplug operations for pci and vdev
  ethdev: convert to eal hotplug
  ethdev: get rid of device type

 app/test/virtual_pmd.c                          |   2 +-
 drivers/crypto/qat/rte_qat_cryptodev.c          |  18 +-
 drivers/net/af_packet/rte_eth_af_packet.c       |   2 +-
 drivers/net/bnx2x/bnx2x_ethdev.c                |  35 +--
 drivers/net/bonding/rte_eth_bond_api.c          |   2 +-
 drivers/net/cxgbe/cxgbe_ethdev.c                |  24 +-
 drivers/net/cxgbe/cxgbe_main.c                  |   2 +-
 drivers/net/e1000/em_ethdev.c                   |  16 +-
 drivers/net/e1000/igb_ethdev.c                  |  40 +--
 drivers/net/ena/ena_ethdev.c                    |  20 +-
 drivers/net/enic/enic_ethdev.c                  |  23 +-
 drivers/net/fm10k/fm10k_ethdev.c                |  23 +-
 drivers/net/i40e/i40e_ethdev.c                  |  26 +-
 drivers/net/i40e/i40e_ethdev_vf.c               |  25 +-
 drivers/net/ixgbe/ixgbe_ethdev.c                |  47 +---
 drivers/net/mlx4/mlx4.c                         |  22 +-
 drivers/net/mlx5/mlx5.c                         |  21 +-
 drivers/net/mpipe/mpipe_tilegx.c                |   2 +-
 drivers/net/nfp/nfp_net.c                       |  23 +-
 drivers/net/null/rte_eth_null.c                 |   2 +-
 drivers/net/pcap/rte_eth_pcap.c                 |   2 +-
 drivers/net/ring/rte_eth_ring.c                 |   2 +-
 drivers/net/szedata2/rte_eth_szedata2.c         |  25 +-
 drivers/net/vhost/rte_eth_vhost.c               |   2 +-
 drivers/net/virtio/virtio_ethdev.c              |  26 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c            |  23 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c           |   2 +-
 examples/ip_pipeline/init.c                     |  22 --
 lib/librte_cryptodev/rte_cryptodev.c            |  67 +
 lib/librte_cryptodev/rte_cryptodev.h            |   2 -
 lib/librte_cryptodev/rte_cryptodev_pmd.h        |  45 +---
 lib/librte_cryptodev/rte_cryptodev_version.map  |   9 +-
 lib/librte_eal/bsdapp/eal/eal_pci.c             |  52 +++-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   2 +
 lib/librte_eal/common/eal_common_dev.c          |  39 +++
 lib/librte_eal/common/eal_common_pci.c          |  19 +-
 lib/librte_eal/common/eal_private.h             |  20 +-
 lib/librte_eal/common/include/rte_dev.h         |  29 ++-
 lib/librte_eal/common/include/rte_eal.h         |   3 +
 lib/librte_eal/common/include/rte_pci.h         |  35 +++
 lib/librte_eal/common/include/rte_tailq.h       |   4 +-
 lib/librte_eal/linuxapp/eal/eal.c               |   7 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c           |  16 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   2 +
 lib/librte_ether/rte_ethdev.c                   | 315
 lib/librte_ether/rte_ethdev.h                   |  40 ++-
[dpdk-dev] [PATCH] port: add kni interface support
Hi Cristian,

The new patch has been submitted just now. Please note that I did ignore some checkpatch errors this time.

B.R.
Ethan

2016-06-13 21:18 GMT+08:00 Dumitrescu, Cristian <cristian.dumitrescu at intel.com>:
> Hi Ethan,
>
> Great, we'll wait for your patch later this week then. I recommend you add
> any other changes that you might have on top of the latest code that I just
> sent, as this will minimize your work, my work on further code reviews, and
> the number of future iterations to merge this patch.
>
> Answers to your questions are inlined below.
>
> Regards,
> Cristian
>
> *From:* zhuangweijie at gmail.com [mailto:zhuangweijie at gmail.com] *On Behalf Of *Ethan
> *Sent:* Monday, June 13, 2016 11:48 AM
> *To:* Dumitrescu, Cristian
> *Cc:* dev at dpdk.org; Singh, Jasvinder; Yigit, Ferruh
> *Subject:* Re: [PATCH] port: add kni interface support
>
> Hi Cristian,
>
> I've got your comments. Thank you for reviewing the code from a DPDK newbie. :-)
> I plan to submit a new patch to fix everything during this week, hopefully.
>
> There are four places I'd like to discuss further:
>
> 1. Dedicated lcore for kni kernel thread
>
> First of all, it is a bug to add the kni kernel core to the user space core
> mask. What I want is just to check if the kni kernel thread has a dedicated core.
> The reason I prefer to allocate a dedicated core to the kni kernel thread is
> that my application is latency sensitive. I worry that context switches and
> cache misses will increase latency if the kni kernel thread and the
> application thread share one core.
> Anyway, I think I should remove the hard coded check because that will be
> more generic. Users who have a similar usage to mine can achieve the same
> through the configuration file.
>
> [Cristian] I agree with you that the user should be able to specify the
> core where the kernel thread should run, and this requirement is fully met
> by the latest code I sent, but implemented in a slightly different way,
> which I think is cleaner.
>
> In your initial solution, the application redefines the meaning of the
> core mask as the union of the cores used by the user space application
> (cores running the pipelines) and the cores used to run the kernel space
> KNI threads. This does not make sense to me. The application is in user
> space and it does not start or manage any kernel threads itself, so why
> should the application worry about the cores running kernel threads? The
> application should just pick up the user instructions from the config file
> and send them to the KNI kernel module transparently.
>
> In the code that I just sent, the application preserves the current
> definition of the core mask, i.e. just the collection of cores running the
> pipelines. This leads to simpler code that meets all the requirements for
> kernel thread affinity:
>
> i) The user wants to affinitize the kernel thread to a CPU core that is
> not used to run any pipeline (this core will run just KNI kernel threads):
> the core entry in the KNI section is set to be different from the core
> entry of any PIPELINE section in the config file;
>
> ii) The user affinitizes the kernel thread to a CPU core that also runs
> some of the pipelines (this core will run both user space and kernel space
> threads): the core entry in the KNI section is equal to the core entry in
> one or several of the PIPELINE sections in the config file;
>
> iii) The user does not affinitize the kernel thread to any CPU core, so
> the kernel decides the scheduling policy for the KNI threads: the core
> entry of the KNI section is not present; this results in the force_bind
> KNI parameter being set to 0.
>
> Makes sense?
>
> 2.
The checkpatch error on the macro RTE_PORT_KNI_WRITER_STATS_PKTS_IN_ADD
>
> Actually, I implemented the macro similarly
> to RTE_PORT_RING_READER_STATS_PKTS_IN_ADD first, but
> scripts/checkpatches.sh fails with: ERROR:COMPLEX_MACRO: Macros with
> complex values should be enclosed in parentheses.
> I'm not sure whether I have done something wrong or the checkpatch
> script needs an update.
>
> [Cristian] Let's use the same consistent rule to create the stats macros
> for all the ports, i.e. follow the existing rule used for the other ports.
> You can ignore this checkpatch issue.
>
> 3. KNI kernel operations callback
>
> To be honest, I made reference to the KNI sample application.
> Since there is very little documentation on the difference between the
> link up callback and the device start callback, I am not sure which one is
> better here. Any help will be appreciated. :-)
>
> [Cristian] I suggest you use the ones from the code that I just sent.
>
> 4. Shall I use DPDK_16.07 in the librte_port/rte_port_version.map file?
>
> [Cristian] Yes.
>
>
> 2016-06-10 7:42 GMT+08:00 Dumitrescu, Cristian <cristian.dumitrescu at intel.com>:
>
> Hi Ethan,
>
> Great work! There are still several comments below that need to be
> addressed, but I am confident
[dpdk-dev] [PATCH v3 3/3] port: document update
add kni configurations into the document of ip pipeline sample application Signed-off-by: WeiJie Zhuang --- doc/guides/sample_app_ug/ip_pipeline.rst | 112 +++ 1 file changed, 83 insertions(+), 29 deletions(-) diff --git a/doc/guides/sample_app_ug/ip_pipeline.rst b/doc/guides/sample_app_ug/ip_pipeline.rst index 899fd4a..566106b 100644 --- a/doc/guides/sample_app_ug/ip_pipeline.rst +++ b/doc/guides/sample_app_ug/ip_pipeline.rst @@ -1,5 +1,5 @@ .. BSD LICENSE -Copyright(c) 2015 Intel Corporation. All rights reserved. +Copyright(c) 2016 Intel Corporation. All rights reserved. All rights reserved. Redistribution and use in source and binary forms, with or without @@ -351,33 +351,35 @@ Application resources present in the configuration file .. table:: Application resource names in the configuration file - +--+-+-+ - | Resource type| Format | Examples | - +==+=+=+ - | Pipeline | ``PIPELINE``| ``PIPELINE0``, ``PIPELINE1``| - +--+-+-+ - | Mempool | ``MEMPOOL`` | ``MEMPOOL0``, ``MEMPOOL1`` | - +--+-+-+ - | Link (network interface) | ``LINK``| ``LINK0``, ``LINK1``| - +--+-+-+ - | Link RX queue| ``RXQ.`` | ``RXQ0.0``, ``RXQ1.5`` | - +--+-+-+ - | Link TX queue| ``TXQ.`` | ``TXQ0.0``, ``TXQ1.5`` | - +--+-+-+ - | Software queue | ``SWQ`` | ``SWQ0``, ``SWQ1`` | - +--+-+-+ - | Traffic Manager | ``TM`` | ``TM0``, ``TM1`` | - +--+-+-+ - | Source | ``SOURCE`` | ``SOURCE0``, ``SOURCE1``| - +--+-+-+ - | Sink | ``SINK``| ``SINK0``, ``SINK1``| - +--+-+-+ - | Message queue| ``MSGQ``| ``MSGQ0``, ``MSGQ1``, | - | | ``MSGQ-REQ-PIPELINE`` | ``MSGQ-REQ-PIPELINE2``, ``MSGQ-RSP-PIPELINE2,`` | - | | ``MSGQ-RSP-PIPELINE`` | ``MSGQ-REQ-CORE-s0c1``, ``MSGQ-RSP-CORE-s0c1`` | - | | ``MSGQ-REQ-CORE-`` | | - | | ``MSGQ-RSP-CORE-`` | | - +--+-+-+ + ++-+-+ + | Resource type | Format | Examples | + ++=+=+ + | Pipeline | ``PIPELINE``| ``PIPELINE0``, ``PIPELINE1``| + ++-+-+ + | Mempool| ``MEMPOOL`` | ``MEMPOOL0``, ``MEMPOOL1`` | + ++-+-+ + | Link (network interface) | ``LINK``| ``LINK0``, ``LINK1``| + ++-+-+ 
+ | Link RX queue | ``RXQ.`` | ``RXQ0.0``, ``RXQ1.5`` | +
[dpdk-dev] [PATCH v3 2/3] port: add kni nodrop writer
1. add no drop writing operations to the kni port 2. support dropless kni config in the ip pipeline sample application Signed-off-by: WeiJie Zhuang --- examples/ip_pipeline/app.h | 2 + examples/ip_pipeline/config_parse.c | 31 - examples/ip_pipeline/init.c | 26 - examples/ip_pipeline/pipeline_be.h | 6 + lib/librte_port/rte_port_kni.c | 220 +++ lib/librte_port/rte_port_kni.h | 13 +++ lib/librte_port/rte_port_version.map | 1 + 7 files changed, 292 insertions(+), 7 deletions(-) diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h index abbd6d4..6a6fdd9 100644 --- a/examples/ip_pipeline/app.h +++ b/examples/ip_pipeline/app.h @@ -147,6 +147,8 @@ struct app_pktq_kni_params { uint32_t mempool_id; /* Position in the app->mempool_params */ uint32_t burst_read; uint32_t burst_write; + uint32_t dropless; + uint64_t n_retries; }; #ifndef APP_FILE_NAME_SIZE diff --git a/examples/ip_pipeline/config_parse.c b/examples/ip_pipeline/config_parse.c index c55be31..31a50c2 100644 --- a/examples/ip_pipeline/config_parse.c +++ b/examples/ip_pipeline/config_parse.c @@ -199,6 +199,8 @@ struct app_pktq_kni_params default_kni_params = { .mempool_id = 0, .burst_read = 32, .burst_write = 32, + .dropless = 0, + .n_retries = 0, }; struct app_pktq_source_params default_source_params = { @@ -1927,7 +1929,7 @@ parse_kni(struct app_params *app, if (strcmp(ent->name, "mempool") == 0) { int status = validate_name(ent->value, - "MEMPOOL", 1); + "MEMPOOL", 1); ssize_t idx; PARSE_ERROR((status == 0), section_name, @@ -1940,7 +1942,7 @@ parse_kni(struct app_params *app, if (strcmp(ent->name, "burst_read") == 0) { int status = parser_read_uint32(>burst_read, - ent->value); + ent->value); PARSE_ERROR((status == 0), section_name, ent->name); @@ -1949,7 +1951,25 @@ parse_kni(struct app_params *app, if (strcmp(ent->name, "burst_write") == 0) { int status = parser_read_uint32(>burst_write, - ent->value); + ent->value); + + PARSE_ERROR((status == 0), section_name, + ent->name); + continue; + } + + 
if (strcmp(ent->name, "dropless") == 0) { + int status = parser_read_arg_bool(ent->value); + + PARSE_ERROR((status != -EINVAL), section_name, + ent->name); + param->dropless = status; + continue; + } + + if (strcmp(ent->name, "n_retries") == 0) { + int status = parser_read_uint64(>n_retries, + ent->value); PARSE_ERROR((status == 0), section_name, ent->name); @@ -2794,6 +2814,11 @@ save_kni_params(struct app_params *app, FILE *f) /* burst_write */ fprintf(f, "%s = %" PRIu32 "\n", "burst_write", p->burst_write); + /* dropless */ + fprintf(f, "%s = %s\n", + "dropless", + p->dropless ? "yes" : "no"); + fputc('\n', f); } } diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c index d522de4..af24f52 100644 --- a/examples/ip_pipeline/init.c +++ b/examples/ip_pipeline/init.c @@ -1434,10 +1434,28 @@ void app_pipeline_params_get(struct app_params *app, #ifdef RTE_LIBRTE_KNI case APP_PKTQ_OUT_KNI: { - out->type = PIPELINE_PORT_OUT_KNI_WRITER; - out->params.kni.kni = app->kni[in->id]; - out->params.kni.tx_burst_sz = - app->kni_params[in->id].burst_write; + struct app_pktq_kni_params *p_kni = + >kni_params[in->id]; + + if (p_kni->dropless == 0) { + struct rte_port_kni_writer_params *params = + >params.kni; + + out->type = PIPELINE_PORT_OUT_KNI_WRITER; + params->kni = app->kni[in->id]; +
[dpdk-dev] [PATCH v3 1/3] port: add kni interface support
1. add KNI port type to the packet framework
2. add KNI support to the IP Pipeline sample application
3. some bug fixes

Signed-off-by: WeiJie Zhuang
---
v2:
* Fix check patch error.
v3:
* Fix code review comments.
---
 doc/api/doxy-api-index.md                          |   1 +
 examples/ip_pipeline/Makefile                      |   2 +-
 examples/ip_pipeline/app.h                         | 181 +++-
 examples/ip_pipeline/config/kni.cfg                |  67 +
 examples/ip_pipeline/config_check.c                |  26 +-
 examples/ip_pipeline/config_parse.c                | 166 ++-
 examples/ip_pipeline/init.c                        | 132 -
 examples/ip_pipeline/pipeline/pipeline_common_fe.c |  29 ++
 examples/ip_pipeline/pipeline/pipeline_master_be.c |   6 +
 examples/ip_pipeline/pipeline_be.h                 |  27 ++
 lib/librte_port/Makefile                           |   7 +
 lib/librte_port/rte_port_kni.c                     | 325 +
 lib/librte_port/rte_port_kni.h                     |  82 ++
 lib/librte_port/rte_port_version.map               |   8 +
 14 files changed, 1047 insertions(+), 12 deletions(-)
 create mode 100644 examples/ip_pipeline/config/kni.cfg
 create mode 100644 lib/librte_port/rte_port_kni.c
 create mode 100644 lib/librte_port/rte_port_kni.h

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f626386..5e7f024 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -118,6 +118,7 @@ There are many libraries, so their headers may be grouped by topics:
   [frag]     (@ref rte_port_frag.h),
   [reass]    (@ref rte_port_ras.h),
   [sched]    (@ref rte_port_sched.h),
+  [kni]      (@ref rte_port_kni.h),
   [src/sink] (@ref rte_port_source_sink.h)
 * [table]    (@ref rte_table.h):
   [lpm IPv4] (@ref rte_table_lpm.h),
diff --git a/examples/ip_pipeline/Makefile b/examples/ip_pipeline/Makefile
index 5827117..6dc3f52 100644
--- a/examples/ip_pipeline/Makefile
+++ b/examples/ip_pipeline/Makefile
@@ -1,6 +1,6 @@
 # BSD LICENSE
 #
-# Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+# Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 # All rights reserved.
# # Redistribution and use in source and binary forms, with or without diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h index 7611341..abbd6d4 100644 --- a/examples/ip_pipeline/app.h +++ b/examples/ip_pipeline/app.h @@ -44,6 +44,9 @@ #include #include +#ifdef RTE_LIBRTE_KNI +#include +#endif #include "cpu_core_map.h" #include "pipeline.h" @@ -132,6 +135,20 @@ struct app_pktq_swq_params { uint32_t mempool_indirect_id; }; +struct app_pktq_kni_params { + char *name; + uint32_t parsed; + + uint32_t socket_id; + uint32_t core_id; + uint32_t hyper_th_id; + uint32_t force_bind; + + uint32_t mempool_id; /* Position in the app->mempool_params */ + uint32_t burst_read; + uint32_t burst_write; +}; + #ifndef APP_FILE_NAME_SIZE #define APP_FILE_NAME_SIZE 256 #endif @@ -185,6 +202,7 @@ enum app_pktq_in_type { APP_PKTQ_IN_HWQ, APP_PKTQ_IN_SWQ, APP_PKTQ_IN_TM, + APP_PKTQ_IN_KNI, APP_PKTQ_IN_SOURCE, }; @@ -197,6 +215,7 @@ enum app_pktq_out_type { APP_PKTQ_OUT_HWQ, APP_PKTQ_OUT_SWQ, APP_PKTQ_OUT_TM, + APP_PKTQ_OUT_KNI, APP_PKTQ_OUT_SINK, }; @@ -420,6 +439,8 @@ struct app_eal_params { #define APP_MAX_PKTQ_TM APP_MAX_LINKS +#define APP_MAX_PKTQ_KNI APP_MAX_LINKS + #ifndef APP_MAX_PKTQ_SOURCE #define APP_MAX_PKTQ_SOURCE 64 #endif @@ -471,6 +492,7 @@ struct app_params { struct app_pktq_hwq_out_params hwq_out_params[APP_MAX_HWQ_OUT]; struct app_pktq_swq_params swq_params[APP_MAX_PKTQ_SWQ]; struct app_pktq_tm_params tm_params[APP_MAX_PKTQ_TM]; + struct app_pktq_kni_params kni_params[APP_MAX_PKTQ_KNI]; struct app_pktq_source_params source_params[APP_MAX_PKTQ_SOURCE]; struct app_pktq_sink_params sink_params[APP_MAX_PKTQ_SINK]; struct app_msgq_params msgq_params[APP_MAX_MSGQ]; @@ -482,6 +504,7 @@ struct app_params { uint32_t n_pktq_hwq_out; uint32_t n_pktq_swq; uint32_t n_pktq_tm; + uint32_t n_pktq_kni; uint32_t n_pktq_source; uint32_t n_pktq_sink; uint32_t n_msgq; @@ -495,6 +518,9 @@ struct app_params { struct app_link_data link_data[APP_MAX_LINKS]; struct rte_ring 
*swq[APP_MAX_PKTQ_SWQ]; struct rte_sched_port *tm[APP_MAX_PKTQ_TM]; +#ifdef RTE_LIBRTE_KNI + struct rte_kni *kni[APP_MAX_PKTQ_KNI]; +#endif /* RTE_LIBRTE_KNI */ struct rte_ring *msgq[APP_MAX_MSGQ]; struct pipeline_type pipeline_type[APP_MAX_PIPELINE_TYPES]; struct app_pipeline_data
[dpdk-dev] [PATCH v4 0/3] Keep-alive enhancements
> Remy Horton (3):
>   eal: export keepalive state enumerations
>   eal: add additional keepalive callbacks
>   examples/l2fwd-keepalive: add IPC liveness reporting

Applied, thanks

Just a last comment: the agent in the example should not appear in examples/Makefile.
[dpdk-dev] Performance hit - NICs on different CPU sockets
On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith wrote:
>
> Right now I do not know what the issue is with the system. Could be too many
> Rx/Tx ring pairs per port and limiting the memory in the NICs, which is why
> you get better performance when you have 8 cores per port. I am not really
> seeing the whole picture and how DPDK is configured to help more. Sorry.

I doubt that there is a limitation wrt running 16 cores per port vs 8 cores
per port, as I've tried with two different machines connected back to back,
each with one X710 port and 16 cores on each of them running on that port.
In that case our performance doubled as expected.

>
> Maybe seeing the DPDK command line would help.

The command line I use with ports 01:00.3 and 81:00.3 is:

./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- --qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00

Our own qmap args allow the user to control exactly how cores are split
between ports. In this case we end up with:

warp17> show port map
Port 0[socket: 0]:
   Core 4[socket:0] (Tx: 0, Rx: 0)
   Core 5[socket:0] (Tx: 1, Rx: 1)
   Core 6[socket:0] (Tx: 2, Rx: 2)
   Core 7[socket:0] (Tx: 3, Rx: 3)
   Core 8[socket:0] (Tx: 4, Rx: 4)
   Core 9[socket:0] (Tx: 5, Rx: 5)
   Core 20[socket:0] (Tx: 6, Rx: 6)
   Core 21[socket:0] (Tx: 7, Rx: 7)
   Core 22[socket:0] (Tx: 8, Rx: 8)
   Core 23[socket:0] (Tx: 9, Rx: 9)
   Core 24[socket:0] (Tx: 10, Rx: 10)
   Core 25[socket:0] (Tx: 11, Rx: 11)
   Core 26[socket:0] (Tx: 12, Rx: 12)
   Core 27[socket:0] (Tx: 13, Rx: 13)
   Core 28[socket:0] (Tx: 14, Rx: 14)
   Core 29[socket:0] (Tx: 15, Rx: 15)
Port 1[socket: 1]:
   Core 10[socket:1] (Tx: 0, Rx: 0)
   Core 11[socket:1] (Tx: 1, Rx: 1)
   Core 12[socket:1] (Tx: 2, Rx: 2)
   Core 13[socket:1] (Tx: 3, Rx: 3)
   Core 14[socket:1] (Tx: 4, Rx: 4)
   Core 15[socket:1] (Tx: 5, Rx: 5)
   Core 16[socket:1] (Tx: 6, Rx: 6)
   Core 17[socket:1] (Tx: 7, Rx: 7)
   Core 18[socket:1] (Tx: 8, Rx: 8)
   Core 19[socket:1] (Tx: 9, Rx: 9)
   Core 30[socket:1] (Tx: 10, Rx: 10)
   Core 31[socket:1] (Tx: 11, Rx: 11)
   Core 32[socket:1] (Tx: 12, Rx: 12)
   Core 33[socket:1] (Tx: 13, Rx: 13)
   Core 34[socket:1] (Tx: 14, Rx: 14)
   Core 35[socket:1] (Tx: 15, Rx: 15)

Just for reference, the cpu_layout script shows:

$ $RTE_SDK/tools/cpu_layout.py
Core and Socket Information (as reported by '/proc/cpuinfo')

cores = [0, 1, 2, 3, 4, 8, 9, 10, 11, 12]
sockets = [0, 1]

          Socket 0   Socket 1
Core 0    [0, 20]    [10, 30]
Core 1    [1, 21]    [11, 31]
Core 2    [2, 22]    [12, 32]
Core 3    [3, 23]    [13, 33]
Core 4    [4, 24]    [14, 34]
Core 8    [5, 25]    [15, 35]
Core 9    [6, 26]    [16, 36]
Core 10   [7, 27]    [17, 37]
Core 11   [8, 28]    [18, 38]
Core 12   [9, 29]    [19, 39]

I know it might be complicated to figure out exactly what's happening in our
setup with our own code, so please let me know if you need additional
information. I appreciate the help!

Thanks,
Dumitru
[dpdk-dev] [PATCH] qat: fix for VFs not getting recognized
2016-06-16 16:29, Jain, Deepak K:
> Due to addition of CLASS_ID in EAL, class_id is
> amended into the code.

Why is the VF not recognized? The class id should not be mandatory.
[dpdk-dev] [PATCH v5 0/7] Remove string operations from xstats
> Remy Horton (7):
>   rte: change xstats to use integer ids
>   drivers/net/ixgbe: change xstats to use integer ids
>   drivers/net/e1000: change xstats to use integer ids
>   drivers/net/fm10k: change xstats to use integer ids
>   drivers/net/i40e: change xstats to use integer ids
>   drivers/net/virtio: change xstats to use integer ids
>   rte: change xstats usage to new API

Applied, thanks
[dpdk-dev] [PATCH v3] i40e: configure MTU
On 6/16/16, 10:40 AM, "dev on behalf of Yong Wang" wrote: >On 5/16/16, 5:27 AM, "dev on behalf of Olivier Matz" on behalf of olivier.matz at 6wind.com> wrote: > >>Hi Beilei, >> >>On 05/13/2016 10:15 AM, Beilei Xing wrote: >>> This patch enables configuring MTU for i40e. >>> Since changing MTU needs to reconfigure queue, stop port first >>> before configuring MTU. >>> >>> Signed-off-by: Beilei Xing >>> --- >>> v3 changes: >>> Add frame size with extra I40E_VLAN_TAG_SIZE. >>> Delete i40e_dev_rx_init(pf) cause it will be called when port starts. >>> >>> v2 changes: >>> If mtu is not within the allowed range, return -EINVAL instead of -EBUSY. >>> Delete rxq reconfigure cause rxq reconfigure will be finished in >>> i40e_dev_rx_init. >>> >>> drivers/net/i40e/i40e_ethdev.c | 34 ++ >>> 1 file changed, 34 insertions(+) >>> >>> [...] >>> +static int >>> +i40e_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) >>> +{ >>> + struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); >>> + struct rte_eth_dev_data *dev_data = pf->dev_data; >>> + uint32_t frame_size = mtu + ETHER_HDR_LEN >>> + + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE; >>> + int ret = 0; >>> + >>> + /* check if mtu is within the allowed range */ >>> + if ((mtu < ETHER_MIN_MTU) || (frame_size > I40E_FRAME_SIZE_MAX)) >>> + return -EINVAL; >>> + >>> + /* mtu setting is forbidden if port is start */ >>> + if (dev_data->dev_started) { >>> + PMD_DRV_LOG(ERR, >>> + "port %d must be stopped before configuration\n", >>> + dev_data->port_id); >>> + return -ENOTSUP; >>> + } >> >>I'm not convinced that ENOTSUP is the proper return value here. >>It is usually returned when a function is not implemented, which >>is not the case here: the function is implemented but is forbidden >>because the port is running. >> >>I saw that Julien commented on your v1 that the return value should >>be one of: >> - (0) if successful. >> - (-ENOTSUP) if operation is not supported. >> - (-ENODEV) if *port_id* invalid. 
>> - (-EINVAL) if *mtu* invalid. >> >>But I think your initial value (-EBUSY) was fine. Maybe it should be >>added in the API instead, with the following description: >> (-EBUSY) if the operation is not allowed when the port is running > >AFAICT, the same check is not done for other drivers that implement >the mac_set op. Wouldn?t it make more sense to have the driver disable Correction: this should read as mtu_set. >the port, reconfigure and re-enable it in this case, instead of returning >error code? If the consensus in DPDK is to have the application disable >the port first, we need to enforce this policy across all devices and >clearly document this behavior. > >>This would allow the application to take its dispositions to stop the >>port and restart it with the proper jumbo_frame argument. >> >>+CC Thomas which maintains ethdev API. >> >> >>Regards, >>Olivier >
[dpdk-dev] [PATCH v3] i40e: configure MTU
On 5/16/16, 5:27 AM, "dev on behalf of Olivier Matz" wrote: >Hi Beilei, > >On 05/13/2016 10:15 AM, Beilei Xing wrote: >> This patch enables configuring MTU for i40e. >> Since changing MTU needs to reconfigure queue, stop port first >> before configuring MTU. >> >> Signed-off-by: Beilei Xing >> --- >> v3 changes: >> Add frame size with extra I40E_VLAN_TAG_SIZE. >> Delete i40e_dev_rx_init(pf) cause it will be called when port starts. >> >> v2 changes: >> If mtu is not within the allowed range, return -EINVAL instead of -EBUSY. >> Delete rxq reconfigure cause rxq reconfigure will be finished in >> i40e_dev_rx_init. >> >> drivers/net/i40e/i40e_ethdev.c | 34 ++ >> 1 file changed, 34 insertions(+) >> >> [...] >> +static int >> +i40e_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) >> +{ >> +struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); >> +struct rte_eth_dev_data *dev_data = pf->dev_data; >> +uint32_t frame_size = mtu + ETHER_HDR_LEN >> + + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE; >> +int ret = 0; >> + >> +/* check if mtu is within the allowed range */ >> +if ((mtu < ETHER_MIN_MTU) || (frame_size > I40E_FRAME_SIZE_MAX)) >> +return -EINVAL; >> + >> +/* mtu setting is forbidden if port is start */ >> +if (dev_data->dev_started) { >> +PMD_DRV_LOG(ERR, >> +"port %d must be stopped before configuration\n", >> +dev_data->port_id); >> +return -ENOTSUP; >> +} > >I'm not convinced that ENOTSUP is the proper return value here. >It is usually returned when a function is not implemented, which >is not the case here: the function is implemented but is forbidden >because the port is running. > >I saw that Julien commented on your v1 that the return value should >be one of: > - (0) if successful. > - (-ENOTSUP) if operation is not supported. > - (-ENODEV) if *port_id* invalid. > - (-EINVAL) if *mtu* invalid. > >But I think your initial value (-EBUSY) was fine. 
Maybe it should be >added in the API instead, with the following description: > (-EBUSY) if the operation is not allowed when the port is running AFAICT, the same check is not done for other drivers that implement the mac_set op. Wouldn't it make more sense to have the driver disable the port, reconfigure and re-enable it in this case, instead of returning an error code? If the consensus in DPDK is to have the application disable the port first, we need to enforce this policy across all devices and clearly document this behavior. >This would allow the application to take its dispositions to stop the >port and restart it with the proper jumbo_frame argument. > >+CC Thomas who maintains ethdev API. > > >Regards, >Olivier
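For readers following the thread, the contested range check itself is easy to reproduce in isolation. The sketch below mirrors the test from the patch; the constant values are illustrative stand-ins for the DPDK and i40e header definitions, not authoritative:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the constants used in the patch; the real values live in
 * rte_ether.h and the i40e driver headers. */
#define ETHER_HDR_LEN       14
#define ETHER_CRC_LEN       4
#define I40E_VLAN_TAG_SIZE  4
#define ETHER_MIN_MTU       68
#define I40E_FRAME_SIZE_MAX 9728

/* Mirror of the range check under review: the MTU is valid only if it is
 * at least the Ethernet minimum and the resulting frame (L2 header, CRC
 * and one VLAN tag included) fits the maximum frame size. */
static int
i40e_mtu_check(uint16_t mtu, uint32_t *frame_size)
{
	uint32_t fs = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE;

	if (mtu < ETHER_MIN_MTU || fs > I40E_FRAME_SIZE_MAX)
		return -EINVAL;
	if (frame_size != NULL)
		*frame_size = fs;
	return 0;
}
```

With these stand-in values a standard 1500-byte MTU maps to a 1522-byte frame, and the largest accepted MTU is the one whose frame lands exactly on the maximum.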
[dpdk-dev] [PATCH v5 1/4] lib/librte_ether: support device reset
2016-06-15 11:03, Wenzhuo Lu: > +/** > + * Reset an Ethernet device. > + * > + * @param port_id > + * The port identifier of the Ethernet device. > + */ > +int > +rte_eth_dev_reset(uint8_t port_id); Please explain in the doxygen comment what a reset means. We must understand why and when an application should call it. And it must be clear to a PMD developer how to implement it. What is the return value?
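As an illustration of the kind of documentation being requested, a fleshed-out comment might look like the sketch below. The wording and error codes here are assumptions for discussion, not the committed API:

```c
#include <stdint.h>

/**
 * Reset an Ethernet device. (Sketch only.)
 *
 * A comment along these lines would answer Thomas's questions: state
 * what condition the device is in afterwards (e.g. queues released,
 * port stopped, configuration must be re-applied), and why/when an
 * application should call it (e.g. to recover a VF after a PF-initiated
 * reset event).
 *
 * @param port_id
 *   The port identifier of the Ethernet device.
 * @return
 *   - (0) if successful.
 *   - (-ENODEV) if *port_id* is invalid.
 *   - (-ENOTSUP) if the device does not support reset.
 */
int rte_eth_dev_reset(uint8_t port_id);
```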
[dpdk-dev] [PATCH v3 1/2] ethdev: add tunnel and port RSS offload types
On Fri, Apr 01, 2016 at 07:59:33PM +0530, Jerin Jacob wrote: > On Fri, Apr 01, 2016 at 04:04:13PM +0200, Thomas Monjalon wrote: > > 2016-03-31 02:21, Jerin Jacob: > > > - added VXLAN, GENEVE and NVGRE tunnel flow types > > > - added PORT flow type for accounting physical/virtual > > > port or channel number in flow creation > > > > These API changes could be considered for 16.07 if they are motivated > > by any use. Please bring some use cases, thanks. > > The use case is to spray the packets to multiple queues using RSS on > tunnel type packets. > > If the RSS hash does not account for the inner packet in the tunnel > case, the packets always go to a particular queue, as most likely the > outer header remains the same in tunnel packets, and RSS spread > will not be achieved in the tunnel packet case. > > This feature is part of the RSS capability of ThunderX > NIC HW, which we are planning to upstream in the next release. > > I thought of pushing the common code changes first. Ping. Can we merge this changeset if there are no concerns? And there is a real consumer for this: http://dpdk.org/ml/archives/dev/2016-June/041374.html Jerin
[dpdk-dev] Performance hit - NICs on different CPU sockets
On Thu, Jun 16, 2016 at 4:58 PM, Wiles, Keith wrote: > > From the output below it appears the x710 devices 01:00.[0-3] are on socket 0 > And the x710 devices 02:00.[0-3] sit on socket 1. > I assume there's a mistake here. The x710 devices on socket 0 are: $ lspci | grep -ie "01:.*x710" 01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) 01:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) 01:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) 01:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) and the X710 devices on socket 1 are: $ lspci | grep -ie "81:.*x710" 81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) 81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) 81:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) 81:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) > This means the ports on 01.00.xx should be handled by socket 0 CPUs and > 02:00.xx should be handled by Socket 1. I cannot tell if that is the case > for you here. The CPUs or lcores from the cpu_layout.py should help > understand the layout. > That was the first scenario I tried: - assign 16 CPUs from socket 0 to port 0 (01:00.3) - assign 16 CPUs from socket 1 to port 1 (81:00.3) Our performance measurements then show a setup rate of 1.6M sess/s, which is less than half of what I get when I install both X710 on socket 1 and use only 16 CPUs from socket 1 for both ports. I double checked the cpu layout. We also have our own CLI warnings when using cores that are not on the same socket as the port they're assigned to, so the mapping should be fine. Thanks, Dumitru
[dpdk-dev] [PATCH v5 1/1] eal: fix resource leak of mapped memory
Patch fixes resource leak in rte_eal_hugepage_attach() where mapped files were not freed back to the OS in case of failure. Patch uses the behavior of Linux munmap: "It is not an error if the indicated range does not contain any mapped pages".

Coverity issue: 13295, 13296, 13303
Fixes: af75078fece3 ("first public release")

Signed-off-by: Marcin Kerlin
Acked-by: Sergio Gonzalez Monroy
---
v5: - shift the history of changes
v4: - removed keyword const from pointer and dependent on that casting (void *)
v3: - removed redundant casting
    - removed update error message
v2: - unmapping also previous addresses

 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 79d1d2d..c935765 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1399,7 +1399,7 @@ int
 rte_eal_hugepage_attach(void)
 {
 	const struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	const struct hugepage_file *hp = NULL;
+	struct hugepage_file *hp = NULL;
 	unsigned num_hp = 0;
 	unsigned i, s = 0; /* s used to track the segment number */
 	off_t size;
@@ -1481,7 +1481,7 @@ rte_eal_hugepage_attach(void)
 	size = getFileSize(fd_hugepage);
 	hp = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd_hugepage, 0);
-	if (hp == NULL) {
+	if (hp == MAP_FAILED) {
 		RTE_LOG(ERR, EAL, "Could not mmap %s\n", eal_hugepage_info_path());
 		goto error;
 	}
@@ -1545,12 +1545,19 @@ rte_eal_hugepage_attach(void)
 		s++;
 	}
 	/* unmap the hugepage config file, since we are done using it */
-	munmap((void *)(uintptr_t)hp, size);
+	munmap(hp, size);
 	close(fd_zero);
 	close(fd_hugepage);
 	return 0;

 error:
+	s = 0;
+	while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0) {
+		munmap(mcfg->memseg[s].addr, mcfg->memseg[s].len);
+		s++;
+	}
+	if (hp != NULL && hp != MAP_FAILED)
+		munmap(hp, size);
 	if (fd_zero >= 0)
 		close(fd_zero);
 	if (fd_hugepage >= 0)
--
1.9.1
[dpdk-dev] [PATCHv7 1/6] pmdinfogen: Add buildtools and pmdinfogen utility
On 06/16/2016 04:33 PM, Neil Horman wrote: > On Thu, Jun 16, 2016 at 03:29:57PM +0300, Panu Matilainen wrote: >> On 06/09/2016 08:46 PM, Neil Horman wrote: >>> pmdinfogen is a tool used to parse object files and build json strings for >>> use in later determining hardware support in a dso or application binary. >>> pmdinfo looks for the non-exported symbol names this_pmd_name<n> and >>> this_pmd_tbl<n> (where n is an integer counter). It records the name of >>> each of these tuples, using the latter to find the symbolic name of the >>> pci_table for physical devices that the object supports. With this >>> information, it outputs a C file with a single line of the form: >>> >>> static char *<name>_driver_info[] __attribute__((used)) = " \ >>> PMD_DRIVER_INFO=<json>"; >>> >>> Where <name> is the arbitrary name of the pmd, and <json> is the >>> json encoded string that holds relevant pmd information, including the pmd >>> name, type and optional array of pci device/vendor ids that the driver >>> supports. >>> >>> This C file is suitable for compiling to object code, then relocatably >>> linking into the parent file from which the C was generated. This creates >>> an entry in the string table of the object that can inform a later tool >>> about hardware support. >>> >>> Signed-off-by: Neil Horman >>> CC: Bruce Richardson >>> CC: Thomas Monjalon >>> CC: Stephen Hemminger >>> CC: Panu Matilainen >>> --- >> >> Unlike earlier versions, pmdinfogen ends up installed in bindir during "make >> install". Is that intentional, or just a side-effect from using >> rte.hostapp.mk? If it's intentional it probably should be prefixed with dpdk_ >> like the other tools. >> > I'm not sure what the answer is here. As you can see, Thomas and I argued at > length over which makefile to use, and I gave up, so I suppose you can call it > intentional. Being in bindir makes a reasonable amount of sense I suppose, as > 3rd party developers can use it during their independent driver development. 
Right, it'd be useful for 3rd party driver developers, so let's consider it intentional :) > I'm not sure I agree with prefixing it though. Given that the hostapp.mk file > installs everything there, and nothing that previously used that make file > had a > dpdk_ prefix that I can tell, I'm not sure why this would. pmdinfogen seems > like a pretty unique name, and I know of no other project that uses the term > pmd > to describe anything. I agree about "pmd" being fairly unique as is, but if pmdinfo is dpdk_ prefixed then this should be too, or neither should be prefixed. I don't personally care which way, but it should be consistent. - Panu - > > Neil > >> - Panu - >> >>
[dpdk-dev] [PATCH v3 3/4] bonding: take queue spinlock in rx/tx burst functions
2016-06-16 15:32, Bruce Richardson: > On Mon, Jun 13, 2016 at 01:28:08PM +0100, Iremonger, Bernard wrote: > > > Why does this particular PMD need spinlocks when doing RX and TX, while > > > other device types do not? How is adding/removing devices from a bonded > > > device different to other control operations that can be done on physical > > > PMDs? Is this not similar to say bringing down or hotplugging out a > > > physical > > > port just before an RX or TX operation takes place? > > > For all other PMDs we rely on the app to synchronise control and data > > > plane > > > operation - why not here? > > > > > > /Bruce > > > > This issue arose during VM live migration testing. > > For VM live migration it is necessary (while traffic is running) to be able > > to remove a bonded slave device, stop it, close it and detach it. > > If a slave device is removed from a bonded device while traffic is running > > a segmentation fault may occur in the rx/tx burst function. The spinlock > > has been added to prevent this occurring. > > > > The bonding device already uses a spinlock to synchronise between the add > > and remove functionality and the slave_link_status_change_monitor code. > > > > Previously testpmd did not allow stop, close or detach of a PMD while > > traffic was running. Testpmd has been modified with the following patchset > > > > http://dpdk.org/dev/patchwork/patch/13472/ > > > > It now allows stop, close and detach of a PMD provided it is not > > forwarding and is not a slave of a bonded PMD. > > > I will admit to not being fully convinced, but if nobody else has any serious > objections, and since this patch has been reviewed and acked, I'm ok to merge > it > in. I'll do so shortly. Please hold on. Seeing locks introduced in the Rx/Tx path is an alert. We clearly need a design document to explain where locks can be used and what the responsibilities of the control plane are. 
If everybody agrees in this document that DPDK can have some locks in the fast path, then OK to merge it. So I would say NACK for 16.07 and maybe postpone to 16.11.
[dpdk-dev] Performance hit - NICs on different CPU sockets
On 6/16/16, 11:56 AM, "dev on behalf of Wiles, Keith" wrote: > >On 6/16/16, 11:20 AM, "Take Ceara" wrote: > >>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith >>wrote: >> >>> >>> Right now I do not know what the issue is with the system. Could be too >>> many Rx/Tx ring pairs per port and limiting the memory in the NICs, which >>> is why you get better performance when you have 8 core per port. I am not >>> really seeing the whole picture and how DPDK is configured to help more. >>> Sorry. >> >>I doubt that there is a limitation wrt running 16 cores per port vs 8 >>cores per port as I've tried with two different machines connected >>back to back each with one X710 port and 16 cores on each of them >>running on that port. In that case our performance doubled as >>expected. >> >>> >>> Maybe seeing the DPDK command line would help. >> >>The command line I use with ports 01:00.3 and 81:00.3 is: >>./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- >>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00 >> >>Our own qmap args allow the user to control exactly how cores are >>split between ports. 
In this case we end up with: >> >>warp17> show port map >>Port 0[socket: 0]: >> Core 4[socket:0] (Tx: 0, Rx: 0) >> Core 5[socket:0] (Tx: 1, Rx: 1) >> Core 6[socket:0] (Tx: 2, Rx: 2) >> Core 7[socket:0] (Tx: 3, Rx: 3) >> Core 8[socket:0] (Tx: 4, Rx: 4) >> Core 9[socket:0] (Tx: 5, Rx: 5) >> Core 20[socket:0] (Tx: 6, Rx: 6) >> Core 21[socket:0] (Tx: 7, Rx: 7) >> Core 22[socket:0] (Tx: 8, Rx: 8) >> Core 23[socket:0] (Tx: 9, Rx: 9) >> Core 24[socket:0] (Tx: 10, Rx: 10) >> Core 25[socket:0] (Tx: 11, Rx: 11) >> Core 26[socket:0] (Tx: 12, Rx: 12) >> Core 27[socket:0] (Tx: 13, Rx: 13) >> Core 28[socket:0] (Tx: 14, Rx: 14) >> Core 29[socket:0] (Tx: 15, Rx: 15) >> >>Port 1[socket: 1]: >> Core 10[socket:1] (Tx: 0, Rx: 0) >> Core 11[socket:1] (Tx: 1, Rx: 1) >> Core 12[socket:1] (Tx: 2, Rx: 2) >> Core 13[socket:1] (Tx: 3, Rx: 3) >> Core 14[socket:1] (Tx: 4, Rx: 4) >> Core 15[socket:1] (Tx: 5, Rx: 5) >> Core 16[socket:1] (Tx: 6, Rx: 6) >> Core 17[socket:1] (Tx: 7, Rx: 7) >> Core 18[socket:1] (Tx: 8, Rx: 8) >> Core 19[socket:1] (Tx: 9, Rx: 9) >> Core 30[socket:1] (Tx: 10, Rx: 10) >> Core 31[socket:1] (Tx: 11, Rx: 11) >> Core 32[socket:1] (Tx: 12, Rx: 12) >> Core 33[socket:1] (Tx: 13, Rx: 13) >> Core 34[socket:1] (Tx: 14, Rx: 14) >> Core 35[socket:1] (Tx: 15, Rx: 15) > >On each socket you have 10 physical cores or 20 lcores per socket for 40 >lcores total. > >The above is listing the LCORES (or hyper-threads) and not COREs, which I >understand some like to think they are interchangeable. The problem is the >hyper-threads are logically interchangeable, but not performance wise. If you >have two run-to-completion threads on a single physical core each on a >different hyper-thread of that core [0,1], then the second lcore or thread (1) >on that physical core will only get at most about 30-20% of the CPU cycles. 
>Normally it is much less, unless you tune the code to make sure each thread is >not trying to share the internal execution units, but some internal execution >units are always shared. > >To get the best performance when hyper-threading is enabled, do not run both >threads on a single physical core; run only hyper-thread 0. > >The table below lists the physical core id and each of the lcore >ids per socket. Use the first lcore per socket for the best performance: >Core 1 [1, 21][11, 31] >Use lcore 1 or 11 depending on the socket you are on. > >The info below is most likely the best performance and utilization of your >system. If I got the values right? > >./warp17 -c 0x0FFFe0 -m 32768 -w :81:00.3 -w :01:00.3 -- >--qmap 0.0x0003FE --qmap 1.0x0FFE00 > >Port 0[socket: 0]: > Core 2[socket:0] (Tx: 0, Rx: 0) > Core 3[socket:0] (Tx: 1, Rx: 1) > Core 4[socket:0] (Tx: 2, Rx: 2) > Core 5[socket:0] (Tx: 3, Rx: 3) > Core 6[socket:0] (Tx: 4, Rx: 4) > Core 7[socket:0] (Tx: 5, Rx: 5) > Core 8[socket:0] (Tx: 6, Rx: 6) > Core 9[socket:0] (Tx: 7, Rx: 7) > >8 cores on first socket leaving 0-1 lcores for Linux. (Correction: 9 cores, leaving the first core or two lcores for Linux.) > >Port 1[socket: 1]: > Core 10[socket:1] (Tx: 0, Rx: 0) > Core 11[socket:1] (Tx: 1, Rx: 1) > Core 12[socket:1] (Tx: 2, Rx: 2) > Core 13[socket:1] (Tx: 3, Rx: 3) > Core 14[socket:1] (Tx: 4, Rx: 4) > Core 15[socket:1] (Tx: 5, Rx: 5) > Core 16[socket:1] (Tx: 6, Rx: 6) > Core 17[socket:1] (Tx: 7, Rx: 7) > Core 18[socket:1] (Tx: 8, Rx: 8) > Core 19[socket:1] (Tx: 9, Rx: 9) > >All 10 cores on the second socket. > >++Keith > >> >>Just for reference, the cpu_layout script shows: >>$ $RTE_SDK/tools/cpu_layout.py >> >>Core and Socket Information (as reported by '/proc/cpuinfo') >> >> >>cores = [0, 1,
[dpdk-dev] Performance hit - NICs on different CPU sockets
On 6/16/16, 11:20 AM, "Take Ceara" wrote: >On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith wrote: > >> >> Right now I do not know what the issue is with the system. Could be too many >> Rx/Tx ring pairs per port and limiting the memory in the NICs, which is why >> you get better performance when you have 8 core per port. I am not really >> seeing the whole picture and how DPDK is configured to help more. Sorry. > >I doubt that there is a limitation wrt running 16 cores per port vs 8 >cores per port as I've tried with two different machines connected >back to back each with one X710 port and 16 cores on each of them >running on that port. In that case our performance doubled as >expected. > >> >> Maybe seeing the DPDK command line would help. > >The command line I use with ports 01:00.3 and 81:00.3 is: >./warp17 -c 0xF3 -m 32768 -w :81:00.3 -w :01:00.3 -- >--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00 > >Our own qmap args allow the user to control exactly how cores are >split between ports. In this case we end up with: > >warp17> show port map >Port 0[socket: 0]: > Core 4[socket:0] (Tx: 0, Rx: 0) > Core 5[socket:0] (Tx: 1, Rx: 1) > Core 6[socket:0] (Tx: 2, Rx: 2) > Core 7[socket:0] (Tx: 3, Rx: 3) > Core 8[socket:0] (Tx: 4, Rx: 4) > Core 9[socket:0] (Tx: 5, Rx: 5) > Core 20[socket:0] (Tx: 6, Rx: 6) > Core 21[socket:0] (Tx: 7, Rx: 7) > Core 22[socket:0] (Tx: 8, Rx: 8) > Core 23[socket:0] (Tx: 9, Rx: 9) > Core 24[socket:0] (Tx: 10, Rx: 10) > Core 25[socket:0] (Tx: 11, Rx: 11) > Core 26[socket:0] (Tx: 12, Rx: 12) > Core 27[socket:0] (Tx: 13, Rx: 13) > Core 28[socket:0] (Tx: 14, Rx: 14) > Core 29[socket:0] (Tx: 15, Rx: 15) > >Port 1[socket: 1]: > Core 10[socket:1] (Tx: 0, Rx: 0) > Core 11[socket:1] (Tx: 1, Rx: 1) > Core 12[socket:1] (Tx: 2, Rx: 2) > Core 13[socket:1] (Tx: 3, Rx: 3) > Core 14[socket:1] (Tx: 4, Rx: 4) > Core 15[socket:1] (Tx: 5, Rx: 5) > Core 16[socket:1] (Tx: 6, Rx: 6) > Core 17[socket:1] (Tx: 7, Rx: 7) > Core 18[socket:1] (Tx: 8, Rx: 8) > Core 19[socket:1] 
(Tx: 9, Rx: 9) > Core 30[socket:1] (Tx: 10, Rx: 10) > Core 31[socket:1] (Tx: 11, Rx: 11) > Core 32[socket:1] (Tx: 12, Rx: 12) > Core 33[socket:1] (Tx: 13, Rx: 13) > Core 34[socket:1] (Tx: 14, Rx: 14) > Core 35[socket:1] (Tx: 15, Rx: 15) On each socket you have 10 physical cores or 20 lcores per socket for 40 lcores total. The above is listing the LCORES (or hyper-threads) and not COREs, which I understand some like to think are interchangeable. The problem is the hyper-threads are logically interchangeable, but not performance wise. If you have two run-to-completion threads on a single physical core each on a different hyper-thread of that core [0,1], then the second lcore or thread (1) on that physical core will only get at most about 30-20% of the CPU cycles. Normally it is much less, unless you tune the code to make sure each thread is not trying to share the internal execution units, but some internal execution units are always shared. To get the best performance when hyper-threading is enabled, do not run both threads on a single physical core; run only hyper-thread 0. The table below lists the physical core id and each of the lcore ids per socket. Use the first lcore per socket for the best performance: Core 1 [1, 21][11, 31] Use lcore 1 or 11 depending on the socket you are on. The info below is most likely the best performance and utilization of your system. If I got the values right? ./warp17 -c 0x0FFFe0 -m 32768 -w :81:00.3 -w :01:00.3 -- --qmap 0.0x0003FE --qmap 1.0x0FFE00 Port 0[socket: 0]: Core 2[socket:0] (Tx: 0, Rx: 0) Core 3[socket:0] (Tx: 1, Rx: 1) Core 4[socket:0] (Tx: 2, Rx: 2) Core 5[socket:0] (Tx: 3, Rx: 3) Core 6[socket:0] (Tx: 4, Rx: 4) Core 7[socket:0] (Tx: 5, Rx: 5) Core 8[socket:0] (Tx: 6, Rx: 6) Core 9[socket:0] (Tx: 7, Rx: 7) 8 cores on first socket leaving 0-1 lcores for Linux. 
Port 1[socket: 1]: Core 10[socket:1] (Tx: 0, Rx: 0) Core 11[socket:1] (Tx: 1, Rx: 1) Core 12[socket:1] (Tx: 2, Rx: 2) Core 13[socket:1] (Tx: 3, Rx: 3) Core 14[socket:1] (Tx: 4, Rx: 4) Core 15[socket:1] (Tx: 5, Rx: 5) Core 16[socket:1] (Tx: 6, Rx: 6) Core 17[socket:1] (Tx: 7, Rx: 7) Core 18[socket:1] (Tx: 8, Rx: 8) Core 19[socket:1] (Tx: 9, Rx: 9) All 10 cores on the second socket. ++Keith > >Just for reference, the cpu_layout script shows: >$ $RTE_SDK/tools/cpu_layout.py > >Core and Socket Information (as reported by '/proc/cpuinfo') > > >cores = [0, 1, 2, 3, 4, 8, 9, 10, 11, 12] >sockets = [0, 1] > >Socket 0Socket 1 > >Core 0 [0, 20] [10, 30] >Core 1 [1, 21] [11, 31] >Core 2 [2, 22] [12, 32] >Core 3 [3, 23] [13, 33] >Core 4
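The --qmap arguments discussed in this thread are plain lcore bitmasks, so a proposed map can be sanity-checked with a few lines of C. This helper is illustrative and not part of WARP17; on the box above, with 20 physical cores, lcore N and lcore N+20 are hyper-thread siblings of the same physical core, so a mask following Keith's advice should avoid selecting both siblings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand a qmap-style lcore bitmask: count the selected lcores and
 * report the first and last lcore ids, so a mask can be checked against
 * the cpu_layout.py output by hand. */
static int
qmap_count_lcores(uint64_t mask, int *first, int *last)
{
	int n = 0;

	for (int lcore = 0; lcore < 64; lcore++) {
		if (mask & (1ULL << lcore)) {
			if (n == 0 && first != NULL)
				*first = lcore;
			if (last != NULL)
				*last = lcore;
			n++;
		}
	}
	return n;
}
```

For example, Keith's suggested port-0 mask 0x0003FE selects the nine lcores 1 through 9, all hyper-thread 0 of distinct physical cores on socket 0.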
[dpdk-dev] [PATCH v5] eal: out-of-bounds write
Overrunning array mcfg->memseg of 256 44-byte elements at element index 257 using index j. Fixed by adding a bounds check with an error message.

Fixes: af75078fece3 ("first public release")
Coverity ID 13282

Signed-off-by: Slawomir Mrozowicz
---
v5: - update message
v4: - remove check condition from loop
v3: - add check condition inside and outside the loop
v2: - add message information
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5b9132c..ffe069c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1301,6 +1301,14 @@ rte_eal_hugepage_init(void)
 		break;
 	}

+	if (j >= RTE_MAX_MEMSEG) {
+		RTE_LOG(ERR, EAL,
+			"All memory segments exhausted by IVSHMEM. "
+			"Try recompiling with larger RTE_MAX_MEMSEG "
+			"than current %d\n", RTE_MAX_MEMSEG);
+		return -ENOMEM;
+	}
+
 	for (i = 0; i < nr_hugefiles; i++) {
 		new_memseg = 0;
--
1.9.1
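Stripped of the EAL specifics, the pattern of the fix is: after a search loop that can run off the end of an array, validate the index before it is ever used as a subscript. A minimal stand-alone version (array size and error code chosen for illustration only):

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_MEMSEG 4	/* stand-in for RTE_MAX_MEMSEG */

/* Find the first free slot in a fixed-size table. Without the bounds
 * check, a full table would let the caller write one element past the
 * end -- exactly the out-of-bounds write Coverity flagged. */
static int
find_free_slot(const int *used)
{
	int j = 0;

	while (j < DEMO_MAX_MEMSEG && used[j])
		j++;
	if (j >= DEMO_MAX_MEMSEG)
		return -ENOMEM;	/* table exhausted: fail, don't overrun */
	return j;
}
```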
[dpdk-dev] [PATCH v2] xenvirt: fix compilation after mempool changes
On Mon, Jun 13, 2016 at 01:54:29PM +0200, Christian Ehrhardt wrote: > Yeah, working now - thanks for the fast update! > > Kind Regards, > Christian > > Christian Ehrhardt > Software Engineer, Ubuntu Server > Canonical Ltd > Applied to dpdk-next-net/rel_16_07 /Bruce
[dpdk-dev] [PATCH] app/testpmd: unchecked return value
> > Calling rte_eth_dev_rss_hash_update without checking return value. > > Fixed by handle return value and print out error status. > > > > Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings") > > Coverity ID 119251 > > > > Signed-off-by: Slawomir Mrozowicz > > Acked-by: Pablo de Lara Applied, thanks
[dpdk-dev] [PATCH v5 00/25] DPDK PMD for ThunderX NIC device
On Thu, Jun 16, 2016 at 11:58:27AM +0100, Bruce Richardson wrote: > On Thu, Jun 16, 2016 at 03:01:02PM +0530, Jerin Jacob wrote: > > On Wed, Jun 15, 2016 at 03:39:25PM +0100, Bruce Richardson wrote: > > > On Wed, Jun 15, 2016 at 12:36:15AM +0530, Jerin Jacob wrote: > > > > This patch set provides the initial version of DPDK PMD for the > > > > built-in NIC device in Cavium ThunderX SoC family. > > > > > > > > Implemented features and ThunderX nicvf PMD documentation added > > > > in doc/guides/nics/overview.rst and doc/guides/nics/thunderx.rst > > > > respectively in this patch set. > > > > > > > > These patches are checked using checkpatch.sh with following > > > > additional ignore option: > > > > options="$options --ignore=CAMELCASE,BRACKET_SPACE" > > > > CAMELCASE - To accommodate PRIx64 > > > > BRACKET_SPACE - To accommodate AT inline line assembly in two places > > > > > > > > This patch set is based on DPDK 16.07-RC1 > > > > and tested with git HEAD change-set > > > > ca173a909538a2f1082cd0dcb4d778a97dab69c3 along with > > > > following depended patch > > > > > > > > http://dpdk.org/dev/patchwork/patch/11826/ > > > > ethdev: add tunnel and port RSS offload types > > > > > > > Hi Jerin, > > > > > > hopefully a final set of comments before merge on this set, as it's > > > looking > > > very good now. > > > > > > * Two patches look like they need to be split, as they are combining > > > multiple > > > functions into one patch. They are: > > > [dpdk-dev,v5,16/25] net/thunderx: add MTU set and promiscuous enable > > > support > > > [dpdk-dev,v5,20/25] net/thunderx: implement supported ptype get and > > > Rx queue count > > > For the other patches which add multiple functions, the functions seem > > > to be > > > logically related so I don't think there is a problem > > > > > > * check-git-logs.sh is warning about a few of the commit messages being > > > too long. > > > Splitting patch 20 should fix one of those, but there are a few > > > remaining. 
> > > A number of titles refer to ThunderX in the message, but this is > > > probably > > > unnecessary, as the prefix already contains "net/thunderx" in it. > > > > OK. I will send the next revision. > > > > Please hold off a few hours, as I'm hoping to merge in the bnxt driver this > afternoon. If all goes well, I would appreciate it if you could base your > patchset > off the rel_16_07 tree with that set applied - save me having to resolve > conflicts > in files like the nic overview doc, which is always a pain to try and edit. > :-) OK. I will re-base the changes once you have done with bnxt merge. Let me know once its done. > > Regards, > /Bruce
[dpdk-dev] [PATCH 0/2] vhost: Fix leaks on migration.
Thanks for fixing them! Would you please resend them, with a rebase based on master branch of following tree: http://dpdk.org/browse/next/dpdk-next-virtio/ --yliu On Thu, Jun 16, 2016 at 11:32:03AM +0300, Ilya Maximets wrote: > Ilya Maximets (2): > vhost: fix leak of file descriptors. > vhost: unmap log memory on cleanup. > > lib/librte_vhost/rte_virtio_net.h | 3 ++- > lib/librte_vhost/vhost_user/virtio-net-user.c | 16 ++-- > 2 files changed, 16 insertions(+), 3 deletions(-) > > -- > 2.7.4
[dpdk-dev] [PATCH v3 3/4] bonding: take queue spinlock in rx/tx burst functions
Hi Thomas, > 2016-06-16 15:32, Bruce Richardson: > > On Mon, Jun 13, 2016 at 01:28:08PM +0100, Iremonger, Bernard wrote: > > > > Why does this particular PMD need spinlocks when doing RX and TX, > > > > while other device types do not? How is adding/removing devices > > > > from a bonded device different to other control operations that > > > > can be done on physical PMDs? Is this not similar to say bringing > > > > down or hotplugging out a physical port just before an RX or TX > operation takes place? > > > > For all other PMDs we rely on the app to synchronise control and > > > > data plane operation - why not here? > > > > > > > > /Bruce > > > > > > This issue arose during VM live migration testing. > > > For VM live migration it is necessary (while traffic is running) to be > > > able to > remove a bonded slave device, stop it, close it and detach it. > > > If a slave device is removed from a bonded device while traffic is running > a segmentation fault may occur in the rx/tx burst function. The spinlock has > been added to prevent this occurring. > > > > > > The bonding device already uses a spinlock to synchronise between the > add and remove functionality and the slave_link_status_change_monitor > code. > > > > > > Previously testpmd did not allow stop, close or detach of a PMD while > > > traffic was running. Testpmd has been modified with the following > > > patchset > > > > > > http://dpdk.org/dev/patchwork/patch/13472/ > > > > > > It now allows stop, close and detach of a PMD provided it is not > forwarding and is not a slave of a bonded PMD. > > > > > I will admit to not being fully convinced, but if nobody else has any > > serious objections, and since this patch has been reviewed and acked, > > I'm ok to merge it in. I'll do so shortly. > > Please hold on. > Seeing locks introduced in the Rx/Tx path is an alert. > We clearly need a design document to explain where locks can be used and > what the responsibilities of the control plane are. 
> If everybody agrees in this document that DPDK can have some locks in the > fast path, then OK to merge it. > > So I would say NACK for 16.07 and maybe postpone to 16.11. Looking at the documentation for the bonding PMD. http://dpdk.org/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.html In section 10.2 it states the following: Bonded devices support the dynamical addition and removal of slave devices using the rte_eth_bond_slave_add / rte_eth_bond_slave_remove APIs. If a slave device is added or removed while traffic is running, there is the possibility of a segmentation fault in the rx/tx burst functions. This is most likely to occur in the round robin bonding mode. This patch set fixes what appears to be a bug in the bonding PMD. Performance measurements have been made with this patch set applied and without the patches applied using 64 byte packets. With the patches applied the following drop in performance was observed: % drop for fwd+io: 0.16% % drop for fwd+mac: 0.39% This patch set has been reviewed and ack'ed, so I think it should be applied in 16.07 Regards, Bernard.
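For context, the guarded-burst pattern being debated above can be sketched in a few lines: the burst path takes a per-queue lock so that a concurrent slave add/remove (which would take the same lock) cannot pull a queue out from under it. This is an illustrative stand-in, not the bonding PMD's actual code; DPDK's rte_spinlock_t is replaced by pthread_mutex_t purely to keep the sketch self-contained:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Toy bonded queue: the control plane may change nb_slaves at any time,
 * so the data path checks it under the same lock. */
struct bond_queue {
	pthread_mutex_t lock;
	int nb_slaves;		/* changed by slave add/remove */
	uint64_t pkts;		/* packets "received" so far */
};

static uint16_t
guarded_rx_burst(struct bond_queue *q, uint16_t nb_pkts)
{
	uint16_t rx = 0;

	pthread_mutex_lock(&q->lock);
	if (q->nb_slaves > 0) {	/* slaves may have been removed */
		q->pkts += nb_pkts;
		rx = nb_pkts;
	}
	pthread_mutex_unlock(&q->lock);
	return rx;
}
```

The measured cost quoted in the thread (a fraction of a percent for io and mac forwarding) is the price of the lock/unlock pair on this fast path, which is exactly the trade-off Thomas wants documented.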
[dpdk-dev] [PATCH v3 0/5] vhost/virtio performance loopback utility
> > Zhihong Wang (5): > > testpmd: add retry option > > testpmd: configurable tx_first burst number > > testpmd: show throughput in port stats > > testpmd: handle all rxqs in rss setup > > testpmd: show topology at forwarding start > > Series-acked-by: Pablo de Lara Applied, thanks
[dpdk-dev] [PATCH v3 5/5] testpmd: show topology at forwarding start
2016-06-16 11:09, De Lara Guarch, Pablo: > > --- a/app/test-pmd/testpmd.c > > +++ b/app/test-pmd/testpmd.c > > @@ -1016,6 +1016,7 @@ start_packet_forwarding(int with_tx_first) > > flush_fwd_rx_queues(); > > > > fwd_config_setup(); > > + fwd_config_display(); > > rxtx_config_display(); > > > > for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++) { > > -- > > 2.5.0 > > Already acked this, but note that fwd_config_display() has been renamed to > pkt_fwd_config_display(). > Thomas, can you make that change when merging this? Yes done :)
[dpdk-dev] [PATCH v5 00/25] DPDK PMD for ThunderX NIC device
On Thu, Jun 16, 2016 at 04:47:39PM +0530, Jerin Jacob wrote: > On Thu, Jun 16, 2016 at 11:58:27AM +0100, Bruce Richardson wrote: > > On Thu, Jun 16, 2016 at 03:01:02PM +0530, Jerin Jacob wrote: > > > On Wed, Jun 15, 2016 at 03:39:25PM +0100, Bruce Richardson wrote: > > > > On Wed, Jun 15, 2016 at 12:36:15AM +0530, Jerin Jacob wrote: > > > > > This patch set provides the initial version of DPDK PMD for the > > > > > built-in NIC device in Cavium ThunderX SoC family. > > > > > > > > > > Implemented features and ThunderX nicvf PMD documentation added > > > > > in doc/guides/nics/overview.rst and doc/guides/nics/thunderx.rst > > > > > respectively in this patch set. > > > > > > > > > > These patches are checked using checkpatch.sh with following > > > > > additional ignore option: > > > > > options="$options --ignore=CAMELCASE,BRACKET_SPACE" > > > > > CAMELCASE - To accommodate PRIx64 > > > > > BRACKET_SPACE - To accommodate AT inline line assembly in two places > > > > > > > > > > This patch set is based on DPDK 16.07-RC1 > > > > > and tested with git HEAD change-set > > > > > ca173a909538a2f1082cd0dcb4d778a97dab69c3 along with > > > > > following depended patch > > > > > > > > > > http://dpdk.org/dev/patchwork/patch/11826/ > > > > > ethdev: add tunnel and port RSS offload types > > > > > > > > > Hi Jerin, > > > > > > > > hopefully a final set of comments before merge on this set, as it's > > > > looking > > > > very good now. > > > > > > > > * Two patches look like they need to be split, as they are combining > > > > multiple > > > > functions into one patch. 
They are: > > > > [dpdk-dev,v5,16/25] net/thunderx: add MTU set and promiscuous > > > > enable support > > > > [dpdk-dev,v5,20/25] net/thunderx: implement supported ptype get and > > > > Rx queue count > > > > For the other patches which add multiple functions, the functions > > > > seem to be > > > > logically related so I don't think there is a problem > > > > > > > > * check-git-logs.sh is warning about a few of the commit messages being > > > > too long. > > > > Splitting patch 20 should fix one of those, but there are a few > > > > remaining. > > > > A number of titles refer to ThunderX in the message, but this is > > > > probably > > > > unnecessary, as the prefix already contains "net/thunderx" in it. > > > > > > OK. I will send the next revision. > > > > > > > Please hold off a few hours, as I'm hoping to merge in the bnxt driver this > > afternoon. If all goes well, I would appreciate it if you could base your > > patchset > > off the rel_16_07 tree with that set applied - save me having to resolve > > conflicts > > in files like the nic overview doc, which is always a pain to try and edit. > > :-) > > OK. I will re-base the changes once you have done with bnxt merge. > Let me know once its done. > Done now. Feel free to submit a new version based on rel_16_07 branch. Thanks, /Bruce
[dpdk-dev] [PATCHv7 5/6] pmdinfo.py: Add tool to query binaries for hw and other support information
On 06/09/2016 08:47 PM, Neil Horman wrote: > This tool searches for the primer string PMD_DRIVER_INFO= in any ELF binary, > and, if found, parses the remainder of the string as a JSON-encoded string, > outputting the results in either a human-readable or raw, script-parseable > format > > Note that, in the case of dynamically linked applications, pmdinfo.py will > scan for implicitly linked PMDs by searching the specified binaries' > .dynamic section for DT_NEEDED entries that contain the substring > librte_pmd. The DT_RUNPATH, LD_LIBRARY_PATH, /usr/lib and /lib are > searched for these libraries, in that order > > If a file is specified with no path, it is assumed to be a PMD DSO, and the > LD_LIBRARY_PATH, /usr/lib[64]/ and /lib[64] are searched for it > > Currently the tool can output data in 3 formats: > > a) raw, suitable for scripting, where the raw JSON strings are dumped out > b) table format (default), where hex PCI ids are dumped in a table format > c) pretty, where a user-supplied pci.ids file is used to print out vendor > and device strings > > Signed-off-by: Neil Horman > CC: Bruce Richardson > CC: Thomas Monjalon > CC: Stephen Hemminger > CC: Panu Matilainen > --- > mk/rte.sdkinstall.mk | 2 + > tools/pmdinfo.py | 629 > +++ > 2 files changed, 631 insertions(+) > create mode 100755 tools/pmdinfo.py > > diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk > index 68e56b6..dc36df5 100644 > --- a/mk/rte.sdkinstall.mk > +++ b/mk/rte.sdkinstall.mk > @@ -126,6 +126,8 @@ install-runtime: > $(Q)$(call rte_mkdir, $(DESTDIR)$(sbindir)) > $(Q)$(call rte_symlink,$(DESTDIR)$(datadir)/tools/dpdk_nic_bind.py, > \ > $(DESTDIR)$(sbindir)/dpdk_nic_bind) > + $(Q)$(call rte_symlink,$(DESTDIR)$(datadir)/tools/pmdinfo.py, \ > +$(DESTDIR)$(bindir)/dpdk-pmdinfo) The symlink should be with an underscore instead of a dash for consistency with all the other tools, i.e. dpdk_pmdinfo.
Neil, I already gave you an ack on the series as per the functionality, feel free to include that in any future versions of the patch series. Minor nits like these are ... well, minor nits from my POV at least. - Panu -
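The scanning approach described above — locate a marker string inside an arbitrary binary and read the payload that follows — can be sketched in a few lines of C. This is an illustrative stand-in, not the pmdinfo.py implementation: `find_pmd_info` is an invented helper, and the assumption that the payload is NUL-terminated mirrors how such strings sit in an ELF .rodata section.

```c
#include <stddef.h>
#include <string.h>

/*
 * Search a raw byte buffer for "PMD_DRIVER_INFO=" and return a pointer to
 * the payload that follows the marker, or NULL if the marker is absent.
 * In a real ELF the payload would be a NUL-terminated JSON blob emitted at
 * build time; JSON parsing is left to the caller.
 */
static const char *find_pmd_info(const char *buf, size_t len)
{
	static const char marker[] = "PMD_DRIVER_INFO=";
	const size_t mlen = sizeof(marker) - 1;
	size_t i;

	if (len < mlen)
		return NULL;
	for (i = 0; i + mlen <= len; i++) {
		if (memcmp(buf + i, marker, mlen) == 0)
			return buf + i + mlen;
	}
	return NULL;
}
```

A real tool would mmap or read the whole binary and run this search over every loaded section, then feed each hit to a JSON parser.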
[dpdk-dev] [PATCH v3 3/4] bonding: take queue spinlock in rx/tx burst functions
On Mon, Jun 13, 2016 at 01:28:08PM +0100, Iremonger, Bernard wrote: > Hi Bruce, > > > > > Subject: Re: [dpdk-dev] [PATCH v3 3/4] bonding: take queue spinlock in rx/tx > > burst functions > > > > On Sun, Jun 12, 2016 at 06:11:28PM +0100, Bernard Iremonger wrote: > > > Use rte_spinlock_trylock() in the rx/tx burst functions to take the > > > queue spinlock. > > > > > > Signed-off-by: Bernard Iremonger > > > Acked-by: Konstantin Ananyev > > > --- > > > > Why does this particular PMD need spinlocks when doing RX and TX, while > > other device types do not? How is adding/removing devices from a bonded > > device different to other control operations that can be done on physical > > PMDs? Is this not similar to say bringing down or hotplugging out a physical > > port just before an RX or TX operation takes place? > > For all other PMDs we rely on the app to synchronise control and data plane > > operation - why not here? > > > > /Bruce > > This issue arose during VM live migration testing. > For VM live migration it is necessary (while traffic is running) to be able > to remove a bonded slave device, stop it, close it and detach it. > If a slave device is removed from a bonded device while traffic is running, a > segmentation fault may occur in the rx/tx burst function. The spinlock has > been added to prevent this occurring. > > The bonding device already uses a spinlock to synchronise between the add and > remove functionality and the slave_link_status_change_monitor code. > > Previously testpmd did not allow stop, close or detach of a PMD while traffic > was running. Testpmd has been modified with the following patchset > > http://dpdk.org/dev/patchwork/patch/13472/ > > It now allows stop, close and detach of a PMD provided it is not > forwarding and is not a slave of a bonded PMD. > I will admit to not being fully convinced, but if nobody else has any serious objections, and since this patch has been reviewed and acked, I'm ok to merge it in. I'll do so shortly. /Bruce
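The trylock pattern under discussion — the data path attempts the lock and simply skips a poll when the control plane holds it during slave removal — can be modelled generically with C11 atomics. This is a hypothetical sketch of the idea, not the bonding PMD code: `fake_slave_rx` stands in for polling one slave queue, and all names are invented.

```c
#include <stdatomic.h>

struct bond_model {
	atomic_flag lock;   /* held by the control path during slave removal */
	int n_slaves;       /* slaves currently attached */
};

static struct bond_model g_bond = { ATOMIC_FLAG_INIT, 2 };

/* Stand-in for reading packets from one slave queue: 1 "packet" per poll. */
static int fake_slave_rx(int slave) { (void)slave; return 1; }

/* Data path: if the control path holds the lock, return 0 packets rather
 * than walking a slave array that may be mid-update. */
static int bond_rx_burst(struct bond_model *b)
{
	int i, nb = 0;

	if (atomic_flag_test_and_set(&b->lock))
		return 0;               /* trylock failed: skip this poll */
	for (i = 0; i < b->n_slaves; i++)
		nb += fake_slave_rx(i);
	atomic_flag_clear(&b->lock);
	return nb;
}

/* Control path: hold the same lock for the whole removal, so no burst
 * function can observe a half-removed slave. */
static int bond_remove_slave(struct bond_model *b)
{
	while (atomic_flag_test_and_set(&b->lock))
		;                       /* spin until the data path is out */
	if (b->n_slaves > 0)
		b->n_slaves--;
	atomic_flag_clear(&b->lock);
	return b->n_slaves;
}
```

Bruce's objection maps onto this sketch directly: the trylock in `bond_rx_burst` is a per-poll cost paid by every packet so that one rare control operation becomes safe, which is the trade-off the thread is weighing.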
[dpdk-dev] [PATCHv7 1/6] pmdinfogen: Add buildtools and pmdinfogen utility
On 06/09/2016 08:46 PM, Neil Horman wrote: > pmdinfogen is a tool used to parse object files and build json strings for > use in later determining hardware support in a DSO or application binary. > pmdinfo looks for the non-exported symbol names this_pmd_name and > this_pmd_tbl (where n is an integer counter). It records the name of > each of these tuples, using the latter to find the symbolic name of the > pci_table for physical devices that the object supports. With this > information, it outputs a C file with a single line of the form: > > static char *_driver_info[] __attribute__((used)) = " \ > PMD_DRIVER_INFO="; > > Where is the arbitrary name of the pmd, and is the > json encoded string that holds relevant pmd information, including the pmd > name, type and optional array of pci device/vendor ids that the driver > supports. > > This C file is suitable for compiling to object code, then relocatably > linking into the parent file from which the C was generated. This creates > an entry in the string table of the object that can inform a later tool > about hardware support. > > Signed-off-by: Neil Horman > CC: Bruce Richardson > CC: Thomas Monjalon > CC: Stephen Hemminger > CC: Panu Matilainen > --- Unlike earlier versions, pmdinfogen ends up installed in bindir during "make install". Is that intentional, or just a side-effect from using rte.hostapp.mk? If it's intentional it probably should be prefixed with dpdk_ like the other tools. - Panu -
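The single generated line described above (the angle-bracketed placeholders were eaten by the mail archive) can be modelled as a small formatter. The exact DPDK output format is paraphrased from the commit message, so treat this as a sketch; `emit_pmd_info` and the buffer are invented names.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

static char pmd_line[256];

/*
 * Produce the generated-C line the commit message describes, roughly:
 *   static char <name>_driver_info[] __attribute__((used)) =
 *       "PMD_DRIVER_INFO=<json>";
 * Returns the character count with snprintf semantics.
 */
static int emit_pmd_info(char *buf, size_t len,
			 const char *name, const char *json)
{
	return snprintf(buf, len,
		"static char %s_driver_info[] __attribute__((used)) = "
		"\"PMD_DRIVER_INFO=%s\";",
		name, json);
}
```

Compiling and linking such a line into the driver object is what plants the `PMD_DRIVER_INFO=` marker in the binary's string table for pmdinfo.py to find later.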
[dpdk-dev] [PATCH v6 00/38] new bnxt poll mode driver library
On Wed, Jun 15, 2016 at 02:23:00PM -0700, Stephen Hurd wrote: > The bnxt poll mode library (librte_pmd_bnxt) implements support for > Broadcom NetXtreme C-Series. These adapters support Standards- > compliant 10/25/50Gbps 30MPPS full-duplex throughput. > > Information about this family of adapters can be found in the > NetXtreme Brand section https://goo.gl/4H7q63 of the Broadcom web > site http://www.broadcom.com/ > > With the current driver, allocated mbufs must be large enough to hold > the entire received frame. If the mbufs are not large enough, the > packets will be dropped. This is most limiting when jumbo frames are > used. > Applied to dpdk-next-net/rel_16_07 On apply I got conflicts with the nic overview document, so please check the resulting information in that document is correct in the next-net tree. I also added a very short entry to the release notes for this new driver as part of patch 1, since that was missing. Please also check that for correctness and send on any additional comments/corrections you want on that. Thanks for all the work on this driver. Regards, /Bruce
[dpdk-dev] [PATCH v3] rte_hash: add scalable multi-writer insertion w/ Intel TSX
This patch introduced scalable multi-writer Cuckoo Hash insertion based on a split Cuckoo Search and Move operation using Intel TSX. It can do scalable hash insertion with 22 cores with little performance loss and negligible TSX abortion rate. * Added an extra rte_hash flag definition to switch default single writer Cuckoo Hash behavior to multiwriter. - If HTM is available, it would use hardware feature for concurrency. - If HTM is not available, it would fall back to spinlock. * Created a rte_cuckoo_hash_x86.h file to hold all x86-arch related cuckoo_hash functions. And rte_cuckoo_hash.c uses compile time flag to select x86 file or other platform-specific implementations. While HTM check is still done at runtime (same idea with RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT) * Moved rte_hash private struct definitions to rte_cuckoo_hash.h, to allow rte_cuckoo_hash_x86.h or future platform dependent functions to include. * Following new functions are created for consistent names when new platform TM support are added. - rte_hash_cuckoo_move_insert_mw_tm: do insertion with bucket movement. - rte_hash_cuckoo_insert_mw_tm: do insertion without bucket movement. * One extra multi-writer test case is added. 
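The split the patch describes — a read-only "search" for an insertion path, followed by a short "move" critical section that can be wrapped in a TSX transaction or a spinlock — is the classic cuckoo-hash trick. Below is a deliberately tiny, single-level sketch (two candidate buckets, at most one eviction), with no TSX and no locking; all names are invented and this is not the rte_hash code.

```c
#include <stdint.h>

#define CH_NBUCKETS 8u   /* power of two */
#define CH_SLOTS    4u
#define CH_EMPTY    0u   /* key 0 is reserved to mean "empty slot" */

static uint32_t ch_bkt[CH_NBUCKETS][CH_SLOTS];

static uint32_t ch_h1(uint32_t k) { return (k * 2654435761u) & (CH_NBUCKETS - 1); }
static uint32_t ch_h2(uint32_t k) { return (ch_h1(k) ^ (k >> 3) ^ 5u) & (CH_NBUCKETS - 1); }

/* Put k into a free slot of bucket b, if any; 1 on success, 0 if full. */
static int ch_put_free(uint32_t b, uint32_t k)
{
	uint32_t s;

	for (s = 0; s < CH_SLOTS; s++)
		if (ch_bkt[b][s] == CH_EMPTY) { ch_bkt[b][s] = k; return 1; }
	return 0;
}

/* Search phase picks the destination; the "move" (one eviction here,
 * a whole path in the real algorithm) is the only part that writes,
 * which is what makes it a natural transaction/lock region. */
static int ch_insert(uint32_t k)
{
	uint32_t b1 = ch_h1(k), b2 = ch_h2(k), v, alt;

	if (ch_put_free(b1, k) || ch_put_free(b2, k))
		return 0;
	v = ch_bkt[b1][0];                       /* displace one victim */
	alt = (ch_h1(v) == b1) ? ch_h2(v) : ch_h1(v);
	if (!ch_put_free(alt, v))
		return -1;                       /* too full for this sketch */
	ch_bkt[b1][0] = k;
	return 0;
}

static int ch_lookup(uint32_t k)
{
	uint32_t b1 = ch_h1(k), b2 = ch_h2(k), s;

	for (s = 0; s < CH_SLOTS; s++)
		if (ch_bkt[b1][s] == k || ch_bkt[b2][s] == k)
			return 1;
	return 0;
}
```

Because the search phase never writes, many writers can search concurrently; only the short move/write step needs HTM or the spinlock fallback, which is where the scalability claimed in the commit message comes from.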
Signed-off-by: Shen Wei Signed-off-by: Sameh Gobriel --- app/test/Makefile | 1 + app/test/test_hash_multiwriter.c | 287 + doc/guides/rel_notes/release_16_07.rst | 12 ++ lib/librte_hash/rte_cuckoo_hash.c | 258 ++--- lib/librte_hash/rte_cuckoo_hash.h | 219 + lib/librte_hash/rte_cuckoo_hash_x86.h | 193 ++ lib/librte_hash/rte_hash.h | 3 + 7 files changed, 796 insertions(+), 177 deletions(-) create mode 100644 app/test/test_hash_multiwriter.c create mode 100644 lib/librte_hash/rte_cuckoo_hash.h create mode 100644 lib/librte_hash/rte_cuckoo_hash_x86.h diff --git a/app/test/Makefile b/app/test/Makefile index 053f3a2..5476300 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -120,6 +120,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_thash.c SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_scaling.c +SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_multiwriter.c SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm_perf.c diff --git a/app/test/test_hash_multiwriter.c b/app/test/test_hash_multiwriter.c new file mode 100644 index 000..b0f31b0 --- /dev/null +++ b/app/test/test_hash_multiwriter.c @@ -0,0 +1,287 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. 
+ * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "test.h" + +/* + * Check condition and return an error if true. Assumes that "handle" is the + * name of the hash structure pointer to be freed. + */ +#define RETURN_IF_ERROR(cond, str, ...) do {\ + if (cond) { \ + printf("ERROR line %d: " str "\n", __LINE__,\ + ##__VA_ARGS__); \ + if (handle)
[dpdk-dev] [PATCH v3] rte_hash: add scalable multi-writer insertion w/ Intel TSX
Here's the latest version of the rte_hash multi-writer patch. It's re-based on top of the latest head as of Jun 16, 2016. http://dpdk.org/dev/patchwork/patch/13886/ http://dpdk.org/dev/patchwork/patch/12589/ v3 changes: * Made spinlock as fall back behavior when developer choose to use multi-writer behavior while HTM is not available. * Created a rte_cuckoo_hash_x86.h file to hold all x86-specific related cuckoo_hash functions. And rte_cuckoo_hash.c uses compile time flag to select x86 file or other platform-specific implementations. While HTM check is still done at runtime (same with RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT) * Moved rte_hash private struct definitions to rte_cuckoo_hash.h, to allow rte_cuckoo_hash_x86.h or future platform dependent functions to include. * Following renaming for consistent names when new platform TM support are added. - rte_hash_cuckoo_insert_mw_tm for trying insertion without moving buckets around. - rte_hash_cuckoo_move_insert_mw_tm for trying insertion by moving buckets around. v2 changes: * Address issues pointed out by reviews on mailing list. * Removed the RTE_HASH_KEY_FLAG_MOVED flag used in v1, which would cause problem when key deletion happens. Wei Shen (1): rte_hash: add scalable multi-writer insertion w/ Intel TSX app/test/Makefile | 1 + app/test/test_hash_multiwriter.c | 287 + doc/guides/rel_notes/release_16_07.rst | 12 ++ lib/librte_hash/rte_cuckoo_hash.c | 258 ++--- lib/librte_hash/rte_cuckoo_hash.h | 219 + lib/librte_hash/rte_cuckoo_hash_x86.h | 193 ++ lib/librte_hash/rte_hash.h | 3 + 7 files changed, 796 insertions(+), 177 deletions(-) create mode 100644 app/test/test_hash_multiwriter.c create mode 100644 lib/librte_hash/rte_cuckoo_hash.h create mode 100644 lib/librte_hash/rte_cuckoo_hash_x86.h -- 2.5.5
[dpdk-dev] [PATCH v5 00/25] DPDK PMD for ThunderX NIC device
On Wed, Jun 15, 2016 at 03:39:25PM +0100, Bruce Richardson wrote: > On Wed, Jun 15, 2016 at 12:36:15AM +0530, Jerin Jacob wrote: > > This patch set provides the initial version of DPDK PMD for the > > built-in NIC device in Cavium ThunderX SoC family. > > > > Implemented features and ThunderX nicvf PMD documentation added > > in doc/guides/nics/overview.rst and doc/guides/nics/thunderx.rst > > respectively in this patch set. > > > > These patches are checked using checkpatch.sh with following > > additional ignore option: > > options="$options --ignore=CAMELCASE,BRACKET_SPACE" > > CAMELCASE - To accommodate PRIx64 > > BRACKET_SPACE - To accommodate AT inline line assembly in two places > > > > This patch set is based on DPDK 16.07-RC1 > > and tested with git HEAD change-set > > ca173a909538a2f1082cd0dcb4d778a97dab69c3 along with > > following depended patch > > > > http://dpdk.org/dev/patchwork/patch/11826/ > > ethdev: add tunnel and port RSS offload types > > > Hi Jerin, > > hopefully a final set of comments before merge on this set, as it's looking > very good now. > > * Two patches look like they need to be split, as they are combining multiple > functions into one patch. They are: > [dpdk-dev,v5,16/25] net/thunderx: add MTU set and promiscuous enable > support > [dpdk-dev,v5,20/25] net/thunderx: implement supported ptype get and Rx > queue count > For the other patches which add multiple functions, the functions seem to be > logically related so I don't think there is a problem > > * check-git-logs.sh is warning about a few of the commit messages being too > long. > Splitting patch 20 should fix one of those, but there are a few remaining. > A number of titles refer to ThunderX in the message, but this is probably > unnecessary, as the prefix already contains "net/thunderx" in it. OK. I will send the next revision. > > Regards, > /Bruce > > PS: Please also baseline patches on dpdk-next-net/rel_16_07 tree. 
They > currently > apply fine to that tree so there is no problem, but just in case later commits > break things, that is the tree that net patches should be based on.
[dpdk-dev] Performance hit - NICs on different CPU sockets
On 6/16/16, 9:36 AM, "Take Ceara" wrote: >Hi Keith, > >On Tue, Jun 14, 2016 at 3:47 PM, Wiles, Keith wrote: Normally the limitation is in the hardware, basically how the PCI bus is connected to the CPUs (or sockets). How the PCI buses are connected to the system depends on the motherboard design. I normally see the buses attached to socket 0, but you could have some of the buses attached to the other sockets or all on one socket via a PCI bridge device. No easy way around the problem if some of your PCI buses are split or all on a single socket. Need to look at your system docs or look at lspci; it has an option to dump the PCI bus as an ASCII tree, at least on Ubuntu. >>> >>>This is the motherboard we use on our system: >>> >>>http://www.supermicro.com/products/motherboard/Xeon/C600/X10DRX.cfm >>> >>>I need to swap some NICs around (as now we moved everything on socket >>>1) before I can share the lspci output. >> >> FYI: the option for lspci is 'lspci -tv', but maybe more options too. >> > >I retested with two 10G X710 ports connected back to back: >port 0: :01:00.3 - socket 0 >port 1: :81:00.3 - socket 1 Please provide the output from tools/cpu_layout.py. > >I ran the following scenarios: >- assign 16 threads from CPU 0 on socket 0 to port 0 and 16 threads >from CPU 1 to port 1 => setup rate of 1.6M sess/s >- assign only the 16 threads from CPU0 for both ports (so 8 threads on >socket 0 for port 0 and 8 threads on socket 0 for port 1) => setup >rate of 3M sess/s >- assign only the 16 threads from CPU1 for both ports (so 8 threads on >socket 1 for port 0 and 8 threads on socket 1 for port 1) => setup >rate of 3M sess/s > >I also tried a scenario with two machines connected back to back each >of which had a NIC on socket 1. I assigned 16 threads from socket 1 on >each machine to the port and performance scaled to 6M sess/s as >expected.
> >I double checked all our memory allocations and, at least in the >tested scenario, we never use memory that's not on the same socket as >the core. > >I pasted below the output of lspci -tv. I see that :01:00.3 and >:81:00.3 are connected to different PCI bridges but on each of >those bridges there are also "Intel Corporation Xeon E7 v3/Xeon E5 >v3/Core i7 DMA Channel " devices. > >It would be great if you could also take a look in case I >missed/misunderstood something. > >Thanks, >Dumitru >
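Beyond `lspci -tv`, on Linux the socket a NIC hangs off can be read directly from sysfs — every PCI device exposes a `numa_node` attribute — which is a quick way to sanity-check core-to-port mappings like the qmaps above. The helpers below are illustrative (`pci_numa_path` and `parse_numa_node` are invented names; the sysfs attribute itself is real kernel ABI).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char numa_path_buf[128];

/* Build "/sys/bus/pci/devices/<addr>/numa_node" for a full PCI address
 * such as "0000:81:00.3".  Returns snprintf's character count. */
static int pci_numa_path(char *buf, size_t len, const char *pci_addr)
{
	return snprintf(buf, len, "/sys/bus/pci/devices/%s/numa_node",
			pci_addr);
}

/* Parse the attribute's contents: a decimal node id, or -1 when the
 * platform does not report NUMA affinity for the device. */
static int parse_numa_node(const char *text)
{
	return (int)strtol(text, NULL, 10);
}
```

Reading that file for both ports and comparing against the socket of each polling lcore would have flagged the cross-socket qmap in this thread immediately.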
[dpdk-dev] [PATCH v4] eal: out-of-bounds write
On 06/15/2016 04:25 PM, Slawomir Mrozowicz wrote: > Overrunning array mcfg->memseg of 256 44-byte elements > at element index 257 using index j. > Fixed by adding a condition with an informative message. > > Fixes: af75078fece3 ("first public release") > Coverity ID 13282 > > Signed-off-by: Slawomir Mrozowicz > --- > lib/librte_eal/linuxapp/eal/eal_memory.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c > b/lib/librte_eal/linuxapp/eal/eal_memory.c > index 5b9132c..19753b1 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_memory.c > +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c > @@ -1301,6 +1301,15 @@ rte_eal_hugepage_init(void) > break; > } > > + if (j >= RTE_MAX_MEMSEG) { > + RTE_LOG(ERR, EAL, > + "Failed: all memsegs used by ivshmem.\n" > + "Current %d is not enough.\n" > + "Please either increase the RTE_MAX_MEMSEG\n", > + RTE_MAX_MEMSEG); > + return -ENOMEM; > + } The error message is either incomplete or not coherent: "please either increase..." or what? Also no need for that "Failed:" because it's already prefixed by "Error:". I'm not sure how helpful it is to have an error message suggest increasing a value that requires recompilation, but maybe something more along the lines of: ("All memory segments exhausted by IVSHMEM. Try recompiling with larger RTE_MAX_MEMSEG than current %d?", RTE_MAX_MEMSEG) - Panu -
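Stripped of DPDK specifics, the fix under review is the standard guard-before-write pattern for a fixed-size table: check the index, return a distinct error instead of corrupting adjacent memory. A generic sketch (invented names; `MAX_SEGS` stands in for RTE_MAX_MEMSEG):

```c
#include <errno.h>
#include <stddef.h>

#define MAX_SEGS 4   /* stand-in for the compile-time RTE_MAX_MEMSEG */

static long seg_tbl[MAX_SEGS];
static size_t seg_count;

/* Refuse to write past the end of the table; the caller gets -ENOMEM and
 * can surface a message telling the user the compile-time limit was hit. */
static int store_seg(long val)
{
	if (seg_count >= MAX_SEGS)
		return -ENOMEM;
	seg_tbl[seg_count++] = val;
	return 0;
}
```

The review comment is about the message, not the guard: since the limit is a compile-time constant, the most useful error text names the constant and its current value so the user knows a rebuild is required.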
[dpdk-dev] [PATCH v4] e1000: configure VLAN TPID
> -Original Message- > From: Xing, Beilei > Sent: Thursday, June 16, 2016 9:36 PM > To: Zhang, Helin > Cc: dev at dpdk.org; Xing, Beilei > Subject: [PATCH v4] e1000: configure VLAN TPID > > This patch enables configuring the outer TPID for double VLAN. > Note that all other TPID values are read only. > > Signed-off-by: Beilei Xing Acked-by: Helin Zhang
[dpdk-dev] [PATCH v4 1/1] eal: fix resource leak of mapped memory
On 15/06/2016 13:25, Marcin Kerlin wrote: > Patch fixes resource leak in rte_eal_hugepage_attach() where mapped files > were not freed back to the OS in case of failure. Patch uses the behavior > of Linux munmap: "It is not an error if the indicated range does not > contain any mapped pages". > > v4: > 1)removed keyword const from pointer and dependent on that casting (void *) > v3: > 1)removed redundant casting > 2)removed update error message > v2: > 1)unmapping also previous addresses The patch version history should be after the triple dash below so it won't show up on git log. > Coverity issue: 13295, 13296, 13303 > Fixes: af75078fece3 ("first public release") > > Signed-off-by: Marcin Kerlin > --- Insert here patch version history. > lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c > b/lib/librte_eal/linuxapp/eal/eal_memory.c > index 79d1d2d..c935765 100644 Thomas, are you ok to update the commit message? Otherwise, please Marcin do v5 with changes and keep my ack. Acked-by: Sergio Gonzalez Monroy
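The fix leans on the munmap behaviour quoted in the commit message: unwinding partially-completed mappings is safe because unmapping a range with nothing in it is not an error. A minimal self-contained sketch of that unwind-on-failure idiom (plain anonymous mmap, invented names — not the EAL code):

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static void *g_segs[4];

/* Map n anonymous one-page segments.  On any failure, munmap everything
 * mapped so far and report the error: the caller never sees a half-built
 * set of mappings (this is the leak the patch fixes). */
static int map_segments(void *addr[], size_t n)
{
	size_t i, pg = (size_t)sysconf(_SC_PAGESIZE);

	for (i = 0; i < n; i++) {
		addr[i] = mmap(NULL, pg, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (addr[i] == MAP_FAILED) {
			while (i-- > 0)
				munmap(addr[i], pg);
			return -1;
		}
	}
	return 0;
}

static int unmap_segments(void *addr[], size_t n)
{
	size_t i, pg = (size_t)sysconf(_SC_PAGESIZE);

	for (i = 0; i < n; i++)
		munmap(addr[i], pg);
	return 0;
}
```

Because of the documented munmap semantics, the unwind loop would stay correct even if it swept addresses that had never been mapped.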
[dpdk-dev] [PATCH v6 00/38] new bnxt poll mode driver library
On Thu, Jun 16, 2016 at 9:24 AM, Bruce Richardson < bruce.richardson at intel.com> wrote: > On Wed, Jun 15, 2016 at 02:23:00PM -0700, Stephen Hurd wrote: > > The bnxt poll mode library (librte_pmd_bnxt) implements support for > > Broadcom NetXtreme C-Series. These adapters support Standards- > > compliant 10/25/50Gbps 30MPPS full-duplex throughput. > > > > Information about this family of adapters can be found in the > > NetXtreme Brand section https://goo.gl/4H7q63 of the Broadcom web > > site http://www.broadcom.com/ > > > > With the current driver, allocated mbufs must be large enough to hold > > the entire received frame. If the mbufs are not large enough, the > > packets will be dropped. This is most limiting when jumbo frames are > > used. > > > > Applied to dpdk-next-net/rel_16_07 > > On apply I got conflicts with the nic overview document, so please check > the > resulting information in that document is correct in the next-net tree. > I also added a very short entry to the release notes for this new driver as > part of patch 1, since that was missing. Please also check that for > correctness > and send on any additional comments/corrections you want on that. > Thanks Bruce. I had a cursory glance and it looked good. We will update them further if necessary. > Thanks for all the work on this driver. > > Regards, > /Bruce >
[dpdk-dev] [PATCH] examples/ip_pipeline: fix build error for gcc 4.8
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon > Sent: Tuesday, June 14, 2016 9:04 PM > To: Mrzyglod, DanielX T > Cc: dev at dpdk.org; Singh, Jasvinder ; > Dumitrescu, Cristian > Subject: Re: [dpdk-dev] [PATCH] examples/ip_pipeline: fix build error for gcc > 4.8 > > 2016-06-09 13:38, Daniel Mrzyglod: > > This patch fixes a maybe-uninitialized warning when compiling DPDK with > GCC 4.8 > > > > examples/ip_pipeline/pipeline/pipeline_common_fe.c: In function > 'app_pipeline_track_pktq_out_to_link': > > examples/ip_pipeline/pipeline/pipeline_common_fe.c:66:31: error: > > 'reader' may be used uninitialized in this function [-Werror=maybe- > uninitialized] > > > >struct app_pktq_out_params *pktq_out = > > > > Fixes: 760064838ec0 ("examples/ip_pipeline: link routing output ports to > devices") > > > > Signed-off-by: Daniel Mrzyglod > > For a weird reason, this patch triggers a new error: > > examples/ip_pipeline/pipeline/pipeline_common_fe.c: In function > 'app_pipeline_track_pktq_out_to_link': > examples/ip_pipeline/pipeline/pipeline_common_fe.c:124:11: > error: 'id' may be used uninitialized in this function [-Werror=maybe- > uninitialized] > status = ptype->fe_ops->f_track(, >^ > In file included from > examples/ip_pipeline/pipeline/pipeline_common_fe.h:44:0, > from examples/ip_pipeline/pipeline/pipeline_common_fe.c:47: > examples/ip_pipeline/app.h:734:26: note: 'id' was declared here > uint32_t n_readers = 0, id, i; > ^ > examples/ip_pipeline/pipeline/pipeline_common_fe.c:97:11: > error: 'id' may be used uninitialized in this function [-Werror=maybe- > uninitialized] > status = ptype->fe_ops->f_track(, >^ > In file included from > examples/ip_pipeline/pipeline/pipeline_common_fe.h:44:0, > from examples/ip_pipeline/pipeline/pipeline_common_fe.c:47: > examples/ip_pipeline/app.h:674:26: note: 'id' was declared here > uint32_t n_readers = 0, id, i; > ^ Hi Thomas, Do you have this error on the same environment?
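GCC 4.8's -Wmaybe-uninitialized cannot always prove that an out-parameter (like `id` filled in by `f_track` above) is written on every path that later reads it, so it warns even when the code is correct. The usual remedy is a harmless initialization at the declaration. An illustrative reconstruction with invented names, not the ip_pipeline code:

```c
/* Looks up a value and reports it through an out-parameter; 'out' is
 * deliberately left untouched on failure, which is exactly the shape
 * older compilers struggle to analyse. */
static int find_first_even(const int *arr, int n, int *out)
{
	int i;

	for (i = 0; i < n; i++) {
		if (arr[i] % 2 == 0) {
			*out = arr[i];
			return 0;
		}
	}
	return -1;
}

/* Caller: the '= 0' costs nothing and silences -Wmaybe-uninitialized on
 * compilers that can't see that a read only happens after success. */
static int first_even_or(const int *arr, int n, int dflt)
{
	int id = 0;   /* defensive init for -Wmaybe-uninitialized */

	if (find_first_even(arr, n, &id) != 0)
		return dflt;
	return id;
}
```

The "weird" follow-on error Thomas hits is typical of this warning: fixing one flagged variable lets the optimizer inline further and flag the next one, so each `id` needs the same treatment.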
[dpdk-dev] [PATCH v13 2/3] app/test: test external mempool manager
Use a minimal custom mempool external ops and check that it also passes basic mempool autotests. Signed-off-by: Olivier Matz Signed-off-by: David Hunt Acked-by: Shreyansh Jain Acked-by: Olivier Matz --- app/test/test_mempool.c | 122 +++- 1 file changed, 120 insertions(+), 2 deletions(-) diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c index b586249..31582d8 100644 --- a/app/test/test_mempool.c +++ b/app/test/test_mempool.c @@ -83,6 +83,99 @@ static rte_atomic32_t synchro; /* + * Simple example of custom mempool structure. Holds pointers to all the + * elements which are simply malloc'd in this example. + */ +struct custom_mempool { + rte_spinlock_t lock; + unsigned count; + unsigned size; + void *elts[]; +}; + +/* + * Loop through all the element pointers and allocate a chunk of memory, then + * insert that memory into the ring. + */ +static int +custom_mempool_alloc(struct rte_mempool *mp) +{ + struct custom_mempool *cm; + + cm = rte_zmalloc("custom_mempool", + sizeof(struct custom_mempool) + mp->size * sizeof(void *), 0); + if (cm == NULL) + return -ENOMEM; + + rte_spinlock_init(&cm->lock); + cm->count = 0; + cm->size = mp->size; + mp->pool_data = cm; + return 0; +} + +static void +custom_mempool_free(struct rte_mempool *mp) +{ + rte_free((void *)(mp->pool_data)); +} + +static int +custom_mempool_enqueue(struct rte_mempool *mp, void * const *obj_table, + unsigned n) +{ + struct custom_mempool *cm = (struct custom_mempool *)(mp->pool_data); + int ret = 0; + + rte_spinlock_lock(&cm->lock); + if (cm->count + n > cm->size) { + ret = -ENOBUFS; + } else { + memcpy(&cm->elts[cm->count], obj_table, sizeof(void *) * n); + cm->count += n; + } + rte_spinlock_unlock(&cm->lock); + return ret; +} + + +static int +custom_mempool_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n) +{ + struct custom_mempool *cm = (struct custom_mempool *)(mp->pool_data); + int ret = 0; + + rte_spinlock_lock(&cm->lock); + if (n > cm->count) { + ret = -ENOENT; + } else { + cm->count -= n; + 
memcpy(obj_table, &cm->elts[cm->count], sizeof(void *) * n); + } + rte_spinlock_unlock(&cm->lock); + return ret; +} + +static unsigned +custom_mempool_get_count(const struct rte_mempool *mp) +{ + struct custom_mempool *cm = (struct custom_mempool *)(mp->pool_data); + + return cm->count; +} + +static struct rte_mempool_ops mempool_ops_custom = { + .name = "custom_handler", + .alloc = custom_mempool_alloc, + .free = custom_mempool_free, + .enqueue = custom_mempool_enqueue, + .dequeue = custom_mempool_dequeue, + .get_count = custom_mempool_get_count, +}; + +MEMPOOL_REGISTER_OPS(mempool_ops_custom); + +/* * save the object number in the first 4 bytes of object data. All * other bytes are set to 0. */ @@ -292,12 +385,14 @@ static int test_mempool_single_consumer(void) * test function for mempool test based on single consumer and single producer, * can run on one lcore only */ -static int test_mempool_launch_single_consumer(__attribute__((unused)) void *arg) +static int +test_mempool_launch_single_consumer(__attribute__((unused)) void *arg) { return test_mempool_single_consumer(); } -static void my_mp_init(struct rte_mempool * mp, __attribute__((unused)) void * arg) +static void +my_mp_init(struct rte_mempool *mp, __attribute__((unused)) void *arg) { printf("mempool name is %s\n", mp->name); /* nothing to be implemented here*/ @@ -477,6 +572,7 @@ test_mempool(void) { struct rte_mempool *mp_cache = NULL; struct rte_mempool *mp_nocache = NULL; + struct rte_mempool *mp_ext = NULL; rte_atomic32_init(&synchro); @@ -505,6 +601,27 @@ test_mempool(void) goto err; } + /* create a mempool with an external handler */ + mp_ext = rte_mempool_create_empty("test_ext", + MEMPOOL_SIZE, + MEMPOOL_ELT_SIZE, + RTE_MEMPOOL_CACHE_MAX_SIZE, 0, + SOCKET_ID_ANY, 0); + + if (mp_ext == NULL) { + printf("cannot allocate mp_ext mempool\n"); + goto err; + } + if (rte_mempool_set_ops_byname(mp_ext, "custom_handler", NULL) < 0) { + printf("cannot set custom handler\n"); + goto err; + } + if
(rte_mempool_populate_default(mp_ext) < 0) { + printf("cannot populate mp_ext mempool\n"); + goto err; + } + rte_mempool_obj_iter(mp_ext, my_obj_init, NULL); + /* retrieve the mempool from its name */ if (rte_mempool_lookup("test_nocache") != mp_nocache) { printf("Cannot lookup mempool from its
[dpdk-dev] [PATCH v13 1/3] mempool: support external mempool operations
Until now, the objects stored in a mempool were internally stored in a ring. This patch introduces the possibility to register external handlers replacing the ring. The default behavior remains unchanged, but calling the new function rte_mempool_set_ops_byname() right after rte_mempool_create_empty() allows the user to change the handler that will be used when populating the mempool. This patch also adds a set of default ops (function callbacks) based on rte_ring. Signed-off-by: Olivier Matz Signed-off-by: David Hunt Acked-by: Shreyansh Jain Acked-by: Olivier Matz --- app/test/test_mempool_perf.c | 1 - doc/guides/prog_guide/mempool_lib.rst | 31 +++- doc/guides/rel_notes/deprecation.rst | 9 - lib/librte_mempool/Makefile| 2 + lib/librte_mempool/rte_mempool.c | 66 +++- lib/librte_mempool/rte_mempool.h | 253 ++--- lib/librte_mempool/rte_mempool_ops.c | 150 + lib/librte_mempool/rte_mempool_ring.c | 161 ++ lib/librte_mempool/rte_mempool_version.map | 13 +- 9 files changed, 605 insertions(+), 81 deletions(-) create mode 100644 lib/librte_mempool/rte_mempool_ops.c create mode 100644 lib/librte_mempool/rte_mempool_ring.c diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c index c5e3576..c5f8455 100644 --- a/app/test/test_mempool_perf.c +++ b/app/test/test_mempool_perf.c @@ -161,7 +161,6 @@ per_lcore_mempool_test(__attribute__((unused)) void *arg) n_get_bulk); if (unlikely(ret < 0)) { rte_mempool_dump(stdout, mp); - rte_ring_dump(stdout, mp->ring); /* in this case, objects are lost... */ return -1; } diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst index c3afc2e..2e3116e 100644 --- a/doc/guides/prog_guide/mempool_lib.rst +++ b/doc/guides/prog_guide/mempool_lib.rst @@ -34,7 +34,7 @@ Mempool Library === A memory pool is an allocator of a fixed-sized object. -In the DPDK, it is identified by name and uses a ring to store free objects. 
+In the DPDK, it is identified by name and uses a ring or an external mempool manager to store free objects. It provides some other optional services such as a per-core object cache and an alignment helper to ensure that objects are padded to spread them equally on all DRAM or DDR3 channels. @@ -127,6 +127,35 @@ The maximum size of the cache is static and is defined at compilation time (CONF A mempool in Memory with its Associated Ring +External Mempool Manager + + +This allows external memory subsystems, such as external hardware memory +management systems and software-based memory allocators, to be used with DPDK. + +There are two aspects to the external mempool manager. + +* Adding the code for your new mempool operations (ops). This is achieved by + adding a new mempool ops code, and using the ``REGISTER_MEMPOOL_OPS`` macro. + +* Using the new API to call ``rte_mempool_create_empty()`` and + ``rte_mempool_set_ops_byname()`` to create a new mempool and specifying which + ops to use. + +Several external mempool managers may be used in the same application. A new +mempool can be created by using the ``rte_mempool_create_empty()`` function, +then using ``rte_mempool_set_ops_byname()`` to point the mempool to the +relevant mempool manager callback (ops) structure. + +Legacy applications may continue to use the old ``rte_mempool_create()`` API +call, which uses a ring-based mempool manager by default. These applications +will need to be modified in order to use a new external mempool manager. + +For applications that use ``rte_pktmbuf_pool_create()``, there is a config setting +(``RTE_MBUF_DEFAULT_MEMPOOL_OPS``) that allows the application to make use of +an external mempool manager. + + Use Cases - diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 7d947ae..c415095 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -39,15 +39,6 @@ Deprecation Notices compact API.
The ones that remain are backwards compatible and use the per-lcore default cache if available. This change targets release 16.07. -* The rte_mempool struct will be changed in 16.07 to facilitate the new - external mempool manager functionality. - The ring element will be replaced with a more generic 'pool' opaque pointer - to allow new mempool handlers to use their own user-defined mempool - layout. Also newly added to rte_mempool is a handler index. - The existing API will be backward compatible, but there will be new API - functions added to facilitate the creation of mempools using
[dpdk-dev] [PATCH v13 0/3] mempool: add external mempool manager
Here's the latest version of the External Mempool Manager patchset. It's re-based on top of the latest head as of 15/6/2016, including Olivier's 35-part patch series on mempool re-org [1] [1] http://dpdk.org/ml/archives/dev/2016-May/039229.html v13 changes: * Added extra opaque data (pool_config) to mempool struct for mempool configuration by the ops functions. For example, this can be used to pass device names or device flags to the underlying alloc function. * Added mempool_config param to rte_mempool_set_ops_byname() v12 changes: * Fixed a comment (function param h -> ops) * fixed a typo (callbacki) v11 changes: * Fixed comments (added '.' where needed for consistency) * removed ABI breakage notice for mempool manager in deprecation.rst * Added description of the external mempool manager functionality to doc/guides/prog_guide/mempool_lib.rst (John Mc reviewed) * renamed rte_mempool_default.c to rte_mempool_ring.c v10 changes: * changed the _put/_get op names to _enqueue/_dequeue to be consistent with the function names * some rte_errno cleanup * comment tweaks about when to set pool_data * removed an un-needed check for ops->alloc == NULL v9 changes: * added a check for NULL alloc in rte_mempool_ops_register * rte_mempool_alloc_t now returns int instead of void* * fixed some comment typos * removed some unneeded typecasts * changed a return NULL to return -EEXIST in rte_mempool_ops_register * fixed rte_mempool_version.map file so it builds ok as shared libs * moved flags check from rte_mempool_create_empty to rte_mempool_create v8 changes: * merged first three patches in the series into one. * changed parameters to ops callback to all be rte_mempool pointer rather than a pointer to opaque data or uint64. * comment fixes. * fixed parameter to _free function (was inconsistent).
* changed MEMPOOL_F_RING_CREATED to MEMPOOL_F_POOL_CREATED v7 changes: * Changed rte_mempool_handler_table to rte_mempool_ops_table * Changed handler_idx to ops_index in rte_mempool struct * Reworked comments in rte_mempool.h around ops functions * Changed rte_mempool_handler.c to rte_mempool_ops.c * Changed all functions containing _handler_ to _ops_ * Now there is no mention of 'handler' left * Other small changes out of review of mailing list v6 changes: * Moved the flags handling from rte_mempool_create_empty to rte_mempool_create, as it's only there for backward compatibility * Various comment additions and cleanup * Renamed rte_mempool_handler to rte_mempool_ops * Added a union for *pool and u64 pool_id in struct rte_mempool * split the original patch into a few parts for easier review. * rename functions with _ext_ to _ops_. * addressed review comments * renamed put and get functions to enqueue and dequeue * changed occurrences of rte_mempool_ops to const, as they contain function pointers (security) * split out the default external mempool handler into a separate patch for easier review v5 changes: * rebasing, as it is dependent on another patch series [1] v4 changes (Olivier Matz): * remove the rte_mempool_create_ext() function. To change the handler, the user has to do the following: - mp = rte_mempool_create_empty() - rte_mempool_set_handler(mp, "my_handler") - rte_mempool_populate_default(mp) This avoids adding another function with more than 10 arguments and duplicating the doxygen comments * change the api of rte_mempool_alloc_t: only the mempool pointer is required as all information is available in it * change the api of rte_mempool_free_t: remove return value * move inline wrapper functions from the .c to the .h (else they won't be inlined). This implies having one header file (rte_mempool.h), or it would have generated cross-dependency issues.
* remove now unused MEMPOOL_F_INT_HANDLER (note: it was misused anyway due to the use of && instead of &) * fix build in debug mode (__MEMPOOL_STAT_ADD(mp, put_pool, n) remaining) * fix build with shared libraries (global handler has to be declared in the .map file) * rationalize #include order * remove unused function rte_mempool_get_handler_name() * rename some structures, fields, functions * remove the static in front of rte_tailq_elem rte_mempool_tailq (comment from Yuanhan) * test the ext mempool handler in the same file as the standard mempool tests, avoiding duplicating the code * rework the custom handler in mempool_test * rework a bit the patch selecting default mbuf pool handler * fix some doxygen comments v3 changes: * simplified the file layout, renamed to rte_mempool_handler.[hc] * moved the default handlers into rte_mempool_default.c * moved the example handler out into app/test/test_ext_mempool.c * removed is_mc/is_mp change, slight perf degradation on sp cached operation * removed stack handler, may re-introduce at a later date * Changes out of code reviews v2 changes: * There was a lot of duplicate code between
[dpdk-dev] enic in passthrough mode tx drops
Hi all, I'm running a VM attached to 2 Cisco Virtual Interface Cards in passthrough mode in a Cisco UCS. The vNICs are configured in access mode without a VLAN ID. The incoming packets arrive with an 802.1q header containing the VLAN priority bits according to the class of service configured on the vNIC. I understood this is expected from a Fibre Channel over Ethernet card. According to the DPDK documentation there's a need to set the VLAN_STRIP_OFFLOAD flag and call rte_eth_dev_set_vlan_offload on the ports. If I run a simple l2fwd application where the same packet received on one port is sent through the other, the traffic works ok. If I generate the packets in my VM and send them out, traffic doesn't work. (I tried sending the traffic out with/without an 802.1q header with the priority bit.) Is there a specific configuration to be added to the mbuf for the tx packets generated in the VM? Could it be the vlan_tci, ol_flags, or any other missing flag? Does somebody know the exact behavior of the enic card with priority tagging? BTW, in virtio mode the traffic works in both flows. Thanks a lot!
[dpdk-dev] [PATCH v2 00/17] prepare for rte_device / rte_driver
On Thu, 16 Jun 2016 08:42:29 + Shreyansh Jain wrote: > Hi, > > > -Original Message- > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > Sent: Thursday, June 16, 2016 1:04 PM > > To: Shreyansh Jain > > Cc: David Marchand ; viktorin at > > rehivetech.com; > > dev at dpdk.org; Iremonger, Bernard > > Subject: Re: [dpdk-dev] [PATCH v2 00/17] prepare for rte_device / rte_driver > > > > 2016-06-16 06:32, Shreyansh Jain: > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Iremonger, > > > Bernard > > > > Patches 3,8,16 and 17 no longer apply to the latest master branch. > > > > A rebase is needed. > > > > > > With the recent most head (04920e6): 01, 03, 08, 15, 16 and 17 are > > > failing. > > > > > > Just wanted to check if there is a rebase of this series anytime soon? > > > > I will take care of this series if time permit. > > Ok. > By the way, I have already rebased it on master. I can post the patches here > if you want. > (only trivial conflicts were there) Sounds good. +1 I'd rebase my patchset on top of it and repost. > > > It would help to have more reviews on other series touching EAL, like > > pmdinfo. > > Ok. I can try and review this, non-PCI/SoC and similar patchset in next few > days. The original David's patchset was quite OK. I didn't have any comments. The thing is (from my POV) that it was incomplete. Jan > > > > > > I was looking at Jan's non-PCI patchset [1] and they are based on this > > series. > > > > > > [1] > > http://thread.gmane.org/gmane.comp.networking.dpdk.devel/30913/focus=38486 > > - > Shreyansh >
[dpdk-dev] [PATCH v2 00/17] prepare for rte_device / rte_driver
On Thu, 16 Jun 2016 11:19:59 +0200 Thomas Monjalon wrote: > 2016-06-16 10:23, Jan Viktorin: > > I think, we should consider to move it to somebody else. I would work on > > it, however, I don't see all the tasks that are to be done. That's why I > > was waiting to finalize those patchs by David or Thomas. For me, the > > important things were to generalize certain things to remove dependency on > > PCI. This is mostly done (otherwise the SoC patchset couldn't be done in > > the way I've posted it). > > > > Now, there is some pending work to remove pmd_type. Next, to find out some > > generalization of rte_pci_device/driver to create rte_device/driver (I've > > posted several suggestions in the of SoC patchset). For the pmd_type removal, I am not very sure about the original David's intentions. What should be the result? Should there be a special struct rte_virt_device or something like that? > > > > What more? > > We need a clean devargs API in EAL, not directly related to hotplug. > Then the hotplug can benefit of the devargs API as any other device config. Do we have some requirements for this? Would it be a complete redefinition of the API? I don't see the relations to hotplug. > > The EAL resources (also called devices) need an unique naming convention. > No idea about this. What do you mean by the unique naming convention? Jan
[dpdk-dev] [PATCH v12 0/3] mempool: add external mempool manager
On 16/6/2016 9:58 AM, Olivier MATZ wrote: >>> >>> So I don't think we should have more cache misses whether it's >>> placed at the beginning or at the end. Maybe I'm missing something... >>> >>> I still believe it's better to group the 2 fields as they are >>> tightly linked together. It could be at the end if you see better >>> performance. >>> >> >> OK, I'll leave at the end because of the performance hit. > > Sorry, my message was not clear. > I mean, having both at the end. Do you see a performance > impact in that case? > I ran multiple more tests, and the average drop I'm seeing on an older server is reduced to 1% (local cached use-case), with 0% change on a newer Haswell server, so I think at this stage we're safe to put it up alongside pool_data. There was 0% reduction when I moved both to the bottom of the struct. So on the Haswell, it seems to have minimal impact regardless of where they go. I'll post the patch up soon. Regards, Dave.
[dpdk-dev] [PATCH] hash: new function to retrieve a key given its position
> -Original Message- > From: Yari Adan Petralanda [mailto:yari.adan.petralanda at ericsson.com] > Sent: Thursday, June 16, 2016 9:23 AM > To: Richardson, Bruce; De Lara Guarch, Pablo; Juan Antonio Montesinos > Delgado > Cc: dev at dpdk.org > Subject: [PATCH] hash: new function to retrieve a key given its position > > The function rte_hash_get_key_with_position is added in this patch. > As the position returned when adding a key is frequently used as an > offset into an array of user data, this function performs the operation > of retrieving a key given this offset. > > A possible use case would be to delete a key from the hash table when > its entry in the array of data has certain value. For instance, the key > could be a flow 5-tuple, and the value stored in the array a time stamp. > > Signed-off-by: Juan Antonio Montesinos > > Signed-off-by: Yari Adan Petralanda > > --- > app/test/test_hash.c | 42 > > lib/librte_hash/rte_cuckoo_hash.c| 18 > lib/librte_hash/rte_hash.h | 18 > lib/librte_hash/rte_hash_version.map | 7 ++ > 4 files changed, 85 insertions(+) > [...] > diff --git a/lib/librte_hash/rte_hash_version.map > b/lib/librte_hash/rte_hash_version.map > index 4f25436..19a7b26 100644 > --- a/lib/librte_hash/rte_hash_version.map > +++ b/lib/librte_hash/rte_hash_version.map > @@ -38,3 +38,10 @@ DPDK_2.2 { > rte_hash_set_cmp_func; > > } DPDK_2.1; > + > +DPDK_16.04 { This should be DPDK_16.07. > + global: > + > + rte_hash_get_key_with_position; > + > +}; DPDK_2.2 > -- > 2.1.4 >
[dpdk-dev] [PATCH v4] eal: out-of-bounds write
On 15/06/2016 14:25, Slawomir Mrozowicz wrote: > Overrunning array mcfg->memseg of 256 44-byte elements > at element index 257 using index j. > Fixed by adding a condition with an error message. > > Fixes: af75078fece3 ("first public release") > > Coverity ID 13282 > > Signed-off-by: Slawomir Mrozowicz > --- Acked-by: Sergio Gonzalez Monroy
[dpdk-dev] random pkt generator PMD
On 15.06.2016 19:02, Neil Horman wrote: > On Wed, Jun 15, 2016 at 03:43:56PM +0600, Yerden Zhumabekov wrote: >> Hello everybody, >> >> DPDK already has a number of PMDs for various eth devices, it even has PMD >> emulations for backends such as pcap, sw rings etc. >> >> I've been thinking about the idea of having a PMD which would generate mbufs >> on the fly in some randomized fashion. This would serve goals like, for >> example: >> >> 1) running tests for applications with network processing capabilities >> without additional software packet generators; >> 2) making performance measurements with no hw interference; >> 3) ability to run without root privileges, --no-pci, --no-huge, for CI >> builds, and so on. >> >> Maybe there's no such need, and these goals may be achieved by other means >> and this idea is flawed? Any thoughts? >> > I think you already have a solution to this problem. Linux/BSD have multiple > user space packet generators that can dump their output to a pcap format file, > and dpdk has a pcap pmd that accepts a pcap file as input to send in packets. Things that I don't like about the idea of using the PCAP PMD: 1) the need to create additional files with additional scripts and keep those with your test suite; 2) the need to rewind the pcap once you have played it (fixable); 3) reading packets one-by-one, with file operations which may lead to perf impact; 4) low variability among source packets. Those are the things which put me onto the idea of a randomized packet generator PMD. Possible devargs could be: 1) id of a template, like "ipv4", "ipv6", "dot1q" etc; 2) size of mbuf payload; 3) array of tuples like (offset, size, value) with value being an exact value or the "rnd" keyword.
[dpdk-dev] [PATCH v2] enic: scattered Rx
For performance reasons, this patch uses 2 VIC RQs per RQ presented to DPDK. The VIC requires that each descriptor be marked as either a start of packet (SOP) descriptor or a non-SOP descriptor. A one RQ solution requires skipping descriptors when receiving small packets and results in bad performance when receiving many small packets. The 2 RQ solution makes use of the VIC feature that allows a receive on primary queue to 'spill over' into another queue if the receive is too large to fit in the buffer assigned to the descriptor on the primary queue. This means that there is no skipping of descriptors when receiving small packets and results in much better performance. Signed-off-by: Nelson Escobar Reviewed-by: John Daley --- v2: - fixes upstream checkpatch complaint - fixes bug where packet type and flags were set on last mbuf instead of first mbuf of scattered receive - adds ethernet hdr length to mtu when calculating the number of mbufs it would take to receive maximum sized packet doc/guides/nics/overview.rst | 2 +- drivers/net/enic/base/rq_enet_desc.h | 2 +- drivers/net/enic/base/vnic_rq.c | 8 +- drivers/net/enic/base/vnic_rq.h | 18 ++- drivers/net/enic/enic.h | 22 ++- drivers/net/enic/enic_ethdev.c | 10 +- drivers/net/enic/enic_main.c | 277 +++ drivers/net/enic/enic_res.c | 5 +- drivers/net/enic/enic_rxtx.c | 140 -- 9 files changed, 361 insertions(+), 123 deletions(-) diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst index 2200171..d0ae847 100644 --- a/doc/guides/nics/overview.rst +++ b/doc/guides/nics/overview.rst @@ -94,7 +94,7 @@ Most of these differences are summarized below. 
Queue start/stop Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y MTU update Y Y Y Y Y Y Y Y Y Y Jumbo frame Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y - Scattered Rx Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y + Scattered Rx Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y LRO Y Y Y Y TSO Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Promiscuous mode Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y diff --git a/drivers/net/enic/base/rq_enet_desc.h b/drivers/net/enic/base/rq_enet_desc.h index 7292d9d..13e24b4 100644 --- a/drivers/net/enic/base/rq_enet_desc.h +++ b/drivers/net/enic/base/rq_enet_desc.h @@ -55,7 +55,7 @@ enum rq_enet_type_types { #define RQ_ENET_TYPE_BITS 2 #define RQ_ENET_TYPE_MASK ((1 << RQ_ENET_TYPE_BITS) - 1) -static inline void rq_enet_desc_enc(struct rq_enet_desc *desc, +static inline void rq_enet_desc_enc(volatile struct rq_enet_desc *desc, u64 address, u8 type, u16 length) { desc->address = cpu_to_le64(address); diff --git a/drivers/net/enic/base/vnic_rq.c b/drivers/net/enic/base/vnic_rq.c index cb62c5e..0e700a1 100644 --- a/drivers/net/enic/base/vnic_rq.c +++ b/drivers/net/enic/base/vnic_rq.c @@ -84,11 +84,12 @@ void vnic_rq_init_start(struct vnic_rq *rq, unsigned int cq_index, iowrite32(cq_index, &rq->ctrl->cq_index); iowrite32(error_interrupt_enable, &rq->ctrl->error_interrupt_enable); iowrite32(error_interrupt_offset, &rq->ctrl->error_interrupt_offset); - iowrite32(0, &rq->ctrl->dropped_packet_count); iowrite32(0, &rq->ctrl->error_status); iowrite32(fetch_index, &rq->ctrl->fetch_index); iowrite32(posted_index, &rq->ctrl->posted_index); - + if (rq->is_sop) + iowrite32(((rq->is_sop << 10) | rq->data_queue_idx), + &rq->ctrl->data_ring); } void vnic_rq_init(struct vnic_rq *rq, unsigned int cq_index, @@ -96,6 +97,7 @@ void vnic_rq_init(struct vnic_rq *rq, unsigned int cq_index, unsigned int error_interrupt_offset) { u32 fetch_index = 0; + /* Use current fetch_index as the ring starting point */ fetch_index = ioread32(&rq->ctrl->fetch_index); @@ -110,6 +112,8 @@ void vnic_rq_init(struct vnic_rq *rq, unsigned int
cq_index, error_interrupt_offset); rq->rxst_idx = 0; rq->tot_pkts = 0; + rq->pkt_first_seg = NULL; + rq->pkt_last_seg = NULL; } void vnic_rq_error_out(struct vnic_rq *rq, unsigned int error) diff --git a/drivers/net/enic/base/vnic_rq.h b/drivers/net/enic/base/vnic_rq.h index e083ccc..fd9e170 100644 --- a/drivers/net/enic/base/vnic_rq.h +++ b/drivers/net/enic/base/vnic_rq.h @@ -60,10 +60,18 @@ struct vnic_rq_ctrl { u32 pad7; u32 error_status; /* 0x48 */ u32 pad8; - u32 dropped_packet_count; /* 0x50 */ + u32 tcp_sn; /* 0x50 */ u32 pad9; - u32 dropped_packet_count_rc;/* 0x58 */ +
[dpdk-dev] [PATCH v2 2/2] vhost: unmap log memory on cleanup.
Fixes memory leak on QEMU migration. Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request") Signed-off-by: Ilya Maximets --- lib/librte_vhost/vhost-net.h | 1 + lib/librte_vhost/vhost_user/virtio-net-user.c | 15 +-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h index ec8f964..38593a2 100644 --- a/lib/librte_vhost/vhost-net.h +++ b/lib/librte_vhost/vhost-net.h @@ -134,6 +134,7 @@ struct virtio_net { char ifname[IF_NAME_SZ]; uint64_t log_size; uint64_t log_base; + uint64_t log_addr; struct ether_addr mac; } __rte_cache_aligned; diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c index e6a2aed..a867a43 100644 --- a/lib/librte_vhost/vhost_user/virtio-net-user.c +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c @@ -95,6 +95,10 @@ vhost_backend_cleanup(struct virtio_net *dev) free(dev->mem); dev->mem = NULL; } + if (dev->log_addr) { + munmap((void *)(uintptr_t)dev->log_addr, dev->log_size); + dev->log_addr = 0; + } } int @@ -407,8 +411,15 @@ user_set_log_base(int vid, struct VhostUserMsg *msg) return -1; } - /* TODO: unmap on stop */ - dev->log_base = (uint64_t)(uintptr_t)addr + off; + /* +* Free previously mapped log memory on occasionally +* multiple VHOST_USER_SET_LOG_BASE. +*/ + if (dev->log_addr) { + munmap((void *)(uintptr_t)dev->log_addr, dev->log_size); + } + dev->log_addr = (uint64_t)(uintptr_t)addr; + dev->log_base = dev->log_addr + off; dev->log_size = size; return 0; -- 2.7.4
[dpdk-dev] [PATCH v2 1/2] vhost: fix leak of file descriptors.
During migration of a vhost-user device, QEMU allocates a memfd to store information about dirty pages and sends the fd to the vhost-user process. The file descriptor for this memory should be closed to prevent a "Too many open files" error for the vhost-user process after some number of migrations. Ex.: # ls /proc//fd/ -alh total 0 root qemu . root qemu .. root qemu 0 -> /dev/pts/0 root qemu 1 -> pipe:[1804353] root qemu 10 -> socket:[1782240] root qemu 100 -> /memfd:vhost-log (deleted) root qemu 1000 -> /memfd:vhost-log (deleted) root qemu 1001 -> /memfd:vhost-log (deleted) root qemu 1004 -> /memfd:vhost-log (deleted) [...] root qemu 996 -> /memfd:vhost-log (deleted) root qemu 997 -> /memfd:vhost-log (deleted) ovs-vswitchd.log: |WARN|punix:ovs-vswitchd.ctl: accept failed: Too many open files Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request") Signed-off-by: Ilya Maximets --- lib/librte_vhost/vhost_user/virtio-net-user.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c index 64a6ec4..e6a2aed 100644 --- a/lib/librte_vhost/vhost_user/virtio-net-user.c +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c @@ -401,6 +401,7 @@ user_set_log_base(int vid, struct VhostUserMsg *msg) * fail when offset is not page size aligned. */ addr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + close(fd); if (addr == MAP_FAILED) { RTE_LOG(ERR, VHOST_CONFIG, "mmap log base failed!\n"); return -1; -- 2.7.4
[dpdk-dev] [PATCH v2 0/2] vhost: Fix leaks on migration.
v2: * rebased on top of dpdk-next-virtio/master Ilya Maximets (2): vhost: fix leak of file descriptors. vhost: unmap log memory on cleanup. lib/librte_vhost/vhost-net.h | 1 + lib/librte_vhost/vhost_user/virtio-net-user.c | 16 ++-- 2 files changed, 15 insertions(+), 2 deletions(-) -- 2.7.4
[dpdk-dev] [PATCH v2] rte_hash: add scalable multi-writer insertion w/ Intel TSX
Hi Wei, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wei Shen > Sent: Thursday, June 16, 2016 5:53 AM > To: dev at dpdk.org > Cc: De Lara Guarch, Pablo; stephen at networkplumber.org; Tai, Charlie; > Maciocco, Christian; Gobriel, Sameh; Shen, Wei1 > Subject: [dpdk-dev] [PATCH v2] rte_hash: add scalable multi-writer insertion > w/ Intel TSX > > This patch introduces scalable multi-writer Cuckoo Hash insertion > based on a split Cuckoo Search and Move operation using Intel > TSX. It can do scalable hash insertion with 22 cores with little > performance loss and a negligible TSX abort rate. > > * Added an extra rte_hash flag definition to switch default > single writer Cuckoo Hash behavior to multiwriter. > > * Added a make_space_insert_bfs_mw() function to do split Cuckoo > search in BFS order. > > * Added tsx_cuckoo_move_insert() to do Cuckoo move in an Intel TSX > protected manner. > > * Added test_hash_multiwriter() as a test case for multi-writer > Cuckoo Hash.
> > Signed-off-by: Shen Wei > Signed-off-by: Sameh Gobriel > --- > app/test/Makefile | 1 + > app/test/test_hash_multiwriter.c | 272 > + > doc/guides/rel_notes/release_16_07.rst | 12 ++ > lib/librte_hash/rte_cuckoo_hash.c | 231 +--- > lib/librte_hash/rte_hash.h | 3 + > 5 files changed, 494 insertions(+), 25 deletions(-) > create mode 100644 app/test/test_hash_multiwriter.c > > diff --git a/app/test/Makefile b/app/test/Makefile > index 053f3a2..5476300 100644 > --- a/app/test/Makefile > +++ b/app/test/Makefile > @@ -120,6 +120,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_thash.c > SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c > SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c > SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_scaling.c > +SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_multiwriter.c > > SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c > SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm_perf.c > diff --git a/app/test/test_hash_multiwriter.c > b/app/test/test_hash_multiwriter.c > new file mode 100644 > index 000..54a0d2c > --- /dev/null > +++ b/app/test/test_hash_multiwriter.c > @@ -0,0 +1,272 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2016 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + *notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + *notice, this list of conditions and the following disclaimer in > + *the documentation and/or other materials provided with the > + *distribution. > + * * Neither the name of Intel Corporation nor the names of its > + *contributors may be used to endorse or promote products derived > + *from this software without specific prior written permission. 
> + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "test.h" > + > +/* > + * Check condition and return an error if true. Assumes that "handle" is the > + * name of the hash structure pointer to be freed. > + */ > +#define RETURN_IF_ERROR(cond, str, ...) do {\ > + if (cond) { \ > + printf("ERROR line %d: " str "\n", __LINE__,\ > + ##__VA_ARGS__); \ > + if (handle) \ > + rte_hash_free(handle); \ > + return -1; \ > + } \ > +} while (0) > + > +#define
[dpdk-dev] [PATCH v3 3/3] mempool: allow for user-owned mempool caches
The mempool cache is only available to EAL threads as a per-lcore resource. Change this so that the user can create and provide their own cache on mempool get and put operations. This works with non-EAL threads too. This commit introduces the new API calls: rte_mempool_cache_create(size, socket_id) rte_mempool_cache_free(cache) rte_mempool_cache_flush(cache, mp) rte_mempool_default_cache(mp, lcore_id) Changes the API calls: rte_mempool_generic_put(mp, obj_table, n, cache, flags) rte_mempool_generic_get(mp, obj_table, n, cache, flags) The cache-oblivious API calls use the per-lcore default local cache. Signed-off-by: Lazaros Koromilas --- app/test/test_mempool.c | 94 -- app/test/test_mempool_perf.c | 70 ++--- lib/librte_mempool/rte_mempool.c | 66 +++- lib/librte_mempool/rte_mempool.h | 163 --- 4 files changed, 310 insertions(+), 83 deletions(-) diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c index 10d706f..723cd39 100644 --- a/app/test/test_mempool.c +++ b/app/test/test_mempool.c @@ -79,6 +79,9 @@ printf("test failed at %s():%d\n", __func__, __LINE__); \ return -1; \ } while (0) +#define LOG_ERR() do { \ + printf("test failed at %s():%d\n", __func__, __LINE__); \ + } while (0) static rte_atomic32_t synchro; @@ -191,7 +194,7 @@ my_obj_init(struct rte_mempool *mp, __attribute__((unused)) void *arg, /* basic tests (done on one core) */ static int -test_mempool_basic(struct rte_mempool *mp) +test_mempool_basic(struct rte_mempool *mp, int use_external_cache) { uint32_t *objnum; void **objtable; @@ -199,47 +202,79 @@ test_mempool_basic(struct rte_mempool *mp) char *obj_data; int ret = 0; unsigned i, j; + int offset; + struct rte_mempool_cache *cache; + + if (use_external_cache) { + /* Create a user-owned mempool cache. */ + cache = rte_mempool_cache_create(RTE_MEMPOOL_CACHE_MAX_SIZE, +SOCKET_ID_ANY); + if (cache == NULL) + RET_ERR(); + } else { + /* May be NULL if cache is disabled. 
*/ + cache = rte_mempool_default_cache(mp, rte_lcore_id()); + } /* dump the mempool status */ rte_mempool_dump(stdout, mp); printf("get an object\n"); - if (rte_mempool_get(mp, &obj) < 0) - RET_ERR(); + if (rte_mempool_generic_get(mp, &obj, 1, cache, 0) < 0) { + LOG_ERR(); + ret = -1; + goto out; + } rte_mempool_dump(stdout, mp); /* tests that improve coverage */ printf("get object count\n"); - if (rte_mempool_count(mp) != MEMPOOL_SIZE - 1) - RET_ERR(); + /* We have to count the extra caches, one in this case. */ + offset = use_external_cache ? 1 * cache->len : 0; + if (rte_mempool_count(mp) + offset != MEMPOOL_SIZE - 1) { + LOG_ERR(); + ret = -1; + goto out; + } printf("get private data\n"); if (rte_mempool_get_priv(mp) != (char *)mp + - MEMPOOL_HEADER_SIZE(mp, mp->cache_size)) - RET_ERR(); + MEMPOOL_HEADER_SIZE(mp, mp->cache_size)) { + LOG_ERR(); + ret = -1; + goto out; + } #ifndef RTE_EXEC_ENV_BSDAPP /* rte_mem_virt2phy() not supported on bsd */ printf("get physical address of an object\n"); - if (rte_mempool_virt2phy(mp, obj) != rte_mem_virt2phy(obj)) - RET_ERR(); + if (rte_mempool_virt2phy(mp, obj) != rte_mem_virt2phy(obj)) { + LOG_ERR(); + ret = -1; + goto out; + } #endif printf("put the object back\n"); - rte_mempool_put(mp, obj); + rte_mempool_generic_put(mp, &obj, 1, cache, 0); rte_mempool_dump(stdout, mp); printf("get 2 objects\n"); - if (rte_mempool_get(mp, &obj) < 0) - RET_ERR(); - if (rte_mempool_get(mp, &obj2) < 0) { - rte_mempool_put(mp, obj); - RET_ERR(); + if (rte_mempool_generic_get(mp, &obj, 1, cache, 0) < 0) { + LOG_ERR(); + ret = -1; + goto out; + } + if (rte_mempool_generic_get(mp, &obj2, 1, cache, 0) < 0) { + rte_mempool_generic_put(mp, &obj, 1, cache, 0); + LOG_ERR(); + ret = -1; + goto out; + } rte_mempool_dump(stdout, mp); printf("put the objects back\n"); - rte_mempool_put(mp, obj); - rte_mempool_put(mp, obj2); + rte_mempool_generic_put(mp, &obj, 1, cache, 0); +
[dpdk-dev] [PATCH v3 2/3] mempool: use bit flags instead of is_mp and is_mc
Pass the same flags as in rte_mempool_create().

Changes the API calls:

    rte_mempool_generic_put(mp, obj_table, n, flags)
    rte_mempool_generic_get(mp, obj_table, n, flags)

Signed-off-by: Lazaros Koromilas
---
 lib/librte_mempool/rte_mempool.h | 58 +---
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7446843..191edba 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -949,12 +949,13 @@ void rte_mempool_dump(FILE *f, struct rte_mempool *mp);
  * @param n
  *   The number of objects to store back in the mempool, must be strictly
  *   positive.
- * @param is_mp
- *   Mono-producer (0) or multi-producers (1).
+ * @param flags
+ *   The flags used for the mempool creation.
+ *   Single-producer (MEMPOOL_F_SP_PUT flag) or multi-producers.
  */
 static inline void __attribute__((always_inline))
 __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
-		      unsigned n, int is_mp)
+		      unsigned n, int flags)
 {
 	struct rte_mempool_cache *cache;
 	uint32_t index;
@@ -967,7 +968,7 @@ __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
 	__MEMPOOL_STAT_ADD(mp, put, n);

 	/* cache is not enabled or single producer or non-EAL thread */
-	if (unlikely(cache_size == 0 || is_mp == 0 ||
+	if (unlikely(cache_size == 0 || flags & MEMPOOL_F_SP_PUT ||
 		     lcore_id >= RTE_MAX_LCORE))
 		goto ring_enqueue;
@@ -1020,15 +1021,16 @@ ring_enqueue:
  *   A pointer to a table of void * pointers (objects).
  * @param n
  *   The number of objects to add in the mempool from the obj_table.
- * @param is_mp
- *   Mono-producer (0) or multi-producers (1).
+ * @param flags
+ *   The flags used for the mempool creation.
+ *   Single-producer (MEMPOOL_F_SP_PUT flag) or multi-producers.
  */
 static inline void __attribute__((always_inline))
 rte_mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
-			unsigned n, int is_mp)
+			unsigned n, int flags)
 {
 	__mempool_check_cookies(mp, obj_table, n, 0);
-	__mempool_generic_put(mp, obj_table, n, is_mp);
+	__mempool_generic_put(mp, obj_table, n, flags);
 }

 /**
@@ -1046,7 +1048,7 @@
 __rte_deprecated static inline void __attribute__((always_inline))
 rte_mempool_mp_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 			unsigned n)
 {
-	rte_mempool_generic_put(mp, obj_table, n, 1);
+	rte_mempool_generic_put(mp, obj_table, n, 0);
 }

 /**
@@ -1064,7 +1066,7 @@
 __rte_deprecated static inline void __attribute__((always_inline))
 rte_mempool_sp_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 			unsigned n)
 {
-	rte_mempool_generic_put(mp, obj_table, n, 0);
+	rte_mempool_generic_put(mp, obj_table, n, MEMPOOL_F_SP_PUT);
 }

 /**
@@ -1085,8 +1087,7 @@
 static inline void __attribute__((always_inline))
 rte_mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 		     unsigned n)
 {
-	rte_mempool_generic_put(mp, obj_table, n,
-				!(mp->flags & MEMPOOL_F_SP_PUT));
+	rte_mempool_generic_put(mp, obj_table, n, mp->flags);
 }

 /**
@@ -1101,7 +1102,7 @@
 __rte_deprecated static inline void __attribute__((always_inline))
 rte_mempool_mp_put(struct rte_mempool *mp, void *obj)
 {
-	rte_mempool_generic_put(mp, &obj, 1, 1);
+	rte_mempool_generic_put(mp, &obj, 1, 0);
 }

 /**
@@ -1116,7 +1117,7 @@
 __rte_deprecated static inline void __attribute__((always_inline))
 rte_mempool_sp_put(struct rte_mempool *mp, void *obj)
 {
-	rte_mempool_generic_put(mp, &obj, 1, 0);
+	rte_mempool_generic_put(mp, &obj, 1, MEMPOOL_F_SP_PUT);
 }

 /**
@@ -1145,15 +1146,16 @@
  *   A pointer to a table of void * pointers (objects).
  * @param n
  *   The number of objects to get, must be strictly positive.
- * @param is_mc
- *   Mono-consumer (0) or multi-consumers (1).
+ * @param flags
+ *   The flags used for the mempool creation.
+ *   Single-consumer (MEMPOOL_F_SC_GET flag) or multi-consumers.
  * @return
  *   - >=0: Success; number of objects supplied.
  *   - <0: Error; code of ring dequeue function.
  */
 static inline int __attribute__((always_inline))
 __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
-		      unsigned n, int is_mc)
+		      unsigned n, int flags)
 {
 	int ret;
 	struct rte_mempool_cache *cache;
@@ -1163,7 +1165,7 @@ __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
 	uint32_t cache_size = mp->cache_size;

 	/* cache is not enabled or
[dpdk-dev] [PATCH v3 1/3] mempool: deprecate specific get/put functions
This commit introduces the API calls:

    rte_mempool_generic_put(mp, obj_table, n, is_mp)
    rte_mempool_generic_get(mp, obj_table, n, is_mc)

Deprecates the API calls:

    rte_mempool_mp_put_bulk(mp, obj_table, n)
    rte_mempool_sp_put_bulk(mp, obj_table, n)
    rte_mempool_mp_put(mp, obj)
    rte_mempool_sp_put(mp, obj)
    rte_mempool_mc_get_bulk(mp, obj_table, n)
    rte_mempool_sc_get_bulk(mp, obj_table, n)
    rte_mempool_mc_get(mp, obj_p)
    rte_mempool_sc_get(mp, obj_p)

We also check cookies in one place now.

Signed-off-by: Lazaros Koromilas
---
 app/test/test_mempool.c         |  10 ++--
 lib/librte_mempool/rte_mempool.h | 115 +++
 2 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index bcf379b..10d706f 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -338,7 +338,7 @@ static int test_mempool_single_producer(void)
 			printf("obj not owned by this mempool\n");
 			RET_ERR();
 		}
-		rte_mempool_sp_put(mp_spsc, obj);
+		rte_mempool_put(mp_spsc, obj);
 		rte_spinlock_lock(&scsp_spinlock);
 		scsp_obj_table[i] = NULL;
 		rte_spinlock_unlock(&scsp_spinlock);
@@ -371,7 +371,7 @@ static int test_mempool_single_consumer(void)
 		rte_spinlock_unlock(&scsp_spinlock);
 		if (i >= MAX_KEEP)
 			continue;
-		if (rte_mempool_sc_get(mp_spsc, &obj) < 0)
+		if (rte_mempool_get(mp_spsc, &obj) < 0)
 			break;
 		rte_spinlock_lock(&scsp_spinlock);
 		scsp_obj_table[i] = obj;
@@ -477,13 +477,13 @@ test_mempool_basic_ex(struct rte_mempool *mp)
 	}

 	for (i = 0; i < MEMPOOL_SIZE; i++) {
-		if (rte_mempool_mc_get(mp, &obj[i]) < 0) {
+		if (rte_mempool_get(mp, &obj[i]) < 0) {
 			printf("test_mp_basic_ex fail to get object for [%u]\n",
 				i);
 			goto fail_mp_basic_ex;
 		}
 	}
-	if (rte_mempool_mc_get(mp, &err_obj) == 0) {
+	if (rte_mempool_get(mp, &err_obj) == 0) {
 		printf("test_mempool_basic_ex get an impossible obj\n");
 		goto fail_mp_basic_ex;
 	}
@@ -494,7 +494,7 @@ test_mempool_basic_ex(struct rte_mempool *mp)
 	}

 	for (i = 0; i < MEMPOOL_SIZE; i++)
-		rte_mempool_mp_put(mp, obj[i]);
+		rte_mempool_put(mp, obj[i]);

 	if (rte_mempool_full(mp) != 1) {
 		printf("test_mempool_basic_ex the mempool should be full\n");
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 92deb42..7446843 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -953,8 +953,8 @@ void rte_mempool_dump(FILE *f, struct rte_mempool *mp);
  *   Mono-producer (0) or multi-producers (1).
  */
 static inline void __attribute__((always_inline))
-__mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
-		   unsigned n, int is_mp)
+__mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
+		      unsigned n, int is_mp)
 {
 	struct rte_mempool_cache *cache;
 	uint32_t index;
@@ -1012,7 +1012,7 @@ ring_enqueue:

 /**
- * Put several objects back in the mempool (multi-producers safe).
+ * Put several objects back in the mempool.
  *
  * @param mp
  *   A pointer to the mempool structure.
@@ -1020,16 +1020,37 @@ ring_enqueue:
  *   A pointer to a table of void * pointers (objects).
  * @param n
  *   The number of objects to add in the mempool from the obj_table.
+ * @param is_mp
+ *   Mono-producer (0) or multi-producers (1).
  */
 static inline void __attribute__((always_inline))
+rte_mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
+			unsigned n, int is_mp)
+{
+	__mempool_check_cookies(mp, obj_table, n, 0);
+	__mempool_generic_put(mp, obj_table, n, is_mp);
+}
+
+/**
+ * @deprecated
+ * Put several objects back in the mempool (multi-producers safe).
+ *
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the mempool from the obj_table.
+ */
+__rte_deprecated static inline void __attribute__((always_inline))
 rte_mempool_mp_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 			unsigned n)
 {
-	__mempool_check_cookies(mp, obj_table, n, 0);
-	__mempool_put_bulk(mp, obj_table, n, 1);
+	rte_mempool_generic_put(mp, obj_table, n, 1);
 }

 /**
+ * @deprecated
  * Put several objects back in the mempool (NOT multi-producers safe).
  *
  * @param mp
@@ -1039,12 +1060,11 @@ rte_mempool_mp_put_bulk(struct rte_mempool *mp, void * const *obj_table,
  * @param
[dpdk-dev] [PATCH v3 0/3] mempool: user-owned mempool caches
Updated version of the user-owned cache patchset. It applies on top of the
latest external mempool manager patches from David Hunt [1].

[1] http://dpdk.org/ml/archives/dev/2016-June/041479.html

v3 changes:

 * Deprecate specific mempool API calls instead of removing them.
 * Split deprecation into a separate commit to limit noise.
 * Fix cache flush by setting cache->len = 0 and make it inline.
 * Remove cache->size == 0 checks and ensure size != 0 at creation.
 * Fix tests to check if cache creation succeeded.
 * Fix tests to free allocated resources on error.

The mempool cache is only available to EAL threads as a per-lcore
resource. Change this so that the user can create and provide their own
cache on mempool get and put operations. This works with non-EAL threads
too.

Also, deprecate the explicit {mp,sp}_put and {mc,sc}_get calls and
re-route them through the new generic calls. Minor cleanup to pass the
mempool bit flags instead of using specific is_mp and is_mc. The old
cache-oblivious API calls use the per-lcore default local cache.

The mempool and mempool_perf tests are also updated to handle the
user-owned cache case.
Introduced API calls:

    rte_mempool_cache_create(size, socket_id)
    rte_mempool_cache_free(cache)
    rte_mempool_cache_flush(cache, mp)
    rte_mempool_default_cache(mp, lcore_id)
    rte_mempool_generic_put(mp, obj_table, n, cache, flags)
    rte_mempool_generic_get(mp, obj_table, n, cache, flags)

Deprecated API calls:

    rte_mempool_mp_put_bulk(mp, obj_table, n)
    rte_mempool_sp_put_bulk(mp, obj_table, n)
    rte_mempool_mp_put(mp, obj)
    rte_mempool_sp_put(mp, obj)
    rte_mempool_mc_get_bulk(mp, obj_table, n)
    rte_mempool_sc_get_bulk(mp, obj_table, n)
    rte_mempool_mc_get(mp, obj_p)
    rte_mempool_sc_get(mp, obj_p)

Lazaros Koromilas (3):
  mempool: deprecate specific get/put functions
  mempool: use bit flags instead of is_mp and is_mc
  mempool: allow for user-owned mempool caches

 app/test/test_mempool.c          | 104 +++-
 app/test/test_mempool_perf.c     |  70 +--
 lib/librte_mempool/rte_mempool.c |  66 +-
 lib/librte_mempool/rte_mempool.h | 256 +--
 4 files changed, 385 insertions(+), 111 deletions(-)

--
1.9.1
[dpdk-dev] [PATCH v5 00/25] DPDK PMD for ThunderX NIC device
On Thu, Jun 16, 2016 at 03:01:02PM +0530, Jerin Jacob wrote: > On Wed, Jun 15, 2016 at 03:39:25PM +0100, Bruce Richardson wrote: > > On Wed, Jun 15, 2016 at 12:36:15AM +0530, Jerin Jacob wrote: > > > This patch set provides the initial version of DPDK PMD for the > > > built-in NIC device in Cavium ThunderX SoC family. > > > > > > Implemented features and ThunderX nicvf PMD documentation added > > > in doc/guides/nics/overview.rst and doc/guides/nics/thunderx.rst > > > respectively in this patch set. > > > > > > These patches are checked using checkpatch.sh with following > > > additional ignore option: > > > options="$options --ignore=CAMELCASE,BRACKET_SPACE" > > > CAMELCASE - To accommodate PRIx64 > > > BRACKET_SPACE - To accommodate AT inline line assembly in two places > > > > > > This patch set is based on DPDK 16.07-RC1 > > > and tested with git HEAD change-set > > > ca173a909538a2f1082cd0dcb4d778a97dab69c3 along with > > > following depended patch > > > > > > http://dpdk.org/dev/patchwork/patch/11826/ > > > ethdev: add tunnel and port RSS offload types > > > > > Hi Jerin, > > > > hopefully a final set of comments before merge on this set, as it's looking > > very good now. > > > > * Two patches look like they need to be split, as they are combining > > multiple > > functions into one patch. They are: > > [dpdk-dev,v5,16/25] net/thunderx: add MTU set and promiscuous enable > > support > > [dpdk-dev,v5,20/25] net/thunderx: implement supported ptype get and Rx > > queue count > > For the other patches which add multiple functions, the functions seem to > > be > > logically related so I don't think there is a problem > > > > * check-git-logs.sh is warning about a few of the commit messages being too > > long. > > Splitting patch 20 should fix one of those, but there are a few remaining. > > A number of titles refer to ThunderX in the message, but this is probably > > unnecessary, as the prefix already contains "net/thunderx" in it. > > OK. 
> I will send the next revision.

Please hold off a few hours, as I'm hoping to merge in the bnxt driver
this afternoon. If all goes well, I would appreciate it if you could base
your patchset off the rel_16_07 tree with that set applied - save me
having to resolve conflicts in files like the nic overview doc, which is
always a pain to try and edit. :-)

Regards,
/Bruce
[dpdk-dev] [PATCH] hash: new function to retrieve a key given its position
On Thu, Jun 16, 2016 at 10:23:42AM +, Juan Antonio Montesinos Delgado wrote:
> Hi,
>
> As I understand it, the hash table entry can change position in the first
> hash table but the index in the second hash table remains the same. So,
> regardless of the bucket the entry is in, the index (of the second hash
> table) stored in that entry will be the same. Am I right?
>
> Best,
>
> Juan Antonio

Ah, yes, you are right. The key data should not move, only the hash value.
I'd forgotten that.

/Bruce

> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: Thursday, June 16, 2016 11:50
> To: Yari Adan PETRALANDA
> Cc: pablo.de.lara.guarch at intel.com; Juan Antonio Montesinos Delgado;
> dev at dpdk.org
> Subject: Re: [PATCH] hash: new function to retrieve a key given its position
>
> On Thu, Jun 16, 2016 at 10:22:30AM +0200, Yari Adan Petralanda wrote:
> > The function rte_hash_get_key_with_position is added in this patch.
> > As the position returned when adding a key is frequently used as an
> > offset into an array of user data, this function performs the
> > operation of retrieving a key given this offset.
> >
> > A possible use case would be to delete a key from the hash table when
> > its entry in the array of data has a certain value. For instance, the
> > key could be a flow 5-tuple, and the value stored in the array a
> > timestamp.
> >
> I have my doubts that this will work. With cuckoo hashing, a hash table
> entry can change position multiple times after it is added, as the table
> is reorganised to make room for new entries.
>
> Regards,
> /Bruce
[dpdk-dev] [PATCH v1 02/28] eal: extract function eal_parse_sysfs_valuef
Sorry, didn't notice this email earlier... Comments inline.

> -----Original Message-----
> From: Jan Viktorin [mailto:viktorin at rehivetech.com]
> Sent: Wednesday, June 15, 2016 3:26 PM
> To: Shreyansh Jain
> Cc: dev at dpdk.org; David Marchand; Thomas Monjalon; Bruce Richardson;
> Declan Doherty; jianbo.liu at linaro.org; jerin.jacob at
> caviumnetworks.com; Keith Wiles; Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH v1 02/28] eal: extract function
> eal_parse_sysfs_valuef
>
> On Tue, 14 Jun 2016 04:30:57 +
> Shreyansh Jain wrote:
>
> > Hi Jan,
> [...]
> > >
> > > I almost skipped the '..f' in the name and wondered how two functions
> > > having the same name exist :D
> > >
> > > I agree that a better name would be nice here. This convention was
> > > based on the libc naming (fopen, fclose) but the "f" letter could not
> > > be at the beginning.
> > >
> > > What about one of those?
> > >
> > > * eal_parse_sysfs_fd_value
> > > * eal_parse_sysfs_file_value
> >
> > I don't have any better idea than the above.
> >
> > Though, I still feel that 'eal_parse_sysfs_value ->
> > eal_parse_sysfs_file_value' would be slightly asymmetrical - but again,
> > this is a highly subjective argument.
>
> I don't see any asymmetry here. The functions are equal, just the new one
> accepts a file pointer instead of a path, and we don't have function name
> overloading in C.

Asymmetrical because cascading function names may be additive for easy
reading/recall:

    'eal_parse_sysfs_value ==> eal_parse_sysfs_value_ ==> eal_parse_sysfs_value__'

Obviously, this is not a rule - it just makes reading and recalling the
cascade easier. As for:

    eal_parse_sysfs_value => eal_parse_sysfs_file_value

this inserts an identifier in the middle of a name, making it (slightly)
harder to correlate. Again, as I mentioned earlier, this is a subjective
argument and a matter of (personal!) choice.

> > Or, eal_parse_sysfs_value -> eal_parse_sysfs_value_read() may be...
>
> I think I'll go with eal_parse_sysfs_file_value for v2. Ideally, it
> should be eal_parse_sysfs_path_value and eal_parse_sysfs_file_value.
> Thus, this looks like a good way.
>
> > But eal_parse_sysfs_file_value is still preferred over
> > eal_parse_sysfs_fd_value, for me.
>
> Agree.
> [...]

-
Shreyansh
[dpdk-dev] [PATCH] igb_uio: fix build with backported kernel
On 15 June 2016 at 11:59, Ferruh Yigit wrote:
> On 6/15/2016 4:57 PM, Ferruh Yigit wrote:
> > Following compile error observed with CentOS 6.8, which uses kernel
> > kernel-devel-2.6.32-642.el6.x86_64:
> >
> >   CC eal_thread.o
> > .../build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:
> > In function 'igbuio_msix_mask_irq':
> > .../build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:157:
> > error: 'PCI_MSIX_ENTRY_CTRL_MASKBIT' undeclared (first use in this
> > function)
> >
> > Reported-by: Thiago
> > Signed-off-by: Ferruh Yigit
>
> Hi Thiago,
>
> Can you please test this patch?
>
> Thanks,
> ferruh

Hi Ferruh,

That patch applied and worked (kind of):

---
[root at centos6-1 dpdk-16.04]# patch -p1 < ../dpdk-centos6.patch
patching file lib/librte_eal/linuxapp/igb_uio/compat.h
Hunk #1 succeeded at 24 with fuzz 2.
---

It passed that broken step; however, it is failing in a different part of
the build process now, as follows:

---
[root at centos6-1 ~]# time rpmbuild --ba /root/rpmbuild/SPECS/dpdk.spec
...
...
  LD librte_eal.so.2
  INSTALL-LIB librte_eal.so.2
== Build lib/librte_eal/linuxapp/kni
  LD /root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/built-in.o
  CC [M] /root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_main.o
  CC [M] /root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_api.o
In file included from /root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h:41,
                 from /root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_type.h:31,
                 from /root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_api.h:31,
                 from /root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_api.c:28:
/root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h: In function '__kc_vlan_get_protocol':
/root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h:2836: error: implicit declaration of function 'vlan_tx_tag_present'
make[8]: *** [/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_api.o] Error 1
make[8]: *** Waiting for unfinished jobs
In file included from /root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h:41,
                 from /root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_type.h:31,
                 from /root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_dcb.h:32,
                 from /root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe.h:52,
                 from /root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_main.c:56:
/root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h: In function '__kc_vlan_get_protocol':
/root/rpmbuild/BUILD/dpdk-16.04/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h:2836: error: implicit declaration of function 'vlan_tx_tag_present'
make[8]: *** [/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_main.o] Error 1
make[7]: *** [_module_/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni] Error 2
make[6]: *** [sub-make] Error 2
make[5]: *** [rte_kni.ko] Error 2
make[4]: *** [kni] Error 2
make[3]: *** [linuxapp] Error 2
make[2]: *** [librte_eal] Error 2
make[1]: *** [lib] Error 2
make: *** [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.Naoj9c (%build)
---

Might be a totally different problem now, I don't know... :-)

Best,
Thiago