[dpdk-dev] bond: mode 4 promiscuous mode
Hey guys,

Can we make bond_mode_8023ad_activate_slave() first try to add the bond and LACP multicast MAC addresses to the slave, and fall back to promiscuous mode only if the add fails? In other words:

    if (rte_eth_dev_mac_addr_add(slave_id, bond_mac) != 0 ||
        rte_eth_dev_mac_addr_add(slave_id, lacp_mac) != 0) {
        ...
        rte_eth_promiscuous_enable(slave_id);
    }

Seems to work fine on my setup, but I might be missing something.

Regards,
Andriy
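The proposed control flow can be sketched as follows. This is a minimal Python model of the fallback logic, not the DPDK C code: `add_mac_addr` and `enable_promisc` are stub callbacks standing in for rte_eth_dev_mac_addr_add() and rte_eth_promiscuous_enable(), and the MAC values are illustrative placeholders.

```python
def activate_slave(slave_id, add_mac_addr, enable_promisc,
                   bond_mac="bond-mac", lacp_mac="lacp-mac"):
    """Try to subscribe the slave to the bond and LACP MAC addresses;
    fall back to promiscuous mode only if either add fails."""
    if (add_mac_addr(slave_id, bond_mac) != 0 or
            add_mac_addr(slave_id, lacp_mac) != 0):
        enable_promisc(slave_id)
        return "promiscuous"
    return "mac-filter"
```

With hardware MAC filtering available, the slave avoids the cost of receiving all multicast/unicast traffic; the promiscuous path remains as a safety net for NICs that reject the extra MAC entries.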
[dpdk-dev] [PATCH] ether: fix configure() to use a default for max_rx_pkt_len
At the moment rte_eth_dev_configure() behaves inconsistently:
- for normal frames: an out-of-range max_rx_pkt_len uses a default
- for jumbo frames: an out-of-range max_rx_pkt_len gives an error

This patch fixes the inconsistency by using a default value for max_rx_pkt_len both for normal and jumbo frames.

Signed-off-by: Andriy Berestovskyy
---
 lib/librte_ether/rte_ethdev.c | 20 +++++---------------
 lib/librte_ether/rte_ethdev.h |  6 +++++-
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index eb0a94a..f560051 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -856,21 +856,11 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
      * length is supported by the configured device.
      */
     if (dev_conf->rxmode.jumbo_frame == 1) {
-        if (dev_conf->rxmode.max_rx_pkt_len >
-                dev_info.max_rx_pktlen) {
-            RTE_PMD_DEBUG_TRACE("ethdev port_id=%d max_rx_pkt_len %u"
-                " > max valid value %u\n",
-                port_id,
-                (unsigned)dev_conf->rxmode.max_rx_pkt_len,
-                (unsigned)dev_info.max_rx_pktlen);
-            return -EINVAL;
-        } else if (dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN) {
-            RTE_PMD_DEBUG_TRACE("ethdev port_id=%d max_rx_pkt_len %u"
-                " < min valid value %u\n",
-                port_id,
-                (unsigned)dev_conf->rxmode.max_rx_pkt_len,
-                (unsigned)ETHER_MIN_LEN);
-            return -EINVAL;
+        if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
+                dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN) {
+            /* Use maximum frame size the NIC supports */
+            dev->data->dev_conf.rxmode.max_rx_pkt_len =
+                dev_info.max_rx_pktlen;
         }
     } else {
         if (dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN ||
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4be217c..2adfd77 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -349,7 +349,11 @@ enum rte_eth_tx_mq_mode {
 struct rte_eth_rxmode {
     /** The multi-queue packet distribution mode to be used, e.g. RSS. */
     enum rte_eth_rx_mq_mode mq_mode;
-    uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
+    /**
+     * Desired maximum RX frame size. Too short or too long size will be
+     * substituted by a default value.
+     */
+    uint32_t max_rx_pkt_len;
     uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
     __extension__
     uint16_t header_split : 1, /**< Header Split enable. */
--
2.7.4
[dpdk-dev] [PATCH v2] ether: use a default for max Rx frame size in configure()
At the moment rte_eth_dev_configure() behaves inconsistently:
- for normal frames: an out-of-range max_rx_pkt_len uses a default
- for jumbo frames: an out-of-range max_rx_pkt_len gives an error

This patch fixes the inconsistency by using a default value for max_rx_pkt_len both for normal and jumbo frames.

Signed-off-by: Andriy Berestovskyy
---
Notes:
    v2 changes:
    - reword the commit title according to the check-git-log.sh

 lib/librte_ether/rte_ethdev.c | 20 +++++---------------
 lib/librte_ether/rte_ethdev.h |  6 +++++-
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index eb0a94a..f560051 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -856,21 +856,11 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
      * length is supported by the configured device.
      */
     if (dev_conf->rxmode.jumbo_frame == 1) {
-        if (dev_conf->rxmode.max_rx_pkt_len >
-                dev_info.max_rx_pktlen) {
-            RTE_PMD_DEBUG_TRACE("ethdev port_id=%d max_rx_pkt_len %u"
-                " > max valid value %u\n",
-                port_id,
-                (unsigned)dev_conf->rxmode.max_rx_pkt_len,
-                (unsigned)dev_info.max_rx_pktlen);
-            return -EINVAL;
-        } else if (dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN) {
-            RTE_PMD_DEBUG_TRACE("ethdev port_id=%d max_rx_pkt_len %u"
-                " < min valid value %u\n",
-                port_id,
-                (unsigned)dev_conf->rxmode.max_rx_pkt_len,
-                (unsigned)ETHER_MIN_LEN);
-            return -EINVAL;
+        if (dev_conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen ||
+                dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN) {
+            /* Use maximum frame size the NIC supports */
+            dev->data->dev_conf.rxmode.max_rx_pkt_len =
+                dev_info.max_rx_pktlen;
         }
     } else {
         if (dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN ||
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4be217c..2adfd77 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -349,7 +349,11 @@ enum rte_eth_tx_mq_mode {
 struct rte_eth_rxmode {
     /** The multi-queue packet distribution mode to be used, e.g. RSS. */
     enum rte_eth_rx_mq_mode mq_mode;
-    uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
+    /**
+     * Desired maximum RX frame size. Too short or too long size will be
+     * substituted by a default value.
+     */
+    uint32_t max_rx_pkt_len;
     uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
     __extension__
     uint16_t header_split : 1, /**< Header Split enable. */
--
2.7.4
Re: [dpdk-dev] [PATCH v2] ether: use a default for max Rx frame size in configure()
Hey Qiming,

On 27.03.2017 08:15, Yang, Qiming wrote:
> I don't think this is a bug. Return errors when configure an invalid
> max_rx_pkt_len is suitable for this generic API.

It is not a bug, it is an inconsistency. At the moment we can set max_rx_pkt_len for normal frames, and if it is out of range a default value is used instead. IMO we should expect the same behavior from the same function for jumbo frames.

So at the moment we have:

    jumbo == 0, max_rx_pkt_len == 0,    result: max_rx_pkt_len = ETHER_MAX_LEN
    jumbo == 0, max_rx_pkt_len == 1200, result: max_rx_pkt_len = 1200
    jumbo == 1, max_rx_pkt_len == 0,    result: error
    jumbo == 1, max_rx_pkt_len == 9K,   result: error or max_rx_pkt_len = 9K

> It's not suitable to give a default value in this function.

We use a default value for normal frames at the moment. The comment:

    uint32_t max_rx_pkt_len; /**< Only used if jumbo_frame enabled. */

is obsolete: in fact we use max_rx_pkt_len both for jumbo and normal frames. So the patch clarifies this as well.

Regards,
Andriy
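The four cases above can be modelled in a few lines. This is a toy Python model of the pre-patch behavior being discussed (the values, not the real DPDK code); ETHER_MIN_LEN/ETHER_MAX_LEN are the standard Ethernet 64/1518 byte limits, and `nic_max` stands in for dev_info.max_rx_pktlen.

```python
ETHER_MIN_LEN = 64
ETHER_MAX_LEN = 1518

def current_behavior(jumbo, max_rx_pkt_len, nic_max=9200):
    """Return the effective max RX frame length, or "error" for -EINVAL,
    mirroring the inconsistency: jumbo rejects out-of-range values,
    non-jumbo silently substitutes a default."""
    if jumbo:
        if max_rx_pkt_len > nic_max or max_rx_pkt_len < ETHER_MIN_LEN:
            return "error"          # rte_eth_dev_configure() returns -EINVAL
        return max_rx_pkt_len
    if (max_rx_pkt_len < ETHER_MIN_LEN or
            max_rx_pkt_len > ETHER_MAX_LEN):
        return ETHER_MAX_LEN        # silently use the default
    return max_rx_pkt_len
```

Running the four cases from the mail reproduces the table: only the jumbo path errors out on a zero length.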
[dpdk-dev] [PATCH] usertools: use /sys/devices/system/cpu for CPU layout script
Some platforms do not have core/socket info in /proc/cpuinfo.

Signed-off-by: Andriy Berestovskyy
---
 usertools/cpu_layout.py | 53 ++++++++++++++++++++++---------------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/usertools/cpu_layout.py b/usertools/cpu_layout.py
index 0e049a6..5735891 100755
--- a/usertools/cpu_layout.py
+++ b/usertools/cpu_layout.py
@@ -4,6 +4,7 @@
 #   BSD LICENSE
 #
 #   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2017 Cavium Networks Ltd. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -38,40 +39,32 @@
 sockets = []
 cores = []
 core_map = {}
-
-fd = open("/proc/cpuinfo")
-lines = fd.readlines()
+base_path = "/sys/devices/system/cpu"
+fd = open("{}/kernel_max".format(base_path))
+max_cpus = int(fd.read())
 fd.close()
-
-core_details = []
-core_lines = {}
-for line in lines:
-    if len(line.strip()) != 0:
-        name, value = line.split(":", 1)
-        core_lines[name.strip()] = value.strip()
-    else:
-        core_details.append(core_lines)
-        core_lines = {}
-
-for core in core_details:
-    for field in ["processor", "core id", "physical id"]:
-        if field not in core:
-            print("Error getting '%s' value from /proc/cpuinfo" % field)
-            sys.exit(1)
-        core[field] = int(core[field])
-
-    if core["core id"] not in cores:
-        cores.append(core["core id"])
-    if core["physical id"] not in sockets:
-        sockets.append(core["physical id"])
-    key = (core["physical id"], core["core id"])
+for cpu in xrange(max_cpus + 1):
+    try:
+        fd = open("{}/cpu{}/topology/core_id".format(base_path, cpu))
+    except:
+        break
+    core = int(fd.read())
+    fd.close()
+    fd = open("{}/cpu{}/topology/physical_package_id".format(base_path, cpu))
+    socket = int(fd.read())
+    fd.close()
+    if core not in cores:
+        cores.append(core)
+    if socket not in sockets:
+        sockets.append(socket)
+    key = (socket, core)
     if key not in core_map:
         core_map[key] = []
-    core_map[key].append(core["processor"])
+    core_map[key].append(cpu)
 
-print("============================================================")
-print("Core and Socket Information (as reported by '/proc/cpuinfo')")
-print("============================================================\n")
+print(format("=" * (47 + len(base_path))))
+print("Core and Socket Information (as reported by '{}')".format(base_path))
+print("{}\n".format("=" * (47 + len(base_path))))
 print("cores = ", cores)
 print("sockets = ", sockets)
 print("")
--
2.7.4
[dpdk-dev] [PATCH 2/5] examples/ip_pipeline: avoid panic if link up/down is not supported
Some PMDs (mostly VFs) do not provide link up/down functionality.

Signed-off-by: Andriy Berestovskyy
---
 examples/ip_pipeline/init.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index 1dc2a04..be148fc 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -717,7 +717,8 @@ app_link_up_internal(struct app_params *app, struct app_link_params *cp)
     /* PMD link up */
     status = rte_eth_dev_set_link_up(cp->pmd_id);
-    if (status < 0)
+    /* Do not panic if PMD does not provide link up functionality */
+    if (status < 0 && status != -ENOTSUP)
         rte_panic("%s (%" PRIu32 "): PMD set link up error %" PRId32 "\n",
             cp->name, cp->pmd_id, status);
@@ -733,7 +734,8 @@ app_link_down_internal(struct app_params *app, struct app_link_params *cp)
     /* PMD link down */
     status = rte_eth_dev_set_link_down(cp->pmd_id);
-    if (status < 0)
+    /* Do not panic if PMD does not provide link down functionality */
+    if (status < 0 && status != -ENOTSUP)
         rte_panic("%s (%" PRIu32 "): PMD set link down error %" PRId32 "\n",
             cp->name, cp->pmd_id, status);
--
2.7.4
[dpdk-dev] [PATCH 1/5] examples/ip_pipeline: add support for more than 32 CPUs
At the moment the ip_pipeline example uses a hard-coded limit of 32 CPUs during initialization, which leads to an error on systems with more than 32 CPUs.

Signed-off-by: Andriy Berestovskyy
---
 examples/ip_pipeline/init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index c7f9470..1dc2a04 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -69,7 +69,8 @@ static void
 app_init_core_map(struct app_params *app)
 {
     APP_LOG(app, HIGH, "Initializing CPU core map ...");
-    app->core_map = cpu_core_map_init(4, 32, 4, 0);
+    app->core_map = cpu_core_map_init(RTE_MAX_NUMA_NODES, RTE_MAX_LCORE,
+        4, 0);
 
     if (app->core_map == NULL)
         rte_panic("Cannot create CPU core map\n");
--
2.7.4
[dpdk-dev] [PATCH 4/5] port: fix file descriptor reader
The code should return the actual number of packets read.

Fixes: 5a99f208 ("port: support file descriptor")

Signed-off-by: Andriy Berestovskyy
---
 lib/librte_port/rte_port_fd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_port/rte_port_fd.c b/lib/librte_port/rte_port_fd.c
index 03e69f5..914dfac 100644
--- a/lib/librte_port/rte_port_fd.c
+++ b/lib/librte_port/rte_port_fd.c
@@ -108,7 +108,7 @@ static int
 rte_port_fd_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
 {
     struct rte_port_fd_reader *p = (struct rte_port_fd_reader *) port;
-    uint32_t i;
+    uint32_t i, j;
 
     if (rte_pktmbuf_alloc_bulk(p->mempool, pkts, n_pkts) != 0)
         return 0;
@@ -126,12 +126,12 @@ rte_port_fd_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
         pkt->pkt_len = n_bytes;
     }
 
-    for ( ; i < n_pkts; i++)
-        rte_pktmbuf_free(pkts[i]);
+    for (j = i; j < n_pkts; j++)
+        rte_pktmbuf_free(pkts[j]);
 
     RTE_PORT_FD_READER_STATS_PKTS_IN_ADD(p, i);
-    return n_pkts;
+    return i;
 }
 
 static int
--
2.7.4
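The fixed receive path boils down to: allocate n_pkts buffers up front, fill as many as the descriptor yields, free the unused tail, and return the count actually read instead of n_pkts. A minimal Python sketch of that accounting, with stub callbacks in place of the mempool and read(2) calls:

```python
def fd_reader_rx(alloc_bulk, read_pkt, free_pkt, n_pkts):
    """Return (count, packets): allocate n_pkts buffers, fill i of them
    from the descriptor, free the remaining n_pkts - i, return i."""
    pkts = alloc_bulk(n_pkts)
    if pkts is None:                 # bulk allocation failed
        return 0, []
    i = 0
    while i < n_pkts:
        data = read_pkt()
        if data is None:             # no more data on the descriptor
            break
        pkts[i] = data
        i += 1
    for j in range(i, n_pkts):       # free the unread buffers
        free_pkt(pkts[j])
    return i, pkts[:i]
```

Returning `i` matters because callers feed the return value into downstream pipeline stages; returning n_pkts would hand them freed buffers.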
[dpdk-dev] [PATCH 3/5] port: use mbuf alloc bulk instead of mempool
Makes the code a bit cleaner and type-aware.

Signed-off-by: Andriy Berestovskyy
---
 lib/librte_port/rte_port_fd.c          | 7 +------
 lib/librte_port/rte_port_source_sink.c | 7 +------
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/lib/librte_port/rte_port_fd.c b/lib/librte_port/rte_port_fd.c
index 0d640f3..03e69f5 100644
--- a/lib/librte_port/rte_port_fd.c
+++ b/lib/librte_port/rte_port_fd.c
@@ -110,15 +110,10 @@ rte_port_fd_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
     struct rte_port_fd_reader *p = (struct rte_port_fd_reader *) port;
     uint32_t i;
 
-    if (rte_mempool_get_bulk(p->mempool, (void **) pkts, n_pkts) != 0)
+    if (rte_pktmbuf_alloc_bulk(p->mempool, pkts, n_pkts) != 0)
         return 0;
 
     for (i = 0; i < n_pkts; i++) {
-        rte_mbuf_refcnt_set(pkts[i], 1);
-        rte_pktmbuf_reset(pkts[i]);
-    }
-
-    for (i = 0; i < n_pkts; i++) {
         struct rte_mbuf *pkt = pkts[i];
         void *pkt_data = rte_pktmbuf_mtod(pkt, void *);
         ssize_t n_bytes;
diff --git a/lib/librte_port/rte_port_source_sink.c b/lib/librte_port/rte_port_source_sink.c
index 4cad710..796418a 100644
--- a/lib/librte_port/rte_port_source_sink.c
+++ b/lib/librte_port/rte_port_source_sink.c
@@ -289,14 +289,9 @@ rte_port_source_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
     struct rte_port_source *p = (struct rte_port_source *) port;
     uint32_t i;
 
-    if (rte_mempool_get_bulk(p->mempool, (void **) pkts, n_pkts) != 0)
+    if (rte_pktmbuf_alloc_bulk(p->mempool, pkts, n_pkts) != 0)
         return 0;
 
-    for (i = 0; i < n_pkts; i++) {
-        rte_mbuf_refcnt_set(pkts[i], 1);
-        rte_pktmbuf_reset(pkts[i]);
-    }
-
     if (p->pkt_buff != NULL) {
         for (i = 0; i < n_pkts; i++) {
             uint8_t *pkt_data = rte_pktmbuf_mtod(pkts[i],
--
2.7.4
[dpdk-dev] [PATCH 5/5] port: minor typo
Signed-off-by: Andriy Berestovskyy
---
 lib/librte_port/rte_port_ethdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_port/rte_port_ethdev.c b/lib/librte_port/rte_port_ethdev.c
index 5aaa8f7..6862849 100644
--- a/lib/librte_port/rte_port_ethdev.c
+++ b/lib/librte_port/rte_port_ethdev.c
@@ -456,8 +456,8 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void *port,
         return 0;
 
     /*
-     * If we didnt manage to send all packets in single burst, move
-     * remaining packets to the buffer and call send burst.
+     * If we did not manage to send all packets in single burst,
+     * move remaining packets to the buffer and call send burst.
      */
     for (; n_pkts_ok < n_pkts; n_pkts_ok++) {
         struct rte_mbuf *pkt = pkts[n_pkts_ok];
--
2.7.4
[dpdk-dev] [PATCH 1/2] net/thunderx: add empty link up/down callbacks
Some applications and DPDK examples expect link up/down functionality to be provided.

Signed-off-by: Andriy Berestovskyy
---
 drivers/net/thunderx/nicvf_ethdev.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c
index 1060319..984c218 100644
--- a/drivers/net/thunderx/nicvf_ethdev.c
+++ b/drivers/net/thunderx/nicvf_ethdev.c
@@ -1924,11 +1924,25 @@ nicvf_dev_configure(struct rte_eth_dev *dev)
     return 0;
 }
 
+static int
+nicvf_dev_set_link_up(struct rte_eth_dev *dev __rte_unused)
+{
+    return 0;
+}
+
+static int
+nicvf_dev_set_link_down(struct rte_eth_dev *dev __rte_unused)
+{
+    return 0;
+}
+
 /* Initialize and register driver with DPDK Application */
 static const struct eth_dev_ops nicvf_eth_dev_ops = {
     .dev_configure            = nicvf_dev_configure,
     .dev_start                = nicvf_dev_start,
     .dev_stop                 = nicvf_dev_stop,
+    .dev_set_link_up          = nicvf_dev_set_link_up,
+    .dev_set_link_down        = nicvf_dev_set_link_down,
     .link_update              = nicvf_dev_link_update,
     .dev_close                = nicvf_dev_close,
     .stats_get                = nicvf_dev_stats_get,
--
2.7.4
[dpdk-dev] [PATCH 2/2] net/thunderx: wait to complete during link update
Some DPDK applications/examples check link status on their start. NICVF does not wait for the link, so those apps fail. Wait up to 9 seconds for the link, as other PMDs do, in order to fix those apps/examples.

Signed-off-by: Andriy Berestovskyy
---
 drivers/net/thunderx/nicvf_ethdev.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c
index 984c218..2fe653a 100644
--- a/drivers/net/thunderx/nicvf_ethdev.c
+++ b/drivers/net/thunderx/nicvf_ethdev.c
@@ -145,16 +145,29 @@ nicvf_periodic_alarm_stop(void (fn)(void *), void *arg)
  * Return 0 means link status changed, -1 means not changed
  */
 static int
-nicvf_dev_link_update(struct rte_eth_dev *dev,
-              int wait_to_complete __rte_unused)
+nicvf_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 {
+#define CHECK_INTERVAL 100  /* 100ms */
+#define MAX_CHECK_TIME 90   /* 9s (90 * 100ms) in total */
     struct rte_eth_link link;
     struct nicvf *nic = nicvf_pmd_priv(dev);
+    int i;
 
     PMD_INIT_FUNC_TRACE();
 
-    memset(&link, 0, sizeof(link));
-    nicvf_set_eth_link_status(nic, &link);
+    if (wait_to_complete) {
+        /* rte_eth_link_get() might need to wait up to 9 seconds */
+        for (i = 0; i < MAX_CHECK_TIME; i++) {
+            memset(&link, 0, sizeof(link));
+            nicvf_set_eth_link_status(nic, &link);
+            if (link.link_status)
+                break;
+            rte_delay_ms(CHECK_INTERVAL);
+        }
+    } else {
+        memset(&link, 0, sizeof(link));
+        nicvf_set_eth_link_status(nic, &link);
+    }
     return nicvf_atomic_write_link_status(dev, &link);
 }
--
2.7.4
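The polling pattern in the patch — up to 90 checks at 100 ms intervals when wait_to_complete is set, a single check otherwise — can be modelled with stub callbacks. This Python sketch stands in for the C loop; `get_link_status` replaces nicvf_set_eth_link_status() and `sleep_ms` replaces rte_delay_ms():

```python
CHECK_INTERVAL_MS = 100   # 100 ms between polls
MAX_CHECK_TIME = 90       # 90 * 100 ms = 9 s in total

def link_update(get_link_status, wait_to_complete,
                sleep_ms=lambda ms: None):
    """Return the link status, polling up to ~9 s if wait_to_complete
    is set; otherwise read the status exactly once."""
    if not wait_to_complete:
        return get_link_status()
    status = False
    for _ in range(MAX_CHECK_TIME):
        status = get_link_status()
        if status:
            break                  # link is up, stop polling early
        sleep_ms(CHECK_INTERVAL_MS)
    return status
```

The early break is what keeps the common case cheap: the full 9-second budget is only spent when the link genuinely never comes up.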
[dpdk-dev] [PATCH] mempool: few typos
Signed-off-by: Andriy Berestovskyy
---
 lib/librte_mempool/rte_mempool.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 991feaa..898f443 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -654,7 +654,7 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, void *);
  *     when using rte_mempool_get() or rte_mempool_get_bulk() is
  *     "single-consumer". Otherwise, it is "multi-consumers".
  *   - MEMPOOL_F_NO_PHYS_CONTIG: If set, allocated objects won't
- *     necessarilly be contiguous in physical memory.
+ *     necessarily be contiguous in physical memory.
  * @return
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. Possible rte_errno values include:
@@ -794,7 +794,7 @@ rte_mempool_free(struct rte_mempool *mp);
  * Add physically contiguous memory for objects in the pool at init
  *
  * Add a virtually and physically contiguous memory chunk in the pool
- * where objects can be instanciated.
+ * where objects can be instantiated.
  *
  * If the given physical address is unknown (paddr = RTE_BAD_PHYS_ADDR),
  * the chunk doesn't need to be physically contiguous (only virtually),
@@ -825,7 +825,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
 * Add physical memory for objects in the pool at init
 *
 * Add a virtually contiguous memory chunk in the pool where objects can
- * be instanciated. The physical addresses corresponding to the virtual
+ * be instantiated. The physical addresses corresponding to the virtual
 * area are described in paddr[], pg_num, pg_shift.
 *
 * @param mp
@@ -856,7 +856,7 @@ int rte_mempool_populate_phys_tab(struct rte_mempool *mp, char *vaddr,
 * Add virtually contiguous memory for objects in the pool at init
 *
 * Add a virtually contiguous memory chunk in the pool where objects can
- * be instanciated.
+ * be instantiated.
 *
 * @param mp
 *   A pointer to the mempool structure.
--
2.7.4
[dpdk-dev] lpm performance
Hey,

You are correct. The LPM might need just one (TBL24) or two memory reads (TBL24 + TBL8). The performance also drops once you have a variety of destination addresses instead of just one (cache misses).

In your case, for the dst IP 192.168.1.2 you will have two memory reads (TBL24 + TBL8), because the 192.168.1/24 block has the more specific route 192.168.1.1/32.

Regards,
Andriy

On Tue, Sep 20, 2016 at 12:18 AM, ?? wrote:
> Hi all,
>
> Does anyone test IPv4 performance? If so, what's the throughput? I can get
> almost 10Gb with 64 byte packets. But before the test, I would expect it
> will be less than 10G. I thought the performance will not be affected by the
> number of rule entries. But the throughput will be related to whether the
> flow needs to check the second layer table: TBL8. Is my understanding
> correct? I added these flow entries following this link:
> http://www.slideshare.net/garyachy/understanding-ddpd-algorithmics
> slide 10,
>
> struct ipv4_lpm_route ipv4_lpm_route_array[] = {
>     {IPv4(192, 168, 0, 0), 16, 0},
>     {IPv4(192, 168, 1, 0), 24, 1},
>     {IPv4(192, 168, 1, 1), 32, 2}
> };
>
> send the flow with dst IP: 192.168.1.2
>
> It should check the second layer table. But the performance is still 10G.
> Does any part go wrong with my setup? Or it really can achieve 10G with 64
> byte packet size.
>
> Thanks,

--
Andriy Berestovskyy
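The two-level lookup described above can be demonstrated with a toy model. This is an illustrative Python sketch of the TBL24/TBL8 idea, not the real rte_lpm implementation: TBL24 is keyed by the top 24 bits of the address, and any route longer than /24 forces its /24 block out into a TBL8, which costs the second memory read. The lookup returns (next_hop, number_of_table_reads).

```python
def build_lpm(routes):
    """routes: list of (prefix, depth, next_hop). Returns a lookup
    function mapping an IPv4 string to (next_hop, table_reads)."""
    def ip2int(s):
        a, b, c, d = (int(x) for x in s.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d

    tbl24, tbl8 = {}, {}
    # insert shortest prefixes first so longer ones overwrite them
    for prefix, depth, nh in sorted(routes, key=lambda r: r[1]):
        base = ip2int(prefix)
        if depth <= 24:
            for i in range(base >> 8, (base + (1 << (32 - depth))) >> 8):
                tbl24[i] = nh
        else:
            idx = base >> 8
            # expand the /24 entry into a second-level table
            tbl8.setdefault(idx, {"default": tbl24.get(idx)})
            for last in range(base & 0xff,
                              (base & 0xff) + (1 << (32 - depth))):
                tbl8[idx][last] = nh
            tbl24[idx] = ("tbl8", idx)

    def lookup(ip):
        e = tbl24.get(ip2int(ip) >> 8)
        if isinstance(e, tuple):            # second memory read needed
            t = tbl8[e[1]]
            return t.get(ip2int(ip) & 0xff, t["default"]), 2
        return e, 1
    return lookup
```

With the three routes from the mail, 192.168.1.2 matches the /24 route but still costs two reads, because the /32 route forced its whole /24 block into a TBL8 — exactly the effect discussed above.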
[dpdk-dev] lpm performance
AFAIR Intel hardware should do the 10 Gbit/s line rate (i.e. ~14.8 Mpps with 64-byte packets) with one flow and LPM quite easily. Sorry, I don't have numbers to share at hand.

Regarding the tool, please see pktgen-dpdk or TRex. Regarding the number of flows and the overall benchmarking methodology, please see RFC 2544.

Andriy

On Tue, Sep 20, 2016 at 12:47 PM, ?? wrote:
> Thanks so much for your reply! Usually how did you test lpm performance
> with variety of destination addresses? use which tool send the traffic? how
> many flows rules will you add? what's the performance you get?
>
> At 2016-09-20 17:41:13, "Andriy Berestovskyy" wrote:
>> Hey,
>> You are correct. The LPM might need just one (TBL24) or two memory
>> reads (TBL24 + TBL8). The performance also drops once you have a
>> variety of destination addresses instead of just one (cache misses).
>>
>> In your case for the dst IP 192.168.1.2 you will have two memory reads
>> (TBL24 + TBL8), because 192.168.1/24 block has the more specific route
>> 192.168.1.1/32.
>>
>> Regards,
>> Andriy
>>
>> On Tue, Sep 20, 2016 at 12:18 AM, ?? wrote:
>>> Hi all,
>>>
>>> Does anyone test IPv4 performance? If so, what's the throughput? I can
>>> get almost 10Gb with 64 byte packets. But before the test, I would expect
>>> it will be less than 10G. I thought the performance will not be affected by
>>> the number of rule entries. But the throughput will be related to whether
>>> the flow needs to check the second layer table: TBL8. Is my understanding
>>> correct? I added these flow entries following this link:
>>> http://www.slideshare.net/garyachy/understanding-ddpd-algorithmics
>>> slide 10,
>>>
>>> struct ipv4_lpm_route ipv4_lpm_route_array[] = {
>>>     {IPv4(192, 168, 0, 0), 16, 0},
>>>     {IPv4(192, 168, 1, 0), 24, 1},
>>>     {IPv4(192, 168, 1, 1), 32, 2}
>>> };
>>>
>>> send the flow with dst IP: 192.168.1.2
>>>
>>> It should check the second layer table. But the performance is still 10G.
>>> Does any part go wrong with my setup? Or it really can achieve 10G with 64
>>> byte packet size.
>>>
>>> Thanks,
>>
>> --
>> Andriy Berestovskyy

--
Andriy Berestovskyy
Re: [dpdk-dev] [PATCH v2] ether: use a default for max Rx frame size in configure()
Hi Thomas,

On 06.04.2017 22:48, Thomas Monjalon wrote:
> Anyway, why not fixing it in the reverse way: returning error for out
> of range of non-jumbo frames?

I guess we would need to fix most of the examples then, since most of them just pass 0 for normal frames. And there is no default for jumbo frames, so an app must first get this info from the NIC...

> I am not sure setting a default value in the back of the caller is
> really a good behaviour.

From the app perspective, any working default is better than a non-working app, which you have to fix and recompile for each PMD/platform.

What if we use 0 as a request for the default value both for normal and jumbo frames (i.e. ETHER_MAX_LEN and dev_info.max_rx_pktlen respectively), and return an error if the user passed an out-of-range non-zero max_rx_pkt_len? It will make the behavior consistent, we will not need to fix the existing apps, and we will have a default both for normal and jumbo frames. Win-win? ;)

Andriy
Re: [dpdk-dev] [PATCH v2] ether: use a default for max Rx frame size in configure()
On 07.04.2017 10:34, Thomas Monjalon wrote:
> We can set the right default value if the app input is 0, as a special
> case. For any other value, we must try to set it or return an error.

Right, I will resend the patch.

Andriy
[dpdk-dev] [PATCH v3] ether: use a default for max Rx frame size in configure()
At the moment rte_eth_dev_configure() behaves inconsistently:
- for normal frames: a zero max_rx_pkt_len uses a default
- for jumbo frames: a zero max_rx_pkt_len gives an error

This patch fixes the inconsistency by using a default value if max_rx_pkt_len is zero, both for normal and jumbo frames.

Signed-off-by: Andriy Berestovskyy
---
Notes:
    v3 changes:
    - use a default only if max_rx_pkt_len is zero
    v2 changes:
    - reword the commit title according to the check-git-log.sh

 lib/librte_ether/rte_ethdev.c | 23 ++++++++++++-----------
 lib/librte_ether/rte_ethdev.h |  2 +-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 4e1e6dc..2700c69 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -790,6 +790,7 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 {
     struct rte_eth_dev *dev;
     struct rte_eth_dev_info dev_info;
+    uint32_t max_len;
     int diag;
 
     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -858,17 +859,23 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
     }
 
     /*
-     * If jumbo frames are enabled, check that the maximum RX packet
-     * length is supported by the configured device.
+     * Check that the maximum RX packet length is supported
+     * by the configured device.
      */
     if (dev_conf->rxmode.jumbo_frame == 1) {
-        if (dev_conf->rxmode.max_rx_pkt_len >
-                dev_info.max_rx_pktlen) {
+        max_len = dev_info.max_rx_pktlen;
+    } else {
+        max_len = ETHER_MAX_LEN;
+    }
+    if (dev_conf->rxmode.max_rx_pkt_len == 0) {
+        dev->data->dev_conf.rxmode.max_rx_pkt_len = max_len;
+    } else {
+        if (dev_conf->rxmode.max_rx_pkt_len > max_len) {
             RTE_PMD_DEBUG_TRACE("ethdev port_id=%d max_rx_pkt_len %u"
                 " > max valid value %u\n",
                 port_id,
                 (unsigned)dev_conf->rxmode.max_rx_pkt_len,
-                (unsigned)dev_info.max_rx_pktlen);
+                (unsigned int)max_len);
             return -EINVAL;
         } else if (dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN) {
             RTE_PMD_DEBUG_TRACE("ethdev port_id=%d max_rx_pkt_len %u"
                 " < min valid value %u\n",
                 port_id,
                 (unsigned)dev_conf->rxmode.max_rx_pkt_len,
                 (unsigned)ETHER_MIN_LEN);
             return -EINVAL;
         }
-    } else {
-        if (dev_conf->rxmode.max_rx_pkt_len < ETHER_MIN_LEN ||
-            dev_conf->rxmode.max_rx_pkt_len > ETHER_MAX_LEN)
-            /* Use default value */
-            dev->data->dev_conf.rxmode.max_rx_pkt_len =
-                ETHER_MAX_LEN;
     }
 
     /*
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d072538..ea760dc 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -349,7 +349,7 @@ enum rte_eth_tx_mq_mode {
 struct rte_eth_rxmode {
     /** The multi-queue packet distribution mode to be used, e.g. RSS. */
     enum rte_eth_rx_mq_mode mq_mode;
-    uint32_t max_rx_pkt_len;  /**< Only used if jumbo_frame enabled. */
+    uint32_t max_rx_pkt_len;  /**< If zero, use a default packet length. */
     uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
     __extension__
     uint16_t header_split : 1, /**< Header Split enable. */
--
2.7.4
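The v3 semantics can be summarized in a few lines. This is a toy Python model of the new validation logic (values only, not the DPDK C code): a zero request selects the default, which is the NIC maximum for jumbo frames and ETHER_MAX_LEN otherwise, while any non-zero request is range-checked; `nic_max` stands in for dev_info.max_rx_pktlen.

```python
ETHER_MIN_LEN = 64
ETHER_MAX_LEN = 1518

def check_max_rx_pkt_len(jumbo, requested, nic_max=9200):
    """Return the effective max RX frame length, raising ValueError
    where the C code would return -EINVAL."""
    max_len = nic_max if jumbo else ETHER_MAX_LEN
    if requested == 0:
        return max_len              # 0 is a request for the default
    if requested > max_len or requested < ETHER_MIN_LEN:
        raise ValueError("max_rx_pkt_len out of range")  # -EINVAL
    return requested
```

Unlike the pre-patch behavior, both the normal and the jumbo paths now share one code path: same default mechanism, same error condition.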
Re: [dpdk-dev] [PATCH v3] ether: use a default for max Rx frame size in configure()
Hey Bruce,

On 07.04.2017 14:29, Bruce Richardson wrote:
> Is this entirely hidden from drivers? As I said previously, I believe
> NICs using ixgbe/i40e etc. only use the frame size value when the jumbo
> frame flag is set. That may lead to further inconsistent behaviour
> unless all NICs are set up to behave as expected too.

You are right. If we take just the Intel PMDs: some use max_rx_pkt_len only for jumbo frames (ixgbe), some always (i40e) and some never (fm10k).

What if we add to the max_rx_pkt_len description: "the effective maximum RX frame size depends on the PMD, please refer to the PMD guide for the details"?

So with this patch we make rte_eth_dev_configure() clear, and later PMDs might change or clarify their limitations in the NIC guides.

Andriy
Re: [dpdk-dev] [PATCH v3] ether: use a default for max Rx frame size in configure()
Hey Thomas,

On 07.04.2017 16:47, Thomas Monjalon wrote:
>> What if we add to the max_rx_pkt_len description: "the effective
>> maximum RX frame size depends on the PMD, please refer to the PMD
>> guide for the details"?
>
> I think the problem is not in the documentation but in the
> implementations which should be more consistent.

The hardware is different, there is not much we can do about it. Nevertheless, we can fix the false comment and have a default for the jumbos, which is beneficial for the apps/examples.

> If I understand well, the inconsistency between drivers was already an
> issue before your patch. Your patch fixes an inconsistency in ethdev
> without fixing the drivers. We need to know if it is a good first step
> and if the drivers can be fixed later.

Thomas, some of the examples use a hard-coded jumbo frame size, which is too big for the underlying PMDs, so those examples fail. The plan was to fix them all with this commit in ethdev, but now I am not sure you are going to accept the change. It is important for us to have those examples working in the upcoming release. Do you think it is better to send fixes for those examples instead of this commit?

Andriy
[dpdk-dev] [PATCH 1/3] examples/ip_fragmentation: limit max frame size
Some PMDs do not support 9.5K jumbo frames, so the example fails. Limit the frame size to the maximum supported by the underlying NIC.

Signed-off-by: Andriy Berestovskyy
---
 examples/ip_fragmentation/main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 8d2ec43..31499c3 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -168,7 +168,7 @@ struct lcore_queue_conf {
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
-static const struct rte_eth_conf port_conf = {
+static struct rte_eth_conf port_conf = {
     .rxmode = {
         .max_rx_pkt_len = JUMBO_FRAME_MAX_SIZE,
         .split_hdr_size = 0,
@@ -915,6 +915,11 @@ main(int argc, char **argv)
 
         qconf = &lcore_queue_conf[rx_lcore_id];
 
+        /* limit the frame size to the maximum supported by NIC */
+        rte_eth_dev_info_get(portid, &dev_info);
+        port_conf.rxmode.max_rx_pkt_len = RTE_MIN(
+            dev_info.max_rx_pktlen, port_conf.rxmode.max_rx_pkt_len);
+
         /* get the lcore_id for this port */
         while (rte_lcore_is_enabled(rx_lcore_id) == 0 ||
                qconf->n_rx_queue == (unsigned)rx_queue_per_lcore) {
@@ -980,7 +985,6 @@ main(int argc, char **argv)
         printf("txq=%u,%d ", lcore_id, queueid);
         fflush(stdout);
 
-        rte_eth_dev_info_get(portid, &dev_info);
         txconf = &dev_info.default_txconf;
         txconf->txq_flags = 0;
         ret = rte_eth_tx_queue_setup(portid, queueid, nb_txd,
--
2.7.4
[dpdk-dev] [PATCH 2/3] examples/ip_reassembly: limit max frame size
Some PMDs do not support 9.5K jumbo frames, so the example fails. Limit the frame size to the maximum supported by the underlying NIC.

Signed-off-by: Andriy Berestovskyy
---
 examples/ip_reassembly/main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index b641576..257881c 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -1063,6 +1063,11 @@ main(int argc, char **argv)
 
         qconf = &lcore_queue_conf[rx_lcore_id];
 
+        /* limit the frame size to the maximum supported by NIC */
+        rte_eth_dev_info_get(portid, &dev_info);
+        port_conf.rxmode.max_rx_pkt_len = RTE_MIN(
+            dev_info.max_rx_pktlen, port_conf.rxmode.max_rx_pkt_len);
+
         /* get the lcore_id for this port */
         while (rte_lcore_is_enabled(rx_lcore_id) == 0 ||
                qconf->n_rx_queue == (unsigned)rx_queue_per_lcore) {
@@ -1129,7 +1134,6 @@ main(int argc, char **argv)
         printf("txq=%u,%d,%d ", lcore_id, queueid, socket);
         fflush(stdout);
 
-        rte_eth_dev_info_get(portid, &dev_info);
         txconf = &dev_info.default_txconf;
         txconf->txq_flags = 0;
--
2.7.4
[dpdk-dev] [PATCH 3/3] examples/ipv4_multicast: limit max frame size
Some PMDs do not support 9.5K jumbo frames, so the example fails. Limit the frame size to the maximum supported by the underlying NIC. Signed-off-by: Andriy Berestovskyy --- examples/ipv4_multicast/main.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c index b681f8e..b4bd699 100644 --- a/examples/ipv4_multicast/main.c +++ b/examples/ipv4_multicast/main.c @@ -137,7 +137,7 @@ struct lcore_queue_conf { } __rte_cache_aligned; static struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE]; -static const struct rte_eth_conf port_conf = { +static struct rte_eth_conf port_conf = { .rxmode = { .max_rx_pkt_len = JUMBO_FRAME_MAX_SIZE, .split_hdr_size = 0, @@ -725,6 +725,11 @@ main(int argc, char **argv) qconf = &lcore_queue_conf[rx_lcore_id]; + /* limit the frame size to the maximum supported by NIC */ + rte_eth_dev_info_get(portid, &dev_info); + port_conf.rxmode.max_rx_pkt_len = RTE_MIN( + dev_info.max_rx_pktlen, port_conf.rxmode.max_rx_pkt_len); + /* get the lcore_id for this port */ while (rte_lcore_is_enabled(rx_lcore_id) == 0 || qconf->n_rx_queue == (unsigned)rx_queue_per_lcore) { @@ -777,7 +782,6 @@ main(int argc, char **argv) printf("txq=%u,%hu ", lcore_id, queueid); fflush(stdout); - rte_eth_dev_info_get(portid, &dev_info); txconf = &dev_info.default_txconf; txconf->txq_flags = 0; ret = rte_eth_tx_queue_setup(portid, queueid, nb_txd, -- 2.7.4
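All three patches rely on the same clamp. A minimal sketch of that logic, with a made-up helper name (RTE_MIN() expands to the same comparison):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the RTE_MIN() clamp the patches add:
 * never configure a larger Rx frame than the NIC reports it supports. */
uint32_t clamp_frame_len(uint32_t requested, uint32_t nic_max)
{
    return requested < nic_max ? requested : nic_max;
}
```

On a NIC limited to 1518 bytes, a 9.5K request is silently reduced, so the example starts instead of failing in rte_eth_dev_configure().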
Re: [dpdk-dev] [PATCH v3] ether: use a default for max Rx frame size in configure()
Hey Thomas, On 21.04.2017 00:25, Thomas Monjalon wrote: The hardware is different, there is not much we can do about it. We can return an error if the max_rx_pkt_len cannot be set in the NIC. Yes, we pass the value to the PMD, which might check the value and return an error. >> Nevertheless, we can fix the false comment and have a default for the >> jumbos, which is beneficial for the apps/examples. > > The examples are using a hardcoded value, so they need to be fixed > anyway. We might change the hardcoded values to zeros once the patch is in. This will make the examples a bit clearer. This ethdev patch is about a behaviour change of the API. The behaviour was not documented, so IMO it is not an issue. It is about considering 0 as a request for the default value and returning an error if a value cannot be set. Right. It will require more agreement and changes in the drivers to return an error where appropriate. IMO the changes are transparent for the PMDs (please see below), but they might affect some applications.
Here is the change in the API behaviour.

Before the patch:
  jumbo == 0, max_rx_pkt_len == 0     RESULT: max_rx_pkt_len = ETHER_MAX_LEN
  jumbo == 0, max_rx_pkt_len == 10    RESULT: max_rx_pkt_len = ETHER_MAX_LEN
  jumbo == 0, max_rx_pkt_len == 1200  RESULT: max_rx_pkt_len = 1200
  jumbo == 0, max_rx_pkt_len == 9K    RESULT: max_rx_pkt_len = ETHER_MAX_LEN
  jumbo == 1, max_rx_pkt_len == 0     RESULT: ERROR
  jumbo == 1, max_rx_pkt_len == 10    RESULT: ERROR
  jumbo == 1, max_rx_pkt_len == 1200  RESULT: max_rx_pkt_len = 1200
  jumbo == 1, max_rx_pkt_len == 9K    RESULT: ERROR or max_rx_pkt_len = 9K
  jumbo == 1, max_rx_pkt_len == 90K   RESULT: ERROR

After the patch:
  jumbo == 0, max_rx_pkt_len == 0     RESULT: max_rx_pkt_len = ETHER_MAX_LEN
  jumbo == 0, max_rx_pkt_len == 10    RESULT: ERROR (changed)
  jumbo == 0, max_rx_pkt_len == 1200  RESULT: max_rx_pkt_len = 1200
  jumbo == 0, max_rx_pkt_len == 9K    RESULT: ERROR (changed)
  jumbo == 1, max_rx_pkt_len == 0     RESULT: max_rx_pkt_len = dev_info.max_rx_pktlen (changed)
  jumbo == 1, max_rx_pkt_len == 10    RESULT: ERROR
  jumbo == 1, max_rx_pkt_len == 1200  RESULT: max_rx_pkt_len = 1200
  jumbo == 1, max_rx_pkt_len == 9K    RESULT: ERROR or max_rx_pkt_len = 9K
  jumbo == 1, max_rx_pkt_len == 90K   RESULT: ERROR

Only the apps which request too small or too big normal frames will be affected, and in most cases that would rather indicate an error in the app. I have also looked through all the PMDs to confirm they are not affected.
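The post-patch rules from the table can be condensed into a small pure function. This is only a sketch of the table above, not the actual ethdev code; the function name and the 0-means-error convention are mine:

```c
#include <assert.h>
#include <stdint.h>

#define ETHER_MIN_LEN 64
#define ETHER_MAX_LEN 1518

/* Resolve max_rx_pkt_len per the "After the patch" table:
 * 0 requests the default, out-of-range values are an error (return 0). */
uint32_t resolve_max_rx_pkt_len(int jumbo, uint32_t req, uint32_t dev_max)
{
    uint32_t limit = jumbo ? dev_max : ETHER_MAX_LEN;

    if (req == 0)
        return limit;   /* default: dev max for jumbo, 1518 otherwise */
    if (req < ETHER_MIN_LEN || req > limit)
        return 0;       /* error */
    return req;
}
```

For example, jumbo == 1 with max_rx_pkt_len == 0 now resolves to the device maximum instead of returning an error.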
Here is the summary:

- af_packet: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = ETH_FRAME_LEN (1514)
- ark: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = ETH_FRAME_LEN (16K - 128)
- avp: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = avp->max_rx_pkt_len; rx_queue_setup() uses max_rx_pkt_len for scattering
- bnx2x: configure() uses max_rx_pkt_len to set internal mtu; info() returns max_rx_pktlen = BNX2X_MAX_RX_PKT_LEN (15872)
- bnxt: configure() uses max_rx_pkt_len to set internal mtu; info() returns max_rx_pktlen = BNXT_MAX_MTU + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE (9000 + 14 + 4 + 4)
- bonding: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = internals->candidate_max_rx_pktlen or ETHER_MAX_JUMBO_FRAME_LEN (0x3F00)
- cxgbe: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = CXGBE_MAX_RX_PKTLEN (9000 + 14 + 4); rx_queue_setup() checks max_rx_pkt_len boundaries
- dpaa2: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = DPAA2_MAX_RX_PKT_LEN (10240)
- e1000 (em): configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = em_get_max_pktlen() (0x2412, 0x1000, 1518, 0x3f00, depends on model)
- e1000 (igb): configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = 0x3fff; start() writes max_rx_pkt_len to HW for jumbo frames only; start() uses max_rx_pkt_len for scattering
- ena: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = adapter->max_mtu; start() checks max_rx_pkt_len boundaries
- enic: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = enic->max_mtu + 14 + 4
- fm10k: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = FM10K_MAX_PKT_SIZE (15 * 1024); start() uses max_rx_pkt_len for scattering
- i40e: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = I40E_FRAME_SIZE_MAX (9728); rx_queue_config() checks max_rx_pkt_len boundaries
- ixgbe: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = 15872 (9728 for vf); start() writes max_rx_pkt_len to HW for jumbo frames only; start() uses max_rx_pkt_len for scattering
- kni: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = UINT32_MAX
- liquidio: configure() does not use max_rx_pkt_len; info() returns max_rx_pktlen = LIO_MAX_RX_PKTLEN (64K); start() checks max_rx_pkt_len boundaries
- mlx4: configure() uses max_rx_pkt_len for scattering; info() returns max_rx_pktlen = 65536
- mlx5: configure() u
Re: [dpdk-dev] [PATCH] usertools: use /sys/devices/system/cpu for CPU layout script
Hi, On 25.04.2017 10:48, Thomas Monjalon wrote: Do you think it is really a good idea to keep and maintain this script in DPDK? It was intentionally not exported in "make install". I think it is a bit out of scope, and I wonder what alternatives we have. I know hwloc/lstopo, but there are probably others. hwloc does not work on my target, but you are right, there are a variety of tools for that. For example, I prefer numactl (option -H) because it also allows you to do many useful things, like bind CPUs to one node and memory allocations to another. At the moment the script is just like lscpu, which is preinstalled on Ubuntu and mentioned in the documentation alongside cpu_layout. We could try to make the script more useful, for example, show which NIC is on which NUMA node. Still, it would be just a subset of the functionality of tools like hwloc... Regards, Andriy
[dpdk-dev] [PATCH] examples/load_balancer: fix Tx flush
Port ID is not an index from 0 to n_nic_ports, but rather a value of nic_ports array. Signed-off-by: Andriy Berestovskyy --- examples/load_balancer/runtime.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c index 8192c08..7f918aa 100644 --- a/examples/load_balancer/runtime.c +++ b/examples/load_balancer/runtime.c @@ -420,10 +420,12 @@ static inline void app_lcore_io_tx_flush(struct app_lcore_params_io *lp) { uint8_t port; + uint32_t i; - for (port = 0; port < lp->tx.n_nic_ports; port ++) { + for (i = 0; i < lp->tx.n_nic_ports; i++) { uint32_t n_pkts; + port = lp->tx.nic_ports[i]; if (likely((lp->tx.mbuf_out_flush[port] == 0) || (lp->tx.mbuf_out[port].n_mbufs == 0))) { lp->tx.mbuf_out_flush[port] = 1; -- 2.7.4
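The point of the fix is that the flush loop must visit the port ids stored in nic_ports[], not the indices 0..n_nic_ports-1. A standalone sketch of the corrected iteration (the helper and demo names are hypothetical, not the example's API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the port ids a correct flush loop touches:
 * port = nic_ports[i], exactly as in the patched app_lcore_io_tx_flush(). */
size_t collect_flush_ports(const uint8_t *nic_ports, size_t n, uint8_t *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = nic_ports[i];
    return n;
}

/* Demo data: an lcore handling ports 3 and 5 only. */
const uint8_t demo_ports[2] = {3, 5};
uint8_t demo_out[2];
```

With ports {3, 5}, the broken loop would have flushed ports 0 and 1 instead, leaving the real ports' buffers unflushed.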
Re: [dpdk-dev] [dpdk-maintainers] Example(Load_balancer) Tx Flush Bug(This bug DPDK each version)
Hey, Those patches superseded by: http://dpdk.org/ml/archives/dev/2017-April/064858.html Regards, Andriy On Mon, Jan 16, 2017 at 3:18 PM, Thomas Monjalon wrote: > Hi, > > Sorry if you feel your patch is ignored. > It is not in the right format for several reasons we tried > to explain earlier I think. > Please read carefully this doc: > http://dpdk.org/doc/guides/contributing/patches.html > > > 2017-01-16 19:16, Maple: >> From: Maple >> To: ; >> Cc: >> Subject: [PATCH] Load_balancer Tx Flush Bug >> Date: Mon, 16 Dec 2017 19:15:48 +0800 >> Message-Id: <1482371868-19669-1-git-send-email-liuj...@raisecom.com> >> X-Mailer: git-send-email 1.9.1 >> In-Reply-To: <2016122122394164225...@raisecom.com> >> References: <2016122122394164225...@raisecom.com> >> >> We found a bug in use load_balancer example,and,This bug DPDK each version. >> In IO tx flush, only flush port 0. >> So,If I enable more than the Port,then,In addition to 0 port won't flush. >> >> Signed-off-by: Maple >> --- >> a/examples/load_balancer/runtime.c | 667 >> >> b/examples/load_balancer/runtime.c | 669 >> + >> 2 files changed, 1336 insertions(+) >> create mode 100644 a/examples/load_balancer/runtime.c >> create mode 100644 b/examples/load_balancer/runtime.c >> >> diff --git a/examples/load_balancer/runtime.c >> b/examples/load_balancer/runtime.c >> index 9612392..3a2e900 100644 >> --- a/test/a/examples/load_balancer/runtime.c >> +++ b/test/b/examples/load_balancer/runtime.c >> @@ -418,9 +418,11 @@ app_lcore_io_tx( >> static inline void >> app_lcore_io_tx_flush(struct app_lcore_params_io *lp) >> { >> + uint8_t i; >> uint8_t port; >> >> - for (port = 0; port < lp->tx.n_nic_ports; port ++) { >> + port = lp->tx.nic_ports[0]; >> + for (i = 0; i < lp->tx.n_nic_ports; i ++) { >> uint32_t n_pkts; >> >> if (likely((lp->tx.mbuf_out_flush[port] == 0) || >> > > -- Andriy Berestovskyy
Re: [dpdk-dev] [PATCH] usertools: fix cpu_layout script for multithreads of more than 2
Works fine on ThunderX and does not break Intel either. Reviewed-by: Andriy Berestovskyy Tested-by: Andriy Berestovskyy Andriy On 28.04.2017 13:58, Thomas Monjalon wrote: Andriy, please would you like to review this patch? 28/04/2017 12:34, Gowrishankar: From: Gowrishankar Muthukrishnan Current usertools/cpu_layout.py is broken to handle multithreads of count more than 2 as in IBM powerpc P8 servers. Below patch addressed this issue. Also, added minor exception catch on failing to open unavailable sys file in case of multithread=off configuration in server. Patch has been verified not to break existing topology configurations and also not changing anything in current output. Signed-off-by: Gowrishankar Muthukrishnan --- usertools/cpu_layout.py | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/usertools/cpu_layout.py b/usertools/cpu_layout.py index 5735891..99152a2 100755 --- a/usertools/cpu_layout.py +++ b/usertools/cpu_layout.py @@ -46,6 +46,8 @@ for cpu in xrange(max_cpus + 1): try: fd = open("{}/cpu{}/topology/core_id".format(base_path, cpu)) +except IOError: +continue except: break core = int(fd.read()) @@ -70,7 +72,10 @@ print("") max_processor_len = len(str(len(cores) * len(sockets) * 2 - 1)) -max_core_map_len = max_processor_len * 2 + len('[, ]') + len('Socket ') +max_thread_count = len(core_map.values()[0]) +max_core_map_len = (max_processor_len * max_thread_count) \ + + len(", ") * (max_thread_count - 1) \ + + len('[]') + len('Socket ') max_core_id_len = len(str(max(cores))) output = " ".ljust(max_core_id_len + len('Core ')) @@ -87,5 +92,8 @@ for c in cores: output = "Core %s" % str(c).ljust(max_core_id_len) for s in sockets: -output += " " + str(core_map[(s, c)]).ljust(max_core_map_len) +if core_map.has_key((s,c)): +output += " " + str(core_map[(s, c)]).ljust(max_core_map_len) +else: +output += " " * (max_core_map_len + 1) print(output)
[dpdk-dev] [PATCH 6/8] bond: handle slaves with fewer queues than bonding device
> -q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) { > +q_id < nb_rx_queues ; q_id++) { > bd_rx_q = (struct bond_rx_queue > *)bonded_eth_dev->data->rx_queues[q_id]; > > errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, > q_id, > @@ -1361,7 +1449,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev, > /* Setup Tx Queues */ > /* Use existing queues, if any */ > for (q_id = slave_eth_dev->data->nb_tx_queues; > -q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) { > +q_id < nb_tx_queues ; q_id++) { > bd_tx_q = (struct bond_tx_queue > *)bonded_eth_dev->data->tx_queues[q_id]; > > errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, > q_id, > @@ -1440,7 +1528,8 @@ bond_ethdev_slave_link_status_change_monitor(void > *cb_arg); > > void > slave_add(struct bond_dev_private *internals, > - struct rte_eth_dev *slave_eth_dev) > + struct rte_eth_dev *slave_eth_dev, > + const struct rte_eth_dev_info *slave_dev_info) > { > struct bond_slave_details *slave_details = > &internals->slaves[internals->slave_count]; > @@ -1448,6 +1537,20 @@ slave_add(struct bond_dev_private *internals, > slave_details->port_id = slave_eth_dev->data->port_id; > slave_details->last_link_status = 0; > > + uint16_t bond_nb_rx_queues = > + rte_eth_devices[internals->port_id].data->nb_rx_queues; > + uint16_t bond_nb_tx_queues = > + rte_eth_devices[internals->port_id].data->nb_tx_queues; > + > + slave_details->nb_rx_queues = > + bond_nb_rx_queues > slave_dev_info->max_rx_queues > + ? slave_dev_info->max_rx_queues > + : bond_nb_rx_queues; > + slave_details->nb_tx_queues = > + bond_nb_tx_queues > slave_dev_info->max_tx_queues > + ? 
slave_dev_info->max_tx_queues > + : bond_nb_tx_queues; > + > /* If slave device doesn't support interrupts then we need to enabled > * polling to monitor link status */ > if (!(slave_eth_dev->data->dev_flags & RTE_PCI_DRV_INTR_LSC)) { > diff --git a/drivers/net/bonding/rte_eth_bond_private.h > b/drivers/net/bonding/rte_eth_bond_private.h > index 6c47a29..02f6de1 100644 > --- a/drivers/net/bonding/rte_eth_bond_private.h > +++ b/drivers/net/bonding/rte_eth_bond_private.h > @@ -101,6 +101,8 @@ struct bond_slave_details { > uint8_t link_status_poll_enabled; > uint8_t link_status_wait_to_complete; > uint8_t last_link_status; > + uint16_t nb_rx_queues; > + uint16_t nb_tx_queues; > /**< Port Id of slave eth_dev */ > struct ether_addr persisted_mac_addr; > > @@ -240,7 +242,8 @@ slave_remove(struct bond_dev_private *internals, > > void > slave_add(struct bond_dev_private *internals, > - struct rte_eth_dev *slave_eth_dev); > + struct rte_eth_dev *slave_eth_dev, > + const struct rte_eth_dev_info *slave_dev_info); > > uint16_t > xmit_l2_hash(const struct rte_mbuf *buf, uint8_t slave_count); > -- > 2.1.4 > -- Andriy Berestovskyy
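The per-slave queue sizing added by slave_add() boils down to capping the bonding device's queue count at the slave's maximum. A standalone sketch of that computation (names assumed):

```c
#include <assert.h>
#include <stdint.h>

/* A slave configures min(bonding device queues, slave's own maximum),
 * mirroring the nb_rx_queues/nb_tx_queues computation in the patch. */
uint16_t slave_nb_queues(uint16_t bond_nb_queues, uint16_t slave_max_queues)
{
    return bond_nb_queues > slave_max_queues ?
        slave_max_queues : bond_nb_queues;
}
```

A bonding device configured with 8 queues can thus still accept a slave whose hardware supports only 4.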
[dpdk-dev] [PATCH] bond: fix LACP mempool size
The following messages might appear after some idle time: "PMD: Failed to allocate LACP packet from pool" The fix ensures the mempool size is greater than the sum of TX descriptors. --- drivers/net/bonding/rte_eth_bond_8023ad.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index ee2964a..b3b30f6 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -851,6 +851,9 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id) char mem_name[RTE_ETH_NAME_MAX_LEN]; int socket_id; unsigned element_size; + uint32_t total_tx_desc; + struct bond_tx_queue *bd_tx_q; + uint16_t q_id; /* Given slave mus not be in active list */ RTE_VERIFY(find_slave_by_id(internals->active_slaves, @@ -884,14 +887,17 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id) element_size = sizeof(struct slow_protocol_frame) + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM; -/* How big memory pool should be? If driver will not - * free packets quick enough there will be ENOMEM in tx_machine. - * For now give 511 pkts * max number of queued TX packets per slave. - * Hope it will be enough. */ + /* The size of the mempool should be at least: +* the sum of the TX descriptors + BOND_MODE_8023AX_SLAVE_TX_PKTS */ + total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS; + for (q_id = 0; q_id < bond_dev->data->nb_rx_queues; q_id++) { + bd_tx_q = (struct bond_tx_queue*)bond_dev->data->tx_queues[q_id]; + total_tx_desc += bd_tx_q->nb_tx_desc; + } + snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id); port->mbuf_pool = rte_mempool_create(mem_name, - BOND_MODE_8023AX_SLAVE_TX_PKTS * 512 - 1, - element_size, + total_tx_desc, element_size, RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ? 
32 : RTE_MEMPOOL_CACHE_MAX_SIZE, sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, socket_id, MEMPOOL_F_NO_SPREAD); @@ -932,12 +938,12 @@ bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev, struct port *port; uint8_t i; - /* Given slave mus be in active list */ + /* Given slave must be in active list */ RTE_VERIFY(find_slave_by_id(internals->active_slaves, internals->active_slave_count, slave_id) < internals->active_slave_count); /* Exclude slave from transmit policy. If this slave is an aggregator -* make all aggregated slaves unselected to force sellection logic +* make all aggregated slaves unselected to force selection logic * to select suitable aggregator for this port. */ for (i = 0; i < internals->active_slave_count; i++) { port = &mode_8023ad_ports[internals->active_slaves[i]]; @@ -1095,7 +1101,7 @@ bond_mode_8023ad_handle_slow_pkt(struct bond_dev_private *internals, goto free_out; } - /* Setup marker timer. Do it in loop in case concurent access. */ + /* Setup marker timer. Do it in loop in case concurrent access. */ do { old_marker_timer = port->rx_marker_timer; if (!timer_is_expired(&old_marker_timer)) { -- 1.9.1
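The sizing rule from the commit message as a standalone sketch; the slow-protocol budget constant below is a stand-in for BOND_MODE_8023AX_SLAVE_TX_PKTS, whose real value lives in the bonding driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLAVE_TX_PKTS 8u  /* stand-in for BOND_MODE_8023AX_SLAVE_TX_PKTS */

/* The mempool must hold at least the sum of all TX ring descriptors plus
 * the packets the LACP state machine may queue on top of them; otherwise
 * the rings alone can pin every mbuf and tx_machine() hits ENOMEM. */
uint32_t lacp_pool_size(const uint16_t *nb_tx_desc, size_t nb_queues)
{
    uint32_t total = SLAVE_TX_PKTS;
    size_t q;

    for (q = 0; q < nb_queues; q++)
        total += nb_tx_desc[q];
    return total;
}

/* Demo data: two TX queues of 512 and 256 descriptors. */
const uint16_t demo_tx_desc[2] = {512, 256};
```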
[dpdk-dev] [PATCH] bonding: fix reordering of IP fragments
Fragmented IPv4 packets have no TCP/UDP headers, so we hashed random data introducing reordering of the fragments. --- drivers/net/bonding/rte_eth_bond_pmd.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 8f84ec1..b1373c6 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -552,17 +553,20 @@ xmit_l34_hash(const struct rte_mbuf *buf, uint8_t slave_count) l3hash = ipv4_hash(ipv4_hdr); - ip_hdr_offset = (ipv4_hdr->version_ihl & IPV4_HDR_IHL_MASK) * - IPV4_IHL_MULTIPLIER; - - if (ipv4_hdr->next_proto_id == IPPROTO_TCP) { - tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + - ip_hdr_offset); - l4hash = HASH_L4_PORTS(tcp_hdr); - } else if (ipv4_hdr->next_proto_id == IPPROTO_UDP) { - udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + - ip_hdr_offset); - l4hash = HASH_L4_PORTS(udp_hdr); + /* there is no L4 header in fragmented packet */ + if (likely(rte_ipv4_frag_pkt_is_fragmented(ipv4_hdr) == 0)) { + ip_hdr_offset = (ipv4_hdr->version_ihl & IPV4_HDR_IHL_MASK) * + IPV4_IHL_MULTIPLIER; + + if (ipv4_hdr->next_proto_id == IPPROTO_TCP) { + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + + ip_hdr_offset); + l4hash = HASH_L4_PORTS(tcp_hdr); + } else if (ipv4_hdr->next_proto_id == IPPROTO_UDP) { + udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + + ip_hdr_offset); + l4hash = HASH_L4_PORTS(udp_hdr); + } } } else if (rte_cpu_to_be_16(ETHER_TYPE_IPv6) == proto) { struct ipv6_hdr *ipv6_hdr = (struct ipv6_hdr *) -- 1.9.1
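The guard added around the L4 hashing follows the standard IPv4 fragment test: a packet belongs to a fragment chain if the More Fragments flag is set or the fragment offset is non-zero. A sketch of that check (an illustration of what rte_ipv4_frag_pkt_is_fragmented() does, not the DPDK source; frag_off is in host byte order):

```c
#include <assert.h>
#include <stdint.h>

#define IPV4_HDR_MF_FLAG     0x2000  /* More Fragments flag */
#define IPV4_HDR_OFFSET_MASK 0x1fff  /* fragment offset, in 8-byte units */

/* Only the first fragment even carries the L4 header, and hashing it
 * differently from the rest would still split the chain across slaves,
 * so every fragment must skip the L4 port hash. */
int ipv4_is_fragment(uint16_t frag_off)
{
    return (frag_off & (IPV4_HDR_MF_FLAG | IPV4_HDR_OFFSET_MASK)) != 0;
}
```

Note that the Don't Fragment flag (0x4000) is deliberately excluded from the mask: a DF-only packet is not a fragment.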
[dpdk-dev] [PATCH] bond: fix LACP mempool size
On Tue, Dec 8, 2015 at 2:23 PM, Andriy Berestovskyy wrote: > The following messages might appear after some idle time: > "PMD: Failed to allocate LACP packet from pool" > > The fix ensures the mempool size is greater than the sum > of TX descriptors. Signed-off-by: Andriy Berestovskyy
[dpdk-dev] [PATCH] bonding: fix reordering of IP fragments
On Tue, Dec 8, 2015 at 3:47 PM, Andriy Berestovskyy wrote: > Fragmented IPv4 packets have no TCP/UDP headers, so we hashed > random data introducing reordering of the fragments. Signed-off-by: Andriy Berestovskyy
Re: [dpdk-dev] [PATCH v3] ether: use a default for max Rx frame size in configure()
Hi Shahaf, > On 23 May 2018, at 07:21, Shahaf Shuler wrote: > I think this patch addressing just small issue in a bigger problem. > The way I see it all application needs to specify is the max packet size it > expects to receive, nothing else(!). [...] > IMO The "jumbo_frame" bit can be set by the underlying PMD directly to the > device registers given the max_rx_pkt_len configuration. Sure, it can be deduced in the PMD if max_rx_pkt_len is greater than the normal frame size. The background behind this patch was to fix some examples on some platforms by allowing them to just set the jumbo bit in the config and let DPDK deduce the optimal jumbo max_rx_pkt_len. There was also another patch which fixed those examples, so they first query the max_rx_pkt_len and then pass it with the config: http://dpdk.org/commit/5e470a6654 That patch has been merged, so now we can fix/change the API in any way we decide, there is no urgency anymore. Looks like the jumbo bit in the config is redundant, but there might be other opinions. Andriy
Re: [dpdk-dev] [PATCH v3] ether: use a default for max Rx frame size in configure()
Sure, Ferruh. Just let me know how I can help you. Andriy > On 23 Jan 2019, at 19:36, Ferruh Yigit wrote: > >> On 5/24/2018 10:20 AM, Andriy Berestovskyy wrote: >> Hi Shahaf, >> >>> On 23 May 2018, at 07:21, Shahaf Shuler wrote: >>> I think this patch addressing just small issue in a bigger problem. >>> The way I see it all application needs to specify is the max packet size it >>> expects to receive, nothing else(!). >> >> [...] >> >>> IMO The "jumbo_frame" bit can be set by the underlying PMD directly to the >>> device registers given the max_rx_pkt_len configuration. >> >> Sure, it can be deducted in PMD if max_rx_pkt_len is greater than the normal >> frame size. >> >> The background behind this patch was to fix some examples on some platforms >> by allowing them to just set the jumbo bit in config and let the DPDK to >> deduct the optimal jumbo max_rx_pkt_len. >> >> There was also another patch which fixed those examples, so they first query >> the max_rx_pkt_len and then pass it with the config: >> http://dpdk.org/commit/5e470a6654 >> >> That patch has been merged, so now we can fix/change the API in any way we >> decide, there is no urgency anymore. >> >> Looks like the jumbo bit in config is redundant, but there might be other >> opinions. > > Back to this old issue, the mentioned inconsistency still exists in the > current code, and this or related ones have been mentioned a few times already. > > What would you think about developing a unit test on 19.05 to test these on > ethdev, and asking vendors to run it and fix failures in next releases? > A more TDD approach: first write the test that fails, later fix it. > If there is support, I can start writing it, but it will require help. > > > And related issues: > max_rx_pkt_len > DEV_RX_OFFLOAD_JUMBO_FRAME > DEV_TX_OFFLOAD_MULTI_SEGS > scattered_rx > mtu > > > These are provided by the user as config options, but some drivers update some of > them. The initial question is: are they input only, or can they be modified by drivers?
> > For example, if the user did not request JUMBO_FRAME but provided a large max_rx_pkt_len, > should the user get an error, or should the PMD enable jumbo frames itself? > > > And another question around 'max_rx_pkt_len' / 'mtu': both are related and > close. 'max_rx_pkt_len' is the frame size as far as I can understand, and since we > have the capability to set 'mtu', this looks like a duplicate. > And I assume users will mostly be interested in 'mtu'; for a given 'mtu' the > driver can calculate 'max_rx_pkt_len', taking into account other config options > affecting the frame size.
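The mtu-to-frame-size derivation suggested at the end can be sketched as follows; the overhead constants are standard Ethernet numbers, and whether VLAN tags are counted is device-specific, so it is a parameter here:

```c
#include <assert.h>
#include <stdint.h>

#define ETHER_HDR_LEN 14
#define ETHER_CRC_LEN 4
#define VLAN_TAG_SIZE 4

/* Derive max_rx_pkt_len from mtu by adding the L2 overhead,
 * as a driver could do internally. */
uint32_t frame_len_from_mtu(uint16_t mtu, unsigned int nb_vlan_tags)
{
    return (uint32_t)mtu + ETHER_HDR_LEN + ETHER_CRC_LEN +
        nb_vlan_tags * VLAN_TAG_SIZE;
}
```

This matches the summary above: bnxt, for instance, reports its maximum as MTU + 14 + 4 + 4, i.e. one VLAN tag of overhead.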
[dpdk-dev] [PATCH] keepalive: fix keepalive state alignment
The __rte_cache_aligned was applied to the whole array, not the array elements. This leads to a false sharing between the monitored cores. Fixes: e70a61ad50ab ("keepalive: export states") Cc: remy.hor...@intel.com Signed-off-by: Andriy Berestovskyy --- lib/librte_eal/common/rte_keepalive.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/lib/librte_eal/common/rte_keepalive.c b/lib/librte_eal/common/rte_keepalive.c index 7ddf201..a586e03 100644 --- a/lib/librte_eal/common/rte_keepalive.c +++ b/lib/librte_eal/common/rte_keepalive.c @@ -13,8 +13,13 @@ struct rte_keepalive { /** Core Liveness. */ - enum rte_keepalive_state __rte_cache_aligned state_flags[ - RTE_KEEPALIVE_MAXCORES]; + struct { + /* +* Each element of the state_flags table must be cache aligned +* to prevent false sharing. +*/ + enum rte_keepalive_state s __rte_cache_aligned; + } state_flags[RTE_KEEPALIVE_MAXCORES]; /** Last-seen-alive timestamps */ uint64_t last_alive[RTE_KEEPALIVE_MAXCORES]; @@ -67,19 +72,19 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer, if (keepcfg->active_cores[idx_core] == 0) continue; - switch (keepcfg->state_flags[idx_core]) { + switch (keepcfg->state_flags[idx_core].s) { case RTE_KA_STATE_UNUSED: break; case RTE_KA_STATE_ALIVE: /* Alive */ - keepcfg->state_flags[idx_core] = RTE_KA_STATE_MISSING; + keepcfg->state_flags[idx_core].s = RTE_KA_STATE_MISSING; keepcfg->last_alive[idx_core] = rte_rdtsc(); break; case RTE_KA_STATE_MISSING: /* MIA */ print_trace("Core MIA. ", keepcfg, idx_core); - keepcfg->state_flags[idx_core] = RTE_KA_STATE_DEAD; + keepcfg->state_flags[idx_core].s = RTE_KA_STATE_DEAD; break; case RTE_KA_STATE_DEAD: /* Dead */ - keepcfg->state_flags[idx_core] = RTE_KA_STATE_GONE; + keepcfg->state_flags[idx_core].s = RTE_KA_STATE_GONE; print_trace("Core died. 
", keepcfg, idx_core); if (keepcfg->callback) keepcfg->callback( @@ -90,7 +95,7 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer, case RTE_KA_STATE_GONE: /* Buried */ break; case RTE_KA_STATE_DOZING: /* Core going idle */ - keepcfg->state_flags[idx_core] = RTE_KA_STATE_SLEEP; + keepcfg->state_flags[idx_core].s = RTE_KA_STATE_SLEEP; keepcfg->last_alive[idx_core] = rte_rdtsc(); break; case RTE_KA_STATE_SLEEP: /* Idled core */ @@ -100,7 +105,7 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer, keepcfg->relay_callback( keepcfg->relay_callback_data, idx_core, - keepcfg->state_flags[idx_core], + keepcfg->state_flags[idx_core].s, keepcfg->last_alive[idx_core] ); } @@ -144,11 +149,11 @@ rte_keepalive_register_core(struct rte_keepalive *keepcfg, const int id_core) void rte_keepalive_mark_alive(struct rte_keepalive *keepcfg) { - keepcfg->state_flags[rte_lcore_id()] = RTE_KA_STATE_ALIVE; + keepcfg->state_flags[rte_lcore_id()].s = RTE_KA_STATE_ALIVE; } void rte_keepalive_mark_sleep(struct rte_keepalive *keepcfg) { - keepcfg->state_flags[rte_lcore_id()] = RTE_KA_STATE_DOZING; + keepcfg->state_flags[rte_lcore_id()].s = RTE_KA_STATE_DOZING; } -- 2.7.4
Re: [dpdk-dev] [PATCH] keepalive: fix keepalive state alignment
Hey Harry, Thanks for the review. On Fri, Jan 19, 2018 at 6:31 PM, Van Haaren, Harry wrote: > These changes do reduce false-sharing however is there actually a performance > benefit? A lot of cache space will be taken up if each core requires its own > cache line, which will reduce performance again.. it's a tradeoff. 1. The false sharing happens in the data path, while the loops run in control paths. 2. The original code (prior to e70a61ad50ab "keepalive: export states") had each element aligned to the cache line, not the whole array. > Little fix for a v2: "s" is not a good variable name for the > rte_keepalive_state, please use something more descriptive. Sure, if there are no more comments, I'll change it. Andriy
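The per-element alignment under discussion can be demonstrated with a minimal C11 sketch: aligning each element (not just the array) pads every state to its own cache line, so cores writing their own state never touch a neighbour's line. The names here are illustrative, not the rte_keepalive types:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64

enum ka_state { KA_UNUSED, KA_ALIVE, KA_MISSING, KA_DEAD };

/* Aligning the member forces sizeof(element) up to a full cache line,
 * so consecutive cores get distinct lines. With the alignment on the
 * array only, elements 1..N would share lines with their neighbours. */
struct ka_element {
    _Alignas(CACHE_LINE_SIZE) enum ka_state core_state;
};

struct ka_element live_data[4];
```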
[dpdk-dev] [PATCH v2] keepalive: fix keepalive state alignment
The __rte_cache_aligned was applied to the whole array, not the array elements. This leads to a false sharing between the monitored cores. Fixes: e70a61ad50ab ("keepalive: export states") Cc: remy.hor...@intel.com Signed-off-by: Andriy Berestovskyy --- Notes (changelog): V2 Changes: - fixed struct name - fixed documentation doc/guides/sample_app_ug/keep_alive.rst | 2 +- lib/librte_eal/common/rte_keepalive.c | 28 ++-- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/doc/guides/sample_app_ug/keep_alive.rst b/doc/guides/sample_app_ug/keep_alive.rst index 38856d2..27ed2a8 100644 --- a/doc/guides/sample_app_ug/keep_alive.rst +++ b/doc/guides/sample_app_ug/keep_alive.rst @@ -168,5 +168,5 @@ The rte_keepalive_mark_alive function simply sets the core state to alive. static inline void rte_keepalive_mark_alive(struct rte_keepalive *keepcfg) { -keepcfg->state_flags[rte_lcore_id()] = ALIVE; +keepcfg->live_data[rte_lcore_id()].core_state = RTE_KA_STATE_ALIVE; } diff --git a/lib/librte_eal/common/rte_keepalive.c b/lib/librte_eal/common/rte_keepalive.c index 7ddf201..e0494b2 100644 --- a/lib/librte_eal/common/rte_keepalive.c +++ b/lib/librte_eal/common/rte_keepalive.c @@ -13,8 +13,12 @@ struct rte_keepalive { /** Core Liveness. */ - enum rte_keepalive_state __rte_cache_aligned state_flags[ - RTE_KEEPALIVE_MAXCORES]; + struct { + /* +* Each element must be cache aligned to prevent false sharing. 
+*/ + enum rte_keepalive_state core_state __rte_cache_aligned; + } live_data[RTE_KEEPALIVE_MAXCORES]; /** Last-seen-alive timestamps */ uint64_t last_alive[RTE_KEEPALIVE_MAXCORES]; @@ -67,19 +71,22 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer, if (keepcfg->active_cores[idx_core] == 0) continue; - switch (keepcfg->state_flags[idx_core]) { + switch (keepcfg->live_data[idx_core].core_state) { case RTE_KA_STATE_UNUSED: break; case RTE_KA_STATE_ALIVE: /* Alive */ - keepcfg->state_flags[idx_core] = RTE_KA_STATE_MISSING; + keepcfg->live_data[idx_core].core_state = + RTE_KA_STATE_MISSING; keepcfg->last_alive[idx_core] = rte_rdtsc(); break; case RTE_KA_STATE_MISSING: /* MIA */ print_trace("Core MIA. ", keepcfg, idx_core); - keepcfg->state_flags[idx_core] = RTE_KA_STATE_DEAD; + keepcfg->live_data[idx_core].core_state = + RTE_KA_STATE_DEAD; break; case RTE_KA_STATE_DEAD: /* Dead */ - keepcfg->state_flags[idx_core] = RTE_KA_STATE_GONE; + keepcfg->live_data[idx_core].core_state = + RTE_KA_STATE_GONE; print_trace("Core died. 
", keepcfg, idx_core); if (keepcfg->callback) keepcfg->callback( @@ -90,7 +97,8 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer, case RTE_KA_STATE_GONE: /* Buried */ break; case RTE_KA_STATE_DOZING: /* Core going idle */ - keepcfg->state_flags[idx_core] = RTE_KA_STATE_SLEEP; + keepcfg->live_data[idx_core].core_state = + RTE_KA_STATE_SLEEP; keepcfg->last_alive[idx_core] = rte_rdtsc(); break; case RTE_KA_STATE_SLEEP: /* Idled core */ @@ -100,7 +108,7 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer, keepcfg->relay_callback( keepcfg->relay_callback_data, idx_core, - keepcfg->state_flags[idx_core], + keepcfg->live_data[idx_core].core_state, keepcfg->last_alive[idx_core] ); } @@ -144,11 +152,11 @@ rte_keepalive_register_core(struct rte_keepalive *keepcfg, const int id_core) void rte_keepalive_mark_alive(struct rte_keepalive *keepcfg) { - keepcfg->state_flags[rte_lcore_id()] = RTE_KA_STATE_ALIVE; + keepcfg->live_data[rte_lcore_id()].core_state = RTE_KA_STATE_ALIVE; } void rte_keepalive_mark_sleep(struct rte_keepalive *keepcfg) { - keepcfg->state_flags[rte_lcore_id()] = RTE_KA_STATE_DOZING; + keepcfg->live_data[rte_lcore_id()].core_state = RTE_KA_STATE_DOZING; } -- 2.7.4
Re: [dpdk-dev] cuckoo hash in dpdk
Hey Pragash, I am not the author of the code, but I guess it is done that way because modern compilers do recognize power-of-two constants and do substitute division and modulo operations with the corresponding bit manipulations. Just try to compile a small program like the following: #include <stdio.h> volatile unsigned a = 123, b, c; int main(int argc, char **argv) { b = a / 4; c = a % 4; printf("%x %x %x\n", a, b, c); } and then disassemble it with gdb: (gdb) disassemble /s main [...] 13 b = a / 4; 0x00400464 <+20>: shr $0x2,%eax 0x00400467 <+23>: mov %eax,0x200bd3(%rip) # 0x601040 14 c = a % 4; 0x0040046d <+29>: mov 0x200bc5(%rip),%eax # 0x601038 0x00400473 <+35>: and $0x3,%eax 0x00400476 <+38>: mov %eax,0x200bc8(%rip) # 0x601044 [...] As you can see, both division and modulo were substituted with "shr" and "and". So basically nowadays there is no need to worry about that and complicate the code with explicit low-level optimizations. Hope that answers your question. Regards, Andriy On Wed, Aug 23, 2017 at 4:15 PM, Pragash Vijayaragavan wrote: > Hi, > > I got the chance to look at the cuckoo hash used in dpdk and have a query. > > would using division and modulo operations be slower than bitwise > operations on RTE_HASH_BUCKET_ENTRIES, specially since > RTE_HASH_BUCKET_ENTRIES is a power of 2. > For example, to do a modulo we can do a "AND" operation on > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo > filter for VPP and doing this gave a slight improvement in speed. > Is there any particular reason its done this way. > > Sorry if i am being wrong in any way, i was just curious. > > Thanks, > > Pragash Vijayaragavan > Grad Student at Rochester Institute of Technology > email : pxv3...@rit.edu > ph : 585 764 4662 -- Andriy Berestovskyy
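For reference, the substitutions the compiler performs can be written out by hand; they are valid only for power-of-two divisors:

```c
#include <assert.h>
#include <stdint.h>

/* For a power-of-two m: a % m == a & (m - 1), and a / m == a >> log2(m). */
uint32_t mod_pow2(uint32_t a, uint32_t m)
{
    return a & (m - 1);
}

uint32_t div_pow2(uint32_t a, uint32_t m)
{
    uint32_t shift = 0;

    /* Compute log2(m) by counting; the compiler does this at compile
     * time when m is a constant. */
    while ((1u << shift) < m)
        shift++;
    return a >> shift;
}
```

This is exactly what the "and" and "shr" instructions in the disassembly above compute.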
Re: [dpdk-dev] cuckoo hash in dpdk
Hey Pragash,
You can pass your own hash function to rte_hash_create(), otherwise a default one will be used, see:
http://dpdk.org/browse/dpdk/tree/lib/librte_hash/rte_cuckoo_hash.c#n281

The default hash function is rte_hash_crc() or in some cases rte_jhash(), see:
http://dpdk.org/browse/dpdk/tree/lib/librte_hash/rte_cuckoo_hash.h#n61

You can find the implementation of rte_hash_crc() over here:
http://dpdk.org/browse/dpdk/tree/lib/librte_hash/rte_hash_crc.h#n588

Please note there is a separate mailing list for DPDK usage discussions:
http://dpdk.org/ml/listinfo/users
The dev@ list is mostly for patch reviews and RFCs...

Andriy

On Thu, Aug 24, 2017 at 8:54 PM, Pragash Vijayaragavan wrote:
> Thats great, what about the hash functions.
>
> On 24 Aug 2017 10:54, "Andriy Berestovskyy" wrote:
>>
>> Hey Pragash,
>> I am not the author of the code, but I guess it is done that way
>> because modern compilers do recognize power of two constants and do
>> substitute division and modulo operations with corresponding bit
>> manipulations.
>>
>> Just try to compile a small program like the following:
>>
>> volatile unsigned a = 123, b, c;
>> int main(int argc, char **argv)
>> {
>>     b = a / 4;
>>     c = a % 4;
>>     printf("%x %x %x\n", a, b, c);
>> }
>>
>> and then disassemble it with gdb:
>>
>> (gdb) disassemble /s main
>> [...]
>> 13        b = a / 4;
>>    0x00400464 <+20>:  shr    $0x2,%eax
>>    0x00400467 <+23>:  mov    %eax,0x200bd3(%rip)    # 0x601040
>>
>> 14        c = a % 4;
>>    0x0040046d <+29>:  mov    0x200bc5(%rip),%eax    # 0x601038
>>    0x00400473 <+35>:  and    $0x3,%eax
>>    0x00400476 <+38>:  mov    %eax,0x200bc8(%rip)    # 0x601044
>> [...]
>>
>> As you can see both division and modulo was substituted with "shr" and
>> "and".
>>
>> So basically nowadays there is no need to worry about that and
>> complicate code with explicit low-level optimizations. Hope that
>> answers your question.
>>
>> Regards,
>> Andriy
>>
>> On Wed, Aug 23, 2017 at 4:15 PM, Pragash Vijayaragavan
>> wrote:
>> > Hi,
>> >
>> > I got the chance to look at the cuckoo hash used in dpdk and have a
>> > query.
>> >
>> > would using division and modulo operations be slower than bitwise
>> > operations on RTE_HASH_BUCKET_ENTRIES, specially since
>> > RTE_HASH_BUCKET_ENTRIES is a power of 2.
>> > For example, to do a modulo we can do a "AND" operation on
>> > (RTE_HASH_BUCKET_ENTRIES - 1), which might be faster. We did a cuckoo
>> > filter for VPP and doing this gave a slight improvement in speed.
>> > Is there any particular reason its done this way.
>> >
>> > Sorry if i am being wrong in any way, i was just curious.
>> >
>> > Thanks,
>> >
>> > Pragash Vijayaragavan
>> > Grad Student at Rochester Institute of Technology
>> > email : pxv3...@rit.edu
>> > ph : 585 764 4662
>>
>> --
>> Andriy Berestovskyy

--
Andriy Berestovskyy
Re: [dpdk-dev] Why cuckoo based hashing in DPDK library?
Hey Evgeny,
Please see inline.

On Thu, Aug 31, 2017 at 9:35 AM, Evgeny Agronsky wrote:
> I'm basically asking because of its poor performance under high

Well, it is not the academic cuckoo hash implementation, so the performance is not that bad and it also utilizes the cache ;) Please have a look at this paper for more details:
http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf

> with universal hash functions? I'm simply curious, maybe you have some sort
> of benchmarks.

Once someone implements Hopscotch hashing for DPDK, we could run some benchmarks... ;) But sure, it would be great to have a faster hash implementation, since DPDK is all about performance...

Andriy
[dpdk-dev] [PATCH] pktgen-dpdk: Add support for make O=OUTPUT option
Add support for the make O=OUTPUT compile-time option.

Signed-off-by: Andriy Berestovskyy
---
 app/Makefile | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/app/Makefile b/app/Makefile
index 9207d2b..88e8716 100644
--- a/app/Makefile
+++ b/app/Makefile
@@ -57,10 +57,10 @@ yy := $(shell $(ver_cmd) -yy)
 # $(info yy=$(yy))

 ifeq ($(yy),17)
-COMMON_PRE := $(RTE_SRCDIR)/../lib/common
-LUA_PRE := $(RTE_SRCDIR)/../lib/lua/src
-CLI_PRE := $(RTE_SRCDIR)/../lib/cli
-GUI_PRE := $(RTE_SRCDIR)/../gui/gui
+COMMON_PRE := $(RTE_OUTPUT)/../../lib/common
+LUA_PRE := $(RTE_OUTPUT)/../../lib/lua/src
+CLI_PRE := $(RTE_OUTPUT)/../../lib/cli
+GUI_PRE := $(RTE_OUTPUT)/../../gui/gui
 else ifeq ($(yy),16)
 COMMON_PRE := $(RTE_SRCDIR)/../lib/common/lib/common
--
2.7.4
[dpdk-dev] Missing Outstanding Patches (By Me) In Patchwork
Hi Matthew,
I hope this is what you are looking for:
http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=37&state=*&archive=both

You just click on Filters and there are a few options...

Andriy

On Wed, Jan 20, 2016 at 6:20 AM, Matthew Hall wrote:
> I have some outstanding minor patches which do not appear in Patchwork
> anywhere I can see, but the interface is also pretty confusing.
>
> Is there a way to find all patches by a person throughout time so I can see
> what happened to them and check why they are not listed and also not merged
> (that I am aware of anyway)?
>
> Sincerely,
> Matthew.

--
Andriy Berestovskyy
[dpdk-dev] [PKTGEN] additional terminal IO question
Hi Matthew,
All software has bugs. pktgen is a great tool and we appreciate it as is. I would prefer we discuss a patch rather than questioning functionality implemented long ago...

Andriy

On Sat, Jan 23, 2016 at 3:48 AM, Matthew Hall wrote:
> On Thu, Jan 21, 2016 at 05:35:00PM +0200, Arnon Warshavsky wrote:
>> Keith,
>> For the record, on my end (can only speak for myself) this is not a real
>> problem.
>> I work around it by using a different theme and live with it happily ever
>> after.
>> I just provided the input since I encountered it.
>>
>> /Arnon
>
> For me, breaking stuff with a black background to gain questionably useful
> colors and/or themes seems like more overhead for cognition of the code for
> not much benefit.
>
> This is going to break the tool for people who use a Linux standard framebuffer
> with no X also, isn't it?
>
> Matthew.

--
Andriy Berestovskyy
[dpdk-dev] [PATCH] vhost-user: enable virtio 1.0
e_vhost/virtio-net.c >> > > b/lib/librte_vhost/virtio-net.c >> > > index a51327d..ee4650e 100644 >> > > --- a/lib/librte_vhost/virtio-net.c >> > > +++ b/lib/librte_vhost/virtio-net.c >> > > @@ -75,6 +75,7 @@ static struct virtio_net_config_ll *ll_root; >> > > (1ULL << VIRTIO_NET_F_CTRL_VQ) | \ >> > > (1ULL << VIRTIO_NET_F_CTRL_RX) | \ >> > > (1ULL << VIRTIO_NET_F_MQ) | \ >> > > + (1ULL << VIRTIO_F_VERSION_1) | \ >> > > (1ULL << VHOST_F_LOG_ALL) | \ >> > > (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)) >> > > static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES; >> > > @@ -477,17 +478,17 @@ set_features(struct vhost_device_ctx ctx, uint64_t >> > > *pu) >> > > return -1; >> > > >> > > dev->features = *pu; >> > > - if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) { >> > > - LOG_DEBUG(VHOST_CONFIG, >> > > - "(%"PRIu64") Mergeable RX buffers enabled\n", >> > > - dev->device_fh); >> > > + if (dev->features & >> > > + ((1 << VIRTIO_NET_F_MRG_RXBUF) | (1ULL << VIRTIO_F_VERSION_1))) { >> > > vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf); >> > > } else { >> > > - LOG_DEBUG(VHOST_CONFIG, >> > > - "(%"PRIu64") Mergeable RX buffers disabled\n", >> > > - dev->device_fh); >> > > vhost_hlen = sizeof(struct virtio_net_hdr); >> > > } >> > > + LOG_DEBUG(VHOST_CONFIG, >> > > + "(%"PRIu64") Mergeable RX buffers %s, virtio 1 %s\n", >> > > + dev->device_fh, >> > > + (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ? "on" : >> > > "off", >> > > + (dev->features & (1ULL << VIRTIO_F_VERSION_1)) ? "on" : >> > > "off"); >> > > >> > > for (i = 0; i < dev->virt_qp_nb; i++) { >> > > uint16_t base_idx = i * VIRTIO_QNUM; >> > > -- >> > > 2.1.0 -- Andriy Berestovskyy
[dpdk-dev] ixgbe: ierrors counter spuriously increasing in DPDK 2.1
Yes Marcin, The issue was discussed here: http://dpdk.org/ml/archives/dev/2015-September/023229.html You can either fix the ierrors in ixgbe_dev_stats_get() or implement a workaround in your app getting the extended statistics and counting out some of extended counters from the ierrors. Here is an example: https://github.com/Juniper/contrail-vrouter/commit/72f6ca05ac81d0ca5e7eb93c6ffe7a93648c2b00#diff-99c1f65a00658c7d38b3d1b64cb5fd93R1306 Regards, Andriy On Wed, Oct 21, 2015 at 10:38 AM, Martin Weiser wrote: > Hi, > > with DPDK 2.1 we are seeing the ierrors counter increasing for 82599ES > ports without reason. Even directly after starting test-pmd the error > counter immediately is 1 without even a single packet being sent to the > device: > > ./testpmd -c 0xfe -n 4 -- --portmask 0x3 --interactive > ... > testpmd> show port stats all > > NIC statistics for port 0 > RX-packets: 0 RX-missed: 0 RX-bytes: 0 > RX-badcrc: 0 RX-badlen: 0 RX-errors: 1 > RX-nombuf: 0 > TX-packets: 0 TX-errors: 0 TX-bytes: 0 > > > NIC statistics for port 1 > RX-packets: 0 RX-missed: 0 RX-bytes: 0 > RX-badcrc: 0 RX-badlen: 0 RX-errors: 1 > RX-nombuf: 0 > TX-packets: 0 TX-errors: 0 TX-bytes: 0 > > > > When packet forwarding is started the ports perform normally and > properly forward all packets but a huge number of ierrors is counted: > > testpmd> start > ... > testpmd> show port stats all > > NIC statistics for port 0 > RX-packets: 9011857RX-missed: 0 RX-bytes: 5020932992 > RX-badcrc: 0 RX-badlen: 0 RX-errors: 9011753 > RX-nombuf: 0 > TX-packets: 9026250TX-errors: 0 TX-bytes: 2922375542 > > > NIC statistics for port 1 > RX-packets: 9026250RX-missed: 0 RX-bytes: 2922375542 > RX-badcrc: 0 RX-badlen: 0 RX-errors: 9026138 > RX-nombuf: 0 > TX-packets: 9011857TX-errors: 0 TX-bytes: 5020932992 > > > > When running the exact same test with DPDK version 2.0 no ierrors are > reported. > Is anyone else seeing strange ierrors being reported for Intel Niantic > cards with DPDK 2.1? 
> > Best regards, > Martin > -- Andriy Berestovskyy
[dpdk-dev] ixgbe: ierrors counter spuriously increasing in DPDK 2.1
Hi Martin, We agreed on the main point: it's an issue. IMO the implementation details are up to Maryam. There have been few patches, so I guess it will be fixed in 2.2. Andriy On Thu, Oct 22, 2015 at 9:46 AM, Martin Weiser wrote: > Hi Andriy, > > thank you for pointing this discussion out to me. I somehow missed it. > Unfortunately it looks like the discussion stopped after Maryam made a > good proposal so I will vote in on that and hopefully get things started > again. > > Best regards, > Martin > > > > On 21.10.15 17:53, Andriy Berestovskyy wrote: >> Yes Marcin, >> The issue was discussed here: >> http://dpdk.org/ml/archives/dev/2015-September/023229.html >> >> You can either fix the ierrors in ixgbe_dev_stats_get() or implement a >> workaround in your app getting the extended statistics and counting >> out some of extended counters from the ierrors. >> >> Here is an example: >> https://github.com/Juniper/contrail-vrouter/commit/72f6ca05ac81d0ca5e7eb93c6ffe7a93648c2b00#diff-99c1f65a00658c7d38b3d1b64cb5fd93R1306 >> >> Regards, >> Andriy >> >> On Wed, Oct 21, 2015 at 10:38 AM, Martin Weiser >> wrote: >>> Hi, >>> >>> with DPDK 2.1 we are seeing the ierrors counter increasing for 82599ES >>> ports without reason. Even directly after starting test-pmd the error >>> counter immediately is 1 without even a single packet being sent to the >>> device: >>> >>> ./testpmd -c 0xfe -n 4 -- --portmask 0x3 --interactive >>> ... 
>>> testpmd> show port stats all >>> >>> NIC statistics for port 0 >>> >>> RX-packets: 0 RX-missed: 0 RX-bytes: 0 >>> RX-badcrc: 0 RX-badlen: 0 RX-errors: 1 >>> RX-nombuf: 0 >>> TX-packets: 0 TX-errors: 0 TX-bytes: 0 >>> >>> >>> >>> NIC statistics for port 1 >>> >>> RX-packets: 0 RX-missed: 0 RX-bytes: 0 >>> RX-badcrc: 0 RX-badlen: 0 RX-errors: 1 >>> RX-nombuf: 0 >>> TX-packets: 0 TX-errors: 0 TX-bytes: 0 >>> >>> >>> >>> >>> When packet forwarding is started the ports perform normally and >>> properly forward all packets but a huge number of ierrors is counted: >>> >>> testpmd> start >>> ... >>> testpmd> show port stats all >>> >>> NIC statistics for port 0 >>> >>> RX-packets: 9011857RX-missed: 0 RX-bytes: 5020932992 >>> RX-badcrc: 0 RX-badlen: 0 RX-errors: 9011753 >>> RX-nombuf: 0 >>> TX-packets: 9026250TX-errors: 0 TX-bytes: 2922375542 >>> >>> >>> >>> NIC statistics for port 1 >>> #### >>> RX-packets: 9026250RX-missed: 0 RX-bytes: 2922375542 >>> RX-badcrc: 0 RX-badlen: 0 RX-errors: 9026138 >>> RX-nombuf: 0 >>> TX-packets: 9011857TX-errors: 0 TX-bytes: 5020932992 >>> >>> >>> >>> >>> When running the exact same test with DPDK version 2.0 no ierrors are >>> reported. >>> Is anyone else seeing strange ierrors being reported for Intel Niantic >>> cards with DPDK 2.1? >>> >>> Best regards, >>> Martin >>> >> >> > > -- Andriy Berestovskyy
[dpdk-dev] how to get driver name for a given port ID
Hi Francesco, You're on the right track. Please note that struct rte_eth_dev_info also has max_rx_queues field - maximum number of RX queues the NIC supports. Regards, Andriy On Tue, Oct 27, 2015 at 10:59 AM, Montorsi, Francesco wrote: > Hi, > Just as reference for other DPDK users: the solution to the problem is simple: > > rte_eth_dev_info_get (uint8_t port_id, struct rte_eth_dev_info *dev_info) > > returns a dev_info structure that contains "driver_name"... > > HTH, > Francesco > > > >> -Original Message- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Montorsi, >> Francesco >> Sent: luned? 26 ottobre 2015 15:18 >> To: dev at dpdk.org >> Subject: [dpdk-dev] how to get driver name for a given port ID >> >> Hi all, >> >> Is there an API to retrieve the driver name for a certain port ID before >> calling >> rte_eth_dev_configure()? >> >> My use case is: I'm trying to call rte_eth_dev_configure() with nb_rx_q=4 >> and found that this works for ixgbe driver but it doesn't for "rte_em_pmd" >> (1Gbps device): >> >> ERROR HwEmulDPDKPort::init() rte_eth_dev_configure: err=-22, port=0: >> Unknown error -22 >> EAL: PCI device :03:00.0 on NUMA socket 0 >> EAL: remove driver: 8086:105e rte_em_pmd >> EAL: PCI memory unmapped at 0x7feb4000 >> EAL: PCI memory unmapped at 0x7feb4002 >> >> So, for those devices I want to use nb_rx_q=1... >> >> Thanks, >> >> Francesco Montorsi > -- Andriy Berestovskyy
[dpdk-dev] ixgbe: account more Rx errors Issue
Hi,
While updating to DPDK 2.1 I noticed an issue with the ixgbe stats. Since commit f6bf669b9900 "ixgbe: account more Rx errors" we now add the XEC hardware counter (l3_l4_xsum_error) to ierrors. The issue is that UDP packets with a zero checksum are counted in XEC, and now in ierrors too. I've tried to disable hw_ip_checksum in rxmode, but it didn't help.

I'm not sure we should add XEC to ierrors at all, because packets counted in XEC are not actually dropped by the NIC. So in my case the ierrors counter is now greater than the actual number of packets received by the NIC, which makes no sense. What's your opinion?

Regards,
Andriy
[dpdk-dev] ixgbe: account more Rx errors Issue
Hi Maryam,
Please see below.

> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors

Please note that the UDP checksum is optional for IPv4, but UDP packets with a zero checksum hit XEC.

> And general crc errors counts Counts the number of receive packets with CRC
> errors.

Let me explain with an example.

DPDK 2.0 behavior:
host A sends 10M IPv4 UDP packets (no checksum) to host B
host B stats: 9M ipackets + 1M ierrors (missed) = 10M

DPDK 2.1 behavior:
host A sends 10M IPv4 UDP packets (no checksum) to host B
host B stats: 9M ipackets + 11M ierrors (1M missed + 10M XEC) = 20M?

> So our options are we can:
> 1. Add only one of these into the error stats.
> 2. We can introduce some cooking of stats in this scenario, so only add
> either or if they are equal or one is higher than the other.
> 3. Add them all which means you can have more errors than the number of
> received packets, but TBH this is going to be the case if your packets have
> multiple errors anyway.

4. ierrors should reflect NIC drops only. XEC does not count drops, so IMO it should be removed from ierrors. Please note that we can still access XEC via rte_eth_xstats_get().

Regards,
Andriy
[dpdk-dev] [PATCH] doc: announce KNI ethtool removal
Hi folks,
Just to clarify: Thomas is talking about removing just the KNI ethtool support (i.e. lib/librte_eal/linuxapp/kni/ethtool/*). The main functionality of those 45K lines of code is to get the same MAC address on the KNI interface and the underlying igb/ixgbe NIC. At the moment the rest of the DPDK eth devices work fine without the KNI ethtool.

The workaround is very simple: use ifconfig or the ip tool to set the same MAC you have on your NIC. Put it into your network configuration to make it permanent. Examples:

ifconfig vEth0_0 hw ether <MAC address>

or

ip link set vEth0_0 address <MAC address>

or in /etc/network/interfaces under the "iface vEth0_0" section add the following:

hwaddress <MAC address>

Andriy

On Thu, Jul 21, 2016 at 10:54 PM, Jay Rolette wrote:
> On Thu, Jul 21, 2016 at 3:32 PM, Thomas Monjalon wrote:
>
>> 2016-07-21 13:20, Jay Rolette:
>> > On Thu, Jul 21, 2016 at 10:33 AM, Ferruh Yigit
>> > wrote:
>> > > KNI ethtool is functional and maintained, and it may have users!
>> > >
>> > > Why just removing it, especially without providing an alternative?
>> > > Is it a good time to discuss KCP again?
>> >
>> > Yes, my product uses it.
>>
>> Your product uses what? KCP? KNI? KNI ethtool?
>
> Sorry, that wasn't very clear. It uses KNI + ifconfig to configure the
> device/interface in Linux. I'm assuming the "ethtool" bits under discussion
> are the same things that make ifconfig work with KNI to the limited extent
> it does.
>
>> > Seems like we are back to the same discussion we
>> > had a few months ago about the KNI situation...
>> >
>> > It shouldn't be removed unless there is a replacement, ideally one that
>> > works with the normal Linux tools like every other network device.
>>
>> This ethtool module works only for igb and ixgbe!
>> There is already no replacement for other drivers.
>> Who works on a replacement?
>
> Ferruh submitted KCP previously, but you guys didn't like the fact that it
> was a kernel module. IIRC, one of the gains from that was simplified
> maintenance because you didn't need driver-specific support for KNI.
> Assuming he's still willing to beat it into shape, we have something that
> is already most of the way there.
>
> If people are going to continue to block it because it is a kernel module,
> then IMO, it's better to leave the existing support for igb / ixgbe in place
> instead of stepping backwards to zero support for ethtool.
>
> While the code wasn't ready at the time, it was a definite improvement
> over what we have with KNI today.

--
Andriy Berestovskyy
[dpdk-dev] [PATCH] doc: announce API change for virtual device initialization
Hey folks, > On 28 Jul 2016, at 17:47, De Lara Guarch, Pablo intel.com> wrote: > Fair enough. So you mean to use rte_eth_dev_attach in ethdev library and > a similar function in cryptodev library? There is a rte_eth_dev_get_port_by_name() which gets the port id right after the rte_eal_vdev_init() call. You might consider the same for the crypto... Regards, Andriy
[dpdk-dev] [dpdk-announce] DPDK 16.07 released
On behalf of contributors, thank you so much to all the reviewers and maintainers, and un très grand merci à Thomas for your great job, help and patience ;)

Regards,
Andriy

> On 28 Jul 2016, at 23:39, Thomas Monjalon wrote:
>
> Once again, a great release from the impressive DPDK community:
>    http://fast.dpdk.org/rel/dpdk-16.07.tar.xz
>
> The statistics are awesome:
>    955 patches from 115 authors
>    839 files changed, 127162 insertions(+), 24668 deletions(-)
>
> There are 50 new contributors
> (including authors, reviewers and testers):
> Thanks to Adam Bynes, Ajit Khaparde, Akhil Goyal, Alex Wang,
> Amin Tootoonchian, Anupam Kapoor, Björn Töpel, Chaeyong Chong,
> David Christensen, Dmitriy Yakovlev, Dumitru Ceara, Eoin Breen,
> Fengtian Guo, Guruprasad Mukundarao, Hiroyuki Mikita, Ian Stokes,
> Ido Barnea, Jeff Guo, John Guzik, Juan Antonio Montesinos,
> Juhamatti Kuusisaari, Maxime Coquelin, Michael Habibi, Nikhil Rao,
> Patrik Andersson, Radoslaw Biernacki, Raslan Darawsheh, Ricardo Salveti,
> Ricky Li, Ronghua Zhang, Sameh Gobriel, Sankar Chokkalingam,
> Sergey Dyasly, Shreyansh Jain, Slawomir Rosek, Sony Chacko, Stephen Hurd,
> Thadeu Lima de Souza Cascardo, Thomas Petazzoni, Tiwei Bie,
> Vasily Philipov, Vincent Li, Wei Dai, WeiJie Zhuang, Wei Shen,
> Xiaoban Wu, Xueqin Lin, Yari Adan Petralanda, Yongseok Koh, Zyta Szpak.
>
> These new contributors are associated with these domain names:
> 6wind.com, awakenetworks.com, broadcom.com, caviumnetworks.com,
> cisco.com, coriant.com, ericsson.com, free-electrons.com, gmail.com,
> berkeley.edu, intel.com, linaro.org, mellanox.com, nxp.com, outlook.com,
> qlogic.com, redhat.com, samsung.com, schaman.hu, semihalf.com,
> shieldxnetworks.com, uml.edu, vmware.com.
> > Some highlights: >* mempool reworked >* KASUMI crypto >* driver bnxt >* driver for ThunderX >* virtio for POWER8 >* virtio-user for containers >* vhost-user client mode >* packet capture framework > > More details in the release notes: >http://dpdk.org/doc/guides/rel_notes/release_16_07.html > > The new features for the 16.11 cycle must be submitted before August 28. > The features properly reviewed and approved before October will be part > of the next release which will be a bit shorter (3 months) than before. > > If you read until this line, please take few more minutes to fill out > this survey: http://surveymonkey.com/r/DPDK_Community_Survey > > Thanks everyone
[dpdk-dev] Vhost user no connection vm2vm
using on host/guest? In my case on host I >> >> had 3.13.0 and on guests old 3.2 debian. >> >> >> >> >> >> >> >>> I just looked deeper into virtio back-end (vhost) but at first glace >> it >> >> seems like nothing coming from virtio. >> >> >> >> >> >> >> >>> What I'm going to do today is to compile newest kernel for vhost and >> >> guest and debug where packet flow stuck, I will report the result >> >> >> >> >> >> >> >>> On Thu, May 21, 2015 at 11:12 AM, Gaohaifeng (A) < >> >> gaohaifeng.gao at huawei.com> wrote: >> >> >> >>> Hi Maciej >> >> >Did you solve your problem? I meet this problem as your case. >> >> And I found avail_idx(in rte_vhost_dequeue_burst function) is always >> zero >> >> although I do send packets in VM. >> >> >> >>> Thanks. >> >> >> >> >> >>> Hello, I have strange issue with example/vhost app. >> >>> >> >>> I had compiled DPDK to run a vhost example app with followed flags >> >>> >> >>> CONFIG_RTE_LIBRTE_VHOST=y >> >>> CONFIG_RTE_LIBRTE_VHOST_USER=y >> >>> CONFIG_RTE_LIBRTE_VHOST_DEBUG=n >> >>> >> >>> then I run vhost app based on documentation: >> >>> >> >>> ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge --socket-mem >> >>> 3712 >> >>> -- -p 0x1 --dev-basename usvhost --vm2vm 1 --stats 9 >> >>> >> >>> -I use this strange --socket-mem 3712 because of physical limit of >> >>> memoryon device -with this vhost user I run two KVM machines with >> >>> followed parameters >> >>> >> >>> kvm -nographic -boot c -machine pc-i440fx-1.4,accel=kvm -name vm1 -cpu >> >>> host -smp 2 -hda /home/ubuntu/qemu/debian_squeeze2_amd64.qcow2 -m >> >>> 1024 -mem-path /mnt/huge -mem-prealloc -chardev >> >>> socket,id=char1,path=/home/ubuntu/dpdk/examples/vhost/usvhost >> >>> -netdev type=vhost-user,id=hostnet1,chardev=char1 >> >>> -device virtio-net >> >>> pci,netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6 >> >>> = >> >>> off,guest_ecn=off >> >>> -chardev >> >>> socket,id=char2,path=/home/ubuntu/dpdk/examples/vhost/usvhost >> >>> -netdev 
type=vhost-user,id=hostnet2,chardev=char2 >> >>> -device >> >>> virtio-net- >> >>> pci,netdev=hostnet2,id=net2,csum=off,gso=off,guest_tso4=off,guest_tso6 >> >>> = >> >>> off,guest_ecn=off >> >>> >> >>> After running KVM virtio correctly starting (below logs from vhost app) >> >> ... >> >>> VHOST_CONFIG: mapped region 0 fd:31 to 0x2aaabae0 sz:0xa >> >>> off:0x0 >> >>> VHOST_CONFIG: mapped region 1 fd:37 to 0x2aaabb00 sz:0x1000 >> >>> off:0xc >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK >> >>> VHOST_CONFIG: vring kick idx:0 file:38 >> >>> VHOST_CONFIG: virtio isn't ready for processing. >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR >> >>> VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK >> >>> VHOST_CONFIG: vring kick idx:1 file:39 >> >>> VHOST_CONFIG: virtio is now ready for processing. >> >>> VHOST_DATA: (1) Device has been added to data core 2 >> >>> >> >>> So everything looking good. >> >>> >> >>> Maybe it is something trivial but using options: --vm2vm 1 (or) 2 >> >>> --stats 9 it seems that I didn't have connection between VM2VM >> >>> communication. 
I set manually IP for eth0 and eth1: >> >>> >> >>> on 1 VM >> >>> ifconfig eth0 192.168.0.100 netmask 255.255.255.0 up ifconfig eth1 >> >>> 192.168.1.101 netmask 255.255.255.0 up >> >>> >> >>> on 2 VM >> >>> ifconfig eth0 192.168.1.200 netmask 255.255.255.0 up ifconfig eth1 >> >>> 192.168.0.202 netmask 255.255.255.0 up >> >>> >> >>> I notice that in vhostapp are one directional rx/tx queue so I tryied >> >>> to ping between VM1 to VM2 using both interfaces ping -I eth0 >> >>> 192.168.1.200 ping -I >> >>> eth1 192.168.1.200 ping -I eth0 192.168.0.202 ping -I eth1 >> >>> 192.168.0.202 >> >>> >> >>> on VM2 using tcpdump on both interfaces I didn't see any ICMP requests >> >>> or traffic >> >>> >> >>> And I cant ping between any IP/interfaces, moreover stats show me that: >> >>> >> >>> Device statistics >> >>> Statistics for device 0 -- >> >>> TX total: 0 >> >>> TX dropped: 0 >> >>> TX successful: 0 >> >>> RX total: 0 >> >>> RX dropped: 0 >> >>> RX successful: 0 >> >>> Statistics for device 1 -- >> >>> TX total: 0 >> >>> TX dropped: 0 >> >>> TX successful: 0 >> >>> RX total: 0 >> >>> RX dropped: 0 >> >>> RX successful: 0 >> >>> Statistics for device 2 -- >> >>> TX total: 0 >> >>> TX dropped: 0 >> >>> TX successful: 0 >> >>> RX total: 0 >> >>> RX dropped: 0 >> >>> RX successful: 0 >> >>> Statistics for device 3 -- >> >>> TX total: 0 >> >>> TX dropped: 0 >> >>> TX successful: 0 >> >>> RX total: 0 >> >>> RX dropped: 0 >> >>> RX successful: 0 >> >>> == >> >>> >> >>> So it seems like any packet didn't leave my VM. >> >>> also arp table is empty on each VM. >> >> >> >> >> >> -- Andriy Berestovskyy
[dpdk-dev] Non-working TX IP checksum offload
Cześć Angela,
Make sure your NIC is configured properly as described in this thread:
http://dpdk.org/ml/archives/dev/2015-May/018096.html

Andriy

On Fri, Jul 17, 2015 at 4:23 PM, Angela Czubak wrote:
> Hi,
>
> I have some difficulties using ip checksum tx offload capabilities - I
> think I set everything as advised by the API documentation, but
> unfortunately the packet leaves the interface with its ip checksum still
> being zero (it reaches its destination).
>
> What I do is:
> buffer->ol_flags |= PKT_TX_IP_CKSUM|PKT_TX_IPV4;
> ip_header->hdr_checksum = 0;
> buffer->l3_len = sizeof(struct ipv4_hdr);
> buffer->l2_len = sizeof(struct ether_hdr);
>
> In L4 there's UDP, which checksum is zeroed if that matters.
>
> Is there something I am missing? The NIC is Intel Corporation Ethernet
> Controller X710 for 10GbE SFP+ (rev 01).
>
> What is more, is there any particular reason for assuming in
> i40e_xmit_pkts that offloading checksums is unlikely (I mean the line no
> 1307 "if (unlikely(ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK))" at
> dpdk-2.0.0/lib/librte_pmd_i40e/i40e_rxtx.c)?
>
> Regards,
> Angela

--
Andriy Berestovskyy
[dpdk-dev] Free up completed TX buffers
Hi Zoltan,

On Fri, May 29, 2015 at 7:00 PM, Zoltan Kiss wrote:
> The easy way is just to increase your buffer pool's size to make
> sure that doesn't happen.

Go for it!

> But there is no bulletproof way to calculate such
> a number

Yeah, there are many places for mbufs to stay :( I would try:

Mempool size = sum(numbers of all TX descriptors)
             + sum(rx_free_thresh)
             + (mempool cache size * (number of lcores - 1))
             + (burst size * number of lcores)

> I'm thinking about a foolproof way, which is exposing functions like
> ixgbe_tx_free_bufs from the PMDs, so the application can call it as a last
> resort to avoid deadlock.

Have a look at rte_eth_dev_tx_queue_stop()/start(). Some NICs (e.g. ixgbe) do reset the queue and free all the mbufs.

Regards,
Andriy
[dpdk-dev] About bond api lacp problem.
Hi,
Basically, you have to make sure you call rte_eth_tx_burst() every 100 ms in your forwarding loop. Here is such an example:

const uint64_t bond_tx_cycles = (rte_get_timer_hz() + MS_PER_S - 1) * 100 / MS_PER_S;
uint64_t cur_bond_cycles, diff_cycles;
uint64_t last_bond_tx_cycles = 0;

/* Inside your forwarding loop: */
cur_bond_cycles = rte_get_timer_cycles();
diff_cycles = cur_bond_cycles - last_bond_tx_cycles;
if (diff_cycles > bond_tx_cycles) {
    last_bond_tx_cycles = cur_bond_cycles;
    rte_eth_tx_burst(bond_port_id, 0, NULL, 0);
}

There is a users at dpdk.org mailing list, please address such questions there.

Regards,
Andriy

On Sat, Apr 16, 2016 at 11:41 AM, yangbo wrote:
> Hi,
>
> How to understand this bond API comment:
>
> for LACP mode to work the rx/tx burst functions must be invoked at least once
> every 100ms, otherwise the out-of-band LACP messages will not be handled with
> the expected latency and this may cause the link status to be incorrectly
> marked as down or failure to correctly negotiate with peers.
>
> Can anyone give me an example or more detailed info?
>
> I am extremely grateful for it.

--
Andriy Berestovskyy
[dpdk-dev] Questions about reading/writing/modifying packet header.
Hi Ick-Sung,
Please see inline.

On Mon, Apr 18, 2016 at 2:14 PM, ??? wrote:
> If I take an example, the worker assignment method using & (not %) in
> load balancing was not fixed yet.

If the code works, there is nothing to fix, right? ;)

> Question #1) I would like to know how I can read/write/modify
> TCP/UDP/ICMP/IGMP/... headers from a packet in rte_mbuf.
> I would really appreciate an example code. I guess it
> would be somewhat complex.

For an example please have a look at parse_ethernet() in test-pmd:
http://dpdk.org/browse/dpdk/tree/app/test-pmd/csumonly.c#n171

The example usage is in the same file:

eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
parse_ethernet(eth_hdr, &info);
l3_hdr = (char *)eth_hdr + info.l2_len;

if (info.l4_proto == IPPROTO_UDP) {
    udp_hdr = (struct udp_hdr *)((char *)l3_hdr + info.l3_len);
    udp_hdr->dst_port = ...
}

Then you might need to recalculate the L4 checksum, so have a look at rte_ipv4_udptcp_cksum().

> Question #2) The IP checksum does not include the 6th ptr. The 6th ptr
> (ptr16[5]) is missing in the example code. Is that right?
> (ip_cksum += ptr16[5]; in the following code.)

The code seems fine: ptr16[5] is the checksum field itself. It should be zero, so we can skip it.

There is a users at dpdk.org mailing list now, so please use it for your further questions. Here is the link for your convenience: http://dpdk.org/ml

Regards,
Andriy
[dpdk-dev] Couple of PMD questions
Hi Jay,

On Tue, Apr 19, 2016 at 10:16 PM, Jay Rolette wrote:
> Should the driver error out in that case instead of only "sort of" working?

+1, we hit the same issue. An error or a log message would help.

> If I support a max frame size of 9216 bytes (exactly a 1K multiple to make
> the NIC happy), then max_rx_pkt_len is going to be 9216 and data_room_size
> will be 9216 + RTE_PKTMBUF_HEADROOM.

Try to set max_rx_pkt_len <= 9K - 8 and the mempool element size to 9K + headroom + the size of the mbuf structures.

> Is that check correct?

The datasheet says: "The MFS does not include the 4 bytes of the VLAN header. Packets with VLAN header can be as large as MFS + 4. When double VLAN is enabled, the device adds 8 to the MFS for any packets."

Regards,
Andriy
[dpdk-dev] Couple of PMD questions
> > > > > data_room_size parameter of rte_pktmbuf_pool_create():
> > > > >
> > > > > "Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM."
> > > > >
> > > > > If I support a max frame size of 9216 bytes (exactly a 1K multiple
> > > > > to make the NIC happy), then max_rx_pkt_len is going to be 9216 and
> > > > > data_room_size will be 9216 + RTE_PKTMBUF_HEADROOM.
> > > > >
> > > > > ixgbe_dev_rx_init() will normalize that back to 9216, which will
> > > > > fail the dual VLAN length check. The really nasty part about that
> > > > > is it has a side-effect of enabling scattered RX regardless of the
> > > > > fact that I didn't enable scattered RX in dev_conf.rxmode.
> > > > >
> > > > > The code in the e1000 PMD is similar, so nothing unique to ixgbe.
> > > > >
> > > > > Is that check correct? It seems wrong to be adding space for q-in-q
> > > > > on top of your specified max frame size...
> > > >
> > > > The issue here is what the correct behaviour needs to be. If we have
> > > > the user specify the maximum frame size including all VLAN tags, then
> > > > we hit the problem where we have to subtract the VLAN tag sizes when
> > > > writing the value to the NIC. In that case, we will hit a problem
> > > > where we get e.g. a 9210-byte frame - to reuse your example - without
> > > > any VLAN tags, which will be rejected by the hardware as being
> > > > oversized. If we don't do the subtraction, and we get the same
> > > > 9210-byte packet with 2 VLAN tags on it, the hardware will accept it
> > > > and then split it across multiple descriptors because the actual DMA
> > > > size is 9218 bytes.
> > >
> > > As an app developer, I didn't realize the max frame size didn't include
> > > VLAN tags. I expected max frame size to be the size of the ethernet
> > > frame on the wire, which I would expect to include space used by any
> > > VLAN or MPLS tags.
> > >
> > > Is there anything in the docs or example apps about that? I did some
> > > digging as I was debugging this and didn't notice it, but it is
> > > entirely possible I just missed it.
> > >
> > > > I'm not sure there is a works-in-all-cases solution here.
> > >
> > > Andriy's suggestion seems like it points in the right direction.
> > >
> > > From an app developer point of view, I'd expect to have a single max
> > > frame size value to track and the APIs should take care of any
> > > adjustments required internally. Maybe have rte_pktmbuf_pool_create()
> > > add the additional bytes when it calls rte_mempool_create() under the
> > > covers? Then it's nice and clean for the API without unexpected
> > > side-effects.
> >
> > It will still have unintended side-effects I think, depending on the
> > resolution of the NIC buffer length parameters. For drivers like ixgbe
> > or e1000, the mempool create call could potentially have to add an
> > additional 1K to each buffer just to be able to store the extra eight
> > bytes.
>
> The comments in the ixgbe driver say that the value programmed into SRRCTL
> must be on a 1K boundary. Based on your previous response, it sounded like
> the NIC ignores that limit for VLAN tags, hence the check for the extra 8
> bytes on the mbuf element size. Are you worried about the size resolution
> on mempool elements?
>
> Sounds like I've got to go spend some quality time in the NIC data
> sheets...
>
> Maybe I should back up and just ask the higher level question:
>
> What's the right incantation in both the dev_conf structure and in
> creating the mbuf pool to support jumbo frames of some particular size on
> the wire, with or without VLAN tags, without requiring scattered_rx
> support in an app?
>
> Thanks,
> Jay

--
Andriy Berestovskyy
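The hardware behaviour Bruce describes above (MFS excludes VLAN tags, so tagged frames may exceed it by 4 bytes per tag) can be captured in a one-line predicate. A sketch for reasoning about the thread's 9210-byte example; the function and its name are illustrative, not driver code:

```c
#include <stdint.h>

/* Does a frame of wire_len bytes pass an ixgbe-style size check, given the
 * MFS programmed in hardware and the number of VLAN tags on the frame?
 * Per the datasheet quote earlier in the thread, each VLAN tag grants an
 * extra 4 bytes on top of the MFS. */
static int frame_accepted(uint32_t wire_len, uint32_t mfs, int n_vlan_tags)
{
    return wire_len <= mfs + 4u * (uint32_t)n_vlan_tags;
}
```

With the user's 9216 reduced to an MFS of 9208 (subtracting the double-VLAN allowance), an untagged 9210-byte frame is rejected, while the same 9210 bytes with two VLAN tags squeaks through, which is exactly the inconsistency being debated.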
[dpdk-dev] how to search this mailing list? is gmane link archive broken?
Hey,

You can try the "site:" Google search operator, i.e. try to google:

    site:dpdk.org/ml/archives/dev/

Regards,
Andriy

On Mon, Nov 7, 2016 at 6:12 PM, Montorsi, Francesco wrote:
> Hi all,
> if this was already raised, sorry for that.
> I noticed that the gmane archive for this mailing list is not working
> anymore:
>
> http://news.gmane.org/gmane.comp.networking.dpdk.devel
>
> reports "Page not found". Also I noticed that the gmane link on the
> dpdk.org website has been removed.
> That was my only way to search through the archives of this mailing
> list... Is there any other way to search them?
>
> Thanks,
> Francesco Montorsi

--
Andriy Berestovskyy