[dpdk-dev] [PATCH v6 03/16] fm10k: register fm10k pmd PF driver

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Add init function to scan and initialize fm10k PF device.
2. Add implementation to register fm10k pmd PF driver.
3. Add 3 functions fm10k_dev_configure, fm10k_stats_get and
   fm10k_stats_reset.
4. Add fm10k.h to define macros and basic data structure.
5. Add fm10k_logs.h to control log message output.
6. Add Makefile.
7. Add ABI version of librte_pmd_fm10k

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
Signed-off-by: Michael Qiu 
---
 lib/librte_pmd_fm10k/Makefile  |   99 +++
 lib/librte_pmd_fm10k/fm10k.h   |  224 +++
 lib/librte_pmd_fm10k/fm10k_ethdev.c|  343 
 lib/librte_pmd_fm10k/fm10k_logs.h  |   78 ++
 lib/librte_pmd_fm10k/rte_pmd_fm10k_version.map |4 +
 5 files changed, 748 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_pmd_fm10k/Makefile
 create mode 100644 lib/librte_pmd_fm10k/fm10k.h
 create mode 100644 lib/librte_pmd_fm10k/fm10k_ethdev.c
 create mode 100644 lib/librte_pmd_fm10k/fm10k_logs.h
 create mode 100644 lib/librte_pmd_fm10k/rte_pmd_fm10k_version.map

diff --git a/lib/librte_pmd_fm10k/Makefile b/lib/librte_pmd_fm10k/Makefile
new file mode 100644
index 0000000..b24cc67
--- /dev/null
+++ b/lib/librte_pmd_fm10k/Makefile
@@ -0,0 +1,99 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_fm10k.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_fm10k_version.map
+
+LIBABIVER := 1
+
+ifeq ($(CC), icc)
+#
+# CFLAGS for icc
+#
+CFLAGS_BASE_DRIVER = -wd174 -wd593 -wd869 -wd981 -wd2259
+
+else ifeq ($(CC), clang)
+#
+## CFLAGS for clang
+#
+CFLAGS_BASE_DRIVER = -Wno-unused-parameter -Wno-unused-value
+CFLAGS_BASE_DRIVER += -Wno-strict-aliasing -Wno-format-extra-args
+CFLAGS_BASE_DRIVER += -Wno-unused-variable -Wno-unused-but-set-variable
+CFLAGS_BASE_DRIVER += -Wno-missing-field-initializers
+
+else
+#
+# CFLAGS for gcc
+#
+ifneq ($(shell test $(GCC_MAJOR_VERSION) -le 4 -a $(GCC_MINOR_VERSION) -le 3 && echo 1), 1)
+CFLAGS += -Wno-deprecated
+endif
+CFLAGS_BASE_DRIVER = -Wno-unused-parameter -Wno-unused-value
+CFLAGS_BASE_DRIVER += -Wno-strict-aliasing -Wno-format-extra-args
+CFLAGS_BASE_DRIVER += -Wno-unused-variable -Wno-unused-but-set-variable
+CFLAGS_BASE_DRIVER += -Wno-missing-field-initializers
+endif
+
+#
+# Add extra flags for base driver source files to disable warnings in them
+#
+BASE_DRIVER_OBJS=$(patsubst %.c,%.o,$(notdir $(wildcard $(RTE_SDK)/lib/librte_pmd_fm10k/base/*.c)))
+$(foreach obj, $(BASE_DRIVER_OBJS), $(eval CFLAGS_$(obj)+=$(CFLAGS_BASE_DRIVER)))
+
+VPATH += $(RTE_SDK)/lib/librte_pmd_fm10k/base
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_ethdev.c
+
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_pf.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_tlv.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_mempool lib/librte_mbuf

[dpdk-dev] [PATCH v6 04/16] config: change config files to add fm10k into compile

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Change config/common_bsdapp and config/common_linuxapp, add
   macros to control fm10k pmd driver compile for linux and bsd.
2. Change lib/Makefile to add fm10k driver into compile list.
3. Change mk/rte.app.mk to add fm10k lib into link.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 config/common_bsdapp   |   11 +++
 config/common_linuxapp |   11 +++
 lib/Makefile   |1 +
 mk/rte.app.mk  |4 
 4 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 57bacb8..8cfa4e6 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -182,6 +182,17 @@ CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4
 CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL=-1

 #
+# Compile burst-oriented FM10K PMD
+#
+CONFIG_RTE_LIBRTE_FM10K_PMD=y
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
+
+#
 # Compile burst-oriented Cisco ENIC PMD driver
 #
 CONFIG_RTE_LIBRTE_ENIC_PMD=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index d428f84..db8332d 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -180,6 +180,17 @@ CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4
 CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL=-1

 #
+# Compile burst-oriented FM10K PMD
+#
+CONFIG_RTE_LIBRTE_FM10K_PMD=y
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
+CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
+
+#
 # Compile burst-oriented Cisco ENIC PMD driver
 #
 CONFIG_RTE_LIBRTE_ENIC_PMD=y
diff --git a/lib/Makefile b/lib/Makefile
index d617d81..561a696 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -44,6 +44,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += librte_pmd_e1000
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += librte_pmd_ixgbe
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
+DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += librte_pmd_fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += librte_pmd_enic
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 95dbb0b..d181eb1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -211,6 +211,10 @@ ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y)
 LDLIBS += -lrte_pmd_i40e
 endif

+ifeq ($(CONFIG_RTE_LIBRTE_FM10K_PMD),y)
+LDLIBS += -lrte_pmd_fm10k
+endif
+
 ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y)
 LDLIBS += -lrte_pmd_ixgbe
 endif
-- 
1.7.7.6



[dpdk-dev] [PATCH v6 05/16] fm10k: add reta update/query functions

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Add fm10k_reta_update and fm10k_reta_query functions.
2. Add fm10k_link_update and fm10k_dev_infos_get functions.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |  162 +++
 1 files changed, 162 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 0b75299..b3d4d79 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -44,6 +44,10 @@
 /* Default delay to acquire mailbox lock */
 #define FM10K_MBXLOCK_DELAY_US 20

+/* Number of chars per uint32 type */
+#define CHARS_PER_UINT32 (sizeof(uint32_t))
+#define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
+
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
 {
@@ -74,6 +78,22 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
return 0;
 }

+static int
+fm10k_link_update(struct rte_eth_dev *dev,
+   __rte_unused int wait_to_complete)
+{
+   PMD_INIT_FUNC_TRACE();
+
+   /* The host-interface link is always up.  The speed is ~50Gbps per Gen3
+* x8 PCIe interface. For now, we leave the speed undefined since there
+* is no 50Gbps Ethernet. */
+   dev->data->dev_link.link_speed  = 0;
+   dev->data->dev_link.link_duplex = ETH_LINK_FULL_DUPLEX;
+   dev->data->dev_link.link_status = 1;
+
+   return 0;
+}
+
 static void
 fm10k_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
@@ -119,6 +139,144 @@ fm10k_stats_reset(struct rte_eth_dev *dev)
fm10k_rebind_hw_stats(hw, hw_stats);
 }

+static void
+fm10k_dev_infos_get(struct rte_eth_dev *dev,
+   struct rte_eth_dev_info *dev_info)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   PMD_INIT_FUNC_TRACE();
+
+   dev_info->min_rx_bufsize = FM10K_MIN_RX_BUF_SIZE;
+   dev_info->max_rx_pktlen  = FM10K_MAX_PKT_SIZE;
+   dev_info->max_rx_queues  = hw->mac.max_queues;
+   dev_info->max_tx_queues  = hw->mac.max_queues;
+   dev_info->max_mac_addrs  = 1;
+   dev_info->max_hash_mac_addrs = 0;
+   dev_info->max_vfs= FM10K_MAX_VF_NUM;
+   dev_info->max_vmdq_pools = ETH_64_POOLS;
+   dev_info->rx_offload_capa =
+   DEV_RX_OFFLOAD_IPV4_CKSUM |
+   DEV_RX_OFFLOAD_UDP_CKSUM  |
+   DEV_RX_OFFLOAD_TCP_CKSUM;
+   dev_info->tx_offload_capa= 0;
+   dev_info->reta_size = FM10K_MAX_RSS_INDICES;
+
+   dev_info->default_rxconf = (struct rte_eth_rxconf) {
+   .rx_thresh = {
+   .pthresh = FM10K_DEFAULT_RX_PTHRESH,
+   .hthresh = FM10K_DEFAULT_RX_HTHRESH,
+   .wthresh = FM10K_DEFAULT_RX_WTHRESH,
+   },
+   .rx_free_thresh = FM10K_RX_FREE_THRESH_DEFAULT(0),
+   .rx_drop_en = 0,
+   };
+
+   dev_info->default_txconf = (struct rte_eth_txconf) {
+   .tx_thresh = {
+   .pthresh = FM10K_DEFAULT_TX_PTHRESH,
+   .hthresh = FM10K_DEFAULT_TX_HTHRESH,
+   .wthresh = FM10K_DEFAULT_TX_WTHRESH,
+   },
+   .tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
+   .tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
+   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
+   ETH_TXQ_FLAGS_NOOFFLOADS,
+   };
+
+}
+
+static int
+fm10k_reta_update(struct rte_eth_dev *dev,
+   struct rte_eth_rss_reta_entry64 *reta_conf,
+   uint16_t reta_size)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint16_t i, j, idx, shift;
+   uint8_t mask;
+   uint32_t reta;
+
+   PMD_INIT_FUNC_TRACE();
+
+   if (reta_size > FM10K_MAX_RSS_INDICES) {
+   PMD_INIT_LOG(ERR, "The size of hash lookup table configured "
+   "(%d) doesn't match the number hardware can support "
+   "(%d)", reta_size, FM10K_MAX_RSS_INDICES);
+   return -EINVAL;
+   }
+
+   /*
+* Update Redirection Table RETA[n], n=0..31. The redirection table has
+* 128-entries in 32 registers
+*/
+   for (i = 0; i < FM10K_MAX_RSS_INDICES; i += CHARS_PER_UINT32) {
+   idx = i / RTE_RETA_GROUP_SIZE;
+   shift = i % RTE_RETA_GROUP_SIZE;
+   mask = (uint8_t)((reta_conf[idx].mask >> shift) &
+   BIT_MASK_PER_UINT32);
+   if (mask == 0)
+   continue;
+
+   reta = 0;
+   if (mask != BIT_MASK_PER_UINT32)
+   reta = FM10K_READ_REG(hw, FM10K_RETA(0, i >> 2));
+
+   for (j = 0; j < CHARS_PER_UINT32; j++) {
+   if (mask & (0x1 << j)) {
+  
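
The archived diff is cut off inside the inner loop of fm10k_reta_update, so
the following is only a sketch of how the per-register mask slicing shown
above could be completed. It assumes 128 one-byte entries packed four per
32-bit register (as the patch comment states) and a 64-entry group per mask
word. All sketch_* names, the plain regs[] array standing in for the
FM10K_RETA registers, and the byte-merge step are assumptions, not the
driver's code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RETA_ENTRIES 128  /* entries in the redirection table */
#define SKETCH_GROUP_SIZE   64   /* entries covered by one 64-bit mask */
#define SKETCH_PER_REG      4    /* one byte per entry, four per register */
#define SKETCH_NIBBLE_MASK  ((1u << SKETCH_PER_REG) - 1)

/* Hypothetical stand-in for struct rte_eth_rss_reta_entry64. */
struct sketch_reta_group {
	uint64_t mask;                    /* bit n set: update entry n */
	uint8_t reta[SKETCH_GROUP_SIZE];  /* queue index per entry */
};

/* regs[] stands in for the 32 RETA registers. */
static void
sketch_reta_update(uint32_t regs[SKETCH_RETA_ENTRIES / SKETCH_PER_REG],
		const struct sketch_reta_group *conf)
{
	unsigned int i, j;

	assert(regs != NULL && conf != NULL);

	for (i = 0; i < SKETCH_RETA_ENTRIES; i += SKETCH_PER_REG) {
		unsigned int idx = i / SKETCH_GROUP_SIZE;
		unsigned int shift = i % SKETCH_GROUP_SIZE;
		uint8_t mask = (uint8_t)((conf[idx].mask >> shift) &
				SKETCH_NIBBLE_MASK);
		uint32_t reta;

		if (mask == 0)
			continue; /* nothing selected in this register */

		/* Partial update: start from the current register value. */
		reta = (mask == SKETCH_NIBBLE_MASK) ? 0 : regs[i >> 2];

		for (j = 0; j < SKETCH_PER_REG; j++) {
			if (mask & (1u << j)) {
				/* Clear byte j, then merge the new entry. */
				reta &= ~((uint32_t)UINT8_MAX << (CHAR_BIT * j));
				reta |= (uint32_t)conf[idx].reta[shift + j] <<
					(CHAR_BIT * j);
			}
		}
		regs[i >> 2] = reta;
	}
}
```

Reading the register back only when the 4-bit mask is partial avoids an
unnecessary register read when all four bytes are being overwritten anyway.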

[dpdk-dev] [PATCH v6 06/16] fm10k: add Rx queue setup/release function

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

Add fm10k_rx_queue_setup and fm10k_rx_queue_release functions.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |  254 +++
 1 files changed, 254 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index b3d4d79..8799c1a 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -41,6 +41,7 @@
 #include "fm10k.h"
 #include "base/fm10k_api.h"

+#define FM10K_RX_BUFF_ALIGN 512
 /* Default delay to acquire mailbox lock */
 #define FM10K_MBXLOCK_DELAY_US 20

@@ -67,6 +68,46 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
rte_spinlock_unlock(FM10K_DEV_PRIVATE_TO_MBXLOCK(hw->back));
 }

+/*
+ * clean queue, descriptor rings, free software buffers used when stopping
+ * device.
+ */
+static inline void
+rx_queue_clean(struct fm10k_rx_queue *q)
+{
+   union fm10k_rx_desc zero = {.q = {0, 0, 0, 0} };
+   uint32_t i;
+   PMD_INIT_FUNC_TRACE();
+
+   /* zero descriptor rings */
+   for (i = 0; i < q->nb_desc; ++i)
+   q->hw_ring[i] = zero;
+
+   /* free software buffers */
+   for (i = 0; i < q->nb_desc; ++i) {
+   if (q->sw_ring[i]) {
+   rte_pktmbuf_free_seg(q->sw_ring[i]);
+   q->sw_ring[i] = NULL;
+   }
+   }
+}
+
+/*
+ * free all queue memory used when releasing the queue (i.e. configure)
+ */
+static inline void
+rx_queue_free(struct fm10k_rx_queue *q)
+{
+   PMD_INIT_FUNC_TRACE();
+   if (q) {
+   PMD_INIT_LOG(DEBUG, "Freeing rx queue %p", q);
+   rx_queue_clean(q);
+   if (q->sw_ring)
+   rte_free(q->sw_ring);
+   rte_free(q);
+   }
+}
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -186,6 +227,217 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,

 }

+static inline int
+check_nb_desc(uint16_t min, uint16_t max, uint16_t mult, uint16_t request)
+{
+   if ((request < min) || (request > max) || ((request % mult) != 0))
+   return -1;
+   else
+   return 0;
+}
+
+/*
+ * Create a memzone for hardware descriptor rings. Malloc cannot be used since
+ * the physical address is required. If the memzone is already created, then
+ * this function returns a pointer to the existing memzone.
+ */
+static inline const struct rte_memzone *
+allocate_hw_ring(const char *driver_name, const char *ring_name,
+   uint8_t port_id, uint16_t queue_id, int socket_id,
+   uint32_t size, uint32_t align)
+{
+   char name[RTE_MEMZONE_NAMESIZE];
+   const struct rte_memzone *mz;
+
+   snprintf(name, sizeof(name), "%s_%s_%d_%d_%d",
+driver_name, ring_name, port_id, queue_id, socket_id);
+
+   /* return the memzone if it already exists */
+   mz = rte_memzone_lookup(name);
+   if (mz)
+   return mz;
+
+#ifdef RTE_LIBRTE_XEN_DOM0
+   return rte_memzone_reserve_bounded(name, size, socket_id, 0, align,
+  RTE_PGSIZE_2M);
+#else
+   return rte_memzone_reserve_aligned(name, size, socket_id, 0, align);
+#endif
+}
+
+static inline int
+check_thresh(uint16_t min, uint16_t max, uint16_t div, uint16_t request)
+{
+   if ((request < min) || (request > max) || ((div % request) != 0))
+   return -1;
+   else
+   return 0;
+}
+
+static inline int
+handle_rxconf(struct fm10k_rx_queue *q, const struct rte_eth_rxconf *conf)
+{
+   uint16_t rx_free_thresh;
+
+   if (conf->rx_free_thresh == 0)
+   rx_free_thresh = FM10K_RX_FREE_THRESH_DEFAULT(q);
+   else
+   rx_free_thresh = conf->rx_free_thresh;
+
+   /* make sure the requested threshold satisfies the constraints */
+   if (check_thresh(FM10K_RX_FREE_THRESH_MIN(q),
+   FM10K_RX_FREE_THRESH_MAX(q),
+   FM10K_RX_FREE_THRESH_DIV(q),
+   rx_free_thresh)) {
+   PMD_INIT_LOG(ERR, "rx_free_thresh (%u) must be "
+   "less than or equal to %u, "
+   "greater than or equal to %u, "
+   "and a divisor of %u",
+   rx_free_thresh, FM10K_RX_FREE_THRESH_MAX(q),
+   FM10K_RX_FREE_THRESH_MIN(q),
+   FM10K_RX_FREE_THRESH_DIV(q));
+   return (-EINVAL);
+   }
+
+   q->alloc_thresh = rx_free_thresh;
+   q->drop_en = conf->rx_drop_en;
+   q->rx_deferred_start = conf->rx_deferred_start;
+
+   return 0;
+}
+
+/*
+ * Hardware requires specific alignment for Rx packet buffers. At
+ * least one of the following two conditions must be satisfied.
+ *  1. Address is 512B aligned
+ *  2. Address is 8B aligned and buffer does not cross 4K boundary.
+ *
+ * As such, the driver 
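
The comment above is truncated in the archive, but the two alignment
conditions it lists can be expressed directly. The following is a minimal
sketch under that reading; sketch_rx_buf_addr_ok is a hypothetical helper
operating on a raw address and length, not the driver's own
fm10k_addr_alignment_valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a buffer of len bytes starting at addr meets at least one
 * of the two conditions from the patch comment:
 *   1. the address is 512B aligned, or
 *   2. the address is 8B aligned and the buffer stays within one 4K page.
 */
static bool
sketch_rx_buf_addr_ok(uint64_t addr, uint32_t len)
{
	assert(len > 0);

	/* Condition 1: 512B aligned. */
	if ((addr & (512 - 1)) == 0)
		return true;

	/* Condition 2: 8B aligned, first and last byte in the same 4K page. */
	if ((addr & (8 - 1)) == 0 &&
	    (addr / 4096) == ((addr + len - 1) / 4096))
		return true;

	return false;
}
```

For example, an 8B-aligned 2048-byte buffer at 0x1008 ends at 0x1807 and
passes, while the same buffer at 0x1F08 would cross into the next 4K page and
fail unless it were 512B aligned.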

[dpdk-dev] [PATCH v6 07/16] fm10k: add Tx queue setup/release function

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

Add fm10k_tx_queue_setup and fm10k_tx_queue_release functions.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |  205 +++
 1 files changed, 205 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 8799c1a..47bfe59 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -108,6 +108,48 @@ rx_queue_free(struct fm10k_rx_queue *q)
}
 }

+/*
+ * clean queue, descriptor rings, free software buffers used when stopping
+ * device
+ */
+static inline void
+tx_queue_clean(struct fm10k_tx_queue *q)
+{
+   struct fm10k_tx_desc zero = {0, 0, 0, 0, 0, 0};
+   uint32_t i;
+   PMD_INIT_FUNC_TRACE();
+
+   /* zero descriptor rings */
+   for (i = 0; i < q->nb_desc; ++i)
+   q->hw_ring[i] = zero;
+
+   /* free software buffers */
+   for (i = 0; i < q->nb_desc; ++i) {
+   if (q->sw_ring[i]) {
+   rte_pktmbuf_free_seg(q->sw_ring[i]);
+   q->sw_ring[i] = NULL;
+   }
+   }
+}
+
+/*
+ * free all queue memory used when releasing the queue (i.e. configure)
+ */
+static inline void
+tx_queue_free(struct fm10k_tx_queue *q)
+{
+   PMD_INIT_FUNC_TRACE();
+   if (q) {
+   PMD_INIT_LOG(DEBUG, "Freeing tx queue %p", q);
+   tx_queue_clean(q);
+   if (q->rs_tracker.list)
+   rte_free(q->rs_tracker.list);
+   if (q->sw_ring)
+   rte_free(q->sw_ring);
+   rte_free(q);
+   }
+}
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -438,6 +480,167 @@ fm10k_rx_queue_release(void *queue)
rx_queue_free(queue);
 }

+static inline int
+handle_txconf(struct fm10k_tx_queue *q, const struct rte_eth_txconf *conf)
+{
+   uint16_t tx_free_thresh;
+   uint16_t tx_rs_thresh;
+
+   /* constraint MACROs require that tx_free_thresh is configured
+* before tx_rs_thresh */
+   if (conf->tx_free_thresh == 0)
+   tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(q);
+   else
+   tx_free_thresh = conf->tx_free_thresh;
+
+   /* make sure the requested threshold satisfies the constraints */
+   if (check_thresh(FM10K_TX_FREE_THRESH_MIN(q),
+   FM10K_TX_FREE_THRESH_MAX(q),
+   FM10K_TX_FREE_THRESH_DIV(q),
+   tx_free_thresh)) {
+   PMD_INIT_LOG(ERR, "tx_free_thresh (%u) must be "
+   "less than or equal to %u, "
+   "greater than or equal to %u, "
+   "and a divisor of %u",
+   tx_free_thresh, FM10K_TX_FREE_THRESH_MAX(q),
+   FM10K_TX_FREE_THRESH_MIN(q),
+   FM10K_TX_FREE_THRESH_DIV(q));
+   return (-EINVAL);
+   }
+
+   q->free_thresh = tx_free_thresh;
+
+   if (conf->tx_rs_thresh == 0)
+   tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(q);
+   else
+   tx_rs_thresh = conf->tx_rs_thresh;
+
+   q->tx_deferred_start = conf->tx_deferred_start;
+
+   /* make sure the requested threshold satisfies the constraints */
+   if (check_thresh(FM10K_TX_RS_THRESH_MIN(q),
+   FM10K_TX_RS_THRESH_MAX(q),
+   FM10K_TX_RS_THRESH_DIV(q),
+   tx_rs_thresh)) {
+   PMD_INIT_LOG(ERR, "tx_rs_thresh (%u) must be "
+   "less than or equal to %u, "
+   "greater than or equal to %u, "
+   "and a divisor of %u",
+   tx_rs_thresh, FM10K_TX_RS_THRESH_MAX(q),
+   FM10K_TX_RS_THRESH_MIN(q),
+   FM10K_TX_RS_THRESH_DIV(q));
+   return (-EINVAL);
+   }
+
+   q->rs_thresh = tx_rs_thresh;
+
+   return 0;
+}
+
+static int
+fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
+   uint16_t nb_desc, unsigned int socket_id,
+   const struct rte_eth_txconf *conf)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct fm10k_tx_queue *q;
+   const struct rte_memzone *mz;
+
+   PMD_INIT_FUNC_TRACE();
+
+   /* make sure a valid number of descriptors have been requested */
+   if (check_nb_desc(FM10K_MIN_TX_DESC, FM10K_MAX_TX_DESC,
+   FM10K_MULT_TX_DESC, nb_desc)) {
+   PMD_INIT_LOG(ERR, "Number of Tx descriptors (%u) must be "
+   "less than or equal to %"PRIu32", "
+   "greater than or equal to %u, "
+   "and a multiple of %u",
+   nb_desc, (uint32_t)FM10K_MAX_TX_DESC, FM10K_MIN_TX_DESC,
+   
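
The queue setup code relies on two similar-looking checks: check_nb_desc
requires the request to be a multiple of a value, while check_thresh requires
the request to be a divisor of a value. The sketch below restates both side by
side to make the difference visible. The sketch_* names are hypothetical, the
limits used in the usage note are illustrative rather than the real FM10K_*
constants, and the explicit zero guard is an addition (the patch avoids a zero
request by substituting a default first).

```c
#include <assert.h>
#include <stdint.h>

/* Like check_nb_desc: request must lie in [min, max] and be a multiple of mult. */
static int
sketch_check_multiple(uint16_t min, uint16_t max, uint16_t mult,
		uint16_t request)
{
	assert(mult != 0);
	if (request < min || request > max || (request % mult) != 0)
		return -1;
	return 0;
}

/* Like check_thresh: request must lie in [min, max] and divide div evenly. */
static int
sketch_check_divisor(uint16_t min, uint16_t max, uint16_t div,
		uint16_t request)
{
	/* Guard added in this sketch so div % request is never div % 0. */
	if (request == 0)
		return -1;
	if (request < min || request > max || (div % request) != 0)
		return -1;
	return 0;
}
```

With illustrative limits, a ring of 512 descriptors is a valid multiple of 8,
and a threshold of 32 is a valid divisor of 512, whereas a threshold of 48 is
rejected because 512 % 48 is not zero.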

[dpdk-dev] [PATCH v6 09/16] fm10k: add dev start/stop functions

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Add function to initialize RX queues.
2. Add function to initialize TX queues.
3. Add fm10k_dev_start, fm10k_dev_stop and fm10k_dev_close
   functions.
4. Add function to close mailbox service.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |  224 +++
 1 files changed, 224 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 0fb7b95..b79badc 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -44,11 +44,14 @@
 #define FM10K_RX_BUFF_ALIGN 512
 /* Default delay to acquire mailbox lock */
 #define FM10K_MBXLOCK_DELAY_US 20
+#define UINT64_LOWER_32BITS_MASK 0x00000000ffffffffULL

 /* Number of chars per uint32 type */
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

+static void fm10k_close_mbx_service(struct fm10k_hw *hw);
+
 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
 {
@@ -268,6 +271,98 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
 }

 static int
+fm10k_dev_tx_init(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int i, ret;
+   struct fm10k_tx_queue *txq;
+   uint64_t base_addr;
+   uint32_t size;
+
+   /* Disable TXINT to avoid possible interrupt */
+   for (i = 0; i < hw->mac.max_queues; i++)
+   FM10K_WRITE_REG(hw, FM10K_TXINT(i),
+   3 << FM10K_TXINT_TIMER_SHIFT);
+
+   /* Setup TX queue */
+   for (i = 0; i < dev->data->nb_tx_queues; ++i) {
+   txq = dev->data->tx_queues[i];
+   base_addr = txq->hw_ring_phys_addr;
+   size = txq->nb_desc * sizeof(struct fm10k_tx_desc);
+
+   /* disable queue to avoid issues while updating state */
+   ret = tx_queue_disable(hw, i);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "failed to disable queue %d", i);
+   return -1;
+   }
+
+   /* set location and size for descriptor ring */
+   FM10K_WRITE_REG(hw, FM10K_TDBAL(i),
+   base_addr & UINT64_LOWER_32BITS_MASK);
+   FM10K_WRITE_REG(hw, FM10K_TDBAH(i),
+   base_addr >> (CHAR_BIT * sizeof(uint32_t)));
+   FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
+   }
+   return 0;
+}
+
+static int
+fm10k_dev_rx_init(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int i, ret;
+   struct fm10k_rx_queue *rxq;
+   uint64_t base_addr;
+   uint32_t size;
+   uint32_t rxdctl = FM10K_RXDCTL_WRITE_BACK_MIN_DELAY;
+   uint16_t buf_size;
+   struct rte_pktmbuf_pool_private *mbp_priv;
+
+   /* Disable RXINT to avoid possible interrupt */
+   for (i = 0; i < hw->mac.max_queues; i++)
+   FM10K_WRITE_REG(hw, FM10K_RXINT(i),
+   3 << FM10K_RXINT_TIMER_SHIFT);
+
+   /* Setup RX queues */
+   for (i = 0; i < dev->data->nb_rx_queues; ++i) {
+   rxq = dev->data->rx_queues[i];
+   base_addr = rxq->hw_ring_phys_addr;
+   size = rxq->nb_desc * sizeof(union fm10k_rx_desc);
+
+   /* disable queue to avoid issues while updating state */
+   ret = rx_queue_disable(hw, i);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "failed to disable queue %d", i);
+   return -1;
+   }
+
+   /* Setup the Base and Length of the Rx Descriptor Ring */
+   FM10K_WRITE_REG(hw, FM10K_RDBAL(i),
+   base_addr & UINT64_LOWER_32BITS_MASK);
+   FM10K_WRITE_REG(hw, FM10K_RDBAH(i),
+   base_addr >> (CHAR_BIT * sizeof(uint32_t)));
+   FM10K_WRITE_REG(hw, FM10K_RDLEN(i), size);
+
+   /* Configure the Rx buffer size for one buff without split */
+   mbp_priv = rte_mempool_get_priv(rxq->mp);
+   buf_size = (uint16_t) (mbp_priv->mbuf_data_room_size -
+   RTE_PKTMBUF_HEADROOM);
+   FM10K_WRITE_REG(hw, FM10K_SRRCTL(i),
+   buf_size >> FM10K_SRRCTL_BSIZEPKT_SHIFT);
+
+   /* Enable drop on empty, it's RO for VF */
+   if (hw->mac.type == fm10k_mac_pf && rxq->drop_en)
+   rxdctl |= FM10K_RXDCTL_DROP_ON_EMPTY;
+
+   FM10K_WRITE_REG(hw, FM10K_RXDCTL(i), rxdctl);
+   FM10K_WRITE_FLUSH(hw);
+   }
+
+   return 0;
+}
+
+static int
 fm10k_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ 
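
Both init functions above program a descriptor ring by splitting its 64-bit
physical base address across a low and a high 32-bit register and writing the
ring length in bytes. A minimal sketch of that arithmetic follows; struct
sketch_ring_regs and sketch_ring_program are hypothetical and return the
values instead of writing hardware registers.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_LOWER_32BITS_MASK 0x00000000ffffffffULL

/* Hypothetical holder for the values written to xDBAL, xDBAH and xDLEN. */
struct sketch_ring_regs {
	uint32_t bal; /* low 32 bits of the ring base address */
	uint32_t bah; /* high 32 bits of the ring base address */
	uint32_t len; /* ring length in bytes */
};

static struct sketch_ring_regs
sketch_ring_program(uint64_t base_addr, uint16_t nb_desc, uint32_t desc_size)
{
	struct sketch_ring_regs r;

	assert(nb_desc > 0 && desc_size > 0);

	r.bal = (uint32_t)(base_addr & SKETCH_LOWER_32BITS_MASK);
	r.bah = (uint32_t)(base_addr >> (CHAR_BIT * sizeof(uint32_t)));
	r.len = (uint32_t)nb_desc * desc_size;
	return r;
}
```

For a ring of 512 descriptors of 16 bytes each at physical address
0x123456780, this gives a low word of 0x23456780, a high word of 0x1 and a
length of 8192 bytes. The 16-byte descriptor size is only an example value.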

[dpdk-dev] [PATCH v6 13/16] fm10k: add function to set vlan

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

Add fm10k_vlan_filter_set to set vlan.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 923f23c..12394e5 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -787,6 +787,20 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,

 }

+static int
+fm10k_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   PMD_INIT_FUNC_TRACE();
+
+   /* @todo - add support for the VF */
+   if (hw->mac.type != fm10k_mac_pf)
+   return -ENOTSUP;
+
+   return fm10k_update_vlan(hw, vlan_id, 0, on);
+}
+
 static inline int
 check_nb_desc(uint16_t min, uint16_t max, uint16_t mult, uint16_t request)
 {
@@ -1388,6 +1402,7 @@ static struct eth_dev_ops fm10k_eth_dev_ops = {
.stats_reset= fm10k_stats_reset,
.link_update= fm10k_link_update,
.dev_infos_get  = fm10k_dev_infos_get,
+   .vlan_filter_set= fm10k_vlan_filter_set,
.rx_queue_start = fm10k_dev_rx_queue_start,
.rx_queue_stop  = fm10k_dev_rx_queue_stop,
.tx_queue_start = fm10k_dev_tx_queue_start,
-- 
1.7.7.6



[dpdk-dev] [PATCH v6 11/16] fm10k: add PF RSS support

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Configure RSS in fm10k_dev_rx_init function.
2. Add fm10k_rss_hash_update and fm10k_rss_hash_conf_get to update
   and query the RSS configuration.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |  156 +++
 1 files changed, 156 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 7451a44..0f4d339 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -270,6 +270,78 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
return 0;
 }

+static void
+fm10k_dev_mq_rx_configure(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
+   uint32_t mrqc, *key, i, reta, j;
+   uint64_t hf;
+
+#define RSS_KEY_SIZE 40
+   static uint8_t rss_intel_key[RSS_KEY_SIZE] = {
+   0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2,
+   0x41, 0x67, 0x25, 0x3D, 0x43, 0xA3, 0x8F, 0xB0,
+   0xD0, 0xCA, 0x2B, 0xCB, 0xAE, 0x7B, 0x30, 0xB4,
+   0x77, 0xCB, 0x2D, 0xA3, 0x80, 0x30, 0xF2, 0x0C,
+   0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
+   };
+
+   if (dev->data->nb_rx_queues == 1 ||
+   dev_conf->rxmode.mq_mode != ETH_MQ_RX_RSS ||
+   dev_conf->rx_adv_conf.rss_conf.rss_hf == 0)
+   return;
+
+   /* random key is rss_intel_key (default) or user provided (rss_key) */
+   if (dev_conf->rx_adv_conf.rss_conf.rss_key == NULL)
+   key = (uint32_t *)rss_intel_key;
+   else
+   key = (uint32_t *)dev_conf->rx_adv_conf.rss_conf.rss_key;
+
+   /* Now fill our hash function seeds, 4 bytes at a time */
+   for (i = 0; i < RSS_KEY_SIZE / sizeof(*key); ++i)
+   FM10K_WRITE_REG(hw, FM10K_RSSRK(0, i), key[i]);
+
+   /*
+* Fill in redirection table
+* The byte-swap is needed because NIC registers are in
+* little-endian order.
+*/
+   reta = 0;
+   for (i = 0, j = 0; i < FM10K_RETA_SIZE; i++, j++) {
+   if (j == dev->data->nb_rx_queues)
+   j = 0;
+   reta = (reta << CHAR_BIT) | j;
+   if ((i & 3) == 3)
+   FM10K_WRITE_REG(hw, FM10K_RETA(0, i >> 2),
+   rte_bswap32(reta));
+   }
+
+   /*
+* Generate RSS hash based on packet types, TCP/UDP
+* port numbers and/or IPv4/v6 src and dst addresses
+*/
+   hf = dev_conf->rx_adv_conf.rss_conf.rss_hf;
+   mrqc = 0;
+   mrqc |= (hf & ETH_RSS_IPV4_TCP)? FM10K_MRQC_TCP_IPV4 : 0;
+   mrqc |= (hf & ETH_RSS_IPV4)? FM10K_MRQC_IPV4 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6)? FM10K_MRQC_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_EX) ? FM10K_MRQC_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_TCP)? FM10K_MRQC_TCP_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_TCP_EX) ? FM10K_MRQC_TCP_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV4_UDP)? FM10K_MRQC_UDP_IPV4 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_UDP)? FM10K_MRQC_UDP_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_UDP_EX) ? FM10K_MRQC_UDP_IPV6 : 0;
+
+   if (mrqc == 0) {
+   PMD_INIT_LOG(ERR, "Specified RSS mode 0x%"PRIx64" is not "
+   "supported", hf);
+   return;
+   }
+
+   FM10K_WRITE_REG(hw, FM10K_MRQC(0), mrqc);
+}
+
 static int
 fm10k_dev_tx_init(struct rte_eth_dev *dev)
 {
@@ -359,6 +431,8 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
FM10K_WRITE_FLUSH(hw);
}

+   /* Configure RSS if applicable */
+   fm10k_dev_mq_rx_configure(dev);
return 0;
 }

@@ -1164,6 +1238,86 @@ fm10k_reta_query(struct rte_eth_dev *dev,
return 0;
 }

+static int
+fm10k_rss_hash_update(struct rte_eth_dev *dev,
+   struct rte_eth_rss_conf *rss_conf)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t *key = (uint32_t *)rss_conf->rss_key;
+   uint32_t mrqc;
+   uint64_t hf = rss_conf->rss_hf;
+   int i;
+
+   PMD_INIT_FUNC_TRACE();
+
+   if (rss_conf->rss_key_len < FM10K_RSSRK_SIZE *
+   FM10K_RSSRK_ENTRIES_PER_REG)
+   return -EINVAL;
+
+   if (hf == 0)
+   return -EINVAL;
+
+   mrqc = 0;
+   mrqc |= (hf & ETH_RSS_IPV4_TCP)? FM10K_MRQC_TCP_IPV4 : 0;
+   mrqc |= (hf & ETH_RSS_IPV4)? FM10K_MRQC_IPV4 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6)? FM10K_MRQC_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_EX) ? FM10K_MRQC_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_TCP)? FM10K_MRQC_TCP_IPV6 : 0;
+   mrqc |= (hf & ETH_RSS_IPV6_TCP_EX) ? FM10K_MRQC_TCP_IPV6 : 0;
+   mrqc |= (hf & 
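
The default redirection table in fm10k_dev_mq_rx_configure is filled
round-robin over the Rx queues, four one-byte entries per 32-bit register,
with a byte swap so that the first entry of each group lands in the least
significant byte. The sketch below reproduces that loop against a plain array.
It assumes a table of 128 entries (the value of FM10K_RETA_SIZE is not shown
in this patch), and sketch_bswap32 is a local stand-in for rte_bswap32.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_RETA_SIZE 128 /* assumed number of table entries */

/* Local stand-in for rte_bswap32. */
static uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* regs[] stands in for the RETA registers, one per four entries. */
static void
sketch_reta_fill(uint32_t regs[SKETCH_RETA_SIZE / 4], uint16_t nb_queues)
{
	uint32_t reta = 0;
	unsigned int i, j;

	/* Each entry is one byte wide, so at most 256 queues fit. */
	assert(nb_queues > 0 && nb_queues <= 256);

	for (i = 0, j = 0; i < SKETCH_RETA_SIZE; i++, j++) {
		if (j == nb_queues)
			j = 0; /* wrap around to queue 0 */
		/* Shift earlier entries up and append the new one. */
		reta = (reta << CHAR_BIT) | j;
		/* Every fourth entry completes one register. */
		if ((i & 3) == 3)
			regs[i >> 2] = sketch_bswap32(reta);
	}
}
```

With four queues every register holds 0x03020100. With three queues the
pattern drifts across registers, which shows why the shift register is never
reset: the oldest byte simply falls off the top after four shifts.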

[dpdk-dev] [PATCH v6 14/16] fm10k: add SRIOV-VF support

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

The fm10k PMD supports both PF and VF devices with a single copy of
the code. This works because the NIC maps registers that have the
same function in the PF and the VF to the same PCI I/O address. PF
and VF drivers therefore use the same addresses to access their own
registers, and the hardware translates each request to the correct
unit.

For functionality that is unique to the PF, the driver checks the
current device type and behaves accordingly.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 12394e5..8261fe8 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -1562,6 +1562,7 @@ eth_fm10k_dev_init(__rte_unused struct eth_driver *eth_drv,
  */
 static struct rte_pci_id pci_id_fm10k_map[] = {
 #define RTE_PCI_DEV_ID_DECL_FM10K(vend, dev) { RTE_PCI_DEVICE(vend, dev) },
+#define RTE_PCI_DEV_ID_DECL_FM10KVF(vend, dev) { RTE_PCI_DEVICE(vend, dev) },
 #include "rte_pci_dev_ids.h"
{ .vendor_id = 0, /* sentinel */ },
 };
-- 
1.7.7.6
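
The one-line change above works because the PCI ID table is built with an
X-macro: the driver defines a declaration macro per device class and then
expands a shared ID list, so defining the VF macro is enough to pull the VF
IDs into the table. The sketch below illustrates the pattern in a
self-contained form. SKETCH_ID_LIST stands in for the shared header, and the
vendor and device IDs are made-up placeholders, not real fm10k IDs.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Stand-in for the shared ID header; IDs are placeholders. */
#define SKETCH_ID_LIST \
	SKETCH_DECL_PF(0x1111, 0x0001) \
	SKETCH_DECL_VF(0x1111, 0x0002)

/* The driver decides which classes expand to table entries. */
#define SKETCH_DECL_PF(vend, dev) { (vend), (dev) },
#define SKETCH_DECL_VF(vend, dev) { (vend), (dev) },

static const struct sketch_pci_id sketch_id_map[] = {
	SKETCH_ID_LIST
	{ 0, 0 }, /* sentinel */
};

#undef SKETCH_DECL_PF
#undef SKETCH_DECL_VF

/* Return 1 if the (vendor, device) pair is in the table, else 0. */
static int
sketch_id_match(uint16_t vend, uint16_t dev)
{
	const struct sketch_pci_id *id;

	assert(vend != 0); /* vendor 0 is reserved for the sentinel */

	for (id = sketch_id_map; id->vendor_id != 0; id++)
		if (id->vendor_id == vend && id->device_id == dev)
			return 1;
	return 0;
}
```

If SKETCH_DECL_VF were instead defined to expand to nothing, the VF entry
would silently drop out of the table, which is presumably the state of the
driver before this patch.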



[dpdk-dev] [PATCH v6 10/16] fm10k: add receive and transmit function

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Add fm10k_recv_pkts and fm10k_xmit_pkts functions.
2. Link app function pointer to actual fm10k recv/xmit
   functions.
3. Change Makefile to compile new file fm10k_rxtx.c

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/Makefile   |1 +
 lib/librte_pmd_fm10k/fm10k.h|7 +
 lib/librte_pmd_fm10k/fm10k_ethdev.c |2 +
 lib/librte_pmd_fm10k/fm10k_rxtx.c   |  317 +++
 4 files changed, 327 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_pmd_fm10k/fm10k_rxtx.c

diff --git a/lib/librte_pmd_fm10k/Makefile b/lib/librte_pmd_fm10k/Makefile
index b24cc67..986f4ef 100644
--- a/lib/librte_pmd_fm10k/Makefile
+++ b/lib/librte_pmd_fm10k/Makefile
@@ -83,6 +83,7 @@ VPATH += $(RTE_SDK)/lib/librte_pmd_fm10k/base
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx.c

 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_pf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_tlv.c
diff --git a/lib/librte_pmd_fm10k/fm10k.h b/lib/librte_pmd_fm10k/fm10k.h
index b23a3a6..b2ff10e 100644
--- a/lib/librte_pmd_fm10k/fm10k.h
+++ b/lib/librte_pmd_fm10k/fm10k.h
@@ -279,4 +279,11 @@ fm10k_addr_alignment_valid(struct rte_mbuf *mb)

return 0;
 }
+
+/* Rx and Tx prototypes */
+uint16_t fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts);
+
+uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
 #endif
diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index b79badc..7451a44 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -1244,6 +1244,8 @@ eth_fm10k_dev_init(__rte_unused struct eth_driver *eth_drv,
PMD_INIT_FUNC_TRACE();

dev->dev_ops = &fm10k_eth_dev_ops;
+   dev->rx_pkt_burst = &fm10k_recv_pkts;
+   dev->tx_pkt_burst = &fm10k_xmit_pkts;

/* only initialize in the primary process */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/lib/librte_pmd_fm10k/fm10k_rxtx.c b/lib/librte_pmd_fm10k/fm10k_rxtx.c
new file mode 100644
index 0000000..022bfe6
--- /dev/null
+++ b/lib/librte_pmd_fm10k/fm10k_rxtx.c
@@ -0,0 +1,317 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include 
+#include 
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#ifdef RTE_PMD_PACKET_PREFETCH
+#define rte_packet_prefetch(p)  rte_prefetch1(p)
+#else
+#define rte_packet_prefetch(p)  do {} while (0)
+#endif
+
+static inline void dump_rxd(union fm10k_rx_desc *rxd)
+{
+#ifndef RTE_LIBRTE_FM10K_DEBUG_RX
+   RTE_SET_USED(rxd);
+#endif
+   PMD_RX_LOG(DEBUG, "+----------------|----------------+");
+   PMD_RX_LOG(DEBUG, "|     GLORT      | PKT HDR & TYPE |");
+   PMD_RX_LOG(DEBUG, "|   0x%08x   |   0x%08x   |", rxd->d.glort,
+   rxd->d.data);
+   PMD_RX_LOG(DEBUG, "+----------------|----------------+");
+   PMD_RX_LOG(DEBUG, "|   VLAN & LEN   |     STATUS     |");
+   PMD_RX_LOG(DEBUG, "|   0x%08x   |   0x%08x   |", rxd->d.vlan_len,
+   rxd->d.staterr);
+   PMD_RX_LOG(DEBUG, "+----------------|----------------+");
+   PMD_RX_LOG(DEBUG, "|  

[dpdk-dev] [PATCH v6 16/16] maintainers: claim for fm10k review

2015-02-17 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Claim for fm10k polling mode driver review.

Signed-off-by: Chen Jing D(Mark) 
---
 MAINTAINERS |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a771fa3..e7a425b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -213,6 +213,10 @@ Intel i40e
 M: Helin Zhang 
 F: lib/librte_pmd_i40e/

+Intel fm10k
+M: Jing Chen 
+F: lib/librte_pmd_fm10k/
+
 RedHat virtio
 M: Changchun Ouyang 
 F: lib/librte_pmd_virtio/
-- 
1.7.7.6



[dpdk-dev] [PATCH v6 15/16] fm10k: add PF and VF interrupt handling function

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Add functions to enable PF/VF interrupt.
2. Add function to process error message passed from interrupt.
3. Add 2 interrupt handling functions, one for PF and one for VF.
4. Enable interrupt after completing initialization of NIC.

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c |  268 +++
 1 files changed, 268 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 8261fe8..38f6925 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -1344,6 +1344,256 @@ fm10k_rss_hash_conf_get(struct rte_eth_dev *dev,
return 0;
 }

+static void
+fm10k_dev_enable_intr_pf(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t int_map = FM10K_INT_MAP_IMMEDIATE;
+
+   /* Bind all local non-queue interrupt to vector 0 */
+   int_map |= 0;
+
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_Mailbox), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_PCIeFault), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchUpDown), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SwitchEvent), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_SRAM), int_map);
+   FM10K_WRITE_REG(hw, FM10K_INT_MAP(fm10k_int_VFLR), int_map);
+
+   /* Enable misc causes */
+   FM10K_WRITE_REG(hw, FM10K_EIMR, FM10K_EIMR_ENABLE(PCA_FAULT) |
+   FM10K_EIMR_ENABLE(THI_FAULT) |
+   FM10K_EIMR_ENABLE(FUM_FAULT) |
+   FM10K_EIMR_ENABLE(MAILBOX) |
+   FM10K_EIMR_ENABLE(SWITCHREADY) |
+   FM10K_EIMR_ENABLE(SWITCHNOTREADY) |
+   FM10K_EIMR_ENABLE(SRAMERROR) |
+   FM10K_EIMR_ENABLE(VFLR));
+
+   /* Enable ITR 0 */
+   FM10K_WRITE_REG(hw, FM10K_ITR(0), FM10K_ITR_AUTOMASK |
+   FM10K_ITR_MASK_CLEAR);
+   FM10K_WRITE_FLUSH(hw);
+}
+
+static void
+fm10k_dev_enable_intr_vf(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t int_map = FM10K_INT_MAP_IMMEDIATE;
+
+   /* Bind all local non-queue interrupt to vector 0 */
+   int_map |= 0;
+
+   /* Only INT 0 available, other 15 are reserved. */
+   FM10K_WRITE_REG(hw, FM10K_VFINT_MAP, int_map);
+
+   /* Enable ITR 0 */
+   FM10K_WRITE_REG(hw, FM10K_VFITR(0), FM10K_ITR_AUTOMASK |
+   FM10K_ITR_MASK_CLEAR);
+   FM10K_WRITE_FLUSH(hw);
+}
+
+static int
+fm10k_dev_handle_fault(struct fm10k_hw *hw, uint32_t eicr)
+{
+   struct fm10k_fault fault;
+   int err;
+   const char *estr = "Unknown error";
+
+   /* Process PCA fault */
+   if (eicr & FM10K_EIMR_PCA_FAULT) {
+   err = fm10k_get_fault(hw, FM10K_PCA_FAULT, &fault);
+   if (err)
+   goto error;
+   switch (fault.type) {
+   case PCA_NO_FAULT:
+   estr = "PCA_NO_FAULT"; break;
+   case PCA_UNMAPPED_ADDR:
+   estr = "PCA_UNMAPPED_ADDR"; break;
+   case PCA_BAD_QACCESS_PF:
+   estr = "PCA_BAD_QACCESS_PF"; break;
+   case PCA_BAD_QACCESS_VF:
+   estr = "PCA_BAD_QACCESS_VF"; break;
+   case PCA_MALICIOUS_REQ:
+   estr = "PCA_MALICIOUS_REQ"; break;
+   case PCA_POISONED_TLP:
+   estr = "PCA_POISONED_TLP"; break;
+   case PCA_TLP_ABORT:
+   estr = "PCA_TLP_ABORT"; break;
+   default:
+   goto error;
+   }
+   PMD_INIT_LOG(ERR, "%s: %s(%d) Addr:0x%"PRIx64" Spec: 0x%x",
+   estr, fault.func ? "VF" : "PF", fault.func,
+   fault.address, fault.specinfo);
+   }
+
+   /* Process THI fault */
+   if (eicr & FM10K_EIMR_THI_FAULT) {
+   err = fm10k_get_fault(hw, FM10K_THI_FAULT, &fault);
+   if (err)
+   goto error;
+   switch (fault.type) {
+   case THI_NO_FAULT:
+   estr = "THI_NO_FAULT"; break;
+   case THI_MAL_DIS_Q_FAULT:
+   estr = "THI_MAL_DIS_Q_FAULT"; break;
+   default:
+   goto error;
+   }
+   PMD_INIT_LOG(ERR, "%s: %s(%d) Addr:0x%"PRIx64" Spec: 0x%x",
+   estr, fault.func ? "VF" : "PF", fault.func,
+   fault.address, fault.specinfo);
+   }
+
+   /* Process FUM fault */
+   if (eicr & FM10K_EIMR_FUM_FAULT) {
+   

[dpdk-dev] [PATCH v6 12/16] fm10k: add scatter receive function

2015-02-17 Thread Chen Jing D(Mark)
From: Jeff Shaw 

1. Add fm10k_recv_scattered_pkts function to receive jumbo frame
   and multi-segment packets.
2. Configure correct receive function in rx_init and dev_init.
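
The choice of receive function in item 2 comes down to one comparison, which can be sketched on its own as follows. This is an illustration under stated assumptions and not the driver code: `sketch_needs_scattered_rx` and `SKETCH_VLAN_TAG_SIZE` are hypothetical names, and the 4-byte value is the size of one 802.1Q tag, assumed here to match the driver's own constant.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VLAN_TAG_SIZE 4u /* assumed: one 802.1Q tag is 4 bytes */

/*
 * Decide whether the scattered receive path is needed.
 * It is needed when the application asked for scatter explicitly, or
 * when the largest expected frame plus two VLAN tags cannot fit into
 * a single receive buffer.
 */
static bool
sketch_needs_scattered_rx(uint32_t max_rx_pkt_len, uint32_t buf_size,
			  bool enable_scatter)
{
	if (enable_scatter)
		return true;
	return max_rx_pkt_len + 2u * SKETCH_VLAN_TAG_SIZE > buf_size;
}
```

For example, a 1518-byte maximum frame with 2048-byte buffers stays on the single-buffer path, while a 2044-byte maximum frame does not, because the two tags push it past the buffer size.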

Signed-off-by: Jeff Shaw 
Signed-off-by: Chen Jing D(Mark) 
---
 lib/librte_pmd_fm10k/fm10k.h|3 +
 lib/librte_pmd_fm10k/fm10k_ethdev.c |   15 
 lib/librte_pmd_fm10k/fm10k_rxtx.c   |  145 +++
 3 files changed, 163 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k.h b/lib/librte_pmd_fm10k/fm10k.h
index b2ff10e..0e31796 100644
--- a/lib/librte_pmd_fm10k/fm10k.h
+++ b/lib/librte_pmd_fm10k/fm10k.h
@@ -284,6 +284,9 @@ fm10k_addr_alignment_valid(struct rte_mbuf *mb)
 uint16_t fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

+uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
+   struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
 #endif
diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 0f4d339..923f23c 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -423,6 +423,13 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
FM10K_WRITE_REG(hw, FM10K_SRRCTL(i),
buf_size >> FM10K_SRRCTL_BSIZEPKT_SHIFT);

+   /* It adds dual VLAN length for supporting dual VLAN */
+   if ((dev->data->dev_conf.rxmode.max_rx_pkt_len +
+   2 * FM10K_VLAN_TAG_SIZE) > buf_size){
+   dev->data->scattered_rx = 1;
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+   }
+
/* Enable drop on empty, it's RO for VF */
if (hw->mac.type == fm10k_mac_pf && rxq->drop_en)
rxdctl |= FM10K_RXDCTL_DROP_ON_EMPTY;
@@ -431,6 +438,11 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
FM10K_WRITE_FLUSH(hw);
}

+   if (dev->data->dev_conf.rxmode.enable_scatter) {
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+   dev->data->scattered_rx = 1;
+   }
+
/* Configure RSS if applicable */
fm10k_dev_mq_rx_configure(dev);
return 0;
@@ -1403,6 +1415,9 @@ eth_fm10k_dev_init(__rte_unused struct eth_driver *eth_drv,
dev->rx_pkt_burst = &fm10k_recv_pkts;
dev->tx_pkt_burst = &fm10k_xmit_pkts;

+   if (dev->data->scattered_rx)
+   dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
+
/* only initialize in the primary process */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
diff --git a/lib/librte_pmd_fm10k/fm10k_rxtx.c b/lib/librte_pmd_fm10k/fm10k_rxtx.c
index 022bfe6..bf1c537 100644
--- a/lib/librte_pmd_fm10k/fm10k_rxtx.c
+++ b/lib/librte_pmd_fm10k/fm10k_rxtx.c
@@ -195,6 +195,151 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
return count;
 }

+uint16_t
+fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts)
+{
+   struct rte_mbuf *mbuf;
+   union fm10k_rx_desc desc;
+   struct fm10k_rx_queue *q = rx_queue;
+   uint16_t count = 0;
+   uint16_t nb_rcv, nb_seg;
+   int alloc = 0;
+   uint16_t next_dd;
+   struct rte_mbuf *first_seg = q->pkt_first_seg;
+   struct rte_mbuf *last_seg = q->pkt_last_seg;
+   int ret;
+
+   next_dd = q->next_dd;
+   nb_rcv = 0;
+
+   nb_seg = RTE_MIN(nb_pkts, q->alloc_thresh);
+   for (count = 0; count < nb_seg; count++) {
+   mbuf = q->sw_ring[next_dd];
+   desc = q->hw_ring[next_dd];
+   if (!(desc.d.staterr & FM10K_RXD_STATUS_DD))
+   break;
+#ifdef RTE_LIBRTE_FM10K_DEBUG_RX
+   dump_rxd(&desc);
+#endif
+
+   if (++next_dd == q->nb_desc) {
+   next_dd = 0;
+   alloc = 1;
+   }
+
+   /* Prefetch next mbuf while processing current one. */
+   rte_prefetch0(q->sw_ring[next_dd]);
+
+   /*
+* When next RX descriptor is on a cache-line boundary,
+* prefetch the next 4 RX descriptors and the next 8 pointers
+* to mbufs.
+*/
+   if ((next_dd & 0x3) == 0) {
+   rte_prefetch0(&q->hw_ring[next_dd]);
+   rte_prefetch0(&q->sw_ring[next_dd]);
+   }
+
+   /* Fill data length */
+   rte_pktmbuf_data_len(mbuf) = desc.w.length;
+
+   /*
+* If this is the first buffer of the received packet,
+* set the pointer to the first mbuf of the packet and
+* initialize its context.
+* Otherwise, update the total length and the number of 

[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-17 Thread Thomas Monjalon
2015-02-17 12:25, Bruce Richardson:
> On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > 2015-02-16 15:16, Bruce Richardson:
> > > On Mon, Feb 16, 2015 at 03:33:40PM +0100, Olivier MATZ wrote:
> > > > Hi John,
> > > > 
> > > > On 02/13/2015 04:39 PM, John McNamara wrote:
> > > > > From: Richardson, Bruce 
> > > > > 
> > > > > Example showing how callbacks can be used to insert a timestamp
> > > > > into each packet on RX. On TX the timestamp is used to calculate
> > > > > the packet latency through the app, in cycles.
> > > > > 
> > > > > Signed-off-by: Bruce Richardson 
> > > > 
> > > > 
> > > > I'm looking at the example and I don't understand what is the advantage
> > > > of having callbacks in ethdev layer, knowing that the application can
> > > > do the same job by a standard function call.
> > > > 
> > > > What is the advantage of having callbacks compared to:
> > > > 
> > > > 
> > > > for (port = 0; port < nb_ports; port++) {
> > > > struct rte_mbuf *bufs[BURST_SIZE];
> > > > const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
> > > > bufs, BURST_SIZE);
> > > > if (unlikely(nb_rx == 0))
> > > > continue;
> > > > add_timestamp(bufs, nb_rx);
> > > > 
> > > > const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
> > > > bufs, nb_rx);
> > > > calc_latency(bufs, nb_tx);
> > > > 
> > > > if (unlikely(nb_tx < nb_rx)) {
> > > > uint16_t buf;
> > > > for (buf = nb_tx; buf < nb_rx; buf++)
> > > > rte_pktmbuf_free(bufs[buf]);
> > > > }
> > > > }
> > > > 
> > > > 
> > > > To me, doing like the code above has several advantages:
> > > > 
> > > > - code is more readable: the callback is explicitly invoked, so there is
> > > >   no risk to forget it
> > > > - code is faster: the functions calls can be inlined by the compiler
> > > > - easier to handle error cases in the callback function as the error
> > > >   code is accessible to the application
> > > > - there is no need to add code in ethdev api to do this
> > > > - if the application does not want to use callbacks (I suppose most
> > > >   applications), it won't have any performance impact
> > > > 
> > > > Regards,
> > > > Olivier
> > > 
> > > In this specific instance, given that the application does little else,
> > > there is no real advantage to using the callbacks - it's just to have a
> > > simple example of how they can be used.
> > > 
> > > Where callbacks are really designed to be useful, is for extending or
> > > augmenting hardware capabilities. Taking the example of sequence numbers -
> > > to use the most trivial example - an application could be written to take
> > > advantage of sequence numbers written to packets by the hardware which
> > > received them. However, if such an application was to be used with a NIC
> > > which does not provide sequence numbering capability, for example,
> > > anything using ixgbe driver, the application writer has two choices -
> > > either modify his application code to check each packet for a sequence
> > > number in the data path, and add it there post-rx, or alternatively, to
> > > check the NIC capabilities at initialization time, and add a callback
> > > there at initialization, if the hardware does not support it. In the
> > > latter case, the main packet processing body of the application can be
> > > written as though hardware always has sequence numbering capability, safe
> > > in the knowledge that any hardware not supporting it will be back-filled
> > > by a software fallback at initialization-time.
> > > 
> > > By the same token, we could also look to extend hardware capabilities. For
> > > different filtering or hashing capabilities, there can be limits in
> > > hardware which are far less than what we need to use in software. Again,
> > > callbacks will allow the data path to be written in a way that is
> > > oblivious to the underlying hardware limits, because software will
> > > transparently fill in the gaps.
> > > 
> > > Hope this makes the use case clear.
> > 
> > After thinking more about these callbacks, I realize these callbacks won't
> > help, as Olivier said.
> > 
> > With callback,
> > 1/ application checks device capability
> > 2/ application provides hardware emulation as DPDK callback
> > 3/ application forgets previous steps
> > 4/ application calls DPDK Rx
> > 5/ DPDK calls callback (without calling optimization)
> > 
> > Without callback,
> > 1/ application checks device capability
> > 2/ application provides hardware emulation as internal function
> > 3/ application set an internal device-flag to enable this function
> > 4/ application calls DPDK Rx
> > 5/ application calls the hardware emulation if flag is set
> > 
> > So the only difference is to keep persistent the device 

[dpdk-dev] [PATCH v2 0/2] new headroom stats library and example application

2015-02-17 Thread Pawel Wodkowski
Hi community,
I would like to introduce a library for measuring the load of arbitrary jobs.
It can be used to profile any kind of job set on any execution unit or
tasking library.

The provided l2fwd-headroom example demonstrates how to use this library to
select the optimal rx burst poll time. Jobs are selected using existing
rte_timer library calls. This example does not limit the schemes with which
this library can be used.

PATCH v2 changes:
 - Remove jobs management/callback from library to not duplicate tasking library
   behaviour.
 - Clean up/remove useless statistics.
 - Rework example application to use rte_timer library for jobs selection.
 - Introduce new app parameter '-l' for automatic thousands separating in stats.
 - More readable statistics format.

Pawel Wodkowski (2):
  librte_headroom: New library for checking core/system/app load
  examples: introduce new l2fwd-headroom example

 config/common_bsdapp |5 +
 config/common_linuxapp   |5 +
 examples/Makefile|1 +
 examples/l2fwd-headroom/Makefile |   51 ++
 examples/l2fwd-headroom/main.c   | 1039 ++
 lib/Makefile |1 +
 lib/librte_headroom/Makefile |   54 ++
 lib/librte_headroom/rte_headroom.c   |  271 +++
 lib/librte_headroom/rte_headroom.h   |  324 
 lib/librte_headroom/rte_headroom_version.map |   20 +
 mk/rte.app.mk|4 +
 11 files changed, 1775 insertions(+)
 create mode 100644 examples/l2fwd-headroom/Makefile
 create mode 100644 examples/l2fwd-headroom/main.c
 create mode 100644 lib/librte_headroom/Makefile
 create mode 100644 lib/librte_headroom/rte_headroom.c
 create mode 100644 lib/librte_headroom/rte_headroom.h
 create mode 100644 lib/librte_headroom/rte_headroom_version.map

-- 
1.7.9.5
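
The bookkeeping the cover letter describes (per-job execution statistics plus a poll period that adapts to how much work a job reported) can be sketched in a few lines. This is a hedged illustration only: the `sketch_` names, the 1/8 step, and the "target work per poll" policy are assumptions made for the example, and none of it is the rte_headroom API, whose signatures are not shown in this excerpt. Timestamps are passed in as plain cycle counts so the sketch has no timer dependency.

```c
#include <stdint.h>

/* Hypothetical per-job record; not the rte_headroom structures. */
struct sketch_job {
	uint64_t exec_total, exec_min, exec_max; /* cycles */
	uint64_t exec_cnt;
	uint64_t period;                 /* cycles between polls */
	uint64_t min_period, max_period; /* min <= period <= max */
	uint32_t target;                 /* desired work units per poll */
};

static void
sketch_job_init(struct sketch_job *j, uint64_t min_period,
		uint64_t max_period, uint64_t period, uint32_t target)
{
	j->exec_total = 0;
	j->exec_min = UINT64_MAX;
	j->exec_max = 0;
	j->exec_cnt = 0;
	j->min_period = min_period;
	j->max_period = max_period;
	j->period = period;
	j->target = target;
}

/*
 * Record one finished run. start/end are cycle timestamps and done is
 * the work the job reported (e.g. packets in the burst).
 * Assumed policy: poll less often when the job came up short of the
 * target, more often when it exceeded it, moving by 1/8 of the period
 * and clamping to [min_period, max_period].
 */
static void
sketch_job_finish(struct sketch_job *j, uint64_t start, uint64_t end,
		  uint32_t done)
{
	uint64_t elapsed = end - start;
	uint64_t step = j->period / 8 ? j->period / 8 : 1;

	j->exec_total += elapsed;
	j->exec_cnt++;
	if (elapsed < j->exec_min)
		j->exec_min = elapsed;
	if (elapsed > j->exec_max)
		j->exec_max = elapsed;

	if (done < j->target)
		j->period = (j->max_period - j->period > step) ?
			j->period + step : j->max_period;
	else if (done > j->target)
		j->period = (j->period - j->min_period > step) ?
			j->period - step : j->min_period;
}
```

The clamping keeps a mostly idle job from drifting to an unbounded period, and keeps a busy one from being polled more often than the configured minimum.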



[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-17 Thread Neil Horman
On Tue, Feb 17, 2015 at 01:50:58PM +0000, Bruce Richardson wrote:
> On Tue, Feb 17, 2015 at 02:28:02PM +0100, Olivier MATZ wrote:
> > Hi Bruce,
> > 
> > On 02/17/2015 01:25 PM, Bruce Richardson wrote:
> > >On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > >>2015-02-16 15:16, Bruce Richardson:
> > >>>In this specific instance, given that the application does little else,
> > >>>there is no real advantage to using the callbacks - it's just to have a
> > >>>simple example of how they can be used.
> > >>>
> > >>>Where callbacks are really designed to be useful, is for extending or
> > >>>augmenting hardware capabilities. Taking the example of sequence numbers -
> > >>>to use the most trivial example - an application could be written to take
> > >>>advantage of sequence numbers written to packets by the hardware which
> > >>>received them. However, if such an application was to be used with a NIC
> > >>>which does not provide sequence numbering capability, for example,
> > >>>anything using ixgbe driver, the application writer has two choices -
> > >>>either modify his application code to check each packet for a sequence
> > >>>number in the data path, and add it there post-rx, or alternatively, to
> > >>>check the NIC capabilities at initialization time, and add a callback
> > >>>there at initialization, if the hardware does not support it. In the
> > >>>latter case, the main packet processing body of the application can be
> > >>>written as though hardware always has sequence numbering capability, safe
> > >>>in the knowledge that any hardware not supporting it will be back-filled
> > >>>by a software fallback at initialization-time.
> > >>>
> > >>>By the same token, we could also look to extend hardware capabilities. For
> > >>>different filtering or hashing capabilities, there can be limits in
> > >>>hardware which are far less than what we need to use in software. Again,
> > >>>callbacks will allow the data path to be written in a way that is
> > >>>oblivious to the underlying hardware limits, because software will
> > >>>transparently fill in the gaps.
> > >>>
> > >>>Hope this makes the use case clear.
> > >>
> > >>After thinking more about these callbacks, I realize these callbacks won't
> > >>help, as Olivier said.
> > >>
> > >>With callback,
> > >>1/ application checks device capability
> > >>2/ application provides hardware emulation as DPDK callback
> > >>3/ application forgets previous steps
> > >>4/ application calls DPDK Rx
> > >>5/ DPDK calls callback (without calling optimization)
> > >>
> > >>Without callback,
> > >>1/ application checks device capability
> > >>2/ application provides hardware emulation as internal function
> > >>3/ application set an internal device-flag to enable this function
> > >>4/ application calls DPDK Rx
> > >>5/ application calls the hardware emulation if flag is set
> > >>
> > >>So the only difference is to keep persistent the device information in
> > >>the application instead of storing it as a function pointer in the
> > >>DPDK struct.
> > >>You can also be faster with this approach: at initialization time,
> > >>you can check that your NIC supports the feature and use a specific
> > >>mainloop that adds or not the sequence number without any runtime
> > >>test.
> > >
> > >That is assuming that all NICs are equal on your system. It's also assuming
> > >that you only have a single point in your application where you call RX or
> > >TX burst. In the case where you have a couple of different NICs on the
> > >system, or where you want to write an application to take advantage of
> > >capabilities of different NICs, the ability to resolve all these difference
> > >at initialization time is useful. The main packet handling code can be
> > >written with just the processing of packets in mind, rather than having to
> > >have a set of branches after each RX burst call, or before each TX burst
> > >call, to "smooth out" the different NIC capabilities.
> > >
> > >As for the option of maintaining different main loops for different NICs
> > >with different capabilities - that sounds like a maintenance nightmare to
> > >me, due to duplicated code! Callbacks is a far cleaner solution than that
> > >IMHO.
> > 
> > Why not just provide a function like this:
> > 
> >   rte_do_unsupported_stuff_by_software(m[], m_count, wanted_features,
> > dev_feature_flags)
> > 
> > This function can be called (or not) from the application mainloop.
> > You don't need to maintain several mainloops (for each device) as
> > the specific work will be done depending on the given flags. And the
> > applications that do not require these features (most applications?)
> > are not penalized at all.
> 
> Have you measured the performance hit due to this proposed change? In my tests
> it's very, very small, even for the fastest 
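
The two designs being debated in this thread can be put side by side in a small sketch. It is only an illustration under stated assumptions: every `sketch_` name is hypothetical, the "port" is a plain struct and not an ethdev, and packet reception is reduced to an array handed in by the caller. In both variants the capability check happens once at init time. What differs is where the decision is remembered: as a function pointer stored with the port, or as a flag the application tests after each burst.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_pkt {
	uint32_t seq;
};

typedef void (*sketch_rx_cb)(struct sketch_pkt *pkts, size_t n, void *arg);

/* Hypothetical per-port state; not the DPDK ethdev structures. */
struct sketch_port {
	int hw_has_seq;     /* capability discovered at init time */
	sketch_rx_cb rx_cb; /* variant A: callback stored with the port */
	void *rx_cb_arg;
	int sw_seq_flag;    /* variant B: flag kept by the application */
	uint32_t next_seq;
};

/* Software fallback: number the packets when the hardware did not. */
static void
sketch_add_seq(struct sketch_pkt *pkts, size_t n, void *arg)
{
	uint32_t *next = arg;
	size_t i;

	for (i = 0; i < n; i++)
		pkts[i].seq = (*next)++;
}

/* Init-time decision, made once per port in either variant. */
static void
sketch_port_init(struct sketch_port *p, int use_callback)
{
	p->rx_cb = NULL;
	p->rx_cb_arg = NULL;
	p->sw_seq_flag = 0;
	p->next_seq = 0;
	if (p->hw_has_seq)
		return;
	if (use_callback) {
		p->rx_cb = sketch_add_seq;
		p->rx_cb_arg = &p->next_seq;
	} else {
		p->sw_seq_flag = 1;
	}
}

/* Variant A: the receive wrapper invokes the stored callback, if any. */
static void
sketch_rx_with_callback(struct sketch_port *p, struct sketch_pkt *pkts,
			size_t n)
{
	if (p->rx_cb != NULL)
		p->rx_cb(pkts, n, p->rx_cb_arg);
}

/* Variant B: the application tests its own flag after receiving. */
static void
sketch_rx_with_flag(struct sketch_port *p, struct sketch_pkt *pkts,
		    size_t n)
{
	if (p->sw_seq_flag)
		sketch_add_seq(pkts, n, &p->next_seq);
}
```

Both variants do the same work per burst on a port without the capability: one test plus the fallback. Variant A makes the fallback an indirect call, while variant B leaves it as a direct call that the compiler may inline, which is the trade-off the participants weigh against keeping the main loop free of per-NIC branches.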

[dpdk-dev] [PATCH v2 1/2] librte_headroom: New library for checking core/system/app load

2015-02-17 Thread Pawel Wodkowski
This library provides an API to measure the time spent in particular
parts of the code and to calculate the optimal polling time.

To calculate these statistics, application code needs to be divided into
parts (called jobs) that do something. It is up to the application to
decide what is considered a job.

A series of jobs must be surrounded with the rte_headroom_start_loop()
and rte_headroom_finish_loop() calls. Once a loop has been started, jobs
may be started. Each job must be surrounded with rte_headroom_start_job()
and rte_headroom_finish_job() calls.

After a job finishes its execution, the period in which it should be
called again is adjusted to minimize time wasted on unnecessary
polls/calls. The adjustment is based on data provided by the job itself
(e.g. the number of packets it processed).

After all jobs in a series are executed, the following statistics are
updated and may be used by the application. Statistics can be reset.
Some of the provided statistics:
 - total/min/max execution - time spent executing jobs.
 - total/min/max management - time spent outside the execution area.
This value might be used to measure the overhead of scheduling jobs.
This time also contains the overhead of the headroom library itself.
 - number of loops that executed at least one job
 - executed jobs
 - time when statistics were reset.

Each job provides total/min/max execution time and execution count
statistics.

Signed-off-by: Pawel Wodkowski 
---
 config/common_bsdapp |5 +
 config/common_linuxapp   |5 +
 lib/Makefile |1 +
 lib/librte_headroom/Makefile |   54 +
 lib/librte_headroom/rte_headroom.c   |  271 +
 lib/librte_headroom/rte_headroom.h   |  324 ++
 lib/librte_headroom/rte_headroom_version.map |   20 ++
 7 files changed, 680 insertions(+)
 create mode 100644 lib/librte_headroom/Makefile
 create mode 100644 lib/librte_headroom/rte_headroom.c
 create mode 100644 lib/librte_headroom/rte_headroom.h
 create mode 100644 lib/librte_headroom/rte_headroom_version.map

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 57bacb8..aa2e5fd 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -282,6 +282,11 @@ CONFIG_RTE_LIBRTE_HASH=y
 CONFIG_RTE_LIBRTE_HASH_DEBUG=n

 #
+# Compile librte_headroom
+#
+CONFIG_RTE_LIBRTE_HEADROOM=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index d428f84..055a37b 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -290,6 +290,11 @@ CONFIG_RTE_LIBRTE_HASH=y
 CONFIG_RTE_LIBRTE_HASH_DEBUG=n

 #
+# Compile librte_headroom
+#
+CONFIG_RTE_LIBRTE_HEADROOM=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/lib/Makefile b/lib/Makefile
index d617d81..4fc2819 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -54,6 +54,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
 DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
 DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
+DIRS-$(CONFIG_RTE_LIBRTE_HEADROOM) += librte_headroom
 DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
diff --git a/lib/librte_headroom/Makefile b/lib/librte_headroom/Makefile
new file mode 100644
index 0000000..faefb3b
--- /dev/null
+++ b/lib/librte_headroom/Makefile
@@ -0,0 +1,54 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 

[dpdk-dev] [PATCH] testpmd: initialize rx_fc_en and tx_fc_en to zero

2015-02-17 Thread Pablo de Lara
rx_fc_en and tx_fc_en in cmd_link_flow_ctrl_set_parsed
could be used without being initialized.

Signed-off-by: Pablo de Lara 
---
 app/test-pmd/cmdline.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b2aab40..d52ba89 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5117,7 +5117,7 @@ cmd_link_flow_ctrl_set_parsed(void *parsed_result,
struct cmd_link_flow_ctrl_set_result *res = parsed_result;
cmdline_parse_inst_t *cmd = data;
struct rte_eth_fc_conf fc_conf;
-   int rx_fc_en, tx_fc_en;
+   int rx_fc_en = 0, tx_fc_en = 0;
int ret;

/*
-- 
1.7.4.1
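
The class of problem this patch addresses is a pair of variables that are only assigned inside a branch chain, so an input that matches no branch leaves them indeterminate. A small sketch of the defensive pattern follows. The function name and the mode strings are hypothetical and are not taken from testpmd; the sketch only shows zeroing both outputs before the chain.

```c
#include <string.h>

/*
 * Hypothetical parser: maps a mode string to two enable flags.
 * Both outputs are zeroed first, so a string that matches no branch
 * cannot leave either of them holding an indeterminate value.
 */
static int
sketch_parse_fc_mode(const char *mode, int *rx_fc_en, int *tx_fc_en)
{
	*rx_fc_en = 0;
	*tx_fc_en = 0;

	if (strcmp(mode, "rx") == 0) {
		*rx_fc_en = 1;
	} else if (strcmp(mode, "tx") == 0) {
		*tx_fc_en = 1;
	} else if (strcmp(mode, "full") == 0) {
		*rx_fc_en = 1;
		*tx_fc_en = 1;
	} else if (strcmp(mode, "none") != 0) {
		return -1; /* unknown mode: outputs stay zeroed */
	}
	return 0;
}
```

Note that in C a declaration such as `int a, b = 0;` initializes only `b`, so each variable needs its own initializer when both must start at zero.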



[dpdk-dev] [PATCH v3 1/5] ethdev: add rx interrupt enable/disable functions

2015-02-17 Thread Neil Horman
On Tue, Feb 17, 2015 at 09:47:15PM +0800, Zhou Danny wrote:
> v3 changes
> - Add return value for interrupt enable/disable functions
> 
> Add two dev_ops functions to enable and disable rx queue interrupts
> 
> Signed-off-by: Danny Zhou 
> Tested-by: Yong Liu 
> ---
>  lib/librte_ether/rte_ethdev.c | 43 
>  lib/librte_ether/rte_ethdev.h | 57 
> +++
>  2 files changed, 100 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index ea3a1fb..d27469a 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2825,6 +2825,49 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   }
>   rte_spinlock_unlock(&rte_eth_dev_cb_lock);
>  }
> +
> +int
> +rte_eth_dev_rx_queue_intr_enable(uint8_t port_id,
> + uint16_t queue_id)
> +{
> + struct rte_eth_dev *dev;
> +
> + if (port_id >= nb_ports) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return (-ENODEV);
> + }
> +
> + dev = &rte_eth_devices[port_id];
> + if (dev == NULL) {
> + PMD_DEBUG_TRACE("Invalid port device\n");
> + return (-ENODEV);
> + }
> +
> + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
> + return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
> +}
> +
> +int
> +rte_eth_dev_rx_queue_intr_disable(uint8_t port_id,
> + uint16_t queue_id)
> +{
> + struct rte_eth_dev *dev;
> +
> + if (port_id >= nb_ports) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return (-ENODEV);
> + }
> +
> + dev = &rte_eth_devices[port_id];
> + if (dev == NULL) {
> + PMD_DEBUG_TRACE("Invalid port device\n");
> + return (-ENODEV);
> + }
> +
> + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
> + return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
> +}
> +
>  #ifdef RTE_NIC_BYPASS
>  int rte_eth_dev_bypass_init(uint8_t port_id)
>  {
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 84160c3..0f320a9 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -848,6 +848,8 @@ struct rte_eth_fdir {
>  struct rte_intr_conf {
>   /** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
>   uint16_t lsc;
> + /** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
> + uint16_t rxq;
>  };
>  
>  /**
> @@ -1109,6 +,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev 
> *dev,
>   const struct rte_eth_txconf *tx_conf);
>  /**< @internal Setup a transmit queue of an Ethernet device. */
>  
> +typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> + uint16_t rx_queue_id);
> +/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
> +
> +typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
> + uint16_t rx_queue_id);
> +/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
> +
>  typedef void (*eth_queue_release_t)(void *queue);
>  /**< @internal Release memory resources allocated by given RX/TX queue. */
>  
> @@ -1445,6 +1455,8 @@ struct eth_dev_ops {
>   eth_queue_start_t  tx_queue_start;/**< Start TX for a queue.*/
>   eth_queue_stop_t   tx_queue_stop;/**< Stop TX for a queue.*/
>   eth_rx_queue_setup_t   rx_queue_setup;/**< Set up device RX queue.*/
> + eth_rx_enable_intr_t   rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
> + eth_rx_disable_intr_t  rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
>   eth_queue_release_t        rx_queue_release;/**< Release RX queue.*/
>   eth_rx_queue_count_t   rx_queue_count; /**< Get Rx queue count. */
>   eth_rx_descriptor_done_t   rx_descriptor_done;  /**< Check rxd DD bit */
> @@ -2811,6 +2823,51 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev 
> *dev,
>   enum rte_eth_event_type event);
>  
>  /**
> + * When there is no rx packet coming in Rx Queue for a long time, we can
> + * sleep lcore related to RX Queue for power saving, and enable rx interrupt
> + * to be triggered when rx packect arrives.
> + *
> + * The rte_eth_dev_rx_queue_intr_enable() function enables rx queue
> + * interrupt on specific rx queue of a port.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the receive queue from which to retrieve input packets.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
> + *  

[dpdk-dev] [PATCH v3 1/5] ethdev: add rx interrupt enable/disable functions

2015-02-17 Thread Neil Horman
On Tue, Feb 17, 2015 at 09:47:15PM +0800, Zhou Danny wrote:
> v3 changes
> - Add return value for interrupt enable/disable functions
> 
> Add two dev_ops functions to enable and disable rx queue interrupts
> 
> Signed-off-by: Danny Zhou 
> Tested-by: Yong Liu 
> ---
>  lib/librte_ether/rte_ethdev.c | 43 
>  lib/librte_ether/rte_ethdev.h | 57 
> +++
>  2 files changed, 100 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index ea3a1fb..d27469a 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2825,6 +2825,49 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   }
>   rte_spinlock_unlock(&rte_eth_dev_cb_lock);
>  }
> +
> +int
> +rte_eth_dev_rx_queue_intr_enable(uint8_t port_id,
> + uint16_t queue_id)
> +{
> + struct rte_eth_dev *dev;
> +
> + if (port_id >= nb_ports) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return (-ENODEV);
> + }
> +
> + dev = &rte_eth_devices[port_id];
> + if (dev == NULL) {
> + PMD_DEBUG_TRACE("Invalid port device\n");
> + return (-ENODEV);
> + }
> +
> + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
> + return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
> +}
> +
> +int
> +rte_eth_dev_rx_queue_intr_disable(uint8_t port_id,
> + uint16_t queue_id)
> +{
> + struct rte_eth_dev *dev;
> +
> + if (port_id >= nb_ports) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return (-ENODEV);
> + }
> +
> + dev = &rte_eth_devices[port_id];
> + if (dev == NULL) {
> + PMD_DEBUG_TRACE("Invalid port device\n");
> + return (-ENODEV);
> + }
> +
> + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
> + return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
> +}
> +
>  #ifdef RTE_NIC_BYPASS
>  int rte_eth_dev_bypass_init(uint8_t port_id)
>  {
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 84160c3..0f320a9 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -848,6 +848,8 @@ struct rte_eth_fdir {
>  struct rte_intr_conf {
>   /** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
>   uint16_t lsc;
> + /** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
> + uint16_t rxq;
>  };
>  
>  /**
> @@ -1109,6 +,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev 
> *dev,
>   const struct rte_eth_txconf *tx_conf);
>  /**< @internal Setup a transmit queue of an Ethernet device. */
>  
> +typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> + uint16_t rx_queue_id);
> +/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
> +
> +typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
> + uint16_t rx_queue_id);
> +/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
> +
>  typedef void (*eth_queue_release_t)(void *queue);
>  /**< @internal Release memory resources allocated by given RX/TX queue. */
>  
> @@ -1445,6 +1455,8 @@ struct eth_dev_ops {
>   eth_queue_start_t  tx_queue_start;/**< Start TX for a queue.*/
>   eth_queue_stop_t   tx_queue_stop;/**< Stop TX for a queue.*/
>   eth_rx_queue_setup_t   rx_queue_setup;/**< Set up device RX queue.*/
> + eth_rx_enable_intr_t   rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
> + eth_rx_disable_intr_t  rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
Put these at the end of eth_dev_ops if you want to avoid breaking ABI
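
To illustrate the ABI concern, here is a minimal sketch. The three structs are
cut-down, hypothetical stand-ins for eth_dev_ops (only the member names are
borrowed from the patch), and op_fn is a placeholder type. It only shows that
inserting members in the middle moves every later member, while appending
leaves the existing offsets alone:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder function-pointer type; the real dev_ops use distinct typedefs. */
typedef int (*op_fn)(void);

/* Layout that an already-built application or driver was compiled against. */
struct ops_old {
	op_fn rx_queue_setup;
	op_fn rx_queue_release;
	op_fn rx_queue_count;
};

/* New members inserted in the middle, as in the quoted hunk. */
struct ops_mid {
	op_fn rx_queue_setup;
	op_fn rx_queue_intr_enable;
	op_fn rx_queue_intr_disable;
	op_fn rx_queue_release;
	op_fn rx_queue_count;
};

/* New members appended at the end instead. */
struct ops_end {
	op_fn rx_queue_setup;
	op_fn rx_queue_release;
	op_fn rx_queue_count;
	op_fn rx_queue_intr_enable;
	op_fn rx_queue_intr_disable;
};

/* 1 if the pre-existing members keep their old offsets, 0 otherwise. */
static int mid_layout_compatible(void)
{
	return offsetof(struct ops_mid, rx_queue_release) ==
	       offsetof(struct ops_old, rx_queue_release) &&
	       offsetof(struct ops_mid, rx_queue_count) ==
	       offsetof(struct ops_old, rx_queue_count);
}

static int end_layout_compatible(void)
{
	return offsetof(struct ops_end, rx_queue_release) ==
	       offsetof(struct ops_old, rx_queue_release) &&
	       offsetof(struct ops_end, rx_queue_count) ==
	       offsetof(struct ops_old, rx_queue_count);
}

/* Appending preserves the old offsets; inserting in the middle does not. */
static void check_layouts(void)
{
	assert(end_layout_compatible());
	assert(!mid_layout_compatible());
}
```

Note that sizeof() grows in both variants, so appending only helps code that
reaches the struct through a pointer and never allocates or copies it itself.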

>   eth_queue_release_t        rx_queue_release;/**< Release RX queue.*/
>   eth_rx_queue_count_t   rx_queue_count; /**< Get Rx queue count. */
>   eth_rx_descriptor_done_t   rx_descriptor_done;  /**< Check rxd DD bit */
> @@ -2811,6 +2823,51 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev 
> *dev,
>   enum rte_eth_event_type event);
>  
>  /**
> + * When there is no rx packet coming in Rx Queue for a long time, we can
> + * sleep lcore related to RX Queue for power saving, and enable rx interrupt
> + * to be triggered when rx packect arrives.
> + *
> + * The rte_eth_dev_rx_queue_intr_enable() function enables rx queue
> + * interrupt on specific rx queue of a port.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the receive queue from which to retrieve input packets.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @return
> + *   - (0) if successful.
> + * 

[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-17 Thread Jay Rolette
On Mon, Feb 16, 2015 at 7:00 PM, Matthew Hall  wrote:

> On Mon, Feb 16, 2015 at 10:33:52AM -0600, Jay Rolette wrote:
> > In kni_net_rx_normal(), it was calling netif_receive_skb() instead of
> > netif_rx(). The source for netif_receive_skb() points out that it should
> > only be called from soft-irq context, which isn't the case for KNI.
>
> For the uninitiated among us, what was the practical effect of the coding
> error? Waiting forever for a lock which will never be available in IRQ
> context, or causing unintended re-entrancy, or what?
>

Sadly, I'm not really one of the enlightened ones when it comes to the
Linux kernel. VxWorks? sure. Linux kernel? learning as required.

I didn't chase it down to the specific mechanism in this case. Unusual for
me, but this time I took the expedient route of finding a likely
explanation plus Yao's fix on that same code with his explanation of a
deadlock and went with it. It'll be a few more days before we've had enough
run time on it to absolutely confirm (not an easy bug to repro).

If I get hand-wavy about it, my assumption is that the requirement for
netif_receive_skb() to be called in soft-irq context means it doesn't expect
to be pre-empted or re-entered.  When you call netif_rx() instead, it puts
the skb on the backlog and it gets processed from there. Part of that code
disables interrupts during part of the processing. Not sure what else is
coming in and actually deadlocking things.

Honestly, I don't understand enough details of how everything works in the
Linux network stack yet. I've done tons of work on the network path of
stack-less systems, a bit of work in device drivers, but have only touched
the surface of the internals of Linux network stack. The meat of my product
avoids that like the plague because it is too slow.

Sorry, lots of words but not much light being shed this time...
Jay


[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-17 Thread Bruce Richardson
On Tue, Feb 17, 2015 at 04:32:01PM +0100, Thomas Monjalon wrote:
> 2015-02-17 12:25, Bruce Richardson:
> > On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > > 2015-02-16 15:16, Bruce Richardson:
> > > > On Mon, Feb 16, 2015 at 03:33:40PM +0100, Olivier MATZ wrote:
> > > > > Hi John,
> > > > > 
> > > > > On 02/13/2015 04:39 PM, John McNamara wrote:
> > > > > > From: Richardson, Bruce 
> > > > > > 
> > > > > > Example showing how callbacks can be used to insert a timestamp
> > > > > > into each packet on RX. On TX the timestamp is used to calculate
> > > > > > the packet latency through the app, in cycles.
> > > > > > 
> > > > > > Signed-off-by: Bruce Richardson 
> > > > > 
> > > > > 
> > > > > I'm looking at the example and I don't understand what is the 
> > > > > advantage
> > > > > of having callbacks in ethdev layer, knowing that the application can
> > > > > do the same job by a standard function call.
> > > > > 
> > > > > What is the advantage of having callbacks compared to:
> > > > > 
> > > > > 
> > > > > > for (port = 0; port < nb_ports; port++) {
> > > > > >     struct rte_mbuf *bufs[BURST_SIZE];
> > > > > >     const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
> > > > > >             bufs, BURST_SIZE);
> > > > > >     if (unlikely(nb_rx == 0))
> > > > > >         continue;
> > > > > >     add_timestamp(bufs, nb_rx);
> > > > > > 
> > > > > >     const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
> > > > > >             bufs, nb_rx);
> > > > > >     calc_latency(bufs, nb_tx);
> > > > > > 
> > > > > >     if (unlikely(nb_tx < nb_rx)) {
> > > > > >         uint16_t buf;
> > > > > >         for (buf = nb_tx; buf < nb_rx; buf++)
> > > > > >             rte_pktmbuf_free(bufs[buf]);
> > > > > >     }
> > > > > > }
> > > > > 
> > > > > 
> > > > > To me, doing like the code above has several advantages:
> > > > > 
> > > > > - code is more readable: the callback is explicitly invoked, so there 
> > > > > is
> > > > >   no risk to forget it
> > > > > - code is faster: the functions calls can be inlined by the compiler
> > > > > - easier to handle error cases in the callback function as the error
> > > > >   code is accessible to the application
> > > > > - there is no need to add code in ethdev api to do this
> > > > > - if the application does not want to use callbacks (I suppose most
> > > > >   applications), it won't have any performance impact
> > > > > 
> > > > > Regards,
> > > > > Olivier
> > > > 
> > > > In this specific instance, given that the application does little else, 
> > > > there
> > > > is no real advantage to using the callbacks - it's just to have a 
> > > > simple example
> > > > of how they can be used.
> > > > 
> > > > Where callbacks are really designed to be useful, is for extending or 
> > > > augmenting
> > > > hardware capabilities. Taking the example of sequence numbers - to use 
> > > > the most
> > > > trivial example - an application could be written to take advantage of 
> > > > sequence
> > > > numbers written to packets by the hardware which received them. 
> > > > However, if such
> > > > an application was to be used with a NIC which does not provide 
> > > > sequence numbering
> > > > capability, for example, anything using ixgbe driver, the application 
> > > > writer has
> > > > two choices - either modify his application code to check each packet 
> > > > for
> > > > a sequence number in the data path, and add it there post-rx, or 
> > > > alternatively,
> > > > to check the NIC capabilities at initialization time, and add a 
> > > > callback there
> > > > at initialization, if the hardware does not support it. In the latter 
> > > > case,
> > > > the main packet processing body of the application can be written as 
> > > > though
> > > > hardware always has sequence numbering capability, safe in the 
> > > > knowledge that
> > > > any hardware not supporting it will be back-filled by a software 
> > > > fallback at 
> > > > initialization-time.
> > > > 
> > > > By the same token, we could also look to extend hardware capabilities. 
> > > > For
> > > > different filtering or hashing capabilities, there can be limits in 
> > > > hardware
> > > > which are far less than what we need to use in software. Again, 
> > > > callbacks will
> > > > allow the data path to be written in a way that is oblivious to the 
> > > > underlying
> > > > hardware limits, because software will transparently fill in the gaps.
> > > > 
> > > > Hope this makes the use case clear.
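
The sequence-number case described above can be sketched roughly as follows.
This is a self-contained mock, not the ethdev API: struct pkt, struct port,
port_init() and port_rx_burst() are all hypothetical names made up for the
illustration. The point is only that the capability check happens once at
init, and the data path never looks at it again:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pkt {
	uint32_t seq;
};

/* Post-RX hook signature (hypothetical). */
typedef void (*rx_cb_t)(struct pkt *pkts, uint16_t nb, void *arg);

struct port {
	int hw_seq;            /* 1 if the mock "hardware" numbers packets */
	uint32_t hw_next_seq;  /* counter used by the mock hardware */
	rx_cb_t rx_cb;         /* software fallback, NULL if not needed */
	void *rx_cb_arg;
};

/* Software fallback: number the packets the way the hardware would. */
static void sw_add_seq(struct pkt *pkts, uint16_t nb, void *arg)
{
	uint32_t *next = arg;

	for (uint16_t i = 0; i < nb; i++)
		pkts[i].seq = (*next)++;
}

/* Init time: check the capability once, register the fallback if missing. */
static void port_init(struct port *p, int hw_seq, uint32_t *sw_counter)
{
	p->hw_seq = hw_seq;
	p->hw_next_seq = 0;
	p->rx_cb = hw_seq ? NULL : sw_add_seq;
	p->rx_cb_arg = hw_seq ? NULL : sw_counter;
}

/* Mock RX burst: "receives" nb packets, then runs the callback if set. */
static uint16_t port_rx_burst(struct port *p, struct pkt *pkts, uint16_t nb)
{
	assert(p != NULL && pkts != NULL);

	for (uint16_t i = 0; i < nb; i++)
		pkts[i].seq = p->hw_seq ? p->hw_next_seq++ : 0;

	if (p->rx_cb != NULL)
		p->rx_cb(pkts, nb, p->rx_cb_arg);
	return nb;
}
```

The caller of port_rx_burst() sees numbered packets from both kinds of port
and contains no per-NIC branch of its own.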
> > > 
> > > After thinking more about these callbacks, I realize these callbacks won't
> > > help, as Olivier said.
> > > 
> > > With callback,
> > > 1/ application checks device capability
> > > 2/ application provides hardware emulation as DPDK callback
> > > 3/ application forgets previous steps
> > > 4/ application calls DPDK Rx
> > > 5/ DPDK calls callback (without calling optimization)
> > > 
> > > Without callback,
> > > 1/ application checks device capability

[dpdk-dev] [PATCH v3 4/5] eal: add per rx queue interrupt handling based on VFIO

2015-02-17 Thread Neil Horman
On Tue, Feb 17, 2015 at 09:47:18PM +0800, Zhou Danny wrote:
> v3 changes:
> - Fix review comments
> 
> v2 changes:
> - Fix compilation issue for a missed header file
> - Bug fix: free unreleased resources on the exception path before return
> - Consolidate coding style related review comments
> 
> This patch does below:
> - Create multiple VFIO eventfd for rx queues.
> - Handle per rx queue interrupt.
> - Eliminate unnecessary suspended DPDK polling thread wakeup mechanism
> for rx interrupt by allowing polling thread epoll_wait rx queue
> interrupt notification.
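
For readers unfamiliar with the mechanism in the last bullet, here is a
minimal Linux-only sketch of a thread blocking in epoll_wait() on per-queue
eventfds. wait_rx_queue_event() is a hypothetical helper written for this
illustration, not the rte_eal_wait_rx_intr() implementation from the patch,
and real code would presumably keep the epoll instance around rather than
create one per call:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until one of the per-queue eventfds fires.
 * Returns the index of the signalled queue, or -1 on timeout or error.
 */
static int wait_rx_queue_event(const int *queue_fd, int nb_queues,
		int timeout_ms)
{
	struct epoll_event ready;
	uint64_t count;
	int epfd, i, n, queue = -1;

	assert(queue_fd != NULL && nb_queues > 0);

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	/* Register every queue's eventfd, tagging it with the queue index. */
	for (i = 0; i < nb_queues; i++) {
		struct epoll_event ev = { .events = EPOLLIN };

		ev.data.u32 = (uint32_t)i;
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, queue_fd[i], &ev) < 0) {
			close(epfd);
			return -1;
		}
	}

	/* Sleep here until an eventfd is written or the timeout expires. */
	n = epoll_wait(epfd, &ready, 1, timeout_ms);
	if (n == 1) {
		queue = (int)ready.data.u32;
		/* Reading the eventfd resets its counter (acknowledges it). */
		if (read(queue_fd[queue], &count, sizeof(count)) !=
				(ssize_t)sizeof(count))
			queue = -1;
	}

	close(epfd);
	return queue;
}
```

In the VFIO case the kernel side signals the eventfd when the MSI-X vector
fires; in a test, a plain write() of a 64-bit 1 to the fd has the same effect.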
> 
> Signed-off-by: Danny Zhou 
> Tested-by: Yong Liu 
> ---
>  lib/librte_eal/common/include/rte_eal.h|  12 ++
>  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 190 
> -
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  12 +-
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
>  5 files changed, 175 insertions(+), 44 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_eal.h 
> b/lib/librte_eal/common/include/rte_eal.h
> index f4ecd2e..d81331f 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -150,6 +150,18 @@ int rte_eal_iopl_init(void);
>   *   - On failure, a negative error value.
>   */
>  int rte_eal_init(int argc, char **argv);
> +
> +/**
> + * @param port_id
> + *   the port id
> + * @param queue_id
> + *   the queue id
> + * @return
> + *   - On success, return 0
> + *   - On failure, returns -1.
> + */
> +int rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id);
> +
>  /**
>   * Usage function typedef used by the application usage function.
>   *
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
> b/lib/librte_eal/linuxapp/eal/Makefile
> index e117cec..c593dfa 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -43,6 +43,7 @@ CFLAGS += -I$(SRCDIR)/include
>  CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
>  CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
>  CFLAGS += -I$(RTE_SDK)/lib/librte_ring
> +CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf
>  CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
>  CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
>  CFLAGS += -I$(RTE_SDK)/lib/librte_ether
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
> b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index dc2668a..97215ad 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -64,6 +64,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "eal_private.h"
>  #include "eal_vfio.h"
> @@ -127,6 +128,9 @@ static pthread_t intr_thread;
>  #ifdef VFIO_PRESENT
>  
>  #define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
> +/* irq set buffer length for queue interrupts and LSC interrupt */
> +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + \
> + sizeof(int) * (VFIO_MAX_QUEUE_ID + 1))
>  
>  /* enable legacy (INTx) interrupts */
>  static int
> @@ -218,10 +222,10 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) {
>   return 0;
>  }
>  
> -/* enable MSI-X interrupts */
> +/* enable MSI interrupts */
>  static int
>  vfio_enable_msi(struct rte_intr_handle *intr_handle) {
> - int len, ret;
> + int len, ret, max_intr;
>   char irq_set_buf[IRQ_SET_BUF_LEN];
>   struct vfio_irq_set *irq_set;
>   int *fd_ptr;
> @@ -230,12 +234,19 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) {
>  
>   irq_set = (struct vfio_irq_set *) irq_set_buf;
>   irq_set->argsz = len;
> - irq_set->count = 1;
> + if ((!intr_handle->max_intr) ||
> + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID))
> + max_intr = VFIO_MAX_QUEUE_ID + 1;
> + else
> + max_intr = intr_handle->max_intr;
> +
> + irq_set->count = max_intr;
>   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | 
> VFIO_IRQ_SET_ACTION_TRIGGER;
>   irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
>   irq_set->start = 0;
>   fd_ptr = (int *) &irq_set->data;
> - *fd_ptr = intr_handle->fd;
> + memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle->queue_fd));
> + fd_ptr[max_intr - 1] = intr_handle->fd;
>  
>   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
>  
> @@ -244,27 +255,10 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) {
>   intr_handle->fd);
>   return -1;
>   }
> -
> - /* manually trigger interrupt to enable it */
> - memset(irq_set, 0, len);
> - len = sizeof(struct vfio_irq_set);
> - irq_set->argsz = len;
> - irq_set->count = 1;
> - irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
> - irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
> - irq_set->start = 0;
> -
> - ret = ioctl(intr_handle->vfio_dev_fd, 

[dpdk-dev] [PATCH v2 2/2] examples: introduce new l2fwd-headroom example

2015-02-17 Thread Pawel Wodkowski
This app demonstrates usage of the new headroom library.
It is basically the original l2fwd with the following modifications to meet
headroom library requirements:
- main_loop() was split into two jobs: forward job and flush job. Logic
for those jobs is almost the same as in the original application.
- stats are moved to an rte_alarm callback to avoid introducing the overhead
of printing.
- stats are expanded to show headroom statistics.
- added new parameter '-l' to enable automatic thousands separators.

Comparing the original l2fwd and l2fwd-headroom apps shows what
is needed to properly write your own application with headroom measurements.

New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of headroom library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.


Signed-off-by: Pawel Wodkowski 
---
 examples/Makefile|1 +
 examples/l2fwd-headroom/Makefile |   51 ++
 examples/l2fwd-headroom/main.c   | 1039 ++
 mk/rte.app.mk|4 +
 4 files changed, 1095 insertions(+)
 create mode 100644 examples/l2fwd-headroom/Makefile
 create mode 100644 examples/l2fwd-headroom/main.c

diff --git a/examples/Makefile b/examples/Makefile
index 81f1d2f..8a459b7 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -50,6 +50,7 @@ DIRS-$(CONFIG_RTE_MBUF_REFCNT) += ip_fragmentation
 DIRS-$(CONFIG_RTE_MBUF_REFCNT) += ipv4_multicast
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += kni
 DIRS-y += l2fwd
+DIRS-y += l2fwd-headroom
 DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += l2fwd-ivshmem
 DIRS-y += l3fwd
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += l3fwd-acl
diff --git a/examples/l2fwd-headroom/Makefile b/examples/l2fwd-headroom/Makefile
new file mode 100644
index 000..07da286
--- /dev/null
+++ b/examples/l2fwd-headroom/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overriden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = l2fwd-headroom
+
+# all source are stored in SRCS-y
+SRCS-y := main.c
+
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/l2fwd-headroom/main.c b/examples/l2fwd-headroom/main.c
new file mode 100644
index 000..7ba1743
--- /dev/null
+++ b/examples/l2fwd-headroom/main.c
@@ -0,0 +1,1039 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote 

[dpdk-dev] Patches outstanding

2015-02-17 Thread Thomas Monjalon
2015-02-17 10:35, Stephen Hemminger:
> There are currently 1039 patches outstanding on DPDK.
> What is the schedule for getting these merged or resolved?

Several patches are ready to be applied in the coming days.
Many patches are still waiting for discussions to close.

> I don't think it would be reasonable to declare 2.0 as done
> until the patch backlog is 0!

The target is to integrate features in the next few days and make an rc1.
Then we will have 5 weeks to fix and clean up before the release.

If some patches have not reached a consensus in time,
they could stay in the backlog.


[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-17 Thread Bruce Richardson
On Tue, Feb 17, 2015 at 10:49:24AM -0500, Neil Horman wrote:
> On Tue, Feb 17, 2015 at 01:50:58PM +, Bruce Richardson wrote:
> > On Tue, Feb 17, 2015 at 02:28:02PM +0100, Olivier MATZ wrote:
> > > Hi Bruce,
> > > 
> > > On 02/17/2015 01:25 PM, Bruce Richardson wrote:
> > > >On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > > >>2015-02-16 15:16, Bruce Richardson:
> > > >>>In this specific instance, given that the application does little 
> > > >>>else, there
> > > >>>is no real advantage to using the callbacks - it's just to have a 
> > > >>>simple example
> > > >>>of how they can be used.
> > > >>>
> > > >>>Where callbacks are really designed to be useful, is for extending or 
> > > >>>augmenting
> > > >>>hardware capabilities. Taking the example of sequence numbers - to use 
> > > >>>the most
> > > >>>trivial example - an application could be written to take advantage of 
> > > >>>sequence
> > > >>>numbers written to packets by the hardware which received them. 
> > > >>>However, if such
> > > >>>an application was to be used with a NIC which does not provide 
> > > >>>sequence numbering
> > > >>>capability, for example, anything using ixgbe driver, the application 
> > > >>>writer has
> > > >>>two choices - either modify his application code to check each packet 
> > > >>>for
> > > >>>a sequence number in the data path, and add it there post-rx, or 
> > > >>>alternatively,
> > > >>>to check the NIC capabilities at initialization time, and add a 
> > > >>>callback there
> > > >>>at initialization, if the hardware does not support it. In the latter 
> > > >>>case,
> > > >>>the main packet processing body of the application can be written as 
> > > >>>though
> > > >>>hardware always has sequence numbering capability, safe in the 
> > > >>>knowledge that
> > > >>>any hardware not supporting it will be back-filled by a software 
> > > >>>fallback at
> > > >>>initialization-time.
> > > >>>
> > > >>>By the same token, we could also look to extend hardware capabilities. 
> > > >>>For
> > > >>>different filtering or hashing capabilities, there can be limits in 
> > > >>>hardware
> > > >>>which are far less than what we need to use in software. Again, 
> > > >>>callbacks will
> > > >>>allow the data path to be written in a way that is oblivious to the 
> > > >>>underlying
> > > >>>hardware limits, because software will transparently fill in the gaps.
> > > >>>
> > > >>>Hope this makes the use case clear.
> > > >>
> > > >>After thinking more about these callbacks, I realize these callbacks 
> > > >>won't
> > > >>help, as Olivier said.
> > > >>
> > > >>With callback,
> > > >>1/ application checks device capability
> > > >>2/ application provides hardware emulation as DPDK callback
> > > >>3/ application forgets previous steps
> > > >>4/ application calls DPDK Rx
> > > >>5/ DPDK calls callback (without calling optimization)
> > > >>
> > > >>Without callback,
> > > >>1/ application checks device capability
> > > >>2/ application provides hardware emulation as internal function
> > > >>3/ application set an internal device-flag to enable this function
> > > >>4/ application calls DPDK Rx
> > > >>5/ application calls the hardware emulation if flag is set
> > > >>
> > > >>So the only difference is to keep persistent the device information in
> > > >>the application instead of storing it as a function pointer in the
> > > >>DPDK struct.
> > > >>You can also be faster with this approach: at initialization time,
> > > >>you can check that your NIC supports the feature and use a specific
> > > >>mainloop that adds or not the sequence number without any runtime
> > > >>test.
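
The "without callback" steps listed above could look roughly like this.
It is a self-contained mock: struct app_port, fake_rx_burst() and the other
names are hypothetical and stand in for the application's own state and for
the real RX call. The device information lives in an application-side flag
that is tested after every burst:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pkt {
	uint32_t seq;
};

/* Per-port state kept by the application itself (steps 1 to 3). */
struct app_port {
	bool need_sw_seq;   /* set at init if the NIC lacks the feature */
	uint32_t next_seq;  /* counter for the software emulation */
};

/* Stand-in for the real RX burst: delivers nb un-numbered packets. */
static uint16_t fake_rx_burst(struct pkt *pkts, uint16_t nb)
{
	for (uint16_t i = 0; i < nb; i++)
		pkts[i].seq = 0;
	return nb;
}

/* Step 2: hardware emulation as an ordinary internal function. */
static void emulate_seq(struct app_port *p, struct pkt *pkts, uint16_t nb)
{
	for (uint16_t i = 0; i < nb; i++)
		pkts[i].seq = p->next_seq++;
}

/* Steps 4 and 5: call RX, then the emulation if the flag is set. */
static uint16_t app_rx(struct app_port *p, struct pkt *pkts, uint16_t nb)
{
	uint16_t n;

	assert(p != NULL && pkts != NULL);

	n = fake_rx_burst(pkts, nb);
	if (p->need_sw_seq)
		emulate_seq(p, pkts, n);
	return n;
}
```

Because emulate_seq() is a direct call, the compiler is free to inline it.
The cost is one flag test at every place the application calls RX.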
> > > >
> > > >That is assuming that all NICs are equal on your system. It's also 
> > > >assuming
> > > >that you only have a single point in your application where you call RX 
> > > >or
> > > >TX burst. In the case where you have a couple of different NICs on the 
> > > >system,
> > > >or where you want to write an application to take advantage of 
> > > >capabilities of
> > > >different NICs, the ability to resolve all these difference at 
> > > >initialization
> > > >time is useful. The main packet handling code can be written with just 
> > > >the
> > > >processing of packets in mind, rather than having to have a set of 
> > > >branches
> > > >after each RX burst call, or before each TX burst call, to "smooth out" 
> > > >the
> > > >different NIC capabilities.
> > > >
> > > >As for the option of maintaining different main loops for different NICs 
> > > >with
> > > >different capabilities - that sounds like a maintenance nightmare to
> > > >me, due to duplicated code! Callbacks is a far cleaner solution than 
> > > >that IMHO.
> > > 
> > > Why not just provide a function like this:
> > > 
> > >   rte_do_unsupported_stuff_by_software(m[], m_count, wanted_features,
> > >   dev_feature_flags)
> > > 
> > > This function can be called (or not) from the application mainloop.
> > > You don't need to maintain several 

[dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock

2015-02-17 Thread Stephen Hemminger
On Mon, 16 Feb 2015 22:44:31 +
"Dumitrescu, Cristian"  wrote:

> Hi Stephen,
> 
> Sorry, NACK.
> 
> 1. Overflow issue
> As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, 
> the line of code below results in overflow:
>   port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / 
> params->rate;
> Therefore, there is most likely a significant accuracy loss, which might 
> result in more packets allowed to go out than it should.

The shifted TSC value is still 64 bits,
and rate is 32 bits (bytes/sec).

I chose the scale such that
if clock = 3 GHz
then min rate = 715 bytes/sec = 5722 bits/sec
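
A small worked sketch of that arithmetic. It assumes a shift of 10 bits,
which is what the 715 bytes/sec figure works out to for a 3 GHz clock and a
32-bit cycles_per_byte (3e9 * 2^10 / 2^32 is about 715.3); the actual value of
RTE_SCHED_TIME_SHIFT is not shown in this thread. TIME_SHIFT and the function
names are made up for the illustration:

```c
#include <assert.h>
#include <stdint.h>

#define TIME_SHIFT 10	/* assumed value, inferred from the figures above */

/* Fixed-point cycles per byte: (tsc_hz * 2^TIME_SHIFT) / rate. */
static uint64_t scaled_cycles_per_byte(uint64_t tsc_hz, uint32_t rate)
{
	assert(rate != 0);
	return (tsc_hz << TIME_SHIFT) / rate;
}

/* Smallest rate (bytes/sec) whose scaled cycles-per-byte fits in 32 bits. */
static uint32_t min_rate_for_u32(uint64_t tsc_hz)
{
	uint64_t scaled = tsc_hz << TIME_SHIFT;

	/* Ceiling division, so that scaled / rate <= UINT32_MAX holds. */
	return (uint32_t)((scaled + UINT32_MAX - 1) / UINT32_MAX);
}

/* Integer replacement for ((double) cycles_diff) / cycles_per_byte. */
static uint64_t bytes_from_cycles(uint64_t cycles_diff, uint64_t cpb_scaled)
{
	assert(cpb_scaled != 0);
	return (cycles_diff << TIME_SHIFT) / cpb_scaled;
}
```

For example, with tsc_hz = 3000000000 and rate = 1250000000 bytes/sec
(10 Gbit/s), one second's worth of cycles converts back to roughly
1.25e9 bytes; the small error comes from truncating cycles_per_byte to an
integer.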

> 2. Integer division has a higher cost than floating point division
> My understanding is we are considering a performance improvement by replacing 
> the double precision floating point division in:
>   double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> with an integer division:
>   uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / 
> port->cycles_per_byte;
> I don't think this is going to have the claimed benefit, as according to 
> "Intel 64 and IA-32 Architectures Optimization  Reference Manual" (Appendix 
> C), the latency of the integer division instruction is significantly bigger 
> than the latency of floating point division:
>   Instruction FDIV double precision: latency = 38-40 cycles
>   Instruction IDIV: latency = 56 - 80 cycles

I observed that performance went from 5Gbit/sec to 10Gbit/sec,
mostly because floating point engages more instruction units and does not
pipeline. Cycle count is not everything.  This was on an Ivy Bridge processor.


> 3. Alternative
> I hear though your suggestion about replacing the floating point division 
> with a more performant construction. One suggestion would be to replace it 
> with an integer multiplication followed by a shift right, probably by using a 
> uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_bytes). I need 
> to prototype this code myself. Would you be OK to look into providing an 
> alternative implementation?
>

I looked into the multiplicative integer method, and will do it in future. But it
has more scaling issues since it would require that the values both be 32 bits.
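
A minimal sketch of the multiply-and-shift alternative discussed above,
under stated assumptions: the 32-bit scale factor and both function names
are hypothetical, and this is not code from the patch. It also shows the
scaling constraint: both operands of the multiplication are kept to 32 bits
so that the product cannot overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scale: the inverse rate is stored multiplied by 2^32. */
#define INV_SHIFT 32

/* Configuration time: bytes per cycle scaled up by 2^INV_SHIFT.
 * Requiring rate < tsc_hz keeps the result below 2^32. */
static uint32_t bytes_per_cycle_scaled(uint32_t rate, uint64_t tsc_hz)
{
    assert(tsc_hz != 0 && rate < tsc_hz);
    return (uint32_t)(((uint64_t)rate << INV_SHIFT) / tsc_hz);
}

/* Data path: one multiplication and one shift, no division.
 * Both inputs are 32 bits wide, so the 64-bit product cannot overflow. */
static uint64_t cycles_to_bytes(uint32_t cycles_diff, uint32_t bpc_scaled)
{
    return ((uint64_t)cycles_diff * bpc_scaled) >> INV_SHIFT;
}
```

Because the scaled inverse is truncated, the result can be one byte lower
than the exact quotient; a caller that needs exact accounting would have to
carry the remainder separately.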



[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-17 Thread Neil Horman
On Tue, Feb 17, 2015 at 04:00:56PM +, Bruce Richardson wrote:
> On Tue, Feb 17, 2015 at 10:49:24AM -0500, Neil Horman wrote:
> > On Tue, Feb 17, 2015 at 01:50:58PM +, Bruce Richardson wrote:
> > > On Tue, Feb 17, 2015 at 02:28:02PM +0100, Olivier MATZ wrote:
> > > > Hi Bruce,
> > > > 
> > > > On 02/17/2015 01:25 PM, Bruce Richardson wrote:
> > > > >On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > > > >>2015-02-16 15:16, Bruce Richardson:
> > > > >>>In this specific instance, given that the application does little 
> > > > >>>else, there
> > > > >>>is no real advantage to using the callbacks - it's just to have a 
> > > > >>>simple example
> > > > >>>of how they can be used.
> > > > >>>
> > > > >>>Where callbacks are really designed to be useful, is for extending 
> > > > >>>or augmenting
> > > > >>>hardware capabilities. Taking the example of sequence numbers - to 
> > > > >>>use the most
> > > > >>>trivial example - an application could be written to take advantage 
> > > > >>>of sequence
> > > > >>>numbers written to packets by the hardware which received them. 
> > > > >>>However, if such
> > > > >>>an application was to be used with a NIC which does not provide 
> > > > >>>sequence numbering
> > > > >>>capability, for example, anything using ixgbe driver, the 
> > > > >>>application writer has
> > > > >>>two choices - either modify his application code to check each 
> > > > >>>packet for
> > > > >>>a sequence number in the data path, and add it there post-rx, or 
> > > > >>>alternatively,
> > > > >>>to check the NIC capabilities at initialization time, and add a 
> > > > >>>callback there
> > > > >>>at initialization, if the hardware does not support it. In the 
> > > > >>>latter case,
> > > > >>>the main packet processing body of the application can be written as 
> > > > >>>though
> > > > >>>hardware always has sequence numbering capability, safe in the 
> > > > >>>knowledge that
> > > > >>>any hardware not supporting it will be back-filled by a software 
> > > > >>>fallback at
> > > > >>>initialization-time.
> > > > >>>
> > > > >>>By the same token, we could also look to extend hardware 
> > > > >>>capabilities. For
> > > > >>>different filtering or hashing capabilities, there can be limits in 
> > > > >>>hardware
> > > > >>>which are far less than what we need to use in software. Again, 
> > > > >>>callbacks will
> > > > >>>allow the data path to be written in a way that is oblivious to the 
> > > > >>>underlying
> > > > >>>hardware limits, because software will transparently fill in the 
> > > > >>>gaps.
> > > > >>>
> > > > >>>Hope this makes the use case clear.
> > > > >>
> > > > >>After thinking more about these callbacks, I realize these callbacks 
> > > > >>won't
> > > > >>help, as Olivier said.
> > > > >>
> > > > >>With callback,
> > > > >>1/ application checks device capability
> > > > >>2/ application provides hardware emulation as DPDK callback
> > > > >>3/ application forgets previous steps
> > > > >>4/ application calls DPDK Rx
> > > > >>5/ DPDK calls callback (without calling optimization)
> > > > >>
> > > > >>Without callback,
> > > > >>1/ application checks device capability
> > > > >>2/ application provides hardware emulation as internal function
> > > > >>3/ application set an internal device-flag to enable this function
> > > > >>4/ application calls DPDK Rx
> > > > >>5/ application calls the hardware emulation if flag is set
> > > > >>
> > > > >>So the only difference is to keep persistent the device information in
> > > > >>the application instead of storing it as a function pointer in the
> > > > >>DPDK struct.
> > > > >>You can also be faster with this approach: at initialization time,
> > > > >>you can check that your NIC supports the feature and use a specific
> > > > >>mainloop that adds or not the sequence number without any runtime
> > > > >>test.
> > > > >
> > > > >That is assuming that all NICs are equal on your system. It's also 
> > > > >assuming
> > > > >that you only have a single point in your application where you call 
> > > > >RX or
> > > > >TX burst. In the case where you have a couple of different NICs on the 
> > > > >system,
> > > > >or where you want to write an application to take advantage of 
> > > > >capabilities of
> > > > >different NICs, the ability to resolve all these difference at 
> > > > >initialization
> > > > >time is useful. The main packet handling code can be written with just 
> > > > >the
> > > > >processing of packets in mind, rather than having to have a set of 
> > > > >branches
> > > > >after each RX burst call, or before each TX burst call, to "smooth 
> > > > >out" the
> > > > >different NIC capabilities.
> > > > >
> > > > >As for the option of maintaining different main loops for different 
> > > > >NICs with
> > > > >different capabilities - that sounds like a maintenance nightmare to
> > > > >me, due to duplicated code! Callbacks is a far cleaner solution than 
> > > > >that IMHO.
> > > > 
> > > > Why 
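
The two approaches debated in the quoted text can be sketched side by side.
Everything below is a hypothetical stand-in (the packet and port types and
all function names are assumptions, none are DPDK APIs); it only shows where
the decision taken at initialization is stored and where it is tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet and port types; not DPDK structures. */
struct pkt { uint32_t seq; };

typedef void (*rx_cb_t)(struct pkt *pkts, size_t n, void *arg);

struct port {
    int hw_has_seq;     /* capability discovered at initialization */
    uint32_t next_seq;  /* counter used by the software fallback */
    rx_cb_t rx_cb;      /* callback approach: NULL when hardware does the work */
    int sw_seq_flag;    /* flag approach: set when the application must emulate */
};

/* Software emulation of sequence numbering, shared by both approaches. */
static void sw_add_seq(struct pkt *pkts, size_t n, void *arg)
{
    struct port *p = arg;
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        pkts[i].seq = p->next_seq++;
}

/* Stand-in for the driver receive path: runs the callback if one is set. */
static size_t port_rx_burst(struct port *p, struct pkt *pkts, size_t n)
{
    if (p->rx_cb != NULL)
        p->rx_cb(pkts, n, p);
    return n;
}

/* Callback approach: decide once; the data path is a bare RX call. */
static void init_with_callback(struct port *p)
{
    p->rx_cb = p->hw_has_seq ? NULL : sw_add_seq;
}

/* Flag approach: the application keeps the decision and tests it per burst. */
static void init_with_flag(struct port *p)
{
    p->sw_seq_flag = !p->hw_has_seq;
}

static size_t app_rx_with_flag(struct port *p, struct pkt *pkts, size_t n)
{
    size_t got = port_rx_burst(p, pkts, n);
    if (p->sw_seq_flag)
        sw_add_seq(pkts, got, p);
    return got;
}
```

Both variants produce the same packets; the difference is whether the
per-port state lives in the receive path as a function pointer or in the
application as a flag tested after every burst.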

[dpdk-dev] [PATCH v8 12/14] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-17 Thread Maxime Leroy
Hi Tetsuya,

On Tue, Feb 17, 2015 at 9:51 AM, Tetsuya Mukawa  wrote:
>
>
> >> +/* get port_id enabled by above procedures */
> >> +if (rte_eth_dev_get_changed_port(devs, _port_id))
> >> +goto err2;
> > [...]
> >
> >>  /**
> >> + * Uninitilization function called for each device driver once.
> >> + */
> >> +typedef int (rte_dev_uninit_t)(const char *name, const char *args);
> > Why do you need args for uninit?
> >
>
> I just added for the case that finalization code of PMD needs it.
> But, probably "args" parameter can be removed.
>
>

I think there is no need for any args in the uninit function:
1) Your librte_pmd_null doesn't use it.
2) You pass exactly the same argument that was used by the init
function. A driver should have already stored these parameters in an
internal private structure at initialization. So there is no need to
pass them back to the uninit method.

From my understanding, devargs_list is only needed at init to store
the arguments when we parse the command line. Then, at initialization,
rte_eal_dev_init creates the devices from this list.

By removing args from the uninit function, you no longer need to add and
remove devargs in devargs_list to (de)attach a new device.

What do you think?
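
To illustrate the point, here is a minimal sketch of a driver that keeps
its init-time arguments in private data, so that uninit only needs the
device name. The table, the struct and the drv_init()/drv_uninit() names are
hypothetical and are not the signatures from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEVS 4

/* Hypothetical per-device private data owned by the driver. */
struct dev_priv {
    char *name;
    char *args;  /* copy of the init-time arguments */
};

static struct dev_priv devs[MAX_DEVS];

/* Portable string duplicate (strdup is POSIX, not ISO C). */
static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Init: store name and args in the first free slot. */
static int drv_init(const char *name, const char *args)
{
    assert(name != NULL && args != NULL);
    for (int i = 0; i < MAX_DEVS; i++) {
        if (devs[i].name != NULL)
            continue;
        devs[i].name = dup_str(name);
        devs[i].args = dup_str(args);
        if (devs[i].name == NULL || devs[i].args == NULL) {
            free(devs[i].name);
            free(devs[i].args);
            devs[i].name = devs[i].args = NULL;
            return -1;
        }
        return 0;
    }
    return -1;
}

/* Uninit: the name is enough; devs[i].args is still available if needed. */
static int drv_uninit(const char *name)
{
    for (int i = 0; i < MAX_DEVS; i++) {
        if (devs[i].name != NULL && strcmp(devs[i].name, name) == 0) {
            free(devs[i].name);
            free(devs[i].args);
            devs[i].name = devs[i].args = NULL;
            return 0;
        }
    }
    return -1;
}
```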

Maxime


[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-17 Thread Bruce Richardson
On Tue, Feb 17, 2015 at 11:08:10AM -0500, Neil Horman wrote:
> On Tue, Feb 17, 2015 at 04:00:56PM +, Bruce Richardson wrote:
> > On Tue, Feb 17, 2015 at 10:49:24AM -0500, Neil Horman wrote:
> > > On Tue, Feb 17, 2015 at 01:50:58PM +, Bruce Richardson wrote:
> > > > On Tue, Feb 17, 2015 at 02:28:02PM +0100, Olivier MATZ wrote:
> > > > > Hi Bruce,
> > > > > 
> > > > > On 02/17/2015 01:25 PM, Bruce Richardson wrote:
> > > > > >On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > > > > >>2015-02-16 15:16, Bruce Richardson:
> > > > > >>>In this specific instance, given that the application does little 
> > > > > >>>else, there
> > > > > >>>is no real advantage to using the callbacks - it's just to have a 
> > > > > >>>simple example
> > > > > >>>of how they can be used.
> > > > > >>>
> > > > > >>>Where callbacks are really designed to be useful, is for extending 
> > > > > >>>or augmenting
> > > > > >>>hardware capabilities. Taking the example of sequence numbers - to 
> > > > > >>>use the most
> > > > > >>>trivial example - an application could be written to take 
> > > > > >>>advantage of sequence
> > > > > >>>numbers written to packets by the hardware which received them. 
> > > > > >>>However, if such
> > > > > >>>an application was to be used with a NIC which does not provide 
> > > > > >>>sequence numbering
> > > > > >>>capability, for example, anything using ixgbe driver, the 
> > > > > >>>application writer has
> > > > > >>>two choices - either modify his application code to check each 
> > > > > >>>packet for
> > > > > >>>a sequence number in the data path, and add it there post-rx, or 
> > > > > >>>alternatively,
> > > > > >>>to check the NIC capabilities at initialization time, and add a 
> > > > > >>>callback there
> > > > > >>>at initialization, if the hardware does not support it. In the 
> > > > > >>>latter case,
> > > > > >>>the main packet processing body of the application can be written 
> > > > > >>>as though
> > > > > >>>hardware always has sequence numbering capability, safe in the 
> > > > > >>>knowledge that
> > > > > >>>any hardware not supporting it will be back-filled by a software 
> > > > > >>>fallback at
> > > > > >>>initialization-time.
> > > > > >>>
> > > > > >>>By the same token, we could also look to extend hardware 
> > > > > >>>capabilities. For
> > > > > >>>different filtering or hashing capabilities, there can be limits 
> > > > > >>>in hardware
> > > > > >>>which are far less than what we need to use in software. Again, 
> > > > > >>>callbacks will
> > > > > >>>allow the data path to be written in a way that is oblivious to 
> > > > > >>>the underlying
> > > > > >>>hardware limits, because software will transparently fill in the 
> > > > > >>>gaps.
> > > > > >>>
> > > > > >>>Hope this makes the use case clear.
> > > > > >>
> > > > > >>After thinking more about these callbacks, I realize these 
> > > > > >>callbacks won't
> > > > > >>help, as Olivier said.
> > > > > >>
> > > > > >>With callback,
> > > > > >>1/ application checks device capability
> > > > > >>2/ application provides hardware emulation as DPDK callback
> > > > > >>3/ application forgets previous steps
> > > > > >>4/ application calls DPDK Rx
> > > > > >>5/ DPDK calls callback (without calling optimization)
> > > > > >>
> > > > > >>Without callback,
> > > > > >>1/ application checks device capability
> > > > > >>2/ application provides hardware emulation as internal function
> > > > > >>3/ application set an internal device-flag to enable this function
> > > > > >>4/ application calls DPDK Rx
> > > > > >>5/ application calls the hardware emulation if flag is set
> > > > > >>
> > > > > >>So the only difference is to keep persistent the device information 
> > > > > >>in
> > > > > >>the application instead of storing it as a function pointer in the
> > > > > >>DPDK struct.
> > > > > >>You can also be faster with this approach: at initialization time,
> > > > > >>you can check that your NIC supports the feature and use a specific
> > > > > >>mainloop that adds or not the sequence number without any runtime
> > > > > >>test.
> > > > > >
> > > > > >That is assuming that all NICs are equal on your system. It's also 
> > > > > >assuming
> > > > > >that you only have a single point in your application where you call 
> > > > > >RX or
> > > > > >TX burst. In the case where you have a couple of different NICs on 
> > > > > >the system,
> > > > > >or where you want to write an application to take advantage of 
> > > > > >capabilities of
> > > > > >different NICs, the ability to resolve all these difference at 
> > > > > >initialization
> > > > > >time is useful. The main packet handling code can be written with 
> > > > > >just the
> > > > > >processing of packets in mind, rather than having to have a set of 
> > > > > >branches
> > > > > >after each RX burst call, or before each TX burst call, to "smooth 
> > > > > >out" the
> > > > > >different NIC capabilities.
> > > > > >
> > > > > >As for 

[dpdk-dev] [PATCH v3 1/2] pmd: enable DCB in SRIOV

2015-02-17 Thread Pawel Wodkowski
This patch enables DCB in SRIOV mode for ixgbe (Niantic) driver.

Signed-off-by: Pawel Wodkowski 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  2 +-
 lib/librte_pmd_ixgbe/ixgbe_pf.c | 19 ++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |  7 +++
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 412bab2..7e7434d 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1514,7 +1514,7 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
if (conf->nb_queue_pools != ETH_16_POOLS &&
   conf->nb_queue_pools != ETH_32_POOLS) {
PMD_INIT_LOG(ERR, " VMDQ+DCB selected, "
-   "number of TX qqueue pools must be %d 
or %d\n",
+   "number of TX queue pools must be %d or 
%d\n",
ETH_16_POOLS, ETH_32_POOLS);
return (-EINVAL);
}
diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c
index 255c996..8411445 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
@@ -137,7 +137,7 @@ int ixgbe_pf_host_init(struct rte_eth_dev *eth_dev)


 /*
- * Functin that make SRIOV configuration, based on device configuration,
+ * Function that make SRIOV configuration, based on device configuration,
  * number of requested queues and number of VF created.
  * Function returns:
  * 1 - SRIOV is not enabled (no VF created)
@@ -191,7 +191,7 @@ ixgbe_pf_configure_mq_sriov(struct rte_eth_dev *dev)
break;
case ETH_MQ_RX_RSS:
PMD_INIT_LOG(INFO, " RSS (SRIOV active) mode, "
-   "Rx mq mode is changed from:"
+   "Rx mq mode is changed from "
"mq_mode %u into VMDQ mq_mode %u\n",
dev_conf->rxmode.mq_mode,
dev->data->dev_conf.rxmode.mq_mode);
@@ -295,7 +295,7 @@ ixgbe_pf_configure_mq_sriov(struct rte_eth_dev *dev)

/* Check if available queus count is not less than allocated.*/
if (dev->data->nb_rx_queues > sriov->nb_rx_q_per_pool ||
-   dev->data->nb_rx_queues > sriov->nb_tx_q_per_pool) {
+   dev->data->nb_tx_queues > sriov->nb_tx_q_per_pool) {
PMD_INIT_LOG(ERR, "SRIOV active, "
"rx/tx queue number must less or equal to 
%d/%d\n",
sriov->nb_rx_q_per_pool, 
sriov->nb_tx_q_per_pool);
@@ -305,7 +305,6 @@ ixgbe_pf_configure_mq_sriov(struct rte_eth_dev *dev)
return 0;
 }

-
 int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev)
 {
uint32_t vtctl, fcrth;
@@ -659,7 +658,9 @@ ixgbe_get_vf_queues(struct rte_eth_dev *dev, uint32_t vf, 
uint32_t *msgbuf)
 {
struct ixgbe_vf_info *vfinfo =
*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
-   uint32_t default_q = vf * RTE_ETH_DEV_SRIOV(dev).nb_tx_q_per_pool;
+   struct ixgbe_dcb_config *dcbinfo =
+   IXGBE_DEV_PRIVATE_TO_DCB_CFG(dev->data->dev_private);
+   uint32_t default_q = RTE_ETH_DEV_SRIOV(dev).def_pool_q_idx;

/* Verify if the PF supports the mbox APIs version or not */
switch (vfinfo[vf].api_version) {
@@ -677,10 +678,10 @@ ixgbe_get_vf_queues(struct rte_eth_dev *dev, uint32_t vf, 
uint32_t *msgbuf)
/* Notify VF of default queue */
msgbuf[IXGBE_VF_DEF_QUEUE] = default_q;

-   /*
-* FIX ME if it needs fill msgbuf[IXGBE_VF_TRANS_VLAN]
-* for VLAN strip or VMDQ_DCB or VMDQ_DCB_RSS
-*/
+   if (dcbinfo->num_tcs.pg_tcs)
+   msgbuf[IXGBE_VF_TRANS_VLAN] = dcbinfo->num_tcs.pg_tcs;
+   else
+   msgbuf[IXGBE_VF_TRANS_VLAN] = 1;

return 0;
 }
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index e6766b3..f845bb0 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3166,10 +3166,9 @@ void ixgbe_configure_dcb(struct rte_eth_dev *dev)

/* check support mq_mode for DCB */
if ((dev_conf->rxmode.mq_mode != ETH_MQ_RX_VMDQ_DCB) &&
-   (dev_conf->rxmode.mq_mode != ETH_MQ_RX_DCB))
-   return;
-
-   if (dev->data->nb_rx_queues != ETH_DCB_NUM_QUEUES)
+   (dev_conf->rxmode.mq_mode != ETH_MQ_RX_DCB) &&
+   (dev_conf->txmode.mq_mode != ETH_MQ_TX_VMDQ_DCB) &&
+   (dev_conf->txmode.mq_mode != ETH_MQ_TX_DCB))
return;

/** Configure DCB hardware **/
-- 
1.9.1



[dpdk-dev] [PATCH v3 0/2] new headroom stats library and example application

2015-02-17 Thread Pawel Wodkowski
Hi community,
I would like to introduce a library for measuring the load of arbitrary jobs. It
can be used to profile any kind of job set on any arbitrary execution unit or
tasking library.

In the provided l2fwd-headroom example I demonstrate how to use this library to
select the optimal rx burst poll time. Jobs are selected by using existing rte_timer
library calls. This example does not limit the possible schemes in which this library
can be used.

PATCH v3 changes:
 - spelling fixes.

PATCH v2 changes:
 - Remove jobs management/callback from library to not duplicate tasking library
   behaviour.
 - Cleanup/remove useless statistics.
 - Rework example application to use rte_timer library for jobs selection.
 - Introduce new app parameter '-l' for automatic thousands separating in stats.
 - More readable statistics format.


Pawel Wodkowski (2):
  pmd: enable DCB in SRIOV
  tespmd: fix DCB in SRIOV mode support

 app/test-pmd/cmdline.c  |  4 ++--
 app/test-pmd/testpmd.c  | 39 +++--
 app/test-pmd/testpmd.h  | 10 --
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  2 +-
 lib/librte_pmd_ixgbe/ixgbe_pf.c | 19 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |  7 +++
 6 files changed, 45 insertions(+), 36 deletions(-)

-- 
1.9.1



[dpdk-dev] testpmd app issues

2015-02-17 Thread Jeff Wang
Hi,

I'm new to DPDK.

I have gone through the user guide and set up the environment and hugepages. I
can get the helloworld app to work. Now, when I tried to play with the testpmd
app, I got the following issue:

[root at localhost dpdk-1.8.0]# build/app/testpmd -c 0x2 -n1 -- -i
--nb-cores=1 --nb-ports=0x1

.

EAL: TSC frequency is ~2594110 KHz
EAL: Master core 1 is ready (tid=e07b3840)
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: PCI device :02:00.0 on NUMA socket -1
EAL:   probe driver: 8086:10d3 rte_em_pmd
EAL:   :02:00.0 not managed by UIO driver, skipping
EAL: PCI device :03:00.0 on NUMA socket -1
EAL:   probe driver: 8086:10d3 rte_em_pmd
EAL:   PCI memory mapped at 0x7f06df80
EAL:   PCI memory mapped at 0x7f06df82
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x10d3
EAL: PCI device :04:00.0 on NUMA socket -1
EAL:   probe driver: 8086:10d3 rte_em_pmd
EAL:   :04:00.0 not managed by UIO driver, skipping
EAL: PCI device :05:00.0 on NUMA socket -1
EAL:   probe driver: 8086:10d3 rte_em_pmd
EAL:   :05:00.0 not managed by UIO driver, skipping*PANIC in main():
Empty set of forwarding logical cores - check the core mask supplied
in the command parameters*
5: [build/app/testpmd() [0x428ea5]]
4: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f06df8e2af5]]

.

It says the core mask is not right. I set it to 0x2 because my CPU
only has 2 cores. I don't quite get it.

Can someone help me with this? And has anyone else encountered the same issue?

Thanks!


[dpdk-dev] [PATCH v3 2/2] tespmd: fix DCB in SRIOV mode support

2015-02-17 Thread Pawel Wodkowski
This patch incorporates fixes to support DCB in SRIOV mode for testpmd.

Signed-off-by: Pawel Wodkowski 
---
 app/test-pmd/cmdline.c |  4 ++--
 app/test-pmd/testpmd.c | 39 +--
 app/test-pmd/testpmd.h | 10 --
 3 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4beb404..eb9877e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -1942,9 +1942,9 @@ cmd_config_dcb_parsed(void *parsed_result,

/* DCB in VT mode */
if (!strncmp(res->vt_en, "on",2))
-   dcb_conf.dcb_mode = DCB_VT_ENABLED;
+   dcb_conf.vt_en = 1;
else
-   dcb_conf.dcb_mode = DCB_ENABLED;
+   dcb_conf.vt_en = 0;

if (!strncmp(res->pfc_en, "on",2)) {
dcb_conf.pfc_en = 1;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 773b8af..9b12c25 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1743,7 +1743,8 @@ const uint16_t vlan_tags[] = {
 };

 static  int
-get_eth_dcb_conf(struct rte_eth_conf *eth_conf, struct dcb_config *dcb_conf)
+get_eth_dcb_conf(struct rte_eth_conf *eth_conf, struct dcb_config *dcb_conf,
+   uint16_t sriov)
 {
 uint8_t i;

@@ -1751,7 +1752,7 @@ get_eth_dcb_conf(struct rte_eth_conf *eth_conf, struct 
dcb_config *dcb_conf)
 * Builds up the correct configuration for dcb+vt based on the vlan 
tags array
 * given above, and the number of traffic classes available for use.
 */
-   if (dcb_conf->dcb_mode == DCB_VT_ENABLED) {
+   if (dcb_conf->vt_en == 1) {
struct rte_eth_vmdq_dcb_conf vmdq_rx_conf;
struct rte_eth_vmdq_dcb_tx_conf vmdq_tx_conf;

@@ -1768,9 +1769,17 @@ get_eth_dcb_conf(struct rte_eth_conf *eth_conf, struct 
dcb_config *dcb_conf)
vmdq_rx_conf.pool_map[i].vlan_id = vlan_tags[ i ];
vmdq_rx_conf.pool_map[i].pools = 1 << (i % 
vmdq_rx_conf.nb_queue_pools);
}
-   for (i = 0; i < ETH_DCB_NUM_USER_PRIORITIES; i++) {
-   vmdq_rx_conf.dcb_queue[i] = i;
-   vmdq_tx_conf.dcb_queue[i] = i;
+
+   if (sriov == 0) {
+   for (i = 0; i < ETH_DCB_NUM_USER_PRIORITIES; i++) {
+   vmdq_rx_conf.dcb_queue[i] = i;
+   vmdq_tx_conf.dcb_queue[i] = i;
+   }
+   } else {
+   for (i = 0; i < ETH_DCB_NUM_USER_PRIORITIES; i++) {
+   vmdq_rx_conf.dcb_queue[i] = i % 
dcb_conf->num_tcs;
+   vmdq_tx_conf.dcb_queue[i] = i % 
dcb_conf->num_tcs;
+   }
}

/*set DCB mode of RX and TX of multiple queues*/
@@ -1828,22 +1837,32 @@ init_port_dcb_config(portid_t pid,struct dcb_config 
*dcb_conf)
uint16_t nb_vlan;
uint16_t i;

-   /* rxq and txq configuration in dcb mode */
-   nb_rxq = 128;
-   nb_txq = 128;
rx_free_thresh = 64;

+   rte_port = &ports[pid];
memset(&port_conf,0,sizeof(struct rte_eth_conf));
/* Enter DCB configuration status */
dcb_config = 1;

nb_vlan = sizeof( vlan_tags )/sizeof( vlan_tags[ 0 ]);
/*set configuration of DCB in vt mode and DCB in non-vt mode*/
-   retval = get_eth_dcb_conf(&port_conf, dcb_conf);
+   retval = get_eth_dcb_conf(&port_conf, dcb_conf, 
rte_port->dev_info.max_vfs);
+
+   /* rxq and txq configuration in dcb mode */
+   nb_rxq = rte_port->dev_info.max_rx_queues;
+   nb_txq = rte_port->dev_info.max_tx_queues;
+
+   if (rte_port->dev_info.max_vfs) {
+   if (port_conf.rxmode.mq_mode == ETH_MQ_RX_VMDQ_DCB)
+   nb_rxq /= 
port_conf.rx_adv_conf.vmdq_dcb_conf.nb_queue_pools;
+
+   if (port_conf.txmode.mq_mode == ETH_MQ_TX_VMDQ_DCB)
+   nb_txq /= 
port_conf.tx_adv_conf.vmdq_dcb_tx_conf.nb_queue_pools;
+   }
+
if (retval < 0)
return retval;

-   rte_port = &ports[pid];
memcpy(&rte_port->dev_conf, &port_conf,sizeof(struct rte_eth_conf));

rte_port->rx_conf.rx_thresh = rx_thresh;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8f5e6c7..695e893 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -227,20 +227,10 @@ struct fwd_config {
portid_t   nb_fwd_ports;/**< Nb. of ports involved. */
 };

-/**
- * DCB mode enable
- */
-enum dcb_mode_enable
-{
-   DCB_VT_ENABLED,
-   DCB_ENABLED
-};
-
 /*
  * DCB general config info
  */
 struct dcb_config {
-   enum dcb_mode_enable dcb_mode;
uint8_t vt_en;
enum rte_eth_nb_tcs num_tcs;
uint8_t pfc_en;
-- 
1.9.1



[dpdk-dev] [PATCH v3 0/2] new headroom stats library and example application

2015-02-17 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pawel Wodkowski
> Sent: Tuesday, February 17, 2015 5:20 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3 0/2] new headroom stats library and example
> application
> 
> Hi community,
> I would like to introduce library for measuring load of some arbitrary jobs. 
> It
> can be used to profile every kind of job sets on any arbitrary execution unit 
> or
> tasking library.
> 
> In provided l2fwd-headroom example I demonstrate how to use this library to
> select optimal rx burst poll time. Jobs are selected by using existing 
> rte_timer
> library calls. This example does no limit possible schemes on which this 
> library
> can be used.
> 
> PATCH v3 changes:
>  - spelling fixes.
> 
> PATCH v2 changes:
>  - Remove jobs management/callback from library to not duplicate tasking
> library
>behaviour.
>  - Cleenup/remove useless statistics.
>  - Rework example application to use rte_timer library for jobs selection.
>  - Introduce new app parameter '-l' for automatic thousands separating in 
> stats.
>  - More readable statistics format.
> 
> 
> Pawel Wodkowski (2):
>   pmd: enable DCB in SRIOV
>   tespmd: fix DCB in SRIOV mode support
> 
>  app/test-pmd/cmdline.c  |  4 ++--
>  app/test-pmd/testpmd.c  | 39 
> +++--
>  app/test-pmd/testpmd.h  | 10 --
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  2 +-
>  lib/librte_pmd_ixgbe/ixgbe_pf.c | 19 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |  7 +++
>  6 files changed, 45 insertions(+), 36 deletions(-)
> 
> --
> 1.9.1

Not this branch :(
Self-NACK


[dpdk-dev] testpmd app issues

2015-02-17 Thread Bruce Richardson
On Tue, Feb 17, 2015 at 09:31:33AM -0700, Jeff Wang wrote:
> Hi,
> 
> I'm new to DPDK.
> 
> I have gone through the user guide, set up environment, hugepages. I can
> get the helloworld app work. Now, when I tried to play with the testpmd
> app, I got the following issue:
> 
> [root at localhost dpdk-1.8.0]# build/app/testpmd -c 0x2 -n1 -- -i
> --nb-cores=1 --nb-ports=0x1
> 
> .
> 
> EAL: TSC frequency is ~2594110 KHz
> EAL: Master core 1 is ready (tid=e07b3840)
> PMD: ENICPMD trace: rte_enic_pmd_init
> EAL: PCI device :02:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:10d3 rte_em_pmd
> EAL:   :02:00.0 not managed by UIO driver, skipping
> EAL: PCI device :03:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:10d3 rte_em_pmd
> EAL:   PCI memory mapped at 0x7f06df80
> EAL:   PCI memory mapped at 0x7f06df82
> PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x10d3
> EAL: PCI device :04:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:10d3 rte_em_pmd
> EAL:   :04:00.0 not managed by UIO driver, skipping
> EAL: PCI device :05:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:10d3 rte_em_pmd
> EAL:   :05:00.0 not managed by UIO driver, skipping*PANIC in main():
> Empty set of forwarding logical cores - check the core mask supplied
> in the command parameters*
> 5: [build/app/testpmd() [0x428ea5]]
> 4: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f06df8e2af5]]
> 
> .
> 
> It says the core mask is not right. I set it to 0x2 because my CPU
> only has 2 cores. I don't quite get it.
> 
> Can someone help me with this? And has anyone else encountered the same issue?
> 
> Thanks!

Hi,

a coremask of 2 means to use only the second core (i.e. core 1, but not core 0).
Since the coremask is a bitmask, to use two cores you need to specify 0011b, or
"3" decimal/hex.

Regards,
/Bruce
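
The bitmask reading described above can be sketched in a few lines. The
helper names are hypothetical; the only point is that bit N of the mask
selects core N.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if core number `core` is selected by the mask. */
static int core_enabled(uint64_t coremask, unsigned int core)
{
    assert(core < 64);
    return (int)((coremask >> core) & 1u);
}

/* Count how many cores the mask selects. */
static unsigned int core_count(uint64_t coremask)
{
    unsigned int n = 0;
    /* Each iteration clears the lowest set bit. */
    for (; coremask != 0; coremask &= coremask - 1)
        n++;
    return n;
}
```

With these helpers, a mask of 0x2 selects one core (core 1 only), while 0x3
selects two cores (core 0 and core 1).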


[dpdk-dev] [PATCH v4 0/2] new headroom stats library and example application

2015-02-17 Thread Pawel Wodkowski
Hi community,
I would like to introduce a library for measuring the load of arbitrary jobs. It
can be used to profile any kind of job set on any arbitrary execution unit or
tasking library.

In the provided l2fwd-headroom example I demonstrate how to use this library to
select the optimal rx burst poll time. Jobs are selected by using existing rte_timer
library calls. This example does not limit the possible schemes in which this library
can be used.

PATCH v4 changes:
 - use proper branch for generating patch.

PATCH v3 changes:
 - Fix spelling.

PATCH v2 changes:
 - Remove jobs management/callback from library to not duplicate tasking library
   behaviour.
 - Cleanup/remove useless statistics.
 - Rework example application to use rte_timer library for jobs selection.
 - Introduce new app parameter '-l' for automatic thousands separating in stats.
 - More readable statistics format.


Pawel Wodkowski (2):
  librte_headroom: New library for checking core/system/app load
  examples: introduce new l2fwd-headroom example

 config/common_bsdapp |5 +
 config/common_linuxapp   |5 +
 examples/Makefile|1 +
 examples/l2fwd-headroom/Makefile |   51 ++
 examples/l2fwd-headroom/main.c   | 1039 ++
 lib/Makefile |1 +
 lib/librte_headroom/Makefile |   54 ++
 lib/librte_headroom/rte_headroom.c   |  271 +++
 lib/librte_headroom/rte_headroom.h   |  324 
 lib/librte_headroom/rte_headroom_version.map |   20 +
 mk/rte.app.mk|4 +
 11 files changed, 1775 insertions(+)
 create mode 100644 examples/l2fwd-headroom/Makefile
 create mode 100644 examples/l2fwd-headroom/main.c
 create mode 100644 lib/librte_headroom/Makefile
 create mode 100644 lib/librte_headroom/rte_headroom.c
 create mode 100644 lib/librte_headroom/rte_headroom.h
 create mode 100644 lib/librte_headroom/rte_headroom_version.map

-- 
1.9.1



[dpdk-dev] [PATCH v4 1/2] librte_headroom: New library for checking core/system/app load

2015-02-17 Thread Pawel Wodkowski
This library provides an API to measure time spent in particular parts of
code and to calculate optimal polling time.

To calculate those statistics, application code needs to be divided into
parts (called jobs) that do something. It is up to the application to decide
what is considered a job.

A series of jobs must be surrounded with the rte_headroom_start_loop() and
rte_headroom_finish_loop() calls. After that, jobs might be started.
Each job must be surrounded with rte_headroom_start_job() and
rte_headroom_finish_job() calls.

After a job finishes its execution, the period in which it should be called
again is adjusted to minimize time wasted on unnecessary polls/calls.
The adjustment is based on data provided by the job itself (e.g. number of
packets it processed).

After all jobs in a series are executed, the following statistics are updated
and might be used by the application. Statistics can be reset. Some of
the provided statistics data:
 - total/min/max execution - time spent executing jobs.
 - total/min/max management - time spent outside the execution area. This
value might be used to measure the overhead of scheduling jobs. This time also
contains the overhead of the headroom library itself.
 - number of loops that executed at least one job
 - executed jobs
 - time when statistics were reset.

Each job provides total/min/max execution time and execution count
statistics.
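
The loop/job bracketing described above can be sketched as follows. This
is a self-contained illustration, not the rte_headroom API: the struct
layouts, the function names and the explicit `now` timestamp argument are
all assumptions, and the poll-period adjustment is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-job statistics. Set exec_min to UINT64_MAX before use. */
struct job_stats {
    uint64_t start;       /* timestamp of the current run */
    uint64_t exec_total;  /* total time spent in this job */
    uint64_t exec_min;
    uint64_t exec_max;
    uint64_t exec_cnt;    /* number of completed runs */
};

/* Hypothetical per-loop statistics. */
struct loop_stats {
    uint64_t start;            /* timestamp of the current loop */
    uint64_t loop_exec;        /* job time inside the current loop */
    uint64_t jobs_this_loop;
    uint64_t exec_total;       /* time spent executing jobs */
    uint64_t mgmt_total;       /* time spent outside jobs */
    uint64_t loops_with_jobs;  /* loops that executed at least one job */
};

static void loop_start(struct loop_stats *l, uint64_t now)
{
    l->start = now;
    l->loop_exec = 0;
    l->jobs_this_loop = 0;
}

static void job_start(struct job_stats *j, uint64_t now)
{
    j->start = now;
}

/* Account one job run to both the job and the enclosing loop. */
static void job_finish(struct loop_stats *l, struct job_stats *j, uint64_t now)
{
    assert(now >= j->start);
    uint64_t dt = now - j->start;

    j->exec_total += dt;
    j->exec_cnt++;
    if (dt < j->exec_min)
        j->exec_min = dt;
    if (dt > j->exec_max)
        j->exec_max = dt;
    l->loop_exec += dt;
    l->jobs_this_loop++;
}

/* Everything in the loop that was not job time counts as management. */
static void loop_finish(struct loop_stats *l, uint64_t now)
{
    assert(now >= l->start);
    uint64_t total = now - l->start;

    l->exec_total += l->loop_exec;
    l->mgmt_total += total - l->loop_exec;
    if (l->jobs_this_loop > 0)
        l->loops_with_jobs++;
}
```

Passing the timestamp in explicitly keeps the sketch independent of any
particular clock; a real caller would presumably read a cycle counter at
each of the four points.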

Signed-off-by: Pawel Wodkowski 
---
 config/common_bsdapp |   5 +
 config/common_linuxapp   |   5 +
 lib/Makefile |   1 +
 lib/librte_headroom/Makefile |  54 +
 lib/librte_headroom/rte_headroom.c   | 271 ++
 lib/librte_headroom/rte_headroom.h   | 324 +++
 lib/librte_headroom/rte_headroom_version.map |  20 ++
 7 files changed, 680 insertions(+)
 create mode 100644 lib/librte_headroom/Makefile
 create mode 100644 lib/librte_headroom/rte_headroom.c
 create mode 100644 lib/librte_headroom/rte_headroom.h
 create mode 100644 lib/librte_headroom/rte_headroom_version.map

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 57bacb8..aa2e5fd 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -282,6 +282,11 @@ CONFIG_RTE_LIBRTE_HASH=y
 CONFIG_RTE_LIBRTE_HASH_DEBUG=n

 #
+# Compile librte_headroom
+#
+CONFIG_RTE_LIBRTE_HEADROOM=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index d428f84..055a37b 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -290,6 +290,11 @@ CONFIG_RTE_LIBRTE_HASH=y
 CONFIG_RTE_LIBRTE_HASH_DEBUG=n

 #
+# Compile librte_headroom
+#
+CONFIG_RTE_LIBRTE_HEADROOM=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/lib/Makefile b/lib/Makefile
index d617d81..4fc2819 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -54,6 +54,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
 DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
 DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
+DIRS-$(CONFIG_RTE_LIBRTE_HEADROOM) += librte_headroom
 DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
diff --git a/lib/librte_headroom/Makefile b/lib/librte_headroom/Makefile
new file mode 100644
index 000..faefb3b
--- /dev/null
+++ b/lib/librte_headroom/Makefile
@@ -0,0 +1,54 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 

[dpdk-dev] [PATCH v4 2/2] examples: introduce new l2fwd-headroom example

2015-02-17 Thread Pawel Wodkowski
This app demonstrates usage of the new headroom library.
It is basically the original l2fwd with the following modifications to
meet headroom library requirements:
- main_loop() was split into two jobs: forward job and flush job. Logic
for those jobs is almost the same as in the original application.
- stats are moved to an rte_alarm callback to avoid introducing the
overhead of printing.
- stats are expanded to show headroom statistics.
- added new parameter '-l' to enable the automatic thousands separator.

Comparing the original l2fwd and l2fwd-headroom apps shows what is
needed to properly write your own application with headroom measurements.

New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of headroom library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.


Signed-off-by: Pawel Wodkowski 
---
 examples/Makefile|1 +
 examples/l2fwd-headroom/Makefile |   51 ++
 examples/l2fwd-headroom/main.c   | 1039 ++
 mk/rte.app.mk|4 +
 4 files changed, 1095 insertions(+)
 create mode 100644 examples/l2fwd-headroom/Makefile
 create mode 100644 examples/l2fwd-headroom/main.c

diff --git a/examples/Makefile b/examples/Makefile
index 81f1d2f..8a459b7 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -50,6 +50,7 @@ DIRS-$(CONFIG_RTE_MBUF_REFCNT) += ip_fragmentation
 DIRS-$(CONFIG_RTE_MBUF_REFCNT) += ipv4_multicast
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += kni
 DIRS-y += l2fwd
+DIRS-y += l2fwd-headroom
 DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += l2fwd-ivshmem
 DIRS-y += l3fwd
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += l3fwd-acl
diff --git a/examples/l2fwd-headroom/Makefile b/examples/l2fwd-headroom/Makefile
new file mode 100644
index 000..07da286
--- /dev/null
+++ b/examples/l2fwd-headroom/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overriden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = l2fwd-headroom
+
+# all source are stored in SRCS-y
+SRCS-y := main.c
+
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/l2fwd-headroom/main.c b/examples/l2fwd-headroom/main.c
new file mode 100644
index 000..7ba1743
--- /dev/null
+++ b/examples/l2fwd-headroom/main.c
@@ -0,0 +1,1039 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote 

[dpdk-dev] [PATCH v2 3/4] examples: example showing use of callbacks.

2015-02-17 Thread Neil Horman
On Tue, Feb 17, 2015 at 04:15:09PM +, Bruce Richardson wrote:
> On Tue, Feb 17, 2015 at 11:08:10AM -0500, Neil Horman wrote:
> > On Tue, Feb 17, 2015 at 04:00:56PM +, Bruce Richardson wrote:
> > > On Tue, Feb 17, 2015 at 10:49:24AM -0500, Neil Horman wrote:
> > > > On Tue, Feb 17, 2015 at 01:50:58PM +, Bruce Richardson wrote:
> > > > > On Tue, Feb 17, 2015 at 02:28:02PM +0100, Olivier MATZ wrote:
> > > > > > Hi Bruce,
> > > > > > 
> > > > > > On 02/17/2015 01:25 PM, Bruce Richardson wrote:
> > > > > > >On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > > > > > >>2015-02-16 15:16, Bruce Richardson:
> > > > > > > >>>In this specific instance, given that the application does little else, there
> > > > > > > >>>is no real advantage to using the callbacks - it's just to have a simple example
> > > > > > > >>>of how they can be used.
> > > > > > > >>>
> > > > > > > >>>Where callbacks are really designed to be useful, is for extending or augmenting
> > > > > > > >>>hardware capabilities. Taking the example of sequence numbers - to use the most
> > > > > > > >>>trivial example - an application could be written to take advantage of sequence
> > > > > > > >>>numbers written to packets by the hardware which received them. However, if such
> > > > > > > >>>an application was to be used with a NIC which does not provide sequence numbering
> > > > > > > >>>capability, for example, anything using ixgbe driver, the application writer has
> > > > > > > >>>two choices - either modify his application code to check each packet for
> > > > > > > >>>a sequence number in the data path, and add it there post-rx, or alternatively,
> > > > > > > >>>to check the NIC capabilities at initialization time, and add a callback there
> > > > > > > >>>at initialization, if the hardware does not support it. In the latter case,
> > > > > > > >>>the main packet processing body of the application can be written as though
> > > > > > > >>>hardware always has sequence numbering capability, safe in the knowledge that
> > > > > > > >>>any hardware not supporting it will be back-filled by a software fallback at
> > > > > > > >>>initialization-time.
> > > > > > > >>>
> > > > > > > >>>By the same token, we could also look to extend hardware capabilities. For
> > > > > > > >>>different filtering or hashing capabilities, there can be limits in hardware
> > > > > > > >>>which are far less than what we need to use in software. Again, callbacks will
> > > > > > > >>>allow the data path to be written in a way that is oblivious to the underlying
> > > > > > > >>>hardware limits, because software will transparently fill in the gaps.
> > > > > > > >>>
> > > > > > > >>>Hope this makes the use case clear.
> > > > > > >>
> > > > > > > >>After thinking more about these callbacks, I realize these callbacks won't
> > > > > > > >>help, as Olivier said.
> > > > > > >>
> > > > > > >>With callback,
> > > > > > >>1/ application checks device capability
> > > > > > >>2/ application provides hardware emulation as DPDK callback
> > > > > > >>3/ application forgets previous steps
> > > > > > >>4/ application calls DPDK Rx
> > > > > > >>5/ DPDK calls callback (without calling optimization)
> > > > > > >>
> > > > > > >>Without callback,
> > > > > > >>1/ application checks device capability
> > > > > > >>2/ application provides hardware emulation as internal function
> > > > > > > >>3/ application sets an internal device-flag to enable this function
> > > > > > >>4/ application calls DPDK Rx
> > > > > > >>5/ application calls the hardware emulation if flag is set
> > > > > > >>
> > > > > > > >>So the only difference is to keep the device information persistent in
> > > > > > > >>the application instead of storing it as a function pointer in the
> > > > > > > >>DPDK struct.
> > > > > > > >>You can also be faster with this approach: at initialization time,
> > > > > > > >>you can check that your NIC supports the feature and use a specific
> > > > > > > >>mainloop that adds or not the sequence number without any runtime
> > > > > > > >>test.
> > > > > > >
> > > > > > > >That is assuming that all NICs are equal on your system. It's also assuming
> > > > > > > >that you only have a single point in your application where you call RX or
> > > > > > > >TX burst. In the case where you have a couple of different NICs on the system,
> > > > > > > >or where you want to write an application to take advantage of capabilities of
> > > > > > > >different NICs, the ability to resolve all these differences at initialization
> > > > > > > >time is useful. The main packet handling code can be written with just 
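
The two approaches debated in the quoted thread (a callback registered once
at initialization versus an application-side flag tested after every RX
call) can be sketched in plain C. This is only an illustration under stated
assumptions: struct pkt, struct port, rx_burst and the other names are
hypothetical stand-ins, not DPDK types or functions, and the "hardware" is
simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are DPDK types or functions. */
struct pkt { uint32_t seq; };

typedef void (*rx_cb_t)(struct pkt *pkts, size_t n, void *arg);

struct port {
	int hw_seq;       /* 1 if the simulated NIC stamps sequence numbers */
	uint32_t hw_next; /* next number the simulated NIC would write */
	rx_cb_t rx_cb;    /* optional post-RX callback, NULL if unused */
	void *rx_cb_arg;
};

/* Software fallback: stamp sequence numbers from a counter. */
static void sw_add_seq(struct pkt *pkts, size_t n, void *arg)
{
	uint32_t *next = arg;
	for (size_t i = 0; i < n; i++)
		pkts[i].seq = (*next)++;
}

/* Simulated RX burst: "receives" n packets, then runs the callback. */
static size_t rx_burst(struct port *p, struct pkt *pkts, size_t n)
{
	assert(p != NULL && pkts != NULL);
	for (size_t i = 0; i < n; i++)
		pkts[i].seq = p->hw_seq ? p->hw_next++ : 0;
	if (p->rx_cb != NULL)
		p->rx_cb(pkts, n, p->rx_cb_arg);
	return n;
}

/* Callback approach: decide once at init; the data path stays oblivious. */
static void port_init_with_cb(struct port *p, uint32_t *counter)
{
	if (!p->hw_seq) {
		p->rx_cb = sw_add_seq;
		p->rx_cb_arg = counter;
	}
}

/* Flag approach: the application keeps the state and tests it after RX. */
static size_t app_rx_with_flag(struct port *p, struct pkt *pkts, size_t n,
			       int need_sw_seq, uint32_t *counter)
{
	size_t got = rx_burst(p, pkts, n);
	if (need_sw_seq)
		sw_add_seq(pkts, got, counter);
	return got;
}
```

With the callback variant, every call site of rx_burst() sees sequence
numbers regardless of the port; with the flag variant, each call site has
to go through a wrapper such as app_rx_with_flag() and carry the per-port
flag itself.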

[dpdk-dev] Patches outstanding

2015-02-17 Thread Stephen Hemminger
There are currently 1039 patches outstanding on DPDK.
What is the schedule for getting these merged or resolved?
I don't think it would be reasonable to declare 2.0 as done
until the patch backlog is 0!

