[dpdk-dev] [PATCH] ethdev: don't look for devices if none were found

2016-11-19 Thread Anatoly Burakov
Aside from avoiding useless work, this also fixes a segfault when
rte_eth_dev_get_port_by_name() is called before any devices have
been found, and therefore before rte_eth_dev_data has been allocated.

Fixes: 9c5b8d8b9feb ("ethdev: clean port id retrieval when attaching")

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ether/rte_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)
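
Note: the early return this patch adds can be sketched in isolation.
What follows is a minimal illustration, not the ethdev code itself:
port_table, port_count, MAX_PORTS and lookup_port_by_name() are
hypothetical stand-ins for rte_eth_dev_data, nb_ports,
RTE_MAX_ETHPORTS and rte_eth_dev_get_port_by_name(), and the sketch
assumes the table is only allocated once a first device is registered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 4 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

struct port_data {
    char name[32];
};

/* Hypothetical stand-ins for the ethdev globals: the table stays NULL
 * until the first device has been registered. */
static struct port_data *port_table;
static unsigned int port_count;

static int
lookup_port_by_name(const char *name, unsigned int *port_id)
{
    unsigned int i;

    if (name == NULL || port_id == NULL)
        return -EINVAL;

    /* Nothing registered yet: bail out before touching the table,
     * which may still be NULL at this point. */
    if (port_count == 0)
        return -ENODEV;

    for (i = 0; i < MAX_PORTS; i++) {
        if (strcmp(port_table[i].name, name) == 0) {
            *port_id = i;
            return 0;
        }
    }
    return -ENODEV;
}
```

With port_count still zero, a lookup returns -ENODEV and leaves
*port_id untouched instead of dereferencing the unallocated table.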

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fde8112..76a6dbf 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -376,6 +376,9 @@ rte_eth_dev_get_port_by_name(const char *name, uint8_t *port_id)
        return -EINVAL;
    }

+   if (!nb_ports)
+       return -ENODEV;
+
    *port_id = RTE_MAX_ETHPORTS;

    for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
-- 
2.5.5



[dpdk-dev] [PATCH] vdev: fix missing alias check on uninit

2016-11-18 Thread Anatoly Burakov
Fixes: d63eed6b2dca ("eal: add driver name alias")

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_vdev.c | 8 ++++++++
 1 file changed, 8 insertions(+)
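
Note: the lookup order this patch gives the uninit path (driver names
first, aliases second, both matched as prefixes of the device name)
can be sketched as below. This is an illustration under assumptions,
not the EAL code: struct drv_entry and find_driver() are hypothetical,
and a plain array stands in for the driver tail queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified driver record: a name and an optional alias. */
struct drv_entry {
    const char *name;
    const char *alias; /* may be NULL */
};

static const struct drv_entry *
find_driver(const struct drv_entry *list, size_t n, const char *devname)
{
    size_t i;

    /* First pass: current driver names take precedence. */
    for (i = 0; i < n; i++)
        if (strncmp(list[i].name, devname, strlen(list[i].name)) == 0)
            return &list[i];

    /* Second pass: fall back to aliases, skipping drivers without one. */
    for (i = 0; i < n; i++)
        if (list[i].alias != NULL &&
            strncmp(list[i].alias, devname, strlen(list[i].alias)) == 0)
            return &list[i];

    return NULL;
}
```

Running the alias pass only after the name pass has failed means a
device whose name matches one driver's real name is never claimed by
another driver's alias.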

diff --git a/lib/librte_eal/common/eal_common_vdev.c b/lib/librte_eal/common/eal_common_vdev.c
index 0ff2377..7d6e54f 100644
--- a/lib/librte_eal/common/eal_common_vdev.c
+++ b/lib/librte_eal/common/eal_common_vdev.c
@@ -111,6 +111,14 @@ rte_eal_vdev_uninit(const char *name)
            return driver->remove(name);
    }

+   /* Give new names precedence over aliases. */
+   TAILQ_FOREACH(driver, &vdev_driver_list, next) {
+       if (driver->driver.alias &&
+           !strncmp(driver->driver.alias, name,
+               strlen(driver->driver.alias)))
+           return driver->remove(name);
+   }
+
    RTE_LOG(ERR, EAL, "no driver found for %s\n", name);
    return -EINVAL;
 }
-- 
2.5.5



[dpdk-dev] [PATCH] ivshmem: document a potential segmentation fault in rte_ring

2016-06-01 Thread Anatoly Burakov
Commit 4768c475 added a pointer to the memzone in rte_ring. However,
all memzones reside in the local mem_config, so accessing the memzone
pointer inside the guest in an IVSHMEM-shared rte_ring causes a
segmentation fault. This issue is unlikely to ever get fixed, as that
would require lots of changes for very little benefit, so we are
documenting the limitation instead.

Signed-off-by: Anatoly Burakov 
---
 doc/guides/prog_guide/ivshmem_lib.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/guides/prog_guide/ivshmem_lib.rst b/doc/guides/prog_guide/ivshmem_lib.rst
index 9401ccf..b8a32e4 100644
--- a/doc/guides/prog_guide/ivshmem_lib.rst
+++ b/doc/guides/prog_guide/ivshmem_lib.rst
@@ -79,6 +79,8 @@ The following is a simple guide to using the IVSHMEM Library API:
 Only data structures fully residing in DPDK hugepage memory work correctly.
 Supported data structures created by malloc(), mmap()
 or otherwise using non-DPDK memory cause undefined behavior and even a segmentation fault.
+Specifically, because the memzone field in an rte_ring refers to a memzone structure residing in local memory,
+accessing the memzone field in a shared rte_ring will cause an immediate segmentation fault.

 IVSHMEM Environment Configuration
 ---------------------------------
-- 
2.5.0



[dpdk-dev] [PATCH] ivshmem: fix overlap detection code

2016-05-24 Thread Anatoly Burakov
Partial revert of an earlier ill-conceived "fix".
Adjacent segments can never be considered overlapping because we
are not comparing ends to starts, but rather starts to starts.
Therefore the earlier fix was wrong (plus it also had a typo).

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
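
Note: the start-to-start comparison described above can be sketched
for a single address space as follows. ranges_overlap() is a
hypothetical helper, not part of eal_ivshmem.c, and the sketch
assumes half-open ranges [start, start + len). Because each test
asks whether one range's start lies inside the other range, two
adjacent ranges never match, while using ">" in both tests would
miss two ranges that begin at the same address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: do [start1, start1 + len1) and
 * [start2, start2 + len2) share at least one address? */
static int
ranges_overlap(uint64_t start1, uint64_t len1,
               uint64_t start2, uint64_t len2)
{
    uint64_t end1 = start1 + len1;
    uint64_t end2 = start2 + len2;

    /* start of range 1 lies inside range 2 */
    if (start1 >= start2 && start1 < end2)
        return 1;
    /* start of range 2 lies inside range 1 */
    if (start2 >= start1 && start2 < end1)
        return 1;
    return 0;
}
```

For example, [0, 10) and [10, 20) are adjacent and are not reported,
whereas [0, 10) and [0, 5) share a start address and are.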

diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
index 07aec69..eea0314 100644
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
@@ -184,21 +184,21 @@ overlap(const struct rte_memzone * mz1, const struct rte_memzone * mz2)
    i_end2 = mz2->ioremap_addr + mz2->len;

    /* check for overlap in virtual addresses */
-   if (start1 > start2 && start1 < end2)
+   if (start1 >= start2 && start1 < end2)
        result |= VIRT;
    if (start2 >= start1 && start2 < end1)
        result |= VIRT;

    /* check for overlap in physical addresses */
-   if (p_start1 > p_start2 && p_start1 < p_end2)
+   if (p_start1 >= p_start2 && p_start1 < p_end2)
        result |= PHYS;
-   if (p_start2 > p_start1 && p_start2 < p_end1)
+   if (p_start2 >= p_start1 && p_start2 < p_end1)
        result |= PHYS;

    /* check for overlap in ioremap addresses */
-   if (i_start1 > i_start2 && i_start1 < i_end2)
+   if (i_start1 >= i_start2 && i_start1 < i_end2)
        result |= IOREMAP;
-   if (i_start2 > i_start1 && i_start2 < i_end1)
+   if (i_start2 >= i_start1 && i_start2 < i_end1)
        result |= IOREMAP;

    return result;
-- 
2.5.5



[dpdk-dev] [PATCH v2] ivshmem: avoid infinite loop when concatenating adjacent segments

2016-04-07 Thread Anatoly Burakov
This patch aligns the logic used to check for the presence of
adjacent segments in has_adjacent_segments() with the logic used
in cleanup_segments() when deciding whether to concatenate a pair
of segments. Additionally, adjacent segments are no longer
considered overlapping, to avoid generating errors for segments
that can happily coexist.

This fixes an infinite loop that happened when segments were
adjacent in their physical or virtual addresses but not in their
ioremap addresses: has_adjacent_segments() reported the presence
of adjacent segments while cleanup_segments() did not consider
them for concatenation. Since the result of has_adjacent_segments()
is used in the decision to continue looping in cleanup_segments(),
the loop never terminated.

Signed-off-by: David Verbeiren 
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)
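
Note: the reason a shared predicate removes the infinite loop can be
sketched as below. This is a simplified model, not the eal_ivshmem.c
code: struct seg, follows(), fully_adjacent(), find_mergeable() and
merge_all() are hypothetical, and the flag values are assumptions.
The loop condition and the merge step use the same "fully adjacent"
test, so every iteration that enters the loop also removes one
segment, and the loop must terminate.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values; FULL means adjacent in all three address spaces. */
enum { VIRT = 1, PHYS = 2, IOREMAP = 4, FULL = VIRT | PHYS | IOREMAP };

struct seg {
    uint64_t virt, phys, ioremap, len;
};

/* Bitmask of address spaces in which b starts exactly where a ends. */
static int
follows(const struct seg *a, const struct seg *b)
{
    int mask = 0;

    if (a->virt + a->len == b->virt)
        mask |= VIRT;
    if (a->phys + a->len == b->phys)
        mask |= PHYS;
    if (a->ioremap + a->len == b->ioremap)
        mask |= IOREMAP;
    return mask;
}

/* The one predicate shared by detection and merging. */
static int
fully_adjacent(const struct seg *a, const struct seg *b)
{
    return follows(a, b) == FULL || follows(b, a) == FULL;
}

/* Find the first mergeable pair; returns 1 and sets *pi < *pj if found. */
static int
find_mergeable(const struct seg *s, int n, int *pi, int *pj)
{
    int i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (fully_adjacent(&s[i], &s[j])) {
                *pi = i;
                *pj = j;
                return 1;
            }
    return 0;
}

/* Merge until no mergeable pair is left; returns the new segment count. */
static int
merge_all(struct seg *s, int n)
{
    int i, j;

    while (find_mergeable(s, n, &i, &j)) {
        if (follows(&s[j], &s[i]) == FULL) {
            /* s[j] comes first, so the merged segment starts there */
            s[i].virt = s[j].virt;
            s[i].phys = s[j].phys;
            s[i].ioremap = s[j].ioremap;
        }
        s[i].len += s[j].len;
        s[j] = s[n - 1]; /* drop s[j] by moving the last entry into its slot */
        n--;
    }
    return n;
}
```

A pair that is adjacent virtually and physically but not in ioremap
space is simply left alone by both the detection and the merge step.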

diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
index 28ddf09..07aec69 100644
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
@@ -184,21 +184,21 @@ overlap(const struct rte_memzone * mz1, const struct rte_memzone * mz2)
    i_end2 = mz2->ioremap_addr + mz2->len;

    /* check for overlap in virtual addresses */
-   if (start1 >= start2 && start1 < end2)
+   if (start1 > start2 && start1 < end2)
        result |= VIRT;
    if (start2 >= start1 && start2 < end1)
        result |= VIRT;

    /* check for overlap in physical addresses */
-   if (p_start1 >= p_start2 && p_start1 < p_end2)
+   if (p_start1 > p_start2 && p_start1 < p_end2)
        result |= PHYS;
-   if (p_start2 >= p_start1 && p_start2 < p_end1)
+   if (p_start2 > p_start1 && p_start2 < p_end1)
        result |= PHYS;

    /* check for overlap in ioremap addresses */
-   if (i_start1 >= i_start2 && i_start1 < i_end2)
+   if (i_start1 > i_start2 && i_start1 < i_end2)
        result |= IOREMAP;
-   if (i_start2 >= i_start1 && i_start2 < i_end1)
+   if (i_start2 > i_start1 && i_start2 < i_end1)
        result |= IOREMAP;

    return result;
@@ -254,17 +254,14 @@ adjacent(const struct rte_memzone * mz1, const struct rte_memzone * mz2)
 static int
 has_adjacent_segments(struct ivshmem_segment * ms, int len)
 {
-   int i, j, a;
+   int i, j;

    for (i = 0; i < len; i++)
        for (j = i + 1; j < len; j++) {
-           a = adjacent(&ms[i].entry.mz, &ms[j].entry.mz);
-
-           /* check if segments are adjacent virtually and/or physically but
-            * not ioremap (since that would indicate that they are from
-            * different PCI devices and thus don't need to be concatenated.
+           /* we're only interested in fully adjacent segments; partially
+            * adjacent segments can coexist.
             */
-           if ((a & (VIRT|PHYS)) > 0 && (a & IOREMAP) == 0)
+           if (adjacent(&ms[i].entry.mz, &ms[j].entry.mz) == FULL)
                return 1;
        }
    return 0;
-- 
2.5.0



[dpdk-dev] [PATCH v6] vfio: Support for no-IOMMU mode

2016-01-28 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions to eal_vfio.h, and DMA mapping
functions to eal_pci_vfio.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
Signed-off-by: Santosh Shukla 
Tested-by: Santosh Shukla 
---
v6 changes:
  Fixed functions not declared as static
  Fixed definitions to be more consistent with others

v5 changes:
  Renamed functions

v4 changes:
  Fixed the commit message and added a missing sign-off

v3 changes:
  Merging DMA mapping functions back into eal_pci_vfio.c
  Fixing and adding comments

v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   8 ++
 2 files changed, 160 insertions(+), 53 deletions(-)
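
Note: the "generic mechanism" described above is a table of IOMMU
types that is tried in order, each entry carrying its own DMA mapping
callback. A minimal sketch of that pattern follows. Everything in it
is an assumption made for illustration: the SKETCH_* ids are
placeholders rather than kernel constants, the set_iommu callback
stands in for the VFIO_SET_IOMMU ioctl so the sketch can run without
a VFIO device, and the two fake_kernel_* functions only simulate a
kernel's answer.

```c
#include <assert.h>
#include <stddef.h>

/* DMA mapping callback: takes a container fd, returns 0 on success. */
typedef int (*dma_map_func_t)(int container_fd);

/* Stand-in for ioctl(fd, VFIO_SET_IOMMU, type_id): 0 means accepted. */
typedef int (*set_iommu_func_t)(int container_fd, int type_id);

struct iommu_type {
    int type_id;
    const char *name;
    dma_map_func_t dma_map;
};

/* Placeholder ids, not the values from the kernel headers. */
enum { SKETCH_TYPE1 = 1, SKETCH_NOIOMMU = 2 };

static int
type1_dma_map(int container_fd)
{
    (void)container_fd; /* a real version would map every memory segment */
    return 0;
}

static int
noiommu_dma_map(int container_fd)
{
    (void)container_fd; /* no-IOMMU mode needs no DMA mapping */
    return 0;
}

/* Supported types, in order of preference. */
static const struct iommu_type iommu_types[] = {
    { SKETCH_TYPE1, "Type 1", type1_dma_map },
    { SKETCH_NOIOMMU, "No-IOMMU", noiommu_dma_map },
};

/* Try each type in turn; return the first one accepted, or NULL. */
static const struct iommu_type *
pick_iommu_type(int container_fd, set_iommu_func_t set_iommu)
{
    size_t i;

    for (i = 0; i < sizeof(iommu_types) / sizeof(iommu_types[0]); i++)
        if (set_iommu(container_fd, iommu_types[i].type_id) == 0)
            return &iommu_types[i];
    return NULL;
}

/* Simulated kernels, used only to exercise the sketch. */
static int
fake_kernel_noiommu_only(int container_fd, int type_id)
{
    (void)container_fd;
    return type_id == SKETCH_NOIOMMU ? 0 : -1;
}

static int
fake_kernel_none(int container_fd, int type_id)
{
    (void)container_fd;
    (void)type_id;
    return -1;
}
```

Supporting another IOMMU type in this scheme means adding one table
entry and one mapping function; the selection loop does not change.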

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..a6c7e16 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+/* DMA mapping function prototype.
+ * Takes VFIO container fd as a parameter.
+ * Returns 0 on success, -1 on error.
+ * */
+typedef int (*vfio_dma_func_t)(int);
+
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+static int vfio_type1_dma_map(int);
+static int vfio_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { RTE_VFIO_TYPE1, "Type 1", &vfio_type1_dma_map},
+   /* IOMMU-less mode */
+   { RTE_VFIO_NOIOMMU, "No-IOMMU", &vfio_noiommu_dma_map},
+};
+
+int
+vfio_type1_dma_map(int vfio_container_fd)
+{
+   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+   int i, ret;
+
+   /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+       struct vfio_iommu_type1_dma_map dma_map;
+
+       if (ms[i].addr == NULL)
+           break;
+
+       memset(&dma_map, 0, sizeof(dma_map));
+       dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+       dma_map.vaddr = ms[i].addr_64;
+       dma_map.size = ms[i].len;
+       dma_map.iova = ms[i].phys_addr;
+       dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+       ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+       if (ret) {
+           RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+                   "error %i (%s)\n", errno, strerror(errno));
+           return -1;
+       }
+   }
+
+   return 0;
+}
+
+int
+vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
+{
+   /* No-IOMMU mode does not need DMA mapping */
+   return 0;
+}
+
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
void *buf, size_t len, off_t offs)
@@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+       const struct vfio_iommu_type *t = &iommu_types[idx];
+
+   int ret 

[dpdk-dev] [PATCH v4] vfio: Support for no-IOMMU mode

2016-01-27 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
Signed-off-by: Santosh Shukla 
Tested-by: Santosh Shukla 
---
v4 changes:
  Fixed the commit message and added a missing sign-off

v3 changes:
  Merging DMA mapping functions back into eal_pci_vfio.c
  Fixing and adding comments

v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
 2 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..fdf334b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+/* DMA mapping function prototype.
+ * Takes VFIO container fd as a parameter.
+ * Returns 0 on success, -1 on error.
+ * */
+typedef  int (*vfio_dma_func_t)(int);
+
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+int vfio_iommu_type1_dma_map(int);
+int vfio_iommu_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
+int
+vfio_iommu_type1_dma_map(int vfio_container_fd)
+{
+   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+   int i, ret;
+
+   /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+       struct vfio_iommu_type1_dma_map dma_map;
+
+       if (ms[i].addr == NULL)
+           break;
+
+       memset(&dma_map, 0, sizeof(dma_map));
+       dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+       dma_map.vaddr = ms[i].addr_64;
+       dma_map.size = ms[i].len;
+       dma_map.iova = ms[i].phys_addr;
+       dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+       ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+       if (ret) {
+           RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+                   "error %i (%s)\n", errno, strerror(errno));
+           return -1;
+       }
+   }
+
+   return 0;
+}
+
+int
+vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd)
+{
+   /* No-IOMMU mode does not need DMA mapping */
+   return 0;
+}
+
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
void *buf, size_t len, off_t offs)
@@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+       const struct vfio_iommu_type *t = &iommu_types[idx];
+
+       int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+               t->

[dpdk-dev] [PATCH v3] vfio: Support for no-IOMMU mode

2016-01-27 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio_dma.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
Tested-by: Santosh Shukla 
---
v3 changes:
  Merging DMA mapping functions back into eal_pci_vfio.c
  Fixing and adding comments

v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 205 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
 2 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..fdf334b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,11 +72,74 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+/* DMA mapping function prototype.
+ * Takes VFIO container fd as a parameter.
+ * Returns 0 on success, -1 on error.
+ * */
+typedef  int (*vfio_dma_func_t)(int);
+
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+int vfio_iommu_type1_dma_map(int);
+int vfio_iommu_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
+int
+vfio_iommu_type1_dma_map(int vfio_container_fd)
+{
+   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+   int i, ret;
+
+   /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+       struct vfio_iommu_type1_dma_map dma_map;
+
+       if (ms[i].addr == NULL)
+           break;
+
+       memset(&dma_map, 0, sizeof(dma_map));
+       dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+       dma_map.vaddr = ms[i].addr_64;
+       dma_map.size = ms[i].len;
+       dma_map.iova = ms[i].phys_addr;
+       dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+       ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+       if (ret) {
+           RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+                   "error %i (%s)\n", errno, strerror(errno));
+           return -1;
+       }
+   }
+
+   return 0;
+}
+
+int
+vfio_iommu_noiommu_dma_map(int __rte_unused vfio_container_fd)
+{
+   /* No-IOMMU mode does not need DMA mapping */
+   return 0;
+}
+
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
void *buf, size_t len, off_t offs)
@@ -208,42 +271,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+       const struct vfio_iommu_type *t = &iommu_types[idx];
+
+       int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+               t->type_id);
+       if (!ret) {
+           RTE_LOG(NOTICE, EAL, "  usi

[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-13 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio_dma.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
Signed-off-by: Santosh Shukla 
Tested-by: Santosh Shukla 
---
v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  22 
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 143 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c |  84 +++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
 5 files changed, 202 insertions(+), 53 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 26eced5..5c9e9d9 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_dma.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index a17c708..da1c431 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -106,6 +106,28 @@ struct vfio_config {
struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
 };

+/* function pointer typedef for DMA mapping functions */
+typedef  int (*vfio_dma_func_t)(int);
+
+/* Structure to hold supported IOMMU types */
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+/* function prototypes for different IOMMU types */
+int vfio_iommu_type1_dma_map(int container_fd);
+int vfio_iommu_noiommu_dma_map(int container_fd);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
 #endif

 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..5eb6cd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,6 +72,7 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
@@ -208,42 +209,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+       const struct vfio_iommu_type *t = &iommu_types[idx];
+
+       int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+               t->type_id);
+       if (!ret) {
+           RTE_LOG(NOTICE, EAL, "  using IOMMU type %d (%s)\n",
+                   t->type_id, t->name);
+   

[dpdk-dev] [PATCH] vfio: Support for no-IOMMU mode

2015-12-21 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio_dma.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  22 
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 142 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c |  84 +++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
 5 files changed, 201 insertions(+), 53 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 26eced5..5c9e9d9 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_dma.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index a17c708..da1c431 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -106,6 +106,28 @@ struct vfio_config {
struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
 };

+/* function pointer typedef for DMA mapping functions */
+typedef  int (*vfio_dma_func_t)(int);
+
+/* Structure to hold supported IOMMU types */
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+/* function prototypes for different IOMMU types */
+int vfio_iommu_type1_dma_map(int container_fd);
+int vfio_iommu_noiommu_dma_map(int container_fd);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
 #endif

 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..71eeea8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,6 +72,7 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
@@ -208,42 +209,57 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   for (unsigned idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+       const struct vfio_iommu_type *t = &iommu_types[idx];
+
+       int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+               t->type_id);
+       if (!ret) {
+           RTE_LOG(NOTICE, EAL, "  using IOMMU type %d (%s)\n",
+                   t->type_id, t->name);
+           return t;
+       }
+       /* not an error, there may be more supported IOMMU types */
+       RTE_LOG(DEBUG, EAL, "  set IOMMU type %d (%s) failed, "
+  

[dpdk-dev] [PATCH] eal: correct licenses for PCI feature headers

2015-06-10 Thread Anatoly Burakov
---
 .../common/include/rte_pci_dev_feature_defs.h  | 26 ++
 .../common/include/rte_pci_dev_features.h  | 26 ++
 2 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
index 6316b6d..c200951 100644
--- a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -1,4 +1,29 @@
 /*-
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   Contact Information:
+ *   Intel Corporation
+ *
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
@@ -29,6 +54,7 @@
  *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
  */

 #ifndef _RTE_PCI_DEV_DEFS_H_
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
index 01200de..9528bb3 100644
--- a/lib/librte_eal/common/include/rte_pci_dev_features.h
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -1,4 +1,29 @@
 /*-
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   Contact Information:
+ *   Intel Corporation
+ *
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
@@ -29,6 +54,7 @@
  *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
  */

 #ifndef _RTE_PCI_DEV_FEATURES_H
-- 
1.8.1.4



[dpdk-dev] [PATCH] maintainers: claim VFIO and IVSHMEM

2015-02-24 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7750881..2eb7761 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -117,6 +117,7 @@ F: lib/librte_eal/linuxapp/igb_uio/
 F: lib/librte_eal/linuxapp/eal/*uio*

 Linux VFIO
+M: Anatoly Burakov 
 F: lib/librte_eal/linuxapp/eal/*vfio*

 Linux Xen
@@ -348,6 +349,7 @@ F: examples/cmdline/
 F: doc/guides/sample_app_ug/cmd_line.rst

 Qemu IVSHMEM
+M: Anatoly Burakov 
 F: lib/librte_ivshmem/
 F: lib/librte_eal/linuxapp/eal/eal_ivshmem.c
 F: doc/guides/prog_guide/ivshmem_lib.rst
-- 
1.8.1.4



[dpdk-dev] [PATCH v8] eal: map PCI memory resources after hugepages

2014-11-11 Thread Anatoly Burakov
A multi-process DPDK application must mmap hugepages and PCI resources
at the same virtual addresses in every process. By default, the
addresses are chosen automatically when the primary process calls mmap.
Sometimes those addresses aren't usable in a secondary process: for
example, when the secondary process is linked with more libraries than
the primary process, and a library occupies the address range that the
primary process used for its PCI mappings.

This patch makes EAL try to map PCI BARs right after the hugepages in
virtual memory (instead of at a location chosen by mmap), so that PCI
BARs are less likely to end up at arbitrary addresses.
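
The approach described above can be sketched in isolation roughly as
follows. This is only an illustration under stated assumptions, not the
EAL code: struct seg is a hypothetical stand-in for struct rte_memseg,
find_max_end_va() and map_after() are made-up names, and anonymous
memory is mapped instead of a PCI resource file so that the snippet can
be tried without a device.

```c
/* Needed for MAP_ANONYMOUS when compiling in strict ISO C mode. */
#define _DEFAULT_SOURCE

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical stand-in for struct rte_memseg (only the fields used here). */
struct seg {
	void *addr;
	size_t len;
};

/*
 * Return the address just past the segment with the highest start
 * address. The table is treated as terminated by the first entry
 * whose addr is NULL.
 */
static void *
find_max_end_va(const struct seg *segs, size_t n)
{
	const struct seg *last = segs;
	size_t i;

	assert(segs != NULL && n > 0);
	for (i = 0; i < n; i++) {
		if (segs[i].addr == NULL)
			break;
		if ((uintptr_t)segs[i].addr > (uintptr_t)last->addr)
			last = &segs[i];
	}
	return (void *)((uintptr_t)last->addr + last->len);
}

/*
 * Map 'size' bytes using *hint as the preferred address. The hint is
 * not binding (no MAP_FIXED), so the kernel may place the mapping
 * elsewhere. On success the hint is advanced past the new mapping so
 * that the next request lands right after it.
 */
static void *
map_after(void **hint, size_t size)
{
	void *va = mmap(*hint, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (va != MAP_FAILED)
		*hint = (void *)((uintptr_t)va + size);
	return va;
}
```

Because the address is only a hint, the caller still has to compare the
result against MAP_FAILED, as the reworked pci_map_resource() callers in
this patch do.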

Signed-off-by: Liang Xu 
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 30 --
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 13 --
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 19 +++---
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  6 +
 4 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 5fe3961..79fbbb8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,25 @@ error:
return -1;
 }

+void *
+pci_find_max_end_va(void)
+{
+   const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg *last = seg;
+   unsigned i = 0;
+
+   for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if (seg->addr > last->addr)
+   last = seg;
+
+   }
+   return RTE_PTR_ADD(last->addr, last->len);
+}
+
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
@@ -106,21 +125,16 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, offset);
-   if (mapaddr == MAP_FAILED ||
-   (requested_addr != NULL && mapaddr != requested_addr)) {
+   if (mapaddr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
__func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
-   goto fail;
+   } else {
+   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
}

-   RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
-
return mapaddr;
-
-fail:
-   return NULL;
 }

 /* parse the "resource" sysfs file */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..e53f06b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -48,6 +49,8 @@

 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

+void *pci_map_addr = NULL;
+

 #define OFF_MAX  ((uint64_t)(off_t)-1)
 static int
@@ -371,10 +374,16 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   /* try mapping somewhere close to the end of hugepages */
+   if (pci_map_addr == NULL)
+   pci_map_addr = pci_find_max_end_va();
+
+   mapaddr = pci_map_resource(pci_map_addr, fd, (off_t)offset,
(size_t)maps[j].size);
-   if (mapaddr == NULL)
+   if (mapaddr == MAP_FAILED)
fail = 1;
+
+   pci_map_addr = RTE_PTR_ADD(mapaddr, (size_t) maps[j].size);
}

if (fail) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c776ddc..c1246e8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -720,10 +721,22 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
if (i == msix_bar)
continue;

-   bar_addr = pci_map_resource(maps[i].add

[dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages

2014-11-10 Thread Anatoly Burakov
A multi-process DPDK application must mmap hugepages and PCI resources
at the same virtual addresses in every process. By default, the
addresses are chosen automatically when the primary process calls mmap.
Sometimes those addresses aren't usable in a secondary process: for
example, when the secondary process is linked with more libraries than
the primary process, and a library occupies the address range that the
primary process used for its PCI mappings.

This patch makes EAL map PCI BARs right after the hugepages in virtual
memory (instead of at a location chosen by mmap).

Signed-off-by: Anatoly Burakov 
Signed-off-by: Liang Xu 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 19 +++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  |  9 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 13 +++--
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  6 ++
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 5fe3961..dae8739 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,25 @@ error:
return -1;
 }

+void *
+pci_find_max_end_va(void)
+{
+   const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+   const struct rte_memseg *last = seg;
+   unsigned i = 0;
+
+   for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+   if (seg->addr == NULL)
+   break;
+
+   if (seg->addr > last->addr)
+   last = seg;
+
+   }
+   return RTE_PTR_ADD(last->addr, last->len);
+}
+
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..5090bf1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -48,6 +48,8 @@

 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

+void *pci_map_addr = NULL;
+

 #define OFF_MAX  ((uint64_t)(off_t)-1)
 static int
@@ -371,10 +373,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   if (pci_map_addr == NULL)
+   pci_map_addr = pci_find_max_end_va();
+
+   mapaddr = pci_map_resource(pci_map_addr, fd, (off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+
+   pci_map_addr = RTE_PTR_ADD(pci_map_addr, maps[j].size);
}

if (fail) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c776ddc..fb6ee7a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -720,8 +720,17 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
if (i == msix_bar)
continue;

-   bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
-   reg.size);
+   if (internal_config.process_type == RTE_PROC_PRIMARY) {
+   if (pci_map_addr == NULL)
+   pci_map_addr = pci_find_max_end_va();
+
+   bar_addr = pci_map_resource(pci_map_addr, vfio_dev_fd, reg.offset,
+   reg.size);
+   pci_map_addr = RTE_PTR_ADD(pci_map_addr, reg.size);
+   } else {
+   bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+   reg.size);
+   }

if (bar_addr == NULL) {
RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index d758bee..1070eb8 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -59,6 +59,12 @@ struct mapped_pci_resource {
 TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
 extern struct mapped_pci_res_list *pci_res_list;

+/*
+ * Helper function to map PCI resources right after hugepages in virtual memory
+ */
+extern void *pci_map_addr;
+void *pci_find_max_end_va(void);
+
 void *pci_map_resource(void *requested_addr, int fd, off_t offset,
size_t size);

-- 
1.8.1.4



[dpdk-dev] [PATCH] Fix regression for eal_flags_autotest introduced by tailq rework

2014-11-05 Thread Anatoly Burakov
As a result of moving tailqs into local memory, some tailq data
is now reserved in rte_malloc heaps (because it needs to be
shared across DPDK processes). The first thing DPDK initializes
is a log mempool, and since creating it also creates a tailq, space
is reserved in the rte_malloc heap before the mempool itself is
allocated. By default, rte_malloc reserves far more space than is
necessary, so under some conditions (namely, when overall available
memory is low) the malloc heap consumes so much memory that the log
mempool cannot allocate its memzone.

This patch fixes the unit tests to account for that change.
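
The test changes rely on C string-literal concatenation to build both
the "-m" value and the "--socket-mem=" option from one macro. A small
standalone sketch of that idea follows; example_argv and
mem_flag_value() are hypothetical names used only for illustration, and
the value "8" simply mirrors the non-Xen default chosen in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Single place to adjust the amount of memory the tests ask for. */
#define DEFAULT_MEM_SIZE "8"

/*
 * Adjacent string literals are merged by the compiler, so the last
 * element below is the single string "--socket-mem=8".
 */
static const char *const example_argv[] = {
	"test", "-c", "1", "-n", "2",
	"-m", DEFAULT_MEM_SIZE,
	"--socket-mem=" DEFAULT_MEM_SIZE,
};

/* Return the argument following "-m", or NULL if there is none. */
static const char *
mem_flag_value(const char *const *argv, size_t argc)
{
	size_t i;

	assert(argv != NULL);
	for (i = 0; i + 1 < argc; i++) {
		if (strcmp(argv[i], "-m") == 0)
			return argv[i + 1];
	}
	return NULL;
}
```

Keeping the size in a string macro rather than a runtime variable is
what makes the "--socket-mem=" DEFAULT_MEM_SIZE form possible inside an
array initializer.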

Signed-off-by: Anatoly Burakov 
---
 app/test/test_eal_flags.c | 43 +--
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 21e6cca..9541619 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -52,6 +52,11 @@

 #include "process.h"

+#ifdef RTE_LIBRTE_XEN_DOM0
+#define DEFAULT_MEM_SIZE "30"
+#else
+#define DEFAULT_MEM_SIZE "8"
+#endif
 #define mp_flag "--proc-type=secondary"
 #define no_hpet "--no-hpet"
 #define no_huge "--no-huge"
@@ -616,14 +621,15 @@ test_no_huge_flag(void)
/* With --no-huge */
const char *argv1[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2"};
/* With --no-huge and -m */
-   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2", "-m", "2"};
+   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
+   "-m", DEFAULT_MEM_SIZE};

/* With --no-huge and --socket-mem */
const char *argv3[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "--socket-mem=2"};
+   "--socket-mem=" DEFAULT_MEM_SIZE};
/* With --no-huge, -m and --socket-mem */
const char *argv4[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "-m", "2", "--socket-mem=2"};
+   "-m", DEFAULT_MEM_SIZE, "--socket-mem=" DEFAULT_MEM_SIZE};
if (launch_proc(argv1) != 0) {
printf("Error - process did not run ok with --no-huge flag\n");
return -1;
@@ -789,20 +795,20 @@ test_misc_flags(void)
/* With invalid --syslog */
const char *argv5[] = {prgname, prefix, mp_flag, "-c", "1", "--syslog", "error"};
/* With no-sh-conf */
-   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
no_shconf, nosh_prefix };

 #ifdef RTE_EXEC_ENV_BSDAPP
return 0;
 #endif
/* With --huge-dir */
-   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", hugepath};
/* With empty --huge-dir (should fail) */
-   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir"};
/* With invalid --huge-dir */
-   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", "invalid"};
/* Secondary process with invalid --huge-dir (should run as flag has no
 * effect on secondary processes) */
@@ -923,15 +929,15 @@ test_file_prefix(void)
 #endif

/* this should fail unless the test itself is run with "memtest" prefix */
-   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
 

[dpdk-dev] [PATCH] Fix regression for eal_flags_autotest introduced by tailq rework

2014-11-05 Thread Anatoly Burakov
As a result of moving tailqs into local memory, some tailq data
is now reserved in rte_malloc heaps (because it needs to be
shared across DPDK processes). The first thing DPDK initializes
is a log mempool, and since creating it also creates a tailq, space
is reserved in the rte_malloc heap before the mempool itself is
allocated. By default, rte_malloc reserves far more space than is
necessary, so under some conditions (namely, when overall available
memory is low) the malloc heap consumes so much memory that the log
mempool cannot allocate its memzone.

This patch fixes the unit tests to account for that change.
---
 app/test/test_eal_flags.c | 43 +--
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 21e6cca..9541619 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -52,6 +52,11 @@

 #include "process.h"

+#ifdef RTE_LIBRTE_XEN_DOM0
+#define DEFAULT_MEM_SIZE "30"
+#else
+#define DEFAULT_MEM_SIZE "8"
+#endif
 #define mp_flag "--proc-type=secondary"
 #define no_hpet "--no-hpet"
 #define no_huge "--no-huge"
@@ -616,14 +621,15 @@ test_no_huge_flag(void)
/* With --no-huge */
const char *argv1[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2"};
/* With --no-huge and -m */
-   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2", "-m", "2"};
+   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
+   "-m", DEFAULT_MEM_SIZE};

/* With --no-huge and --socket-mem */
const char *argv3[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "--socket-mem=2"};
+   "--socket-mem=" DEFAULT_MEM_SIZE};
/* With --no-huge, -m and --socket-mem */
const char *argv4[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "-m", "2", "--socket-mem=2"};
+   "-m", DEFAULT_MEM_SIZE, "--socket-mem=" DEFAULT_MEM_SIZE};
if (launch_proc(argv1) != 0) {
printf("Error - process did not run ok with --no-huge flag\n");
return -1;
@@ -789,20 +795,20 @@ test_misc_flags(void)
/* With invalid --syslog */
const char *argv5[] = {prgname, prefix, mp_flag, "-c", "1", "--syslog", "error"};
/* With no-sh-conf */
-   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
no_shconf, nosh_prefix };

 #ifdef RTE_EXEC_ENV_BSDAPP
return 0;
 #endif
/* With --huge-dir */
-   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", hugepath};
/* With empty --huge-dir (should fail) */
-   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir"};
/* With invalid --huge-dir */
-   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", "invalid"};
/* Secondary process with invalid --huge-dir (should run as flag has no
 * effect on secondary processes) */
@@ -923,15 +929,15 @@ test_file_prefix(void)
 #endif

/* this should fail unless the test itself is run with "memtest" prefix */
-   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=" memtest };

/* primary process with memtest1 */
-   const char *argv1[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv1[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=" memtest1 };

/* primary process with memtest2 */
-   const char *argv2[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv2[] = {prgname, "-c", "1", "-n", "2", "-m", DEFAULT_MEM_SIZE,
"--file-prefix=" memtest2 };

char prefix[32];
@@ -1025,7 +1031,6 @@ test_file_prefix(void)
 static int
 test_memory_flags(void)
 {
-   const char* mem_size = NULL;
 #ifdef RTE_EXEC_ENV_BSDAPP
/* BSD target doesn't support prefixes at this point */
const char * prefix = "";
@@ -1037,20 +1042,14 @@ test_memory_flags(void)
}
snprintf(prefix, sizeof(prefix), "--file-prefix=%s", tmp);
 #endif
-#ifdef RTE_LIBRTE_XEN_DOM0
-   mem_size = "30";
-#else
-   mem_size = "2";

[dpdk-dev] [PATCH 10/10] rte_acl: make acl tailq fully local

2014-06-20 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_acl/acl.h |  1 -
 lib/librte_acl/rte_acl.c | 74 +++-
 2 files changed, 60 insertions(+), 15 deletions(-)

diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h
index e6d7985..b9d63fd 100644
--- a/lib/librte_acl/acl.h
+++ b/lib/librte_acl/acl.h
@@ -149,7 +149,6 @@ struct rte_acl_bld_trie {
 };

 struct rte_acl_ctx {
-   TAILQ_ENTRY(rte_acl_ctx) next;/**< Next in list. */
charname[RTE_ACL_NAMESIZE];
/** Name of the ACL context. */
int32_t socket_id;
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 129a41f..3b47ab6 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -36,13 +36,14 @@

 #defineBIT_SIZEOF(x)   (sizeof(x) * CHAR_BIT)

-TAILQ_HEAD(rte_acl_list, rte_acl_ctx);
+TAILQ_HEAD(rte_acl_list, rte_tailq_entry);

 struct rte_acl_ctx *
 rte_acl_find_existing(const char *name)
 {
-   struct rte_acl_ctx *ctx;
+   struct rte_acl_ctx *ctx = NULL;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;

/* check that we have an initialised tail queue */
acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
@@ -52,27 +53,55 @@ rte_acl_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(name, ctx->name, sizeof(ctx->name)) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (ctx == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return ctx;
 }

 void
 rte_acl_free(struct rte_acl_ctx *ctx)
 {
+   struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
+
if (ctx == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_ACL, rte_acl_list, ctx);
+   /* check that we have an initialised tail queue */
+   acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
+   if (acl_list == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, acl_list, next) {
+   if (te->data == (void *) ctx)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(acl_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

rte_free(ctx->mem);
rte_free(ctx);
+   rte_free(te);
 }

 struct rte_acl_ctx *
@@ -81,6 +110,7 @@ rte_acl_create(const struct rte_acl_param *param)
size_t sz;
struct rte_acl_ctx *ctx;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
char name[sizeof(ctx->name)];

/* check that we have an initialised tail queue */
@@ -105,15 +135,31 @@ rte_acl_create(const struct rte_acl_param *param)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* if we already have one with that name */
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(param->name, ctx->name, sizeof(ctx->name)) == 0)
break;
}

/* if ACL with such name doesn't exist, then create a new one. */
-   if (ctx == NULL && (ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE,
-   param->socket_id)) != NULL) {
+   if (te == NULL) {
+   ctx = NULL;
+   te = rte_zmalloc("ACL_TAILQ_ENTRY", sizeof(*te), 0);
+
+   if (te == NULL) {
+   RTE_LOG(ERR, ACL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
+   ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE, param->socket_id);

+   if (ctx == NULL) {
+   RTE_LOG(ERR, ACL,
+   "allocation of %zu bytes on socket %d for %s failed\n",
+   sz, param->socket_id, name);
+   rte_free(te);
+   goto exit;
+   }
/* init new allocated context. */
ctx->rules = ctx + 1;
ctx->max_rules = param->max_rule_num;
@@ -121,14 +167,12 @@ rte_acl_create(const struct rte_acl_param *param)
ctx->socket_id = param->socket_id;
rte_snprintf(ctx->name, sizeof(ctx->name), "%s", param->name);

- 

[dpdk-dev] [PATCH 09/10] rte_lpm6: make lpm6 tailq fully local

2014-06-20 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm6.c | 62 ++-
 1 file changed, 51 insertions(+), 11 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 32690cb..8072534 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -77,7 +77,7 @@ enum valid_flag {
VALID
 };

-TAILQ_HEAD(rte_lpm6_list, rte_lpm6);
+TAILQ_HEAD(rte_lpm6_list, rte_tailq_entry);

 /** Tbl entry structure. It is the same for both tbl24 and tbl8 */
 struct rte_lpm6_tbl_entry {
@@ -99,8 +99,6 @@ struct rte_lpm6_rule {

 /** LPM6 structure. */
 struct rte_lpm6 {
-   TAILQ_ENTRY(rte_lpm6) next;  /**< Next in list. */
-
/* LPM metadata. */
char name[RTE_LPM6_NAMESIZE];/**< Name of the lpm. */
uint32_t max_rules;  /**< Max number of rules. */
@@ -149,6 +147,7 @@ rte_lpm6_create(const char *name, int socket_id,
 {
char mem_name[RTE_LPM6_NAMESIZE];
struct rte_lpm6 *lpm = NULL;
+   struct rte_tailq_entry *te;
uint64_t mem_size, rules_size;
struct rte_lpm6_list *lpm_list;

@@ -179,12 +178,20 @@ rte_lpm6_create(const char *name, int socket_id,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* Guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm6 *) te->data;
if (strncmp(name, lpm->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
+   goto exit;
+
+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM6_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry!\n");
goto exit;
+   }

/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm6 *)rte_zmalloc_socket(mem_name, (size_t)mem_size,
@@ -192,6 +199,7 @@ rte_lpm6_create(const char *name, int socket_id,

if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -201,6 +209,7 @@ rte_lpm6_create(const char *name, int socket_id,
if (lpm->rules_tbl == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
rte_free(lpm);
+   rte_free(te);
goto exit;
}

@@ -209,7 +218,9 @@ rte_lpm6_create(const char *name, int socket_id,
lpm->number_tbl8s = config->number_tbl8s;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -223,7 +234,8 @@ exit:
 struct rte_lpm6 *
 rte_lpm6_find_existing(const char *name)
 {
-   struct rte_lpm6 *l;
+   struct rte_lpm6 *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm6_list *lpm_list;

/* Check that we have an initialised tail queue */
@@ -234,14 +246,17 @@ rte_lpm6_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm6 *) te->data;
if (strncmp(name, l->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -252,13 +267,38 @@ rte_lpm6_find_existing(const char *name)
 void
 rte_lpm6_free(struct rte_lpm6 *lpm)
 {
+   struct rte_lpm6_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM6, rte_lpm6_list, lpm);
-   rte_free(lpm->rules_tbl);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM6, rte_lpm6_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
-- 
1.8.1.4
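
The lookup pattern used in this patch (a generic list entry whose data
pointer refers to the real object) can be sketched on its own with the
BSD <sys/queue.h> macros. This is a minimal illustration, not DPDK code:
struct entry, struct table and find_table() are hypothetical names, and
locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Generic list node: the list links live here, not in the object. */
struct entry {
	TAILQ_ENTRY(entry) next;
	void *data;
};
TAILQ_HEAD(entry_list, entry);

/* Hypothetical object type tracked through the list. */
struct table {
	char name[32];
};

/* Return the table called 'name', or NULL if it is not in the list. */
static struct table *
find_table(struct entry_list *list, const char *name)
{
	struct entry *te;
	struct table *t = NULL;

	assert(list != NULL && name != NULL);
	TAILQ_FOREACH(te, list, next) {
		t = te->data;
		if (strncmp(name, t->name, sizeof(t->name)) == 0)
			break;
	}
	/*
	 * After a full pass without a match, te is NULL but t still
	 * points at the last object visited, so the "not found" test
	 * has to look at te rather than t.
	 */
	return te == NULL ? NULL : t;
}
```

This is why the find functions in this series switch their final check
from the object pointer to the tailq entry pointer.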



[dpdk-dev] [PATCH 08/10] rte_lpm: make lpm tailq fully local

2014-06-20 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm.c | 65 
 lib/librte_lpm/rte_lpm.h |  2 --
 2 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 35209c3..1ee4e96 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -56,7 +56,7 @@

 #include "rte_lpm.h"

-TAILQ_HEAD(rte_lpm_list, rte_lpm);
+TAILQ_HEAD(rte_lpm_list, rte_tailq_entry);

 #define MAX_DEPTH_TBL24 24

@@ -118,24 +118,29 @@ depth_to_range(uint8_t depth)
 struct rte_lpm *
 rte_lpm_find_existing(const char *name)
 {
-   struct rte_lpm *l;
+   struct rte_lpm *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm *) te->data;
if (strncmp(name, l->name, RTE_LPM_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -149,12 +154,13 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
 {
char mem_name[RTE_LPM_NAMESIZE];
struct rte_lpm *lpm = NULL;
+   struct rte_tailq_entry *te;
uint32_t mem_size;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -176,18 +182,27 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm *) te->data;
if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
goto exit;

+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm *)rte_zmalloc_socket(mem_name, mem_size,
CACHE_LINE_SIZE, socket_id);
if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -195,7 +210,9 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
lpm->max_rules = max_rules;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -209,12 +226,38 @@ exit:
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+   struct rte_lpm_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM, rte_lpm_list, lpm);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 840d871..62d7736 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -132,8 +

[dpdk-dev] [PATCH 06/10] rte_fbk_hash: make rte_fbk_hash tailq fully local

2014-06-20 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_hash/rte_fbk_hash.c | 73 ++
 lib/librte_hash/rte_fbk_hash.h |  3 --
 2 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/lib/librte_hash/rte_fbk_hash.c b/lib/librte_hash/rte_fbk_hash.c
index 4d67554..1356cf4 100644
--- a/lib/librte_hash/rte_fbk_hash.c
+++ b/lib/librte_hash/rte_fbk_hash.c
@@ -54,7 +54,7 @@

 #include "rte_fbk_hash.h"

-TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
+TAILQ_HEAD(rte_fbk_hash_list, rte_tailq_entry);

 /**
  * Performs a lookup for an existing hash table, and returns a pointer to
@@ -69,24 +69,29 @@ TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
 struct rte_fbk_hash_table *
 rte_fbk_hash_find_existing(const char *name)
 {
-   struct rte_fbk_hash_table *h;
+   struct rte_fbk_hash_table *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_fbk_hash_list *fbk_hash_list;

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   h = (struct rte_fbk_hash_table *) te->data;
if (strncmp(name, h->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -104,6 +109,7 @@ struct rte_fbk_hash_table *
 rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
 {
struct rte_fbk_hash_table *ht = NULL;
+   struct rte_tailq_entry *te;
char hash_name[RTE_FBK_HASH_NAMESIZE];
const uint32_t mem_size =
sizeof(*ht) + (sizeof(ht->t[0]) * params->entries);
@@ -112,7 +118,8 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params 
*params)

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == 
NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -134,20 +141,28 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params 
*params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(ht, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   ht = (struct rte_fbk_hash_table *) te->data;
if (strncmp(params->name, ht->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
-   if (ht != NULL)
+   if (te != NULL)
goto exit;

+   te = rte_zmalloc("FBK_HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory for table. */
-   ht = (struct rte_fbk_hash_table *)rte_malloc_socket(hash_name, mem_size,
+   ht = (struct rte_fbk_hash_table *)rte_zmalloc_socket(hash_name, 
mem_size,
0, params->socket_id);
-   if (ht == NULL)
+   if (ht == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate fbk hash table\n");
+   rte_free(te);
goto exit;
-
-   memset(ht, 0, mem_size);
+   }

/* Set up hash table context. */
rte_snprintf(ht->name, sizeof(ht->name), "%s", params->name);
@@ -169,7 +184,9 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params 
*params)
ht->init_val = RTE_FBK_HASH_INIT_VAL_DEFAULT;
}

-   TAILQ_INSERT_TAIL(fbk_hash_list, ht, next);
+   te->data = (void *) ht;
+
+   TAILQ_INSERT_TAIL(fbk_hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +203,38 @@ exit:
 void
 rte_fbk_hash_free(struct rte_fbk_hash_table *ht)
 {
+   struct rte_tailq_entry *te;
+   struct rte_fbk_hash_list *fbk_hash_list;
+
if (ht == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list, ht);
+   /* check that we have an initialised tail queue */
+   if ((fbk_hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_

[dpdk-dev] [PATCH 03/10] rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer

2014-06-20 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 app/test/test_tailq.c | 33 ---
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_tailq.h |  9 +
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/app/test/test_tailq.c b/app/test/test_tailq.c
index 67da009..c9b53ee 100644
--- a/app/test/test_tailq.c
+++ b/app/test/test_tailq.c
@@ -52,16 +52,16 @@

 #define DEFAULT_TAILQ (RTE_TAILQ_NUM)

-static struct rte_dummy d_elem;
+static struct rte_tailq_entry d_elem;

 static int
 test_tailq_create(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;
unsigned i;

/* create a first tailq and check its non-null */
-   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error allocating dummy_q0\n");

@@ -70,13 +70,14 @@ test_tailq_create(void)
TAILQ_INSERT_TAIL(d_head, &d_elem, next);

/* try allocating dummy_q0 again, and check for failure */
-   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head) == NULL)
+   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head) == 
NULL)
do_return("Error, non-null result returned when attemption to "
"re-allocate a tailq\n");

/* now fill up the tailq slots available and check we get an error */
for (i = RTE_TAILQ_NUM; i < RTE_MAX_TAILQ; i++){
-   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i, rte_dummy_head)) == 
NULL)
+   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i,
+   rte_tailq_entry_head)) == NULL)
break;
}

@@ -91,10 +92,10 @@ static int
 test_tailq_lookup(void)
 {
/* run successful  test - check result is found */
-   struct rte_dummy_head *d_head;
-   struct rte_dummy *d_ptr;
+   struct rte_tailq_entry_head *d_head;
+   struct rte_tailq_entry *d_ptr;

-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error with tailq lookup\n");

@@ -104,7 +105,7 @@ test_tailq_lookup(void)
"expected element not found\n");

/* now try a bad/error lookup */
-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_tailq_entry_head);
if (d_head != NULL)
do_return("Error, lookup does not return NULL for bad tailq name\n");

@@ -115,7 +116,7 @@ test_tailq_lookup(void)
 static int
 test_tailq_deprecated(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;

/* since TAILQ_RESERVE is not able to create new tailqs,
 * we should find an existing one (IOW, RTE_TAILQ_RESERVE behaves 
identical
@@ -123,29 +124,29 @@ test_tailq_deprecated(void)
 *
 * PCI_RESOURCE_LIST tailq is guaranteed to
 * be present in any DPDK app. */
-   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

-   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

/* try doing that with non-existent names */
-   d_head = RTE_TAILQ_RESERVE("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

/* try doing the same with NULL names */
-   d_head = RTE_TAILQ_RESERVE(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

diff --git a/lib/librte_eal/common/eal_common_tailqs.c 
b/lib/lib

[dpdk-dev] [PATCH 02/10] eal: use --base-virtaddr for mapping rte_config as well

2014-06-20 Thread Anatoly Burakov
Use --base-virtaddr to set the address of rte_config file along with
start address of the hugepages. Since the user would likely expect
the hugepages to be starting at the specified address, the specified
address will likely be rounded to either 2M or 1G. So, in order to
not waste space, we subtract the length of the config (and align it
on page boundary) from the base virtual address and map the config
just before the hugepages.
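
The address computation described above can be sketched as below. This is
only an illustration under stated assumptions, not the EAL code: align_floor()
and config_addr_before() are hypothetical helpers, and the page size is passed
in as a parameter (assumed to be a power of two) instead of being queried with
sysconf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to a multiple of align (align must be a power of two). */
static uintptr_t
align_floor(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Hypothetical helper: pick the start address for a config of cfg_len bytes
 * so that it ends at or before base_virtaddr and starts on a page boundary.
 */
static uintptr_t
config_addr_before(uintptr_t base_virtaddr, size_t cfg_len,
		uintptr_t page_size)
{
	return align_floor(base_virtaddr - cfg_len, page_size);
}
```

With a 1G-aligned base and 4K pages, a config smaller than one page lands
exactly one page below the base, so no hugepage-sized gap is left in between.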

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index ecb7664..32cec25 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -212,6 +212,14 @@ rte_eal_config_create(void)
if (internal_config.no_shconf)
return;

+   /* map the config before hugepage address so that we don't waste a page */
+   if (internal_config.base_virtaddr != 0)
+   rte_mem_cfg_addr = (void *) 
+   RTE_ALIGN_FLOOR(internal_config.base_virtaddr -
+   sizeof(struct rte_mem_config), sysconf (_SC_PAGE_SIZE));
+   else
+   rte_mem_cfg_addr = NULL;
+
if (mem_cfg_fd < 0){
mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
if (mem_cfg_fd < 0)
@@ -231,7 +239,7 @@ rte_eal_config_create(void)
"process running?\n", pathname);
}

-   rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
+   rte_mem_cfg_addr = mmap(rte_mem_cfg_addr, sizeof(*rte_config.mem_config),
PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);

if (rte_mem_cfg_addr == MAP_FAILED){
-- 
1.8.1.4



[dpdk-dev] [PATCH 00/10] Make DPDK tailqs fully local

2014-06-20 Thread Anatoly Burakov
This issue was reported by the OVS-DPDK project, and the fix should go to
upstream DPDK. This is not memnic-related; it has to do with
DPDK's rte_ivshmem library.

Every DPDK data structure has a corresponding TAILQ reserved for it in
the runtime config file. Those TAILQs are fully local to the process,
however most data structures contain pointers to the next entry in the
TAILQ.

Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.
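
A minimal sketch of the idea, using plain <sys/queue.h>, is shown below. All
names here (demo_tailq_entry, demo_ring, demo_lookup, demo_unlink) are
hypothetical and only illustrate the pattern; the actual patches use
rte_tailq_entry and also take RTE_EAL_TAILQ_RWLOCK around list operations,
which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Process-local list node: link pointers plus an opaque object pointer. */
struct demo_tailq_entry {
	TAILQ_ENTRY(demo_tailq_entry) next;
	void *data;
};

TAILQ_HEAD(demo_list, demo_tailq_entry);

/* Stand-in for a shared object; note that it carries no list pointers. */
struct demo_ring {
	char name[32];
};

/* Find the object with the given name, or return NULL if it is absent. */
static struct demo_ring *
demo_lookup(struct demo_list *list, const char *name)
{
	struct demo_tailq_entry *te;

	assert(list != NULL && name != NULL);
	TAILQ_FOREACH(te, list, next) {
		struct demo_ring *r = te->data;

		if (strncmp(name, r->name, sizeof(r->name)) == 0)
			return r;
	}
	return NULL;
}

/* Unlink the entry that refers to obj and return it so it can be freed. */
static struct demo_tailq_entry *
demo_unlink(struct demo_list *list, const void *obj)
{
	struct demo_tailq_entry *te;

	TAILQ_FOREACH(te, list, next) {
		if (te->data == obj) {
			TAILQ_REMOVE(list, te, next);
			return te;
		}
	}
	return NULL;
}
```

Because the link pointers live in the entry rather than in the object, a
shared object never exposes addresses that are valid in only one process.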

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.

As part of this patchset, rte_malloc is also fixed to properly support
multiprocess malloc and free. Previously, if the memory was malloc'd
and freed in different processes, this could lead to segmentation
faults due to different heap pointers in malloc elements themselves.
This is fixed by mapping the shared config at the same address in
both primary and secondary processes, so that the heap
pointers in malloc elements are always valid, whichever process is
doing malloc or free.

The mapping address for the shared config is also now set with the
base-virtaddr flag, mapping the config file just before the start
address for the hugepages.

v2 changes:
* fixed race conditions in *_free operations
* fixed multiprocess support for malloc heaps
* added similar changes for acl
* rebased on top of e88b42f818bc1a6d4ce6cb70371b66e37fa34f7d

v3 changes:
* fixed race reported by Konstantin Ananyev (introduced in v2)

v4 changes:
* rte_mem_config mapping address is now also set by --base-virtaddr

Anatoly Burakov (10):
  eal: map shared config into exact same address as primary process
  eal: use --base-virtaddr for mapping rte_config as well
  rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer
  rte_ring: make ring tailq fully local
  rte_hash: make rte_hash tailq fully local
  rte_fbk_hash: make rte_fbk_hash tailq fully local
  rte_mempool: make mempool tailq fully local
  rte_lpm: make lpm tailq fully local
  rte_lpm6: make lpm6 tailq fully local
  rte_acl: make acl tailq fully local

 app/test/test_tailq.c | 33 +-
 lib/librte_acl/acl.h  |  1 -
 lib/librte_acl/rte_acl.c  | 74 ++-
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_eal_memconfig.h |  5 ++
 lib/librte_eal/common/include/rte_tailq.h |  9 +--
 lib/librte_eal/linuxapp/eal/eal.c | 54 +++--
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 +-
 lib/librte_hash/rte_fbk_hash.c| 73 +-
 lib/librte_hash/rte_fbk_hash.h|  3 -
 lib/librte_hash/rte_hash.c| 61 ---
 lib/librte_hash/rte_hash.h|  2 -
 lib/librte_lpm/rte_lpm.c  | 65 
 lib/librte_lpm/rte_lpm.h  |  2 -
 lib/librte_lpm/rte_lpm6.c | 62 +++
 lib/librte_mempool/Makefile   |  3 +-
 lib/librte_mempool/rte_mempool.c  | 37 +---
 lib/librte_mempool/rte_mempool.h  |  2 -
 lib/librte_ring/Makefile  |  4 +-
 lib/librte_ring/rte_ring.c| 33 +++---
 lib/librte_ring/rte_ring.h|  2 -
 21 files changed, 424 insertions(+), 120 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH] dpdk_nic_bind: unbind ports that were erroneously bound

2014-06-18 Thread Anatoly Burakov
When binding devices to a generic driver (i.e. one that doesn't have a
PCI ID table), some devices that are not bound to any other driver could
be bound even if no one has asked them to. Hence, we check the list of
drivers again, and see if some of the previously-unbound devices were
erroneously bound. If such devices are found, they are unbound again.

Signed-off-by: Anatoly Burakov 
---
 tools/dpdk_nic_bind.py | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py
index 42e845f..334bf47 100755
--- a/tools/dpdk_nic_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -383,10 +383,32 @@ def unbind_all(dev_list, force=False):

 def bind_all(dev_list, driver, force=False):
 """Unbind method, takes a list of device locations"""
+global devices
+
 dev_list = map(dev_id_from_dev_name, dev_list)
+
 for d in dev_list:
 bind_one(d, driver, force)

+# when binding devices to a generic driver (i.e. one that doesn't have a
+# PCI ID table), some devices that are not bound to any other driver could
+# be bound even if no one has asked them to. hence, we check the list of
+# drivers again, and see if some of the previously-unbound devices were
+# erroneously bound.
+for d in devices.keys():
+# skip devices that were already bound or that we know should be bound
+if "Driver_str" in devices[d] or d in dev_list:
+continue
+
+# update information about this device
+devices[d] = dict(devices[d].items() +
+  get_pci_device_details(d).items())
+
+# check if updated information indicates that the device was bound
+if "Driver_str" in devices[d]:
+unbind_one(d, force)
+
+
 def display_devices(title, dev_list, extra_params = None):
'''Displays to the user the details of a list of devices given in "dev_list"
 The "extra_params" parameter, if given, should contain a string with
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 2/2] vfio: more verbose error messages

2014-06-18 Thread Anatoly Burakov
Also, make the VFIO code distinguish between actual unexpected values
and ioctl() failures, and provide appropriate error messages.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 48 --
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 9eb5dcd..bf765b5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -180,7 +180,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd)
ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
VFIO_TYPE1_IOMMU);
if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type!\n");
+   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
+   "error %i (%s)\n", errno, strerror(errno));
return -1;
}

@@ -201,7 +202,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd)
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, _map);

if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set up DMA remapping!\n");
+   RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+   "error %i (%s)\n", errno, 
strerror(errno));
return -1;
}
}
@@ -253,7 +255,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int 
vfio_dev_fd)

ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, );
if (ret < 0) {
-   RTE_LOG(ERR, EAL, "  cannot get IRQ info!\n");
+   RTE_LOG(ERR, EAL, "  cannot get IRQ info, "
+   "error %i (%s)\n", errno, 
strerror(errno));
return -1;
}

@@ -271,7 +274,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int 
vfio_dev_fd)
/* set up an eventfd for interrupts */
fd = eventfd(0, 0);
if (fd < 0) {
-   RTE_LOG(ERR, EAL, "  cannot set up eventfd!\n");
+   RTE_LOG(ERR, EAL, "  cannot set up eventfd, "
+   "error %i (%s)\n", errno, 
strerror(errno));
return -1;
}

@@ -313,22 +317,31 @@ pci_vfio_get_container_fd(void)
if (internal_config.process_type == RTE_PROC_PRIMARY) {
vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
if (vfio_container_fd < 0) {
-   RTE_LOG(ERR, EAL, "  cannot open VFIO container!\n");
+   RTE_LOG(ERR, EAL, "  cannot open VFIO container, "
+   "error %i (%s)\n", errno, 
strerror(errno));
return -1;
}

/* check VFIO API version */
ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
if (ret != VFIO_API_VERSION) {
-   RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+   if (ret < 0)
+   RTE_LOG(ERR, EAL, "  could not get VFIO API version, "
+   "error %i (%s)\n", errno, strerror(errno));
+   else
+   RTE_LOG(ERR, EAL, "  unsupported VFIO API version!\n");
close(vfio_container_fd);
return -1;
}

/* check if we support IOMMU type 1 */
ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, 
VFIO_TYPE1_IOMMU);
-   if (!ret) {
-   RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+   if (ret != 1) {
+   if (ret < 0)
+   RTE_LOG(ERR, EAL, "  could not get IOMMU type, "
+   "error %i (%s)\n", errno, strerror(errno));
+   else
+   RTE_LOG(ERR, EAL, "  unsupported IOMMU type!\n");
close(vfio_container_fd);
return -1;
}
@@ -564,7 +577,8 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
/* check if the group is viable */
ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, _status);
if (ret) {
-   RTE_LOG(ERR, EAL, "  %s cannot get group status!\n", pci_addr);
+   RTE_LOG(ERR, EAL, "  %s cannot get group status, "
+   "error %i (%s)\n", pc

[dpdk-dev] [PATCH v3 1/2] vfio: open VFIO container at startup rather than during init

2014-06-18 Thread Anatoly Burakov
Currently, VFIO only checks for being able to access the /dev/vfio
directory when initializing VFIO, deferring actual VFIO container
initialization to VFIO binding code. This doesn't bode well for when
VFIO container cannot be initialized for whatever reason, because
it results in unrecoverable error even if the user didn't set up
VFIO and didn't even want to use it in the first place.

This patch fixes this by moving container initialization into the
code that checks if VFIO is available at runtime. Therefore, any
issues with the container will be known at initialization stage and
VFIO will simply be turned off if container could not be set up.
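
The pattern can be sketched roughly as below. This is a simplified
illustration, not the patch itself: demo_cfg and demo_enable() are
hypothetical names, and the container path is a parameter here, whereas the
patch calls pci_vfio_get_container_fd(), which also checks the API version
and IOMMU type.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

struct demo_cfg {
	int enabled;       /* non-zero if the feature can be used */
	int container_fd;  /* -1 if the container could not be opened */
};

/*
 * Try to open the container once at startup. On failure the feature is
 * simply left disabled instead of failing later, at device binding time.
 */
static void
demo_enable(struct demo_cfg *cfg, const char *container_path)
{
	assert(cfg != NULL && container_path != NULL);
	cfg->container_fd = open(container_path, O_RDWR);
	cfg->enabled = (cfg->container_fd != -1);
}
```

Callers then only need to test the enabled flag before taking the VFIO path.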

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 4de6061..9eb5dcd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -523,17 +523,6 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
loc->domain, loc->bus, loc->devid, loc->function);

-   /* get container fd (needs to be done only once per initialization) */
-   if (vfio_cfg.vfio_container_fd == -1) {
-   int vfio_container_fd = pci_vfio_get_container_fd();
-   if (vfio_container_fd < 0) {
-   RTE_LOG(ERR, EAL, "  %s cannot open VFIO container!\n", 
pci_addr);
-   return -1;
-   }
-
-   vfio_cfg.vfio_container_fd = vfio_container_fd;
-   }
-
/* get group number */
iommu_group_no = pci_vfio_get_group_no(pci_addr);

@@ -770,10 +759,10 @@ pci_vfio_enable(void)
vfio_cfg.vfio_groups[i].fd = -1;
vfio_cfg.vfio_groups[i].group_no = -1;
}
-   vfio_cfg.vfio_container_fd = -1;
+   vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd();

/* check if we have VFIO driver enabled */
-   if (access(VFIO_DIR, F_OK) == 0)
+   if (vfio_cfg.vfio_container_fd != -1)
vfio_cfg.vfio_enabled = 1;
else
RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong 
permissions\n");
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 0/2] Fix issues with VFIO

2014-06-18 Thread Anatoly Burakov
This patchset fixes an issue with VFIO where DPDK initialization could
fail even if the user didn't want to use VFIO in the first place. Also,
more verbose and descriptive error messages were added to VFIO code, for
example distinguishing between a failed ioctl() call and an unsupported
VFIO API version.
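
The distinction between a failed call and an unsupported value can be
sketched as below. demo_check_version() is a hypothetical helper written for
illustration only; the patch logs through RTE_LOG directly rather than
formatting into a buffer, and err is assumed to be the errno value saved
right after the call.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Describe the outcome of a version query. ret is the return value of the
 * call and expected is the supported version.
 * Returns 0 on success and -1 otherwise; a message is written into buf.
 */
static int
demo_check_version(int ret, int expected, int err, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (ret == expected) {
		snprintf(buf, len, "ok");
		return 0;
	}
	if (ret < 0)
		/* the call itself failed, so errno is meaningful */
		snprintf(buf, len, "could not get API version, error %i (%s)",
				err, strerror(err));
	else
		/* the call succeeded but returned a value we do not support */
		snprintf(buf, len, "unsupported API version %i", ret);
	return -1;
}
```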

Anatoly Burakov (2):
  vfio: open VFIO container at startup rather than during init
  vfio: more verbose error messages

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 --
 1 file changed, 34 insertions(+), 29 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH 10/10] rte_ip_frag: API header file fix

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/rte_ip_frag.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
index 84952a1..e0936dc 100644
--- a/lib/librte_ip_frag/rte_ip_frag.h
+++ b/lib/librte_ip_frag/rte_ip_frag.h
@@ -36,9 +36,9 @@

 /**
  * @file
- * RTE IPv4 Fragmentation and Reassembly
+ * RTE IP Fragmentation and Reassembly
  *
- * Implementation of IPv4 packet fragmentation and reassembly.
+ * Implementation of IP packet fragmentation and reassembly.
  */

 #include 
-- 
1.8.1.4



[dpdk-dev] [PATCH 08/10] ip_fragmentation: small fixes

2014-06-18 Thread Anatoly Burakov
Adding check for non-existent ports in portmask.

Also, making everything NUMA-related depend on lcore sockets, not device
sockets. This is because the init_mem() function allocates all data
structures based on NUMA nodes of the lcores in the coremask. Therefore,
when no cores are on socket 0, but there are devices on socket 0, it may
lead to segmentation faults.
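
The portmask check can be sketched as below. ports_to_mask() and
portmask_has_missing_ports() are hypothetical helpers for illustration; the
patch itself uses RTE_LEN2MASK(nb_ports, unsigned). The sketch assumes a
32-bit portmask and handles nb_ports of 0 and 32 explicitly to avoid
out-of-range shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low nb_ports bits set; nb_ports is clamped to 32. */
static uint32_t
ports_to_mask(unsigned nb_ports)
{
	if (nb_ports >= 32)
		return UINT32_MAX;
	return ((uint32_t)1 << nb_ports) - 1;
}

/* Non-zero if the portmask selects a port index >= nb_ports. */
static int
portmask_has_missing_ports(uint32_t portmask, unsigned nb_ports)
{
	return (portmask & ~ports_to_mask(nb_ports)) != 0;
}
```

For example, with two ports detected, a portmask of 0x3 passes while 0x5 is
rejected because bit 2 refers to a port that does not exist.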

Signed-off-by: Anatoly Burakov 
---
 examples/ip_fragmentation/main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 02e40a1..3172ad5 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -886,6 +886,10 @@ MAIN(int argc, char **argv)
if (init_mem() < 0)
rte_panic("Cannot initialize memory structures!\n");

+   /* check if portmask has non-existent ports */
+   if (enabled_port_mask & ~(RTE_LEN2MASK(nb_ports, unsigned)))
+   rte_exit(EXIT_FAILURE, "Non-existent ports in portmask!\n");
+
/* initialize all ports */
for (portid = 0; portid < nb_ports; portid++) {
/* skip ports that are not enabled */
@@ -907,7 +911,7 @@ MAIN(int argc, char **argv)
qconf = _queue_conf[rx_lcore_id];
}

-   socket = rte_eth_dev_socket_id(portid);
+   socket = (int) rte_lcore_to_socket_id(rx_lcore_id);
if (socket == SOCKET_ID_ANY)
socket = 0;

-- 
1.8.1.4



[dpdk-dev] [PATCH 07/10] ip_frag: fix order of arguments to key compare function

2014-06-18 Thread Anatoly Burakov
The key compare function uses the key length of its first
argument to determine the length of the keys being compared.
However, currently we are passing a key from the fragmentation table as
the first argument. The problem with this is that this key is potentially
uninitialized (i.e. contains all zeroes, including key length). This
leads to a nasty bug of comparing only the key IDs and not the keys
themselves.

Of course, a safer way would be to do RTE_MAX between key lengths, but
since this compare is done per-packet, every cycle counts, so we just
use the key whose length is guaranteed to be correct because it comes
from an actual packet.
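
The effect of the argument order can be reproduced with a simplified sketch.
demo_key and demo_key_cmp() below are hypothetical stand-ins, assumed to
mirror the behaviour described above (only the first argument's key length
bounds the comparison); they are not the library definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KEY_WORDS 4

struct demo_key {
	uint64_t src_dst[DEMO_KEY_WORDS]; /* address words */
	uint32_t id;                      /* fragment id */
	uint32_t key_len;                 /* number of valid words in src_dst */
};

/* Returns 0 if the keys match; only k1->key_len words are compared. */
static int
demo_key_cmp(const struct demo_key *k1, const struct demo_key *k2)
{
	uint64_t diff = k1->id ^ k2->id;
	uint32_t i;

	assert(k1->key_len <= DEMO_KEY_WORDS);
	for (i = 0; i < k1->key_len; i++)
		diff |= k1->src_dst[i] ^ k2->src_dst[i];
	return diff != 0;
}
```

With an all-zero table slot and a packet key whose id happens to be 0,
demo_key_cmp(&slot, &pkt) reports a match because no address words are
compared, while demo_key_cmp(&pkt, &slot) correctly reports a mismatch.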

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/ip_frag_internal.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ip_frag/ip_frag_internal.c 
b/lib/librte_ip_frag/ip_frag_internal.c
index 6203740..a2c645b 100644
--- a/lib/librte_ip_frag/ip_frag_internal.c
+++ b/lib/librte_ip_frag/ip_frag_internal.c
@@ -346,7 +346,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
max_cycles = tbl->max_cycles;
assoc = tbl->bucket_entries;

-   if (tbl->last != NULL && ip_frag_key_cmp(&tbl->last->key, key) == 0)
+   if (tbl->last != NULL && ip_frag_key_cmp(key, &tbl->last->key) == 0)
return (tbl->last);

/* different hashing methods for IPv4 and IPv6 */
@@ -378,7 +378,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
p1, i, assoc,
IPv6_KEY_BYTES(p1[i].key.src_dst), p1[i].key.id, 
p1[i].start);

-   if (ip_frag_key_cmp(&p1[i].key, key) == 0)
+   if (ip_frag_key_cmp(key, &p1[i].key) == 0)
return (p1 + i);
else if (ip_frag_key_is_empty(&p1[i].key))
empty = (empty == NULL) ? (p1 + i) : empty;
@@ -404,7 +404,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
p2, i, assoc,
IPv6_KEY_BYTES(p2[i].key.src_dst), p2[i].key.id, 
p2[i].start);

-   if (ip_frag_key_cmp(&p2[i].key, key) == 0)
+   if (ip_frag_key_cmp(key, &p2[i].key) == 0)
return (p2 + i);
else if (ip_frag_key_is_empty(&p2[i].key))
empty = (empty == NULL) ?( p2 + i) : empty;
-- 
1.8.1.4



[dpdk-dev] [PATCH 03/10] ip_frag: renaming rte_ip_frag_pkt to ip_frag_pkt

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/ip_frag_common.h  | 18 +-
 lib/librte_ip_frag/ip_frag_internal.c| 20 ++--
 lib/librte_ip_frag/rte_ip_frag.h | 12 ++--
 lib/librte_ip_frag/rte_ipv4_reassembly.c |  4 ++--
 lib/librte_ip_frag/rte_ipv6_reassembly.c |  4 ++--
 5 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/lib/librte_ip_frag/ip_frag_common.h 
b/lib/librte_ip_frag/ip_frag_common.h
index 5ad0a0b..9df8074 100644
--- a/lib/librte_ip_frag/ip_frag_common.h
+++ b/lib/librte_ip_frag/ip_frag_common.h
@@ -63,21 +63,21 @@ if (!(exp)) {   
\
"%08" PRIx64 "%08" PRIx64 "%08" PRIx64 "%08" PRIx64

 /* internal functions declarations */
-struct rte_mbuf * ip_frag_process(struct rte_ip_frag_pkt *fp,
+struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp,
struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
uint16_t ofs, uint16_t len, uint16_t more_frags);

-struct rte_ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl,
+struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl,
struct rte_ip_frag_death_row *dr,
const struct ip_frag_key *key, uint64_t tms);

-struct rte_ip_frag_pkt * ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
+struct ip_frag_pkt * ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
const struct ip_frag_key *key, uint64_t tms,
-   struct rte_ip_frag_pkt **free, struct rte_ip_frag_pkt **stale);
+   struct ip_frag_pkt **free, struct ip_frag_pkt **stale);

 /* these functions need to be declared here as ip_frag_process relies on them */
-struct rte_mbuf * ipv4_frag_reassemble(const struct rte_ip_frag_pkt *fp);
-struct rte_mbuf * ipv6_frag_reassemble(const struct rte_ip_frag_pkt *fp);
+struct rte_mbuf * ipv4_frag_reassemble(const struct ip_frag_pkt *fp);
+struct rte_mbuf * ipv6_frag_reassemble(const struct ip_frag_pkt *fp);



@@ -122,7 +122,7 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const struct 
ip_frag_key * k2)

 /* put fragment on death row */
 static inline void
-ip_frag_free(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
+ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
 {
uint32_t i, k;

@@ -140,7 +140,7 @@ ip_frag_free(struct rte_ip_frag_pkt *fp, struct 
rte_ip_frag_death_row *dr)

 /* if key is empty, mark key as in use */
 static inline void
-ip_frag_inuse(struct rte_ip_frag_tbl *tbl, const struct  rte_ip_frag_pkt *fp)
+ip_frag_inuse(struct rte_ip_frag_tbl *tbl, const struct  ip_frag_pkt *fp)
 {
if (ip_frag_key_is_empty(&fp->key)) {
TAILQ_REMOVE(&tbl->lru, fp, lru);
@@ -150,7 +150,7 @@ ip_frag_inuse(struct rte_ip_frag_tbl *tbl, const struct  
rte_ip_frag_pkt *fp)

 /* reset the fragment */
 static inline void
-ip_frag_reset(struct rte_ip_frag_pkt *fp, uint64_t tms)
+ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms)
 {
static const struct ip_frag zero_frag = {
.ofs = 0,
diff --git a/lib/librte_ip_frag/ip_frag_internal.c 
b/lib/librte_ip_frag/ip_frag_internal.c
index cfcab1b..219221f 100644
--- a/lib/librte_ip_frag/ip_frag_internal.c
+++ b/lib/librte_ip_frag/ip_frag_internal.c
@@ -54,7 +54,7 @@
 /* local frag table helper functions */
 static inline void
 ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr,
-   struct rte_ip_frag_pkt *fp)
+   struct ip_frag_pkt *fp)
 {
ip_frag_free(fp, dr);
ip_frag_key_invalidate(&fp->key);
@@ -64,7 +64,7 @@ ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct 
rte_ip_frag_death_row *dr,
 }

 static inline void
-ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl,  struct rte_ip_frag_pkt *fp,
+ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl,  struct ip_frag_pkt *fp,
const struct ip_frag_key *key, uint64_t tms)
 {
fp->key = key[0];
@@ -76,7 +76,7 @@ ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl,  struct 
rte_ip_frag_pkt *fp,

 static inline void
 ip_frag_tbl_reuse(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
*dr,
-   struct rte_ip_frag_pkt *fp, uint64_t tms)
+   struct ip_frag_pkt *fp, uint64_t tms)
 {
ip_frag_free(fp, dr);
ip_frag_reset(fp, tms);
@@ -137,7 +137,7 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, 
uint32_t *v2)
 }

 struct rte_mbuf *
-ip_frag_process(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
+ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags)
 {
uint32_t idx;
@@ -268,11 +268,11 @@ ip_frag_process(struct rte_ip_frag_pkt *fp, struct 
rte_ip_frag_death_row *dr,
  * If such entry is not present, then allocate a new one.
  * If the entry is stale, then free and reuse it.
  */
-struct rte_ip_frag_pkt *
+struct ip_frag_pkt *
 ip_frag_find(struct rt

[dpdk-dev] [PATCH 02/10] ip_frag: fix debug macros

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_ip_frag/rte_ipv4_reassembly.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c 
b/lib/librte_ip_frag/rte_ipv4_reassembly.c
index cbac413..c14c677 100644
--- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
+++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
@@ -145,7 +145,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
"max_entries: %u, use_entries: %u\n\n",
__func__, __LINE__,
-   mb, tms, key.src_dst, key.id, ip_ofs, ip_len, ip_flag,
+   mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
tbl->use_entries);

@@ -161,7 +161,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
", total_size: %u, frag_size: %u, last_idx: %u\n\n",
__func__, __LINE__,
tbl, tbl->max_entries, tbl->use_entries,
-   fp, fp->key.src_dst, fp->key.id, fp->start,
+   fp, fp->key.src_dst[0], fp->key.id, fp->start,
fp->total_size, fp->frag_size, fp->last_idx);


@@ -176,7 +176,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
", total_size: %u, frag_size: %u, last_idx: %u\n\n",
__func__, __LINE__, mb,
tbl, tbl->max_entries, tbl->use_entries,
-   fp, fp->key.src_dst, fp->key.id, fp->start,
+   fp, fp->key.src_dst[0], fp->key.id, fp->start,
fp->total_size, fp->frag_size, fp->last_idx);

return (mb);
-- 
1.8.1.4
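
The hunks above pass key.src_dst[0] instead of key.src_dst to the debug
format string. A minimal sketch of why that matters, assuming the key stores
its addresses as an array of 64-bit words: an array argument decays to a
pointer, which does not match a PRIx64 conversion, while indexing it yields
the integer the conversion expects. The struct layout and the format_key()
helper are illustrative assumptions and are not taken from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical key layout, assumed for illustration only. */
struct frag_key {
	uint64_t src_dst[4]; /* packed source/destination addresses */
	uint32_t id;         /* fragment identifier */
};

/*
 * Format the first address word and the id into buf.
 * Passing key->src_dst (the whole array) here would hand a pointer to a
 * conversion that expects a uint64_t; key->src_dst[0] is the matching type.
 */
static int
format_key(char *buf, size_t len, const struct frag_key *key)
{
	return snprintf(buf, len, "key: <%" PRIx64 ", %#x>",
			key->src_dst[0], key->id);
}
```

Compilers that check printf-style formats (for example with -Wformat) would
normally flag the array form, which is one way such mistakes in rarely
enabled debug macros can be caught.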



[dpdk-dev] [PATCH v2 2/2] vfio: more verbose error messages

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
CC: Neil Horman 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 48 --
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 9eb5dcd..bf765b5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -180,7 +180,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd)
ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
VFIO_TYPE1_IOMMU);
if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type!\n");
+   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
+   "error %i (%s)\n", errno, strerror(errno));
return -1;
}

@@ -201,7 +202,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd)
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);

if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set up DMA remapping!\n");
+   RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
+   "error %i (%s)\n", errno, strerror(errno));
return -1;
}
}
@@ -253,7 +255,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)

ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, );
if (ret < 0) {
-   RTE_LOG(ERR, EAL, "  cannot get IRQ info!\n");
+   RTE_LOG(ERR, EAL, "  cannot get IRQ info, "
+   "error %i (%s)\n", errno, strerror(errno));
return -1;
}

@@ -271,7 +274,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
/* set up an eventfd for interrupts */
fd = eventfd(0, 0);
if (fd < 0) {
-   RTE_LOG(ERR, EAL, "  cannot set up eventfd!\n");
+   RTE_LOG(ERR, EAL, "  cannot set up eventfd, "
+   "error %i (%s)\n", errno, strerror(errno));
return -1;
}

@@ -313,22 +317,31 @@ pci_vfio_get_container_fd(void)
if (internal_config.process_type == RTE_PROC_PRIMARY) {
vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
if (vfio_container_fd < 0) {
-   RTE_LOG(ERR, EAL, "  cannot open VFIO container!\n");
+   RTE_LOG(ERR, EAL, "  cannot open VFIO container, "
+   "error %i (%s)\n", errno, strerror(errno));
return -1;
}

/* check VFIO API version */
ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
if (ret != VFIO_API_VERSION) {
-   RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+   if (ret < 0)
+   RTE_LOG(ERR, EAL, "  could not get VFIO API version, "
+   "error %i (%s)\n", errno, strerror(errno));
+   else
+   RTE_LOG(ERR, EAL, "  unsupported VFIO API version!\n");
close(vfio_container_fd);
return -1;
}

/* check if we support IOMMU type 1 */
ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
-   if (!ret) {
-   RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+   if (ret != 1) {
+   if (ret < 0)
+   RTE_LOG(ERR, EAL, "  could not get IOMMU type, "
+   "error %i (%s)\n", errno, strerror(errno));
+   else
+   RTE_LOG(ERR, EAL, "  unsupported IOMMU type!\n");
close(vfio_container_fd);
return -1;
}
@@ -564,7 +577,8 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
/* check if the group is viable */
ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
if (ret) {
-   RTE_LOG(ERR, EAL, "  %s cannot get group status!\n", pci_addr);
+   RTE_LOG(ERR, EAL, "  %s cannot get group status, "
+   "error %i (%s)\n", pci_addr, errno, strerror(errno));
close(vfio_group_fd);
clear_current_group();
return

[dpdk-dev] [PATCH v2 0/2] Fix issues with VFIO

2014-06-18 Thread Anatoly Burakov
This patchset fixes an issue with VFIO where DPDK initialization could
fail even if the user didn't want to use VFIO in the first place. Also,
more verbose and descriptive error messages were added to VFIO code, for
example distinguishing between a failed ioctl() call and an unsupported
VFIO API version.
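
As a rough sketch of the distinction described above, the helper below
separates "the call itself failed" (negative return, errno is meaningful)
from "the call succeeded but reported a value we do not support". The enum,
the function names, and the parameters are hypothetical and exist only for
illustration; the message wording mirrors the patch, and a captured errno is
passed in so that no real ioctl() is needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes, not part of the patchset. */
enum check_result {
	CHECK_OK = 0,       /* call returned the expected value */
	CHECK_CALL_FAILED,  /* call failed, errno describes why */
	CHECK_UNSUPPORTED   /* call worked, but the value is not supported */
};

/* Classify the return value of a version/extension query. */
static enum check_result
classify_ret(int ret, int expected)
{
	if (ret == expected)
		return CHECK_OK;
	if (ret < 0)
		return CHECK_CALL_FAILED;
	return CHECK_UNSUPPORTED;
}

/* Build the log message for a failed check; empty string on success. */
static int
format_check_error(char *buf, size_t len, const char *what,
		int ret, int expected, int saved_errno)
{
	switch (classify_ret(ret, expected)) {
	case CHECK_CALL_FAILED:
		return snprintf(buf, len, "  could not get %s, error %i (%s)\n",
				what, saved_errno, strerror(saved_errno));
	case CHECK_UNSUPPORTED:
		return snprintf(buf, len, "  unsupported %s!\n", what);
	default:
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
}
```

Capturing errno right after the failing call and passing it along avoids
reporting a value clobbered by an intermediate library call.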

Anatoly Burakov (2):
  vfio: open VFIO container at startup rather than during init
  vfio: more verbose error messages

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 --
 1 file changed, 34 insertions(+), 29 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH v3 9/9] rte_acl: make acl tailq fully local

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_acl/acl.h |  1 -
 lib/librte_acl/rte_acl.c | 74 +++-
 2 files changed, 60 insertions(+), 15 deletions(-)

diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h
index e6d7985..b9d63fd 100644
--- a/lib/librte_acl/acl.h
+++ b/lib/librte_acl/acl.h
@@ -149,7 +149,6 @@ struct rte_acl_bld_trie {
 };

 struct rte_acl_ctx {
-   TAILQ_ENTRY(rte_acl_ctx) next;/**< Next in list. */
char name[RTE_ACL_NAMESIZE];
/** Name of the ACL context. */
int32_t socket_id;
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 129a41f..3b47ab6 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -36,13 +36,14 @@

 #define BIT_SIZEOF(x)   (sizeof(x) * CHAR_BIT)

-TAILQ_HEAD(rte_acl_list, rte_acl_ctx);
+TAILQ_HEAD(rte_acl_list, rte_tailq_entry);

 struct rte_acl_ctx *
 rte_acl_find_existing(const char *name)
 {
-   struct rte_acl_ctx *ctx;
+   struct rte_acl_ctx *ctx = NULL;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;

/* check that we have an initialised tail queue */
acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
@@ -52,27 +53,55 @@ rte_acl_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(name, ctx->name, sizeof(ctx->name)) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (ctx == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return ctx;
 }

 void
 rte_acl_free(struct rte_acl_ctx *ctx)
 {
+   struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
+
if (ctx == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_ACL, rte_acl_list, ctx);
+   /* check that we have an initialised tail queue */
+   acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
+   if (acl_list == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, acl_list, next) {
+   if (te->data == (void *) ctx)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(acl_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

rte_free(ctx->mem);
rte_free(ctx);
+   rte_free(te);
 }

 struct rte_acl_ctx *
@@ -81,6 +110,7 @@ rte_acl_create(const struct rte_acl_param *param)
size_t sz;
struct rte_acl_ctx *ctx;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
char name[sizeof(ctx->name)];

/* check that we have an initialised tail queue */
@@ -105,15 +135,31 @@ rte_acl_create(const struct rte_acl_param *param)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* if we already have one with that name */
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(param->name, ctx->name, sizeof(ctx->name)) == 0)
break;
}

/* if ACL with such name doesn't exist, then create a new one. */
-   if (ctx == NULL && (ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE,
-   param->socket_id)) != NULL) {
+   if (te == NULL) {
+   ctx = NULL;
+   te = rte_zmalloc("ACL_TAILQ_ENTRY", sizeof(*te), 0);
+
+   if (te == NULL) {
+   RTE_LOG(ERR, ACL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
+   ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE, param->socket_id);

+   if (ctx == NULL) {
+   RTE_LOG(ERR, ACL,
+   "allocation of %zu bytes on socket %d for %s failed\n",
+   sz, param->socket_id, name);
+   rte_free(te);
+   goto exit;
+   }
/* init new allocated context. */
ctx->rules = ctx + 1;
ctx->max_rules = param->max_rule_num;
@@ -121,14 +167,12 @@ rte_acl_create(const struct rte_acl_param *param)
ctx->socket_id = param->socket_id;
rte_snprintf(ctx->name, sizeof(ctx->name), "%s", param->name);

- 

[dpdk-dev] [PATCH v3 8/9] rte_lpm6: make lpm6 tailq fully local

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm6.c | 62 ++-
 1 file changed, 51 insertions(+), 11 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 56c74a1..73b48d0 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -77,7 +77,7 @@ enum valid_flag {
VALID
 };

-TAILQ_HEAD(rte_lpm6_list, rte_lpm6);
+TAILQ_HEAD(rte_lpm6_list, rte_tailq_entry);

 /** Tbl entry structure. It is the same for both tbl24 and tbl8 */
 struct rte_lpm6_tbl_entry {
@@ -99,8 +99,6 @@ struct rte_lpm6_rule {

 /** LPM6 structure. */
 struct rte_lpm6 {
-   TAILQ_ENTRY(rte_lpm6) next;  /**< Next in list. */
-
/* LPM metadata. */
char name[RTE_LPM6_NAMESIZE];/**< Name of the lpm. */
uint32_t max_rules;  /**< Max number of rules. */
@@ -149,6 +147,7 @@ rte_lpm6_create(const char *name, int socket_id,
 {
char mem_name[RTE_LPM6_NAMESIZE];
struct rte_lpm6 *lpm = NULL;
+   struct rte_tailq_entry *te;
uint64_t mem_size, rules_size;
struct rte_lpm6_list *lpm_list;

@@ -179,12 +178,20 @@ rte_lpm6_create(const char *name, int socket_id,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* Guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm6 *) te->data;
if (strncmp(name, lpm->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
+   goto exit;
+
+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM6_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry!\n");
goto exit;
+   }

/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm6 *)rte_zmalloc_socket(mem_name, (size_t)mem_size,
@@ -192,6 +199,7 @@ rte_lpm6_create(const char *name, int socket_id,

if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -201,6 +209,7 @@ rte_lpm6_create(const char *name, int socket_id,
if (lpm->rules_tbl == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
rte_free(lpm);
+   rte_free(te);
goto exit;
}

@@ -209,7 +218,9 @@ rte_lpm6_create(const char *name, int socket_id,
lpm->number_tbl8s = config->number_tbl8s;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -223,7 +234,8 @@ exit:
 struct rte_lpm6 *
 rte_lpm6_find_existing(const char *name)
 {
-   struct rte_lpm6 *l;
+   struct rte_lpm6 *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm6_list *lpm_list;

/* Check that we have an initialised tail queue */
@@ -234,14 +246,17 @@ rte_lpm6_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm6 *) te->data;
if (strncmp(name, l->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -252,13 +267,38 @@ rte_lpm6_find_existing(const char *name)
 void
 rte_lpm6_free(struct rte_lpm6 *lpm)
 {
+   struct rte_lpm6_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM6, rte_lpm6_list, lpm);
-   rte_free(lpm->rules_tbl);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM6, rte_lpm6_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
-- 
1.8.1.4
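
The pattern this series applies to each library can be sketched outside of
DPDK as follows: the object no longer embeds list linkage, and a small
separately allocated entry carries the linkage plus a data pointer. This is
a sketch under stated assumptions. struct entry stands in for
rte_tailq_entry, struct obj and the obj_* helpers are hypothetical,
calloc()/free() replace rte_zmalloc()/rte_free(), and the rwlock handling is
left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Stand-in for rte_tailq_entry: list linkage plus an opaque pointer. */
struct entry {
	TAILQ_ENTRY(entry) next;
	void *data;
};
TAILQ_HEAD(entry_list, entry);

/* Hypothetical object type; it carries no list linkage of its own. */
struct obj {
	char name[32];
};

static struct obj *
obj_find(struct entry_list *list, const char *name)
{
	struct entry *te;
	struct obj *o = NULL;

	TAILQ_FOREACH(te, list, next) {
		o = te->data;
		if (strncmp(name, o->name, sizeof(o->name)) == 0)
			break;
	}
	/* After a full walk o still points at the last object, so the
	 * "not found" test has to look at te, not at o. */
	if (te == NULL)
		return NULL;
	return o;
}

static struct obj *
obj_create(struct entry_list *list, const char *name)
{
	struct entry *te;
	struct obj *o;

	if (obj_find(list, name) != NULL)
		return NULL; /* name already taken */

	te = calloc(1, sizeof(*te));
	if (te == NULL)
		return NULL;
	o = calloc(1, sizeof(*o));
	if (o == NULL) {
		free(te); /* do not leak the entry on failure */
		return NULL;
	}
	snprintf(o->name, sizeof(o->name), "%s", name);
	te->data = o;
	TAILQ_INSERT_TAIL(list, te, next);
	return o;
}

static int
obj_free(struct entry_list *list, struct obj *o)
{
	struct entry *te;

	if (o == NULL)
		return -1;
	/* find the entry that points at this object */
	TAILQ_FOREACH(te, list, next) {
		if (te->data == (void *)o)
			break;
	}
	if (te == NULL)
		return -1;
	TAILQ_REMOVE(list, te, next);
	free(o);
	free(te);
	return 0;
}
```

The check on te rather than on the object pointer is the reason the
find_existing functions in these patches switch from "if (l == NULL)" to
"if (te == NULL)" and return NULL explicitly.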



[dpdk-dev] [PATCH v3 7/9] rte_lpm: make lpm tailq fully local

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm.c | 65 
 lib/librte_lpm/rte_lpm.h |  2 --
 2 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 592750e..6a49d43 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -56,7 +56,7 @@

 #include "rte_lpm.h"

-TAILQ_HEAD(rte_lpm_list, rte_lpm);
+TAILQ_HEAD(rte_lpm_list, rte_tailq_entry);

 #define MAX_DEPTH_TBL24 24

@@ -118,24 +118,29 @@ depth_to_range(uint8_t depth)
 struct rte_lpm *
 rte_lpm_find_existing(const char *name)
 {
-   struct rte_lpm *l;
+   struct rte_lpm *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm *) te->data;
if (strncmp(name, l->name, RTE_LPM_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -149,12 +154,13 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
 {
char mem_name[RTE_LPM_NAMESIZE];
struct rte_lpm *lpm = NULL;
+   struct rte_tailq_entry *te;
uint32_t mem_size;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -176,18 +182,27 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm *) te->data;
if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
goto exit;

+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm *)rte_zmalloc_socket(mem_name, mem_size,
CACHE_LINE_SIZE, socket_id);
if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -195,7 +210,9 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
lpm->max_rules = max_rules;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -209,12 +226,38 @@ exit:
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+   struct rte_lpm_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM, rte_lpm_list, lpm);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index d35565d..308f5ef 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -132,8 +

[dpdk-dev] [PATCH v3 6/9] rte_mempool: make mempool tailq fully local

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_mempool/Makefile  |  3 ++-
 lib/librte_mempool/rte_mempool.c | 37 -
 lib/librte_mempool/rte_mempool.h |  2 --
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index c79b306..9939e10 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -44,7 +44,8 @@ endif
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

-# this lib needs eal
+# this lib needs eal, rte_ring and rte_malloc
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_eal lib/librte_ring
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_malloc

 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7eebf7f..736e854 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -60,7 +61,7 @@

 #include "rte_mempool.h"

-TAILQ_HEAD(rte_mempool_list, rte_mempool);
+TAILQ_HEAD(rte_mempool_list, rte_tailq_entry);

 #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5

@@ -404,6 +405,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
char mz_name[RTE_MEMZONE_NAMESIZE];
char rg_name[RTE_RING_NAMESIZE];
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_ring *r;
const struct rte_memzone *mz;
size_t mempool_size;
@@ -501,6 +503,13 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
}
}

+   /* try to allocate tailq entry */
+   te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, MEMPOOL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
/*
 * If user provided an external memory buffer, then use it to
 * store mempool objects. Otherwise reserve memzone big enough to
@@ -527,8 +536,10 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 * no more memory: in this case we loose previously reserved
 * space for the as we cannot free it
 */
-   if (mz == NULL)
+   if (mz == NULL) {
+   rte_free(te);
goto exit;
+   }

if (rte_eal_has_hugepages()) {
startaddr = (void*)mz->addr;
@@ -587,7 +598,9 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,

mempool_populate(mp, n, 1, obj_init, obj_init_arg);

-   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, mp);
+   te->data = (void *) mp;
+
+   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, te);

 exit:
rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
@@ -812,6 +825,7 @@ void
 rte_mempool_list_dump(FILE *f)
 {
const struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -822,7 +836,8 @@ rte_mempool_list_dump(FILE *f)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
rte_mempool_dump(f, mp);
}

@@ -834,6 +849,7 @@ struct rte_mempool *
 rte_mempool_lookup(const char *name)
 {
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -844,15 +860,18 @@ rte_mempool_lookup(const char *name)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
if (strncmp(name, mp->name, RTE_MEMPOOL_NAMESIZE) == 0)
break;
}

rte_rwlock_read_unlock(RTE_EAL_MEMPOOL_RWLOCK);

-   if (mp == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return mp;
 }
@@ -860,7 +879,7 @@ rte_mempool_lookup(const char *name)
 void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),
  void *arg)
 {
-   struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te = NULL;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -871,8 +890,8 @@ void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
-   (*func)(mp, arg);
+   TAILQ_FOREACH(te, mempool_list, next) {
+   (*func)((struct rte_mempool *) te->data,

[dpdk-dev] [PATCH v3 5/9] rte_fbk_hash: make rte_fbk_hash tailq fully local

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_hash/rte_fbk_hash.c | 73 ++
 lib/librte_hash/rte_fbk_hash.h |  3 --
 2 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/lib/librte_hash/rte_fbk_hash.c b/lib/librte_hash/rte_fbk_hash.c
index 4d67554..1356cf4 100644
--- a/lib/librte_hash/rte_fbk_hash.c
+++ b/lib/librte_hash/rte_fbk_hash.c
@@ -54,7 +54,7 @@

 #include "rte_fbk_hash.h"

-TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
+TAILQ_HEAD(rte_fbk_hash_list, rte_tailq_entry);

 /**
  * Performs a lookup for an existing hash table, and returns a pointer to
@@ -69,24 +69,29 @@ TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
 struct rte_fbk_hash_table *
 rte_fbk_hash_find_existing(const char *name)
 {
-   struct rte_fbk_hash_table *h;
+   struct rte_fbk_hash_table *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_fbk_hash_list *fbk_hash_list;

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   h = (struct rte_fbk_hash_table *) te->data;
if (strncmp(name, h->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -104,6 +109,7 @@ struct rte_fbk_hash_table *
 rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
 {
struct rte_fbk_hash_table *ht = NULL;
+   struct rte_tailq_entry *te;
char hash_name[RTE_FBK_HASH_NAMESIZE];
const uint32_t mem_size =
sizeof(*ht) + (sizeof(ht->t[0]) * params->entries);
@@ -112,7 +118,8 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -134,20 +141,28 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(ht, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   ht = (struct rte_fbk_hash_table *) te->data;
if (strncmp(params->name, ht->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
-   if (ht != NULL)
+   if (te != NULL)
goto exit;

+   te = rte_zmalloc("FBK_HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory for table. */
-   ht = (struct rte_fbk_hash_table *)rte_malloc_socket(hash_name, mem_size,
+   ht = (struct rte_fbk_hash_table *)rte_zmalloc_socket(hash_name, mem_size,
0, params->socket_id);
-   if (ht == NULL)
+   if (ht == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate fbk hash table\n");
+   rte_free(te);
goto exit;
-
-   memset(ht, 0, mem_size);
+   }

/* Set up hash table context. */
rte_snprintf(ht->name, sizeof(ht->name), "%s", params->name);
@@ -169,7 +184,9 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
ht->init_val = RTE_FBK_HASH_INIT_VAL_DEFAULT;
}

-   TAILQ_INSERT_TAIL(fbk_hash_list, ht, next);
+   te->data = (void *) ht;
+
+   TAILQ_INSERT_TAIL(fbk_hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +203,38 @@ exit:
 void
 rte_fbk_hash_free(struct rte_fbk_hash_table *ht)
 {
+   struct rte_tailq_entry *te;
+   struct rte_fbk_hash_list *fbk_hash_list;
+
if (ht == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list, ht);
+   /* check that we have an initialised tail queue */
+   if ((fbk_hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_

[dpdk-dev] [PATCH v3 4/9] rte_hash: make rte_hash tailq fully local

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_hash/rte_hash.c | 61 +++---
 lib/librte_hash/rte_hash.h |  2 --
 2 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c
index d4221a8..eea5c01 100644
--- a/lib/librte_hash/rte_hash.c
+++ b/lib/librte_hash/rte_hash.c
@@ -60,7 +60,7 @@
 #include "rte_hash.h"


-TAILQ_HEAD(rte_hash_list, rte_hash);
+TAILQ_HEAD(rte_hash_list, rte_tailq_entry);

 /* Macro to enable/disable run-time checking of function parameters */
 #if defined(RTE_LIBRTE_HASH_DEBUG)
@@ -141,24 +141,29 @@ find_first(uint32_t sig, const uint32_t *sig_bucket, uint32_t num_sigs)
 struct rte_hash *
 rte_hash_find_existing(const char *name)
 {
-   struct rte_hash *h;
+   struct rte_hash *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_hash_list *hash_list;

/* check that we have an initialised tail queue */
-   if ((hash_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) {
+   if ((hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, hash_list, next) {
+   TAILQ_FOREACH(te, hash_list, next) {
+   h = (struct rte_hash *) te->data;
if (strncmp(name, h->name, RTE_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -166,6 +171,7 @@ struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
struct rte_hash *h = NULL;
+   struct rte_tailq_entry *te;
uint32_t num_buckets, sig_bucket_size, key_size,
hash_tbl_size, sig_tbl_size, key_tbl_size, mem_size;
char hash_name[RTE_HASH_NAMESIZE];
@@ -212,17 +218,25 @@ rte_hash_create(const struct rte_hash_parameters *params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(h, hash_list, next) {
+   TAILQ_FOREACH(te, hash_list, next) {
+   h = (struct rte_hash *) te->data;
if (strncmp(params->name, h->name, RTE_HASH_NAMESIZE) == 0)
break;
}
-   if (h != NULL)
+   if (te != NULL)
+   goto exit;
+
+   te = rte_zmalloc("HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "tailq entry allocation failed\n");
goto exit;
+   }

h = (struct rte_hash *)rte_zmalloc_socket(hash_name, mem_size,
   CACHE_LINE_SIZE, params->socket_id);
if (h == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -242,7 +256,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->hash_func = (params->hash_func == NULL) ?
DEFAULT_HASH_FUNC : params->hash_func;

-   TAILQ_INSERT_TAIL(hash_list, h, next);
+   te->data = (void *) h;
+
+   TAILQ_INSERT_TAIL(hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -253,11 +269,38 @@ exit:
 void
 rte_hash_free(struct rte_hash *h)
 {
+   struct rte_tailq_entry *te;
+   struct rte_hash_list *hash_list;
+
if (h == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_HASH, rte_hash_list, h);
+   /* check that we have an initialised tail queue */
+   if ((hash_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, hash_list, next) {
+   if (te->data == (void *) h)
+   break;
+   }
+
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(hash_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(h);
+   rte_free(te);
 }

 static inline int32_t
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 5228e3a..2ecaf1a 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -86,8 +86,6 @@ struct rte_hash_parameters {

 /** A hash table structure. */
 struct rte_hash {
-   TAILQ_ENTRY(rte_hash) next;/**< Next in list. */
-
char name[RTE_HASH_NAMESIZE];   /**< Name of the hash. */
  

[dpdk-dev] [PATCH v3 3/9] rte_ring: make ring tailq fully local

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 ++--
 lib/librte_ring/Makefile  |  4 ++--
 lib/librte_ring/rte_ring.c| 33 +++
 lib/librte_ring/rte_ring.h|  2 --
 4 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
index 4ad76a7..fa5f4e3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -101,7 +102,7 @@ static int memseg_idx;
 static int pagesz;

 /* Tailq heads to add rings to */
-TAILQ_HEAD(rte_ring_list, rte_ring);
+TAILQ_HEAD(rte_ring_list, rte_tailq_entry);

 /*
  * Utility functions
@@ -754,6 +755,7 @@ rte_eal_ivshmem_obj_init(void)
struct ivshmem_segment * seg;
struct rte_memzone * mz;
struct rte_ring * r;
+   struct rte_tailq_entry *te;
unsigned i, ms, idx;
uint64_t offset;

@@ -808,6 +810,8 @@ rte_eal_ivshmem_obj_init(void)
mcfg->memzone_idx++;
}

+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
/* find rings */
for (i = 0; i < mcfg->memzone_idx; i++) {
mz = &mcfg->memzone[i];
@@ -819,10 +823,19 @@ rte_eal_ivshmem_obj_init(void)

r = (struct rte_ring*) (mz->addr_64);

-   TAILQ_INSERT_TAIL(ring_list, r, next);
+   te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate ring tailq entry!\n");
+   return -1;
+   }
+
+   te->data = (void *) r;
+
+   TAILQ_INSERT_TAIL(ring_list, te, next);

RTE_LOG(DEBUG, EAL, "Found ring: '%s' at %p\n", r->name, mz->addr);
}
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

 #ifdef RTE_LIBRTE_IVSHMEM_DEBUG
rte_memzone_dump(stdout);
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 550507d..2380a43 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -42,7 +42,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h

-# this lib needs eal
-DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal
+# this lib needs eal and rte_malloc
+DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal lib/librte_malloc

 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 2fe4024..d2ff3fe 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -75,6 +75,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -89,7 +90,7 @@

 #include "rte_ring.h"

-TAILQ_HEAD(rte_ring_list, rte_ring);
+TAILQ_HEAD(rte_ring_list, rte_tailq_entry);

 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
@@ -155,6 +156,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 {
char mz_name[RTE_MEMZONE_NAMESIZE];
struct rte_ring *r;
+   struct rte_tailq_entry *te;
const struct rte_memzone *mz;
ssize_t ring_size;
int mz_flags = 0;
@@ -173,6 +175,13 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
return NULL;
}

+   te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
rte_snprintf(mz_name, sizeof(mz_name), "%s%s", RTE_RING_MZ_PREFIX, name);

rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +195,14 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
/* no need to check return value here, we already checked the
 * arguments above */
rte_ring_init(r, name, count, flags);
-   TAILQ_INSERT_TAIL(ring_list, r, next);
+
+   te->data = (void *) r;
+
+   TAILQ_INSERT_TAIL(ring_list, te, next);
} else {
r = NULL;
RTE_LOG(ERR, RING, "Cannot reserve memory\n");
+   rte_free(te);
}
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

@@ -272,7 +285,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 void
 rte_ring_list_dump(FILE *f)
 {
-   const struct rte_ring *mp;
+   const struct rte_tailq_entry *te;
struct rte_ring_list *ring_list;

/* check that we have an initialised tail queue */
@@ -284,8 +297,8 @@ rte_ring_list_dump(FILE *f)

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);

-   TAILQ

[dpdk-dev] [PATCH v3 2/9] rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer

2014-06-18 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 app/test/test_tailq.c | 33 ---
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_tailq.h |  9 +
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/app/test/test_tailq.c b/app/test/test_tailq.c
index 67da009..c9b53ee 100644
--- a/app/test/test_tailq.c
+++ b/app/test/test_tailq.c
@@ -52,16 +52,16 @@

 #define DEFAULT_TAILQ (RTE_TAILQ_NUM)

-static struct rte_dummy d_elem;
+static struct rte_tailq_entry d_elem;

 static int
 test_tailq_create(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;
unsigned i;

/* create a first tailq and check its non-null */
-   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error allocating dummy_q0\n");

@@ -70,13 +70,14 @@ test_tailq_create(void)
TAILQ_INSERT_TAIL(d_head, &d_elem, next);

/* try allocating dummy_q0 again, and check for failure */
-   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head) == NULL)
+   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head) == NULL)
do_return("Error, non-null result returned when attemption to "
"re-allocate a tailq\n");

/* now fill up the tailq slots available and check we get an error */
for (i = RTE_TAILQ_NUM; i < RTE_MAX_TAILQ; i++){
-   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i, rte_dummy_head)) == NULL)
+   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i,
+   rte_tailq_entry_head)) == NULL)
break;
}

@@ -91,10 +92,10 @@ static int
 test_tailq_lookup(void)
 {
/* run successful  test - check result is found */
-   struct rte_dummy_head *d_head;
-   struct rte_dummy *d_ptr;
+   struct rte_tailq_entry_head *d_head;
+   struct rte_tailq_entry *d_ptr;

-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error with tailq lookup\n");

@@ -104,7 +105,7 @@ test_tailq_lookup(void)
"expected element not found\n");

/* now try a bad/error lookup */
-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_tailq_entry_head);
if (d_head != NULL)
do_return("Error, lookup does not return NULL for bad tailq name\n");

@@ -115,7 +116,7 @@ test_tailq_lookup(void)
 static int
 test_tailq_deprecated(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;

/* since TAILQ_RESERVE is not able to create new tailqs,
 * we should find an existing one (IOW, RTE_TAILQ_RESERVE behaves identical
@@ -123,29 +124,29 @@ test_tailq_deprecated(void)
 *
 * PCI_RESOURCE_LIST tailq is guaranteed to
 * be present in any DPDK app. */
-   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

-   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

/* try doing that with non-existent names */
-   d_head = RTE_TAILQ_RESERVE("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

/* try doing the same with NULL names */
-   d_head = RTE_TAILQ_RESERVE(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

diff --git a/lib/librte_eal/common/eal_common_tailqs.c 
b/lib/lib

[dpdk-dev] [PATCH v3 1/9] eal: map shared config into exact same address as primary process

2014-06-18 Thread Anatoly Burakov
Shared config is shared across primary and secondary processes.
However, when using rte_malloc, the malloc elements keep references to
the heap inside themselves. This heap reference might not be referencing
a local heap because the heap reference points to the heap of whatever
process has allocated that malloc element. Therefore, there can be
situations when malloc elements in a given heap actually reference
different addresses for the same heap - depending on which process has
allocated the element. This can lead to segmentation faults when dealing
with malloc elements allocated on the same heap by different processes.

To fix this problem, heaps will now have the same addresses across
processes. In order to achieve that, a new field in a shared mem_config
(a structure that holds the heaps, and which is shared across processes)
was added to keep the address of where this config is mapped in the
primary process.

The secondary process will now map the config in two stages. First, it
maps the config at an arbitrary address and reads the address at which
the primary process has mapped the shared config. Then the config is
unmapped and re-mapped at the address previously read.
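
The two-stage attach can be sketched outside of DPDK roughly as follows.
This is a minimal illustration using plain POSIX mmap(), not the EAL code
itself: struct shared_cfg, primary_create() and secondary_attach() are
hypothetical names chosen for the example. The address passed to mmap()
is only a hint, so the sketch treats a mapping placed anywhere else as a
failed attach.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the shared config structure. */
struct shared_cfg {
	uint64_t cfg_addr;	/* address the primary mapped this struct at */
	int payload;		/* some shared state */
};

/* Primary side: map the file and record the mapping address inside it. */
static struct shared_cfg *
primary_create(int fd)
{
	struct shared_cfg *cfg;

	if (ftruncate(fd, sizeof(*cfg)) < 0)
		return NULL;

	cfg = mmap(NULL, sizeof(*cfg), PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;

	cfg->cfg_addr = (uintptr_t) cfg;
	return cfg;
}

/* Secondary side: read the recorded address, then remap at that address. */
static struct shared_cfg *
secondary_attach(int fd)
{
	struct shared_cfg *cfg;
	void *want;

	/* stage 1: map read-only anywhere, just to read the address */
	cfg = mmap(NULL, sizeof(*cfg), PROT_READ, MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;

	want = (void *) (uintptr_t) cfg->cfg_addr;
	munmap(cfg, sizeof(*cfg));

	/* stage 2: remap read-write, asking for the primary's address */
	cfg = mmap(want, sizeof(*cfg), PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;

	/* the address is only a hint; reject any other placement */
	if (cfg != want) {
		munmap(cfg, sizeof(*cfg));
		return NULL;
	}
	return cfg;
}
```

Rejecting a mismatched address rather than continuing matters here,
because pointers stored inside the shared structure are only meaningful
when it sits at the same address in every process.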

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/include/rte_eal_memconfig.h |  5 +++
 lib/librte_eal/linuxapp/eal/eal.c | 44 ---
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 30ce6fc..d6359e5 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -89,6 +89,11 @@ struct rte_mem_config {

/* Heaps of Malloc per socket */
struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+
+   /* address of mem_config in primary process. used to map shared config into
+* exact same address the primary process maps it.
+*/
+   uint64_t mem_cfg_addr;
 } __attribute__((__packed__));


diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 6994303..fee375c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -239,13 +239,19 @@ rte_eal_config_create(void)
}
memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+
+   /* store address of the config in the config itself so that secondary
+* processes could later map the config into this exact location */
+   rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+
 }

 /* attach to an existing shared memory config */
 static void
 rte_eal_config_attach(void)
 {
-   void *rte_mem_cfg_addr;
+   struct rte_mem_config *mem_config;
+
const char *pathname = eal_runtime_config_path();

if (internal_config.no_shconf)
@@ -257,13 +263,40 @@ rte_eal_config_attach(void)
rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
}

-   rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
-   PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
+   /* map it as read-only first */
+   mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
+   PROT_READ, MAP_SHARED, mem_cfg_fd, 0);
+   if (mem_config == MAP_FAILED)
+   rte_panic("Cannot mmap memory for rte_config\n");
+
+   rte_config.mem_config = mem_config;
+}
+
+/* reattach the shared config at exact memory location primary process has it */
+static void
+rte_eal_config_reattach(void)
+{
+   struct rte_mem_config *mem_config;
+   void *rte_mem_cfg_addr;
+
+   if (internal_config.no_shconf)
+   return;
+
+   /* save the address primary process has mapped shared config to */
+   rte_mem_cfg_addr = (void *) (uintptr_t) rte_config.mem_config->mem_cfg_addr;
+
+   /* unmap original config */
+   munmap(rte_config.mem_config, sizeof(struct rte_mem_config));
+
+   /* remap the config at proper address */
+   mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr,
+   sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED,
+   mem_cfg_fd, 0);
close(mem_cfg_fd);
-   if (rte_mem_cfg_addr == MAP_FAILED)
+   if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr)
rte_panic("Cannot mmap memory for rte_config\n");

-   rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+   rte_config.mem_config = mem_config;
 }

 /* Detect if we are a primary or a secondary process */
@@ -301,6 +334,7 @@ rte_config_init(void)
case RTE_PROC_SECONDARY:
rte_eal_config_attach();
rte_eal_mcfg_wait_complete(rte_config.mem_config);
+  

[dpdk-dev] [PATCH v3 0/9] Make DPDK tailqs fully local

2014-06-18 Thread Anatoly Burakov
This issue was reported by the OVS-DPDK project, and the fix should go to
upstream DPDK. This is not memnic-related - this is to do with
DPDK's rte_ivshmem library.

Every DPDK data structure has a corresponding TAILQ reserved for it in
the runtime config file. Those TAILQs are fully local to the process;
however, most data structures contain pointers to the next entry in the
TAILQ.

Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. This means that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers, and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.
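
The idea of a generic, process-local tailq entry can be sketched with
plain <sys/queue.h> as below. This is only an illustration of the
approach described above, not the code from the patches: struct
shared_obj, struct local_entry and the local_list_*() helpers are
hypothetical names, and calloc()/free() stand in for whatever allocator
the real code uses. The point is that the shared object carries no list
pointers at all; the links live in a separately allocated node that
refers to the object through an opaque data pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Stand-in for a shared object such as a ring: it holds no list links. */
struct shared_obj {
	char name[32];
};

/* Process-local node: list links plus an opaque pointer to the object. */
struct local_entry {
	TAILQ_ENTRY(local_entry) next;
	void *data;
};

TAILQ_HEAD(local_list, local_entry);

/* Allocate a local node for obj and append it to the list. */
static int
local_list_add(struct local_list *list, struct shared_obj *obj)
{
	struct local_entry *te = calloc(1, sizeof(*te));

	if (te == NULL)
		return -1;
	te->data = obj;
	TAILQ_INSERT_TAIL(list, te, next);
	return 0;
}

/* Look an object up by name by walking the local nodes. */
static struct shared_obj *
local_list_find(struct local_list *list, const char *name)
{
	struct local_entry *te;

	TAILQ_FOREACH(te, list, next) {
		struct shared_obj *obj = te->data;

		if (strncmp(name, obj->name, sizeof(obj->name)) == 0)
			return obj;
	}
	return NULL;
}

/* Find the node that refers to obj, unlink it and free the node only. */
static int
local_list_del(struct local_list *list, struct shared_obj *obj)
{
	struct local_entry *te;

	TAILQ_FOREACH(te, list, next) {
		if (te->data == (void *) obj)
			break;
	}
	if (te == NULL)
		return -1;

	TAILQ_REMOVE(list, te, next);
	free(te);
	return 0;
}
```

Removal has to search for the node by its data pointer, since the object
no longer knows where it sits in the list. That costs a linear walk, in
exchange for never writing a process-local address into shared memory.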

v2 changes:
* fixed race conditions in *_free operations
* fixed multiprocess support for malloc heaps
* added similar changes for acl
* rebased on top of e88b42f818bc1a6d4ce6cb70371b66e37fa34f7d

v3 changes:
* fixed race reported by Konstantin Ananyev (introduced in v2)

Anatoly Burakov (9):
  eal: map shared config into exact same address as primary process
  rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer
  rte_ring: make ring tailq fully local
  rte_hash: make rte_hash tailq fully local
  rte_fbk_hash: make rte_fbk_hash tailq fully local
  rte_mempool: make mempool tailq fully local
  rte_lpm: make lpm tailq fully local
  rte_lpm6: make lpm6 tailq fully local
  rte_acl: make acl tailq fully local

 app/test/test_tailq.c | 33 +-
 lib/librte_acl/acl.h  |  1 -
 lib/librte_acl/rte_acl.c  | 74 ++-
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_eal_memconfig.h |  5 ++
 lib/librte_eal/common/include/rte_tailq.h |  9 +--
 lib/librte_eal/linuxapp/eal/eal.c | 44 --
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 +-
 lib/librte_hash/rte_fbk_hash.c| 73 +-
 lib/librte_hash/rte_fbk_hash.h|  3 -
 lib/librte_hash/rte_hash.c| 61 ---
 lib/librte_hash/rte_hash.h|  2 -
 lib/librte_lpm/rte_lpm.c  | 65 
 lib/librte_lpm/rte_lpm.h  |  2 -
 lib/librte_lpm/rte_lpm6.c | 62 +++
 lib/librte_mempool/Makefile   |  3 +-
 lib/librte_mempool/rte_mempool.c  | 37 +---
 lib/librte_mempool/rte_mempool.h  |  2 -
 lib/librte_ring/Makefile  |  4 +-
 lib/librte_ring/rte_ring.c| 33 +++---
 lib/librte_ring/rte_ring.h|  2 -
 21 files changed, 415 insertions(+), 119 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH 9/9] rte_acl: make acl tailq fully local

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_acl/acl.h |  1 -
 lib/librte_acl/rte_acl.c | 74 +++-
 2 files changed, 60 insertions(+), 15 deletions(-)

diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h
index e6d7985..b9d63fd 100644
--- a/lib/librte_acl/acl.h
+++ b/lib/librte_acl/acl.h
@@ -149,7 +149,6 @@ struct rte_acl_bld_trie {
 };

 struct rte_acl_ctx {
-   TAILQ_ENTRY(rte_acl_ctx) next;/**< Next in list. */
charname[RTE_ACL_NAMESIZE];
/** Name of the ACL context. */
int32_t socket_id;
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 129a41f..3b47ab6 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -36,13 +36,14 @@

 #defineBIT_SIZEOF(x)   (sizeof(x) * CHAR_BIT)

-TAILQ_HEAD(rte_acl_list, rte_acl_ctx);
+TAILQ_HEAD(rte_acl_list, rte_tailq_entry);

 struct rte_acl_ctx *
 rte_acl_find_existing(const char *name)
 {
-   struct rte_acl_ctx *ctx;
+   struct rte_acl_ctx *ctx = NULL;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;

/* check that we have an initialised tail queue */
acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
@@ -52,27 +53,55 @@ rte_acl_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(name, ctx->name, sizeof(ctx->name)) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (ctx == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return ctx;
 }

 void
 rte_acl_free(struct rte_acl_ctx *ctx)
 {
+   struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
+
if (ctx == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_ACL, rte_acl_list, ctx);
+   /* check that we have an initialised tail queue */
+   acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
+   if (acl_list == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, acl_list, next) {
+   if (te->data == (void *) ctx)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(acl_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

rte_free(ctx->mem);
rte_free(ctx);
+   rte_free(te);
 }

 struct rte_acl_ctx *
@@ -81,6 +110,7 @@ rte_acl_create(const struct rte_acl_param *param)
size_t sz;
struct rte_acl_ctx *ctx;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
char name[sizeof(ctx->name)];

/* check that we have an initialised tail queue */
@@ -105,15 +135,31 @@ rte_acl_create(const struct rte_acl_param *param)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* if we already have one with that name */
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(param->name, ctx->name, sizeof(ctx->name)) == 0)
break;
}

/* if ACL with such name doesn't exist, then create a new one. */
-   if (ctx == NULL && (ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE,
-   param->socket_id)) != NULL) {
+   if (te == NULL) {
+   ctx = NULL;
+   te = rte_zmalloc("ACL_TAILQ_ENTRY", sizeof(*te), 0);
+
+   if (te == NULL) {
+   RTE_LOG(ERR, ACL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
+   ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE, param->socket_id);

+   if (ctx == NULL) {
+   RTE_LOG(ERR, ACL,
+   "allocation of %zu bytes on socket %d for %s failed\n",
+   sz, param->socket_id, name);
+   rte_free(te);
+   goto exit;
+   }
/* init new allocated context. */
ctx->rules = ctx + 1;
ctx->max_rules = param->max_rule_num;
@@ -121,14 +167,12 @@ rte_acl_create(const struct rte_acl_param *param)
ctx->socket_id = param->socket_id;
rte_snprintf(ctx->name, sizeof(ctx->name), "%s", param->name);

- 

[dpdk-dev] [PATCH 8/9] rte_lpm6: make lpm6 tailq fully local

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm6.c | 62 ++-
 1 file changed, 51 insertions(+), 11 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 56c74a1..73b48d0 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -77,7 +77,7 @@ enum valid_flag {
VALID
 };

-TAILQ_HEAD(rte_lpm6_list, rte_lpm6);
+TAILQ_HEAD(rte_lpm6_list, rte_tailq_entry);

 /** Tbl entry structure. It is the same for both tbl24 and tbl8 */
 struct rte_lpm6_tbl_entry {
@@ -99,8 +99,6 @@ struct rte_lpm6_rule {

 /** LPM6 structure. */
 struct rte_lpm6 {
-   TAILQ_ENTRY(rte_lpm6) next;  /**< Next in list. */
-
/* LPM metadata. */
char name[RTE_LPM6_NAMESIZE];/**< Name of the lpm. */
uint32_t max_rules;  /**< Max number of rules. */
@@ -149,6 +147,7 @@ rte_lpm6_create(const char *name, int socket_id,
 {
char mem_name[RTE_LPM6_NAMESIZE];
struct rte_lpm6 *lpm = NULL;
+   struct rte_tailq_entry *te;
uint64_t mem_size, rules_size;
struct rte_lpm6_list *lpm_list;

@@ -179,12 +178,20 @@ rte_lpm6_create(const char *name, int socket_id,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* Guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm6 *) te->data;
if (strncmp(name, lpm->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
+   goto exit;
+
+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM6_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry!\n");
goto exit;
+   }

/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm6 *)rte_zmalloc_socket(mem_name, (size_t)mem_size,
@@ -192,6 +199,7 @@ rte_lpm6_create(const char *name, int socket_id,

if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -201,6 +209,7 @@ rte_lpm6_create(const char *name, int socket_id,
if (lpm->rules_tbl == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
rte_free(lpm);
+   rte_free(te);
goto exit;
}

@@ -209,7 +218,9 @@ rte_lpm6_create(const char *name, int socket_id,
lpm->number_tbl8s = config->number_tbl8s;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -223,7 +234,8 @@ exit:
 struct rte_lpm6 *
 rte_lpm6_find_existing(const char *name)
 {
-   struct rte_lpm6 *l;
+   struct rte_lpm6 *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm6_list *lpm_list;

/* Check that we have an initialised tail queue */
@@ -234,14 +246,17 @@ rte_lpm6_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm6 *) te->data;
if (strncmp(name, l->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -252,13 +267,38 @@ rte_lpm6_find_existing(const char *name)
 void
 rte_lpm6_free(struct rte_lpm6 *lpm)
 {
+   struct rte_lpm6_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM6, rte_lpm6_list, lpm);
-   rte_free(lpm->rules_tbl);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM6, rte_lpm6_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
-- 
1.8.1.4



[dpdk-dev] [PATCH 7/9] rte_lpm: make lpm tailq fully local

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm.c | 65 
 lib/librte_lpm/rte_lpm.h |  2 --
 2 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 592750e..6a49d43 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -56,7 +56,7 @@

 #include "rte_lpm.h"

-TAILQ_HEAD(rte_lpm_list, rte_lpm);
+TAILQ_HEAD(rte_lpm_list, rte_tailq_entry);

 #define MAX_DEPTH_TBL24 24

@@ -118,24 +118,29 @@ depth_to_range(uint8_t depth)
 struct rte_lpm *
 rte_lpm_find_existing(const char *name)
 {
-   struct rte_lpm *l;
+   struct rte_lpm *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm *) te->data;
if (strncmp(name, l->name, RTE_LPM_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -149,12 +154,13 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
 {
char mem_name[RTE_LPM_NAMESIZE];
struct rte_lpm *lpm = NULL;
+   struct rte_tailq_entry *te;
uint32_t mem_size;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -176,18 +182,27 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm *) te->data;
if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
goto exit;

+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm *)rte_zmalloc_socket(mem_name, mem_size,
CACHE_LINE_SIZE, socket_id);
if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -195,7 +210,9 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
lpm->max_rules = max_rules;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -209,12 +226,38 @@ exit:
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+   struct rte_lpm_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM, rte_lpm_list, lpm);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index d35565d..308f5ef 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -132,8 +

[dpdk-dev] [PATCH 6/9] rte_mempool: make mempool tailq fully local

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_mempool/Makefile  |  3 ++-
 lib/librte_mempool/rte_mempool.c | 37 -
 lib/librte_mempool/rte_mempool.h |  2 --
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index c79b306..9939e10 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -44,7 +44,8 @@ endif
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

-# this lib needs eal
+# this lib needs eal, rte_ring and rte_malloc
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_eal lib/librte_ring
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_malloc

 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7eebf7f..736e854 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -60,7 +61,7 @@

 #include "rte_mempool.h"

-TAILQ_HEAD(rte_mempool_list, rte_mempool);
+TAILQ_HEAD(rte_mempool_list, rte_tailq_entry);

 #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5

@@ -404,6 +405,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
char mz_name[RTE_MEMZONE_NAMESIZE];
char rg_name[RTE_RING_NAMESIZE];
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_ring *r;
const struct rte_memzone *mz;
size_t mempool_size;
@@ -501,6 +503,13 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
}
}

+   /* try to allocate tailq entry */
+   te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, MEMPOOL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
/*
 * If user provided an external memory buffer, then use it to
 * store mempool objects. Otherwise reserve memzone big enough to
@@ -527,8 +536,10 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 * no more memory: in this case we loose previously reserved
 * space for the as we cannot free it
 */
-   if (mz == NULL)
+   if (mz == NULL) {
+   rte_free(te);
goto exit;
+   }

if (rte_eal_has_hugepages()) {
startaddr = (void*)mz->addr;
@@ -587,7 +598,9 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,

mempool_populate(mp, n, 1, obj_init, obj_init_arg);

-   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, mp);
+   te->data = (void *) mp;
+
+   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, te);

 exit:
rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
@@ -812,6 +825,7 @@ void
 rte_mempool_list_dump(FILE *f)
 {
const struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -822,7 +836,8 @@ rte_mempool_list_dump(FILE *f)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
rte_mempool_dump(f, mp);
}

@@ -834,6 +849,7 @@ struct rte_mempool *
 rte_mempool_lookup(const char *name)
 {
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -844,15 +860,18 @@ rte_mempool_lookup(const char *name)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
if (strncmp(name, mp->name, RTE_MEMPOOL_NAMESIZE) == 0)
break;
}

rte_rwlock_read_unlock(RTE_EAL_MEMPOOL_RWLOCK);

-   if (mp == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return mp;
 }
@@ -860,7 +879,7 @@ rte_mempool_lookup(const char *name)
 void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),
  void *arg)
 {
-   struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te = NULL;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -871,8 +890,8 @@ void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
-   (*func)(mp, arg);
+   TAILQ_FOREACH(te, mempool_list, next) {
+   (*func)((struct rte_mempool *) te->data,

[dpdk-dev] [PATCH 5/9] rte_fbk_hash: make rte_fbk_hash tailq fully local

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_hash/rte_fbk_hash.c | 73 ++
 lib/librte_hash/rte_fbk_hash.h |  3 --
 2 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/lib/librte_hash/rte_fbk_hash.c b/lib/librte_hash/rte_fbk_hash.c
index 4d67554..1356cf4 100644
--- a/lib/librte_hash/rte_fbk_hash.c
+++ b/lib/librte_hash/rte_fbk_hash.c
@@ -54,7 +54,7 @@

 #include "rte_fbk_hash.h"

-TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
+TAILQ_HEAD(rte_fbk_hash_list, rte_tailq_entry);

 /**
  * Performs a lookup for an existing hash table, and returns a pointer to
@@ -69,24 +69,29 @@ TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
 struct rte_fbk_hash_table *
 rte_fbk_hash_find_existing(const char *name)
 {
-   struct rte_fbk_hash_table *h;
+   struct rte_fbk_hash_table *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_fbk_hash_list *fbk_hash_list;

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   h = (struct rte_fbk_hash_table *) te->data;
if (strncmp(name, h->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -104,6 +109,7 @@ struct rte_fbk_hash_table *
 rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
 {
struct rte_fbk_hash_table *ht = NULL;
+   struct rte_tailq_entry *te;
char hash_name[RTE_FBK_HASH_NAMESIZE];
const uint32_t mem_size =
sizeof(*ht) + (sizeof(ht->t[0]) * params->entries);
@@ -112,7 +118,8 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params 
*params)

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == 
NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -134,20 +141,28 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params 
*params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(ht, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   ht = (struct rte_fbk_hash_table *) te->data;
if (strncmp(params->name, ht->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
-   if (ht != NULL)
+   if (te != NULL)
goto exit;

+   te = rte_zmalloc("FBK_HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory for table. */
-   ht = (struct rte_fbk_hash_table *)rte_malloc_socket(hash_name, mem_size,
+   ht = (struct rte_fbk_hash_table *)rte_zmalloc_socket(hash_name, 
mem_size,
0, params->socket_id);
-   if (ht == NULL)
+   if (ht == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate fbk hash table\n");
+   rte_free(te);
goto exit;
-
-   memset(ht, 0, mem_size);
+   }

/* Set up hash table context. */
rte_snprintf(ht->name, sizeof(ht->name), "%s", params->name);
@@ -169,7 +184,9 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params 
*params)
ht->init_val = RTE_FBK_HASH_INIT_VAL_DEFAULT;
}

-   TAILQ_INSERT_TAIL(fbk_hash_list, ht, next);
+   te->data = (void *) ht;
+
+   TAILQ_INSERT_TAIL(fbk_hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +203,38 @@ exit:
 void
 rte_fbk_hash_free(struct rte_fbk_hash_table *ht)
 {
+   struct rte_tailq_entry *te;
+   struct rte_fbk_hash_list *fbk_hash_list;
+
if (ht == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list, ht);
+   /* check that we have an initialised tail queue */
+   if ((fbk_hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_

[dpdk-dev] [PATCH 4/9] rte_hash: make rte_hash tailq fully local

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_hash/rte_hash.c | 61 +++---
 lib/librte_hash/rte_hash.h |  2 --
 2 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c
index d4221a8..eea5c01 100644
--- a/lib/librte_hash/rte_hash.c
+++ b/lib/librte_hash/rte_hash.c
@@ -60,7 +60,7 @@
 #include "rte_hash.h"


-TAILQ_HEAD(rte_hash_list, rte_hash);
+TAILQ_HEAD(rte_hash_list, rte_tailq_entry);

 /* Macro to enable/disable run-time checking of function parameters */
 #if defined(RTE_LIBRTE_HASH_DEBUG)
@@ -141,24 +141,29 @@ find_first(uint32_t sig, const uint32_t *sig_bucket, 
uint32_t num_sigs)
 struct rte_hash *
 rte_hash_find_existing(const char *name)
 {
-   struct rte_hash *h;
+   struct rte_hash *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_hash_list *hash_list;

/* check that we have an initialised tail queue */
-   if ((hash_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, 
rte_hash_list)) == NULL) {
+   if ((hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) 
== NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, hash_list, next) {
+   TAILQ_FOREACH(te, hash_list, next) {
+   h = (struct rte_hash *) te->data;
if (strncmp(name, h->name, RTE_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -166,6 +171,7 @@ struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
struct rte_hash *h = NULL;
+   struct rte_tailq_entry *te;
uint32_t num_buckets, sig_bucket_size, key_size,
hash_tbl_size, sig_tbl_size, key_tbl_size, mem_size;
char hash_name[RTE_HASH_NAMESIZE];
@@ -212,17 +218,25 @@ rte_hash_create(const struct rte_hash_parameters *params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(h, hash_list, next) {
+   TAILQ_FOREACH(te, hash_list, next) {
+   h = (struct rte_hash *) te->data;
if (strncmp(params->name, h->name, RTE_HASH_NAMESIZE) == 0)
break;
}
-   if (h != NULL)
+   if (te != NULL)
+   goto exit;
+
+   te = rte_zmalloc("HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "tailq entry allocation failed\n");
goto exit;
+   }

h = (struct rte_hash *)rte_zmalloc_socket(hash_name, mem_size,
   CACHE_LINE_SIZE, params->socket_id);
if (h == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -242,7 +256,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->hash_func = (params->hash_func == NULL) ?
DEFAULT_HASH_FUNC : params->hash_func;

-   TAILQ_INSERT_TAIL(hash_list, h, next);
+   te->data = (void *) h;
+
+   TAILQ_INSERT_TAIL(hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -253,11 +269,38 @@ exit:
 void
 rte_hash_free(struct rte_hash *h)
 {
+   struct rte_tailq_entry *te;
+   struct rte_hash_list *hash_list;
+
if (h == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_HASH, rte_hash_list, h);
+   /* check that we have an initialised tail queue */
+   if ((hash_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find out tailq entry */
+   TAILQ_FOREACH(te, hash_list, next) {
+   if (te->data == (void *) h)
+   break;
+   }
+
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(hash_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(h);
+   rte_free(te);
 }

 static inline int32_t
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 5228e3a..2ecaf1a 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -86,8 +86,6 @@ struct rte_hash_parameters {

 /** A hash table structure. */
 struct rte_hash {
-   TAILQ_ENTRY(rte_hash) next;/**< Next in list. */
-
char name[RTE_HASH_NAMESIZE];   /**< Name of the hash. */
  

[dpdk-dev] [PATCH 3/9] rte_ring: make ring tailq fully local

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 ++--
 lib/librte_ring/Makefile  |  4 ++--
 lib/librte_ring/rte_ring.c| 33 +++
 lib/librte_ring/rte_ring.h|  2 --
 4 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c 
b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
index 4ad76a7..fa5f4e3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -101,7 +102,7 @@ static int memseg_idx;
 static int pagesz;

 /* Tailq heads to add rings to */
-TAILQ_HEAD(rte_ring_list, rte_ring);
+TAILQ_HEAD(rte_ring_list, rte_tailq_entry);

 /*
  * Utility functions
@@ -754,6 +755,7 @@ rte_eal_ivshmem_obj_init(void)
struct ivshmem_segment * seg;
struct rte_memzone * mz;
struct rte_ring * r;
+   struct rte_tailq_entry *te;
unsigned i, ms, idx;
uint64_t offset;

@@ -808,6 +810,8 @@ rte_eal_ivshmem_obj_init(void)
mcfg->memzone_idx++;
}

+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
/* find rings */
for (i = 0; i < mcfg->memzone_idx; i++) {
mz = &mcfg->memzone[i];
@@ -819,10 +823,19 @@ rte_eal_ivshmem_obj_init(void)

r = (struct rte_ring*) (mz->addr_64);

-   TAILQ_INSERT_TAIL(ring_list, r, next);
+   te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate ring tailq 
entry!\n");
+   return -1;
+   }
+
+   te->data = (void *) r;
+
+   TAILQ_INSERT_TAIL(ring_list, te, next);

RTE_LOG(DEBUG, EAL, "Found ring: '%s' at %p\n", r->name, 
mz->addr);
}
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

 #ifdef RTE_LIBRTE_IVSHMEM_DEBUG
rte_memzone_dump(stdout);
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 550507d..2380a43 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -42,7 +42,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h

-# this lib needs eal
-DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal
+# this lib needs eal and rte_malloc
+DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal lib/librte_malloc

 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 2fe4024..d2ff3fe 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -75,6 +75,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -89,7 +90,7 @@

 #include "rte_ring.h"

-TAILQ_HEAD(rte_ring_list, rte_ring);
+TAILQ_HEAD(rte_ring_list, rte_tailq_entry);

 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
@@ -155,6 +156,7 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
 {
char mz_name[RTE_MEMZONE_NAMESIZE];
struct rte_ring *r;
+   struct rte_tailq_entry *te;
const struct rte_memzone *mz;
ssize_t ring_size;
int mz_flags = 0;
@@ -173,6 +175,13 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
return NULL;
}

+   te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
rte_snprintf(mz_name, sizeof(mz_name), "%s%s", RTE_RING_MZ_PREFIX, 
name);

rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +195,14 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
/* no need to check return value here, we already checked the
 * arguments above */
rte_ring_init(r, name, count, flags);
-   TAILQ_INSERT_TAIL(ring_list, r, next);
+
+   te->data = (void *) r;
+
+   TAILQ_INSERT_TAIL(ring_list, te, next);
} else {
r = NULL;
RTE_LOG(ERR, RING, "Cannot reserve memory\n");
+   rte_free(te);
}
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

@@ -272,7 +285,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 void
 rte_ring_list_dump(FILE *f)
 {
-   const struct rte_ring *mp;
+   const struct rte_tailq_entry *te;
struct rte_ring_list *ring_list;

/* check that we have an initialised tail queue */
@@ -284,8 +297,8 @@ rte_ring_list_dump(FILE *f)

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);

-   TAILQ

[dpdk-dev] [PATCH 2/9] rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer

2014-06-17 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 app/test/test_tailq.c | 33 ---
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_tailq.h |  9 +
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/app/test/test_tailq.c b/app/test/test_tailq.c
index 67da009..c9b53ee 100644
--- a/app/test/test_tailq.c
+++ b/app/test/test_tailq.c
@@ -52,16 +52,16 @@

 #define DEFAULT_TAILQ (RTE_TAILQ_NUM)

-static struct rte_dummy d_elem;
+static struct rte_tailq_entry d_elem;

 static int
 test_tailq_create(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;
unsigned i;

/* create a first tailq and check its non-null */
-   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error allocating dummy_q0\n");

@@ -70,13 +70,14 @@ test_tailq_create(void)
TAILQ_INSERT_TAIL(d_head, &d_elem, next);

/* try allocating dummy_q0 again, and check for failure */
-   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head) == NULL)
+   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head) == 
NULL)
do_return("Error, non-null result returned when attemption to "
"re-allocate a tailq\n");

/* now fill up the tailq slots available and check we get an error */
for (i = RTE_TAILQ_NUM; i < RTE_MAX_TAILQ; i++){
-   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i, rte_dummy_head)) == 
NULL)
+   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i,
+   rte_tailq_entry_head)) == NULL)
break;
}

@@ -91,10 +92,10 @@ static int
 test_tailq_lookup(void)
 {
/* run successful  test - check result is found */
-   struct rte_dummy_head *d_head;
-   struct rte_dummy *d_ptr;
+   struct rte_tailq_entry_head *d_head;
+   struct rte_tailq_entry *d_ptr;

-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error with tailq lookup\n");

@@ -104,7 +105,7 @@ test_tailq_lookup(void)
"expected element not found\n");

/* now try a bad/error lookup */
-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_tailq_entry_head);
if (d_head != NULL)
do_return("Error, lookup does not return NULL for bad tailq 
name\n");

@@ -115,7 +116,7 @@ test_tailq_lookup(void)
 static int
 test_tailq_deprecated(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;

/* since TAILQ_RESERVE is not able to create new tailqs,
 * we should find an existing one (IOW, RTE_TAILQ_RESERVE behaves 
identical
@@ -123,29 +124,29 @@ test_tailq_deprecated(void)
 *
 * PCI_RESOURCE_LIST tailq is guaranteed to
 * be present in any DPDK app. */
-   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

-   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

/* try doing that with non-existent names */
-   d_head = RTE_TAILQ_RESERVE("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

/* try doing the same with NULL names */
-   d_head = RTE_TAILQ_RESERVE(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

diff --git a/lib/librte_eal/common/eal_common_tailqs.c 
b/lib/lib

[dpdk-dev] [PATCH 1/9] eal: map shared config into exact same address as primary process

2014-06-17 Thread Anatoly Burakov
Shared config is shared across primary and secondary processes.
However, when using rte_malloc, the malloc elements keep references to
the heap inside themselves. This heap reference might not be referencing
a local heap because the heap reference points to the heap of whatever
process has allocated that malloc element. Therefore, there can be
situations when malloc elements in a given heap actually reference
different addresses for the same heap - depending on which process has
allocated the element. This can lead to segmentation faults when dealing
with malloc elements allocated on the same heap by different processes.

To fix this problem, heaps will now have the same addresses across
processes. In order to achieve that, a new field in a shared mem_config
(a structure that holds the heaps, and which is shared across processes)
was added to keep the address of where this config is mapped in the
primary process.

The secondary process will now map the config in two stages: first, it
maps the config at an arbitrary address and reads the address at which
the primary process has mapped the shared config. Then, the config is
unmapped and re-mapped at the address previously read.
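
The two-stage attach described above can be sketched in isolation as
follows. This is only an illustration under stated assumptions, not the
EAL code: struct demo_config, demo_primary_create(),
demo_secondary_attach() and demo_run() are hypothetical names, the
backing file is a temporary file rather than the runtime config, and
both "processes" are simulated inside one process by unmapping the
primary mapping before the secondary attaches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for struct rte_mem_config. */
struct demo_config {
	uint64_t cfg_addr; /* address of this config in the "primary" */
	int payload;       /* arbitrary shared data */
};

/* "Primary": map the file anywhere and record the address inside it. */
static struct demo_config *
demo_primary_create(int fd)
{
	struct demo_config *cfg;

	if (ftruncate(fd, sizeof(*cfg)) < 0)
		return NULL;
	cfg = mmap(NULL, sizeof(*cfg), PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;
	cfg->cfg_addr = (uintptr_t) cfg;
	cfg->payload = 42;
	return cfg;
}

/* "Secondary": map read-only, read the address, unmap, map again there. */
static struct demo_config *
demo_secondary_attach(int fd)
{
	struct demo_config *cfg;
	void *wanted;

	/* stage 1: temporary read-only mapping at an arbitrary address */
	cfg = mmap(NULL, sizeof(*cfg), PROT_READ, MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;
	wanted = (void *) (uintptr_t) cfg->cfg_addr;
	munmap(cfg, sizeof(*cfg));

	/* stage 2: the address is only a hint, so the result is checked */
	cfg = mmap(wanted, sizeof(*cfg), PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;
	if (cfg != wanted) {
		munmap(cfg, sizeof(*cfg));
		return NULL;
	}
	return cfg;
}

/* Single-process simulation; returns the payload, or -1 on any failure. */
static int
demo_run(void)
{
	char path[] = "/tmp/demo_cfg_XXXXXX";
	struct demo_config *cfg;
	int fd, payload;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	cfg = demo_primary_create(fd);
	if (cfg == NULL) {
		close(fd);
		return -1;
	}
	/* release the range, as if a different process were attaching */
	munmap(cfg, sizeof(*cfg));

	cfg = demo_secondary_attach(fd);
	if (cfg == NULL) {
		close(fd);
		return -1;
	}
	payload = cfg->payload;
	munmap(cfg, sizeof(*cfg));
	close(fd);
	return payload;
}
```

Calling demo_run() from a main() exercises both stages. Because the
second mmap() passes the address as a hint and not with MAP_FIXED, the
kernel may place the mapping elsewhere; the sketch treats that as a
failure, in the same way the patch panics when the addresses differ.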

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/include/rte_eal_memconfig.h |  5 
 lib/librte_eal/linuxapp/eal/eal.c | 31 +++
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h 
b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 30ce6fc..d6359e5 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -89,6 +89,11 @@ struct rte_mem_config {

/* Heaps of Malloc per socket */
struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+
+   /* address of mem_config in primary process. used to map shared config 
into
+* exact same address the primary process maps it.
+*/
+   uint64_t mem_cfg_addr;
 } __attribute__((__packed__));


diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 6994303..fedd82f 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -239,6 +239,11 @@ rte_eal_config_create(void)
}
memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+
+   /* store address of the config in the config itself so that secondary
+* processes could later map the config into this exact location */
+   rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+
 }

 /* attach to an existing shared memory config */
@@ -246,6 +251,8 @@ static void
 rte_eal_config_attach(void)
 {
void *rte_mem_cfg_addr;
+   struct rte_mem_config *mem_config;
+
const char *pathname = eal_runtime_config_path();

if (internal_config.no_shconf)
@@ -257,13 +264,27 @@ rte_eal_config_attach(void)
rte_panic("Cannot open '%s' for rte_mem_config\n", 
pathname);
}

-   rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
-   PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 
0);
-   close(mem_cfg_fd);
-   if (rte_mem_cfg_addr == MAP_FAILED)
+   /* map it as read-only first */
+   mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
+   PROT_READ, MAP_SHARED, mem_cfg_fd, 0);
+   if (mem_config == MAP_FAILED)
rte_panic("Cannot mmap memory for rte_config\n");

-   rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+   /* store address used by primary process */
+   rte_mem_cfg_addr = (void *) (uintptr_t) mem_config->mem_cfg_addr;
+
+   /* unmap the config */
+   munmap(mem_config, sizeof(*mem_config));
+
+   /* map the config again, with the proper virtual address */
+   mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr,
+   sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED,
+   mem_cfg_fd, 0);
+   if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr)
+   rte_panic("Cannot mmap memory for rte_config\n");
+   close(mem_cfg_fd);
+
+   rte_config.mem_config = mem_config;
 }

 /* Detect if we are a primary or a secondary process */
-- 
1.8.1.4



[dpdk-dev] [PATCH 0/9] Make DPDK tailqs fully local

2014-06-17 Thread Anatoly Burakov
This issue was reported by OVS-DPDK project, and the fix should go to
upstream DPDK. This is not memnic-related - this is to do with
DPDK's rte_ivshmem library.

Every DPDK data structure has a corresponding TAILQ reserved for it in
the runtime config file. Those TAILQs are fully local to the process;
however, most data structures contain pointers to the next entry in
the TAILQ.

Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.
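
As a rough illustration of the idea (not the code from this patchset),
the sketch below keeps the list linkage in a separate, process-local
node that points at the object, so the object itself carries no list
pointers. It uses the plain <sys/queue.h> macros and libc allocation;
struct demo_entry, struct demo_ring and the demo_* functions are
hypothetical names, and no locking is shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Generic, process-local list node: linkage plus an opaque pointer. */
struct demo_entry {
	TAILQ_ENTRY(demo_entry) next;
	void *data;
};
TAILQ_HEAD(demo_list, demo_entry);

/* An object that could live in shared memory: no list pointers inside. */
struct demo_ring {
	char name[32];
};

/* Allocate a local node for the object and append it to the list. */
static int
demo_register(struct demo_list *list, struct demo_ring *r)
{
	struct demo_entry *te = calloc(1, sizeof(*te));

	if (te == NULL)
		return -1;
	te->data = r;
	TAILQ_INSERT_TAIL(list, te, next);
	return 0;
}

/* Walk the local nodes and compare names through the data pointer. */
static struct demo_ring *
demo_lookup(struct demo_list *list, const char *name)
{
	struct demo_entry *te;

	TAILQ_FOREACH(te, list, next) {
		struct demo_ring *r = te->data;

		if (strncmp(name, r->name, sizeof(r->name)) == 0)
			return r;
	}
	return NULL;
}

/* Find the node that points at the object, unlink it and free the node. */
static int
demo_unregister(struct demo_list *list, struct demo_ring *r)
{
	struct demo_entry *te;

	TAILQ_FOREACH(te, list, next) {
		if (te->data == (void *) r)
			break;
	}
	if (te == NULL)
		return -1;
	TAILQ_REMOVE(list, te, next);
	free(te);
	return 0;
}
```

With a list declared as "struct demo_list list =
TAILQ_HEAD_INITIALIZER(list);", a caller registers objects, looks them
up by name and unregisters them. Only the nodes are allocated and freed
per process, which is the property the patchset relies on.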

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.

v2 changes:
* fixed race conditions in *_free operations
* fixed multiprocess support for malloc heaps
* added similar changes for acl
* rebased on top of e88b42f818bc1a6d4ce6cb70371b66e37fa34f7d

Anatoly Burakov (9):
  eal: map shared config into exact same address as primary process
  rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer
  rte_ring: make ring tailq fully local
  rte_hash: make rte_hash tailq fully local
  rte_fbk_hash: make rte_fbk_hash tailq fully local
  rte_mempool: make mempool tailq fully local
  rte_lpm: make lpm tailq fully local
  rte_lpm6: make lpm6 tailq fully local
  rte_acl: make acl tailq fully local

 app/test/test_tailq.c | 33 +-
 lib/librte_acl/acl.h  |  1 -
 lib/librte_acl/rte_acl.c  | 74 ++-
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_eal_memconfig.h |  5 ++
 lib/librte_eal/common/include/rte_tailq.h |  9 +--
 lib/librte_eal/linuxapp/eal/eal.c | 31 --
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 +-
 lib/librte_hash/rte_fbk_hash.c| 73 +-
 lib/librte_hash/rte_fbk_hash.h|  3 -
 lib/librte_hash/rte_hash.c| 61 ---
 lib/librte_hash/rte_hash.h|  2 -
 lib/librte_lpm/rte_lpm.c  | 65 
 lib/librte_lpm/rte_lpm.h  |  2 -
 lib/librte_lpm/rte_lpm6.c | 62 +++
 lib/librte_mempool/Makefile   |  3 +-
 lib/librte_mempool/rte_mempool.c  | 37 +---
 lib/librte_mempool/rte_mempool.h  |  2 -
 lib/librte_ring/Makefile  |  4 +-
 lib/librte_ring/rte_ring.c| 33 +++---
 lib/librte_ring/rte_ring.h|  2 -
 21 files changed, 402 insertions(+), 119 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH 7/7] rte_lpm6: make lpm6 tailq fully local

2014-06-13 Thread Anatoly Burakov
---
 lib/librte_lpm/rte_lpm6.c | 55 +--
 1 file changed, 44 insertions(+), 11 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 56c74a1..36cb9fc 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -77,7 +77,7 @@ enum valid_flag {
VALID
 };

-TAILQ_HEAD(rte_lpm6_list, rte_lpm6);
+TAILQ_HEAD(rte_lpm6_list, rte_tailq_entry);

 /** Tbl entry structure. It is the same for both tbl24 and tbl8 */
 struct rte_lpm6_tbl_entry {
@@ -99,8 +99,6 @@ struct rte_lpm6_rule {

 /** LPM6 structure. */
 struct rte_lpm6 {
-   TAILQ_ENTRY(rte_lpm6) next;  /**< Next in list. */
-
/* LPM metadata. */
char name[RTE_LPM6_NAMESIZE];/**< Name of the lpm. */
uint32_t max_rules;  /**< Max number of rules. */
@@ -149,6 +147,7 @@ rte_lpm6_create(const char *name, int socket_id,
 {
char mem_name[RTE_LPM6_NAMESIZE];
struct rte_lpm6 *lpm = NULL;
+   struct rte_tailq_entry *te;
uint64_t mem_size, rules_size;
struct rte_lpm6_list *lpm_list;

@@ -179,19 +178,28 @@ rte_lpm6_create(const char *name, int socket_id,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* Guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm6 *) te->data;
if (strncmp(name, lpm->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
goto exit;

+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM6_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry!\n");
+   goto exit;
+   }
+
/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm6 *)rte_zmalloc_socket(mem_name, (size_t)mem_size,
CACHE_LINE_SIZE, socket_id);

if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -201,6 +209,7 @@ rte_lpm6_create(const char *name, int socket_id,
if (lpm->rules_tbl == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
rte_free(lpm);
+   rte_free(te);
goto exit;
}

@@ -209,7 +218,9 @@ rte_lpm6_create(const char *name, int socket_id,
lpm->number_tbl8s = config->number_tbl8s;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -223,7 +234,8 @@ exit:
 struct rte_lpm6 *
 rte_lpm6_find_existing(const char *name)
 {
-   struct rte_lpm6 *l;
+   struct rte_lpm6 *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm6_list *lpm_list;

/* Check that we have an initialised tail queue */
@@ -234,14 +246,17 @@ rte_lpm6_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm6 *) te->data;
if (strncmp(name, l->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -252,13 +267,31 @@ rte_lpm6_find_existing(const char *name)
 void
 rte_lpm6_free(struct rte_lpm6 *lpm)
 {
+   struct rte_lpm6_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM6, rte_lpm6_list, lpm);
-   rte_free(lpm->rules_tbl);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm6_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL)
+   return;
+
+   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM6, rte_lpm6_list, te);
rte_free(lpm);
+   rte_free(te);
 }

 /*
-- 
1.8.1.4



[dpdk-dev] [PATCH 6/7] rte_lpm: make lpm tailq fully local

2014-06-13 Thread Anatoly Burakov
---
 lib/librte_lpm/rte_lpm.c | 54 
 lib/librte_lpm/rte_lpm.h |  2 --
 2 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 592750e..18a0cc0 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -56,7 +56,7 @@

 #include "rte_lpm.h"

-TAILQ_HEAD(rte_lpm_list, rte_lpm);
+TAILQ_HEAD(rte_lpm_list, rte_tailq_entry);

 #define MAX_DEPTH_TBL24 24

@@ -118,24 +118,29 @@ depth_to_range(uint8_t depth)
 struct rte_lpm *
 rte_lpm_find_existing(const char *name)
 {
-   struct rte_lpm *l;
+   struct rte_lpm *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) 
== NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm *) te->data;
if (strncmp(name, l->name, RTE_LPM_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -149,6 +154,7 @@ rte_lpm_create(const char *name, int socket_id, int 
max_rules,
 {
char mem_name[RTE_LPM_NAMESIZE];
struct rte_lpm *lpm = NULL;
+   struct rte_tailq_entry *te;
uint32_t mem_size;
struct rte_lpm_list *lpm_list;

@@ -176,18 +182,27 @@ rte_lpm_create(const char *name, int socket_id, int 
max_rules,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm *) te->data;
if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
goto exit;

+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm *)rte_zmalloc_socket(mem_name, mem_size,
CACHE_LINE_SIZE, socket_id);
if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -195,7 +210,9 @@ rte_lpm_create(const char *name, int socket_id, int 
max_rules,
lpm->max_rules = max_rules;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -209,12 +226,31 @@ exit:
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+   struct rte_lpm_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM, rte_lpm_list, lpm);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL)
+   return;
+
+   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM, rte_lpm_list, te);
rte_free(lpm);
+   rte_free(te);
 }

 /*
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index d35565d..308f5ef 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -132,8 +132,6 @@ struct rte_lpm_rule_info {

 /** @internal LPM structure. */
 struct rte_lpm {
-   TAILQ_ENTRY(rte_lpm) next;  /**< Next in list. */
-
/* LPM metadata. */
char name[RTE_LPM_NAMESIZE];/**< Name of the lpm. */
int mem_location; /**< @deprecated @see RTE_LPM_HEAP and 
RTE_LPM_MEMZONE. */
-- 
1.8.1.4



[dpdk-dev] [PATCH 5/7] rte_mempool: make mempool tailq fully local

2014-06-13 Thread Anatoly Burakov
---
 lib/librte_mempool/Makefile  |  3 ++-
 lib/librte_mempool/rte_mempool.c | 37 -
 lib/librte_mempool/rte_mempool.h |  2 --
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index c79b306..9939e10 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -44,7 +44,8 @@ endif
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

-# this lib needs eal
+# this lib needs eal, rte_ring and rte_malloc
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_eal lib/librte_ring
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_malloc

 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7eebf7f..736e854 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -60,7 +61,7 @@

 #include "rte_mempool.h"

-TAILQ_HEAD(rte_mempool_list, rte_mempool);
+TAILQ_HEAD(rte_mempool_list, rte_tailq_entry);

 #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5

@@ -404,6 +405,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
char mz_name[RTE_MEMZONE_NAMESIZE];
char rg_name[RTE_RING_NAMESIZE];
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_ring *r;
const struct rte_memzone *mz;
size_t mempool_size;
@@ -501,6 +503,13 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
}
}

+   /* try to allocate tailq entry */
+   te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, MEMPOOL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
/*
 * If user provided an external memory buffer, then use it to
 * store mempool objects. Otherwise reserve memzone big enough to
@@ -527,8 +536,10 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 * no more memory: in this case we loose previously reserved
 * space for the as we cannot free it
 */
-   if (mz == NULL)
+   if (mz == NULL) {
+   rte_free(te);
goto exit;
+   }

if (rte_eal_has_hugepages()) {
startaddr = (void*)mz->addr;
@@ -587,7 +598,9 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,

mempool_populate(mp, n, 1, obj_init, obj_init_arg);

-   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, mp);
+   te->data = (void *) mp;
+
+   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, te);

 exit:
rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
@@ -812,6 +825,7 @@ void
 rte_mempool_list_dump(FILE *f)
 {
const struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -822,7 +836,8 @@ rte_mempool_list_dump(FILE *f)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
rte_mempool_dump(f, mp);
}

@@ -834,6 +849,7 @@ struct rte_mempool *
 rte_mempool_lookup(const char *name)
 {
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -844,15 +860,18 @@ rte_mempool_lookup(const char *name)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
if (strncmp(name, mp->name, RTE_MEMPOOL_NAMESIZE) == 0)
break;
}

rte_rwlock_read_unlock(RTE_EAL_MEMPOOL_RWLOCK);

-   if (mp == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return mp;
 }
@@ -860,7 +879,7 @@ rte_mempool_lookup(const char *name)
 void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),
  void *arg)
 {
-   struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te = NULL;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -871,8 +890,8 @@ void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
-   (*func)(mp, arg);
+   TAILQ_FOREACH(te, mempool_list, next) {
+   (*func)((struct rte_mempool *) te->data, arg);
}

rte_rwlock_read_unlock(RTE_EAL_MEMPOOL_RWLOCK);
diff --git 

[dpdk-dev] [PATCH 4/7] rte_fbk_hash: make rte_fbk_hash tailq fully local

2014-06-13 Thread Anatoly Burakov
---
 lib/librte_hash/rte_fbk_hash.c | 66 +-
 lib/librte_hash/rte_fbk_hash.h |  3 --
 2 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/lib/librte_hash/rte_fbk_hash.c b/lib/librte_hash/rte_fbk_hash.c
index 4d67554..e566f48 100644
--- a/lib/librte_hash/rte_fbk_hash.c
+++ b/lib/librte_hash/rte_fbk_hash.c
@@ -54,7 +54,7 @@

 #include "rte_fbk_hash.h"

-TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
+TAILQ_HEAD(rte_fbk_hash_list, rte_tailq_entry);

 /**
  * Performs a lookup for an existing hash table, and returns a pointer to
@@ -69,24 +69,29 @@ TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
 struct rte_fbk_hash_table *
 rte_fbk_hash_find_existing(const char *name)
 {
-   struct rte_fbk_hash_table *h;
+   struct rte_fbk_hash_table *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_fbk_hash_list *fbk_hash_list;

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   h = (struct rte_fbk_hash_table *) te->data;
if (strncmp(name, h->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -104,6 +109,7 @@ struct rte_fbk_hash_table *
 rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
 {
struct rte_fbk_hash_table *ht = NULL;
+   struct rte_tailq_entry *te;
char hash_name[RTE_FBK_HASH_NAMESIZE];
const uint32_t mem_size =
sizeof(*ht) + (sizeof(ht->t[0]) * params->entries);
@@ -112,7 +118,8 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -134,20 +141,28 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(ht, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   ht = (struct rte_fbk_hash_table *) te->data;
if (strncmp(params->name, ht->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
-   if (ht != NULL)
+   if (te != NULL)
+   goto exit;
+
+   te = rte_zmalloc("FBK_HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate tailq entry\n");
goto exit;
+   }

/* Allocate memory for table. */
-   ht = (struct rte_fbk_hash_table *)rte_malloc_socket(hash_name, mem_size,
+   ht = (struct rte_fbk_hash_table *)rte_zmalloc_socket(hash_name, mem_size,
0, params->socket_id);
-   if (ht == NULL)
+   if (ht == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate fbk hash table\n");
+   rte_free(te);
goto exit;
-
-   memset(ht, 0, mem_size);
+   }

/* Set up hash table context. */
rte_snprintf(ht->name, sizeof(ht->name), "%s", params->name);
@@ -169,7 +184,9 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
ht->init_val = RTE_FBK_HASH_INIT_VAL_DEFAULT;
}

-   TAILQ_INSERT_TAIL(fbk_hash_list, ht, next);
+   te->data = (void *) ht;
+
+   TAILQ_INSERT_TAIL(fbk_hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +203,31 @@ exit:
 void
 rte_fbk_hash_free(struct rte_fbk_hash_table *ht)
 {
+   struct rte_tailq_entry *te;
+   struct rte_fbk_hash_list *fbk_hash_list;
+
if (ht == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list, ht);
+   /* check that we have an initialised tail queue */
+   if ((fbk_hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   /* find out tailq entry */
+  

[dpdk-dev] [PATCH 1/7] rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer

2014-06-13 Thread Anatoly Burakov
---
 app/test/test_tailq.c | 33 ---
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_tailq.h |  9 +
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/app/test/test_tailq.c b/app/test/test_tailq.c
index 67da009..c9b53ee 100644
--- a/app/test/test_tailq.c
+++ b/app/test/test_tailq.c
@@ -52,16 +52,16 @@

 #define DEFAULT_TAILQ (RTE_TAILQ_NUM)

-static struct rte_dummy d_elem;
+static struct rte_tailq_entry d_elem;

 static int
 test_tailq_create(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;
unsigned i;

/* create a first tailq and check its non-null */
-   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error allocating dummy_q0\n");

@@ -70,13 +70,14 @@ test_tailq_create(void)
TAILQ_INSERT_TAIL(d_head, &d_elem, next);

/* try allocating dummy_q0 again, and check for failure */
-   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head) == NULL)
+   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head) == NULL)
do_return("Error, non-null result returned when attemption to "
"re-allocate a tailq\n");

/* now fill up the tailq slots available and check we get an error */
for (i = RTE_TAILQ_NUM; i < RTE_MAX_TAILQ; i++){
-   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i, rte_dummy_head)) == NULL)
+   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i,
+   rte_tailq_entry_head)) == NULL)
break;
}

@@ -91,10 +92,10 @@ static int
 test_tailq_lookup(void)
 {
/* run successful  test - check result is found */
-   struct rte_dummy_head *d_head;
-   struct rte_dummy *d_ptr;
+   struct rte_tailq_entry_head *d_head;
+   struct rte_tailq_entry *d_ptr;

-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error with tailq lookup\n");

@@ -104,7 +105,7 @@ test_tailq_lookup(void)
"expected element not found\n");

/* now try a bad/error lookup */
-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_tailq_entry_head);
if (d_head != NULL)
do_return("Error, lookup does not return NULL for bad tailq name\n");

@@ -115,7 +116,7 @@ test_tailq_lookup(void)
 static int
 test_tailq_deprecated(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;

/* since TAILQ_RESERVE is not able to create new tailqs,
 * we should find an existing one (IOW, RTE_TAILQ_RESERVE behaves identical
@@ -123,29 +124,29 @@ test_tailq_deprecated(void)
 *
 * PCI_RESOURCE_LIST tailq is guaranteed to
 * be present in any DPDK app. */
-   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

-   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

/* try doing that with non-existent names */
-   d_head = RTE_TAILQ_RESERVE("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

/* try doing the same with NULL names */
-   d_head = RTE_TAILQ_RESERVE(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

diff --git a/lib/librte_eal/common/eal_common_tailqs.c b/lib/librte_eal/common/eal_common_tailqs.c
index f294a58..db9a185 100644
--- a/lib/librte_eal/common/eal_common_tailqs.c
+++ b/lib/librte_eal/common/eal_common_tailqs.c
@@ -118,7 +118,7 @@ rte_dump_tailq(FILE *f)
rte_rwlock_read_lock(&mcfg->qlock);
for (i=0; 

[dpdk-dev] [PATCH 0/7] Make DPDK tailqs fully local

2014-06-13 Thread Anatoly Burakov
This issue was reported by OVS-DPDK project, and the fix should go to
upstream DPDK. This is not memnic-related - this is to do with
DPDK's rte_ivshmem library.

Every DPDK data structure has a corresponding TAILQ reserved for it in
the runtime config file. Those TAILQs are fully local to the process;
however, most data structures contain pointers to the next entry in
the TAILQ.

Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM is only using memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.
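
For illustration only, the generic entry described above can be sketched
with the plain <sys/queue.h> macros. This is a minimal sketch, not the
code from the patchset: demo_ring, demo_tailq_entry and demo_ring_lookup
are hypothetical names, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a shared object such as a ring.
 * Note that it carries no list pointers of its own. */
struct demo_ring {
	char name[32];
};

/* Process-local list node: link pointers plus an opaque data pointer. */
struct demo_tailq_entry {
	TAILQ_ENTRY(demo_tailq_entry) next;
	void *data;
};

TAILQ_HEAD(demo_tailq_head, demo_tailq_entry);

/* Walk the local list and return the ring with a matching name, or NULL. */
static struct demo_ring *
demo_ring_lookup(struct demo_tailq_head *head, const char *name)
{
	struct demo_tailq_entry *te;

	assert(head != NULL && name != NULL);
	TAILQ_FOREACH(te, head, next) {
		struct demo_ring *r = te->data;

		if (strncmp(name, r->name, sizeof(r->name)) == 0)
			return r;
	}
	return NULL;
}
```

Because the link pointers live in the entry rather than in the shared
object, nothing process-local ends up inside memory that another process
may map. Returning from inside the loop also sidesteps the case where the
object pointer still holds the last visited element after an unsuccessful
walk, which is why the patches test the entry pointer against NULL.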

Anatoly Burakov (7):
  rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer
  rte_ring: make ring tailq completely local
  rte_hash: make rte_hash tailq fully local
  rte_fbk_hash: make rte_fbk_hash tailq fully local
  rte_mempool: make mempool tailq fully local
  rte_lpm: make lpm tailq fully local
  rte_lpm6: make lpm6 tailq fully local

 app/test/test_tailq.c | 33 
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_tailq.h |  9 +++--
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 14 ++-
 lib/librte_hash/rte_fbk_hash.c| 66 ---
 lib/librte_hash/rte_fbk_hash.h|  3 --
 lib/librte_hash/rte_hash.c| 54 -
 lib/librte_hash/rte_hash.h|  2 -
 lib/librte_lpm/rte_lpm.c  | 54 -
 lib/librte_lpm/rte_lpm.h  |  2 -
 lib/librte_lpm/rte_lpm6.c | 55 --
 lib/librte_mempool/Makefile   |  3 +-
 lib/librte_mempool/rte_mempool.c  | 37 -
 lib/librte_mempool/rte_mempool.h  |  2 -
 lib/librte_ring/Makefile  |  4 +-
 lib/librte_ring/rte_ring.c| 33 
 lib/librte_ring/rte_ring.h|  2 -
 17 files changed, 278 insertions(+), 97 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH v6 20/20] setup script: adding support for VFIO to setup.sh

2014-06-13 Thread Anatoly Burakov
Support for loading/unloading VFIO drivers, binding/unbinding devices
to/from VFIO, and setting up correct userspace permissions.
---
 tools/setup.sh | 157 +++--
 1 file changed, 142 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index a54f65d..369e09e 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }

 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+   echo "Unloading any existing VFIO module"
+   /sbin/lsmod | grep -s vfio > /dev/null
+   if [ $? -eq 0 ] ; then
+   sudo /sbin/rmmod vfio-pci
+   sudo /sbin/rmmod vfio_iommu_type1
+   sudo /sbin/rmmod vfio
+   fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+   remove_vfio_module
+
+   VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+   echo "Loading VFIO module"
+   /sbin/lsmod | grep -s vfio_pci > /dev/null
+   if [ $? -ne 0 ] ; then
+   if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+   sudo /sbin/modprobe vfio-pci
+   fi
+   fi
+
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # check if /dev/vfio/vfio exists - that way we
+   # know we either loaded the module, or it was
+   # compiled into the kernel
+   if [ ! -e /dev/vfio/vfio ] ; then
+   echo "## ERROR: VFIO not found!"
+   fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }

 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # make sure regular user can access everything inside /dev/vfio
+   echo "chmod /dev/vfio/*"
+   sudo /usr/bin/chmod 0666 /dev/vfio/*
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # since permissions are only to be set when running as
+   # regular user, we only check ulimit here
+   #
+   # warn if regular user is only allowed
+   # to memlock <64M of memory
+   MEMLOCK_AMNT=`ulimit -l`
+
+   if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+   MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+   echo ""
+   echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+   echo ""
+   echo "This is the maximum amount of memory you will be"
+   echo "able to use with DPDK and VFIO if run as current user."
+   echo -n "To change this, please adjust limits.conf memlock "
+   echo "limit for current user."
+
+   if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+   echo ""
+   echo "## WARNING: memlock limit is less than 64MB"
+   echo -n "## DPDK with VFIO may not be able to initialize "
+   echo "if run as current user."
+   fi
+   fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,25 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+   if /sbin/lsmod  | grep -q vfio_pci ; then
+   ${RTE_SDK}/tools/dpdk_nic_bind.py --status
+   echo ""
+   echo -n "Enter PCI address of device to bind to VFIO driver: "
+   read PCI_PATH
+   sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH &&
+   echo "OK"
+   else
+   echo "# Please load the 'vfio-pci' kernel module before querying or "
+   echo "# adjusting NIC device bindings"
+   fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
if  /sbin/lsmod  | grep -q igb_uio ; then
${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +512,29 @@ step2_func()
TEXT[1]="Insert IGB UIO module"
FUNC[1]="load_igb_uio_module"

-   TEXT[2]="Insert KNI module"
-   FUNC[2]="load_kni_module"
+   TEXT[2]="Insert VFIO module"
+   FUNC[2]="load_vfio_module"
+
+   TEXT[3]="Insert KNI module"
+   FUNC[3]="load_kni_module"

-   TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-   FUNC[3]="set_non_numa_pages"
+   TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+   FUNC[4]="set_non_numa_pages"

-   TEXT[4]="Setup hugepage mappings for NUMA systems"
-   

[dpdk-dev] [PATCH v6 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind

2014-06-13 Thread Anatoly Burakov
Renaming the igb_uio_bind script to dpdk_nic_bind to have a generic name
since we're now supporting two drivers.
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 -
 tools/setup.sh  | 16 +-
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index e87a05e..42e845f 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]

 def usage():
 '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):

 def check_modules():
 '''Checks that igb_uio is loaded'''
+global dpdk_drivers

 fd = file("/proc/modules")
 loaded_mods = fd.readlines()
 fd.close()
-mod = "igb_uio"
+
+# list of supported modules
+mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]

 # first check if module is loaded
-found = False
 for line in loaded_mods:
-if line.startswith(mod):
-found = True
-break
-if not found:
-print "Error - module %s not loaded" %mod
+for mod in mods:
+if line.startswith(mod["Name"]):
+mod["Found"] = True
+# special case for vfio_pci (module is named vfio-pci,
+# but its .ko is named vfio_pci)
+elif line.replace("_", "-").startswith(mod["Name"]):
+mod["Found"] = True
+
+# check if we have at least one loaded module
+if True not in [mod["Found"] for mod in mods]:
+print "Error - no supported modules are loaded"
 sys.exit(1)

+# change DPDK driver list to only contain drivers that are loaded
+dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
 '''return true if a device is assigned to a driver. False otherwise'''
 return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
 the pci addresses (domain:bus:slot.func). The values are themselves
 dictionaries - one for each NIC.'''
 global devices
+global dpdk_drivers

 # clear any old data
 devices = {}
@@ -240,10 +254,11 @@ def get_nic_details():

 # add igb_uio to list of supporting modules if needed
 if "Module_str" in devices[d]:
-if "igb_uio" not in devices[d]["Module_str"]:
-devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+for driver in dpdk_drivers:
+if driver not in devices[d]["Module_str"]:
+devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
 else:
-devices[d]["Module_str"] = "igb_uio"
+devices[d]["Module_str"] = ",".join(dpdk_drivers)

 # make sure the driver and module strings do not have any duplicates
 if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
 dev["Driver_str"] = "" # clear driver string

 # if we are binding to one of DPDK drivers, add PCI id's to that driver
-if driver == "igb_uio":
+if driver in dpdk_drivers:
 filename = "/sys/bus/pci/drivers/%s/new_id" % driver
 try:
 f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
 '''Function called when the script is passed the "--status" option. Displays
 to the user what devices are bound to the igb_uio driver, the kernel driver
 or to no driver'''
+global dpdk_drivers
 kernel_drv = []
-uio_drv = []
+dpdk_drv = []
 no_drv = []
+
 # split our list of devices into the three categories above
 for d in devices.keys():
 if not has_driver(d):
 no_drv.append(devices[d])
 continue
-if devices[d]["Driver_str"] == "igb_uio":
-uio_drv.append(devices[d])
+if devices[d]["Driver_str"] in dpdk_drivers:
+dpdk_drv.append(devices[d])
 else:
 kernel_drv.append(devices[d])

 # print each category separately, so we can clearly see what's used by DPDK
-display_devices("Network devices using IGB_UIO driver", uio_drv, \
+display_devices("Network devices using DPDK-compatible driver", dpdk_drv, \
 "drv=%(Driver_str)s unused=%(Module_str)s")
 display_devices("Network devices using kernel driver", kernel_drv,
 "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
diff --git a/tools/setup.sh b/tools/setup.sh
index c3fbd4d..a54f65d 100755
--- a/tools/setup.sh
+++ 

[dpdk-dev] [PATCH v6 18/20] igb_uio: Removed PCI ID table from igb_uio

2014-06-13 Thread Anatoly Burakov
Removing PCI ID list to make igb_uio more similar to a generic driver
like vfio-pci or pci_uio_generic. This is done to make it easier for
the binding script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to PCI stub, VFIO and other
generic PCI drivers.

Therefore, to bind a new device to igb_uio, the user will now have to
first write its PCI ID to the "new_id" file inside the igb_uio driver
directory, and only then write the PCI ID to "bind". This is reflected
in changes to the PCI binding script as well.

sysfs behaves oddly after a new device ID is added to new_id: a
subsequent write to "bind" results in an IOError when the file is
closed. This error is harmless, but it triggers the exception anyway,
so to work around it we check whether the device was actually bound
to the driver before raising an error.
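
As a rough illustration of the two-step sequence (not the binding
script itself, which is Python), the writes could look like the sketch
below. write_str and bind_device are hypothetical helpers, the driver
directory is passed in so the code can be tried against a scratch
directory, and the "vendor device" string format for new_id is my
understanding of the Linux PCI sysfs interface rather than something
stated in this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a string to a file; returns 0 on success, -1 on failure. */
static int
write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (f == NULL)
		return -1;
	ret = (fputs(val, f) == EOF) ? -1 : 0;
	/* Buffered writes may only report an error when flushed on close. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/*
 * Step 1: register the ID pair (e.g. "8086 10fb") via <drv_dir>/new_id.
 * Step 2: request binding of one PCI address via <drv_dir>/bind.
 * On a real system drv_dir would be the driver's sysfs directory.
 */
static int
bind_device(const char *drv_dir, const char *vendor_device,
		const char *pci_addr)
{
	char path[512];

	assert(strlen(vendor_device) > 0 && strlen(pci_addr) > 0);

	snprintf(path, sizeof(path), "%s/new_id", drv_dir);
	if (write_str(path, vendor_device) < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/bind", drv_dir);
	return write_str(path, pci_addr);
}
```

Given the close-time error described above, a caller on a real system
would probably not treat a failed second write as fatal on its own, and
would instead confirm whether the device ended up bound to the driver.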
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-
 tools/igb_uio_bind.py | 118 +++---
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 8e467a2..60b8ca4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;

-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include 
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)

 static struct pci_driver igbuio_pci_driver = {
.name = "igb_uio",
-   .id_table = igbuio_pci_ids,
+   .id_table = NULL,
.probe = igbuio_pci_probe,
.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 18dbeda..e87a05e 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []

 def usage():
 '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
 return path

 def check_modules():
-'''Checks that the needed modules (igb_uio) is loaded, and then
-determine from the .ko file, what its supported device ids are'''
-global module_dev_ids
+'''Checks that igb_uio is loaded'''

 fd = file("/proc/modules")
 loaded_mods = fd.readlines()
@@ -166,40 +162,35 @@ def check_modules():
 print "Error - module %s not loaded" %mod
 sys.exit(1)

-# now find the .ko and get list of supported vendor/dev-ids
-modpath = find_module(mod)
-if modpath is None:
-print "Cannot find module file %s" % (mod + ".ko")
-sys.exit(1)
-depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-for line in depmod_output:
-if not line.startswith("alias"):
-continue
-if not line.endswith(mod):
-continue
-lineparts = line.split()
-if not(lineparts[1].startswith("pci:")):
-continue;
-else:
-lineparts[1] = lineparts[1][4:]
-vendor = lineparts[1][:9]
-device = lineparts[1][9:18]
-if vendor.startswith("v") and device.startswith("d"):
-module_dev_ids.append({"Vendor": int(vendor[1:],16),
-   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-'''return true if device is supported by igb_uio, false otherwise'''
-for dev in module_dev_ids:
-if (dev["Vendor"] == devices[dev_id]["Vendor"] and
-dev["Device"] == devices[dev_id]["Device"]):
-return True
-return False
-
 def has_driver(dev_id):
 '''return true if a device is assigned to a driver. False otherwise'''
 return "Driver_str" in devices[dev_id]

+def get_pci_device_details(dev_id):
+  

[dpdk-dev] [PATCH v6 17/20] test app: adding unit tests for VFIO EAL command-line parameter

2014-06-13 Thread Anatoly Burakov
Adding unit tests for the VFIO interrupt type command-line parameter. We
don't know if VFIO is compiled in (the eal_vfio.h header is internal to
the Linuxapp EAL), so we check this flag regardless.
---
 app/test/test_eal_flags.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 298c11a..ea4a567 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
const char *argv11[] = {prgname, "--file-prefix=virtaddr",
"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};

+   /* try running with --vfio-intr INTx flag */
+   const char *argv12[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+   /* try running with --vfio-intr MSI flag */
+   const char *argv13[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+   /* try running with --vfio-intr MSI-X flag */
+   const char *argv14[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+   /* try running with --vfio-intr invalid flag */
+   const char *argv15[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=invalid"};
+

if (launch_proc(argv0) == 0) {
printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
printf("Error - process did not run ok with --base-virtaddr parameter\n");
return -1;
}
+   if (launch_proc(argv12) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr INTx parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv13) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv14) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI-X parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv15) == 0) {
+   printf("Error - process run ok with "
+   "--vfio-intr invalid parameter\n");
+   return -1;
+   }
return 0;
 }
 #endif
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 16/20] eal: make --no-huge use mmap instead of malloc

2014-06-13 Thread Anatoly Burakov
This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

Technically, malloc is just fine, but we want to guarantee that
memory will be page-aligned, so we use mmap to be safe.
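
To make the alignment point concrete, here is a small standalone sketch
(not the EAL code): alloc_page_aligned and is_page_aligned are
hypothetical helpers. Unlike the hunk below, the sketch passes -1 as the
file descriptor, which is the portable convention for anonymous mappings.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of anonymous, zero-filled memory; returns NULL on failure. */
static void *
alloc_page_aligned(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return (addr == MAP_FAILED) ? NULL : addr;
}

/* Return nonzero if addr is a multiple of the system page size. */
static int
is_page_aligned(const void *addr)
{
	long pgsz = sysconf(_SC_PAGESIZE);

	assert(pgsz > 0);
	return ((uintptr_t)addr % (uintptr_t)pgsz) == 0;
}
```

mmap() always hands back whole pages, so the start of the mapping is
page-aligned by construction, whereas malloc() only promises alignment
suitable for any fundamental type.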
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index d9cfb09..ae43f9e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)

/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
-   addr = malloc(internal_config.memory);
+   addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   if (addr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+   strerror(errno));
+   return -1;
+   }
mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
mcfg->memseg[0].addr = addr;
mcfg->memseg[0].len = internal_config.memory;
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line

2014-06-13 Thread Anatoly Burakov
Unlike igb_uio, the VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled in, but will set the VFIO interrupt type to either "legacy",
"msi" or "msix" if VFIO support is compiled in. Note that VFIO
initialization will fail if the selected interrupt type is not supported
by the system.

If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).
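
The selection behaviour described above can be sketched as follows. This
is a hedged illustration, not the EAL implementation: the demo_* names
and the capability bitmask are hypothetical, and the fallback order after
MSI-X (MSI, then legacy) is an assumption, since the text only states
that MSI-X is tried first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the interrupt modes named in the patch. */
enum demo_intr_mode {
	DEMO_INTR_NONE = 0,	/* nothing requested, or nothing usable */
	DEMO_INTR_LEGACY,
	DEMO_INTR_MSI,
	DEMO_INTR_MSIX
};

/* Map "legacy", "msi" or "msix" to a mode; returns -1 on anything else. */
static int
demo_parse_intr(const char *s, enum demo_intr_mode *out)
{
	static const struct {
		const char *name;
		enum demo_intr_mode value;
	} map[] = {
		{ "legacy", DEMO_INTR_LEGACY },
		{ "msi", DEMO_INTR_MSI },
		{ "msix", DEMO_INTR_MSIX },
	};
	size_t i;

	assert(s != NULL && out != NULL);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(s, map[i].name) == 0) {
			*out = map[i].value;
			return 0;
		}
	}
	return -1;
}

/*
 * Pick a mode given a bitmask of supported modes (bit n = mode n).
 * An explicit request is honoured or fails; with no request, modes are
 * tried starting with MSI-X (the order after that is assumed).
 */
static enum demo_intr_mode
demo_pick_intr(enum demo_intr_mode requested, unsigned supported_mask)
{
	static const enum demo_intr_mode order[] = {
		DEMO_INTR_MSIX, DEMO_INTR_MSI, DEMO_INTR_LEGACY
	};
	size_t i;

	if (requested != DEMO_INTR_NONE)
		return (supported_mask & (1u << requested)) ?
			requested : DEMO_INTR_NONE;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++)
		if (supported_mask & (1u << order[i]))
			return order[i];
	return DEMO_INTR_NONE;
}
```

Returning DEMO_INTR_NONE for an unsupported explicit request models the
note that initialization fails rather than silently falling back.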
---
 lib/librte_eal/linuxapp/eal/eal.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index faa4c93..6994303 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0"xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR"vfio-intr"

 #define RTE_EAL_BLACKLIST_SIZE 0x100

@@ -360,6 +361,8 @@ eal_usage(const char *prgname)
   "   (ex: --vdev=eth_pcap0,iface=eth2).\n"
   "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of native 
RDTSC\n"
   "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+  "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO "
+  "(legacy|msi|msix)\n"
   "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by 
hotplug)\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -578,6 +581,28 @@ eal_parse_base_virtaddr(const char *arg)
return 0;
 }

+static int
+eal_parse_vfio_intr(const char *mode)
+{
+   unsigned i;
+   static struct {
+   const char *name;
+   enum rte_intr_mode value;
+   } map[] = {
+   { "legacy", RTE_INTR_MODE_LEGACY },
+   { "msi", RTE_INTR_MODE_MSI },
+   { "msix", RTE_INTR_MODE_MSIX },
+   };
+
+   for (i = 0; i < RTE_DIM(map); i++) {
+   if (!strcmp(mode, map[i].name)) {
+   internal_config.vfio_intr_mode = map[i].value;
+   return 0;
+   }
+   }
+   return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -632,6 +657,7 @@ eal_parse_args(int argc, char **argv)
{OPT_PCI_BLACKLIST, 1, 0, 0},
{OPT_VDEV, 1, 0, 0},
{OPT_SYSLOG, 1, NULL, 0},
+   {OPT_VFIO_INTR, 1, NULL, 0},
{OPT_BASE_VIRTADDR, 1, 0, 0},
{OPT_XEN_DOM0, 0, 0, 0},
{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -828,6 +854,14 @@ eal_parse_args(int argc, char **argv)
return -1;
}
}
+   else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+   if (eal_parse_vfio_intr(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameters for --"
+   OPT_VFIO_INTR "\n");
+   eal_usage(prgname);
+   return -1;
+   }
+   }
else if (!strcmp(lgopts[option_index].name, 
OPT_CREATE_UIO_DEV)) {
internal_config.create_uio_dev = 1;
}
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 14/20] pci: enable VFIO device binding

2014-06-13 Thread Anatoly Burakov
Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first; if the device is not mapped that way,
fall back to IGB_UIO.
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 44 +--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index c7cd38e..3b94b6f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,30 @@ error:
return -1;
 }

+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+   int ret, mapped = 0;
+
+   /* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+   if (pci_vfio_is_enabled()) {
+   ret = pci_vfio_map_resource(dev);
+   if (ret == 0)
+   mapped = 1;
+   else if (ret < 0)
+   return ret;
+   }
+#endif
+   /* map resources for devices that use igb_uio */
+   if (!mapped) {
+   ret = pci_uio_map_resource(dev);
+   if (ret != 0)
+   return ret;
+   }
+   return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +424,8 @@ error:
 int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device 
*dev)
 {
+   int ret;
struct rte_pci_id *id_table;
-   int ret = 0;

for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {

@@ -437,7 +461,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
-   ret = pci_uio_map_resource(dev);
+   ret = pci_map_device(dev);
if (ret != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
@@ -474,5 +498,21 @@ rte_eal_pci_init(void)
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
+#ifdef VFIO_PRESENT
+   pci_vfio_enable();
+
+   if (pci_vfio_is_enabled()) {
+
+   /* if we are primary process, create a thread to communicate with
+* secondary processes. the thread will use a socket to wait for
+* requests from secondary process to send open file descriptors,
+* because VFIO does not allow multiple open descriptors on a group or
+* VFIO container.
+*/
+   if (internal_config.process_type == RTE_PROC_PRIMARY &&
+   pci_vfio_mp_sync_setup() < 0)
+   return -1;
+   }
+#endif
return 0;
 }
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 13/20] vfio: add multiprocess support.

2014-06-13 Thread Anatoly Burakov
Since VFIO cannot be used to map the same device twice, secondary
processes receive the device/group fd's by means of communicating over a
local socket. Only group and container fd's should be sent, as device
fd's can be obtained via ioctl() calls on the group fd.

For multiprocess, VFIO distinguishes between existing but unused groups
(e.g. groups that aren't bound to VFIO driver) and non-existing groups in
order to know if the secondary process requests a valid group, or if
secondary process requests something that doesn't exist.

VFIO multiprocess sync communicates over a simple protocol. It defines
two requests - request for group fd, and request for container fd.
Possible replies are: SOCKET_OK (an OK signal), SOCKET_ERR (error
signal) and SOCKET_NO_FD (a signal that indicates that the requested
VFIO group is valid, but no fd is present for that group - indicating
that the respective group is simply not bound to VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent and the socket is closed.
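
For readers unfamiliar with passing descriptors between processes, the reply
step of the protocol above can be sketched with SCM_RIGHTS ancillary data on
a UNIX socket. This is a simplified illustration, not the patch code: the
reply code values and the names send_reply() and recv_reply() are
hypothetical, and the sketch carries the reply code and the fd in a single
message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* hypothetical values; the patch defines its own */
enum { SOCKET_OK = 0, SOCKET_NO_FD = 1, SOCKET_ERR = 2 };

/* control buffer big enough for exactly one descriptor */
union fd_ctl {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* send a reply code and, if fd >= 0, attach the descriptor to it */
static int
send_reply(int sock, int code, int fd)
{
	struct msghdr hdr;
	struct iovec iov;
	union fd_ctl ctl;
	struct cmsghdr *cmsg;

	memset(&hdr, 0, sizeof(hdr));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &code;
	iov.iov_len = sizeof(code);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	if (fd >= 0) {
		hdr.msg_control = ctl.buf;
		hdr.msg_controllen = sizeof(ctl.buf);
		cmsg = CMSG_FIRSTHDR(&hdr);
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	}
	return sendmsg(sock, &hdr, 0) < 0 ? -1 : 0;
}

/* receive a reply code; *fd is set to the passed descriptor or to -1 */
static int
recv_reply(int sock, int *fd)
{
	struct msghdr hdr;
	struct iovec iov;
	union fd_ctl ctl;
	struct cmsghdr *cmsg;
	int code = SOCKET_ERR;

	assert(fd != NULL);
	*fd = -1;
	memset(&hdr, 0, sizeof(hdr));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &code;
	iov.iov_len = sizeof(code);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctl.buf;
	hdr.msg_controllen = sizeof(ctl.buf);
	if (recvmsg(sock, &hdr, 0) != (ssize_t)sizeof(code))
		return SOCKET_ERR;
	cmsg = CMSG_FIRSTHDR(&hdr);
	if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
			cmsg->cmsg_type == SCM_RIGHTS &&
			cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
		memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
	return code;
}
```

The receiver gets a new descriptor number that refers to the same open file,
which is what lets a secondary process reuse a group or container fd that
only the primary was able to open.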
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  84 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 497 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 91012fc..756d6b0 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 867467b..4de6061 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -304,7 +304,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int 
vfio_dev_fd)
 }

 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
int ret, vfio_container_fd;
@@ -334,13 +334,38 @@ pci_vfio_get_container_fd(void)
}

return vfio_container_fd;
+   } else {
+   /*
+* if we're in a secondary process, request container fd from 
the
+* primary process via our socket
+*/
+   int socket_fd;
+
+   socket_fd = vfio_mp_sync_connect_to_primary();
+   if (socket_fd < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary 
process!\n");
+   return -1;
+   }
+   if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) 
< 0) {
+   RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+   if (vfio_container_fd < 0) {
+   RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   close(socket_fd);
+   return vfio_container_fd;
}

return -1;
 }

 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
int i;
@@ -376,6 +401,47 @@ pci_vfio_get_group_fd(int iommu_group_no)
vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = 
vfio_group_fd;
return vfio_group_fd;
}
+   /* if we're in a secondary process, request group fd from the primary
+* process via our socket
+*/
+   else {
+   int socket_fd, ret;
+
+   socket_fd = vfio_mp_sync_connect_to_primary();
+
+   if (socket_fd < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary 
process!\n");
+   return -1;
+   }
+   if (vfio_mp_sync_send_request(socket_fd, 

[dpdk-dev] [PATCH v6 12/20] vfio: create mapping code for VFIO

2014-06-13 Thread Anatoly Burakov
Adding code to support VFIO mapping (primary processes only). Most of
the work is done via ioctl() calls on either /dev/vfio/vfio (the
container) or /dev/vfio/$GROUP_NR (an IOMMU group).

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
   memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (MSI-X BAR cannot be mmaped, so skipping it)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
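
Steps 1, 3, 4 and 5 above can be sketched with the ioctls from
<linux/vfio.h>. This is a minimal illustration, not the patch code: the
function name vfio_open_group_sketch() is hypothetical, and selecting
VFIO_TYPE1_IOMMU as the IOMMU model is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Open a container, open the group, check that the group is viable and
 * attach it to the container. Returns the group fd and stores the
 * container fd in *container_fd, or returns -1 on any failure.
 */
static int
vfio_open_group_sketch(int group_no, int *container_fd)
{
	struct vfio_group_status status;
	char path[64];
	int cfd, gfd;

	assert(container_fd != NULL);

	/* step 1: the container */
	cfd = open("/dev/vfio/vfio", O_RDWR);
	if (cfd < 0)
		return -1;
	if (ioctl(cfd, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
		goto fail_container;

	/* step 3: the group file; a missing file means no such group */
	snprintf(path, sizeof(path), "/dev/vfio/%d", group_no);
	gfd = open(path, O_RDWR);
	if (gfd < 0)
		goto fail_container;

	/* step 4: all devices in the group must be bound to VFIO or unbound */
	memset(&status, 0, sizeof(status));
	status.argsz = sizeof(status);
	if (ioctl(gfd, VFIO_GROUP_GET_STATUS, &status) < 0 ||
			!(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		goto fail_group;

	/* step 5: attach the group, then pick an IOMMU model (assumed type 1) */
	if (ioctl(gfd, VFIO_GROUP_SET_CONTAINER, &cfd) < 0)
		goto fail_group;
	if (ioctl(cfd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto fail_group;

	*container_fd = cfd;
	return gfd;

fail_group:
	close(gfd);
fail_container:
	close(cfd);
	return -1;
}
```

With a valid group fd, the device fd of step 7 would then come from
ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, "<PCI address>").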
---
 lib/librte_eal/linuxapp/eal/Makefile   |   2 +
 lib/librte_eal/linuxapp/eal/eal.c  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 709 +
 .../linuxapp/eal/include/eal_internal_cfg.h|   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h |   6 +
 6 files changed, 753 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 00a2115..91012fc 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE

 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 070bdc9..faa4c93 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -649,6 +649,8 @@ eal_parse_args(int argc, char **argv)
internal_config.force_sockets = 0;
internal_config.syslog_facility = LOG_DAEMON;
internal_config.xen_dom0_support = 0;
+   /* if set to NONE, interrupt mode is determined automatically */
+   internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 000..867467b
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,709 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) 

[dpdk-dev] [PATCH v6 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c

2014-06-13 Thread Anatoly Burakov
eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 6e320ec..00a2115 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif

 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h 
rte_dom0_common.h
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 10/20] interrupts: Add support for VFIO interrupts

2014-06-13 Thread Anatoly Burakov
Add code to handle VFIO interrupts in the EAL interrupt code (supports
all types of interrupts).
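
As a rough illustration of how an eventfd is attached to a VFIO interrupt,
the sketch below fills a struct vfio_irq_set followed by a single fd. It is
not the patch code: the function name irq_set_for_eventfd() is hypothetical,
and the buffer is heap-allocated here to sidestep alignment concerns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <linux/vfio.h>

/* room for the header plus one eventfd */
#define IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int))

/* build a "trigger this eventfd" request for the given IRQ index */
static struct vfio_irq_set *
irq_set_for_eventfd(unsigned int index, int event_fd)
{
	struct vfio_irq_set *irq_set;

	assert(index < VFIO_PCI_NUM_IRQS);
	irq_set = calloc(1, IRQ_SET_BUF_LEN);
	if (irq_set == NULL)
		return NULL;

	irq_set->argsz = IRQ_SET_BUF_LEN;
	irq_set->count = 1;
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
			VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = index;
	irq_set->start = 0;
	/* the fd lives in the flexible data[] area after the header */
	memcpy(irq_set->data, &event_fd, sizeof(event_fd));
	return irq_set;
}
```

The result would be handed to ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS,
irq_set) and freed afterwards; index is one of VFIO_PCI_INTX_IRQ_INDEX,
VFIO_PCI_MSI_IRQ_INDEX or VFIO_PCI_MSIX_IRQ_INDEX.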
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 287 -
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 286 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index bd9fc5f..dc2668a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -44,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -66,6 +66,7 @@
 #include 

 #include "eal_private.h"
+#include "eal_vfio.h"

 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)

@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
int uio_intr_count;  /* for uio device */
+#ifdef VFIO_PRESENT
+   uint64_t vfio_intr_count;/* for vfio device */
+#endif
uint64_t timerfd_num;/* for timerfd */
char charbuf[16];/* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;

+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   /* enable INTx */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | 
VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+   fd_ptr = (int *) &irq_set->data;
+   *fd_ptr = intr_handle->fd;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* unmask INTx after enabling */
+   memset(irq_set, 0, len);
+   len = sizeof(struct vfio_irq_set);
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+
+   len = sizeof(struct vfio_irq_set);
+
+   /* mask interrupts before disabling */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* disable INTx*/
+   memset(irq_set, 0, len);
+   irq_set->argsz = len;
+   irq_set->count = 0;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL,
+   "Error disabling INTx interrupts for fd %d\n", 
intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* enable MSI interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+   int len, ret;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   struct vfio_irq_set *irq_set;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | 
VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+   irq_set->start = 0;
+   fd_ptr = (int *) &irq_set->data;
+  

[dpdk-dev] [PATCH v6 09/20] vfio: add VFIO header

2014-06-13 Thread Anatoly Burakov
Adding a header that will determine if VFIO support should be compiled
in. If VFIO is enabled in config (and it's enabled by default), then the
header will also check for kernel version. If VFIO is enabled in config
and if the kernel version is 3.6+, then VFIO_PRESENT will be defined.
This is the macro that should be used to determine if VFIO support is
being compiled in.
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include 
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include 
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 08/20] vfio: add support for VFIO in Linuxapp targets

2014-06-13 Thread Anatoly Burakov
Add VFIO compilation option to common Linuxapp config.
---
 config/common_linuxapp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5f6b8f0..63ae903 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y

 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 07/20] igb_uio: Moved interrupt type out of igb_uio

2014-06-13 Thread Anatoly Burakov
Moving interrupt type enum out of igb_uio and renaming it to be more
generic. The unusual header naming and separation is done mostly to
make the upcoming virtio patches easier to port to the dpdk.org tree.
---
 lib/librte_eal/common/Makefile |  1 +
 lib/librte_eal/common/include/rte_pci.h|  1 +
 .../common/include/rte_pci_dev_feature_defs.h  | 46 +
 .../common/include/rte_pci_dev_features.h  | 44 
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c  | 48 +-
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 915cef1..7f27966 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -40,6 +40,7 @@ INC += rte_string_fns.h rte_cpuflags.h rte_version.h 
rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
 INC += rte_common_vect.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h

 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 3857584..3608ee0 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include 
 #include 
 #include 
+
 #include 

 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h 
b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+   RTE_INTR_MODE_NONE = 0,
+   RTE_INTR_MODE_LEGACY,
+   RTE_INTR_MODE_MSI,
+   RTE_INTR_MODE_MSIX,
+   RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h 
b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or 

[dpdk-dev] [PATCH v6 06/20] igb_uio: make igb_uio compilation optional

2014-06-13 Thread Anatoly Burakov
Currently, igb_uio is always compiled. Some Linux distributions may not
want to include igb_uio with DPDK, so we need to make sure that igb_uio
compilation for Linuxapp targets can be optional.
---
 config/common_linuxapp   | 1 +
 lib/librte_eal/linuxapp/Makefile | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7c143eb..5f6b8f0 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index 9ff167c..8fcfdf6 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@

 include $(RTE_SDK)/mk/rte.vars.mk

+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4



[dpdk-dev] [PATCH v6 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING

2014-06-13 Thread Anatoly Burakov
Rename the RTE_PCI_DRV_NEED_IGB_UIO to be more generic.
---
 app/test/test_pci.c | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c| 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c   | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 4 ++--
 lib/librte_pmd_virtio/virtio_ethdev.c   | 2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 9 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 680a095..40095c6 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
  struct rte_pci_device *dev);

 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */

@@ -90,7 +90,7 @@ struct rte_pci_driver my_driver = {
.name = "test_driver",
.devinit = my_driver_init,
.id_table = my_driver_id,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };

 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 03200f3..dad5418 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -475,7 +475,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
return 0;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
ret = pci_uio_map_resource(dev);
if (ret != 0)
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index b56d7d3..3857584 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
uint32_t drv_flags; /**< Flags contolling handling 
of device. */
 };

-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 49b2a68..c7cd38e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
return 1;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
ret = pci_uio_map_resource(dev);
if (ret != 0)
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 398838f..f025338 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -280,7 +280,7 @@ static struct eth_driver rte_em_pmd = {
{
.name = "rte_em_pmd",
.id_table = pci_id_em_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_em_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index 6e835c3..58ba5d3 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -603,7 +603,7 @@ static struct eth_driver rte_igb_pmd = {
{
.name = "rte_igb_pmd",
.id_table = pci_id_igb_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igb_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
@@ -616,7 +616,7 @@ static struct eth_driver rte_igbvf_pmd = {
{
.name = "rte_igbvf_pmd",
.id_table = pci_id_igbvf_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igbvf_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 

[dpdk-dev] [PATCH v6 04/20] pci: distinguish between legitimate failures and non-fatal errors

2014-06-13 Thread Anatoly Burakov
Currently, EAL does not distinguish between actual failures and expected
initialization errors. E.g. sometimes the driver fails to initialize
because it was not supposed to be initialized in the first place, such
as device not being managed by said driver.

This patch makes EAL fail on actual initialization errors while still
skipping over expected initialization errors.
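
The convention described above can be sketched in a few lines. This is
only a rough model in Python, not the C code from the patch; the names
probe_all_drivers, probe_devices and the PROBE_* constants are
hypothetical and chosen for illustration. It assumes a negative return
means a genuine failure, a positive one means "not handled by this
driver, skip", and zero means success.

```python
# Hypothetical model of the three-way return convention; the real code
# is C in eal_common_pci.c and these names are illustrative only.

PROBE_OK = 0        # a driver initialized the device
PROBE_SKIPPED = 1   # expected, non-fatal: device not handled by any driver
PROBE_FAILED = -1   # genuine initialization failure


def probe_all_drivers(device, drivers):
    """Try each driver callable in turn and return 0, 1 or -1."""
    for probe_one in drivers:
        rc = probe_one(device)
        if rc < 0:
            return PROBE_FAILED   # negative value is an error: stop now
        if rc > 0:
            continue              # positive value: try the next driver
        return PROBE_OK
    return PROBE_SKIPPED          # no driver claimed the device


def probe_devices(devices, drivers):
    """Fail hard only on genuine errors; skipped devices are ignored."""
    for device in devices:
        if probe_all_drivers(device, drivers) < 0:
            raise RuntimeError("Requested device %s cannot be used" % device)
```

With this split, a device that is simply not bound to a supported
kernel driver no longer aborts initialization, while a mapping failure
on a device that should have worked still does.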
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   |  8 +---
 lib/librte_eal/common/eal_common_pci.c| 16 +---
 lib/librte_eal/linuxapp/eal/eal_pci.c |  8 +---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  4 ++--
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index b560077..03200f3 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -217,7 +217,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (access(devname, O_RDWR) < 0) {
RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO 
driver, "
"skipping\n", loc->domain, loc->bus, 
loc->devid, loc->function);
-   return -1;
+   return 1;
}

/* save fd if in primary process */
@@ -440,6 +440,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device 
*dev)
 {
struct rte_pci_id *id_table;
+   int ret;

for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {

@@ -476,8 +477,9 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d

if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
/* map resources for devices that use igb_uio */
-   if (pci_uio_map_resource(dev) < 0)
-   return -1;
+   ret = pci_uio_map_resource(dev);
+   if (ret != 0)
+   return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
/* unbind current driver */
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 4d877ea..af809a8 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)

 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
rc = rte_eal_pci_probe_one_driver(dr, dev);
if (rc < 0)
/* negative value is an error */
-   break;
+   return -1;
if (rc > 0)
/* positive value means driver not found */
continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
;
return 0;
}
-   return -1;
+   return 1;
 }

 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
struct rte_pci_device *dev = NULL;
struct rte_devargs *devargs;
int probe_all = 0;
+   int ret = 0;

if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)

/* probe all or only whitelisted devices */
if (probe_all)
-   pci_probe_all_drivers(dev);
+   ret = pci_probe_all_drivers(dev);
else if (devargs != NULL &&
-   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-   pci_probe_all_drivers(dev) < 0)
+   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+   ret = pci_probe_all_drivers(dev);
+   if (ret < 0)
rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 " cannot be used\n", dev->addr.domain, 
dev->addr.bus,
 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 2066608..49b2a68 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct 

[dpdk-dev] [PATCH v6 03/20] pci: fixing errors in a previous commit found by checkpatch

2014-06-13 Thread Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  |   2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 112 +++--
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |   2 +-
 3 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a422e5f..2066608 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -99,7 +99,7 @@ error:

 /* map a particular resource from a file */
 void *
-pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
void *mapaddr;

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index c9a12a1..7c75593 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -74,7 +74,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map 
maps[], int nb_maps)
RTE_LOG(ERR, EAL,
"%s(): cannot parse offset of %s\n",
__func__, dirname);
-   return (-1);
+   return -1;
}

/* get mapping size */
@@ -84,7 +84,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map 
maps[], int nb_maps)
RTE_LOG(ERR, EAL,
"%s(): cannot parse size of %s\n",
__func__, dirname);
-   return (-1);
+   return -1;
}

/* get mapping physical address */
@@ -94,20 +94,21 @@ pci_uio_get_mappings(const char *devname, struct pci_map 
maps[], int nb_maps)
RTE_LOG(ERR, EAL,
"%s(): cannot parse addr of %s\n",
__func__, dirname);
-   return (-1);
+   return -1;
}

if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
RTE_LOG(ERR, EAL,
"%s(): offset/size exceed system max value\n",
__func__);
-   return (-1);
+   return -1;
}

maps[i].offset = offset;
maps[i].size = size;
-}
-   return (i);
+   }
+
+   return i;
 }

 static int
@@ -140,12 +141,12 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
RTE_LOG(ERR, EAL,
"Cannot mmap device resource\n");
close(fd);
-   return (-1);
+   return -1;
}
/* fd is not needed in slave process, close it */
close(fd);
}
-   return (0);
+   return 0;
}

RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -214,15 +215,15 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
 * or uio:uioX */

rte_snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-loc->domain, loc->bus, loc->devid, loc->function);
+   SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
+   loc->domain, loc->bus, loc->devid, loc->function);

dir = opendir(dirname);
if (dir == NULL) {
/* retry with the parent directory */
rte_snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-loc->domain, loc->bus, loc->devid, loc->function);
+   SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
+   loc->domain, loc->bus, loc->devid, 
loc->function);
dir = opendir(dirname);

if (dir == NULL) {
@@ -265,7 +266,8 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
return -1;

/* create uio device if we've been asked to */
-   if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, 
uio_num) < 0)
+   if (internal_config.create_uio_dev &&
+   pci_mknod_uio_dev(dstbuf, uio_num) < 0)
RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);

return uio_num;
@@ -293,7 +295,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)

/* secondary processes - use already recorded details */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-   return (pci_uio_map_secondary(dev));
+   return pci_uio_map_secondary(dev);

/* find uio resource */
uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
@@ -314,10 

[dpdk-dev] [PATCH v6 02/20] pci: move uio mapping code to a separate file

2014-06-13 Thread Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 403 +---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 421 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 
 4 files changed, 492 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index dad1f79..6e320ec 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 29f1728..a422e5f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */

 #include 
-#include 
-#include 
 #include 
 #include 

@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"

 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct pci_map {
-   void *addr;
-   uint64_t offset;
-   uint64_t size;
-   uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-   TAILQ_ENTRY(mapped_pci_resource) next;
-
-   struct rte_pci_addr pci_addr;
-   char path[PATH_MAX];
-   int nb_maps;
-   struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;

 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }

 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
void *mapaddr;

@@ -147,342 +123,6 @@ fail:
return NULL;
 }

-#define OFF_MAX  ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-   int i;
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   uint64_t offset, size;
-
-   for (i = 0; i != nb_maps; i++) {
-
-   /* check if map directory exists */
-   rte_snprintf(dirname, sizeof(dirname),
-   "%s/maps/map%u", devname, i);
-
-   if (access(dirname, F_OK) != 0)
-   break;
-
-   /* get mapping offset */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/offset", dirname);
-   if (pci_parse_sysfs_value(filename, ) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse offset of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
-
-   /* get mapping size */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/size", dirname);
-   if (pci_parse_sysfs_value(filename, ) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse size of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
-
-   /* get mapping physical address */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/addr", dirname);
-   if (pci_parse_sysfs_value(filename, [i].phaddr) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse addr of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
-
-   if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-   RTE_LOG(ERR, EAL,
-   "%s(): offset/size exceed system max value\n",
-   __func__);
-   return (-1);
-   }
-
-   maps[i].offset = offset;
-   maps[i].size = size;
-}
-   return (i);
-}
-
-static int
-pci_uio_map_secondary(struct rte_pci_device *dev)
-{

[dpdk-dev] [PATCH v6 01/20] pci: move open() out of pci_map_resource, rename structs

2014-06-13 Thread Anatoly Burakov
Separate the mapping code from the calls to open(). This is preparatory
work for the VFIO patch, which also needs to map BARs but does not use
the path stored in mapped_pci_resource. Also rename the structs to be
more generic.
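
As a rough illustration of the split between opening and mapping, here
is a small Python sketch (Unix only). It is not the patch's C code:
map_resource and map_from_path are hypothetical names, and the sketch
assumes the caller owns the file descriptor and decides when to close
it.

```python
import mmap
import os


def map_resource(fd, offset, size):
    """Map `size` bytes of an already-open descriptor.

    The function neither opens nor closes anything; `offset` must be a
    multiple of mmap.ALLOCATIONGRANULARITY.
    """
    return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)


def map_from_path(path, size):
    """Caller-side half: open the file, map it, then close the descriptor."""
    fd = os.open(path, os.O_RDWR)
    try:
        return map_resource(fd, 0, size)
    finally:
        os.close(fd)   # the mapping stays valid after the descriptor is closed
```

Because the mapping helper only sees a descriptor, a caller that gets
its descriptor some other way than opening a path can reuse it
unchanged.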
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 --
 1 file changed, 58 insertions(+), 67 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index f809574..29f1728 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 

-#include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 

 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct uio_map {
+struct pci_map {
void *addr;
uint64_t offset;
uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-   TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+   TAILQ_ENTRY(mapped_pci_resource) next;

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
-   size_t nb_maps;
-   struct uio_map maps[PCI_MAX_RESOURCE];
+   int nb_maps;
+   struct pci_map maps[PCI_MAX_RESOURCE];
 };

-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;

-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:

 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-   int fd;
void *mapaddr;

-   /*
-* open devname, to mmap it
-*/
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   devname, strerror(errno));
-   goto fail;
-   }
-
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, offset);
-   close(fd);
if (mapaddr == MAP_FAILED ||
(requested_addr != NULL && mapaddr != requested_addr)) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-   " %s (%p)\n", __func__, devname, fd, requested_addr,
+   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
+   __func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
goto fail;
@@ -186,10 +148,10 @@ fail:
 }

 #define OFF_MAX  ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t 
nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-   size_t i;
+   int i;
char dirname[PATH_MAX];
char filename[PATH_MAX];
uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map 
maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-size_t i;
-struct uio_resource *uio_res;
+   int fd, i;
+   struct mapped_pci_resource *uio_res;

-   TAILQ_FOREACH(uio_res, uio_res_list, next) {
+   TAILQ_FOREACH(uio_res, pci_res_list, next) {

/* skip this element if it doesn't match our PCI address */
if (memcmp(_res->pci_addr, >addr, sizeof(dev->addr)))
continue;

for (i = 0; i 

[dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK

2014-06-13 Thread Anatoly Burakov
This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a Linux kernel driver (available since kernel 3.6) that allows
secure DMA from userspace by going through the IOMMU, instead of
working directly with physical memory as igb_uio does.

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional

v2 fixes:
* Fixed a couple of resource leaks

v3 fixes:
* Fixed various checkpatch.pl issues
* Added MSI interrupt support
* Added an option to automatically determine interrupt type
* Fixed various issues of commit atomicity

v4 fixes:
* Rebased on top of 5ebbb17281645b23359fbd49133bb639b63ba88c
* Fixed a typo in EAL command-line help text

v5 fixes:
* Fixed missing virtio change to RTE_PCI_DRV_NEED_MAPPING
* Fixed compile issue when VFIO was disabled (introduced in v3)

v6 fixes:
* Rebased on top of 36c248ebc629889fff4e7d9d17e109412ddf9ecf
* Fixed FreeBSD issue with failed unbinds (introduced in v1)
* Fixed a few issues found by checkpatch

Tested-by: Waterman Cao  

This patchset has been tested by Intel.
We tested it with the following functions:
* Layer-2 Forwarding support
* Sample commands test
* Packet forwarding checking
* Bind and unbind VFIO driver
* Compile igb_uio driver (Linux kernel < 3.6)
* Interrupt model test under legacy|msi|msix
All cases passed.

Test environment:
Fedora 20 x86_64, Linux Kernel 3.13.6-200, GCC 4.8.2
CPU: Intel Xeon CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Niantic 82599

Anatoly Burakov (20):
  pci: move open() out of pci_map_resource, rename structs
  pci: move uio mapping code to a separate file
  pci: fixing errors in a previous commit found by checkpatch
  pci: distinguish between legitimate failures and non-fatal errors
  pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  igb_uio: make igb_uio compilation optional
  igb_uio: Moved interrupt type out of igb_uio
  vfio: add support for VFIO in Linuxapp targets
  vfio: add VFIO header
  interrupts: Add support for VFIO interrupts
  eal: remove -Wno-return-type for non-existent eal_hpet.c
  vfio: create mapping code for VFIO
  vfio: add multiprocess support.
  pci: enable VFIO device binding
  eal: added support for selecting VFIO interrupt type from EAL
command-line
  eal: make --no-huge use mmap instead of malloc
  test app: adding unit tests for VFIO EAL command-line parameter
  igb_uio: Removed PCI ID table from igb_uio
  binding script: Renamed igb_uio_bind to dpdk_nic_bind
  setup script: adding support for VFIO to setup.sh

 app/test/test_eal_flags.c  |  36 +
 app/test/test_pci.c|   4 +-
 config/common_linuxapp |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c|  10 +-
 lib/librte_eal/common/Makefile |   1 +
 lib/librte_eal/common/eal_common_pci.c |  16 +-
 lib/librte_eal/common/include/rte_pci.h|   5 +-
 .../common/include/rte_pci_dev_feature_defs.h  |  46 ++
 .../common/include/rte_pci_dev_features.h  |  44 ++
 lib/librte_eal/linuxapp/Makefile   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile   |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c  |  36 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 287 +++-
 lib/librte_eal/linuxapp/eal/eal_memory.c   |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 476 ++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 431 +++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 789 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++
 .../linuxapp/eal/include/eal_internal_cfg.h|   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 116 +++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c  |  69 +-
 lib/librte_pmd_e1000/em_ethdev.c   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c|   4 +-
 lib/librte_pmd_virtio/virtio_ethdev.c  |   2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c|   2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}| 155 ++--
 tools/setup.sh | 173 -
 30 files changed, 2593 insertions(+), 589 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 

[dpdk-dev] [PATCH v5 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind

2014-06-10 Thread Anatoly Burakov
Renaming the igb_uio_bind script to dpdk_nic_bind to have a generic name
since we're now supporting two drivers.

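
For illustration, detecting which of the two supported drivers are
loaded could look roughly like the sketch below. It is written for
Python 3 rather than the Python 2 used by the script, takes the
/proc/modules contents as a string so it can be exercised without root,
and uses exact name matching where the script uses startswith();
loaded_dpdk_drivers is a hypothetical name.

```python
DPDK_DRIVERS = ["igb_uio", "vfio-pci"]


def loaded_dpdk_drivers(proc_modules_text, drivers=DPDK_DRIVERS):
    """Return the subset of `drivers` listed in /proc/modules-style text."""
    loaded = set()
    for line in proc_modules_text.splitlines():
        if not line.strip():
            continue
        # The module name is the first field. The kernel lists vfio-pci
        # as vfio_pci, so compare the hyphenated form as well.
        module = line.split()[0]
        for driver in drivers:
            if module == driver or module.replace("_", "-") == driver:
                loaded.add(driver)
    # Preserve the order of the supported-driver list.
    return [d for d in drivers if d in loaded]
```

A caller would read /proc/modules, pass the text in, and exit with an
error if the returned list is empty.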
Signed-off-by: Anatoly Burakov 
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 -
 tools/setup.sh  | 16 +-
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index 33adcf4..1e517e7 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]

 def usage():
 '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):

 def check_modules():
 '''Checks that igb_uio is loaded'''
+global dpdk_drivers

 fd = file("/proc/modules")
 loaded_mods = fd.readlines()
 fd.close()
-mod = "igb_uio"
+
+# list of supported modules
+mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]

 # first check if module is loaded
-found = False
 for line in loaded_mods:
-if line.startswith(mod):
-found = True
-break
-if not found:
-print "Error - module %s not loaded" %mod
+for mod in mods:
+if line.startswith(mod["Name"]):
+mod["Found"] = True
+# special case for vfio_pci (module is named vfio-pci,
+# but its .ko is named vfio_pci)
+elif line.replace("_", "-").startswith(mod["Name"]):
+mod["Found"] = True
+
+# check if we have at least one loaded module
+if True not in [mod["Found"] for mod in mods]:
+print "Error - no supported modules are loaded"
 sys.exit(1)

+# change DPDK driver list to only contain drivers that are loaded
+dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
 '''return true if a device is assigned to a driver. False otherwise'''
 return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
 the pci addresses (domain:bus:slot.func). The values are themselves
 dictionaries - one for each NIC.'''
 global devices
+global dpdk_drivers

 # clear any old data
 devices = {} 
@@ -240,10 +254,11 @@ def get_nic_details():

 # add igb_uio to list of supporting modules if needed
 if "Module_str" in devices[d]:
-if "igb_uio" not in devices[d]["Module_str"]:
-devices[d]["Module_str"] = devices[d]["Module_str"] + 
",igb_uio"
+for driver in dpdk_drivers:
+if driver not in devices[d]["Module_str"]:
+devices[d]["Module_str"] = devices[d]["Module_str"] + 
",%s" % driver
 else:
-devices[d]["Module_str"] = "igb_uio"
+devices[d]["Module_str"] = ",".join(dpdk_drivers)

 # make sure the driver and module strings do not have any duplicates
 if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
 dev["Driver_str"] = "" # clear driver string

 # if we are binding to one of DPDK drivers, add PCI id's to that driver
-if driver == "igb_uio":
+if driver in dpdk_drivers:
 filename = "/sys/bus/pci/drivers/%s/new_id" % driver
 try:
 f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
 '''Function called when the script is passed the "--status" option. 
Displays
 to the user what devices are bound to the igb_uio driver, the kernel driver
 or to no driver'''
+global dpdk_drivers
 kernel_drv = []
-uio_drv = []
+dpdk_drv = []
 no_drv = []
+
 # split our list of devices into the three categories above
 for d in devices.keys():
 if not has_driver(d):
 no_drv.append(devices[d])
 continue
-if devices[d]["Driver_str"] == "igb_uio":
-uio_drv.append(devices[d])
+if devices[d]["Driver_str"] in dpdk_drivers:
+dpdk_drv.append(devices[d])
 else:
 kernel_drv.append(devices[d])

 # print each category separately, so we can clearly see what's used by DPDK
-display_devices("Netwo

[dpdk-dev] [PATCH v5 18/20] igb_uio: Removed PCI ID table from igb_uio

2014-06-10 Thread Anatoly Burakov
Removing PCI ID list to make igb_uio more similar to a generic driver
like vfio-pci or uio_pci_generic. This is done to make it easier for
the binding script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to PCI stub, VFIO and other
generic PCI drivers.

Therefore, to bind a new device to igb_uio, the user will now have to
first write its PCI ID (vendor and device ID) to the "new_id" file
inside the igb_uio driver directory, and only then write the device's
PCI address to "bind". This is reflected in changes to the PCI binding
script as well.

sysfs behaves oddly when a new device ID is added to new_id: a
subsequent write to "bind" results in an IOError on closing the file.
This error is harmless but it still raises an exception, so to work
around that, we check whether the device was actually bound to the
driver before raising an error.
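
The two-step sequence and the workaround can be sketched as follows.
This is a simplified Python 3 illustration, not the script's code:
bind_device and is_bound are hypothetical helpers, and the sysfs_pci
parameter exists only so the logic can be tried against a scratch
directory instead of the real /sys/bus/pci.

```python
import os


def is_bound(sysfs_pci, pci_addr, driver):
    """True if the device's `driver` symlink points at `driver`."""
    link = os.path.join(sysfs_pci, "devices", pci_addr, "driver")
    return (os.path.islink(link) and
            os.path.basename(os.readlink(link)) == driver)


def bind_device(pci_addr, vendor, device, driver="igb_uio",
                sysfs_pci="/sys/bus/pci"):
    """Register the vendor/device ID with the driver, then bind by address."""
    drv_dir = os.path.join(sysfs_pci, "drivers", driver)

    # Step 1: tell the driver about this vendor/device ID pair.
    with open(os.path.join(drv_dir, "new_id"), "w") as f:
        f.write("%04x %04x" % (vendor, device))

    # Step 2: bind the specific device by its PCI address.
    try:
        with open(os.path.join(drv_dir, "bind"), "w") as f:
            f.write(pci_addr)
    except (IOError, OSError):
        # The write may report an error although the device is bound;
        # only treat it as a failure if the device really is not bound.
        if not is_bound(sysfs_pci, pci_addr, driver):
            raise
```

On a real system this needs root, and a call would look like
bind_device("0000:01:00.0", 0x8086, 0x10fb), where the address and IDs
are example values.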

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-
 tools/igb_uio_bind.py | 118 +++---
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 7d5e6b4..6362b1c 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;

-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include 
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)

 static struct pci_driver igbuio_pci_driver = {
.name = "igb_uio",
-   .id_table = igbuio_pci_ids,
+   .id_table = NULL,
.probe = igbuio_pci_probe,
.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 824aa2b..33adcf4 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []

 def usage():
 '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
 return path

 def check_modules():
-'''Checks that the needed modules (igb_uio) is loaded, and then
-determine from the .ko file, what its supported device ids are'''
-global module_dev_ids
+'''Checks that igb_uio is loaded'''

 fd = file("/proc/modules")
 loaded_mods = fd.readlines()
@@ -165,41 +161,36 @@ def check_modules():
 if not found:
 print "Error - module %s not loaded" %mod
 sys.exit(1)
-
-# now find the .ko and get list of supported vendor/dev-ids
-modpath = find_module(mod)
-if modpath is None:
-print "Cannot find module file %s" % (mod + ".ko")
-sys.exit(1)
-depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-for line in depmod_output:
-if not line.startswith("alias"):
-continue
-if not line.endswith(mod):
-continue
-lineparts = line.split()
-if not(lineparts[1].startswith("pci:")):
-continue;
-else:
-lineparts[1] = lineparts[1][4:]
-vendor = lineparts[1][:9]
-device = lineparts[1][9:18]
-if vendor.startswith("v") and device.startswith("d"):
-module_dev_ids.append({"Vendor": int(vendor[1:],16), 
-   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-'''return true if device is supported by igb_uio, false otherwise'''
-for dev in module_dev_ids:
-if (dev["Vendor"] == devices[dev_id]["Vendor"] and 
-dev["Device"]

[dpdk-dev] [PATCH v5 17/20] test app: adding unit tests for VFIO EAL command-line parameter

2014-06-10 Thread Anatoly Burakov
Adding unit tests for the VFIO interrupt type command-line parameter.
We cannot know whether VFIO is compiled in (the eal_vfio.h header is
internal to the Linuxapp EAL), so we check this flag regardless.
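
The behaviour the tests expect can be summarized with a short Python
sketch. It is an assumption-laden model, not the EAL parser:
parse_vfio_intr is a hypothetical name, and the accepted values
(legacy, msi, msix) are taken from the test vectors in this patch.

```python
VFIO_INTR_MODES = ("legacy", "msi", "msix")


def parse_vfio_intr(argv):
    """Return the requested mode, or None if the flag is absent.

    Raises ValueError for a value outside VFIO_INTR_MODES, which
    corresponds to the process refusing to start.
    """
    prefix = "--vfio-intr="
    mode = None
    for arg in argv:
        if arg.startswith(prefix):
            mode = arg[len(prefix):]
            if mode not in VFIO_INTR_MODES:
                raise ValueError("invalid --vfio-intr value: %r" % mode)
    return mode
```

The three valid values should be accepted and anything else rejected,
which mirrors the four launch_proc() checks added by the patch.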

Signed-off-by: Anatoly Burakov 
---
 app/test/test_eal_flags.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 195a1f5..a0ee4e6 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
const char *argv11[] = {prgname, "--file-prefix=virtaddr",
"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};

+   /* try running with --vfio-intr INTx flag */
+   const char *argv12[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+   /* try running with --vfio-intr MSI flag */
+   const char *argv13[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+   /* try running with --vfio-intr MSI-X flag */
+   const char *argv14[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+   /* try running with --vfio-intr invalid flag */
+   const char *argv15[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=invalid"};
+

if (launch_proc(argv0) == 0) {
printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
printf("Error - process did not run ok with --base-virtaddr parameter\n");
return -1;
}
+   if (launch_proc(argv12) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr INTx parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv13) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv14) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI-X parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv15) == 0) {
+   printf("Error - process run ok with "
+   "--vfio-intr invalid parameter\n");
+   return -1;
+   }
return 0;
 }
 #endif
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 16/20] eal: make --no-huge use mmap instead of malloc

2014-06-10 Thread Anatoly Burakov
This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

Technically, malloc would work, but we want to guarantee that the
memory is page-aligned, so mmap is used to be safe.
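
To illustrate the alignment point, here is a minimal standalone sketch
(not the EAL code; alloc_anon() and is_page_aligned() are hypothetical
helper names). It assumes Linux with MAP_ANONYMOUS available. The sketch
passes -1 as the fd, which is the portable convention for anonymous
mappings; Linux ignores the fd argument in that case.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: anonymous private mapping, NULL on failure. */
static void *
alloc_anon(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return addr == MAP_FAILED ? NULL : addr;
}

/* Returns 1 if addr is a multiple of the system page size, 0 otherwise. */
static int
is_page_aligned(const void *addr)
{
	long pgsz = sysconf(_SC_PAGESIZE);

	return pgsz > 0 && ((uintptr_t)addr % (uintptr_t)pgsz) == 0;
}
```

mmap() returns page-aligned addresses by definition, whereas malloc()
only promises alignment suitable for any built-in type.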

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 8d1edd9..315214b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)

/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
-   addr = malloc(internal_config.memory);
+   addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   if (addr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+   strerror(errno));
+   return -1;
+   }
mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
mcfg->memseg[0].addr = addr;
mcfg->memseg[0].len = internal_config.memory;
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line

2014-06-10 Thread Anatoly Burakov
Unlike igb_uio, VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled, but will set VFIO interrupt type to either "legacy", "msi"
or "msix" if VFIO support is compiled. Note that VFIO initialization
will fail if the interrupt type selected is not supported by the system.

If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).
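
The selection logic described above could be sketched roughly as
follows. This is only an illustration: the enum and function names are
hypothetical stand-ins for the real ones, and the fallback order after
MSI-X (MSI, then legacy) is an assumption, since the message only says
that MSI-X is tried first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the interrupt mode values used by EAL. */
enum intr_mode { INTR_NONE = 0, INTR_LEGACY, INTR_MSI, INTR_MSIX };

/* Map a command-line string to a mode; returns 0 on success, -1 if unknown. */
static int
parse_intr_mode(const char *arg, enum intr_mode *out)
{
	static const struct {
		const char *name;
		enum intr_mode value;
	} map[] = {
		{ "legacy", INTR_LEGACY },
		{ "msi", INTR_MSI },
		{ "msix", INTR_MSIX },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(arg, map[i].name) == 0) {
			*out = map[i].value;
			return 0;
		}
	}
	return -1;
}

/*
 * Fill out[] with the modes to attempt, in order, and return how many.
 * INTR_NONE means "not specified": try everything, MSI-X first
 * (the MSI-then-legacy order after that is assumed).
 */
static size_t
intr_modes_to_try(enum intr_mode requested, enum intr_mode out[3])
{
	if (requested != INTR_NONE) {
		out[0] = requested;
		return 1;
	}
	out[0] = INTR_MSIX;
	out[1] = INTR_MSI;
	out[2] = INTR_LEGACY;
	return 3;
}
```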

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index aeb5903..10c40fa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0"xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR"vfio-intr"

 #define RTE_EAL_BLACKLIST_SIZE 0x100

@@ -361,6 +362,8 @@ eal_usage(const char *prgname)
   "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
   "native RDTSC\n"
   "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+  "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO "
+  "(legacy|msi|msix)\n"
   "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by 
hotplug)\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -579,6 +582,28 @@ eal_parse_base_virtaddr(const char *arg)
return 0;
 }

+static int
+eal_parse_vfio_intr(const char *mode)
+{
+   unsigned i;
+   static struct {
+   const char *name;
+   enum rte_intr_mode value;
+   } map[] = {
+   { "legacy", RTE_INTR_MODE_LEGACY },
+   { "msi", RTE_INTR_MODE_MSI },
+   { "msix", RTE_INTR_MODE_MSIX },
+   };
+
+   for (i = 0; i < RTE_DIM(map); i++) {
+   if (!strcmp(mode, map[i].name)) {
+   internal_config.vfio_intr_mode = map[i].value;
+   return 0;
+   }
+   }
+   return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -633,6 +658,7 @@ eal_parse_args(int argc, char **argv)
{OPT_PCI_BLACKLIST, 1, 0, 0},
{OPT_VDEV, 1, 0, 0},
{OPT_SYSLOG, 1, NULL, 0},
+   {OPT_VFIO_INTR, 1, NULL, 0},
{OPT_BASE_VIRTADDR, 1, 0, 0},
{OPT_XEN_DOM0, 0, 0, 0},
{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -829,6 +855,14 @@ eal_parse_args(int argc, char **argv)
return -1;
}
}
+   else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+   if (eal_parse_vfio_intr(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameters for --"
+   OPT_VFIO_INTR "\n");
+   eal_usage(prgname);
+   return -1;
+   }
+   }
else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
internal_config.create_uio_dev = 1;
}
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 14/20] pci: enable VFIO device binding

2014-06-10 Thread Anatoly Burakov
Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first; if the device is not mapped that way,
fall back to igb_uio.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 42 ---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a0abec8..8a9cbf9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,27 @@ error:
return -1;
 }

+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+   int ret, mapped = 0;
+
+   /* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+   if (pci_vfio_is_enabled()) {
+   if ((ret = pci_vfio_map_resource(dev)) == 0)
+   mapped = 1;
+   else if (ret < 0)
+   return ret;
+   }
+#endif
+   /* map resources for devices that use igb_uio */
+   if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
+   return ret;
+
+   return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +421,8 @@ error:
 int
rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
+   int ret;
struct rte_pci_id *id_table;
-   int ret = 0;

for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {

@@ -436,8 +457,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
}

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
-   /* map resources for devices that use igb_uio */
-   if ((ret = pci_uio_map_resource(dev)) != 0)
+   if ((ret = pci_map_device(dev)) != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -473,5 +493,21 @@ rte_eal_pci_init(void)
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
+#ifdef VFIO_PRESENT
+   pci_vfio_enable();
+
+   if (pci_vfio_is_enabled()) {
+
+   /* if we are primary process, create a thread to communicate with
+* secondary processes. the thread will use a socket to wait for
+* requests from secondary process to send open file descriptors,
+* because VFIO does not allow multiple open descriptors on a group or
+* VFIO container.
+*/
+*/
+   if (internal_config.process_type == RTE_PROC_PRIMARY &&
+   pci_vfio_mp_sync_setup() < 0)
+   return -1;
+   }
+#endif
return 0;
 }
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 13/20] vfio: add multiprocess support.

2014-06-10 Thread Anatoly Burakov
Since VFIO cannot be used to map the same device twice, secondary
processes receive the device/group fd's by means of communicating over a
local socket. Only group and container fd's should be sent, as device
fd's can be obtained via ioctl() calls on the group fd.

For multiprocess, VFIO distinguishes between existing but unused groups
(e.g. groups that aren't bound to VFIO driver) and non-existing groups in
order to know if the secondary process requests a valid group, or if
secondary process requests something that doesn't exist.

VFIO multiprocess sync communicates over a simple protocol. It defines
two requests - request for group fd, and request for container fd.
Possible replies are: SOCKET_OK (an OK signal), SOCKET_ERR (error
signal) and SOCKET_NO_FD (a signal that indicates that the requested
VFIO group is valid, but no fd is present for that group - indicating
that the respective group is simply not bound to VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent and the socket is closed.
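
For readers unfamiliar with passing descriptors between processes, the
reply side of such a protocol could look roughly like the sketch below.
It uses SCM_RIGHTS ancillary data on a Unix domain socket. The reply
code values and the function names are hypothetical, and for brevity the
sketch carries the status code and the fd in a single message, which may
differ from how the patch frames them.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical reply codes; the real values are defined by the patch. */
enum { REPLY_OK = 1, REPLY_ERR = 2, REPLY_NO_FD = 3 };

/* Control buffer big enough for one fd, aligned for struct cmsghdr. */
union fd_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send a reply code; if fd >= 0, attach it as SCM_RIGHTS ancillary data. */
static int
send_reply(int sock, int code, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	union fd_ctl ctl;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &code;
	iov.iov_len = sizeof(code);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	if (fd >= 0) {
		struct cmsghdr *cmsg;

		msg.msg_control = ctl.buf;
		msg.msg_controllen = sizeof(ctl.buf);
		cmsg = CMSG_FIRSTHDR(&msg);
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	}
	return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(code) ? 0 : -1;
}

/* Receive a reply code; *fd is set to the passed descriptor, or -1 if none. */
static int
recv_reply(int sock, int *fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union fd_ctl ctl;
	int code = REPLY_ERR;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &code;
	iov.iov_len = sizeof(code);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	*fd = -1;

	if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(code))
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS &&
	    cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
		memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
	return code;
}
```

The received descriptor is a new fd number in the receiving process that
refers to the same open file, which is what lets a secondary process use
a group or container that only the primary was able to open.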

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  79 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 492 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index cf9f026..3c05edf 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index e1d6973..f0d4f55 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -303,7 +303,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 }

 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
int ret, vfio_container_fd;
@@ -333,13 +333,36 @@ pci_vfio_get_container_fd(void)
}

return vfio_container_fd;
+   } else {
+   /*
+* if we're in a secondary process, request container fd from the
+* primary process via our socket
+*/
+   int socket_fd;
+   if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+   return -1;
+   }
+   if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+   if (vfio_container_fd < 0) {
+   RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   close(socket_fd);
+   return vfio_container_fd;
}

return -1;
 }

 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
int i;
@@ -375,6 +398,44 @@ pci_vfio_get_group_fd(int iommu_group_no)
vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
return vfio_group_fd;
}
+   /* if we're in a secondary process, request group fd from the primary
+* process via our socket
+*/
+   else {
+   int socket_fd, ret;
+   if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+   return -1;
+   }
+   if (vfio_m

[dpdk-dev] [PATCH v5 12/20] vfio: create mapping code for VFIO

2014-06-10 Thread Anatoly Burakov
Adding code to support VFIO mapping (primary processes only). Most of
the things are done via ioctl() calls on either /dev/vfio/vfio (the
container) or a /dev/vfio/$GROUP_NR (IOMMU group).

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
   memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (MSI-X BAR cannot be mmaped, so skipping it)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
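
As a rough orientation, steps 1, 3, 4, 5 and 7 above map onto the kernel
VFIO API approximately as in the sketch below. This is not the code from
the patch: vfio_open_device_sketch() is a hypothetical name, the type 1
IOMMU is assumed, and DMA mapping, BAR mapping, interrupt setup and bus
mastering (steps 6 and 8 to 10) are left out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: returns a device fd for pci_addr (for example
 * "0000:06:0d.0") in IOMMU group group_nr, or -1 on any failure.
 */
static int
vfio_open_device_sketch(int group_nr, const char *pci_addr)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };
	char path[64];
	int container, group, dev_fd;

	/* step 1: create a container and check that type 1 IOMMU is usable */
	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0)
		return -1;
	if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
	    ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0)
		goto fail_container;

	/* step 3: open the group file to obtain a group fd */
	snprintf(path, sizeof(path), "/dev/vfio/%d", group_nr);
	group = open(path, O_RDWR);
	if (group < 0)
		goto fail_container;

	/* step 4: the group must be viable */
	if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
	    !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		goto fail_group;

	/* step 5: add the group to the container, then pick the IOMMU type */
	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto fail_group;

	/* step 7: get the device fd; fails if the device is not bound to VFIO */
	dev_fd = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
	if (dev_fd < 0)
		goto fail_group;

	/* container and group fds are left open here; a real caller tracks them */
	return dev_fd;

fail_group:
	close(group);
fail_container:
	close(container);
	return -1;
}
```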

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   2 +
 lib/librte_eal/linuxapp/eal/eal.c  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 706 +
 .../linuxapp/eal/include/eal_internal_cfg.h|   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h |   6 +
 6 files changed, 750 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5f3be5f..cf9f026 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE

 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 9d2675b..aeb5903 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,8 @@ eal_parse_args(int argc, char **argv)
internal_config.force_sockets = 0;
internal_config.syslog_facility = LOG_DAEMON;
internal_config.xen_dom0_support = 0;
+   /* if set to NONE, interrupt mode is determined automatically */
+   internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 000..e1d6973
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,706 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILIT

[dpdk-dev] [PATCH v5 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c

2014-06-10 Thread Anatoly Burakov
eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index d958014..5f3be5f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif

 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 10/20] interrupts: Add support for VFIO interrupts

2014-06-10 Thread Anatoly Burakov
Creating code to handle VFIO interrupts in EAL interrupts (supports all
types of interrupts).

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 287 -
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 286 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 58e1ddf..664e522 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -44,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -66,6 +66,7 @@
 #include 

 #include "eal_private.h"
+#include "eal_vfio.h"

 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)

@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
int uio_intr_count;  /* for uio device */
+#ifdef VFIO_PRESENT
+   uint64_t vfio_intr_count;/* for vfio device */
+#endif
uint64_t timerfd_num;/* for timerfd */
char charbuf[16];/* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;

+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   /* enable INTx */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+   fd_ptr = (int *) &irq_set->data;
+   *fd_ptr = intr_handle->fd;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* unmask INTx after enabling */
+   memset(irq_set, 0, len);
+   len = sizeof(struct vfio_irq_set);
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+
+   len = sizeof(struct vfio_irq_set);
+
+   /* mask interrupts before disabling */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* disable INTx*/
+   memset(irq_set, 0, len);
+   irq_set->argsz = len;
+   irq_set->count = 0;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL,
+   "Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* enable MSI-X interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+   int len, ret;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   struct vfio_irq_set *irq_set;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+ 

[dpdk-dev] [PATCH v5 09/20] vfio: add VFIO header

2014-06-10 Thread Anatoly Burakov
Adding a header that will determine if VFIO support should be compiled
in. If VFIO is enabled in config (and it's enabled by default), then the
header will also check for kernel version. If VFIO is enabled in config
and if the kernel version is 3.6+, then VFIO_PRESENT will be defined.
This is the macro that should be used to determine if VFIO support is
being compiled in.
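
The compile-time check described above can be sketched as below.
HAVE_VFIO_SKETCH and vfio_compiled_in() are hypothetical names used only
for illustration; the real macro is VFIO_PRESENT. Note that
LINUX_VERSION_CODE comes from the kernel headers installed on the build
machine, so it describes the build environment rather than the kernel
that is running.

```c
#include <linux/version.h>

/* KERNEL_VERSION(a, b, c) packs a version into one comparable integer. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
#define HAVE_VFIO_SKETCH 1
#else
#define HAVE_VFIO_SKETCH 0
#endif

/* Hypothetical helper: 1 if the headers are new enough for VFIO, else 0. */
static int
vfio_compiled_in(void)
{
	return HAVE_VFIO_SKETCH;
}
```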

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include 
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include 
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 08/20] vfio: add support for VFIO in Linuxapp targets

2014-06-10 Thread Anatoly Burakov
Add VFIO compilation option to common Linuxapp config.

Signed-off-by: Anatoly Burakov 
---
 config/common_linuxapp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index b17e37e..2ed4b7e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y

 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 07/20] igb_uio: Moved interrupt type out of igb_uio

2014-06-10 Thread Anatoly Burakov
Moving interrupt type enum out of igb_uio and renaming it to be more
generic. This odd header naming and separation is done mostly to make
the upcoming virtio patches easier to port to the dpdk.org tree.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/Makefile |  1 +
 lib/librte_eal/common/include/rte_pci.h|  1 +
 .../common/include/rte_pci_dev_feature_defs.h  | 46 +
 .../common/include/rte_pci_dev_features.h  | 44 
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c  | 48 +-
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 0016fc5..e2a3f3a 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -39,6 +39,7 @@ INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h

 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 11b8c13..e653027 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include 
 #include 
 #include 
+
 #include 

 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+   RTE_INTR_MODE_NONE = 0,
+   RTE_INTR_MODE_LEGACY,
+   RTE_INTR_MODE_MSI,
+   RTE_INTR_MODE_MSIX,
+   RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ *

[dpdk-dev] [PATCH v5 06/20] igb_uio: make igb_uio compilation optional

2014-06-10 Thread Anatoly Burakov
Currently, igb_uio is always compiled. Some Linux distributions may not
want to include igb_uio with DPDK, so we need to make sure that igb_uio
compilation for Linuxapp targets can be optional.

Signed-off-by: Anatoly Burakov 
---
 config/common_linuxapp   | 1 +
 lib/librte_eal/linuxapp/Makefile | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 62619c6..b17e37e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index b00e89f..acbf500 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@

 include $(RTE_SDK)/mk/rte.vars.mk

+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4



[dpdk-dev] [PATCH v5 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING

2014-06-10 Thread Anatoly Burakov
Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING, a more
generic name, since PCI BAR mapping can be done with either igb_uio or VFIO.

Signed-off-by: Anatoly Burakov 
---
 app/test/test_pci.c | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c| 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c   | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 4 ++--
 lib/librte_pmd_virtio/virtio_ethdev.c   | 2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 9 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 6908d04..fad118e 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
  struct rte_pci_device *dev);

 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */

@@ -91,7 +91,7 @@ struct rte_pci_driver my_driver = {
.name = "test_driver",
.devinit = my_driver_init,
.id_table = my_driver_id,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };

 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 94ae461..eddbd2f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -474,7 +474,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
return 0;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
if (pci_uio_map_resource(dev) < 0)
return -1;
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index c793773..11b8c13 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
uint32_t drv_flags; /**< Flags contolling handling 
of device. */
 };

-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 0b779ec..a0abec8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
return 1;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
if ((ret = pci_uio_map_resource(dev)) != 0)
return ret;
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 493806c..c8355bc 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -280,7 +280,7 @@ static struct eth_driver rte_em_pmd = {
{
.name = "rte_em_pmd",
.id_table = pci_id_em_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_em_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c 
b/lib/librte_pmd_e1000/igb_ethdev.c
index 5f93bcf..d60f923 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -603,7 +603,7 @@ static struct eth_driver rte_igb_pmd = {
{
.name = "rte_igb_pmd",
.id_table = pci_id_igb_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igb_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
@@ -616,7 +616,7 @@ static struct eth_driver rte_igbvf_pmd = {
{
.name = "rte_igbvf_pmd",
.id_table = pci_id_igbvf_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igbvf_dev_init,
.dev_priv

[dpdk-dev] [PATCH v5 02/20] pci: move uio mapping code to a separate file

2014-06-10 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 403 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 403 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 
 4 files changed, 474 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index b052820..d958014 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fd88bd0..628813b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */

 #include 
-#include 
-#include 
 #include 
 #include 

@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"

 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct pci_map {
-   void *addr;
-   uint64_t offset;
-   uint64_t size;
-   uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-   TAILQ_ENTRY(mapped_pci_resource) next;
-
-   struct rte_pci_addr pci_addr;
-   char path[PATH_MAX];
-   int nb_maps;
-   struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;

 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }

 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
void *mapaddr;

@@ -147,342 +123,6 @@ fail:
return NULL;
 }

-#define OFF_MAX  ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-   int i;
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   uint64_t offset, size;
-
-   for (i = 0; i != nb_maps; i++) {
- 
-   /* check if map directory exists */
-   rte_snprintf(dirname, sizeof(dirname), 
-   "%s/maps/map%u", devname, i);
- 
-   if (access(dirname, F_OK) != 0)
-   break;
- 
-   /* get mapping offset */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/offset", dirname);
-   if (pci_parse_sysfs_value(filename, &offset) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse offset of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
- 
-   /* get mapping size */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/size", dirname);
-   if (pci_parse_sysfs_value(filename, &size) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse size of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
- 
-   /* get mapping physical address */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/addr", dirname);
-   if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse addr of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
-
-   if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-   RTE_LOG(ERR, EAL,
-   "%s(): offset/size exceed system max value\n",
-   __func__); 
-   return (-1);
- 

[dpdk-dev] [PATCH v5 01/20] pci: move open() out of pci_map_resource, rename structs

2014-06-10 Thread Anatoly Burakov
Separate the mapping code from the calls to open(). This is preparatory
work for the VFIO patch, which also needs to map BARs but does not use the
path in mapped_pci_resource. Also rename the structs to be more generic.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 --
 1 file changed, 58 insertions(+), 67 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ac2c1fe..fd88bd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 

-#include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 

 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct uio_map {
+struct pci_map {
void *addr;
uint64_t offset;
uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-   TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+   TAILQ_ENTRY(mapped_pci_resource) next;

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
-   size_t nb_maps;
-   struct uio_map maps[PCI_MAX_RESOURCE];
+   int nb_maps;
+   struct pci_map maps[PCI_MAX_RESOURCE];
 };

-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;

-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:

 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-   int fd;
void *mapaddr;

-   /*
-* open devname, to mmap it
-*/
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   devname, strerror(errno));
-   goto fail;
-   }
-
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, offset);
-   close(fd);
if (mapaddr == MAP_FAILED ||
(requested_addr != NULL && mapaddr != requested_addr)) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-   " %s (%p)\n", __func__, devname, fd, requested_addr,
+   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
+   __func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
goto fail;
@@ -186,10 +148,10 @@ fail:
 }

 #define OFF_MAX  ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t 
nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-   size_t i;
+   int i;
char dirname[PATH_MAX];
char filename[PATH_MAX];
uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map 
maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-size_t i;
-struct uio_resource *uio_res;
+   int fd, i;
+   struct mapped_pci_resource *uio_res;

-   TAILQ_FOREACH(uio_res, uio_res_list, next) {
+   TAILQ_FOREACH(uio_res, pci_res_list, next) {

/* skip this element if it doesn't match our PCI address */
if (memcmp(_res->pci_a

[dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK

2014-06-10 Thread Anatoly Burakov
This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a kernel 3.6+ driver allowing secure DMA from userspace
by means of using IOMMU instead of working directly with physical
memory like igb_uio does.

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional

v2 fixes:
* Fixed a couple of resource leaks

v3 fixes:
* Fixed various checkpatch.pl issues
* Added MSI interrupt support
* Added an option to automatically determine interrupt type
* Fixed various issues of commit atomicity

v4 fixes:
* Rebased on top of 5ebbb17281645b23359fbd49133bb639b63ba88c
* Fixed a typo in EAL command-line help text

v5 fixes:
* Fixed missing virtio change to RTE_PCI_DRV_NEED_MAPPING
* Fixed compile issue when VFIO was disabled (introduced in v3)

Tested-by: Waterman Cao  

This patch has been tested by Intel.
We tested this patch with the following functions:
* Layer-2 Forwarding support
* Sample commands test
* Packet forwarding checking
* Bind and unbind VFIO driver
* Compile igb_uio driver (Linux kernel < 3.6)
* Interrupt model test under legacy, MSI and MSI-X modes
All cases passed.

Test environment:
Fedora 20 x86_64, Linux Kernel 3.13.6-200, GCC 4.8.2,
Intel Xeon CPU E5-2680 v2 @ 2.80GHz, NIC: Intel Niantic 82599


Anatoly Burakov (20):
  pci: move open() out of pci_map_resource, rename structs
  pci: move uio mapping code to a separate file
  pci: fixing errors in a previous commit found by checkpatch
  pci: distinguish between legitimate failures and non-fatal errors
  pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  igb_uio: make igb_uio compilation optional
  igb_uio: Moved interrupt type out of igb_uio
  vfio: add support for VFIO in Linuxapp targets
  vfio: add VFIO header
  interrupts: Add support for VFIO interrupts
  eal: remove -Wno-return-type for non-existent eal_hpet.c
  vfio: create mapping code for VFIO
  vfio: add multiprocess support.
  pci: enable VFIO device binding
  eal: added support for selecting VFIO interrupt type from EAL
command-line
  eal: make --no-huge use mmap instead of malloc
  test app: adding unit tests for VFIO EAL command-line parameter
  igb_uio: Removed PCI ID table from igb_uio
  binding script: Renamed igb_uio_bind to dpdk_nic_bind
  setup script: adding support for VFIO to setup.sh

 app/test/test_eal_flags.c  |  36 +
 app/test/test_pci.c|   4 +-
 config/common_linuxapp |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c|   2 +-
 lib/librte_eal/common/Makefile |   1 +
 lib/librte_eal/common/eal_common_pci.c |  16 +-
 lib/librte_eal/common/include/rte_pci.h|   5 +-
 .../common/include/rte_pci_dev_feature_defs.h  |  46 ++
 .../common/include/rte_pci_dev_features.h  |  44 ++
 lib/librte_eal/linuxapp/Makefile   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile   |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c  |  36 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 287 +++-
 lib/librte_eal/linuxapp/eal/eal_memory.c   |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 473 ++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 403 +++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 781 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++
 .../linuxapp/eal/include/eal_internal_cfg.h|   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 116 +++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c  |  69 +-
 lib/librte_pmd_e1000/em_ethdev.c   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c|   4 +-
 lib/librte_pmd_virtio/virtio_ethdev.c  |   2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c|   2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}| 157 +++--
 tools/setup.sh | 172 -
 30 files changed, 2548 insertions(+), 588 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxa

[dpdk-dev] [PATCH v4 20/20] setup script: adding support for VFIO to setup.sh

2014-06-03 Thread Anatoly Burakov
Add support for loading and unloading VFIO drivers, binding and unbinding
devices to and from VFIO, and setting up correct userspace permissions.

Signed-off-by: Anatoly Burakov 
---
 tools/setup.sh | 156 +++--
 1 file changed, 141 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index e0671b8..3991da9 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }

 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+   echo "Unloading any existing VFIO module"
+   /sbin/lsmod | grep -s vfio > /dev/null
+   if [ $? -eq 0 ] ; then
+   sudo /sbin/rmmod vfio-pci
+   sudo /sbin/rmmod vfio_iommu_type1
+   sudo /sbin/rmmod vfio
+   fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+   remove_vfio_module
+
+   VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+   echo "Loading VFIO module"
+   /sbin/lsmod | grep -s vfio_pci > /dev/null
+   if [ $? -ne 0 ] ; then
+   if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+   sudo /sbin/modprobe vfio-pci
+   fi
+   fi
+
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # check if /dev/vfio/vfio exists - that way we
+   # know we either loaded the module, or it was
+   # compiled into the kernel
+   if [ ! -e /dev/vfio/vfio ] ; then
+   echo "## ERROR: VFIO not found!"
+   fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }

 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # make sure regular user can access everything inside /dev/vfio
+   echo "chmod /dev/vfio/*"
+   sudo /usr/bin/chmod 0666 /dev/vfio/*
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # since permissions are only to be set when running as
+   # regular user, we only check ulimit here
+   #
+   # warn if regular user is only allowed
+   # to memlock <64M of memory
+   MEMLOCK_AMNT=`ulimit -l`
+
+   if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+   MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+   echo ""
+   echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+   echo ""
+   echo "This is the maximum amount of memory you will be"
+   echo "able to use with DPDK and VFIO if run as current user."
+   echo -n "To change this, please adjust limits.conf memlock "
+   echo "limit for current user."
+
+   if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+   echo ""
+   echo "## WARNING: memlock limit is less than 64MB"
+   echo -n "## DPDK with VFIO may not be able to 
initialize "
+   echo "if run as current user."
+   fi
+   fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,24 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+   if /sbin/lsmod  | grep -q vfio_pci ; then
+   ${RTE_SDK}/tools/dpdk_nic_bind.py --status
+   echo ""
+   echo -n "Enter PCI address of device to bind to VFIO driver: "
+   read PCI_PATH
+   sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH && 
echo "OK"
+   else
+   echo "# Please load the 'vfio-pci' kernel module before 
querying or "
+   echo "# adjusting NIC device bindings"
+   fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
if  /sbin/lsmod  | grep -q igb_uio ; then 
${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +511,29 @@ step2_func()
TEXT[1]="Insert IGB UIO module"
FUNC[1]="load_igb_uio_module"

-   TEXT[2]="Insert KNI module"
-   FUNC[2]="load_kni_module"
+   TEXT[2]="Insert VFIO module"
+
