[dpdk-dev] [PATCH] port: bump ABI for pcap file support

2016-04-14 Thread Thomas Monjalon
Support for PCAP files was added to rte_port in release 16.04
under NEXT_ABI. It is part of the standard ABI of release 16.07.

Signed-off-by: Thomas Monjalon 
---
 doc/guides/rel_notes/deprecation.rst   |  5 -----
 doc/guides/rel_notes/release_16_07.rst |  5 ++++-
 examples/ip_pipeline/init.c            |  4 ----
 lib/librte_port/Makefile               |  2 +-
 lib/librte_port/rte_port_source_sink.c | 14 --------------
 lib/librte_port/rte_port_source_sink.h |  3 ---
 6 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 327fc2b..a3fdbb1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -79,11 +79,6 @@ Deprecation Notices
   modification of the API of rte_mempool_obj_iter(), implying a breakage
   of the ABI.

-* ABI changes are planned for struct rte_port_source_params in order to
-  support PCAP file reading feature. The release 16.04 contains this ABI
-  change wrapped by RTE_NEXT_ABI macro. Release 16.07 will contain this
-  change, and no backwards compatibility is planned.
-
 * A librte_vhost public structures refactor is planned for DPDK 16.07
   that requires both ABI and API change.
   The proposed refactor would expose DPDK vhost dev to applications as
diff --git a/doc/guides/rel_notes/release_16_07.rst 
b/doc/guides/rel_notes/release_16_07.rst
index 701e827..001888f 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -94,6 +94,9 @@ ABI Changes
   the previous releases and made in this release. Use fixed width quotes for
   ``rte_function_names`` or ``rte_struct_names``. Use the past tense.

+* The ``rte_port_source_params`` structure has new fields to support PCAP files.
+  It was already available in release 16.04 behind the ``RTE_NEXT_ABI`` flag.
+

 Shared Library Versions
 -----------------------
@@ -123,7 +126,7 @@ The libraries prepended with a plus sign were incremented 
in this version.
  librte_pipeline.so.3
  librte_pmd_bond.so.1
  librte_pmd_ring.so.2
- librte_port.so.2
+   + librte_port.so.3
  librte_power.so.1
  librte_reorder.so.1
  librte_ring.so.1
diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index 83422e8..02351f6 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -1221,8 +1221,6 @@ static void app_pipeline_params_get(struct app_params 
*app,
out->type = PIPELINE_PORT_IN_SOURCE;
out->params.source.mempool = app->mempool[mempool_id];
out->burst_size = app->source_params[in->id].burst;
-
-#ifdef RTE_NEXT_ABI
if (app->source_params[in->id].file_name
!= NULL) {
out->params.source.file_name = strdup(
@@ -1237,8 +1235,6 @@ static void app_pipeline_params_get(struct app_params 
*app,
app->source_params[in->id].
n_bytes_per_pkt;
}
-#endif
-
break;
default:
break;
diff --git a/lib/librte_port/Makefile b/lib/librte_port/Makefile
index 2c0ccbe..d4de5af 100644
--- a/lib/librte_port/Makefile
+++ b/lib/librte_port/Makefile
@@ -44,7 +44,7 @@ CFLAGS += $(WERROR_FLAGS)

 EXPORT_MAP := rte_port_version.map

-LIBABIVER := 2
+LIBABIVER := 3

 #
 # all source are stored in SRCS-y
diff --git a/lib/librte_port/rte_port_source_sink.c 
b/lib/librte_port/rte_port_source_sink.c
index 056c975..4cad710 100644
--- a/lib/librte_port/rte_port_source_sink.c
+++ b/lib/librte_port/rte_port_source_sink.c
@@ -38,17 +38,11 @@
 #include 
 #include 

-#ifdef RTE_NEXT_ABI
-
 #ifdef RTE_PORT_PCAP
 #include 
 #include 
 #endif

-#else
-#undef RTE_PORT_PCAP
-#endif
-
 #include "rte_port_source_sink.h"

 /*
@@ -81,8 +75,6 @@ struct rte_port_source {
uint32_t pkt_index;
 };

-#ifdef RTE_NEXT_ABI
-
 #ifdef RTE_PORT_PCAP

 static int
@@ -232,8 +224,6 @@ error_exit:

 #endif /* RTE_PORT_PCAP */

-#endif /* RTE_NEXT_ABI */
-
 static void *
 rte_port_source_create(void *params, int socket_id)
 {
@@ -258,8 +248,6 @@ rte_port_source_create(void *params, int socket_id)
/* Initialization */
port->mempool = (struct rte_mempool *) p->mempool;

-#ifdef RTE_NEXT_ABI
-
if (p->file_name) {
int status = PCAP_SOURCE_LOAD(port, p->file_name,
p->n_bytes_per_pkt, socket_id);
@@ -270,8 +258,6 @@ rte_port_source_create(void *params, int socket_id)
}
}

-#endif
-
return port;
 }

diff --git a/lib/librte_port/rte_port_source_sink.h 
b/lib/librte_port/rte_port_source_sink.h
index 917abe4..4db8a8a 100644
--- a/lib/librte_port/rte_port_source_sink.h
+++ b/lib/librte_port/rte_port_source_sink.h
@@ -53,7 +53,6 @@ extern "C" {
 struct rte_port_source_params {
/** 

[dpdk-dev] [PATCH v2 2/3] i40e: improve performance of vector PMD

2016-04-14 Thread Bruce Richardson
An analysis of the i40e code using Intel® VTune™ Amplifier 2016 showed
that the code was unexpectedly causing stalls due to "Loads blocked by
Store Forwards". This can occur when a load from memory has to wait
due to the prior store being to the same address but of a smaller size,
i.e. the stored value cannot be directly forwarded to the load.
[See ref: https://software.intel.com/en-us/node/544454]

These stalls are due to the way in which the data_len values are handled
in the driver. The lengths are extracted using vector operations, but those
16-bit lengths are then assigned using scalar operations, i.e. 16-bit
stores.

These regular 16-bit stores actually have two effects in the code:
* they cause the "Loads blocked by Store Forwards" issues reported
* they also cause the previous loads in the RX function to actually be a
load followed by a store to an address on the stack, because the 16-bit
assignment can't be done to an xmm register.

By converting the 16-bit store operations into a sequence of SSE blend
operations, we can ensure that the descriptor loads only occur once, and
avoid both the additional stores and loads from the stack, as well as the
stalls due to the blocked loads.

Signed-off-by: Bruce Richardson 
---
 drivers/net/i40e/i40e_rxtx_vec.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 1e2fadd..9f67f9d 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -192,11 +192,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
 static inline void
 desc_pktlen_align(__m128i descs[4])
 {
-   __m128i pktlen0, pktlen1, zero;
-   union {
-   uint16_t e[4];
-   uint64_t dword;
-   } vol;
+   __m128i pktlen0, pktlen1;

/* mask everything except pktlen field*/
const __m128i pktlen_msk = _mm_set_epi32(PKTLEN_MASK, PKTLEN_MASK,
@@ -206,18 +202,18 @@ desc_pktlen_align(__m128i descs[4])
pktlen1 = _mm_unpackhi_epi32(descs[1], descs[3]);
pktlen0 = _mm_unpackhi_epi32(pktlen0, pktlen1);

-   zero = _mm_xor_si128(pktlen0, pktlen0);
-
pktlen0 = _mm_srli_epi32(pktlen0, PKTLEN_SHIFT);
pktlen0 = _mm_and_si128(pktlen0, pktlen_msk);

-   pktlen0 = _mm_packs_epi32(pktlen0, zero);
-   vol.dword = _mm_cvtsi128_si64(pktlen0);
-   /* let the descriptor byte 15-14 store the pkt len */
-   *((uint16_t *)&descs[0]+7) = vol.e[0];
-   *((uint16_t *)&descs[1]+7) = vol.e[1];
-   *((uint16_t *)&descs[2]+7) = vol.e[2];
-   *((uint16_t *)&descs[3]+7) = vol.e[3];
+   pktlen0 = _mm_packs_epi32(pktlen0, pktlen0);
+
+   descs[3] = _mm_blend_epi16(descs[3], pktlen0, 0x80);
+   pktlen0 = _mm_slli_epi64(pktlen0, 16);
+   descs[2] = _mm_blend_epi16(descs[2], pktlen0, 0x80);
+   pktlen0 = _mm_slli_epi64(pktlen0, 16);
+   descs[1] = _mm_blend_epi16(descs[1], pktlen0, 0x80);
+   pktlen0 = _mm_slli_epi64(pktlen0, 16);
+   descs[0] = _mm_blend_epi16(descs[0], pktlen0, 0x80);
 }

  /*
-- 
2.5.5



[dpdk-dev] [PATCH v2 1/3] i40e: require SSE4.1 support for vector driver

2016-04-14 Thread Bruce Richardson
Later commits to improve the driver will make use of the SSE4.1
_mm_blend_epi16 intrinsic, so:
* set the compilation level to always have SSE4.1 support,
* and add in a runtime check for SSE4.1 as part of the condition checks
  for vector driver selection.

Signed-off-by: Bruce Richardson 
---
 drivers/net/i40e/Makefile        |  6 ++++++
 drivers/net/i40e/i40e_rxtx_vec.c |  4 ++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 6dd6eaa..56b20d5 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -102,6 +102,12 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c

+# vector PMD driver needs SSE4.1 support
+ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_1,$(CFLAGS)),)
+CFLAGS_i40e_rxtx_vec.o += -msse4.1
+endif
+
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 047aff5..1e2fadd 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -751,6 +751,10 @@ i40e_rx_vec_dev_conf_condition_check(struct rte_eth_dev 
*dev)
struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;

+   /* need SSE4.1 support */
+   if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
+   return -1;
+
 #ifndef RTE_LIBRTE_I40E_RX_OLFLAGS_ENABLE
/* whithout rx ol_flags, no VP flag report */
if (rxmode->hw_vlan_strip != 0 ||
-- 
2.5.5



[dpdk-dev] [PATCH v2 0/3] improve i40e vpmd

2016-04-14 Thread Bruce Richardson
This patchset improves the performance of the i40e SSE pmd by removing
operations that triggered CPU stalls. It also shortens the code and
cleans it up a little.

The base requirement for using the SSE code path has been raised from
SSE3 to SSE4.1, due to the use of the blend instruction. The instruction
set level is now checked at runtime as part of the driver selection process.

Bruce Richardson (3):
  i40e: require SSE4.1 support for vector driver
  i40e: improve performance of vector PMD
  i40e: simplify SSE packet length extraction code

 drivers/net/i40e/Makefile        |  6 ++++++
 drivers/net/i40e/i40e_rxtx_vec.c | 59 +++++++++++++++++++++--------------------------------------
 2 files changed, 27 insertions(+), 38 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH] i40evf: Ignore disabled HW CRC strip for Linux PF hosts

2016-04-14 Thread Björn Töpel
On Linux PF hosts, the VF has no means of changing the HW CRC strip
setting for a RX queue. It's implicitly enabled.

This patch ignores, and warns, if HW CRC stripping was disabled.

Signed-off-by: Björn Töpel 
---
 drivers/net/i40e/i40e_ethdev_vf.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 2bce69b..f88eb79 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1567,6 +1567,8 @@ i40evf_dev_configure(struct rte_eth_dev *dev)
 {
struct i40e_adapter *ad =
I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct rte_eth_conf *conf = &dev->data->dev_conf;
+   struct i40e_vf *vf;

/* Initialize to TRUE. If any of Rx queues doesn't meet the bulk
 * allocation or vector Rx preconditions we will reset it.
@@ -1576,6 +1578,19 @@ i40evf_dev_configure(struct rte_eth_dev *dev)
ad->tx_simple_allowed = true;
ad->tx_vec_allowed = true;

+   /* For Linux PF hosts, the VF has no ability to disable HW CRC strip;
+    * it is implicitly enabled by the PF.
+    */
+   if (!conf->rxmode.hw_strip_crc) {
+   vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+   if ((vf->version_major == I40E_VIRTCHNL_VERSION_MAJOR) &&
+   (vf->version_minor <= I40E_VIRTCHNL_VERSION_MINOR)) {
+   /* Peer is Linux PF host. */
+   PMD_INIT_LOG(NOTICE, "VF can't disable HW CRC Strip.");
+   conf->rxmode.hw_strip_crc = 1;
+   }
+   }
+
return i40evf_init_vlan(dev);
 }

-- 
2.7.4



[dpdk-dev] memory allocation requirements

2016-04-14 Thread Olivier MATZ
Hi,

On 04/13/2016 06:03 PM, Thomas Monjalon wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
>
> I think it is time to collect what are the needs and expectations of
> the DPDK memory allocator. The goal is to satisfy every needs while
> cleaning the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
>   - numa node
>   - page size
>   - swappable or not
>   - contiguous (cannot be guaranteed) or not
>   - physical address (as root only)
> Then the drivers or other libraries use the memory through
>   - rte_malloc
>   - rte_memzone
>   - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool still focus on collection of objects with cache.

Just to mention that some evolutions [1] are planned in mempool in
16.07, allowing to populate a mempool with several chunks of memory,
and still ensuring that the objects are physically contiguous. It
completely removes the need to allocate a big virtually contiguous
memory zone (and also physically contiguous if not using
rte_mempool_create_xmem(), which is probably the case in most of
the applications).

Knowing this, the code that remaps the hugepages to get the largest
possible physically contiguous zone probably becomes useless after
the mempool series. Changing it to only one mmap(file) in hugetlbfs
per NUMA socket would clearly simplify this part of EAL.

For other allocations that must be physically contiguous (ex: zones
shared with the hardware), having a page-sized granularity is maybe
enough.

Regards,
Olivier

[1] http://dpdk.org/ml/archives/dev/2016-April/037464.html


[dpdk-dev] [4.4 kernel] kni lockup, kernel dump

2016-04-14 Thread Thomas Monjalon
2016-04-14 15:29, Ferruh Yigit:
> On 4/13/2016 11:26 PM, ALeX Wang wrote:
> > Did more experiment, found that it has nothing to do with the kernel
> > version,
> > 
> > It only happens when using kni module with '--no-huge' eal flag...
> > 
> > Is that expected?
> > 
> 
> Yes.
> KNI kernel module expects mempool is physically continuous, with
> '--no-huge' flag this is no more true, and as a result KNI module can
> access to incorrect address.

This is a bug.
The memory API should make this restriction explicit and return
an error if it cannot provide the requested contiguous memory.
See this thread for discussion:
http://dpdk.org/ml/archives/dev/2016-April/037444.html


[dpdk-dev] memory allocation requirements

2016-04-14 Thread Sergio Gonzalez Monroy
On 14/04/2016 15:46, Olivier MATZ wrote:
> Hi,
>
> On 04/13/2016 06:03 PM, Thomas Monjalon wrote:
>> After looking at the patches for container support, it appears that
>> some changes are needed in the memory management:
>> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788 
>>
>>
>> I think it is time to collect what are the needs and expectations of
>> the DPDK memory allocator. The goal is to satisfy every needs while
>> cleaning the API.
>> Here is a first try to start the discussion.
>>
>> The memory allocator has 2 classes of API in DPDK.
>> First the user/application allows or requires DPDK to take over some
>> memory resources of the system. The characteristics can be:
>> - numa node
>> - page size
>> - swappable or not
>> - contiguous (cannot be guaranteed) or not
>> - physical address (as root only)
>> Then the drivers or other libraries use the memory through
>> - rte_malloc
>> - rte_memzone
>> - rte_mempool
>> I think we can integrate the characteristics of the requested memory
>> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
>> The rte_mempool still focus on collection of objects with cache.
>
> Just to mention that some evolutions [1] are planned in mempool in
> 16.07, allowing to populate a mempool with several chunks of memory,
> and still ensuring that the objects are physically contiguous. It
> completely removes the need to allocate a big virtually contiguous
> memory zone (and also physically contiguous if not using
> rte_mempool_create_xmem(), which is probably the case in most of
> the applications).
>
> Knowing this, the code that remaps the hugepages to get the largest
> possible physically contiguous zone probably becomes useless after
> the mempool series. Changing it to only one mmap(file) in hugetlbfs
> per NUMA socket would clearly simplify this part of EAL.
>

Are you suggesting to make those changes after the mempool series
has been applied but keeping the current memzone/malloc behavior?

Regards,
Sergio

> For other allocations that must be physically contiguous (ex: zones
> shared with the hardware), having a page-sized granularity is maybe
> enough.
>
> Regards,
> Olivier
>
> [1] http://dpdk.org/ml/archives/dev/2016-April/037464.html



[dpdk-dev] [PATCH 00/36] mempool: rework memory allocation

2016-04-14 Thread Olivier MATZ


On 04/14/2016 03:50 PM, Wiles, Keith wrote:
>> This series is a rework of mempool. For those who don't want to read
>> all the cover letter, here is a sumary:
>>
>> - it is not possible to allocate large mempools if there is not enough
>>   contiguous memory, this series solves this issue
>> - introduce new APIs with less arguments: "create, populate, obj_init"
>> - allow to free a mempool
>> - split code in smaller functions, will ease the introduction of ext_handler
>> - remove test-pmd anonymous mempool creation
>> - remove most of dom0-specific mempool code
>> - opens the door for a eal_memory rework: we probably don't need large
>>   contiguous memory area anymore, working with pages would work.
>>
>> This breaks the ABI as it was indicated in the deprecation for 16.04.
>> The API stays almost the same, no modification is needed in examples app
>> or in test-pmd. Only kni and mellanox drivers are slightly modified.
>>
>> This patch applies on top of 16.04 + v5 of Keith's patch:
>> "mempool: reduce rte_mempool structure size"
>
> I have not digested this complete patch yet, but this one popped out at me as 
> the External Memory Manager support is setting in the wings for 16.07 
> release. If this causes the EMM patch to be rewritten or updated that seems 
> like a problem to me. Does this patch add the External Memory Manager support?
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32015/focus=35107

I've reworked the series you are referring to, and rebased it on top
of this series. Please see:
http://dpdk.org/ml/archives/dev/2016-April/037509.html

Regards,
Olivier


[dpdk-dev] [PATCH v4 3/3] mbuf: get default mempool handler from configuration

2016-04-14 Thread Olivier Matz
From: David Hunt 

By default, the mempool handler used for mbuf allocations is a multi
producer and multi consumer ring. We could imagine a target (maybe some
network processors?) that provides a hardware-assisted pool
mechanism. In this case, the default configuration for this architecture
would contain a different value for RTE_MBUF_DEFAULT_MEMPOOL_HANDLER.

Signed-off-by: David Hunt 
Signed-off-by: Olivier Matz 
---
 config/common_base |  1 +
 lib/librte_mbuf/rte_mbuf.c | 21 +
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/config/common_base b/config/common_base
index 0124e86..178cb7e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -390,6 +390,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_MBUF=y
 CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
+CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_HANDLER="ring_mp_mc"
 CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index dc0467c..a72f8f2 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -153,6 +153,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
int socket_id)
 {
+   struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
unsigned elt_size;

@@ -167,10 +168,22 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
mbp_priv.mbuf_data_room_size = data_room_size;
mbp_priv.mbuf_priv_size = priv_size;

-   return rte_mempool_create(name, n, elt_size,
-   cache_size, sizeof(struct rte_pktmbuf_pool_private),
-   rte_pktmbuf_pool_init, &mbp_priv, rte_pktmbuf_init, NULL,
-   socket_id, 0);
+   mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
+sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+   if (mp == NULL)
+   return NULL;
+
+   rte_mempool_set_handler(mp, RTE_MBUF_DEFAULT_MEMPOOL_HANDLER);
+   rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+   if (rte_mempool_populate_default(mp) < 0) {
+   rte_mempool_free(mp);
+   return NULL;
+   }
+
+   rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+   return mp;
 }

 /* do some sanity checks on a mbuf: panic if it fails */
-- 
2.1.4



[dpdk-dev] [PATCH v4 2/3] app/test: test external mempool handler

2016-04-14 Thread Olivier Matz
Use a minimal custom mempool external handler and check that it also
passes basic mempool autotests.

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c | 113 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index c96ed27..09951cc 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -85,6 +85,96 @@
 static rte_atomic32_t synchro;

 /*
+ * Simple example of custom mempool structure. Holds pointers to all the
+ * elements which are simply malloc'd in this example.
+ */
+struct custom_mempool {
+   rte_spinlock_t lock;
+   unsigned count;
+   unsigned size;
+   void *elts[];
+};
+
+/*
+ * Loop through all the element pointers and allocate a chunk of memory, then
+ * insert that memory into the ring.
+ */
+static void *
+custom_mempool_alloc(struct rte_mempool *mp)
+{
+   struct custom_mempool *cm;
+
+   cm = rte_zmalloc("custom_mempool",
+   sizeof(struct custom_mempool) + mp->size * sizeof(void *), 0);
+   if (cm == NULL)
+   return NULL;
+
+   rte_spinlock_init(&cm->lock);
+   cm->count = 0;
+   cm->size = mp->size;
+   return cm;
+}
+
+static void
+custom_mempool_free(void *p)
+{
+   rte_free(p);
+}
+
+static int
+custom_mempool_put(void *p, void * const *obj_table, unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+   int ret = 0;
+
+   rte_spinlock_lock(&cm->lock);
+   if (cm->count + n > cm->size) {
+   ret = -ENOBUFS;
+   } else {
+   memcpy(&cm->elts[cm->count], obj_table, sizeof(void *) * n);
+   cm->count += n;
+   }
+   rte_spinlock_unlock(&cm->lock);
+   return ret;
+}
+
+
+static int
+custom_mempool_get(void *p, void **obj_table, unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+   int ret = 0;
+
+   rte_spinlock_lock(&cm->lock);
+   if (n > cm->count) {
+   ret = -ENOENT;
+   } else {
+   cm->count -= n;
+   memcpy(obj_table, &cm->elts[cm->count], sizeof(void *) * n);
+   }
+   rte_spinlock_unlock(&cm->lock);
+   return ret;
+}
+
+static unsigned
+custom_mempool_get_count(void *p)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+   return cm->count;
+}
+
+static struct rte_mempool_handler mempool_handler_custom = {
+   .name = "custom_handler",
+   .alloc = custom_mempool_alloc,
+   .free = custom_mempool_free,
+   .put = custom_mempool_put,
+   .get = custom_mempool_get,
+   .get_count = custom_mempool_get_count,
+};
+
+MEMPOOL_REGISTER_HANDLER(mempool_handler_custom);
+
+/*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
  */
@@ -479,6 +569,7 @@ test_mempool(void)
 {
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
+   struct rte_mempool *mp_ext = NULL;

rte_atomic32_init(&synchro);

@@ -507,6 +598,27 @@ test_mempool(void)
goto err;
}

+   /* create a mempool with an external handler */
+   mp_ext = rte_mempool_create_empty("test_ext",
+   MEMPOOL_SIZE,
+   MEMPOOL_ELT_SIZE,
+   RTE_MEMPOOL_CACHE_MAX_SIZE, 0,
+   SOCKET_ID_ANY, 0);
+
+   if (mp_ext == NULL) {
+   printf("cannot allocate mp_ext mempool\n");
+   goto err;
+   }
+   if (rte_mempool_set_handler(mp_ext, "custom_handler") < 0) {
+   printf("cannot set custom handler\n");
+   goto err;
+   }
+   if (rte_mempool_populate_default(mp_ext) < 0) {
+   printf("cannot populate mp_ext mempool\n");
+   goto err;
+   }
+   rte_mempool_obj_iter(mp_ext, my_obj_init, NULL);
+
/* retrieve the mempool from its name */
if (rte_mempool_lookup("test_nocache") != mp_nocache) {
printf("Cannot lookup mempool from its name\n");
@@ -547,6 +659,7 @@ test_mempool(void)
 err:
rte_mempool_free(mp_nocache);
rte_mempool_free(mp_cache);
+   rte_mempool_free(mp_ext);
return -1;
 }

-- 
2.1.4



[dpdk-dev] [PATCH v4 1/3] mempool: support external handler

2016-04-14 Thread Olivier Matz
From: David Hunt 

Until now, the objects stored in a mempool were internally stored in a
ring. This patch introduces the possibility to register external handlers
replacing the ring.

The default behavior remains unchanged, but calling the new function
rte_mempool_set_handler() right after rte_mempool_create_empty() allows
changing the handler that will be used when populating the mempool.

Signed-off-by: David Hunt 
Signed-off-by: Olivier Matz 
---
 app/test/test_mempool_perf.c   |   1 -
 lib/librte_mempool/Makefile|   2 ++
 lib/librte_mempool/rte_mempool.c   |  72 --
 lib/librte_mempool/rte_mempool.h   | 212 +
 lib/librte_mempool/rte_mempool_default.c   | 147 
 lib/librte_mempool/rte_mempool_handler.c   | 139 +++
 lib/librte_mempool/rte_mempool_version.map |   4 ++++
 7 files changed, 506 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_default.c
 create mode 100644 lib/librte_mempool/rte_mempool_handler.c

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index cdc02a0..091c1df 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -161,7 +161,6 @@ per_lcore_mempool_test(__attribute__((unused)) void *arg)
   n_get_bulk);
if (unlikely(ret < 0)) {
rte_mempool_dump(stdout, mp);
-   rte_ring_dump(stdout, mp->ring);
/* in this case, objects are lost... */
return -1;
}
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 43423e0..f19366e 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -42,6 +42,8 @@ LIBABIVER := 2

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_handler.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7104a41..9e9a7fc 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -148,7 +148,7 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
phys_addr_t physaddr)
 #endif

/* enqueue in ring */
-   rte_ring_sp_enqueue(mp->ring, obj);
+   rte_mempool_ext_put_bulk(mp, &obj, 1);
 }

 /* call obj_cb() for each mempool element */
@@ -300,39 +300,6 @@ rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t 
elt_num,
return (size_t)paddr_idx << pg_shift;
 }

-/* create the internal ring */
-static int
-rte_mempool_ring_create(struct rte_mempool *mp)
-{
-   int rg_flags = 0, ret;
-   char rg_name[RTE_RING_NAMESIZE];
-   struct rte_ring *r;
-
-   ret = snprintf(rg_name, sizeof(rg_name),
-   RTE_MEMPOOL_MZ_FORMAT, mp->name);
-   if (ret < 0 || ret >= (int)sizeof(rg_name))
-   return -ENAMETOOLONG;
-
-   /* ring flags */
-   if (mp->flags & MEMPOOL_F_SP_PUT)
-   rg_flags |= RING_F_SP_ENQ;
-   if (mp->flags & MEMPOOL_F_SC_GET)
-   rg_flags |= RING_F_SC_DEQ;
-
-   /* Allocate the ring that will be used to store objects.
-* Ring functions will return appropriate errors if we are
-* running as a secondary process etc., so no checks made
-* in this function for that condition. */
-   r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
-   mp->socket_id, rg_flags);
-   if (r == NULL)
-   return -rte_errno;
-
-   mp->ring = r;
-   mp->flags |= MEMPOOL_F_RING_CREATED;
-   return 0;
-}
-
 /* free a memchunk allocated with rte_memzone_reserve() */
 static void
 rte_mempool_memchunk_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
@@ -350,7 +317,7 @@ rte_mempool_free_memchunks(struct rte_mempool *mp)
void *elt;

while (!STAILQ_EMPTY(&mp->elt_list)) {
-   rte_ring_sc_dequeue(mp->ring, &elt);
+   rte_mempool_ext_get_bulk(mp, &elt, 1);
(void)elt;
STAILQ_REMOVE_HEAD(&mp->elt_list, next);
mp->populated_size--;
@@ -378,15 +345,18 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,
unsigned i = 0;
size_t off;
struct rte_mempool_memhdr *memhdr;
-   int ret;

/* create the internal ring if not already done */
if ((mp->flags & MEMPOOL_F_RING_CREATED) == 0) {
-   ret = rte_mempool_ring_create(mp);
-   if (ret < 0)
-   return ret;
+   rte_errno = 0;
+   mp->pool = rte_mempool_ext_alloc(mp);
+  

[dpdk-dev] [PATCH v4 0/3] external mempool manager

2016-04-14 Thread Olivier Matz
Here's a reworked version of the patch initially sent by David Hunt.
The main change is that it is rebased on top of the "mempool: rework
memory allocation" series [1], which simplifies a lot the first patch.

[1] http://dpdk.org/ml/archives/dev/2016-April/037464.html

v4 changes:
 * remove the rte_mempool_create_ext() function. To change the handler, the
   user has to do the following:
   - mp = rte_mempool_create_empty()
   - rte_mempool_set_handler(mp, "my_handler")
   - rte_mempool_populate_default(mp)
    This avoids adding another function with more than 10 arguments, duplicating
    the doxygen comments
 * change the api of rte_mempool_alloc_t: only the mempool pointer is required
   as all information is available in it
 * change the api of rte_mempool_free_t: remove return value
 * move inline wrapper functions from the .c to the .h (else they won't be
   inlined). This implies to have one header file (rte_mempool.h), or it
   would have generate cross dependencies issues.
 * remove now unused MEMPOOL_F_INT_HANDLER (note: it was misused anyway due
   to the use of && instead of &)
 * fix build in debug mode (__MEMPOOL_STAT_ADD(mp, put_pool, n) remaining)
 * fix build with shared libraries (global handler has to be declared in
   the .map file)
 * rationalize #include order
 * remove unused function rte_mempool_get_handler_name()
 * rename some structures, fields, functions
 * remove the static in front of rte_tailq_elem rte_mempool_tailq (comment
   from Yuanhan)
 * test the ext mempool handler in the same file than standard mempool tests,
   avoiding to duplicate the code
 * rework the custom handler in mempool_test
 * rework a bit the patch selecting default mbuf pool handler
 * fix some doxygen comments

Things that should still be discussed:

- Panu pointed out that having a compile-time configuration
  option for selecting the default mbuf handler is not a good idea.
  I mostly agree, except in one case (and that's why I kept this patch):
  if a specific architecture has its own way to provide an efficient
  pool handler for mbufs, it could be the proper place to have this
  option. But as far as I know, there is no such architecture today
  in dpdk.

- The other question I would like to raise is about the use cases.
  The cover letter below could be a bit more explicit about what this
  feature will be used for.



This is the initial unmodified cover letter from David Hunt:

Hi list.

Here's the v3 version patch for an external mempool manager

v3 changes:
 * simplified the file layout, renamed to rte_mempool_handler.[hc]
 * moved the default handlers into rte_mempool_default.c
 * moved the example handler out into app/test/test_ext_mempool.c
 * removed is_mc/is_mp change, slight perf degradation on sp cached operation
 * removed stack handler, may re-introduce at a later date
 * Changes out of code reviews

v2 changes:
 * There was a lot of duplicate code between rte_mempool_xmem_create and
   rte_mempool_create_ext. This has now been refactored and is now
   hopefully cleaner.
 * The RTE_NEXT_ABI define is now used to allow building of the library
   in a format that is compatible with binaries built against previous
   versions of DPDK.
 * Changes out of code reviews. Hopefully I've got most of them included.

The External Mempool Manager is an extension to the mempool API that lets
users add and use an external mempool manager, enabling external memory
subsystems such as hardware memory management systems and software-based
memory allocators to be used with DPDK.

The existing API to the internal DPDK mempool manager will remain unchanged
and will be backward compatible. However, there will be an ABI breakage, as
the mempool struct is changing. These changes are all contained within
RTE_NEXT_ABI defs, and the current or next code can be selected with the
CONFIG_RTE_NEXT_ABI config setting.

There are two aspects to the external mempool manager.
  1. Adding the code for your new mempool handler. This is achieved by adding a
 new mempool handler source file into the librte_mempool library, and
 using the REGISTER_MEMPOOL_HANDLER macro.
  2. Using the new API to call rte_mempool_create_ext to create a new mempool
 using the name parameter to identify which handler to use.

New API calls added
 1. A new mempool 'create' function which accepts mempool handler name.
 2. A new mempool 'rte_get_mempool_handler' function which accepts mempool
handler name, and returns the index to the relevant set of callbacks for
that mempool handler

Several external mempool managers may be used in the same application. A new
mempool can then be created by using the new 'create' function, providing the
mempool handler name to point the mempool to the relevant mempool manager
callback structure.

The old 'create' function can still be called by legacy programs, and will
internally work out the mempool handle based on the flags provided (single
producer, single consumer, etc). By 

[dpdk-dev] [PATCH 26/36] mempool: introduce a function to create an empty mempool

2016-04-14 Thread Wiles, Keith
>Introduce a new function rte_mempool_create_empty()
>that allocates a mempool that is not populated.
>
>The functions rte_mempool_create() and rte_mempool_xmem_create()
>now make use of it, making their code much easier to read.
>Currently, they are the only users of rte_mempool_create_empty()
>but the function will be made public in next commits.
>
>Signed-off-by: Olivier Matz 
>+/* create an empty mempool */
>+static struct rte_mempool *
>+rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
>+  unsigned cache_size, unsigned private_data_size,
>+  int socket_id, unsigned flags)
> {

When two processes need to use the same mempool, do we have a race condition 
where one does a rte_mempool_create_empty() and the other process tries to use 
the mempool when it finds it, before it has been fully initialized by the first 
process?

Regards,
Keith






[dpdk-dev] [PATCH] mem: fix freeing of memzone used by ivshmem

2016-04-14 Thread Sergio Gonzalez Monroy
On 14/04/2016 14:48, Mauricio Vasquez B wrote:
> Although the previous implementation returned an error when trying to release
> a memzone assigned to an ivshmem device, it still freed it.
>
> Fixes: cd10c42eb5bc ("mem: fix ivshmem freeing")
>
> Signed-off-by: Mauricio Vasquez B  studenti.polito.it>
> ---
>   lib/librte_eal/common/eal_common_memzone.c | 12 ++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
>

Thanks for the fix (not sure what I was thinking at the time).

Acked-by: Sergio Gonzalez Monroy 



[dpdk-dev] [PATCH] mem: fix freeing of memzone used by ivshmem

2016-04-14 Thread Mauricio Vasquez B
Although the previous implementation returned an error when trying to release a
memzone assigned to an ivshmem device, it still freed it.

Fixes: cd10c42eb5bc ("mem: fix ivshmem freeing")

Signed-off-by: Mauricio Vasquez B 
---
 lib/librte_eal/common/eal_common_memzone.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c 
b/lib/librte_eal/common/eal_common_memzone.c
index 711c845..1fce906 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -321,15 +321,19 @@ rte_memzone_free(const struct rte_memzone *mz)
idx = ((uintptr_t)mz - (uintptr_t)mcfg->memzone);
idx = idx / sizeof(struct rte_memzone);

-   addr = mcfg->memzone[idx].addr;
 #ifdef RTE_LIBRTE_IVSHMEM
/*
 * If ioremap_addr is set, it's an IVSHMEM memzone and we cannot
 * free it.
 */
-   if (mcfg->memzone[idx].ioremap_addr != 0)
+   if (mcfg->memzone[idx].ioremap_addr != 0) {
ret = -EINVAL;
+   goto error;
+   }
 #endif
+
+   addr = mcfg->memzone[idx].addr;
+
if (addr == NULL)
ret = -EINVAL;
else if (mcfg->memzone_cnt == 0) {
@@ -345,6 +349,10 @@ rte_memzone_free(const struct rte_memzone *mz)
rte_free(addr);

return ret;
+
+error:
+   rte_rwlock_write_unlock(&mcfg->mlock);
+   return ret;
 }

 /*
-- 
1.9.1



[dpdk-dev] [PATCH v5] mempool: reduce rte_mempool structure size

2016-04-14 Thread Olivier MATZ
Hi,

On 04/14/2016 03:28 PM, Wiles, Keith wrote:
>> From: Keith Wiles 
>> --- a/app/test/test_mempool.c
>> +++ b/app/test/test_mempool.c
>> @@ -122,8 +122,8 @@ test_mempool_basic(void)
>>  return -1;
>>
>>  printf("get private data\n");
>> -if (rte_mempool_get_priv(mp) !=
>> -(char*) mp + MEMPOOL_HEADER_SIZE(mp, mp->pg_num))
>> +if (rte_mempool_get_priv(mp) != (char *)mp +
>> +MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
>
> Should we not add the RTE_PTR_ADD() here as well?

The displayed error message was "cast increases required alignment
of target type", and in this case the alignment constraint of mp
is higher than the constraint for char * (1). So I think there is
no issue here... at least I can say it compiles without error.

Regards,
Olivier


[dpdk-dev] [PATCH] i40e: improve performance of vector PMD

2016-04-14 Thread Iremonger, Bernard
Hi Bruce,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Thursday, April 14, 2016 3:00 PM
> To: Richardson, Bruce ; dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> 
> Subject: Re: [dpdk-dev] [PATCH] i40e: improve performance of vector PMD
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, April 14, 2016 2:50 PM
> > To: dev at dpdk.org
> > Cc: Zhang, Helin; Wu, Jingjing
> > Subject: Re: [dpdk-dev] [PATCH] i40e: improve performance of vector
> > PMD
> >
> > On Thu, Apr 14, 2016 at 11:15:21AM +0100, Bruce Richardson wrote:
> > > An analysis of the i40e code using Intel® VTune™ Amplifier 2016
> > > showed that the code was unexpectedly causing stalls due to "Loads
> > > blocked by Store Forwards". This can occur when a load from memory
> > > has to wait due to the prior store being to the same address, but
> > > being of a smaller size i.e. the stored value cannot be directly returned 
> > > to
> the loader.
> > > [See ref: https://software.intel.com/en-us/node/544454]
> > >
> > > These stalls are due to the way in which the data_len values are
> > > handled in the driver. The lengths are extracted using vector
> > > operations, but those 16-bit lengths are then assigned using scalar
> > > operations i.e. 16-bit stores.
> > >
> > > These regular 16-bit stores actually have two effects in the code:
> > > * they cause the "Loads blocked by Store Forwards" issues reported
> > > * they also cause the previous loads in the RX function to actually
> > > be a load followed by a store to an address on the stack, because
> > > the 16-bit assignment can't be done to an xmm register.
> > >
> > > By converting the 16-bit stores operations into a sequence of SSE
> > > blend operations, we can ensure that the descriptor loads only occur
> > > once, and avoid both the additional store and loads from the stack,
> > > as well as the stalls due to the second loads being blocked.
> > >
> > > Signed-off-by: Bruce Richardson 
> > >
> > Self-NAK on this version. The blend instruction used is SSE4.1 so
> > breaks the "default" build.
> >
> > Two obvious options to fix this:
> > 1. Keep the old code with SSE4.1 #ifdefs separating old and new 2.
> > Update the vpmd requirement to SSE4.1, and factor that in during
> > runtime select of the RX code path.
> >
> > Personally, I prefer the second option. Any objections?
> 
> +1 for second one.
> 
> >
> > /Bruce

I am using the "default" build when building in VMs; will both options work 
for me?

Regards,

Bernard.  



[dpdk-dev] [PATCH 10/36] mempool: use the list to iterate the mempool elements

2016-04-14 Thread Wiles, Keith
>
> static void
>-txq_mp2mr_mbuf_check(void *arg, void *start, void *end,
>-   uint32_t index __rte_unused)
>+txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
>+  __rte_unused uint32_t index)

I have seen this use of __rte_unused or attributes attached to variables and 
structures in a couple of different ways.

I have seen the attribute placed both before and after the variable; I prefer 
the attribute to be after, but I could adapt, I hope.
Do we have a rule about where the attribute is put in this case and others? I 
have seen that the attributes for structures are always at the end of the 
structure, which in some cases may not compile in other places.

I would like to suggest we place the attributes at the end of the structure, 
e.g. __rte_cache_aligned, and I would like to see __rte_unused after the 
variable as a style in the code.

Thanks

> {
>   struct txq_mp2mr_mbuf_check_data *data = arg;
>-  struct rte_mbuf *buf =
>-  (void *)((uintptr_t)start + data->mp->header_size);
>
Regards,
Keith






[dpdk-dev] [PATCH] fm10k: set packet type for multi-segment packets

2016-04-14 Thread Michael Frasca
When building a chain of mbufs for a multi-segment packet, the
packet_type field resides at the end of the chain. It should be
copied forward to the head of the list.

Fixes: fe65e1e1ce61 ("fm10k: add vector scatter Rx")

Signed-off-by: Michael Frasca 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index f8efe8f..66f126f 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -608,6 +608,7 @@ fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
/* it's the last packet of the set */
start->hash = end->hash;
start->ol_flags = end->ol_flags;
+   start->packet_type = end->packet_type;
pkts[pkt_idx++] = start;
start = end = NULL;
}
-- 
2.5.0



[dpdk-dev] [4.4 kernel] kni lockup, kernel dump

2016-04-14 Thread Ferruh Yigit
On 4/13/2016 11:26 PM, ALeX Wang wrote:
> Did more experiment, found that it has nothing to do with the kernel
> version,
> 
> It only happens when using kni module with '--no-huge' eal flag...
> 
> Is that expected?
> 

Yes.
The KNI kernel module expects the mempool to be physically contiguous; with the
'--no-huge' flag this is no longer true, and as a result the KNI module can
access an incorrect address.

Regards,
ferruh



[dpdk-dev] [PATCH 00/36] mempool: rework memory allocation

2016-04-14 Thread Hunt, David


On 4/14/2016 3:01 PM, Olivier MATZ wrote:
>
> On 04/14/2016 03:50 PM, Wiles, Keith wrote:
--snip--
>> I have not digested this complete patch yet, but this one popped out 
>> at me as the External Memory Manager support is sitting in the wings 
>> for the 16.07 release. If this causes the EMM patch to be rewritten or 
>> updated, that seems like a problem to me. Does this patch add the 
>> External Memory Manager support?
>> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32015/focus=35107 
>>
>
> I've reworked the series you are referring to, and rebased it on top
> of this series. Please see:
> http://dpdk.org/ml/archives/dev/2016-April/037509.html
>

Thanks for your help on this, Olivier. Much appreciated.

Regards,
David.



[dpdk-dev] [4.4 kernel] kni lockup, kernel dump

2016-04-14 Thread ALeX Wang
Thx, Ferruh and Thomas for the confirmation and pointer!

On 14 April 2016 at 07:43, Thomas Monjalon 
wrote:

> 2016-04-14 15:29, Ferruh Yigit:
> > On 4/13/2016 11:26 PM, ALeX Wang wrote:
> > > Did more experiment, found that it has nothing to do with the kernel
> > > version,
> > >
> > > It only happens when using kni module with '--no-huge' eal flag...
> > >
> > > Is that expected?
> > >
> >
> > Yes.
> > The KNI kernel module expects the mempool to be physically contiguous;
> > with the '--no-huge' flag this is no longer true, and as a result the
> > KNI module can access an incorrect address.
>
> This is a bug.
> The memory API should allow making this restriction explicit, and return
> an error if it cannot offer the requested contiguous memory.
> See this thread for discussion:
> http://dpdk.org/ml/archives/dev/2016-April/037444.html
>



-- 
Alex Wang,
Open vSwitch developer


[dpdk-dev] [PATCH] cmdline: fix unchecked return value

2016-04-14 Thread Daniel Mrzyglod
This patch checks whether error values occur.
Fix for Coverity errors #13209 & #13195.

If the function returns an error value, the error value may be mistaken
for a normal value.

In rdline_char_in: Value returned from a function is not checked for errors
before being used

Signed-off-by: Daniel Mrzyglod 
---
 lib/librte_cmdline/cmdline_rdline.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_rdline.c 
b/lib/librte_cmdline/cmdline_rdline.c
index 1ef2258..e75a556 100644
--- a/lib/librte_cmdline/cmdline_rdline.c
+++ b/lib/librte_cmdline/cmdline_rdline.c
@@ -377,7 +377,10 @@ rdline_char_in(struct rdline *rdl, char c)
 		case CMDLINE_KEY_CTRL_K:
 			cirbuf_get_buf_head(&rdl->right, rdl->kill_buf, RDLINE_BUF_SIZE);
 			rdl->kill_size = CIRBUF_GET_LEN(&rdl->right);
-			cirbuf_del_buf_head(&rdl->right, rdl->kill_size);
+
+			if (cirbuf_del_buf_head(&rdl->right, rdl->kill_size) < 0)
+				return -EINVAL;
+
 			rdline_puts(rdl, vt100_clear_right);
 			break;

@@ -496,7 +499,10 @@ rdline_char_in(struct rdline *rdl, char c)
 			vt100_init(&rdl->vt100);
 			cirbuf_init(&rdl->left, rdl->left_buf, 0, RDLINE_BUF_SIZE);
 			cirbuf_init(&rdl->right, rdl->right_buf, 0, RDLINE_BUF_SIZE);
-			cirbuf_add_buf_tail(&rdl->left, buf, strnlen(buf, RDLINE_BUF_SIZE));
+
+			if (cirbuf_add_buf_tail(&rdl->left, buf, strnlen(buf, RDLINE_BUF_SIZE)) < 0)
+				return -EINVAL;
+
 			rdline_redisplay(rdl);
 			break;

@@ -513,7 +519,10 @@ rdline_char_in(struct rdline *rdl, char c)
 			vt100_init(&rdl->vt100);
 			cirbuf_init(&rdl->left, rdl->left_buf, 0, RDLINE_BUF_SIZE);
 			cirbuf_init(&rdl->right, rdl->right_buf, 0, RDLINE_BUF_SIZE);
-			cirbuf_add_buf_tail(&rdl->left, buf, strnlen(buf, RDLINE_BUF_SIZE));
+
+			if (cirbuf_add_buf_tail(&rdl->left, buf, strnlen(buf, RDLINE_BUF_SIZE)) < 0)
+				return -EINVAL;
+
 			rdline_redisplay(rdl);

 			break;
@@ -640,7 +649,9 @@ rdline_add_history(struct rdline * rdl, const char * buf)
 			rdline_remove_old_history_item(rdl);
 	}

-	cirbuf_add_buf_tail(&rdl->history, buf, len);
+	if (cirbuf_add_buf_tail(&rdl->history, buf, len) < 0)
+		return -EINVAL;
+
 	cirbuf_add_tail(&rdl->history, 0);

 	return 0;
-- 
2.5.5



[dpdk-dev] [PATCH] i40e: improve performance of vector PMD

2016-04-14 Thread Bruce Richardson
On Thu, Apr 14, 2016 at 11:15:21AM +0100, Bruce Richardson wrote:
> An analysis of the i40e code using Intel® VTune™ Amplifier 2016 showed
> that the code was unexpectedly causing stalls due to "Loads blocked by
> Store Forwards". This can occur when a load from memory has to wait
> due to the prior store being to the same address, but being of a smaller
> size i.e. the stored value cannot be directly returned to the loader.
> [See ref: https://software.intel.com/en-us/node/544454]
> 
> These stalls are due to the way in which the data_len values are handled
> in the driver. The lengths are extracted using vector operations, but those
> 16-bit lengths are then assigned using scalar operations i.e. 16-bit
> stores.
> 
> These regular 16-bit stores actually have two effects in the code:
> * they cause the "Loads blocked by Store Forwards" issues reported
> * they also cause the previous loads in the RX function to actually be a
> load followed by a store to an address on the stack, because the 16-bit
> assignment can't be done to an xmm register.
> 
> By converting the 16-bit stores operations into a sequence of SSE blend
> operations, we can ensure that the descriptor loads only occur once, and
> avoid both the additional store and loads from the stack, as well as the
> stalls due to the second loads being blocked.
> 
> Signed-off-by: Bruce Richardson 
> 
Self-NAK on this version. The blend instruction used is SSE4.1 so breaks the
"default" build.

Two obvious options to fix this:
1. Keep the old code with SSE4.1 #ifdefs separating old and new
2. Update the vpmd requirement to SSE4.1, and factor that in during runtime
select of the RX code path.

Personally, I prefer the second option. Any objections?

/Bruce


[dpdk-dev] [PATCH 02/36] mempool: replace elt_size by total_elt_size

2016-04-14 Thread Wiles, Keith
>In some mempool functions, we use the size of the elements as arguments or in
>variables. There is a confusion between the size including or not including
>the header and trailer.
>
>To avoid this confusion:
>- update the API documentation
>- rename the variables and argument names as "elt_size" when the size does not
>  include the header and trailer, or else as "total_elt_size".
>
>Signed-off-by: Olivier Matz 

Acked-by: Keith Wiles 
>---
> lib/librte_mempool/rte_mempool.c | 21 +++--
> lib/librte_mempool/rte_mempool.h | 19 +++
> 2 files changed, 22 insertions(+), 18 deletions(-)
>
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index ce78476..90b5b1b 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -156,13 +156,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
>uint32_t obj_idx,
>  *
>  * Given the pointer to the memory, and its topology in physical memory
>  * (the physical addresses table), iterate through the "elt_num" objects
>- * of size "total_elt_sz" aligned at "align". For each object in this memory
>+ * of size "elt_sz" aligned at "align". For each object in this memory
>  * chunk, invoke a callback. It returns the effective number of objects
>  * in this memory. */
> uint32_t
>-rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
>align,
>-  const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
>-  rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
>+rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
>+  size_t align, const phys_addr_t paddr[], uint32_t pg_num,
>+  uint32_t pg_shift, rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
> {
>   uint32_t i, j, k;
>   uint32_t pgn, pgf;
>@@ -178,7 +178,7 @@ rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t 
>elt_sz, size_t align,
>   while (i != elt_num && j != pg_num) {
> 
>   start = RTE_ALIGN_CEIL(va, align);
>-  end = start + elt_sz;
>+  end = start + total_elt_sz;
> 
>   /* index of the first page for the next element. */
>   pgf = (end >> pg_shift) - (start >> pg_shift);
>@@ -255,6 +255,7 @@ mempool_populate(struct rte_mempool *mp, size_t num, 
>size_t align,
>   mempool_obj_populate, &arg);
> }
> 
>+/* get the header, trailer and total size of a mempool element. */
> uint32_t
> rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>   struct rte_mempool_objsz *sz)
>@@ -332,17 +333,17 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
>flags,
>  * Calculate maximum amount of memory required to store given number of 
> objects.
>  */
> size_t
>-rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, uint32_t pg_shift)
>+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t 
>pg_shift)
> {
>   size_t n, pg_num, pg_sz, sz;
> 
>   pg_sz = (size_t)1 << pg_shift;
> 
>-  if ((n = pg_sz / elt_sz) > 0) {
>+  if ((n = pg_sz / total_elt_sz) > 0) {
>   pg_num = (elt_num + n - 1) / n;
>   sz = pg_num << pg_shift;
>   } else {
>-  sz = RTE_ALIGN_CEIL(elt_sz, pg_sz) * elt_num;
>+  sz = RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;
>   }
> 
>   return sz;
>@@ -362,7 +363,7 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
>void *end,
>  * given memory footprint to store required number of elements.
>  */
> ssize_t
>-rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
>+rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
>   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
> {
>   uint32_t n;
>@@ -373,7 +374,7 @@ rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, 
>size_t elt_sz,
>   va = (uintptr_t)vaddr;
>   uv = va;
> 
>-  if ((n = rte_mempool_obj_iter(vaddr, elt_num, elt_sz, 1,
>+  if ((n = rte_mempool_obj_iter(vaddr, elt_num, total_elt_sz, 1,
>   paddr, pg_num, pg_shift, mempool_lelem_iter,
>   )) != elt_num) {
>   return -(ssize_t)n;
>diff --git a/lib/librte_mempool/rte_mempool.h 
>b/lib/librte_mempool/rte_mempool.h
>index bd78df5..ca4657f 100644
>--- a/lib/librte_mempool/rte_mempool.h
>+++ b/lib/librte_mempool/rte_mempool.h
>@@ -1289,7 +1289,7 @@ struct rte_mempool *rte_mempool_lookup(const char *name);
>  * calculates header, trailer, body and total sizes of the mempool object.
>  *
>  * @param elt_size
>- *   The size of each element.
>+ *   The size of each element, without header and trailer.
>  * @param flags
>  *   The flags used for the mempool creation.
>  *   Consult rte_mempool_create() for more information about possible values.
>@@ -1315,14 +1315,15 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, 
>uint32_t flags,
>  *
>  * @param elt_num
>  *   Number of elements.
>- * @param elt_sz
>- *   The size of 

[dpdk-dev] [PATCH 01/36] mempool: fix comments and style

2016-04-14 Thread Wiles, Keith
>No functional change, just fix some comments and styling issues.
>Also avoid to duplicate comments between rte_mempool_create()
>and rte_mempool_xmem_create().
>
>Signed-off-by: Olivier Matz 

Acked-by: Keith Wiles 
>---
> lib/librte_mempool/rte_mempool.c | 17 +---
> lib/librte_mempool/rte_mempool.h | 59 +---
> 2 files changed, 26 insertions(+), 50 deletions(-)
>
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index 7a0e07e..ce78476 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -152,6 +152,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
>uint32_t obj_idx,
>   rte_ring_sp_enqueue(mp->ring, obj);
> }
> 
>+/* Iterate through objects at the given address
>+ *
>+ * Given the pointer to the memory, and its topology in physical memory
>+ * (the physical addresses table), iterate through the "elt_num" objects
>+ * of size "total_elt_sz" aligned at "align". For each object in this memory
>+ * chunk, invoke a callback. It returns the effective number of objects
>+ * in this memory. */
> uint32_t
> rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
> align,
>   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
>@@ -341,10 +348,8 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, 
>uint32_t pg_shift)
>   return sz;
> }
> 
>-/*
>- * Calculate how much memory would be actually required with the
>- * given memory footprint to store required number of elements.
>- */
>+/* Callback used by rte_mempool_xmem_usage(): it sets the opaque
>+ * argument to the end of the object. */
> static void
> mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
>   __rte_unused uint32_t idx)
>@@ -352,6 +357,10 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
>void *end,
>   *(uintptr_t *)arg = (uintptr_t)end;
> }
> 
>+/*
>+ * Calculate how much memory would be actually required with the
>+ * given memory footprint to store required number of elements.
>+ */
> ssize_t
> rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
>   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
>diff --git a/lib/librte_mempool/rte_mempool.h 
>b/lib/librte_mempool/rte_mempool.h
>index 8595e77..bd78df5 100644
>--- a/lib/librte_mempool/rte_mempool.h
>+++ b/lib/librte_mempool/rte_mempool.h
>@@ -214,7 +214,7 @@ struct rte_mempool {
> 
> }  __rte_cache_aligned;
> 
>-#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread in memory. */
>+#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread among memory 
>channels. */
> #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache 
> lines.*/
> #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is 
> "single-producer".*/
> #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
> "single-consumer".*/
>@@ -270,7 +270,8 @@ struct rte_mempool {
> /* return the header of a mempool object (internal) */
> static inline struct rte_mempool_objhdr *__mempool_get_header(void *obj)
> {
>-  return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj, sizeof(struct 
>rte_mempool_objhdr));
>+  return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj,
>+  sizeof(struct rte_mempool_objhdr));
> }
> 
> /**
>@@ -544,8 +545,9 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
>elt_size,
> /**
>  * Create a new mempool named *name* in memory.
>  *
>- * This function uses ``memzone_reserve()`` to allocate memory. The
>- * pool contains n elements of elt_size. Its size is set to n.
>+ * The pool contains n elements of elt_size. Its size is set to n.
>+ * This function uses ``memzone_reserve()`` to allocate the mempool header
>+ * (and the objects if vaddr is NULL).
>  * Depending on the input parameters, mempool elements can be either allocated
>  * together with the mempool header, or an externally provided memory buffer
>  * could be used to store mempool objects. In later case, that external
>@@ -560,18 +562,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
>elt_size,
>  * @param elt_size
>  *   The size of each element.
>  * @param cache_size
>- *   If cache_size is non-zero, the rte_mempool library will try to
>- *   limit the accesses to the common lockless pool, by maintaining a
>- *   per-lcore object cache. This argument must be lower or equal to
>- *   CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE. It is advised to choose
>- *   cache_size to have "n modulo cache_size == 0": if this is
>- *   not the case, some elements will always stay in the pool and will
>- *   never be used. The access to the per-lcore table is of course
>- *   faster than the multi-producer/consumer pool. The cache can be
>- *   disabled if the cache_size argument is set to 0; it can be useful to
>- *   avoid losing objects in cache. Note that even if not used, the
>- *   memory space for cache is always reserved in a mempool structure,
>- *   except if 

[dpdk-dev] [PATCH v2 0/5] virtio support for container

2016-04-14 Thread Tan, Jianfeng
Hi Thomas,

On 4/14/2016 12:14 AM, Thomas Monjalon wrote:
> Hi Jianfeng,
>
> Thanks for raising the container issues and proposing some solutions.
> General comments below.
>
> 2016-02-05 19:20, Jianfeng Tan:
>> This patchset is to provide high performance networking interface (virtio)
>> for container-based DPDK applications. The way of starting DPDK apps in
>> containers with ownership of NIC devices exclusively is beyond the scope.
>> The basic idea here is to present a new virtual device (named eth_cvio),
>> which can be discovered and initialized in container-based DPDK apps using
>> rte_eal_init(). To minimize the change, we reuse already-existing virtio
>> frontend driver code (driver/net/virtio/).
>>   
>> Compared to QEMU/VM case, virtio device framework (translates I/O port r/w
>> operations into unix socket/cuse protocol, which is originally provided in
>> QEMU), is integrated in virtio frontend driver. So this converged driver
>> actually plays the role of original frontend driver and the role of QEMU
>> device framework.
>>   
>> The major difference lies in how to calculate relative address for vhost.
>> The principle of virtio is that: based on one or multiple shared memory
>> segments, vhost maintains a reference system with the base addresses and
>> length for each segment so that an address from VM comes (usually GPA,
>> Guest Physical Address) can be translated into vhost-recognizable address
>> (named VVA, Vhost Virtual Address). To decrease the overhead of address
>> translation, we should maintain as few segments as possible. In VM's case,
>> GPA is always locally continuous. In container's case, CVA (Container
>> Virtual Address) can be used. Specifically:
>> a. when set_base_addr, CVA address is used;
>> b. when preparing RX's descriptors, CVA address is used;
>> c. when transmitting packets, CVA is filled in TX's descriptors;
>> d. in TX and CQ's header, CVA is used.
>>   
>> How to share memory? In VM's case, qemu always shares all physical layout
>> to backend. But it's not feasible for a container, as a process, to share
>> all virtual memory regions to backend. So only specified virtual memory
>> regions (with type of shared) are sent to backend. It's a limitation that
>> only addresses in these areas can be used to transmit or receive packets.
>>
>> Known issues
>>
>> a. When used with vhost-net, root privilege is required to create tap
>> device inside.
>> b. Control queue and multi-queue are not supported yet.
>> c. When --single-file option is used, socket_id of the memory may be
>> wrong. (Use "numactl -N x -m x" to work around this for now)
> There are 2 different topics in this patchset:
> 1/ How to provide networking in containers
> 2/ How to provide memory in containers
>
> 1/ You have decided to use the virtio spec to bridge the host
> with its containers. But there is no virtio device in a container
> and no vhost interface in the host (except the kernel one).
> So you are extending virtio to work as a vdev inside the container.
> Could you explain what is the datapath between virtio and the host app?

The datapath is based on the shared memory, which is determined using the 
vhost-user protocol through a unix socket. So the key condition in this 
approach is to map the unix socket into the container.


> Does it need to use a fake device from Qemu as Tetsuya has done?

In this implementation, we don't need a fake device from Qemu as Tetsuya 
is doing. We just maintain a virtual virtio device in the DPDK EAL layer, 
and talk to vhost via a unix socket. I think it's necessary to point 
out the implementation difference between the two implementations: this 
approach hooks into the existing virtio PMD at the layer of struct 
virtio_pci_ops, while Tetsuya's solution intercepts r/w to the ioport or 
PCI configuration space.


>
> Do you think there can be some alternatives to vhost/virtio in containers?

Yeah, we were considering another way to create a virtual virtio device in 
kernel space, driven by a new kernel module (instead of virtio-net) and a 
new library (maybe in DPDK). The control path would then go from app -> 
library -> kernel -> vhost-user (or vhost-net), and the data path would 
still be based on the negotiated shared memory and some vring structures 
inside that memory. However, this involves another new kernel module, and 
I don't think it's an easy way to go.


>
> 2/ The memory management is already a mess and it's going worst.
> I think we need to think the requirements first and then write a proper
> implementation to cover every identified needs.
> I have started a new thread to cover this part:
>   http://thread.gmane.org/gmane.comp.networking.dpdk.devel/37445

I agree we should isolate the memory problem from the network interface 
problem. And the memory problem is not a blocker for this patch: we can go 
without changing the memory part; however, that makes it harder to use. 
We'll go to the thread to discuss this more.

Thanks,
Jianfeng


[dpdk-dev] [PATCH 00/36] mempool: rework memory allocation

2016-04-14 Thread Wiles, Keith
>
>
>On 04/14/2016 03:50 PM, Wiles, Keith wrote:
>>> This series is a rework of mempool. For those who don't want to read
>>> all the cover letter, here is a summary:
>>>
>>> - it is not possible to allocate large mempools if there is not enough
>>>   contiguous memory, this series solves this issue
>>> - introduce new APIs with less arguments: "create, populate, obj_init"
>>> - allow freeing a mempool
>>> - split code in smaller functions, will ease the introduction of ext_handler
>>> - remove test-pmd anonymous mempool creation
>>> - remove most of dom0-specific mempool code
>>> - opens the door for an eal_memory rework: we probably don't need large
>>>   contiguous memory areas anymore; working with pages would work.
>>>
>>> This breaks the ABI as it was indicated in the deprecation for 16.04.
>>> The API stays almost the same, no modification is needed in examples app
>>> or in test-pmd. Only kni and mellanox drivers are slightly modified.
>>>
>>> This patch applies on top of 16.04 + v5 of Keith's patch:
>>> "mempool: reduce rte_mempool structure size"
>>
>> I have not digested this complete patch yet, but this one popped out at me, 
>> as the External Memory Manager support is waiting in the wings for the 16.07 
>> release. If this causes the EMM patch to be rewritten or updated, that seems 
>> like a problem to me. Does this patch add the External Memory Manager 
>> support?
>> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32015/focus=35107
>
>I've reworked the series you are referring to, and rebased it on top
>of this series. Please see:
>http://dpdk.org/ml/archives/dev/2016-April/037509.html

Thanks, I just saw that update :-)

>
>Regards,
>Olivier
>


Regards,
Keith






[dpdk-dev] [PATCH] i40e: improve performance of vector PMD

2016-04-14 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, April 14, 2016 2:50 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin; Wu, Jingjing
> Subject: Re: [dpdk-dev] [PATCH] i40e: improve performance of vector PMD
> 
> On Thu, Apr 14, 2016 at 11:15:21AM +0100, Bruce Richardson wrote:
> > An analysis of the i40e code using Intel® VTune™ Amplifier 2016 showed
> > that the code was unexpectedly causing stalls due to "Loads blocked by
> > Store Forwards". This can occur when a load from memory has to wait
> > due to the prior store being to the same address, but being of a smaller
> > size i.e. the stored value cannot be directly returned to the loader.
> > [See ref: https://software.intel.com/en-us/node/544454]
> >
> > These stalls are due to the way in which the data_len values are handled
> > in the driver. The lengths are extracted using vector operations, but those
> > 16-bit lengths are then assigned using scalar operations i.e. 16-bit
> > stores.
> >
> > These regular 16-bit stores actually have two effects in the code:
> > * they cause the "Loads blocked by Store Forwards" issues reported
> > * they also cause the previous loads in the RX function to actually be a
> > load followed by a store to an address on the stack, because the 16-bit
> > assignment can't be done to an xmm register.
> >
> > By converting the 16-bit stores operations into a sequence of SSE blend
> > operations, we can ensure that the descriptor loads only occur once, and
> > avoid both the additional store and loads from the stack, as well as the
> > stalls due to the second loads being blocked.
> >
> > Signed-off-by: Bruce Richardson 
> >
> Self-NAK on this version. The blend instruction used is SSE4.1 so breaks the
> "default" build.
> 
> Two obvious options to fix this:
> 1. Keep the old code with SSE4.1 #ifdefs separating old and new
> 2. Update the vpmd requirement to SSE4.1, and factor that in during runtime
> select of the RX code path.
> 
> Personally, I prefer the second option. Any objections?

+1 for second one.

> 
> /Bruce


[dpdk-dev] [PATCH v5] mempool: reduce rte_mempool structure size

2016-04-14 Thread Wiles, Keith
>From: Keith Wiles 
>
>The rte_mempool structure is changed, which will cause an ABI change
>for this structure. Providing backward compat is not reasonable
>here as this structure is used in multiple defines/inlines.
>
>Allow mempool cache support to be dynamic depending on if the
>mempool being created needs cache support. Saves about 1.5M of
>memory used by the rte_mempool structure.
>
>Allocating small mempools which do not require cache can consume
>large amounts of memory if you have a number of these mempools.
>
>Change to be effective in release 16.07.
>
>Signed-off-by: Keith Wiles 
>Acked-by: Olivier Matz 

For the change to this patch:
Acked-by: Keith Wiles 

>---
>
>Changes in v5:
>
>- use RTE_PTR_ADD() instead of cast to (char *) to fix compilation on tilera.
>  Error log was:
>
>  rte_mempool.c: In function 'rte_mempool_xmem_create':
>  rte_mempool.c:595: error: cast increases required alignment of target type
>
>
> app/test/test_mempool.c  |  4 +--
> lib/librte_mempool/rte_mempool.c | 55 ++--
> lib/librte_mempool/rte_mempool.h | 29 ++---
> 3 files changed, 40 insertions(+), 48 deletions(-)
>
>diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
>index f0f823b..10e1fa4 100644
>--- a/app/test/test_mempool.c
>+++ b/app/test/test_mempool.c
>@@ -122,8 +122,8 @@ test_mempool_basic(void)
>   return -1;
> 
>   printf("get private data\n");
>-  if (rte_mempool_get_priv(mp) !=
>-  (char*) mp + MEMPOOL_HEADER_SIZE(mp, mp->pg_num))
>+  if (rte_mempool_get_priv(mp) != (char *)mp +
>+  MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
>   return -1;
> 
>   printf("get physical address of an object\n");
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index f8781e1..7a0e07e 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -452,12 +452,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   /* compilation-time checks */
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
> RTE_CACHE_LINE_MASK) != 0);
>-#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_cache) &
> RTE_CACHE_LINE_MASK) != 0);
>-  RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, local_cache) &
>-RTE_CACHE_LINE_MASK) != 0);
>-#endif
> #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_debug_stats) &
> RTE_CACHE_LINE_MASK) != 0);
>@@ -527,9 +523,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>*/
>   int head = sizeof(struct rte_mempool);
>   int new_size = (private_data_size + head) % page_size;
>-  if (new_size) {
>+  if (new_size)
>   private_data_size += page_size - new_size;
>-  }
>   }
> 
>   /* try to allocate tailq entry */
>@@ -544,7 +539,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>* store mempool objects. Otherwise reserve a memzone that is large
>* enough to hold mempool header and metadata plus mempool objects.
>*/
>-  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num) + private_data_size;
>+  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size);
>+  mempool_size += private_data_size;
>   mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
>   if (vaddr == NULL)
>   mempool_size += (size_t)objsz.total_size * n;
>@@ -591,8 +587,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
>   mp->private_data_size = private_data_size;
> 
>+  /*
>+   * local_cache pointer is set even if cache_size is zero.
>+   * The local_cache points to just past the elt_pa[] array.
>+   */
>+  mp->local_cache = (struct rte_mempool_cache *)
>+  RTE_PTR_ADD(mp, MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
>+
>   /* calculate address of the first element for continuous mempool. */
>-  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num) +
>+  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
>   private_data_size;
>   obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);
> 
>@@ -606,9 +609,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->elt_va_start = (uintptr_t)obj;
>   mp->elt_pa[0] = mp->phys_addr +
>   (mp->elt_va_start - (uintptr_t)mp);
>-
>-  /* mempool elements in a separate chunk of memory. */
>   } else {
>+  /* mempool elements in a separate chunk of memory. */
>   mp->elt_va_start = (uintptr_t)vaddr;
>  

[dpdk-dev] [PATCH v5] mempool: reduce rte_mempool structure size

2016-04-14 Thread Wiles, Keith
>From: Keith Wiles 
>
>The rte_mempool structure is changed, which will cause an ABI change
>for this structure. Providing backward compat is not reasonable
>here as this structure is used in multiple defines/inlines.
>
>Allow mempool cache support to be dynamic depending on if the
>mempool being created needs cache support. Saves about 1.5M of
>memory used by the rte_mempool structure.
>
>Allocating small mempools which do not require cache can consume
>large amounts of memory if you have a number of these mempools.
>
>Change to be effective in release 16.07.
>
>Signed-off-by: Keith Wiles 
>Acked-by: Olivier Matz 
>---
>
>Changes in v5:
>
>- use RTE_PTR_ADD() instead of cast to (char *) to fix compilation on tilera.
>  Error log was:
>
>  rte_mempool.c: In function 'rte_mempool_xmem_create':
>  rte_mempool.c:595: error: cast increases required alignment of target type
>
>
> app/test/test_mempool.c  |  4 +--
> lib/librte_mempool/rte_mempool.c | 55 ++--
> lib/librte_mempool/rte_mempool.h | 29 ++---
> 3 files changed, 40 insertions(+), 48 deletions(-)
>
>diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
>index f0f823b..10e1fa4 100644
>--- a/app/test/test_mempool.c
>+++ b/app/test/test_mempool.c
>@@ -122,8 +122,8 @@ test_mempool_basic(void)
>   return -1;
> 
>   printf("get private data\n");
>-  if (rte_mempool_get_priv(mp) !=
>-  (char*) mp + MEMPOOL_HEADER_SIZE(mp, mp->pg_num))
>+  if (rte_mempool_get_priv(mp) != (char *)mp +
>+  MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))

Should we not add the RTE_PTR_ADD() here as well?

>   return -1;
> 
>   printf("get physical address of an object\n");
>diff --git a/lib/librte_mempool/rte_mempool.c 
>b/lib/librte_mempool/rte_mempool.c
>index f8781e1..7a0e07e 100644
>--- a/lib/librte_mempool/rte_mempool.c
>+++ b/lib/librte_mempool/rte_mempool.c
>@@ -452,12 +452,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   /* compilation-time checks */
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
> RTE_CACHE_LINE_MASK) != 0);
>-#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_cache) &
> RTE_CACHE_LINE_MASK) != 0);
>-  RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, local_cache) &
>-RTE_CACHE_LINE_MASK) != 0);
>-#endif
> #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>   RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_debug_stats) &
> RTE_CACHE_LINE_MASK) != 0);
>@@ -527,9 +523,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>*/
>   int head = sizeof(struct rte_mempool);
>   int new_size = (private_data_size + head) % page_size;
>-  if (new_size) {
>+  if (new_size)
>   private_data_size += page_size - new_size;
>-  }
>   }
> 
>   /* try to allocate tailq entry */
>@@ -544,7 +539,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>* store mempool objects. Otherwise reserve a memzone that is large
>* enough to hold mempool header and metadata plus mempool objects.
>*/
>-  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num) + private_data_size;
>+  mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size);
>+  mempool_size += private_data_size;
>   mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
>   if (vaddr == NULL)
>   mempool_size += (size_t)objsz.total_size * n;
>@@ -591,8 +587,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
>   mp->private_data_size = private_data_size;
> 
>+  /*
>+   * local_cache pointer is set even if cache_size is zero.
>+   * The local_cache points to just past the elt_pa[] array.
>+   */
>+  mp->local_cache = (struct rte_mempool_cache *)
>+  RTE_PTR_ADD(mp, MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
>+
>   /* calculate address of the first element for continuous mempool. */
>-  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num) +
>+  obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
>   private_data_size;
>   obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);
> 
>@@ -606,9 +609,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
>unsigned elt_size,
>   mp->elt_va_start = (uintptr_t)obj;
>   mp->elt_pa[0] = mp->phys_addr +
>   (mp->elt_va_start - (uintptr_t)mp);
>-
>-  /* mempool elements in a separate chunk of memory. */
>   } else {
>+  /* mempool elements in a separate chunk of memory. */
>   mp->elt_va_start = (uintptr_t)vaddr;
>   

[dpdk-dev] ethtool doesnt work on some interface after unbinding dpdk

2016-04-14 Thread Gopakumar Choorakkot Edakkunni
Hi Remy,

Thanks for the response. The error is "No such device"; some snippets are
below. And no, I was not using the DPDK ethtool.

ge8->   06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit
Network Connection (rev 01)

root:~# ls /sys/class/net/ge8/device/driver/module/drivers/
pci:igb
root:~#

root:~# ethtool ge8
Settings for ge8:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available

Rgds,
Gopa.

On Thu, Apr 14, 2016 at 12:15 AM, Remy Horton  wrote:

> Morning,
>
> On 13/04/2016 15:48, Gopakumar Choorakkot Edakkunni wrote:
> [..]
>
>> then after a while I
>> unbind from igb_uio and bind them back to igb/ixgbe. At this point, one of
>> the 4 igb ports (random) stops responding to ethtool, ethtool bails out
>> with some error. But otherwise the interface seems to work fine, it has a
>> linux interface created and pops up in /sys/class/net etc.. Has anyone
>> seen
>> this before ? I thought of checking before starting to debug this further
>>
>
> Can you give details of the error? If you were unbinding from igb_uio
> while examples/ethtool was still running, it likely caused something to trip
> up, as at least DPDK ethtool itself was not made with run-time unbinding in
> mind.
>
> Regards,
>
> ..Remy
>


[dpdk-dev] [PATCH 36/36] mempool: update copyright

2016-04-14 Thread Olivier Matz
Update the copyright of files touched by this patch series.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 1 +
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7d4cabe..7104a41 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index e6a257f..96bd047 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -2,6 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016 6WIND S.A.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
-- 
2.1.4



[dpdk-dev] [PATCH 35/36] app/test: rework mempool test

2016-04-14 Thread Olivier Matz
Rework the mempool test to better indicate where it failed and, now that
this feature is available, free the mempool after the test is done.

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c | 232 +++-
 1 file changed, 129 insertions(+), 103 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index e0d5c61..c96ed27 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -77,13 +77,13 @@
 #define MAX_KEEP 128
 #define MEMPOOL_SIZE 
((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)

-static struct rte_mempool *mp;
-static struct rte_mempool *mp_cache, *mp_nocache;
+#define RET_ERR() do { \
+   printf("test failed at %s():%d\n", __func__, __LINE__); \
+   return -1;  \
+   } while(0)

 static rte_atomic32_t synchro;

-
-
 /*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
@@ -93,13 +93,14 @@ my_obj_init(struct rte_mempool *mp, __attribute__((unused)) 
void *arg,
void *obj, unsigned i)
 {
uint32_t *objnum = obj;
+
memset(obj, 0, mp->elt_size);
*objnum = i;
 }

 /* basic tests (done on one core) */
 static int
-test_mempool_basic(void)
+test_mempool_basic(struct rte_mempool *mp)
 {
uint32_t *objnum;
void **objtable;
@@ -113,23 +114,23 @@ test_mempool_basic(void)

printf("get an object\n");
if (rte_mempool_get(mp, ) < 0)
-   return -1;
+   RET_ERR();
rte_mempool_dump(stdout, mp);

/* tests that improve coverage */
printf("get object count\n");
if (rte_mempool_count(mp) != MEMPOOL_SIZE - 1)
-   return -1;
+   RET_ERR();

printf("get private data\n");
if (rte_mempool_get_priv(mp) != (char *)mp +
MEMPOOL_HEADER_SIZE(mp, mp->cache_size))
-   return -1;
+   RET_ERR();

 #ifndef RTE_EXEC_ENV_BSDAPP /* rte_mem_virt2phy() not supported on bsd */
printf("get physical address of an object\n");
if (rte_mempool_virt2phy(mp, obj) != rte_mem_virt2phy(obj))
-   return -1;
+   RET_ERR();
 #endif

printf("put the object back\n");
@@ -138,10 +139,10 @@ test_mempool_basic(void)

printf("get 2 objects\n");
if (rte_mempool_get(mp, ) < 0)
-   return -1;
+   RET_ERR();
if (rte_mempool_get(mp, ) < 0) {
rte_mempool_put(mp, obj);
-   return -1;
+   RET_ERR();
}
rte_mempool_dump(stdout, mp);

@@ -155,11 +156,10 @@ test_mempool_basic(void)
 * on other cores may not be empty.
 */
objtable = malloc(MEMPOOL_SIZE * sizeof(void *));
-   if (objtable == NULL) {
-   return -1;
-   }
+   if (objtable == NULL)
+   RET_ERR();

-   for (i=0; i<MEMPOOL_SIZE; i++) {
[...]
if (*objnum > MEMPOOL_SIZE) {
-   printf("bad object number\n");
+   printf("bad object number(%d)\n", *objnum);
ret = -1;
break;
}
-   for (j=sizeof(*objnum); j<mp->elt_size; j++) {
+   for (j = sizeof(*objnum); j < mp->elt_size; j++) {
if (obj_data[j] != 0)
ret = -1;
}
@@ -196,14 +196,17 @@ static int 
test_mempool_creation_with_exceeded_cache_size(void)
 {
struct rte_mempool *mp_cov;

-   mp_cov = rte_mempool_create("test_mempool_creation_with_exceeded_cache_size", MEMPOOL_SIZE,
- MEMPOOL_ELT_SIZE,
- RTE_MEMPOOL_CACHE_MAX_SIZE + 32, 0,
- NULL, NULL,
- my_obj_init, NULL,
- SOCKET_ID_ANY, 0);
-   if(NULL != mp_cov) {
-   return -1;
+   mp_cov = rte_mempool_create("test_mempool_cache_too_big",
+   MEMPOOL_SIZE,
+   MEMPOOL_ELT_SIZE,
+   RTE_MEMPOOL_CACHE_MAX_SIZE + 32, 0,
+   NULL, NULL,
+   my_obj_init, NULL,
+   SOCKET_ID_ANY, 0);
+
+   if (mp_cov != NULL) {
+   rte_mempool_free(mp_cov);
+   RET_ERR();
}

return 0;
@@ -241,8 +244,8 @@ static int test_mempool_single_producer(void)
continue;
}
if 

[dpdk-dev] [PATCH 34/36] mempool: new flag when phys contig mem is not needed

2016-04-14 Thread Olivier Matz
Add a new flag to remove the constraint of having physically contiguous
objects inside a mempool.

Add this flag to the log history mempool as a start, but it could be added
in most cases where objects are not mbufs.

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/eal_common_log.c |  2 +-
 lib/librte_mempool/rte_mempool.c   | 23 ---
 lib/librte_mempool/rte_mempool.h   |  5 +
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_log.c 
b/lib/librte_eal/common/eal_common_log.c
index 1ae8de7..9122b34 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -322,7 +322,7 @@ rte_eal_common_log_init(FILE *default_log)
LOG_ELT_SIZE, 0, 0,
NULL, NULL,
NULL, NULL,
-   SOCKET_ID_ANY, 0);
+   SOCKET_ID_ANY, MEMPOOL_F_NO_PHYS_CONTIG);

if ((log_history_mp == NULL) &&
((log_history_mp = rte_mempool_lookup(LOG_HISTORY_MP_NAME)) == 
NULL)){
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 1f998ef..7d4cabe 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -410,7 +410,11 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,

while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
off += mp->header_size;
-   mempool_add_elem(mp, (char *)vaddr + off, paddr + off);
+   if (paddr == RTE_BAD_PHYS_ADDR)
+   mempool_add_elem(mp, (char *)vaddr + off,
+   RTE_BAD_PHYS_ADDR);
+   else
+   mempool_add_elem(mp, (char *)vaddr + off, paddr + off);
off += mp->elt_size + mp->trailer_size;
i++;
}
@@ -439,6 +443,10 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, char 
*vaddr,
if (mp->nb_mem_chunks != 0)
return -EEXIST;

+   if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+   return rte_mempool_populate_phys(mp, vaddr, RTE_BAD_PHYS_ADDR,
+   pg_num * pg_sz, free_cb, opaque);
+
for (i = 0; i < pg_num && mp->populated_size < mp->size; i += n) {

/* populate with the largest group of contiguous pages */
@@ -479,6 +487,10 @@ rte_mempool_populate_virt(struct rte_mempool *mp, char 
*addr,
if (RTE_ALIGN_CEIL(len, pg_sz) != len)
return -EINVAL;

+   if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+   return rte_mempool_populate_phys(mp, addr, RTE_BAD_PHYS_ADDR,
+   len, free_cb, opaque);
+
for (off = 0; off + pg_sz <= len &&
 mp->populated_size < mp->size; off += phys_len) {

@@ -528,6 +540,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
char mz_name[RTE_MEMZONE_NAMESIZE];
const struct rte_memzone *mz;
size_t size, total_elt_sz, align, pg_sz, pg_shift;
+   phys_addr_t paddr;
unsigned mz_id, n;
int ret;

@@ -567,10 +580,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
goto fail;
}

-   /* use memzone physical address if it is valid */
+   if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+   paddr = RTE_BAD_PHYS_ADDR;
+   else
+   paddr = mz->phys_addr;
+
if (rte_eal_has_hugepages() && !rte_xen_dom0_supported())
ret = rte_mempool_populate_phys(mp, mz->addr,
-   mz->phys_addr, mz->len,
+   paddr, mz->len,
rte_mempool_memchunk_mz_free,
(void *)(uintptr_t)mz);
else
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index fe4e6fd..e6a257f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -235,6 +235,7 @@ struct rte_mempool {
 #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is 
"single-producer".*/
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
"single-consumer".*/
 #define MEMPOOL_F_RING_CREATED   0x0010 /**< Internal: ring is created */
+#define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous 
objs. */

 /**
  * @internal When debug is enabled, store some statistics.
@@ -417,6 +418,8 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, 
void *);
  *   - MEMPOOL_F_SC_GET: If this flag is set, the default behavior
  * when using rte_mempool_get() or rte_mempool_get_bulk() is
  * "single-consumer". Otherwise, it is "multi-consumers".
+ *   - MEMPOOL_F_NO_PHYS_CONTIG: If set, allocated objects won't
+ * necessarily be contiguous in physical memory.
  * @return
  * 

[dpdk-dev] [PATCH 33/36] mem: avoid memzone/mempool/ring name truncation

2016-04-14 Thread Olivier Matz
Check the return value of snprintf to ensure that the name of
the object is not truncated.

Also update the test so that it does not trigger an error in
that case.

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c| 12 
 lib/librte_eal/common/eal_common_memzone.c | 10 +-
 lib/librte_mempool/rte_mempool.c   | 20 
 lib/librte_ring/rte_ring.c | 16 +---
 4 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 2bc3ac0..e0d5c61 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -407,21 +407,25 @@ test_mempool_same_name_twice_creation(void)
 {
struct rte_mempool *mp_tc;

-   mp_tc = rte_mempool_create("test_mempool_same_name_twice_creation", MEMPOOL_SIZE,
+   mp_tc = rte_mempool_create("test_mempool_same_name", MEMPOOL_SIZE,
MEMPOOL_ELT_SIZE, 0, 0,
NULL, NULL,
NULL, NULL,
SOCKET_ID_ANY, 0);
-   if (NULL == mp_tc)
+   if (NULL == mp_tc) {
+   printf("cannot create mempool\n");
return -1;
+   }

-   mp_tc = rte_mempool_create("test_mempool_same_name_twice_creation", MEMPOOL_SIZE,
+   mp_tc = rte_mempool_create("test_mempool_same_name", MEMPOOL_SIZE,
MEMPOOL_ELT_SIZE, 0, 0,
NULL, NULL,
NULL, NULL,
SOCKET_ID_ANY, 0);
-   if (NULL != mp_tc)
+   if (NULL != mp_tc) {
+   printf("should not be able to create mempool\n");
return -1;
+   }

return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memzone.c 
b/lib/librte_eal/common/eal_common_memzone.c
index 711c845..774eb5d 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -126,6 +126,7 @@ static const struct rte_memzone *
 memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
+   struct rte_memzone *mz;
struct rte_mem_config *mcfg;
size_t requested_len;
int socket, i;
@@ -148,6 +149,13 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
return NULL;
}

+   if (strlen(name) >= sizeof(mz->name) - 1) {
+   RTE_LOG(DEBUG, EAL, "%s(): memzone <%s>: name too long\n",
+   __func__, name);
+   rte_errno = EEXIST;
+   return NULL;
+   }
+
/* if alignment is not a power of two */
if (align && !rte_is_power_of_2(align)) {
RTE_LOG(ERR, EAL, "%s(): Invalid alignment: %u\n", __func__,
@@ -223,7 +231,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);

/* fill the zone in config */
-   struct rte_memzone *mz = get_next_free_memzone();
+   mz = get_next_free_memzone();

if (mz == NULL) {
RTE_LOG(ERR, EAL, "%s(): Cannot find free memzone but there is 
room "
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 4850f5d..1f998ef 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -303,11 +303,14 @@ rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t 
elt_num,
 static int
 rte_mempool_ring_create(struct rte_mempool *mp)
 {
-   int rg_flags = 0;
+   int rg_flags = 0, ret;
char rg_name[RTE_RING_NAMESIZE];
struct rte_ring *r;

-   snprintf(rg_name, sizeof(rg_name), RTE_MEMPOOL_MZ_FORMAT, mp->name);
+   ret = snprintf(rg_name, sizeof(rg_name),
+   RTE_MEMPOOL_MZ_FORMAT, mp->name);
+   if (ret < 0 || ret >= (int)sizeof(rg_name))
+   return -ENAMETOOLONG;

/* ring flags */
if (mp->flags & MEMPOOL_F_SP_PUT)
@@ -692,6 +695,7 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,
size_t mempool_size;
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
struct rte_mempool_objsz objsz;
+   int ret;

/* compilation-time checks */
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
@@ -745,7 +749,11 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,
mempool_size += private_data_size;
mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);

-   snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_MZ_FORMAT, name);
+   ret = snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_MZ_FORMAT, name);
+   if (ret < 0 || ret 

[dpdk-dev] [PATCH 32/36] test-pmd: remove specific anon mempool code

2016-04-14 Thread Olivier Matz
Now that the mempool library provides functions to populate a mempool with
anonymous mmap'd memory, we can remove this specific code from test-pmd.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/Makefile|   4 -
 app/test-pmd/mempool_anon.c  | 201 ---
 app/test-pmd/mempool_osdep.h |  54 
 app/test-pmd/testpmd.c   |  23 +++--
 4 files changed, 14 insertions(+), 268 deletions(-)
 delete mode 100644 app/test-pmd/mempool_anon.c
 delete mode 100644 app/test-pmd/mempool_osdep.h

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 72426f3..40039a1 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -58,11 +58,7 @@ SRCS-y += txonly.c
 SRCS-y += csumonly.c
 SRCS-y += icmpecho.c
 SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ieee1588fwd.c
-SRCS-y += mempool_anon.c

-ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
-CFLAGS_mempool_anon.o := -D_GNU_SOURCE
-endif
 CFLAGS_cmdline.o := -D_GNU_SOURCE

 # this application needs libraries first
diff --git a/app/test-pmd/mempool_anon.c b/app/test-pmd/mempool_anon.c
deleted file mode 100644
index 5e23848..000
--- a/app/test-pmd/mempool_anon.c
+++ /dev/null
@@ -1,201 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include 
-#include 
-#include "mempool_osdep.h"
-#include 
-
-#ifdef RTE_EXEC_ENV_LINUXAPP
-
-#include 
-#include 
-#include 
-
-
-#definePAGEMAP_FNAME   "/proc/self/pagemap"
-
-/*
- * the pfn (page frame number) are bits 0-54 (see pagemap.txt in linux
- * Documentation).
- */
-#definePAGEMAP_PFN_BITS54
-#definePAGEMAP_PFN_MASKRTE_LEN2MASK(PAGEMAP_PFN_BITS, 
phys_addr_t)
-
-
-static int
-get_phys_map(void *va, phys_addr_t pa[], uint32_t pg_num, uint32_t pg_sz)
-{
-   int32_t fd, rc;
-   uint32_t i, nb;
-   off_t ofs;
-
-   ofs = (uintptr_t)va / pg_sz * sizeof(*pa);
-   nb = pg_num * sizeof(*pa);
-
-   if ((fd = open(PAGEMAP_FNAME, O_RDONLY)) < 0)
-   return ENOENT;
-
-   if ((rc = pread(fd, pa, nb, ofs)) < 0 || (rc -= nb) != 0) {
-
-   RTE_LOG(ERR, USER1, "failed read of %u bytes from \'%s\' "
-   "at offset %zu, error code: %d\n",
-   nb, PAGEMAP_FNAME, (size_t)ofs, errno);
-   rc = ENOENT;
-   }
-
-   close(fd);
-
-   for (i = 0; i != pg_num; i++)
-   pa[i] = (pa[i] & PAGEMAP_PFN_MASK) * pg_sz;
-
-   return rc;
-}
-
-struct rte_mempool *
-mempool_anon_create(const char *name, unsigned elt_num, unsigned elt_size,
-  unsigned cache_size, unsigned private_data_size,
-  rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-  int socket_id, unsigned flags)
-{
-   struct rte_mempool *mp;
-   phys_addr_t *pa;
-   char *va, *uv;
-   uint32_t n, pg_num, pg_shift, pg_sz, total_size;
-   size_t sz;
-   ssize_t usz;
-   int32_t rc;
-
-   rc = ENOMEM;
-   mp = NULL;
-
-   pg_sz = getpagesize();
-   if (rte_is_power_of_2(pg_sz) == 0) {
-   rte_errno = EINVAL;
-   return mp;
-   }
-
-   pg_shift = rte_bsf32(pg_sz);
-
-   total_size = rte_mempool_calc_obj_size(elt_size, flags, NULL);
-
-   

[dpdk-dev] [PATCH 31/36] mempool: make mempool populate and free api public

2016-04-14 Thread Olivier Matz
Add the following functions to the public mempool API:

- rte_mempool_create_empty()
- rte_mempool_populate_phys()
- rte_mempool_populate_phys_tab()
- rte_mempool_populate_virt()
- rte_mempool_populate_default()
- rte_mempool_populate_anon()
- rte_mempool_free()

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c   |  14 +--
 lib/librte_mempool/rte_mempool.h   | 168 +
 lib/librte_mempool/rte_mempool_version.map |   9 +-
 3 files changed, 183 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 5c21f08..4850f5d 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -365,7 +365,7 @@ rte_mempool_free_memchunks(struct rte_mempool *mp)
 /* Add objects in the pool, using a physically contiguous memory
  * zone. Return the number of objects added, or a negative value
  * on error. */
-static int
+int
 rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
phys_addr_t paddr, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
void *opaque)
@@ -423,7 +423,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,

 /* Add objects in the pool, using a table of physical pages. Return the
  * number of objects added, or a negative value on error. */
-static int
+int
 rte_mempool_populate_phys_tab(struct rte_mempool *mp, char *vaddr,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
rte_mempool_memchunk_free_cb_t *free_cb, void *opaque)
@@ -458,7 +458,7 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, char 
*vaddr,

 /* Populate the mempool with a virtual area. Return the number of
  * objects added, or a negative value on error. */
-static int
+int
 rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
size_t len, size_t pg_sz, rte_mempool_memchunk_free_cb_t *free_cb,
void *opaque)
@@ -518,7 +518,7 @@ rte_mempool_populate_virt(struct rte_mempool *mp, char 
*addr,
 /* Default function to populate the mempool: allocate memory in memzones,
  * and populate them. Return the number of objects added, or a negative
  * value on error. */
-static int
+int
 rte_mempool_populate_default(struct rte_mempool *mp)
 {
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
@@ -609,7 +609,7 @@ rte_mempool_memchunk_anon_free(struct rte_mempool_memhdr 
*memhdr,
 }

 /* populate the mempool with an anonymous mapping */
-__rte_unused static int
+int
 rte_mempool_populate_anon(struct rte_mempool *mp)
 {
size_t size;
@@ -650,7 +650,7 @@ rte_mempool_populate_anon(struct rte_mempool *mp)
 }

 /* free a mempool */
-static void
+void
 rte_mempool_free(struct rte_mempool *mp)
 {
struct rte_mempool_list *mempool_list = NULL;
@@ -679,7 +679,7 @@ rte_mempool_free(struct rte_mempool *mp)
 }

 /* create an empty mempool */
-static struct rte_mempool *
+struct rte_mempool *
 rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
unsigned cache_size, unsigned private_data_size,
int socket_id, unsigned flags)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 721d8e7..fe4e6fd 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -502,6 +502,174 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift);

 /**
+ * Create an empty mempool
+ *
+ * The mempool is allocated and initialized, but it is not populated: no
+ * memory is allocated for the mempool elements. The user has to call
+ * rte_mempool_populate_*() to add memory chunks to the pool. Once
+ * populated, the user may also want to initialize each object with
+ * rte_mempool_obj_iter().
+ *
+ * @param name
+ *   The name of the mempool.
+ * @param n
+ *   The maximum number of elements that can be added in the mempool.
+ *   The optimum size (in terms of memory usage) for a mempool is when n
+ *   is a power of two minus one: n = (2^q - 1).
+ * @param elt_size
+ *   The size of each element.
+ * @param cache_size
+ *   Size of the cache. See rte_mempool_create() for details.
+ * @param private_data_size
+ *   The size of the private data appended after the mempool
+ *   structure. This is useful for storing some private data after the
+ *   mempool structure, as is done for rte_mbuf_pool for example.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in the case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   Flags controlling the behavior of the mempool. See
+ *   rte_mempool_create() for details.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. See rte_mempool_create() for details.
+ */
+struct rte_mempool *

[dpdk-dev] [PATCH 30/36] mempool: populate a mempool with anonymous memory

2016-04-14 Thread Olivier Matz
Now that we can populate a mempool with any virtual memory,
it is easier to introduce a function to populate a mempool
with memory coming from an anonymous mapping, as it's done
in test-pmd.

The next commit will replace the test-pmd anonymous mapping with
this function.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 64 
 1 file changed, 64 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index eaae5a0..5c21f08 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -585,6 +586,69 @@ rte_mempool_populate_default(struct rte_mempool *mp)
return ret;
 }

+/* return the memory size required for mempool objects in anonymous mem */
+static size_t
+get_anon_size(const struct rte_mempool *mp)
+{
+   size_t size, total_elt_sz, pg_sz, pg_shift;
+
+   pg_sz = getpagesize();
+   pg_shift = rte_bsf32(pg_sz);
+   total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+   size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift);
+
+   return size;
+}
+
+/* unmap a memory zone mapped by rte_mempool_populate_anon() */
+static void
+rte_mempool_memchunk_anon_free(struct rte_mempool_memhdr *memhdr,
+   void *opaque)
+{
+   munmap(opaque, get_anon_size(memhdr->mp));
+}
+
+/* populate the mempool with an anonymous mapping */
+__rte_unused static int
+rte_mempool_populate_anon(struct rte_mempool *mp)
+{
+   size_t size;
+   int ret;
+   char *addr;
+
+   /* mempool is already populated, error */
+   if (!STAILQ_EMPTY(&mp->mem_list)) {
+   rte_errno = EINVAL;
+   return 0;
+   }
+
+   /* get chunk of virtually continuous memory */
+   size = get_anon_size(mp);
+   addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
+   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+   if (addr == MAP_FAILED) {
+   rte_errno = errno;
+   return 0;
+   }
+   /* can't use MMAP_LOCKED, it does not exist on BSD */
+   if (mlock(addr, size) < 0) {
+   rte_errno = errno;
+   munmap(addr, size);
+   return 0;
+   }
+
+   ret = rte_mempool_populate_virt(mp, addr, size, getpagesize(),
+   rte_mempool_memchunk_anon_free, addr);
+   if (ret == 0)
+   goto fail;
+
+   return mp->populated_size;
+
+ fail:
+   rte_mempool_free_memchunks(mp);
+   return 0;
+}
+
 /* free a mempool */
 static void
 rte_mempool_free(struct rte_mempool *mp)
-- 
2.1.4



[dpdk-dev] [PATCH 29/36] mempool: create the internal ring when populating

2016-04-14 Thread Olivier Matz
Instead of creating the internal ring at mempool creation, do
it when populating the mempool with the first memory chunk. The
objective is to make it simpler to plug in an external handler
when that feature is introduced.

For instance, this will be possible:

  mp = rte_mempool_create_empty(...)
  rte_mempool_set_ext_handler(mp, my_handler)
  rte_mempool_populate_default()

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 12 +---
 lib/librte_mempool/rte_mempool.h |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 5f9ec63..eaae5a0 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -324,6 +324,7 @@ rte_mempool_ring_create(struct rte_mempool *mp)
return -rte_errno;

mp->ring = r;
+   mp->flags |= MEMPOOL_F_RING_CREATED;
return 0;
 }

@@ -372,6 +373,14 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,
unsigned i = 0;
size_t off;
struct rte_mempool_memhdr *memhdr;
+   int ret;
+
+   /* create the internal ring if not already done */
+   if ((mp->flags & MEMPOOL_F_RING_CREATED) == 0) {
+   ret = rte_mempool_ring_create(mp);
+   if (ret < 0)
+   return ret;
+   }

/* mempool is already populated */
if (mp->populated_size >= mp->size)
@@ -696,9 +705,6 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,
STAILQ_INIT(&mp->elt_list);
STAILQ_INIT(&mp->mem_list);

-   if (rte_mempool_ring_create(mp) < 0)
-   goto exit_unlock;
-
/*
 * local_cache pointer is set even if cache_size is zero.
 * The local_cache points to just past the elt_pa[] array.
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 658d4a2..721d8e7 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -234,6 +234,7 @@ struct rte_mempool {
 #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache lines.*/
 #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is "single-producer".*/
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is "single-consumer".*/
+#define MEMPOOL_F_RING_CREATED   0x0010 /**< Internal: ring is created */

 /**
  * @internal When debug is enabled, store some statistics.
-- 
2.1.4



[dpdk-dev] [PATCH 28/36] mempool: rework support of xen dom0

2016-04-14 Thread Olivier Matz
Avoid having a specific file for that, and remove the #ifdefs.
Now that we have introduced a function to populate a mempool
with a virtual area, supporting Xen dom0 is much easier.

The only thing we need to do is convert the guest physical
address into the machine physical address using rte_mem_phy2mch().
This function does nothing when not running Xen.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/Makefile|   3 -
 lib/librte_mempool/rte_dom0_mempool.c  | 133 -
 lib/librte_mempool/rte_mempool.c   |  33 ++-
 lib/librte_mempool/rte_mempool.h   |  89 ---
 lib/librte_mempool/rte_mempool_version.map |   1 -
 5 files changed, 5 insertions(+), 254 deletions(-)
 delete mode 100644 lib/librte_mempool/rte_dom0_mempool.c

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 706f844..43423e0 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -42,9 +42,6 @@ LIBABIVER := 2

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
-ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_dom0_mempool.c
-endif
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

diff --git a/lib/librte_mempool/rte_dom0_mempool.c 
b/lib/librte_mempool/rte_dom0_mempool.c
deleted file mode 100644
index dad755c..000
--- a/lib/librte_mempool/rte_dom0_mempool.c
+++ /dev/null
@@ -1,133 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "rte_mempool.h"
-
-static void
-get_phys_map(void *va, phys_addr_t pa[], uint32_t pg_num,
-   uint32_t pg_sz, uint32_t memseg_id)
-{
-   uint32_t i;
-   uint64_t virt_addr, mfn_id;
-   struct rte_mem_config *mcfg;
-   uint32_t page_size = getpagesize();
-
-   /* get pointer to global configuration */
-   mcfg = rte_eal_get_configuration()->mem_config;
-   virt_addr = (uintptr_t) mcfg->memseg[memseg_id].addr;
-
-   for (i = 0; i != pg_num; i++) {
-   mfn_id = ((uintptr_t)va + i * pg_sz - virt_addr) / RTE_PGSIZE_2M;
-   pa[i] = mcfg->memseg[memseg_id].mfn[mfn_id] * page_size;
-   }
-}
-
-/* create the mempool for supporting Dom0 */
-struct rte_mempool *
-rte_dom0_mempool_create(const char *name, unsigned elt_num, unsigned elt_size,
-   unsigned cache_size, unsigned private_data_size,
-   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-   int socket_id, unsigned flags)
-{
-   struct rte_mempool *mp = NULL;
-   phys_addr_t *pa;
-   char *va;
-   size_t sz;
-   uint32_t pg_num, pg_shift, pg_sz, total_size;
-   const struct rte_memzone *mz;
-   char mz_name[RTE_MEMZONE_NAMESIZE];
-   int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
-
-   pg_sz = RTE_PGSIZE_2M;
-
-   pg_shift = rte_bsf32(pg_sz);
-   total_size = rte_mempool_calc_obj_size(elt_size, flags, NULL);
-
-   /* 

[dpdk-dev] [PATCH 27/36] eal/xen: return machine address without knowing memseg id

2016-04-14 Thread Olivier Matz
The conversion from guest physical address to machine physical address
is fast when the caller knows the memseg corresponding to the gpa.

But when the caller does not know this information, find it by
scanning the segments. This feature will be used by the next commit.

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/include/rte_memory.h   | 11 ++-
 lib/librte_eal/linuxapp/eal/eal_xen_memory.c | 17 +++--
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_memory.h 
b/lib/librte_eal/common/include/rte_memory.h
index f8dbece..0661109 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -200,21 +200,22 @@ unsigned rte_memory_get_nrank(void);
 int rte_xen_dom0_supported(void);

 /**< Internal use only - phys to virt mapping for xen */
-phys_addr_t rte_xen_mem_phy2mch(uint32_t, const phys_addr_t);
+phys_addr_t rte_xen_mem_phy2mch(int32_t, const phys_addr_t);

 /**
  * Return the physical address of elt, which is an element of the pool mp.
  *
  * @param memseg_id
- *   The mempool is from which memory segment.
+ *   Identifier of the memory segment owning the physical address. If
+ *   set to -1, find it automatically.
  * @param phy_addr
  *   physical address of elt.
  *
  * @return
- *   The physical address or error.
+ *   The physical address or RTE_BAD_PHYS_ADDR on error.
  */
 static inline phys_addr_t
-rte_mem_phy2mch(uint32_t memseg_id, const phys_addr_t phy_addr)
+rte_mem_phy2mch(int32_t memseg_id, const phys_addr_t phy_addr)
 {
if (rte_xen_dom0_supported())
return rte_xen_mem_phy2mch(memseg_id, phy_addr);
@@ -250,7 +251,7 @@ static inline int rte_xen_dom0_supported(void)
 }

 static inline phys_addr_t
-rte_mem_phy2mch(uint32_t memseg_id __rte_unused, const phys_addr_t phy_addr)
+rte_mem_phy2mch(int32_t memseg_id __rte_unused, const phys_addr_t phy_addr)
 {
return phy_addr;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_xen_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_xen_memory.c
index 495eef9..efbd374 100644
--- a/lib/librte_eal/linuxapp/eal/eal_xen_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_xen_memory.c
@@ -156,13 +156,26 @@ get_xen_memory_size(void)
  * Based on physical address to caculate MFN in Xen Dom0.
  */
 phys_addr_t
-rte_xen_mem_phy2mch(uint32_t memseg_id, const phys_addr_t phy_addr)
+rte_xen_mem_phy2mch(int32_t memseg_id, const phys_addr_t phy_addr)
 {
-   int mfn_id;
+   int mfn_id, i;
uint64_t mfn, mfn_offset;
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg *memseg = mcfg->memseg;

+   /* find the memory segment owning the physical address */
+   if (memseg_id == -1) {
+   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+   if ((phy_addr >= memseg[i].phys_addr) &&
+   (phy_addr < memseg[i].phys_addr + memseg[i].size)) {
+   memseg_id = i;
+   break;
+   }
+   }
+   if (memseg_id == -1)
+   return RTE_BAD_PHYS_ADDR;
+   }
+
mfn_id = (phy_addr - memseg[memseg_id].phys_addr) / RTE_PGSIZE_2M;

/*the MFN is contiguous in 2M */
-- 
2.1.4



[dpdk-dev] [PATCH 26/36] mempool: introduce a function to create an empty mempool

2016-04-14 Thread Olivier Matz
Introduce a new function rte_mempool_create_empty()
that allocates a mempool that is not populated.

The functions rte_mempool_create() and rte_mempool_xmem_create()
now make use of it, making their code much easier to read.
Currently, they are the only users of rte_mempool_create_empty(),
but the function will be made public in a later commit.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 185 ++-
 1 file changed, 107 insertions(+), 78 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b432aae..03d506a 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -318,30 +318,6 @@ rte_dom0_mempool_create(const char *name __rte_unused,
 }
 #endif

-/* create the mempool */
-struct rte_mempool *
-rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
-  unsigned cache_size, unsigned private_data_size,
-  rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
-  int socket_id, unsigned flags)
-{
-   if (rte_xen_dom0_supported())
-   return rte_dom0_mempool_create(name, n, elt_size,
-  cache_size, private_data_size,
-  mp_init, mp_init_arg,
-  obj_init, obj_init_arg,
-  socket_id, flags);
-   else
-   return rte_mempool_xmem_create(name, n, elt_size,
-  cache_size, private_data_size,
-  mp_init, mp_init_arg,
-  obj_init, obj_init_arg,
-  socket_id, flags,
-  NULL, NULL, MEMPOOL_PG_NUM_DEFAULT,
-  MEMPOOL_PG_SHIFT_MAX);
-}
-
 /* create the internal ring */
 static int
 rte_mempool_ring_create(struct rte_mempool *mp)
@@ -645,20 +621,11 @@ rte_mempool_free(struct rte_mempool *mp)
rte_memzone_free(mp->mz);
 }

-/*
- * Create the mempool over already allocated chunk of memory.
- * That external memory buffer can consists of physically disjoint pages.
- * Setting vaddr to NULL, makes mempool to fallback to original behaviour
- * and allocate space for mempool and it's elements as one big chunk of
- * physically continuos memory.
- * */
-struct rte_mempool *
-rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
-   unsigned cache_size, unsigned private_data_size,
-   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-   int socket_id, unsigned flags, void *vaddr,
-   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
+/* create an empty mempool */
+static struct rte_mempool *
+rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
+   unsigned cache_size, unsigned private_data_size,
+   int socket_id, unsigned flags)
 {
char mz_name[RTE_MEMZONE_NAMESIZE];
struct rte_mempool_list *mempool_list;
@@ -668,7 +635,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
size_t mempool_size;
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
struct rte_mempool_objsz objsz;
-   int ret;

/* compilation-time checks */
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
@@ -691,18 +657,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
return NULL;
}

-   /* check that we have both VA and PA */
-   if (vaddr != NULL && paddr == NULL) {
-   rte_errno = EINVAL;
-   return NULL;
-   }
-
-   /* Check that pg_num and pg_shift parameters are valid. */
-   if (pg_num == 0 || pg_shift > MEMPOOL_PG_SHIFT_MAX) {
-   rte_errno = EINVAL;
-   return NULL;
-   }
-
/* "no cache align" imply "no spread" */
if (flags & MEMPOOL_F_NO_CACHE_ALIGN)
flags |= MEMPOOL_F_NO_SPREAD;
@@ -730,11 +684,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
goto exit_unlock;
}

-   /*
-* If user provided an external memory buffer, then use it to
-* store mempool objects. Otherwise reserve a memzone that is large
-* enough to hold mempool header and metadata plus mempool objects.
-*/
mempool_size = MEMPOOL_HEADER_SIZE(mp, cache_size);
mempool_size += private_data_size;
mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
@@ -746,12 +695,14 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned 

[dpdk-dev] [PATCH 25/36] mempool: introduce a function to free a mempool

2016-04-14 Thread Olivier Matz
Introduce rte_mempool_free() that:

- unlink the mempool from the global list if it is found
- free all the memory chunks using their free callbacks
- free the internal ring
- free the memzone containing the mempool

Currently this function is only used in error cases when
creating a new mempool, but it will be made public later
in the patch series.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7336616..b432aae 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -616,6 +616,35 @@ rte_mempool_populate_default(struct rte_mempool *mp)
return ret;
 }

+/* free a mempool */
+static void
+rte_mempool_free(struct rte_mempool *mp)
+{
+   struct rte_mempool_list *mempool_list = NULL;
+   struct rte_tailq_entry *te;
+
+   if (mp == NULL)
+   return;
+
+   mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+   /* find out tailq entry */
+   TAILQ_FOREACH(te, mempool_list, next) {
+   if (te->data == (void *)mp)
+   break;
+   }
+
+   if (te != NULL) {
+   TAILQ_REMOVE(mempool_list, te, next);
+   rte_free(te);
+   }
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+   rte_mempool_free_memchunks(mp);
+   rte_ring_free(mp->ring);
+   rte_memzone_free(mp->mz);
+}
+
 /*
  * Create the mempool over already allocated chunk of memory.
  * That external memory buffer can consists of physically disjoint pages.
@@ -775,13 +804,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,

 exit_unlock:
rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
-   if (mp != NULL) {
-   rte_mempool_free_memchunks(mp);
-   rte_ring_free(mp->ring);
-   }
-   rte_free(te);
-   if (mz != NULL)
-   rte_memzone_free(mz);
+   rte_mempool_free(mp);

return NULL;
 }
-- 
2.1.4



[dpdk-dev] [PATCH 24/36] mempool: replace mempool physaddr by a memzone pointer

2016-04-14 Thread Olivier Matz
Storing the pointer to the memzone instead of the physical address
provides more information than just the physical address: for instance,
the memzone flags.

Moreover, keeping the memzone pointer will allow us to free the mempool
(this is done later in the series).

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 4 ++--
 lib/librte_mempool/rte_mempool.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 54f2ab2..7336616 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -719,7 +719,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
/* init the mempool structure */
memset(mp, 0, sizeof(*mp));
snprintf(mp->name, sizeof(mp->name), "%s", name);
-   mp->phys_addr = mz->phys_addr;
+   mp->mz = mz;
mp->socket_id = socket_id;
mp->size = n;
mp->flags = flags;
@@ -983,7 +983,7 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
fprintf(f, "mempool <%s>@%p\n", mp->name, mp);
fprintf(f, "  flags=%x\n", mp->flags);
fprintf(f, "  ring=<%s>@%p\n", mp->ring->name, mp->ring);
-   fprintf(f, "  phys_addr=0x%" PRIx64 "\n", mp->phys_addr);
+   fprintf(f, "  phys_addr=0x%" PRIx64 "\n", mp->mz->phys_addr);
fprintf(f, "  nb_mem_chunks=%u\n", mp->nb_mem_chunks);
fprintf(f, "  size=%"PRIu32"\n", mp->size);
fprintf(f, "  populated_size=%"PRIu32"\n", mp->populated_size);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 82b0334..4a8c76b 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -203,7 +203,7 @@ struct rte_mempool_memhdr {
 struct rte_mempool {
char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
struct rte_ring *ring;   /**< Ring to store objects. */
-   phys_addr_t phys_addr;   /**< Phys. addr. of mempool struct. */
+   const struct rte_memzone *mz;/**< Memzone where mempool is allocated */
int flags;   /**< Flags of the mempool. */
int socket_id;   /**< Socket id passed at mempool creation. */
uint32_t size;   /**< Max size of the mempool. */
-- 
2.1.4



[dpdk-dev] [PATCH 23/36] mempool: support no-hugepage mode

2016-04-14 Thread Olivier Matz
Introduce a new function rte_mempool_populate_virt() that is now called
by default when hugepages are not supported. This function populates
the mempool with several physically contiguous chunks whose minimum size
is the system page size.

Thanks to this, rte_mempool_create() will work properly without
hugepages (if the object size is smaller than a page size), and two
specific workarounds can be removed:

- trailer_size was artificially extended to a page size
- rte_mempool_virt2phy() did not rely on the object physical address

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 106 ++-
 lib/librte_mempool/rte_mempool.h |  17 ++-
 2 files changed, 85 insertions(+), 38 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 5b21d0a..54f2ab2 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -222,23 +222,6 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,
sz->trailer_size = new_size - sz->header_size - sz->elt_size;
}

-   if (! rte_eal_has_hugepages()) {
-   /*
-* compute trailer size so that pool elements fit exactly in
-* a standard page
-*/
-   int page_size = getpagesize();
-   int new_size = page_size - sz->header_size - sz->elt_size;
-   if (new_size < 0 || (unsigned int)new_size < sz->trailer_size) {
-   printf("When hugepages are disabled, pool objects "
-  "can't exceed PAGE_SIZE: %d + %d + %d > %d\n",
-  sz->header_size, sz->elt_size, sz->trailer_size,
-  page_size);
-   return 0;
-   }
-   sz->trailer_size = new_size;
-   }
-
/* this is the size of an object, including header and trailer */
sz->total_size = sz->header_size + sz->elt_size + sz->trailer_size;

@@ -507,15 +490,72 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, 
char *vaddr,
return cnt;
 }

-/* Default function to populate the mempool: allocate memory in mezones,
+/* Populate the mempool with a virtual area. Return the number of
+ * objects added, or a negative value on error. */
+static int
+rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
+   size_t len, size_t pg_sz, rte_mempool_memchunk_free_cb_t *free_cb,
+   void *opaque)
+{
+   phys_addr_t paddr;
+   size_t off, phys_len;
+   int ret, cnt = 0;
+
+   /* mempool must not be populated */
+   if (mp->nb_mem_chunks != 0)
+   return -EEXIST;
+   /* address and len must be page-aligned */
+   if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
+   return -EINVAL;
+   if (RTE_ALIGN_CEIL(len, pg_sz) != len)
+   return -EINVAL;
+
+   for (off = 0; off + pg_sz <= len &&
+mp->populated_size < mp->size; off += phys_len) {
+
+   paddr = rte_mem_virt2phy(addr + off);
+   if (paddr == RTE_BAD_PHYS_ADDR) {
+   ret = -EINVAL;
+   goto fail;
+   }
+
+   /* populate with the largest group of contiguous pages */
+   for (phys_len = pg_sz; off + phys_len < len; phys_len += pg_sz) {
+   phys_addr_t paddr_tmp;
+
+   paddr_tmp = rte_mem_virt2phy(addr + off + phys_len);
+   paddr_tmp = rte_mem_phy2mch(-1, paddr_tmp);
+
+   if (paddr_tmp != paddr + phys_len)
+   break;
+   }
+
+   ret = rte_mempool_populate_phys(mp, addr + off, paddr,
+   phys_len, free_cb, opaque);
+   if (ret < 0)
+   goto fail;
+   /* no need to call the free callback for next chunks */
+   free_cb = NULL;
+   cnt += ret;
+   }
+
+   return cnt;
+
+ fail:
+   rte_mempool_free_memchunks(mp);
+   return ret;
+}
+
+/* Default function to populate the mempool: allocate memory in memzones,
  * and populate them. Return the number of objects added, or a negative
  * value on error. */
-static int rte_mempool_populate_default(struct rte_mempool *mp)
+static int
+rte_mempool_populate_default(struct rte_mempool *mp)
 {
int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
char mz_name[RTE_MEMZONE_NAMESIZE];
const struct rte_memzone *mz;
-   size_t size, total_elt_sz, align;
+   size_t size, total_elt_sz, align, pg_sz, pg_shift;
unsigned mz_id, n;
int ret;

@@ -523,10 +563,19 @@ static int rte_mempool_populate_default(struct 
rte_mempool *mp)
if (mp->nb_mem_chunks != 0)
return -EEXIST;

-   align = RTE_CACHE_LINE_SIZE;
+   if (rte_eal_has_hugepages()) {
+ 

[dpdk-dev] [PATCH 22/36] eal: lock memory when using no-huge

2016-04-14 Thread Olivier Matz
Although the physical address won't be correct in the memory segment,
this allows at least to retrieve the physical address using
rte_mem_virt2phy(). Indeed, if the page is not locked, it
may not be present in physical memory.

With the next commit, this allows a mempool to have properly filled
physical addresses when using the --no-huge option.
Signed-off-by: Olivier Matz 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5b9132c..79d1d2d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1074,7 +1074,7 @@ rte_eal_hugepage_init(void)
/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
-   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   MAP_LOCKED | MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
-- 
2.1.4



[dpdk-dev] [PATCH 21/36] mempool: default allocation in several memory chunks

2016-04-14 Thread Olivier Matz
Introduce rte_mempool_populate_default() which allocates
mempool objects in several memzones.

The mempool header is now always allocated in a specific memzone
(not with its objects). Thanks to this modification, we can remove
much of the specific behavior that was required when hugepages are not
enabled or when rte_mempool_xmem_create() is used.

This change requires updating how the kni and mellanox drivers look up
mbuf memory. For now, this only works if there is a single memory
chunk (as today), but rte_mempool_mem_iter() could be used to
support more memory chunks.

We can also remove RTE_MEMPOOL_OBJ_NAME, which is no longer required
for the lookup, as memory chunks are referenced by the mempool.

Note that rte_mempool_create() is still broken (as it was before) when
there is no hugepage support (rte_mempool_xmem_create() has to be
used). This is fixed in the next commit.

Signed-off-by: Olivier Matz 
(cherry picked from commit e2ccba488aec7bfa5f06c12b4f7b771134255296)
Signed-off-by: Olivier Matz 
---
 drivers/net/mlx4/mlx4.c   |  87 ++---
 drivers/net/mlx5/mlx5_rxtx.c  |  87 ++---
 drivers/net/mlx5/mlx5_rxtx.h  |   2 +-
 lib/librte_kni/rte_kni.c  |  12 +++-
 lib/librte_mempool/rte_dom0_mempool.c |   2 +-
 lib/librte_mempool/rte_mempool.c  | 119 +++---
 lib/librte_mempool/rte_mempool.h  |  11 
 7 files changed, 233 insertions(+), 87 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 089bbec..c8481a7 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1198,8 +1198,71 @@ txq_complete(struct txq *txq)
return 0;
 }

+struct mlx4_check_mempool_data {
+   int ret;
+   char *start;
+   char *end;
+};
+
+/* Called by mlx4_check_mempool() when iterating the memory chunks. */
+static void mlx4_check_mempool_cb(struct rte_mempool *mp,
+   void *opaque, struct rte_mempool_memhdr *memhdr,
+   unsigned mem_idx)
+{
+   struct mlx4_check_mempool_data *data = opaque;
+
+   (void)mp;
+   (void)mem_idx;
+
+   /* It already failed, skip the next chunks. */
+   if (data->ret != 0)
+   return;
+   /* It is the first chunk. */
+   if (data->start == NULL && data->end == NULL) {
+   data->start = memhdr->addr;
+   data->end = data->start + memhdr->len;
+   return;
+   }
+   if (data->end == memhdr->addr) {
+   data->end += memhdr->len;
+   return;
+   }
+   if (data->start == (char *)memhdr->addr + memhdr->len) {
+   data->start -= memhdr->len;
+   return;
+   }
+   /* Error, mempool is not virtually contiguous. */
+   data->ret = -1;
+}
+
+/**
+ * Check if a mempool can be used: it must be virtually contiguous.
+ *
+ * @param[in] mp
+ *   Pointer to memory pool.
+ * @param[out] start
+ *   Pointer to the start address of the mempool virtual memory area
+ * @param[out] end
+ *   Pointer to the end address of the mempool virtual memory area
+ *
+ * @return
+ *   0 on success (mempool is virtually contiguous), -1 on error.
+ */
+static int mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start,
+   uintptr_t *end)
+{
+   struct mlx4_check_mempool_data data;
+
+   memset(&data, 0, sizeof(data));
+   rte_mempool_mem_iter(mp, mlx4_check_mempool_cb, &data);
+   *start = (uintptr_t)data.start;
+   *end = (uintptr_t)data.end;
+
+   return data.ret;
+}
+
 /* For best performance, this function should not be inlined. */
-static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, const struct rte_mempool *)
+static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, struct rte_mempool *)
__attribute__((noinline));

 /**
@@ -1214,15 +1277,21 @@ static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, const 
struct rte_mempool *)
  *   Memory region pointer, NULL in case of error.
  */
 static struct ibv_mr *
-mlx4_mp2mr(struct ibv_pd *pd, const struct rte_mempool *mp)
+mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 {
const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   uintptr_t start = mp->elt_va_start;
-   uintptr_t end = mp->elt_va_end;
+   uintptr_t start;
+   uintptr_t end;
unsigned int i;

+   if (mlx4_check_mempool(mp, &start, &end) != 0) {
+   ERROR("mempool %p: not virtually contiguous",
+   (void *)mp);
+   return NULL;
+   }
+
DEBUG("mempool %p area start=%p end=%p size=%zu",
- (const void *)mp, (void *)start, (void *)end,
+ (void *)mp, (void *)start, (void *)end,
  (size_t)(end - start));
/* Round start and end to page boundary if found in memory segments. */
for (i = 0; (i < RTE_MAX_MEMSEG) && (ms[i].addr != NULL); ++i) {
@@ -1236,7 +1305,7 @@ mlx4_mp2mr(struct ibv_pd *pd, const struct rte_mempool 
*mp)
   
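
The chunk-merging logic in mlx4_check_mempool_cb() can be illustrated outside the driver. The standalone sketch below uses hypothetical types but the same algorithm: it grows a [start, end) window chunk by chunk and fails if a chunk does not touch either end of the window.

```c
#include <stddef.h>

/* Hypothetical stand-in for a mempool memory chunk. */
struct chunk {
	char *addr;
	size_t len;
};

/* Return 0 if the chunks form one virtually contiguous area
 * (in any order), -1 otherwise. Same merging rule as the mlx4
 * callback: each chunk must be adjacent to the current window. */
static int chunks_contiguous(const struct chunk *c, unsigned int n)
{
	char *start = NULL, *end = NULL;
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (start == NULL && end == NULL) {
			/* first chunk initializes the window */
			start = c[i].addr;
			end = start + c[i].len;
		} else if (end == c[i].addr) {
			end += c[i].len;   /* extends the window upward */
		} else if (start == c[i].addr + c[i].len) {
			start -= c[i].len; /* extends the window downward */
		} else {
			return -1;         /* hole: not contiguous */
		}
	}
	return 0;
}
```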

[dpdk-dev] [PATCH 20/36] mempool: make page size optional when getting xmem size

2016-04-14 Thread Olivier Matz
Update rte_mempool_xmem_size() so that when the page_shift argument is
set to 0, memory is assumed to be physically contiguous, allowing page
boundaries to be ignored. This will be used in the next commits.

Also, rename the variable 'n' to 'obj_per_page' and avoid the
assignment inside the if().

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 18 +-
 lib/librte_mempool/rte_mempool.h |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 0ae899b..edf26ae 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -252,18 +252,18 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
 {
-   size_t n, pg_num, pg_sz, sz;
+   size_t obj_per_page, pg_num, pg_sz;

-   pg_sz = (size_t)1 << pg_shift;
+   if (pg_shift == 0)
+   return total_elt_sz * elt_num;

-   if ((n = pg_sz / total_elt_sz) > 0) {
-   pg_num = (elt_num + n - 1) / n;
-   sz = pg_num << pg_shift;
-   } else {
-   sz = RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;
-   }
+   pg_sz = (size_t)1 << pg_shift;
+   obj_per_page = pg_sz / total_elt_sz;
+   if (obj_per_page == 0)
+   return RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;

-   return sz;
+   pg_num = (elt_num + obj_per_page - 1) / obj_per_page;
+   return pg_num << pg_shift;
 }

 /*
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index e06ccfc..38e5abd 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -1257,7 +1257,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, 
uint32_t flags,
  *   The size of each element, including header and trailer, as returned
  *   by rte_mempool_calc_obj_size().
  * @param pg_shift
- *   LOG2 of the physical pages size.
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
  * @return
  *   Required memory size aligned at page boundary.
  */
-- 
2.1.4
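
The rewritten function is easy to check with concrete numbers. Below is a standalone copy of the same arithmetic, with RTE_ALIGN_CEIL replaced by a local helper so the snippet compiles on its own:

```c
#include <stddef.h>
#include <stdint.h>

/* Round v up to a multiple of align (align must be a power of two). */
static size_t align_ceil(size_t v, size_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/* Standalone copy of the reworked rte_mempool_xmem_size():
 * pg_shift == 0 means "physically contiguous, ignore pages". */
static size_t xmem_size(uint32_t elt_num, size_t total_elt_sz,
			uint32_t pg_shift)
{
	size_t obj_per_page, pg_num, pg_sz;

	if (pg_shift == 0)
		return total_elt_sz * elt_num;

	pg_sz = (size_t)1 << pg_shift;
	obj_per_page = pg_sz / total_elt_sz;
	if (obj_per_page == 0)
		/* object bigger than a page: one rounded-up slot each */
		return align_ceil(total_elt_sz, pg_sz) * elt_num;

	pg_num = (elt_num + obj_per_page - 1) / obj_per_page;
	return pg_num << pg_shift;
}
```

For example, with 4 KB pages (pg_shift = 12) and 64-byte objects, 64 objects fit per page, so 1000 objects need ceil(1000/64) = 16 pages.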



[dpdk-dev] [PATCH 19/36] mempool: introduce a free callback for memory chunks

2016-04-14 Thread Olivier Matz
Introduce a free callback that is passed to the populate* functions,
which is used when freeing a mempool. It is unused for now, but as the
next commits will populate the mempool with several chunks of memory,
we need a way to free them properly on error.

Later in the series, we will also introduce a public rte_mempool_free()
and the ability for the user to populate a mempool with its own memory.
For that, we also need a free callback.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 27 ++-
 lib/librte_mempool/rte_mempool.h |  8 
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index f2f7846..0ae899b 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -388,6 +388,15 @@ rte_mempool_ring_create(struct rte_mempool *mp)
return 0;
 }

+/* free a memchunk allocated with rte_memzone_reserve() */
+__rte_unused static void
+rte_mempool_memchunk_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
+   void *opaque)
+{
+   const struct rte_memzone *mz = opaque;
+   rte_memzone_free(mz);
+}
+
 /* Free memory chunks used by a mempool. Objects must be in pool */
 static void
 rte_mempool_free_memchunks(struct rte_mempool *mp)
@@ -405,6 +414,8 @@ rte_mempool_free_memchunks(struct rte_mempool *mp)
while (!STAILQ_EMPTY(&mp->mem_list)) {
memhdr = STAILQ_FIRST(&mp->mem_list);
STAILQ_REMOVE_HEAD(&mp->mem_list, next);
+   if (memhdr->free_cb != NULL)
+   memhdr->free_cb(memhdr, memhdr->opaque);
rte_free(memhdr);
mp->nb_mem_chunks--;
}
@@ -415,7 +426,8 @@ rte_mempool_free_memchunks(struct rte_mempool *mp)
  * on error. */
 static int
 rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
-   phys_addr_t paddr, size_t len)
+   phys_addr_t paddr, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
+   void *opaque)
 {
unsigned total_elt_sz;
unsigned i = 0;
@@ -436,6 +448,8 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,
memhdr->addr = vaddr;
memhdr->phys_addr = paddr;
memhdr->len = len;
+   memhdr->free_cb = free_cb;
+   memhdr->opaque = opaque;

if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
@@ -462,7 +476,8 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,
  * number of objects added, or a negative value on error. */
 static int
 rte_mempool_populate_phys_tab(struct rte_mempool *mp, char *vaddr,
-   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
+   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
+   rte_mempool_memchunk_free_cb_t *free_cb, void *opaque)
 {
uint32_t i, n;
int ret, cnt = 0;
@@ -480,11 +495,13 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, 
char *vaddr,
;

ret = rte_mempool_populate_phys(mp, vaddr + i * pg_sz,
-   paddr[i], n * pg_sz);
+   paddr[i], n * pg_sz, free_cb, opaque);
if (ret < 0) {
rte_mempool_free_memchunks(mp);
return ret;
}
+   /* no need to call the free callback for next chunks */
+   free_cb = NULL;
cnt += ret;
}
return cnt;
@@ -666,12 +683,12 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,

ret = rte_mempool_populate_phys(mp, obj,
mp->phys_addr + ((char *)obj - (char *)mp),
-   objsz.total_size * n);
+   objsz.total_size * n, NULL, NULL);
if (ret != (int)mp->size)
goto exit_unlock;
} else {
ret = rte_mempool_populate_phys_tab(mp, vaddr,
-   paddr, pg_num, pg_shift);
+   paddr, pg_num, pg_shift, NULL, NULL);
if (ret != (int)mp->size)
goto exit_unlock;
}
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0e4641e..e06ccfc 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -187,6 +187,12 @@ struct rte_mempool_objtlr {
 STAILQ_HEAD(rte_mempool_memhdr_list, rte_mempool_memhdr);

 /**
+ * Callback used to free a memory chunk
+ */
+typedef void (rte_mempool_memchunk_free_cb_t)(struct rte_mempool_memhdr 
*memhdr,
+   void *opaque);
+
+/**
  * Mempool objects memory header structure
  *
  * The memory chunks where objects are stored. Each chunk is virtually
@@ -198,6 +204,8 @@ struct rte_mempool_memhdr {
void *addr;  /**< Virtual address of the chunk */
phys_addr_t phys_addr;   /**< Physical address of the chunk */

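
The ownership model introduced here is simple: each chunk header records how to release its backing memory, and the teardown loop invokes that callback before freeing the header itself. A minimal standalone sketch follows, with hypothetical names and malloc/free standing in for memzone allocation:

```c
#include <stdlib.h>

struct memchunk;
/* Same shape as rte_mempool_memchunk_free_cb_t. */
typedef void (free_cb_t)(struct memchunk *hdr, void *opaque);

struct memchunk {
	struct memchunk *next;
	void *addr;         /* backing memory of the chunk */
	free_cb_t *free_cb; /* how to release it; may be NULL */
	void *opaque;       /* passed back to free_cb */
};

/* Free every chunk in the list, calling each chunk's callback
 * first -- this mirrors rte_mempool_free_memchunks(). */
static void free_chunks(struct memchunk **head)
{
	while (*head != NULL) {
		struct memchunk *hdr = *head;
		*head = hdr->next;
		if (hdr->free_cb != NULL)
			hdr->free_cb(hdr, hdr->opaque);
		free(hdr);
	}
}

/* Example callback: the chunk memory was obtained with malloc(). */
static void free_malloc_chunk(struct memchunk *hdr, void *opaque)
{
	(void)opaque;
	free(hdr->addr);
}
```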
[dpdk-dev] [PATCH 18/36] mempool: simplify xmem_usage

2016-04-14 Thread Olivier Matz
Since the previous commit, rte_mempool_xmem_usage() is the last user
of rte_mempool_obj_mem_iter(). This complex code can now be moved
inside that function: we can get rid of the callback and simplify the
code to make it more readable.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 138 +++
 1 file changed, 37 insertions(+), 101 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 3e9d686..f2f7846 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -126,15 +126,6 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
 }

-/**
- * A mempool object iterator callback function.
- */
-typedef void (*rte_mempool_obj_iter_t)(void * /*obj_iter_arg*/,
-   void * /*obj_start*/,
-   void * /*obj_end*/,
-   uint32_t /*obj_index */,
-   phys_addr_t /*physaddr*/);
-
 static void
 mempool_add_elem(struct rte_mempool *mp, void *obj, phys_addr_t physaddr)
 {
@@ -158,74 +149,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
phys_addr_t physaddr)
rte_ring_sp_enqueue(mp->ring, obj);
 }

-/* Iterate through objects at the given address
- *
- * Given the pointer to the memory, and its topology in physical memory
- * (the physical addresses table), iterate through the "elt_num" objects
- * of size "elt_sz" aligned at "align". For each object in this memory
- * chunk, invoke a callback. It returns the effective number of objects
- * in this memory. */
-static uint32_t
-rte_mempool_obj_mem_iter(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
-   size_t align, const phys_addr_t paddr[], uint32_t pg_num,
-   uint32_t pg_shift, rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
-{
-   uint32_t i, j, k;
-   uint32_t pgn, pgf;
-   uintptr_t end, start, va;
-   uintptr_t pg_sz;
-   phys_addr_t physaddr;
-
-   pg_sz = (uintptr_t)1 << pg_shift;
-   va = (uintptr_t)vaddr;
-
-   i = 0;
-   j = 0;
-
-   while (i != elt_num && j != pg_num) {
-
-   start = RTE_ALIGN_CEIL(va, align);
-   end = start + total_elt_sz;
-
-   /* index of the first page for the next element. */
-   pgf = (end >> pg_shift) - (start >> pg_shift);
-
-   /* index of the last page for the current element. */
-   pgn = ((end - 1) >> pg_shift) - (start >> pg_shift);
-   pgn += j;
-
-   /* do we have enough space left for the element. */
-   if (pgn >= pg_num)
-   break;
-
-   for (k = j;
-   k != pgn &&
-   paddr[k] + pg_sz == paddr[k + 1];
-   k++)
-   ;
-
-   /*
-* if next pgn chunks of memory physically continuous,
-* use it to create next element.
-* otherwise, just skip that chunk unused.
-*/
-   if (k == pgn) {
-   physaddr = paddr[k] + (start & (pg_sz - 1));
-   if (obj_iter != NULL)
-   obj_iter(obj_iter_arg, (void *)start,
-   (void *)end, i, physaddr);
-   va = end;
-   j += pgf;
-   i++;
-   } else {
-   va = RTE_ALIGN_CEIL((va + 1), pg_sz);
-   j++;
-   }
-   }
-
-   return i;
-}
-
 /* call obj_cb() for each mempool element */
 uint32_t
 rte_mempool_obj_iter(struct rte_mempool *mp,
@@ -343,40 +266,53 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t 
total_elt_sz, uint32_t pg_shift)
return sz;
 }

-/* Callback used by rte_mempool_xmem_usage(): it sets the opaque
- * argument to the end of the object. */
-static void
-mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
-   __rte_unused uint32_t idx, __rte_unused phys_addr_t physaddr)
-{
-   *(uintptr_t *)arg = (uintptr_t)end;
-}
-
 /*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
 ssize_t
-rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
-   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
+rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
+   size_t total_elt_sz, const phys_addr_t paddr[], uint32_t pg_num,
+   uint32_t pg_shift)
 {
-   uint32_t n;
-   uintptr_t va, uv;
-   size_t pg_sz, usz;
+   uint32_t elt_cnt = 0;
+   phys_addr_t start, end;
+   uint32_t paddr_idx;
+   size_t pg_sz = (size_t)1 << pg_shift;

-   pg_sz = (size_t)1 << pg_shift;
-   va = (uintptr_t)vaddr;
-   uv = va;
+   /* if paddr is NULL, assume contiguous 

[dpdk-dev] [PATCH 17/36] mempool: new function to iterate the memory chunks

2016-04-14 Thread Olivier Matz
Following the same model as rte_mempool_obj_iter(), introduce
rte_mempool_mem_iter() to iterate over the memory chunks attached
to the mempool.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c   | 16 
 lib/librte_mempool/rte_mempool.h   | 27 +++
 lib/librte_mempool/rte_mempool_version.map |  1 +
 3 files changed, 44 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 9e3cfde..3e9d686 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -244,6 +244,22 @@ rte_mempool_obj_iter(struct rte_mempool *mp,
return n;
 }

+/* call mem_cb() for each mempool memory chunk */
+uint32_t
+rte_mempool_mem_iter(struct rte_mempool *mp,
+   rte_mempool_mem_cb_t *mem_cb, void *mem_cb_arg)
+{
+   struct rte_mempool_memhdr *hdr;
+   unsigned n = 0;
+
+   STAILQ_FOREACH(hdr, &mp->mem_list, next) {
+   mem_cb(mp, mem_cb_arg, hdr, n);
+   n++;
+   }
+
+   return n;
+}
+
 /* get the header, trailer and total size of a mempool element. */
 uint32_t
 rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 7011a18..0e4641e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -332,6 +332,15 @@ typedef void (rte_mempool_obj_cb_t)(struct rte_mempool *mp,
 typedef rte_mempool_obj_cb_t rte_mempool_obj_ctor_t; /* compat */

 /**
+ * A memory callback function for mempool.
+ *
+ * Used by rte_mempool_mem_iter().
+ */
+typedef void (rte_mempool_mem_cb_t)(struct rte_mempool *mp,
+   void *opaque, struct rte_mempool_memhdr *memhdr,
+   unsigned mem_idx);
+
+/**
  * A mempool constructor callback function.
  *
  * Arguments are the mempool and the opaque pointer given by the user in
@@ -602,6 +611,24 @@ uint32_t rte_mempool_obj_iter(struct rte_mempool *mp,
rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg);

 /**
+ * Call a function for each mempool memory chunk
+ *
+ * Iterate across all memory chunks attached to a rte_mempool and call
+ * the callback function on it.
+ *
+ * @param mp
+ *   A pointer to an initialized mempool.
+ * @param mem_cb
+ *   A function pointer that is called for each memory chunk.
+ * @param mem_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   Number of memory chunks iterated.
+ */
+uint32_t rte_mempool_mem_iter(struct rte_mempool *mp,
+   rte_mempool_mem_cb_t *mem_cb, void *mem_cb_arg);
+
+/**
  * Dump the status of the mempool to the console.
  *
  * @param f
diff --git a/lib/librte_mempool/rte_mempool_version.map 
b/lib/librte_mempool/rte_mempool_version.map
index 4db75ca..ca887b5 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -21,6 +21,7 @@ DPDK_16.07 {
global:

rte_mempool_obj_iter;
+   rte_mempool_mem_iter;

local: *;
 } DPDK_2.0;
-- 
2.1.4
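
The iterator follows the same pattern as rte_mempool_obj_iter(): walk a singly linked list and hand each entry, with its index, to a user callback. A standalone sketch of the same shape, with a hypothetical chunk type and a length-summing callback as the user function:

```c
#include <stddef.h>

/* Hypothetical stand-in for rte_mempool_memhdr. */
struct memhdr {
	struct memhdr *next;
	size_t len;
};

typedef void (mem_cb_t)(struct memhdr *hdr, unsigned int idx, void *arg);

/* Walk the chunk list, calling cb for each entry; return the count.
 * Same contract as rte_mempool_mem_iter(). */
static unsigned int mem_iter(struct memhdr *head, mem_cb_t *cb, void *arg)
{
	unsigned int n = 0;
	struct memhdr *hdr;

	for (hdr = head; hdr != NULL; hdr = hdr->next) {
		cb(hdr, n, arg);
		n++;
	}
	return n;
}

/* Example callback: accumulate the total length of all chunks. */
static void sum_len(struct memhdr *hdr, unsigned int idx, void *arg)
{
	(void)idx;
	*(size_t *)arg += hdr->len;
}
```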



[dpdk-dev] [PATCH 16/36] mempool: store memory chunks in a list

2016-04-14 Thread Olivier Matz
Do not use the paddr table to store the mempool memory chunks.
This will allow having several chunks with different virtual addresses.

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c  |   2 +-
 lib/librte_mempool/rte_mempool.c | 205 ++-
 lib/librte_mempool/rte_mempool.h |  51 +-
 3 files changed, 165 insertions(+), 93 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 2f317f2..2bc3ac0 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -123,7 +123,7 @@ test_mempool_basic(void)

printf("get private data\n");
if (rte_mempool_get_priv(mp) != (char *)mp +
-   MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
+   MEMPOOL_HEADER_SIZE(mp, mp->cache_size))
return -1;

 #ifndef RTE_EXEC_ENV_BSDAPP /* rte_mem_virt2phy() not supported on bsd */
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b8e46fc..9e3cfde 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -141,14 +141,12 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
phys_addr_t physaddr)
struct rte_mempool_objhdr *hdr;
struct rte_mempool_objtlr *tlr __rte_unused;

-   obj = (char *)obj + mp->header_size;
-   physaddr += mp->header_size;
-
/* set mempool ptr in header */
hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
hdr->mp = mp;
hdr->physaddr = physaddr;
STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);
+   mp->populated_size++;

 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
@@ -246,33 +244,6 @@ rte_mempool_obj_iter(struct rte_mempool *mp,
return n;
 }

-/*
- * Populate  mempool with the objects.
- */
-
-static void
-mempool_obj_populate(void *arg, void *start, void *end,
-   __rte_unused uint32_t idx, phys_addr_t physaddr)
-{
-   struct rte_mempool *mp = arg;
-
-   mempool_add_elem(mp, start, physaddr);
-   mp->elt_va_end = (uintptr_t)end;
-}
-
-static void
-mempool_populate(struct rte_mempool *mp, size_t num, size_t align)
-{
-   uint32_t elt_sz;
-
-   elt_sz = mp->elt_size + mp->header_size + mp->trailer_size;
-
-   mp->size = rte_mempool_obj_mem_iter((void *)mp->elt_va_start,
-   num, elt_sz, align,
-   mp->elt_pa, mp->pg_num, mp->pg_shift,
-   mempool_obj_populate, mp);
-}
-
 /* get the header, trailer and total size of a mempool element. */
 uint32_t
 rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
@@ -465,6 +436,108 @@ rte_mempool_ring_create(struct rte_mempool *mp)
return 0;
 }

+/* Free memory chunks used by a mempool. Objects must be in pool */
+static void
+rte_mempool_free_memchunks(struct rte_mempool *mp)
+{
+   struct rte_mempool_memhdr *memhdr;
+   void *elt;
+
+   while (!STAILQ_EMPTY(&mp->elt_list)) {
+   rte_ring_sc_dequeue(mp->ring, &elt);
+   (void)elt;
+   STAILQ_REMOVE_HEAD(&mp->elt_list, next);
+   mp->populated_size--;
+   }
+
+   while (!STAILQ_EMPTY(&mp->mem_list)) {
+   memhdr = STAILQ_FIRST(&mp->mem_list);
+   STAILQ_REMOVE_HEAD(&mp->mem_list, next);
+   rte_free(memhdr);
+   mp->nb_mem_chunks--;
+   }
+}
+
+/* Add objects in the pool, using a physically contiguous memory
+ * zone. Return the number of objects added, or a negative value
+ * on error. */
+static int
+rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
+   phys_addr_t paddr, size_t len)
+{
+   unsigned total_elt_sz;
+   unsigned i = 0;
+   size_t off;
+   struct rte_mempool_memhdr *memhdr;
+
+   /* mempool is already populated */
+   if (mp->populated_size >= mp->size)
+   return -ENOSPC;
+
+   total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+   memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
+   if (memhdr == NULL)
+   return -ENOMEM;
+
+   memhdr->mp = mp;
+   memhdr->addr = vaddr;
+   memhdr->phys_addr = paddr;
+   memhdr->len = len;
+
+   if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+   off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
+   else
+   off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
+
+   while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
+   off += mp->header_size;
+   mempool_add_elem(mp, (char *)vaddr + off, paddr + off);
+   off += mp->elt_size + mp->trailer_size;
+   i++;
+   }
+
+   /* not enough room to store one object */
+   if (i == 0)
+   return -EINVAL;
+
+   STAILQ_INSERT_TAIL(&mp->mem_list, memhdr, next);
+   mp->nb_mem_chunks++;
+   return i;
+}
+
+/* Add objects in the pool, using a table of physical pages. Return the
+ * number of objects 
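
The populate loop above places objects in a chunk by first aligning the start offset, then stamping one object every total_elt_sz bytes until the chunk runs out of room. A standalone sketch of the same counting arithmetic, with hypothetical sizes (the chunk base address is assumed align-aligned, so only the offset matters):

```c
#include <stddef.h>

/* Count how many objects of total_elt_sz bytes fit in a chunk of
 * `len` bytes when placement starts at start_off and the first
 * object is aligned to `align` bytes (a power of two) -- the same
 * loop bound used by rte_mempool_populate_phys(). */
static unsigned int objects_in_chunk(size_t start_off, size_t len,
				     size_t total_elt_sz, size_t align)
{
	size_t off = (start_off + align - 1) & ~(align - 1);
	unsigned int i = 0;

	while (off + total_elt_sz <= len) {
		off += total_elt_sz;
		i++;
	}
	return i;
}
```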

[dpdk-dev] [PATCH 15/36] mempool: remove MEMPOOL_IS_CONTIG()

2016-04-14 Thread Olivier Matz
The next commits will change the behavior of the mempool library so that
the objects will never be allocated in the same memzone as the mempool
header. Therefore, there is no reason to keep this macro, which would
always return 0.

This macro was only used in app/test.

Signed-off-by: Olivier Matz 
---
 app/test/test_mempool.c  | 7 +++
 lib/librte_mempool/rte_mempool.h | 7 ---
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 10e1fa4..2f317f2 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -126,12 +126,11 @@ test_mempool_basic(void)
MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
return -1;

+#ifndef RTE_EXEC_ENV_BSDAPP /* rte_mem_virt2phy() not supported on bsd */
printf("get physical address of an object\n");
-   if (MEMPOOL_IS_CONTIG(mp) &&
-   rte_mempool_virt2phy(mp, obj) !=
-   (phys_addr_t) (mp->phys_addr +
-   (phys_addr_t) ((char*) obj - (char*) mp)))
+   if (rte_mempool_virt2phy(mp, obj) != rte_mem_virt2phy(obj))
return -1;
+#endif

printf("put the object back\n");
rte_mempool_put(mp, obj);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 00ca087..74cecd6 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -271,13 +271,6 @@ struct rte_mempool {
(sizeof(*(mp)) + __PA_SIZE(mp, pgn) + (((cs) == 0) ? 0 : \
(sizeof(struct rte_mempool_cache) * RTE_MAX_LCORE)))

-/**
- * Return true if the whole mempool is in contiguous memory.
- */
-#defineMEMPOOL_IS_CONTIG(mp)  \
-   ((mp)->pg_num == MEMPOOL_PG_NUM_DEFAULT && \
-   (mp)->phys_addr == (mp)->elt_pa[0])
-
 /* return the header of a mempool object (internal) */
 static inline struct rte_mempool_objhdr *__mempool_get_header(void *obj)
 {
-- 
2.1.4



[dpdk-dev] [PATCH 14/36] mempool: store physaddr in mempool objects

2016-04-14 Thread Olivier Matz
Store the physical address of the object in its header. It simplifies
rte_mempool_virt2phy() and prepares for the removal of the paddr[]
table from the mempool header.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 17 +++--
 lib/librte_mempool/rte_mempool.h | 11 ++-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 839b828..b8e46fc 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -132,19 +132,22 @@ static unsigned optimize_object_size(unsigned obj_size)
 typedef void (*rte_mempool_obj_iter_t)(void * /*obj_iter_arg*/,
void * /*obj_start*/,
void * /*obj_end*/,
-   uint32_t /*obj_index */);
+   uint32_t /*obj_index */,
+   phys_addr_t /*physaddr*/);

 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj)
+mempool_add_elem(struct rte_mempool *mp, void *obj, phys_addr_t physaddr)
 {
struct rte_mempool_objhdr *hdr;
struct rte_mempool_objtlr *tlr __rte_unused;

obj = (char *)obj + mp->header_size;
+   physaddr += mp->header_size;

/* set mempool ptr in header */
hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
hdr->mp = mp;
+   hdr->physaddr = physaddr;
STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);

 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
@@ -173,6 +176,7 @@ rte_mempool_obj_mem_iter(void *vaddr, uint32_t elt_num, 
size_t total_elt_sz,
uint32_t pgn, pgf;
uintptr_t end, start, va;
uintptr_t pg_sz;
+   phys_addr_t physaddr;

pg_sz = (uintptr_t)1 << pg_shift;
va = (uintptr_t)vaddr;
@@ -208,9 +212,10 @@ rte_mempool_obj_mem_iter(void *vaddr, uint32_t elt_num, 
size_t total_elt_sz,
 * otherwise, just skip that chunk unused.
 */
if (k == pgn) {
+   physaddr = paddr[k] + (start & (pg_sz - 1));
if (obj_iter != NULL)
obj_iter(obj_iter_arg, (void *)start,
-   (void *)end, i);
+   (void *)end, i, physaddr);
va = end;
j += pgf;
i++;
@@ -247,11 +252,11 @@ rte_mempool_obj_iter(struct rte_mempool *mp,

 static void
 mempool_obj_populate(void *arg, void *start, void *end,
-   __rte_unused uint32_t idx)
+   __rte_unused uint32_t idx, phys_addr_t physaddr)
 {
struct rte_mempool *mp = arg;

-   mempool_add_elem(mp, start);
+   mempool_add_elem(mp, start, physaddr);
mp->elt_va_end = (uintptr_t)end;
 }

@@ -355,7 +360,7 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t 
total_elt_sz, uint32_t pg_shift)
  * argument to the end of the object. */
 static void
 mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
-   __rte_unused uint32_t idx)
+   __rte_unused uint32_t idx, __rte_unused phys_addr_t physaddr)
 {
*(uintptr_t *)arg = (uintptr_t)end;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0153e62..00ca087 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -158,6 +158,7 @@ struct rte_mempool_objsz {
 struct rte_mempool_objhdr {
STAILQ_ENTRY(rte_mempool_objhdr) next; /**< Next in list. */
struct rte_mempool *mp;  /**< The mempool owning the object. */
+   phys_addr_t physaddr;/**< Physical address of the object. */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
uint64_t cookie; /**< Debug cookie. */
 #endif
@@ -1125,13 +1126,13 @@ rte_mempool_empty(const struct rte_mempool *mp)
  *   The physical address of the elt element.
  */
 static inline phys_addr_t
-rte_mempool_virt2phy(const struct rte_mempool *mp, const void *elt)
+rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp, const void 
*elt)
 {
if (rte_eal_has_hugepages()) {
-   uintptr_t off;
-
-   off = (const char *)elt - (const char *)mp->elt_va_start;
-   return mp->elt_pa[off >> mp->pg_shift] + (off & mp->pg_mask);
+   const struct rte_mempool_objhdr *hdr;
+   hdr = (const struct rte_mempool_objhdr *)RTE_PTR_SUB(elt,
+   sizeof(*hdr));
+   return hdr->physaddr;
} else {
/*
 * If huge pages are disabled, we cannot assume the
-- 
2.1.4
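
Storing the physical address in the per-object header turns virt2phy into a constant-time lookup: step back sizeof(header) bytes from the object pointer and read the field. A standalone sketch with a hypothetical header layout (a fake value stands in for a real physical address):

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical per-object header, filled in at populate time. */
struct objhdr {
	phys_addr_t physaddr;
};

/* The object lives right after its header, so the header is found
 * by stepping back sizeof(struct objhdr) bytes -- the same trick as
 * RTE_PTR_SUB() in the new rte_mempool_virt2phy(). */
static phys_addr_t virt2phy(const void *obj)
{
	const struct objhdr *hdr =
		(const struct objhdr *)((const char *)obj - sizeof(*hdr));
	return hdr->physaddr;
}
```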



[dpdk-dev] [PATCH 12/36] mempool: use the list to initialize mempool objects

2016-04-14 Thread Olivier Matz
Before this patch, the mempool elements were initialized at the time
they were added to the mempool. This patch changes this to initialize
all objects once the mempool is populated, using
rte_mempool_obj_iter() introduced in the previous commits.

Thanks to this modification, we are getting closer to a new API
that would allow us to do:
  mempool_init()
  mempool_populate(mem1)
  mempool_populate(mem2)
  mempool_populate(mem3)
  mempool_init_obj()

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2266f38..5d957b1 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -135,8 +135,7 @@ typedef void (*rte_mempool_obj_iter_t)(void * 
/*obj_iter_arg*/,
uint32_t /*obj_index */);

 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, uint32_t obj_idx,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg)
+mempool_add_elem(struct rte_mempool *mp, void *obj)
 {
struct rte_mempool_objhdr *hdr;
struct rte_mempool_objtlr *tlr __rte_unused;
@@ -153,9 +152,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
uint32_t obj_idx,
tlr = __mempool_get_trailer(obj);
tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-   /* call the initializer */
-   if (obj_init)
-   obj_init(mp, obj_init_arg, obj, obj_idx);

/* enqueue in ring */
rte_ring_sp_enqueue(mp->ring, obj);
@@ -249,37 +245,27 @@ rte_mempool_obj_iter(struct rte_mempool *mp,
  * Populate  mempool with the objects.
  */

-struct mempool_populate_arg {
-   struct rte_mempool *mp;
-   rte_mempool_obj_cb_t   *obj_init;
-   void   *obj_init_arg;
-};
-
 static void
-mempool_obj_populate(void *arg, void *start, void *end, uint32_t idx)
+mempool_obj_populate(void *arg, void *start, void *end,
+   __rte_unused uint32_t idx)
 {
-   struct mempool_populate_arg *pa = arg;
+   struct rte_mempool *mp = arg;

-   mempool_add_elem(pa->mp, start, idx, pa->obj_init, pa->obj_init_arg);
-   pa->mp->elt_va_end = (uintptr_t)end;
+   mempool_add_elem(mp, start);
+   mp->elt_va_end = (uintptr_t)end;
 }

 static void
-mempool_populate(struct rte_mempool *mp, size_t num, size_t align,
-   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg)
+mempool_populate(struct rte_mempool *mp, size_t num, size_t align)
 {
uint32_t elt_sz;
-   struct mempool_populate_arg arg;

elt_sz = mp->elt_size + mp->header_size + mp->trailer_size;
-   arg.mp = mp;
-   arg.obj_init = obj_init;
-   arg.obj_init_arg = obj_init_arg;

mp->size = rte_mempool_obj_mem_iter((void *)mp->elt_va_start,
num, elt_sz, align,
mp->elt_pa, mp->pg_num, mp->pg_shift,
-   mempool_obj_populate, &arg);
+   mempool_obj_populate, mp);
 }

 /* get the header, trailer and total size of a mempool element. */
@@ -648,7 +634,11 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
if (mp_init)
mp_init(mp, mp_init_arg);

-   mempool_populate(mp, n, 1, obj_init, obj_init_arg);
+   mempool_populate(mp, n, 1);
+
+   /* call the initializer */
+   if (obj_init)
+   rte_mempool_obj_iter(mp, obj_init, obj_init_arg);

te->data = (void *) mp;

-- 
2.1.4



[dpdk-dev] [PATCH 11/36] mempool: use the list to audit all elements

2016-04-14 Thread Olivier Matz
Use the new rte_mempool_obj_iter() instead of the old rte_mempool_obj_mem_iter()
to iterate over the objects and audit them (check their cookies).

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 41 ++--
 1 file changed, 6 insertions(+), 35 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 5cb58db..2266f38 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -720,12 +720,6 @@ rte_mempool_dump_cache(FILE *f, const struct rte_mempool 
*mp)
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

-struct mempool_audit_arg {
-   const struct rte_mempool *mp;
-   uintptr_t obj_end;
-   uint32_t obj_num;
-};
-
 /* check and update cookies or panic (internal) */
 void __mempool_check_cookies(const struct rte_mempool *mp,
void * const *obj_table_const, unsigned n, int free)
@@ -795,45 +789,22 @@ void __mempool_check_cookies(const struct rte_mempool *mp,
 }

 static void
-mempool_obj_audit(void *arg, void *start, void *end, uint32_t idx)
+mempool_obj_audit(struct rte_mempool *mp, __rte_unused void *opaque,
+   void *obj, __rte_unused unsigned idx)
 {
-   struct mempool_audit_arg *pa = arg;
-   void *obj;
-
-   obj = (char *)start + pa->mp->header_size;
-   pa->obj_end = (uintptr_t)end;
-   pa->obj_num = idx + 1;
-   __mempool_check_cookies(pa->mp, , 1, 2);
+   __mempool_check_cookies(mp, , 1, 2);
 }

 static void
 mempool_audit_cookies(struct rte_mempool *mp)
 {
-   uint32_t elt_sz, num;
-   struct mempool_audit_arg arg;
-
-   elt_sz = mp->elt_size + mp->header_size + mp->trailer_size;
-
-   arg.mp = mp;
-   arg.obj_end = mp->elt_va_start;
-   arg.obj_num = 0;
-
-   num = rte_mempool_obj_mem_iter((void *)mp->elt_va_start,
-   mp->size, elt_sz, 1,
-   mp->elt_pa, mp->pg_num, mp->pg_shift,
-   mempool_obj_audit, );
+   unsigned num;

+   num = rte_mempool_obj_iter(mp, mempool_obj_audit, NULL);
if (num != mp->size) {
-   rte_panic("rte_mempool_obj_iter(mempool=%p, size=%u) "
+   rte_panic("rte_mempool_obj_iter(mempool=%p, size=%u) "
"iterated only over %u elements\n",
mp, mp->size, num);
-   } else if (arg.obj_end != mp->elt_va_end || arg.obj_num != mp->size) {
-   rte_panic("rte_mempool_obj_iter(mempool=%p, size=%u) "
-   "last callback va_end: %#tx (%#tx expeceted), "
-   "num of objects: %u (%u expected)\n",
-   mp, mp->size,
-   arg.obj_end, mp->elt_va_end,
-   arg.obj_num, mp->size);
}
 }

-- 
2.1.4



[dpdk-dev] [PATCH 09/36] mempool: remove const qualifier in dump and audit

2016-04-14 Thread Olivier Matz
In the next commits, we will use an iterator to walk through the objects
of the mempool in rte_mempool_audit(). This iterator takes a "struct
rte_mempool *" as a parameter because it is assumed that the callback
function can modify the mempool.

The previous approach was to introduce an RTE_DECONST() macro, but
after discussion it seems better to remove the const qualifier, both
to avoid fooling the compiler and because these functions are not
used in the datapath (possible compiler optimizations due to const
are not critical).

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 8 
 lib/librte_mempool/rte_mempool.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 664a2bf..0fd244b 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -781,7 +781,7 @@ mempool_obj_audit(void *arg, void *start, void *end, 
uint32_t idx)
 }

 static void
-mempool_audit_cookies(const struct rte_mempool *mp)
+mempool_audit_cookies(struct rte_mempool *mp)
 {
uint32_t elt_sz, num;
struct mempool_audit_arg arg;
@@ -839,7 +839,7 @@ mempool_audit_cache(const struct rte_mempool *mp)

 /* check the consistency of mempool (size, cookies, ...) */
 void
-rte_mempool_audit(const struct rte_mempool *mp)
+rte_mempool_audit(struct rte_mempool *mp)
 {
mempool_audit_cache(mp);
mempool_audit_cookies(mp);
@@ -850,7 +850,7 @@ rte_mempool_audit(const struct rte_mempool *mp)

 /* dump the status of the mempool on the console */
 void
-rte_mempool_dump(FILE *f, const struct rte_mempool *mp)
+rte_mempool_dump(FILE *f, struct rte_mempool *mp)
 {
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
struct rte_mempool_debug_stats sum;
@@ -921,7 +921,7 @@ rte_mempool_dump(FILE *f, const struct rte_mempool *mp)
 void
 rte_mempool_list_dump(FILE *f)
 {
-   const struct rte_mempool *mp = NULL;
+   struct rte_mempool *mp = NULL;
struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 54a5917..a80335f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -645,7 +645,7 @@ rte_dom0_mempool_create(const char *name, unsigned n, 
unsigned elt_size,
  * @param mp
  *   A pointer to the mempool structure.
  */
-void rte_mempool_dump(FILE *f, const struct rte_mempool *mp);
+void rte_mempool_dump(FILE *f, struct rte_mempool *mp);

 /**
  * @internal Put several objects back in the mempool; used internally.
@@ -1183,7 +1183,7 @@ rte_mempool_virt2phy(const struct rte_mempool *mp, const 
void *elt)
  * @param mp
  *   A pointer to the mempool structure.
  */
-void rte_mempool_audit(const struct rte_mempool *mp);
+void rte_mempool_audit(struct rte_mempool *mp);

 /**
  * Return a pointer to the private data in an mempool structure.
-- 
2.1.4



[dpdk-dev] [PATCH 08/36] mempool: remove const attribute in mempool_walk

2016-04-14 Thread Olivier Matz
Most operations on a mempool require a non-const mempool pointer,
except dump and audit. Therefore, mempool_walk() is more useful if
the mempool pointer is not const.

This is required by the next commit, where the Mellanox drivers use
rte_mempool_walk() to iterate over the mempools, then rte_mempool_obj_iter()
to iterate over the objects in each mempool.

Signed-off-by: Olivier Matz 
---
 drivers/net/mlx4/mlx4.c  | 2 +-
 drivers/net/mlx5/mlx5_rxtx.c | 2 +-
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 lib/librte_mempool/rte_mempool.c | 2 +-
 lib/librte_mempool/rte_mempool.h | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 4f21dbe..41453cb 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1369,7 +1369,7 @@ txq_mp2mr_mbuf_check(void *arg, void *start, void *end,
  *   Pointer to TX queue structure.
  */
 static void
-txq_mp2mr_iter(const struct rte_mempool *mp, void *arg)
+txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 {
struct txq *txq = arg;
struct txq_mp2mr_mbuf_check_data data = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 9d1380a..88226b6 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -311,7 +311,7 @@ txq_mp2mr_mbuf_check(void *arg, void *start, void *end,
  *   Pointer to TX queue structure.
  */
 void
-txq_mp2mr_iter(const struct rte_mempool *mp, void *arg)
+txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 {
struct txq *txq = arg;
struct txq_mp2mr_mbuf_check_data data = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 0e2b607..db054d6 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -342,7 +342,7 @@ uint16_t mlx5_tx_burst_secondary_setup(void *dpdk_txq, 
struct rte_mbuf **pkts,
 /* mlx5_rxtx.c */

 struct ibv_mr *mlx5_mp2mr(struct ibv_pd *, const struct rte_mempool *);
-void txq_mp2mr_iter(const struct rte_mempool *, void *);
+void txq_mp2mr_iter(struct rte_mempool *, void *);
 uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t);
 uint16_t mlx5_rx_burst_sp(void *, struct rte_mbuf **, uint16_t);
 uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t);
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 83afda8..664a2bf 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -965,7 +965,7 @@ rte_mempool_lookup(const char *name)
return mp;
 }

-void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),
+void rte_mempool_walk(void (*func)(struct rte_mempool *, void *),
  void *arg)
 {
struct rte_tailq_entry *te = NULL;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 469bcbc..54a5917 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -1304,7 +1304,7 @@ ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t 
elt_num,
  * @param arg
  *   Argument passed to iterator
  */
-void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *arg),
+void rte_mempool_walk(void (*func)(struct rte_mempool *, void *arg),
  void *arg);

 #ifdef __cplusplus
-- 
2.1.4



[dpdk-dev] [PATCH 06/36] mempool: update library version

2016-04-14 Thread Olivier Matz
The next changes in this patch series are too invasive to keep a
compatibility layer, so bump the version number of the library.

Signed-off-by: Olivier Matz 
---
 doc/guides/rel_notes/release_16_04.rst | 2 +-
 lib/librte_mempool/Makefile| 2 +-
 lib/librte_mempool/rte_mempool_version.map | 6 ++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index d0a09ef..5fe172d 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -513,7 +513,7 @@ The libraries prepended with a plus sign were incremented 
in this version.
  librte_kvargs.so.1
  librte_lpm.so.2
  librte_mbuf.so.2
- librte_mempool.so.1
+   + librte_mempool.so.2
  librte_meter.so.1
+ librte_pipeline.so.3
  librte_pmd_bond.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index a6898ef..706f844 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -38,7 +38,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3

 EXPORT_MAP := rte_mempool_version.map

-LIBABIVER := 1
+LIBABIVER := 2

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
diff --git a/lib/librte_mempool/rte_mempool_version.map 
b/lib/librte_mempool/rte_mempool_version.map
index 17151e0..8c157d0 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -17,3 +17,9 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_16.07 {
+   global:
+
+   local: *;
+} DPDK_2.0;
-- 
2.1.4



[dpdk-dev] [PATCH 05/36] mempool: rename mempool_obj_ctor_t as mempool_obj_cb_t

2016-04-14 Thread Olivier Matz
In the next commits, we will add the ability to populate the
mempool and iterate through objects using the same function.
We will use the same callback type for that. As the callback is
not a constructor anymore, rename it to rte_mempool_obj_cb_t.

The rte_mempool_obj_iter_t that was used to iterate over objects
will be removed in later commits.

No functional change.
In this commit, the API is preserved through a compat typedef.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/mempool_anon.c|  4 ++--
 app/test-pmd/mempool_osdep.h   |  2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.h  |  2 +-
 drivers/net/xenvirt/rte_mempool_gntalloc.c |  4 ++--
 lib/librte_mempool/rte_dom0_mempool.c  |  2 +-
 lib/librte_mempool/rte_mempool.c   |  8 
 lib/librte_mempool/rte_mempool.h   | 27 ++-
 7 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/app/test-pmd/mempool_anon.c b/app/test-pmd/mempool_anon.c
index 4730432..5e23848 100644
--- a/app/test-pmd/mempool_anon.c
+++ b/app/test-pmd/mempool_anon.c
@@ -86,7 +86,7 @@ struct rte_mempool *
 mempool_anon_create(const char *name, unsigned elt_num, unsigned elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags)
 {
struct rte_mempool *mp;
@@ -190,7 +190,7 @@ mempool_anon_create(__rte_unused const char *name,
__rte_unused unsigned private_data_size,
__rte_unused rte_mempool_ctor_t *mp_init,
__rte_unused void *mp_init_arg,
-   __rte_unused rte_mempool_obj_ctor_t *obj_init,
+   __rte_unused rte_mempool_obj_cb_t *obj_init,
__rte_unused void *obj_init_arg,
__rte_unused int socket_id, __rte_unused unsigned flags)
 {
diff --git a/app/test-pmd/mempool_osdep.h b/app/test-pmd/mempool_osdep.h
index 6b8df68..7ce7297 100644
--- a/app/test-pmd/mempool_osdep.h
+++ b/app/test-pmd/mempool_osdep.h
@@ -48,7 +48,7 @@ struct rte_mempool *
 mempool_anon_create(const char *name, unsigned n, unsigned elt_size,
unsigned cache_size, unsigned private_data_size,
rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
int socket_id, unsigned flags);

 #endif /*_RTE_MEMPOOL_OSDEP_H_ */
diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.h 
b/drivers/net/xenvirt/rte_eth_xenvirt.h
index fc15a63..4995a9b 100644
--- a/drivers/net/xenvirt/rte_eth_xenvirt.h
+++ b/drivers/net/xenvirt/rte_eth_xenvirt.h
@@ -51,7 +51,7 @@ struct rte_mempool *
 rte_mempool_gntalloc_create(const char *name, unsigned elt_num, unsigned 
elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags);


diff --git a/drivers/net/xenvirt/rte_mempool_gntalloc.c 
b/drivers/net/xenvirt/rte_mempool_gntalloc.c
index 7bfbfda..69b9231 100644
--- a/drivers/net/xenvirt/rte_mempool_gntalloc.c
+++ b/drivers/net/xenvirt/rte_mempool_gntalloc.c
@@ -78,7 +78,7 @@ static struct _mempool_gntalloc_info
 _create_mempool(const char *name, unsigned elt_num, unsigned elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags)
 {
struct _mempool_gntalloc_info mgi;
@@ -253,7 +253,7 @@ struct rte_mempool *
 rte_mempool_gntalloc_create(const char *name, unsigned elt_num, unsigned 
elt_size,
   unsigned cache_size, unsigned private_data_size,
   rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+  rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
   int socket_id, unsigned flags)
 {
int rv;
diff --git a/lib/librte_mempool/rte_dom0_mempool.c 
b/lib/librte_mempool/rte_dom0_mempool.c
index 0d6d750..0051bd5 100644
--- a/lib/librte_mempool/rte_dom0_mempool.c
+++ b/lib/librte_mempool/rte_dom0_mempool.c
@@ -83,7 +83,7 @@ struct rte_mempool *
 rte_dom0_mempool_create(const char *name, unsigned elt_num, unsigned elt_size,
unsigned cache_size, unsigned private_data_size,
rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-   rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
+   rte_mempool_obj_cb_t 

[dpdk-dev] [PATCH 04/36] mempool: use sizeof to get the size of header and trailer

2016-04-14 Thread Olivier Matz
Since commits d2e0ca22f and 97e7e685b, the headers and trailers
of the mempool objects are defined as structures. We can get their
size using sizeof() instead of doing a calculation that would
become wrong at the first structure update.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2e1ccc0..b5b87e7 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -264,24 +264,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,

sz = (sz != NULL) ? sz : 

-   /*
-* In header, we have at least the pointer to the pool, and
-* optionaly a 64 bits cookie.
-*/
-   sz->header_size = 0;
-   sz->header_size += sizeof(struct rte_mempool *); /* ptr to pool */
-#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-   sz->header_size += sizeof(uint64_t); /* cookie */
-#endif
+   sz->header_size = sizeof(struct rte_mempool_objhdr);
if ((flags & MEMPOOL_F_NO_CACHE_ALIGN) == 0)
sz->header_size = RTE_ALIGN_CEIL(sz->header_size,
RTE_MEMPOOL_ALIGN);

-   /* trailer contains the cookie in debug mode */
-   sz->trailer_size = 0;
-#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-   sz->trailer_size += sizeof(uint64_t); /* cookie */
-#endif
+   sz->trailer_size = sizeof(struct rte_mempool_objtlr);
+
/* element size is 8 bytes-aligned at least */
sz->elt_size = RTE_ALIGN_CEIL(elt_size, sizeof(uint64_t));

-- 
2.1.4



[dpdk-dev] [PATCH 03/36] mempool: uninline function to check cookies

2016-04-14 Thread Olivier Matz
There's no reason to keep this function inlined. Move it to
rte_mempool.c.

Note: we don't see it in the patch, but the #pragma ignoring
"-Wcast-qual" is still there in the C file.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 68 +++
 lib/librte_mempool/rte_mempool.h | 77 ++--
 2 files changed, 71 insertions(+), 74 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 90b5b1b..2e1ccc0 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -709,6 +709,74 @@ struct mempool_audit_arg {
uint32_t obj_num;
 };

+/* check and update cookies or panic (internal) */
+void __mempool_check_cookies(const struct rte_mempool *mp,
+   void * const *obj_table_const, unsigned n, int free)
+{
+   struct rte_mempool_objhdr *hdr;
+   struct rte_mempool_objtlr *tlr;
+   uint64_t cookie;
+   void *tmp;
+   void *obj;
+   void **obj_table;
+
+   /* Force to drop the "const" attribute. This is done only when
+* DEBUG is enabled */
+   tmp = (void *) obj_table_const;
+   obj_table = (void **) tmp;
+
+   while (n--) {
+   obj = obj_table[n];
+
+   if (rte_mempool_from_obj(obj) != mp)
+   rte_panic("MEMPOOL: object is owned by another "
+ "mempool\n");
+
+   hdr = __mempool_get_header(obj);
+   cookie = hdr->cookie;
+
+   if (free == 0) {
+   if (cookie != RTE_MEMPOOL_HEADER_COOKIE1) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 
"\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad header cookie (put)\n");
+   }
+   hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
+   }
+   else if (free == 1) {
+   if (cookie != RTE_MEMPOOL_HEADER_COOKIE2) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 
"\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad header cookie (get)\n");
+   }
+   hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE1;
+   }
+   else if (free == 2) {
+   if (cookie != RTE_MEMPOOL_HEADER_COOKIE1 &&
+   cookie != RTE_MEMPOOL_HEADER_COOKIE2) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 
"\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad header cookie 
(audit)\n");
+   }
+   }
+   tlr = __mempool_get_trailer(obj);
+   cookie = tlr->cookie;
+   if (cookie != RTE_MEMPOOL_TRAILER_COOKIE) {
+   rte_log_set_history(0);
+   RTE_LOG(CRIT, MEMPOOL,
+   "obj=%p, mempool=%p, cookie=%" PRIx64 "\n",
+   obj, (const void *) mp, cookie);
+   rte_panic("MEMPOOL: bad trailer cookie\n");
+   }
+   }
+}
+
 static void
 mempool_obj_audit(void *arg, void *start, void *end, uint32_t idx)
 {
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index ca4657f..6d98cdf 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -296,6 +296,7 @@ static inline struct rte_mempool_objtlr 
*__mempool_get_trailer(void *obj)
return (struct rte_mempool_objtlr *)RTE_PTR_ADD(obj, mp->elt_size);
 }

+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 /**
  * @internal Check and update cookies or panic.
  *
@@ -310,80 +311,8 @@ static inline struct rte_mempool_objtlr 
*__mempool_get_trailer(void *obj)
  *   - 1: object is supposed to be free, mark it as allocated
  *   - 2: just check that cookie is valid (free or allocated)
  */
-#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-static inline void __mempool_check_cookies(const struct rte_mempool *mp,
-  void * const *obj_table_const,
-  unsigned n, int free)
-{
-   struct rte_mempool_objhdr *hdr;
-   struct rte_mempool_objtlr *tlr;
-   uint64_t cookie;
-   void *tmp;
-   void *obj;
-   void **obj_table;
-
-   

[dpdk-dev] [PATCH 02/36] mempool: replace elt_size by total_elt_size

2016-04-14 Thread Olivier Matz
In some mempool functions, we use the size of the elements as an argument or
in variables. There is confusion about whether the size includes the header
and trailer or not.

To avoid this confusion:
- update the API documentation
- rename variables and arguments to "elt_size" when the size does not
  include the header and trailer, and to "total_elt_size" otherwise.

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 21 +++--
 lib/librte_mempool/rte_mempool.h | 19 +++
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index ce78476..90b5b1b 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -156,13 +156,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
uint32_t obj_idx,
  *
  * Given the pointer to the memory, and its topology in physical memory
  * (the physical addresses table), iterate through the "elt_num" objects
- * of size "total_elt_sz" aligned at "align". For each object in this memory
+ * of size "elt_sz" aligned at "align". For each object in this memory
  * chunk, invoke a callback. It returns the effective number of objects
  * in this memory. */
 uint32_t
-rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
align,
-   const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
-   rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
+rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
+   size_t align, const phys_addr_t paddr[], uint32_t pg_num,
+   uint32_t pg_shift, rte_mempool_obj_iter_t obj_iter, void *obj_iter_arg)
 {
uint32_t i, j, k;
uint32_t pgn, pgf;
@@ -178,7 +178,7 @@ rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t 
elt_sz, size_t align,
while (i != elt_num && j != pg_num) {

start = RTE_ALIGN_CEIL(va, align);
-   end = start + elt_sz;
+   end = start + total_elt_sz;

/* index of the first page for the next element. */
pgf = (end >> pg_shift) - (start >> pg_shift);
@@ -255,6 +255,7 @@ mempool_populate(struct rte_mempool *mp, size_t num, size_t 
align,
mempool_obj_populate, );
 }

+/* get the header, trailer and total size of a mempool element. */
 uint32_t
 rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
struct rte_mempool_objsz *sz)
@@ -332,17 +333,17 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t 
flags,
  * Calculate maximum amount of memory required to store given number of 
objects.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, uint32_t pg_shift)
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
 {
size_t n, pg_num, pg_sz, sz;

pg_sz = (size_t)1 << pg_shift;

-   if ((n = pg_sz / elt_sz) > 0) {
+   if ((n = pg_sz / total_elt_sz) > 0) {
pg_num = (elt_num + n - 1) / n;
sz = pg_num << pg_shift;
} else {
-   sz = RTE_ALIGN_CEIL(elt_sz, pg_sz) * elt_num;
+   sz = RTE_ALIGN_CEIL(total_elt_sz, pg_sz) * elt_num;
}

return sz;
@@ -362,7 +363,7 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
void *end,
  * given memory footprint to store required number of elements.
  */
 ssize_t
-rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
+rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t total_elt_sz,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
 {
uint32_t n;
@@ -373,7 +374,7 @@ rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, 
size_t elt_sz,
va = (uintptr_t)vaddr;
uv = va;

-   if ((n = rte_mempool_obj_iter(vaddr, elt_num, elt_sz, 1,
+   if ((n = rte_mempool_obj_iter(vaddr, elt_num, total_elt_sz, 1,
paddr, pg_num, pg_shift, mempool_lelem_iter,
)) != elt_num) {
return -(ssize_t)n;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index bd78df5..ca4657f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -1289,7 +1289,7 @@ struct rte_mempool *rte_mempool_lookup(const char *name);
  * calculates header, trailer, body and total sizes of the mempool object.
  *
  * @param elt_size
- *   The size of each element.
+ *   The size of each element, without header and trailer.
  * @param flags
  *   The flags used for the mempool creation.
  *   Consult rte_mempool_create() for more information about possible values.
@@ -1315,14 +1315,15 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, 
uint32_t flags,
  *
  * @param elt_num
  *   Number of elements.
- * @param elt_sz
- *   The size of each element.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by 

[dpdk-dev] [PATCH 01/36] mempool: fix comments and style

2016-04-14 Thread Olivier Matz
No functional change, just fix some comments and styling issues.
Also avoid duplicating comments between rte_mempool_create()
and rte_mempool_xmem_create().

Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/rte_mempool.c | 17 +---
 lib/librte_mempool/rte_mempool.h | 59 +---
 2 files changed, 26 insertions(+), 50 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7a0e07e..ce78476 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -152,6 +152,13 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
uint32_t obj_idx,
rte_ring_sp_enqueue(mp->ring, obj);
 }

+/* Iterate through objects at the given address
+ *
+ * Given the pointer to the memory, and its topology in physical memory
+ * (the physical addresses table), iterate through the "elt_num" objects
+ * of size "total_elt_sz" aligned at "align". For each object in this memory
+ * chunk, invoke a callback. It returns the effective number of objects
+ * in this memory. */
 uint32_t
 rte_mempool_obj_iter(void *vaddr, uint32_t elt_num, size_t elt_sz, size_t 
align,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
@@ -341,10 +348,8 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t elt_sz, 
uint32_t pg_shift)
return sz;
 }

-/*
- * Calculate how much memory would be actually required with the
- * given memory footprint to store required number of elements.
- */
+/* Callback used by rte_mempool_xmem_usage(): it sets the opaque
+ * argument to the end of the object. */
 static void
 mempool_lelem_iter(void *arg, __rte_unused void *start, void *end,
__rte_unused uint32_t idx)
@@ -352,6 +357,10 @@ mempool_lelem_iter(void *arg, __rte_unused void *start, 
void *end,
*(uintptr_t *)arg = (uintptr_t)end;
 }

+/*
+ * Calculate how much memory would be actually required with the
+ * given memory footprint to store required number of elements.
+ */
 ssize_t
 rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8595e77..bd78df5 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -214,7 +214,7 @@ struct rte_mempool {

 }  __rte_cache_aligned;

-#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread in memory. */
+#define MEMPOOL_F_NO_SPREAD  0x0001 /**< Do not spread among memory 
channels. */
 #define MEMPOOL_F_NO_CACHE_ALIGN 0x0002 /**< Do not align objs on cache 
lines.*/
 #define MEMPOOL_F_SP_PUT 0x0004 /**< Default put is 
"single-producer".*/
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
"single-consumer".*/
@@ -270,7 +270,8 @@ struct rte_mempool {
 /* return the header of a mempool object (internal) */
 static inline struct rte_mempool_objhdr *__mempool_get_header(void *obj)
 {
-   return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj, sizeof(struct 
rte_mempool_objhdr));
+   return (struct rte_mempool_objhdr *)RTE_PTR_SUB(obj,
+   sizeof(struct rte_mempool_objhdr));
 }

 /**
@@ -544,8 +545,9 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
elt_size,
 /**
  * Create a new mempool named *name* in memory.
  *
- * This function uses ``memzone_reserve()`` to allocate memory. The
- * pool contains n elements of elt_size. Its size is set to n.
+ * The pool contains n elements of elt_size. Its size is set to n.
+ * This function uses ``memzone_reserve()`` to allocate the mempool header
+ * (and the objects if vaddr is NULL).
  * Depending on the input parameters, mempool elements can be either allocated
  * together with the mempool header, or an externally provided memory buffer
  * could be used to store mempool objects. In later case, that external
@@ -560,18 +562,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned 
elt_size,
  * @param elt_size
  *   The size of each element.
  * @param cache_size
- *   If cache_size is non-zero, the rte_mempool library will try to
- *   limit the accesses to the common lockless pool, by maintaining a
- *   per-lcore object cache. This argument must be lower or equal to
- *   CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE. It is advised to choose
- *   cache_size to have "n modulo cache_size == 0": if this is
- *   not the case, some elements will always stay in the pool and will
- *   never be used. The access to the per-lcore table is of course
- *   faster than the multi-producer/consumer pool. The cache can be
- *   disabled if the cache_size argument is set to 0; it can be useful to
- *   avoid losing objects in cache. Note that even if not used, the
- *   memory space for cache is always reserved in a mempool structure,
- *   except if CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE is set to 0.
+ *   Size of the cache. See rte_mempool_create() for details.
  * @param private_data_size
  *   

[dpdk-dev] [PATCH 00/36] mempool: rework memory allocation

2016-04-14 Thread Olivier Matz
This series is a rework of mempool. For those who don't want to read
the whole cover letter, here is a summary:

- it is not possible to allocate a large mempool if there is not enough
  contiguous memory; this series solves that issue
- introduce new APIs with fewer arguments: "create, populate, obj_init"
- allow freeing a mempool
- split the code into smaller functions, which will ease the introduction
  of ext_handler
- remove test-pmd anonymous mempool creation
- remove most of the dom0-specific mempool code
- open the door for an eal_memory rework: we probably don't need a large
  contiguous memory area anymore; working with pages would be enough.

This breaks the ABI, as indicated in the deprecation notice for 16.04.
The API stays almost the same: no modification is needed in the example
apps or in test-pmd. Only kni and the Mellanox drivers are slightly modified.

This patch applies on top of 16.04 + v5 of Keith's patch:
"mempool: reduce rte_mempool structure size"

Changes RFC -> v1:

- remove the rte_deconst macro, and remove some const qualifier in
  dump/audit functions
- rework modifications in mellanox drivers to ensure the mempool is
  virtually contiguous
- fix mempool memory chunk iteration (bad pointer was used)
- fix compilation on freebsd: replace MAP_LOCKED flag by mlock()
- fix compilation on tilera (pointer arithmetics)
- slightly rework and clean the mempool autotest
- fix mempool autotest on bsd
- more validation (especially mellanox drivers and kni that were not
  tested in RFC)
- passed autotests (x86_64-native-linuxapp-gcc and x86_64-native-bsdapp-gcc)
- rebase on head, reorder the patches a bit and fix minor split issues


Description of the initial issue


The allocation of an mbuf pool can fail even if there is enough memory.
The problem is related to the way the memory is allocated and used in
dpdk. It is particularly annoying with mbuf pools, but allocation can also
fail in other use cases requiring a large amount of memory.

- rte_malloc() allocates physically contiguous memory, which is needed
  for mempools, but useless most of the time.

  Allocating a large physically contiguous zone is often impossible
  because the system provides hugepages, which may not be contiguous.

- rte_mempool_create() (and therefore rte_pktmbuf_pool_create())
  requires a physically contiguous zone.

- rte_mempool_xmem_create() does not solve the issue as it still
  needs the memory to be virtually contiguous, and there is no
  way in dpdk to allocate a virtually contiguous memory that is
  not also physically contiguous.

How to reproduce the issue
--

- start the dpdk with some 2MB hugepages (it can also occur with 1GB)
- allocate a large mempool
- even if there is enough memory, the allocation can fail

Example:

  git clone http://dpdk.org/git/dpdk
  cd dpdk
  make config T=x86_64-native-linuxapp-gcc
  make -j32
  mkdir -p /mnt/huge
  mount -t hugetlbfs nodev /mnt/huge
  echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

  # we try to allocate a mempool whose size is ~450MB, it fails
  ./build/app/testpmd -l 2,4 -- --total-num-mbufs=20 -i

The EAL logs "EAL: Virtual area found at..." show that there are
several zones, but all smaller than 450MB.

Workarounds:

- Use 1GB hugepages: it sometimes works, but for very large
  pools (millions of mbufs) the same issue occurs. Moreover,
  it would consume at least 1GB of memory, which can be a lot
  in some cases.

- Reboot the machine or allocate hugepages at boot time: this increases
  the chances to have more contiguous memory, but does not completely
  solve the issue

Solutions
-

Below is a list of proposed solutions. I implemented a quick and dirty
PoC of solution 1, but it's not working in all conditions and it's
really an ugly hack. This series implements solution 4, which looks
the best to me, knowing it does not prevent further enhancements
to dpdk memory in the future (solution 3 for instance).

Solution 1: in application
--

- allocate several hugepages using rte_malloc() or rte_memzone_reserve()
  (only keeping complete hugepages)
- parse memsegs and /proc/maps to check which files mmaps these pages
- mmap the files in a contiguous virtual area
- use rte_mempool_xmem_create()

Cons:

- 1a. parsing the memsegs of rte config in the application does not
  use a public API, and can be broken if internal dpdk code changes
- 1b. some memory is lost due to malloc headers. Also, if the memory is
  very fragmented (ex: all 2MB pages are physically separated), it does
  not work at all because we cannot get any complete page. It is not
  possible to use a lower level allocator since commit fafcc11985a.
- 1c. we cannot use rte_pktmbuf_pool_create(), so we need to use mempool
  api and do a part of the job manually
- 1d. it breaks secondary processes as the virtual addresses won't be
  mmap'd at the same place in secondary process
- 1e. it only fixes the issue for 

[dpdk-dev] [PATCH v1 1/1] ixgbe: fix queue stop

2016-04-14 Thread Piotr Azarewicz
It should check whether the queue enable bit is clear.

CID 13215 : Wrong operator used (CONSTANT_EXPRESSION_RESULT)
operator_confusion: txdctl | 33554432 is always 1/true regardless of the
values of its operand. This occurs as the logical second operand of
'&&'.

CID 13216 : Wrong operator used (CONSTANT_EXPRESSION_RESULT)
operator_confusion: rxdctl | 33554432 is always 1/true regardless of the
values of its operand. This occurs as the logical second operand of
'&&'.

Coverity issue: 13215
Coverity issue: 13216
Fixes: 029fd06d40fa ("ixgbe: queue start and stop")

Signed-off-by: Piotr Azarewicz 
---
 drivers/net/ixgbe/ixgbe_rxtx.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9fb38a6..8483e51 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -4813,12 +4813,12 @@ ixgbe_dev_rx_queue_stop(struct rte_eth_dev *dev, 
uint16_t rx_queue_id)
rxdctl &= ~IXGBE_RXDCTL_ENABLE;
IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(rxq->reg_idx), rxdctl);

-   /* Wait until RX Enable ready */
+   /* Wait until RX Enable bit clear */
poll_ms = RTE_IXGBE_REGISTER_POLL_WAIT_10_MS;
do {
rte_delay_ms(1);
rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
-   } while (--poll_ms && (rxdctl | IXGBE_RXDCTL_ENABLE));
+   } while (--poll_ms && (rxdctl & IXGBE_RXDCTL_ENABLE));
if (!poll_ms)
PMD_INIT_LOG(ERR, "Could not disable Rx Queue %d",
 rx_queue_id);
@@ -4914,14 +4914,14 @@ ixgbe_dev_tx_queue_stop(struct rte_eth_dev *dev, 
uint16_t tx_queue_id)
txdctl &= ~IXGBE_TXDCTL_ENABLE;
IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(txq->reg_idx), txdctl);

-   /* Wait until TX Enable ready */
+   /* Wait until TX Enable bit clear */
if (hw->mac.type == ixgbe_mac_82599EB) {
poll_ms = RTE_IXGBE_REGISTER_POLL_WAIT_10_MS;
do {
rte_delay_ms(1);
txdctl = IXGBE_READ_REG(hw,
IXGBE_TXDCTL(txq->reg_idx));
-   } while (--poll_ms && (txdctl | IXGBE_TXDCTL_ENABLE));
+   } while (--poll_ms && (txdctl & IXGBE_TXDCTL_ENABLE));
if (!poll_ms)
PMD_INIT_LOG(ERR, "Could not disable "
 "Tx Queue %d", tx_queue_id);
-- 
1.7.9.5



[dpdk-dev] [PATCH] examples/ip_pipeline: fix out-of-bounds write

2016-04-14 Thread Marcin Kerlin
CID 124567:
In the function app_init_eal(struct app_params *app), the number of
entries written into the array can exceed the size of the array if the
conditions are fulfilled.

Fixes: 7f64b9c004aa ("examples/ip_pipeline: rework config file syntax")

Signed-off-by: Marcin Kerlin 
---
 examples/ip_pipeline/app.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 55a9841..e775024 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -415,7 +415,7 @@ struct app_eal_params {
 #endif

 #ifndef APP_EAL_ARGC
-#define APP_EAL_ARGC 32
+#define APP_EAL_ARGC 64
 #endif

 #ifndef APP_MAX_PIPELINE_TYPES
-- 
1.9.1



[dpdk-dev] [PATCH] examples: fix CID 30708 out-of-bounds read

2016-04-14 Thread Slawomir Mrozowicz
It fixes the Coverity issue:
CID 30708 (#1 of 1): Out-of-bounds read (OVERRUN)
12. overrun-local: Overrunning array tokens of 8 8-byte elements
at element index 4294967294 (byte offset 34359738352)
using index i (which evaluates to 4294967294).

Fixes: de3cfa2c9823 ("sched: initial import")
Signed-off-by: Slawomir Mrozowicz 
---
 examples/qos_sched/args.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/examples/qos_sched/args.c b/examples/qos_sched/args.c
index 3e7fd08..d819269 100644
--- a/examples/qos_sched/args.c
+++ b/examples/qos_sched/args.c
@@ -175,9 +175,11 @@ app_parse_opt_vals(const char *conf_str, char separator, 
uint32_t n_vals, uint32

n_tokens = rte_strsplit(string, strnlen(string, 32), tokens, n_vals, 
separator);

-   for(i = 0; i < n_tokens; i++) {
+   if (n_tokens > MAX_OPT_VALUES)
+   return -1;
+
+   for (i = 0; i < n_tokens; i++)
opt_vals[i] = (uint32_t)atol(tokens[i]);
-   }

free(string);

-- 
1.9.1



Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial 
Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | 
Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i 
moze zawierac informacje poufne. W razie przypadkowego otrzymania tej 
wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; 
jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). If you are not the intended recipient, please 
contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.



[dpdk-dev] [PATCH v5] mempool: reduce rte_mempool structure size

2016-04-14 Thread Olivier Matz
From: Keith Wiles 

The rte_mempool structure is changed, which will cause an ABI change
for this structure. Providing backward compat is not reasonable
here as this structure is used in multiple defines/inlines.

Allow mempool cache support to be dynamic depending on if the
mempool being created needs cache support. Saves about 1.5M of
memory used by the rte_mempool structure.

Allocating small mempools which do not require a cache can consume
large amounts of memory if you have many of these mempools.

Change to be effective in release 16.07.

Signed-off-by: Keith Wiles 
Acked-by: Olivier Matz 
---

Changes in v5:

- use RTE_PTR_ADD() instead of cast to (char *) to fix compilation on tilera.
  Error log was:

  rte_mempool.c: In function 'rte_mempool_xmem_create':
  rte_mempool.c:595: error: cast increases required alignment of target type


 app/test/test_mempool.c  |  4 +--
 lib/librte_mempool/rte_mempool.c | 55 ++--
 lib/librte_mempool/rte_mempool.h | 29 ++---
 3 files changed, 40 insertions(+), 48 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index f0f823b..10e1fa4 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -122,8 +122,8 @@ test_mempool_basic(void)
return -1;

printf("get private data\n");
-   if (rte_mempool_get_priv(mp) !=
-   (char*) mp + MEMPOOL_HEADER_SIZE(mp, mp->pg_num))
+   if (rte_mempool_get_priv(mp) != (char *)mp +
+   MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
return -1;

printf("get physical address of an object\n");
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index f8781e1..7a0e07e 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -452,12 +452,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
/* compilation-time checks */
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
  RTE_CACHE_LINE_MASK) != 0);
-#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_cache) &
  RTE_CACHE_LINE_MASK) != 0);
-   RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, local_cache) &
- RTE_CACHE_LINE_MASK) != 0);
-#endif
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_debug_stats) &
  RTE_CACHE_LINE_MASK) != 0);
@@ -527,9 +523,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
 */
int head = sizeof(struct rte_mempool);
int new_size = (private_data_size + head) % page_size;
-   if (new_size) {
+   if (new_size)
private_data_size += page_size - new_size;
-   }
}

/* try to allocate tailq entry */
@@ -544,7 +539,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
 * store mempool objects. Otherwise reserve a memzone that is large
 * enough to hold mempool header and metadata plus mempool objects.
 */
-   mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num) + private_data_size;
+   mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size);
+   mempool_size += private_data_size;
mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
if (vaddr == NULL)
mempool_size += (size_t)objsz.total_size * n;
@@ -591,8 +587,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
mp->private_data_size = private_data_size;

+   /*
+* local_cache pointer is set even if cache_size is zero.
+* The local_cache points to just past the elt_pa[] array.
+*/
+   mp->local_cache = (struct rte_mempool_cache *)
+   RTE_PTR_ADD(mp, MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
+
/* calculate address of the first element for continuous mempool. */
-   obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num) +
+   obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
private_data_size;
obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);

@@ -606,9 +609,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
mp->elt_va_start = (uintptr_t)obj;
mp->elt_pa[0] = mp->phys_addr +
(mp->elt_va_start - (uintptr_t)mp);
-
-   /* mempool elements in a separate chunk of memory. */
} else {
+   /* mempool elements in a separate chunk of memory. */
mp->elt_va_start = (uintptr_t)vaddr;
memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
}
@@ -643,19 +645,15 @@ 

[dpdk-dev] [PATCH] i40e: improve performance of vector PMD

2016-04-14 Thread Bruce Richardson
An analysis of the i40e code using Intel VTune Amplifier 2016 showed
that the code was unexpectedly causing stalls due to "Loads blocked by
Store Forwards". This can occur when a load from memory has to wait
due to the prior store being to the same address, but being of a smaller
size, i.e. the stored value cannot be directly forwarded to the load.
[See ref: https://software.intel.com/en-us/node/544454]

These stalls are due to the way in which the data_len values are handled
in the driver. The lengths are extracted using vector operations, but those
16-bit lengths are then assigned using scalar operations i.e. 16-bit
stores.

These regular 16-bit stores actually have two effects in the code:
* they cause the "Loads blocked by Store Forwards" issues reported
* they also cause the previous loads in the RX function to actually be a
load followed by a store to an address on the stack, because the 16-bit
assignment can't be done to an xmm register.

By converting the 16-bit stores operations into a sequence of SSE blend
operations, we can ensure that the descriptor loads only occur once, and
avoid both the additional store and loads from the stack, as well as the
stalls due to the second loads being blocked.

Signed-off-by: Bruce Richardson 

---
 drivers/net/i40e/i40e_rxtx_vec.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 047aff5..d0a0cc9 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -192,11 +192,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
 static inline void
 desc_pktlen_align(__m128i descs[4])
 {
-   __m128i pktlen0, pktlen1, zero;
-   union {
-   uint16_t e[4];
-   uint64_t dword;
-   } vol;
+   __m128i pktlen0, pktlen1;

/* mask everything except pktlen field*/
const __m128i pktlen_msk = _mm_set_epi32(PKTLEN_MASK, PKTLEN_MASK,
@@ -206,18 +202,18 @@ desc_pktlen_align(__m128i descs[4])
pktlen1 = _mm_unpackhi_epi32(descs[1], descs[3]);
pktlen0 = _mm_unpackhi_epi32(pktlen0, pktlen1);

-   zero = _mm_xor_si128(pktlen0, pktlen0);
-
pktlen0 = _mm_srli_epi32(pktlen0, PKTLEN_SHIFT);
pktlen0 = _mm_and_si128(pktlen0, pktlen_msk);

-   pktlen0 = _mm_packs_epi32(pktlen0, zero);
-   vol.dword = _mm_cvtsi128_si64(pktlen0);
-   /* let the descriptor byte 15-14 store the pkt len */
-   *((uint16_t *)&descs[0]+7) = vol.e[0];
-   *((uint16_t *)&descs[1]+7) = vol.e[1];
-   *((uint16_t *)&descs[2]+7) = vol.e[2];
-   *((uint16_t *)&descs[3]+7) = vol.e[3];
+   pktlen0 = _mm_packs_epi32(pktlen0, pktlen0);
+
+   descs[3] = _mm_blend_epi16(descs[3], pktlen0, 0x80);
+   pktlen0 = _mm_slli_epi64(pktlen0, 16);
+   descs[2] = _mm_blend_epi16(descs[2], pktlen0, 0x80);
+   pktlen0 = _mm_slli_epi64(pktlen0, 16);
+   descs[1] = _mm_blend_epi16(descs[1], pktlen0, 0x80);
+   pktlen0 = _mm_slli_epi64(pktlen0, 16);
+   descs[0] = _mm_blend_epi16(descs[0], pktlen0, 0x80);
 }

  /*
-- 
2.5.5



[dpdk-dev] [RFC 2/2] librte_ether: add new fields to rte_eth_dev_info struct

2016-04-14 Thread Reshma Pattan
New fields nb_rx_queues and nb_tx_queues are added to
rte_eth_dev_info structure.
Changes to API rte_eth_dev_info_get() are done to update
these new fields to rte_eth_dev_info object.

Signed-off-by: Reshma Pattan
---
 lib/librte_ether/rte_ethdev.c | 2 ++
 lib/librte_ether/rte_ethdev.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a31018e..032c6bf 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1661,6 +1661,8 @@ rte_eth_dev_info_get(uint8_t port_id, struct 
rte_eth_dev_info *dev_info)
(*dev->dev_ops->dev_infos_get)(dev, dev_info);
dev_info->pci_dev = dev->pci_dev;
dev_info->driver_name = dev->data->drv_name;
+   dev_info->nb_tx_queues = dev->data->nb_tx_queues;
+   dev_info->nb_rx_queues = dev->data->nb_rx_queues;
 }

 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 022733e..e8e370d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -908,6 +908,9 @@ struct rte_eth_dev_info {
struct rte_eth_desc_lim rx_desc_lim;  /**< RX descriptors limits */
struct rte_eth_desc_lim tx_desc_lim;  /**< TX descriptors limits */
uint32_t speed_capa;  /**< Supported speeds bitmap (ETH_LINK_SPEED_). */
+   /** number of queues configured by software*/
+   uint16_t nb_rx_queues; /**< Number of RX queues. */
+   uint16_t nb_tx_queues; /**< Number of TX queues. */
 };

 /**
-- 
2.5.0



[dpdk-dev] [RFC 1/2] doc: announce ABI change for rte_eth_dev_info structure

2016-04-14 Thread Reshma Pattan
New fields nb_rx_queues and nb_tx_queues will be added to
rte_eth_dev_info structure.
Changes to API rte_eth_dev_info_get() will be done to update
these new fields to rte_eth_dev_info object.

Signed-off-by: Reshma Pattan
---
 doc/guides/rel_notes/deprecation.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 327fc2b..78cedb7 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -90,3 +90,9 @@ Deprecation Notices
   a handle, like the way kernel exposes an fd to user for locating a
   specific file, and to keep all major structures internally, so that
   we are likely to be free from ABI violations in future.
+
+* A librte_ether public structure ``rte_eth_dev_info`` will be changed in 16.07.
+  The proposed change will add new parameters ``nb_rx_queues``, ``nb_tx_queues``
+  to the structure. These are the number of queues configured by software.
+  Modification to definition of ``rte_eth_dev_info_get()`` will be done
+  to update new parameters to ``rte_eth_dev_info`` object.
-- 
2.5.0



[dpdk-dev] [RFC 0/2] add new fields to rte_eth_dev_info structure

2016-04-14 Thread Reshma Pattan
New fields nb_rx_queues and nb_tx_queues are added to rte_eth_dev_info 
structure.
Changes to API rte_eth_dev_info_get() are done to update these new fields to 
rte_eth_dev_info object.

These changes are an ABI breakage, and we are late to announce a deprecation
notice for 16.07; however, the rte_ether library is already subject to a
deprecation notice in 16.07.

Reshma Pattan (2):
  doc: announce ABI change for rte_eth_dev_info structure
  librte_ether: add new fields to rte_eth_dev_info struct

 doc/guides/rel_notes/deprecation.rst | 6 ++
 lib/librte_ether/rte_ethdev.c| 2 ++
 lib/librte_ether/rte_ethdev.h| 3 +++
 3 files changed, 11 insertions(+)

-- 
2.5.0



[dpdk-dev] [PATCH] bond: inherit maximum rx packet length

2016-04-14 Thread Eric Kinzie
  Instead of a hard-coded maximum receive length, allow the bond interface
  to inherit this limit from the first slave added.  This allows
  an application that uses jumbo frames to pass realistic values to
  rte_eth_dev_configure without causing an error.

Signed-off-by: Eric Kinzie 
---
 drivers/net/bonding/rte_eth_bond_api.c |4 
 drivers/net/bonding/rte_eth_bond_pmd.c |2 +-
 drivers/net/bonding/rte_eth_bond_private.h |2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index e9247b5..b763b37 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -247,6 +247,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t 
socket_id)
internals->active_slave_count = 0;
internals->rx_offload_capa = 0;
internals->tx_offload_capa = 0;
+   internals->max_rx_pktlen = (uint32_t)2048;

/* Initially allow to choose any offload type */
internals->flow_type_rss_offloads = ETH_RSS_PROTO_MASK;
@@ -365,6 +366,9 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, 
uint8_t slave_port_id)
internals->tx_offload_capa = dev_info.tx_offload_capa;
internals->flow_type_rss_offloads = 
dev_info.flow_type_rss_offloads;

+   /* Inherit first slave's max rx packet size */
+   internals->max_rx_pktlen = dev_info.max_rx_pktlen;
+
} else {
/* Check slave link properties are supported if props are set,
 * all slaves must be the same */
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 54788cf..189fb47 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1650,7 +1650,7 @@ bond_ethdev_info(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)

dev_info->max_mac_addrs = 1;

-   dev_info->max_rx_pktlen = (uint32_t)2048;
+   dev_info->max_rx_pktlen = internals->max_rx_pktlen;

dev_info->max_rx_queues = (uint16_t)128;
dev_info->max_tx_queues = (uint16_t)512;
diff --git a/drivers/net/bonding/rte_eth_bond_private.h 
b/drivers/net/bonding/rte_eth_bond_private.h
index 8312397..79ca69d 100644
--- a/drivers/net/bonding/rte_eth_bond_private.h
+++ b/drivers/net/bonding/rte_eth_bond_private.h
@@ -169,6 +169,8 @@ struct bond_dev_private {

struct rte_kvargs *kvlist;
uint8_t slave_update_idx;
+
+   uint32_t max_rx_pktlen;
 };

 extern const struct eth_dev_ops default_dev_ops;
-- 
1.7.10.4



[dpdk-dev] [PATCH v1] doc: add template release notes for 16.07

2016-04-14 Thread Thomas Monjalon
2016-04-12 13:55, John McNamara:
> Added template release notes for DPDK 16.07 with inline
> explanations of the various sections.
> 
> Signed-off-by: John McNamara 

Applied, thanks


[dpdk-dev] memory allocation requirements

2016-04-14 Thread Sergio Gonzalez Monroy
On 13/04/2016 17:03, Thomas Monjalon wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788

+1

> I think it is time to collect what are the needs and expectations of
> the DPDK memory allocator. The goal is to satisfy every needs while
> cleaning the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
>   - numa node
>   - page size
>   - swappable or not
>   - contiguous (cannot be guaranteed) or not
>   - physical address (as root only)

I think this ties up with the different command line options related to 
memory.
We have 3 choices:
1) no option : allocate all free hugepages in the system.
    Read free hugepages from sysfs (possible race conditions if there are
    multiple mount points for the same page size). We also need to account
    for a limit on the hugetlbfs mount; plus, if we have a cgroup, it looks
    like we have no other way than to handle the SIGBUS signal to deal with
    the fact that we may succeed in allocating the hugepages even though
    they are not pre-faulted (this happens with the MAP_POPULATE option too).
2) -m : allocate as much memory regardless of the numa node.
3) --socket-mem  : allocate memory per numa node.

At the moment we are not able to specify how much memory of a given page 
size we
want to allocate.

So would we provide contiguous memory as an option, changing the default
behavior?

> Then the drivers or other libraries use the memory through
>   - rte_malloc
>   - rte_memzone
>   - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool still focus on collection of objects with cache.

So the other bit we need to remember is the memory for the hardware queues.
There is already an API in ethdev, rte_eth_dma_zone_reserve(), which I
think would make sense to move to EAL so the memory allocator can guarantee
contiguous memory transparently for the cases where we may have memory of
different hugepage sizes.

> If a rework happens, maybe that the build options CONFIG_RTE_LIBRTE_IVSHMEM
> and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
> The Xen support should also be better integrated.

CONFIG_RTE_LIBRTE_IVSHMEM should probably be a runtime option and
CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS could likely be removed once we have a
single mmap file for hugepages.

> Currently, the first class of API is directly implemented as command line
> parameters. Please let's think of C functions first.
> The EAL parameters should simply wrap some API functions and let the
> applications tune the memory initialization with a well documented API.
>
> Probably that I forget some needs, e.g. for the secondary processes.
> Please comment.

Regards,
Sergio


[dpdk-dev] Issue on rte_sched.c

2016-04-14 Thread Thomas Monjalon
2016-04-13 20:35, Ariel Rodriguez:
> Hello, viewing the new code of librte_sched/ I found this line strange ...
> 
> #if defined(__SSE4__)

Are you refering to http://dpdk.org/commit/90f455f ?

> if instead i use :
> 
> #if defined(__SSE4_2__) || defined(__SSE4_1__)
> 
> works like a charm ...
> 
> I have never seen any directive like __SSE4__

Indeed, it is strange.
By the way, it is recommended to use RTE_MACHINE_CPUFLAG_*.


[dpdk-dev] ethtool doesnt work on some interface after unbinding dpdk

2016-04-14 Thread Remy Horton
Morning,

On 13/04/2016 15:48, Gopakumar Choorakkot Edakkunni wrote:
[..]
> then after a while I
> unbind from igb_uio and bind them back to igb/ixgbe. At this point, one of
> the 4 igb ports (random) stops responding to ethtool, ethtool bails out
> with some error. But otherwise the interface seems to work fine, it has a
> linux interface created and pops up in /sys/class/net etc.. Has anyone seen
> this before ? I thought of checking before starting to debug this further

Can you give details of the error? If you were unbinding from 
igb_uio while examples/ethtool was still running, it likely caused 
something to trip up, as DPDK ethtool itself at least was not made with 
run-time unbinding in mind.

Regards,

..Remy


[dpdk-dev] [PATCH] examples: fix CID 30704 negative loop bound

2016-04-14 Thread Slawomir Mrozowicz
It fixes the Coverity issue: CID 30704 (#1 of 1): Negative loop bound
(NEGATIVE_RETURNS) 8. negative_returns: Using unsigned variable n_tokens in a
loop exit condition.

Fixes: de3cfa2c9823 ("sched: initial import")
Signed-off-by: Slawomir Mrozowicz 
---
 examples/qos_sched/args.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/qos_sched/args.c b/examples/qos_sched/args.c
index 3e7fd08..7a98e5c 100644
--- a/examples/qos_sched/args.c
+++ b/examples/qos_sched/args.c
@@ -162,7 +162,7 @@ static int
 app_parse_opt_vals(const char *conf_str, char separator, uint32_t n_vals, 
uint32_t *opt_vals)
 {
char *string;
-   uint32_t i, n_tokens;
+   int i, n_tokens;
char *tokens[MAX_OPT_VALUES];

if (conf_str == NULL || opt_vals == NULL || n_vals == 0 || n_vals > 
MAX_OPT_VALUES)
-- 
1.9.1






[dpdk-dev] Bug in i40e PMD for flexible payload

2016-04-14 Thread Wu, Jingjing
Thanks, Michael.

Ack to your change. Could you send patch for that?

Thanks
Jingjing

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Habibi
> Sent: Thursday, March 24, 2016 2:45 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Bug in i40e PMD for flexible payload
> 
> We are using the i40e implementation to configure flow director with flexible
> payload rules. When setting up rules, it allows you to set a value to 63 to
> disable the rule (NONUSE_FLX_PIT_DEST_OFF). However, the macro in
> question is always adding an offset value 50
> (I40E_FLX_OFFSET_IN_FIELD_VECTOR). This doesn't work when you use it in
> conjunction with NONUSE_FLX_PIT_DEST_OFF to disable it, because instead
> of taking 63 as is, it does 63 + 50 and breaks the functionality.
> 
> We used the following fix and it appears to work. Just sharing with the DPDK
> team in case they want to bring it in.
> 
> Index: i40e_fdir.c
> 
> ===================================================================
> 
> --- i40e_fdir.c (revision 30006)
> 
> +++ i40e_fdir.c (working copy)
> 
> @@ -90,7 +90,8 @@
> 
>   I40E_PRTQF_FLX_PIT_SOURCE_OFF_MASK) | \
> 
> (((fsize) << I40E_PRTQF_FLX_PIT_FSIZE_SHIFT) & \
> 
>I40E_PRTQF_FLX_PIT_FSIZE_MASK) | \
> 
> -dst_offset) + I40E_FLX_OFFSET_IN_FIELD_VECTOR) << \
> 
> +dst_offset) + ((dst_offset < NONUSE_FLX_PIT_DEST_OFF) ? \
> 
> +   I40E_FLX_OFFSET_IN_FIELD_VECTOR : 0)) << \
> 
>I40E_PRTQF_FLX_PIT_DEST_OFF_SHIFT) & \
> 
>I40E_PRTQF_FLX_PIT_DEST_OFF_MASK))


[dpdk-dev] [PATCH v1] drivers/net/i40e: fix incorrect register dump offset

2016-04-14 Thread Wu, Jingjing


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Remy Horton
> Sent: Wednesday, April 13, 2016 5:45 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1] drivers/net/i40e: fix incorrect register dump
> offset
> 
> The position of register values within i40e register dumps is supposed to
> reflect the register addresses. These were not being correctly calculated.
> 
> Fixes: d9efd0136ac1 ("i40e: add EEPROM and registers dumping")
> 
> Signed-off-by: Remy Horton 
Acked-by: Jingjing Wu 

Thanks
Jingjing