[dpdk-dev] [PATCH] examples: add a new example for link reset

2016-06-08 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> Sent: Monday, June 06, 2016 6:48 AM
> To: dev at dpdk.org
> Cc: Lu, Wenzhuo
> Subject: [dpdk-dev] [PATCH] examples: add a new example for link reset
> 
> Add a new example to show when the PF is down and up,
> VF port can be reset and recover.

Do we really need a totally new example for it?
Can't we put it in one of already existing ones?
Let say we have l3fwd-vf... wouldn't that suit your needs?
Konstantin

> 
> Signed-off-by: Wenzhuo Lu 
> ---
>  MAINTAINERS |   4 +
>  doc/guides/sample_app_ug/link_reset.rst | 177 
>  examples/link_reset/Makefile|  50 +++
>  examples/link_reset/main.c  | 769 
>  4 files changed, 1000 insertions(+)
>  create mode 100644 doc/guides/sample_app_ug/link_reset.rst
>  create mode 100644 examples/link_reset/Makefile
>  create mode 100644 examples/link_reset/main.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3e8558f..76879c3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -650,3 +650,7 @@ F: examples/tep_termination/
>  F: examples/vmdq/
>  F: examples/vmdq_dcb/
>  F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
> +
> +M: Wenzhuo Lu 
> +F: examples/link_reset/
> +F: doc/guides/sample_app_ug/link_reset.rst
> diff --git a/doc/guides/sample_app_ug/link_reset.rst b/doc/guides/sample_app_ug/link_reset.rst
> new file mode 100644
> index 000..fecae6d
> --- /dev/null
> +++ b/doc/guides/sample_app_ug/link_reset.rst
> @@ -0,0 +1,177 @@
> +..  BSD LICENSE
> +Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
> +All rights reserved.
> +
> +Redistribution and use in source and binary forms, with or without
> +modification, are permitted provided that the following conditions
> +are met:
> +
> +* Redistributions of source code must retain the above copyright
> +notice, this list of conditions and the following disclaimer.
> +* Redistributions in binary form must reproduce the above copyright
> +notice, this list of conditions and the following disclaimer in
> +the documentation and/or other materials provided with the
> +distribution.
> +* Neither the name of Intel Corporation nor the names of its
> +contributors may be used to endorse or promote products derived
> +from this software without specific prior written permission.
> +
> +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +Link Reset Sample Application (in Virtualized Environments)
> +===========================================================
> +
> +The Link Reset sample application is a simple example of VF traffic recovery
> +using the Data Plane Development Kit (DPDK) which also takes advantage of Single
> +Root I/O Virtualization (SR-IOV) features in a virtualized environment.
> +
> +Overview
> +--------
> +
> +The Link Reset sample application, which should operate in virtualized
> +environments, performs L2 forwarding for each packet that is received on an
> +RX_PORT.
> +This example is extended from the L2 forwarding example. Please reference the
> +example of L2 forwarding in virtualized environments for more details and
> +explanation about the behavior of forwarding and how to setup the test.
> +The purpose of this example is to show that when the PF port goes down and up,
> +the VF port can recover and the traffic can recover too.
> +
> +Virtual Function Setup Instructions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +This application can use the virtual function available in the system and
> +therefore can be used in a virtual machine without passing through
> +the whole Network Device into a guest machine in a virtualized scenario.
> +The virtual functions can be enabled in the host machine or the hypervisor
> +with the respective physical function driver.
> +
> +For example, in a Linux* host machine, it is possible to enable a virtual
> +function using the following command:
> +
> +.. code-block:: console
> +
> +modprobe ixgbe max_vfs=2,2
> +
> +This command enables two Virtual Functions on each Physical Function of the
> +NIC, with two physical ports in 

[dpdk-dev] [PATCH v2 4/5] testpmd: handle all rxqs in rss setup

2016-06-08 Thread Wang, Zhihong


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Tuesday, June 7, 2016 6:30 PM
> To: Wang, Zhihong ; dev at dpdk.org
> Cc: Ananyev, Konstantin ; Richardson, Bruce
> ; thomas.monjalon at 6wind.com
> Subject: RE: [PATCH v2 4/5] testpmd: handle all rxqs in rss setup
> 
> 
> 
> > -Original Message-
> > From: Wang, Zhihong
> > Sent: Wednesday, June 01, 2016 4:28 AM
> > To: dev at dpdk.org
> > Cc: Ananyev, Konstantin; Richardson, Bruce; De Lara Guarch, Pablo;
> > thomas.monjalon at 6wind.com; Wang, Zhihong
> > Subject: [PATCH v2 4/5] testpmd: handle all rxqs in rss setup
> >
> > This patch removes constraints in rxq handling when multiqueue is enabled
> > to handle all the rxqs.
> >
> > Current testpmd forces a dedicated core for each rxq, some rxqs may be
> > ignored when core number is less than rxq number, and that causes
> > confusion
> > and inconvenience.
> >
> >
> > Signed-off-by: Zhihong Wang 
> 
> Patch looks good, but you said that you were going to add a more detailed
> description in the commit message.

I added them in the cover letter.
Will add them here too.
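For illustration only, the round-robin assignment the patch moves toward can be sketched as below. This is a hypothetical model, not the actual testpmd code; the function names are invented:

```c
/* Hypothetical sketch (not the actual testpmd code): with the constraint
 * removed, rxqs are distributed across the available forwarding cores
 * round-robin, so no rxq is silently ignored when cores < rxqs. */
static unsigned int core_for_rxq(unsigned int rxq, unsigned int nb_cores)
{
	return rxq % nb_cores;
}

/* How many rxqs a given core ends up polling under that assignment. */
static unsigned int rxqs_on_core(unsigned int core, unsigned int nb_rxq,
				 unsigned int nb_cores)
{
	return nb_rxq / nb_cores + (core < nb_rxq % nb_cores ? 1u : 0u);
}
```

With 7 rxqs on 2 cores, every queue gets polled: one core takes 4 queues and the other takes 3, instead of 5 queues being ignored.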

> 
> Thanks,
> Pablo


[dpdk-dev] [PATCH v2 1/5] testpmd: add retry option

2016-06-08 Thread Wang, Zhihong


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Tuesday, June 7, 2016 5:28 PM
> To: Wang, Zhihong ; dev at dpdk.org
> Cc: Ananyev, Konstantin ; Richardson, Bruce
> ; thomas.monjalon at 6wind.com
> Subject: RE: [PATCH v2 1/5] testpmd: add retry option
> 
> 
> 
> > -Original Message-
> > From: Wang, Zhihong
> > Sent: Wednesday, June 01, 2016 4:28 AM
> > To: dev at dpdk.org
> > Cc: Ananyev, Konstantin; Richardson, Bruce; De Lara Guarch, Pablo;
> > thomas.monjalon at 6wind.com; Wang, Zhihong
> > Subject: [PATCH v2 1/5] testpmd: add retry option
> >
> > This patch adds retry option in testpmd to prevent most packet losses.
> > It can be enabled by "set fwd <mode> retry". All modes except rxonly
> > support this option.
> >
> > Adding retry mechanism expands test case coverage to support scenarios
> > where packet loss affects test results.
> >
> >
> > Signed-off-by: Zhihong Wang 
> 
> ...
> 
> > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > @@ -249,8 +249,10 @@ set fwd
> >
> >  Set the packet forwarding mode::
> >
> > -   testpmd> set fwd (io|mac|mac_retry|macswap|flowgen| \
> > - rxonly|txonly|csum|icmpecho)
> > +   testpmd> set fwd (io|mac|macswap|flowgen| \
> > + rxonly|txonly|csum|icmpecho) (""|retry)
> > +
> > +``retry`` can be specified for forwarding engines except ``rx_only``.
> >
> >  The available information categories are:
> >
> > @@ -260,8 +262,6 @@ The available information categories are:
> >
> >  * ``mac``: Changes the source and the destination Ethernet addresses of
> > packets before forwarding them.
> >
> > -* ``mac_retry``: Same as "mac" forwarding mode, but includes retries if the
> > destination queue is full.
> > -
> >  * ``macswap``: MAC swap forwarding mode.
> >Swaps the source and the destination Ethernet addresses of packets
> before
> > forwarding them.
> >
> > @@ -392,7 +392,7 @@ Set number of packets per burst::
> >
> >  This is equivalent to the ``--burst command-line`` option.
> >
> > -In ``mac_retry`` forwarding mode, the transmit delay time and number of
> > retries can also be set::
> > +When retry is enabled, the transmit delay time and number of retries can
> > also be set::
> >
> > testpmd> set burst tx delay (micrseconds) retry (num)
> 
> Could you fix the typo "micrseconds" in this patch?

Sure ;)
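The retry behaviour being added can be sketched as below. This is an illustrative model under stated assumptions, not the testpmd implementation: stub_tx_burst() stands in for rte_eth_tx_burst() against a nearly-full ring, and the inter-attempt delay is elided.

```c
#include <stdint.h>

/* Stand-in for rte_eth_tx_burst(): accepts at most 2 packets per call,
 * mimicking a nearly-full TX ring. */
static uint16_t stub_tx_burst(uint16_t nb_pkts)
{
	return nb_pkts < 2 ? nb_pkts : 2;
}

/* Sketch of the retry logic: resend the unsent tail a bounded number of
 * times instead of dropping it immediately.  A real implementation would
 * wait the configured burst tx delay between attempts. */
static uint16_t tx_with_retry(uint16_t nb_pkts, int max_retries)
{
	uint16_t sent = stub_tx_burst(nb_pkts);
	int retry = 0;

	while (sent < nb_pkts && retry++ < max_retries)
		sent += stub_tx_burst(nb_pkts - sent);
	return sent;
}
```

With enough retries the whole burst goes out; with the retry budget exhausted, the remaining packets are dropped as before.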

> 
> >
> > --
> > 2.5.0
> 
> Apart from this,
> 
> Acked-by: Pablo de Lara 



[dpdk-dev] [PATCH v2 3/5] testpmd: show throughput in port stats

2016-06-08 Thread Wang, Zhihong


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Tuesday, June 7, 2016 6:03 PM
> To: Wang, Zhihong ; dev at dpdk.org
> Cc: Ananyev, Konstantin ; Richardson, Bruce
> ; thomas.monjalon at 6wind.com
> Subject: RE: [PATCH v2 3/5] testpmd: show throughput in port stats
> 
> 
> 
> > -Original Message-
> > From: Wang, Zhihong
> > Sent: Wednesday, June 01, 2016 4:28 AM
> > To: dev at dpdk.org
> > Cc: Ananyev, Konstantin; Richardson, Bruce; De Lara Guarch, Pablo;
> > thomas.monjalon at 6wind.com; Wang, Zhihong
> > Subject: [PATCH v2 3/5] testpmd: show throughput in port stats
> >
> > This patch adds throughput numbers (in the period since last use of this
> > command) in port statistics display for "show port stats (port_id|all)".
> >
> >
> > Signed-off-by: Zhihong Wang 
> > ---
> >  app/test-pmd/config.c | 20 
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index c611649..f487b87 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -92,6 +92,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "testpmd.h"
> >
> > @@ -150,6 +151,10 @@ print_ethaddr(const char *name, struct ether_addr *eth_addr)
> >  void
> >  nic_stats_display(portid_t port_id)
> >  {
> > +   static uint64_t sum_rx[RTE_MAX_ETHPORTS];
> > +   static uint64_t sum_tx[RTE_MAX_ETHPORTS];
> > +   static uint64_t cycles[RTE_MAX_ETHPORTS];
> > +   uint64_t pkt_rx, pkt_tx, cycle;
> 
> Could you rename some of these variables to something more specific?

Thanks for the suggestion! Will rename them.

> Like:
> pkt_rx -> diff_rx_pkts
> sum_rx -> prev_rx_pkts
> cycle -> diff_cycles
> cycles -> prev_cycles
> 
> 
> 
> > struct rte_eth_stats stats;
> > struct rte_port *port = &ports[port_id];
> > uint8_t i;
> > @@ -209,6 +214,21 @@ nic_stats_display(portid_t port_id)
> > }
> > }
> >
> > +   cycle = cycles[port_id];
> > +   cycles[port_id] = rte_rdtsc();
> > +   if (cycle > 0)
> > +   cycle = cycles[port_id] - cycle;
> > +
> > +   pkt_rx = stats.ipackets - sum_rx[port_id];
> > +   pkt_tx = stats.opackets - sum_tx[port_id];
> > +   sum_rx[port_id] = stats.ipackets;
> > +   sum_tx[port_id] = stats.opackets;
> > +   printf("\n  Throughput (since last show)\n");
> > +   printf("  RX-pps: %12"PRIu64"\n"
> > +   "  TX-pps: %12"PRIu64"\n",
> > +   cycle > 0 ? pkt_rx * rte_get_tsc_hz() / cycle : 0,
> > +   cycle > 0 ? pkt_tx * rte_get_tsc_hz() / cycle : 0);
> > +
> > printf("  %s%s\n",
> >nic_stats_border, nic_stats_border);
> >  }
> > --
> > 2.5.0



[dpdk-dev] [PATCH v4 2/8] lib/librte_ether: defind RX/TX lock mode

2016-06-08 Thread Lu, Wenzhuo
Hi Konstantin,


> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, June 7, 2016 5:59 PM
> To: Tao, Zhe; dev at dpdk.org
> Cc: Lu, Wenzhuo; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, 
> Jingjing;
> Zhang, Helin
> Subject: RE: [PATCH v4 2/8] lib/librte_ether: defind RX/TX lock mode
> 
> 
> Hi Zhe & Wenzhuo,
> 
> Please find my comments below.
> BTW, for clarification - is that patch for 16.11?
> I believe it's too late to introduce such significant change in 16.07.
> Thanks
> Konstantin
Thanks for the comments.
Honestly, our target is 16.07. Realizing the big impact, we use NEXT_ABI to 
guard our change. So, although we want to merge it in 16.07, this 
change will only become effective after we remove NEXT_ABI in 16.11.

> 
> > Define lock mode for RX/TX queue. Because when resetting the device we
> > want the resetting thread to get the lock of the RX/TX queue to make
> > sure the RX/TX is stopped.
> >
> > Using next ABI macro for this ABI change as it has too much impact. 7
> > APIs and 1 global variable are impacted.
> >
> > Signed-off-by: Wenzhuo Lu 
> > Signed-off-by: Zhe Tao 
> > ---
> >  lib/librte_ether/rte_ethdev.h | 62 +++
> >  1 file changed, 62 insertions(+)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 74e895f..4efb5e9 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -354,7 +354,12 @@ struct rte_eth_rxmode {
> > jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
> > hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
> > enable_scatter   : 1, /**< Enable scatter packets rx handler */
> > +#ifndef RTE_NEXT_ABI
> > enable_lro   : 1; /**< Enable LRO */
> > +#else
> > +   enable_lro   : 1, /**< Enable LRO */
> > +   lock_mode: 1; /**< Using lock path */
> > +#endif
> >  };
> >
> >  /**
> > @@ -634,11 +639,68 @@ struct rte_eth_txmode {
> > /**< If set, reject sending out tagged pkts */
> > hw_vlan_reject_untagged : 1,
> > /**< If set, reject sending out untagged pkts */
> > +#ifndef RTE_NEXT_ABI
> > hw_vlan_insert_pvid : 1;
> > /**< If set, enable port based VLAN insertion */
> > +#else
> > +   hw_vlan_insert_pvid : 1,
> > +   /**< If set, enable port based VLAN insertion */
> > +   lock_mode : 1;
> > +   /**< If set, using lock path */
> > +#endif
> >  };
> >
> >  /**
> > + * The macros for the RX/TX lock mode functions
> > + */
> > +#ifdef RTE_NEXT_ABI
> > +#define RX_LOCK_FUNCTION(dev, func) \
> > +   (dev->data->dev_conf.rxmode.lock_mode ? \
> > +   func ## _lock : func)
> > +
> > +#define TX_LOCK_FUNCTION(dev, func) \
> > +   (dev->data->dev_conf.txmode.lock_mode ? \
> > +   func ## _lock : func)
> > +#else
> > +#define RX_LOCK_FUNCTION(dev, func) func
> > +
> > +#define TX_LOCK_FUNCTION(dev, func) func
> > +#endif
> > +
> > +/* Add the lock RX/TX function for VF reset */
> > +#define GENERATE_RX_LOCK(func, nic) \
> > +uint16_t func ## _lock(void *rx_queue, \
> > + struct rte_mbuf **rx_pkts, \
> > + uint16_t nb_pkts) \
> > +{  \
> > +   struct nic ## _rx_queue *rxq = rx_queue; \
> > +   uint16_t nb_rx = 0; \
> > +   \
> > +   if (rte_spinlock_trylock(&rxq->rx_lock)) { \
> > +   nb_rx = func(rx_queue, rx_pkts, nb_pkts); \
> > +   rte_spinlock_unlock(&rxq->rx_lock); \
> > +   } \
> > +   \
> > +   return nb_rx; \
> > +}
> > +
> > +#define GENERATE_TX_LOCK(func, nic) \
> > +uint16_t func ## _lock(void *tx_queue, \
> > + struct rte_mbuf **tx_pkts, \
> > + uint16_t nb_pkts) \
> > +{  \
> > +   struct nic ## _tx_queue *txq = tx_queue; \
> > +   uint16_t nb_tx = 0; \
> > +   \
> > +   if (rte_spinlock_trylock(&txq->tx_lock)) { \
> > +   nb_tx = func(tx_queue, tx_pkts, nb_pkts); \
> > +   rte_spinlock_unlock(&txq->tx_lock); \
> > +   } \
> > +   \
> > +   return nb_tx; \
> > +}
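The wrapper the GENERATE_RX_LOCK macro expands to can be modeled in isolation as below. This is a minimal sketch, not the ixgbe code: a C11 atomic_flag stands in for rte_spinlock_t, and "pending" models packets waiting in the ring.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Minimal model of the generated _lock wrapper: try to take the per-queue
 * lock; if the reset thread already holds it, return 0 packets instead of
 * blocking the datapath. */
struct toy_queue {
	atomic_flag lock;
	uint16_t pending;
};

static uint16_t recv_burst(struct toy_queue *q, uint16_t nb_pkts)
{
	uint16_t n = q->pending < nb_pkts ? q->pending : nb_pkts;

	q->pending -= n;
	return n;
}

static uint16_t recv_burst_lock(struct toy_queue *q, uint16_t nb_pkts)
{
	uint16_t nb_rx = 0;

	if (!atomic_flag_test_and_set(&q->lock)) {	/* trylock succeeded */
		nb_rx = recv_burst(q, nb_pkts);
		atomic_flag_clear(&q->lock);		/* unlock */
	}
	return nb_rx;
}
```

The key property is that the datapath never blocks: while the reset thread holds the lock, the wrapper simply reports zero received packets.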
> 
> 1. As I said in off-line discussion, I think this locking could (and I think
> better be) implemented completely on the rte_ethdev layer.
> So the actual PMD code will be unaffected.
> Again, that avoids introducing a _lock version of every RX/TX function in
> each PMD.
One purpose of implementing the lock in the PMD layer is to avoid an ABI change. 
But we introduce the field lock_mode in struct rte_eth_rx/txmode, so it seems 
that's not a good reason now :)
The other purpose is that we want to add a lock for every queue. But in the rte 
layer the queue is a void *, so we added the lock in the NIC-specific queue 
structures. But as you mentioned below, we can add the lock, like 
dev->data->rx_queue_state, in the struct rte_eth_dev_data.
So, I prefer to add the lock in the rte layer now.

> 
> 2. 

[dpdk-dev] [PATCH] examples: add a new example for link reset

2016-06-08 Thread Lu, Wenzhuo
Hi Konstantin,

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Wednesday, June 8, 2016 8:25 AM
> To: Lu, Wenzhuo; dev at dpdk.org
> Cc: Lu, Wenzhuo
> Subject: RE: [dpdk-dev] [PATCH] examples: add a new example for link reset
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > Sent: Monday, June 06, 2016 6:48 AM
> > To: dev at dpdk.org
> > Cc: Lu, Wenzhuo
> > Subject: [dpdk-dev] [PATCH] examples: add a new example for link reset
> >
> > Add a new example to show when the PF is down and up, VF port can be
> > reset and recover.
> 
> Do we really need a totally new example for it?
> Can't we put it in one of already existing ones?
> Let say we have l3fwd-vf... wouldn't that suit your needs?
> Konstantin
I thought about just modifying an existing example, but I chose to add a new 
one in the end. The benefit of a totally new example is that we can make it 
simple enough and focus on the reset function.
So it's easier for the users to find what we want to show. And it's also easier 
for us, as we don't need to care about whether our modification will break some 
function of the original example :)


[dpdk-dev] [PATCH v4 4/8] ixgbe: implement device reset on VF

2016-06-08 Thread Lu, Wenzhuo
Hi Konstantin,

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, June 7, 2016 6:03 PM
> To: Tao, Zhe; dev at dpdk.org
> Cc: Lu, Wenzhuo; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, 
> Jingjing;
> Zhang, Helin
> Subject: RE: [PATCH v4 4/8] ixgbe: implement device reset on VF
> 
> 
> 
> > -Original Message-
> > From: Tao, Zhe
> > Sent: Tuesday, June 07, 2016 7:53 AM
> > To: dev at dpdk.org
> > Cc: Lu, Wenzhuo; Tao, Zhe; Ananyev, Konstantin; Richardson, Bruce;
> > Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
> > Subject: [PATCH v4 4/8] ixgbe: implement device reset on VF
> >
> > Implement the device reset function.
> > 1, Add the fake RX/TX functions.
> > 2, The reset function tries to stop RX/TX by replacing
> >the RX/TX functions with the fake ones and getting the
> >locks to make sure the regular RX/TX finished.
> > 3, After the RX/TX stopped, reset the VF port, and then
> >release the locks and restore the RX/TX functions.
> >
> > Signed-off-by: Wenzhuo Lu 
> >
> >  static int
> > +ixgbevf_dev_reset(struct rte_eth_dev *dev)
> > +{
> > +   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   struct ixgbe_adapter *adapter =
> > +   (struct ixgbe_adapter *)dev->data->dev_private;
> > +   int diag = 0;
> > +   uint32_t vteiam;
> > +   uint16_t i;
> > +   struct ixgbe_rx_queue *rxq;
> > +   struct ixgbe_tx_queue *txq;
> > +
> > +   /* Nothing needs to be done if the device is not started. */
> > +   if (!dev->data->dev_started)
> > +   return 0;
> > +
> > +   PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
> > +
> > +   /**
> > +* Stop RX/TX by fake functions and locks.
> > +* Fake functions are used to make RX/TX lock easier.
> > +*/
> > +   adapter->rx_backup = dev->rx_pkt_burst;
> > +   adapter->tx_backup = dev->tx_pkt_burst;
> > +   dev->rx_pkt_burst = ixgbevf_recv_pkts_fake;
> > +   dev->tx_pkt_burst = ixgbevf_xmit_pkts_fake;
> 
> If you have locking over each queue underneath, why do you still need fake
> functions?
The fake functions are used to help save the time spent waiting for the locks.
As you see, we want to lock every queue. If we don't use fake functions, we have 
to wait for every queue.
But if the real functions are replaced by the fake ones, ideally while we're 
waiting for the release of the first queue's lock,
the other queues will run into the fake functions. So we need not wait for them 
and can get the locks directly.
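The pointer-swap trick being discussed can be modeled as below. All names here are illustrative stand-ins, not the ixgbevf code: during reset, the real burst function is replaced with a no-op "fake" one, so polling cores stop contending for the queue locks almost immediately.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*burst_fn)(void *queue, uint16_t nb_pkts);

static uint16_t real_rx_burst(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	return nb_pkts;		/* pretend the ring always has packets */
}

static uint16_t fake_rx_burst(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	(void)nb_pkts;
	return 0;		/* no-op while the reset is in progress */
}

struct toy_dev {
	burst_fn rx_pkt_burst;
};

static void reset_begin(struct toy_dev *d, burst_fn *backup)
{
	*backup = d->rx_pkt_burst;	/* like adapter->rx_backup = ... */
	d->rx_pkt_burst = fake_rx_burst;
}

static void reset_end(struct toy_dev *d, burst_fn backup)
{
	d->rx_pkt_burst = backup;	/* restore the real function */
}
```

Once the swap is done, any core that enters the burst path hits the fake function and returns immediately without touching the queue.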

> 
> > +
> > +   if (dev->data->rx_queues)
> > +   for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > +   rxq = dev->data->rx_queues[i];
> > +   rte_spinlock_lock(&rxq->rx_lock);
> > +   }
> > +
> > +   if (dev->data->tx_queues)
> > +   for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > +   txq = dev->data->tx_queues[i];
> > +   rte_spinlock_lock(&txq->tx_lock);
> > +   }
> 
> > Probably worth creating a separate function for the lines above:
> > lock_all_queues(), unlock_all_queues().
> > But as I said in previous mail - I think that code better be in rte_ethdev.
We're discussing it in the previous thread :)

> >
> > @@ -5235,11 +5243,21 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
> > rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
> > } while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
> > if (!poll_ms)
> > +#ifndef RTE_NEXT_ABI
> > +	PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
> > +#else
> > +	{
> > 	PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
> > +	if (dev->data->dev_conf.rxmode.lock_mode)
> > +		return -1;
> > +	}
> > +#endif
> 
> 
> Why the code has to be different here?
As you see, this rxtx_start may have a chance to fail. I want to expose this 
failure, so the reset function can try again.

> Thanks
> Konstantin



[dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode

2016-06-08 Thread Lu, Wenzhuo
Hi Stephen,


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Wednesday, June 8, 2016 10:16 AM
> To: Lu, Wenzhuo
> Cc: dev at dpdk.org; Tao, Zhe
> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
> 
> On Mon,  6 Jun 2016 13:40:47 +0800
> Wenzhuo Lu  wrote:
> 
> > Define lock mode for RX/TX queue. Because when resetting the device we
> > want the resetting thread to get the lock of the RX/TX queue to make
> > sure the RX/TX is stopped.
> >
> > Using next ABI macro for this ABI change as it has too much impact. 7
> > APIs and 1 global variable are impacted.
> >
> > Signed-off-by: Wenzhuo Lu 
> > Signed-off-by: Zhe Tao 
> 
> Why does this patch set make a different assumption the rest of the DPDK?
> 
> The rest of the DPDK operates on the principle that the application is smart
> enough to stop the device before making changes. There is no equivalent to the
> Linux kernel RTNL mutex. The API assumes application threads are well behaved
> and will not try and sabotage each other.
> 
> If you restrict the reset operation to only being available when RX/TX is 
> stopped,
> then no lock is needed.
> 
> The fact that it requires lots more locking inside each device driver implies 
> to me
> this is not correct way to architect this.
It's a good question. This patch set doesn't follow the regular assumption of 
DPDK.
But it's a requirement we've got from some customers. The users want the driver 
to do as much as it can. Ideally, the link state change is transparent to the 
users.
The patch set tries to provide another choice if the users don't want to stop 
their rx/tx to handle the reset event.

And as discussed in the other thread, most probably we will move the lock from 
the PMD layer to the rte layer. It'll avoid the change in every device.


[dpdk-dev] [PATCH v1 1/1] examples/l2fwd-crypto: improve random key generator

2016-06-08 Thread Azarewicz, PiotrX T
> 2016-05-25 15:34, Piotr Azarewicz:
> > This patch improves the generate_random_key() function by replacing the
> > rand() function with reading from /dev/urandom.
> >
> > CID 120136 : Calling risky function (DC.WEAK_CRYPTO)
> > dont_call: rand should not be used for security related applications,
> > as linear congruential algorithms are too easy to break
> >
> > Coverity issue: 120136
> >
> > Signed-off-by: Piotr Azarewicz 
> > ---
> >  examples/l2fwd-crypto/main.c |   18 +-
> 
> Is it relevant for this example?

Maybe not. But it doesn't break anything, and in the end it makes the Coverity 
tool happy.

Declan, please share your opinion.
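The replacement being discussed can be sketched as below. The function name and signature here are illustrative only; the actual patch modifies generate_random_key() in examples/l2fwd-crypto/main.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: fill a key buffer from /dev/urandom instead of rand().
 * Returns 0 on success, -1 on failure. */
static int fill_key_from_urandom(uint8_t *key, size_t len)
{
	FILE *f = fopen("/dev/urandom", "rb");
	size_t n;

	if (f == NULL)
		return -1;
	n = fread(key, 1, len, f);
	fclose(f);
	return n == len ? 0 : -1;
}
```

Unlike rand(), which is a linear congruential generator and predictable from a few outputs, /dev/urandom draws from the kernel CSPRNG, which is what silences the Coverity DC.WEAK_CRYPTO finding.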


[dpdk-dev] about rx checksum flags

2016-06-08 Thread Chandran, Sugesh


Regards
_Sugesh


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Friday, June 3, 2016 1:43 PM
> To: Chandran, Sugesh ; Ananyev, Konstantin
> ; Stephen Hemminger
> 
> Cc: Yuanhan Liu ; dev at dpdk.org; Richardson,
> Bruce ; Adrien Mazarguil
> ; Tan, Jianfeng 
> Subject: Re: [dpdk-dev] about rx checksum flags
> 
> Hi,
> 
> On 06/02/2016 09:42 AM, Chandran, Sugesh wrote:
>  Do you also suggest to drop IP checksum flags?
> >>> > >
> >>> > > IP checksum offload is mostly useless. If application needs to
> >>> > > look at IP, it can do whole checksum in very few instructions,
> >>> > > the whole header is in the same cache line as src/dst so the HW
> >>> > > offload is really no
> >> > help.
> >>> > >
> > [Sugesh] The checksum offload can boost the tunneling performance in
> OVS.
> > I guess the IP checksum also important as L4. In some cases, UDP
> > checksum is zero and no need to validate it. But Ip checksum is
> > present on all the packets and that must be validated all  the time.
> > At higher packet rate, the ip checksum offload can offer slight performance
> improvement. What do you think??
> >
> 
> Agree, in some situations (and this is even more true with packet types /
> smartnics), the application could process without accessing the packet data if
> we keep the IP cksum flags.
[Sugesh] True. If that's the case, will you consider implementing the IP
checksum flags as well, along with L4?
As you said, this will be useful when we offload the packet lookup itself into
the NICs (maybe when using Flow Director)?
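For reference, the earlier point that a software IP checksum is cheap can be backed by a sketch of the standard ones'-complement sum over the IPv4 header (RFC 1071): a short loop over ten 16-bit words for a 20-byte header, all in one cache line.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement checksum over the IPv4 header words (RFC 1071). */
static uint16_t ipv4_cksum(const uint16_t *hdr, size_t nwords)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < nwords; i++)
		sum += hdr[i];
	sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries ... */
	sum = (sum & 0xffff) + (sum >> 16);	/* ... twice is enough  */
	return (uint16_t)~sum;
}
```

Verification is the same operation: summing a header whose checksum field is already correct yields 0.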



> 
> Regards,
> Olivier


[dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements

2016-06-08 Thread Adrien Mazarguil
An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
applications with a log level high enough to trigger it.

While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
expects it to be 0. Considering users are not expected to reset the
reference count to satisfy assert() and that raw functions are designed on
purpose without safety belts, remove these checks.

Signed-off-by: Adrien Mazarguil 
---
 lib/librte_mbuf/rte_mbuf.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 11fa06d..7070bb8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
if (rte_mempool_get(mp, &mb) < 0)
return NULL;
m = (struct rte_mbuf *)mb;
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
rte_mbuf_refcnt_set(m, 1);
__rte_mbuf_sanity_check(m, 0);

@@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
 static inline void __attribute__((always_inline))
 __rte_mbuf_raw_free(struct rte_mbuf *m)
 {
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
rte_mempool_put(m->pool, m);
 }

-- 
2.1.4
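The refcount contract the commit message describes can be modeled as below. This is a toy under stated assumptions, not the mbuf code: the "pool" is a single slot, and the rte_* APIs are stood in by local stubs.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: raw alloc hands out an mbuf with refcnt set to 1, and raw
 * free just returns it to the pool without asserting on the refcnt --
 * matching the inconsistency the patch removes. */
struct toy_mbuf {
	uint16_t refcnt;
};

static struct toy_mbuf pool_slot;
static int pool_used;

static struct toy_mbuf *toy_raw_alloc(void)
{
	if (pool_used)
		return NULL;		/* pool empty */
	pool_used = 1;
	pool_slot.refcnt = 1;		/* alloc leaves refcnt at 1 ... */
	return &pool_slot;
}

static void toy_raw_free(struct toy_mbuf *m)
{
	(void)m;			/* ... so free must not assert it is 0 */
	pool_used = 0;
}
```

A free that asserted refcnt == 0 would fire on every mbuf coming straight from raw_alloc, which is exactly the failure the patch describes.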



[dpdk-dev] [PATCH] examples: add a new example for link reset

2016-06-08 Thread Ananyev, Konstantin
Hi Wenzhuo,


> Hi Konstantin,
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Wednesday, June 8, 2016 8:25 AM
> > To: Lu, Wenzhuo; dev at dpdk.org
> > Cc: Lu, Wenzhuo
> > Subject: RE: [dpdk-dev] [PATCH] examples: add a new example for link reset
> >
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > > Sent: Monday, June 06, 2016 6:48 AM
> > > To: dev at dpdk.org
> > > Cc: Lu, Wenzhuo
> > > Subject: [dpdk-dev] [PATCH] examples: add a new example for link reset
> > >
> > > Add a new example to show when the PF is down and up, VF port can be
> > > reset and recover.
> >
> > Do we really need a totally new example for it?
> > Can't we put it in one of already existing ones?
> > Let say we have l3fwd-vf... wouldn't that suit your needs?
> > Konstantin
> I thought about just modifying an existing example, but I chose to add a new 
> one in the end. The benefit of a totally new example is that we
> can make it simple enough and focus on the reset function.
> So it's easier for the users to find what we want to show. And it's also 
> easier for us, as we don't need to care about whether our modification
> will break some function of the original example :)

I still think that adding a new example for each new feature/function in the 
rte_ethdev API is way too expensive.
If your change is not good enough and would break the original example, then you 
should probably re-work your feature patch
to make it stable enough.
After all, people will use it in their existing apps, not write new ones, 
right?
BTW, why not make it work with testpmd?
After all, it is a new PMD API, and that's what we have testpmd for?
Konstantin 




[dpdk-dev] [PATCH v4 4/8] ixgbe: implement device reset on VF

2016-06-08 Thread Ananyev, Konstantin


> -Original Message-
> From: Lu, Wenzhuo
> Sent: Wednesday, June 08, 2016 8:24 AM
> To: Ananyev, Konstantin; Tao, Zhe; dev at dpdk.org
> Cc: Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, 
> Helin
> Subject: RE: [PATCH v4 4/8] ixgbe: implement device reset on VF
> 
> Hi Konstantin,
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Tuesday, June 7, 2016 6:03 PM
> > To: Tao, Zhe; dev at dpdk.org
> > Cc: Lu, Wenzhuo; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, 
> > Jingjing;
> > Zhang, Helin
> > Subject: RE: [PATCH v4 4/8] ixgbe: implement device reset on VF
> >
> >
> >
> > > -Original Message-
> > > From: Tao, Zhe
> > > Sent: Tuesday, June 07, 2016 7:53 AM
> > > To: dev at dpdk.org
> > > Cc: Lu, Wenzhuo; Tao, Zhe; Ananyev, Konstantin; Richardson, Bruce;
> > > Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
> > > Subject: [PATCH v4 4/8] ixgbe: implement device reset on VF
> > >
> > > Implement the device reset function.
> > > 1, Add the fake RX/TX functions.
> > > 2, The reset function tries to stop RX/TX by replacing
> > >the RX/TX functions with the fake ones and getting the
> > >locks to make sure the regular RX/TX finished.
> > > 3, After the RX/TX stopped, reset the VF port, and then
> > >release the locks and restore the RX/TX functions.
> > >
> > > Signed-off-by: Wenzhuo Lu 
> > >
> > >  static int
> > > +ixgbevf_dev_reset(struct rte_eth_dev *dev)
> > > +{
> > > + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > + struct ixgbe_adapter *adapter =
> > > + (struct ixgbe_adapter *)dev->data->dev_private;
> > > + int diag = 0;
> > > + uint32_t vteiam;
> > > + uint16_t i;
> > > + struct ixgbe_rx_queue *rxq;
> > > + struct ixgbe_tx_queue *txq;
> > > +
> > > + /* Nothing needs to be done if the device is not started. */
> > > + if (!dev->data->dev_started)
> > > + return 0;
> > > +
> > > + PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
> > > +
> > > + /**
> > > +  * Stop RX/TX by fake functions and locks.
> > > +  * Fake functions are used to make RX/TX lock easier.
> > > +  */
> > > + adapter->rx_backup = dev->rx_pkt_burst;
> > > + adapter->tx_backup = dev->tx_pkt_burst;
> > > + dev->rx_pkt_burst = ixgbevf_recv_pkts_fake;
> > > + dev->tx_pkt_burst = ixgbevf_xmit_pkts_fake;
> >
> > If you have locking over each queue underneath, why do you still need fake
> > functions?
> The fake functions are used to help save the time spent waiting for the locks.
> As you see, we want to lock every queue. If we don't use fake functions, we 
> have to wait for every queue.
> But if the real functions are replaced by the fake ones, ideally while we're 
> waiting for the release of the first queue's lock,
> the other queues will run into the fake functions. So we need not wait for 
> them and can get the locks directly.

Well, the data path invokes only try_lock(), so it shouldn't be affected 
significantly, right?
The control path still has to spin on the lock and grab it before it can 
proceed; if it spins a bit longer,
I wouldn't see a big deal here.
What I am trying to say is: if we'll go that way - introduce a sync 
control/datapath API anyway -
we don't need any additional tricks here with rx/tx function replacement, 
correct?
So let's keep it clean and simple; after all, it is a control path and doesn't 
need to be lightning fast.
Konstantin

> 
> >
> > > +
> > > + if (dev->data->rx_queues)
> > > + for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > > + rxq = dev->data->rx_queues[i];
> > > + rte_spinlock_lock(&rxq->rx_lock);
> > > + }
> > > +
> > > + if (dev->data->tx_queues)
> > > + for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > > + txq = dev->data->tx_queues[i];
> > > + rte_spinlock_lock(&txq->tx_lock);
> > > + }
> >
> > Probably worth to create a separate function for the lines above:
> > lock_all_queues(), unlock_all_queues.
> > But as I said in previous mail - I think that code better be in rte_ethdev.
> We're discussing it in the previous thread :)
> 
> > >
> > > @@ -5235,11 +5243,21 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
> > >   rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
> > >   } while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
> > >   if (!poll_ms)
> > > +#ifndef RTE_NEXT_ABI
> > > + PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
> > > +#else
> > > + {
> > > + PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
> > > + if (dev->data->dev_conf.rxmode.lock_mode)
> > > + return -1;
> > > + }
> > > +#endif
> >
> >
> > Why the code has to be different here?
> As you see, this rxtx_start may fail. I want to expose this 
> failure, so the reset function can try again.
> 
> > Thanks
> > Konstantin
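The fake-function swap discussed above can be modelled outside DPDK with plain function pointers; the struct and function names below are stand-ins for the ixgbe code, not the actual driver:

```c
#include <stddef.h>
#include <stdint.h>

struct mbuf;                     /* opaque stand-in for rte_mbuf */
typedef uint16_t (*rx_burst_t)(void *rxq, struct mbuf **pkts, uint16_t n);

struct eth_dev {                 /* minimal stand-in for rte_eth_dev */
    rx_burst_t rx_pkt_burst;
};

static uint16_t real_rx_burst(void *rxq, struct mbuf **pkts, uint16_t n)
{
    (void)rxq; (void)pkts;
    return n;                    /* pretend the whole burst was received */
}

/* Fake burst installed during reset: polling lcores fall through
 * immediately instead of contending for the per-queue lock. */
static uint16_t fake_rx_burst(void *rxq, struct mbuf **pkts, uint16_t n)
{
    (void)rxq; (void)pkts; (void)n;
    return 0;
}

static rx_burst_t begin_reset(struct eth_dev *dev)
{
    rx_burst_t backup = dev->rx_pkt_burst;   /* like adapter->rx_backup */
    dev->rx_pkt_burst = fake_rx_burst;
    /* ...then lock all queues and reset the VF port... */
    return backup;
}

static void end_reset(struct eth_dev *dev, rx_burst_t backup)
{
    /* ...unlock all queues first... */
    dev->rx_pkt_burst = backup;              /* restore the real path */
}
```

The backup/restore pair mirrors the adapter->rx_backup assignment in the quoted patch: during reset every burst call lands in the fake function, so only lcores already inside the real burst need to be waited out via the queue locks.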



[dpdk-dev] [PATCH] doc: remove reference to MATCH

2016-06-08 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Intel stopped supporting MATCH, so remove the references to MATCH in the
document.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/nics/fm10k.rst |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index c4915d8..7fc4862 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -157,10 +157,9 @@ Switch manager
The Intel FM10000 family of NICs integrate a hardware switch and multiple host
interfaces. The FM10000 PMD driver only manages host interfaces. For the
switch component another switch driver has to be loaded prior to the
-FM10000 PMD driver.  The switch driver can be acquired for Intel support or
-from the `Match Interface `_ project.
+FM10000 PMD driver. The switch driver can be acquired from Intel support.
 Only Testpoint is validated with DPDK, the latest version that has been
-validated with DPDK2.2 is 4.1.6.
+validated with DPDK is 4.1.6.

 CRC striping
 
-- 
1.7.7.6



[dpdk-dev] [PATCH v3 1/2] ethdev: add callback to get register size in bytes

2016-06-08 Thread Thomas Monjalon
Hi Zyta,

2016-06-01 09:56, zr at semihalf.com:
> rte_eth_dev_get_reg_length and rte_eth_dev_get_reg callbacks
> do not provide register size to the app in any way. It is
> needed to allocate proper number of bytes before retrieving
> registers content with rte_eth_dev_get_reg.

Yes, register size is needed.
And I think it makes sense to register it in the struct rte_dev_reg_info.
We already have a length field, so we could just add a width field.

> @@ -1455,6 +1458,8 @@ struct eth_dev_ops {
>  
>   eth_get_reg_length_t get_reg_length;
>   /**< Get # of registers */
> + eth_get_reg_width_t get_reg_width;
> + /**< Get # of bytes in register */
>   eth_get_reg_t get_reg;
>   /**< Get registers */

I am not sure it is a good practice to add a new function for each
parameter of a request.
I would prefer having only one function rte_eth_dev_get_regs()
which returns length and width if data is NULL.
The first call is a parameter request before buffer allocation,
and the second call fills the buffer.

We can deprecate the old API and introduce this new one.

Opinions?
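Thomas's proposal, a single rte_eth_dev_get_regs() that reports sizes when the data pointer is NULL, could look roughly like the following; the structure and function here are hypothetical sketches, not the final API:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the proposed single entry point: with data == NULL it
 * reports length/width; with a buffer it fills in the registers.
 * Illustrative only - not the actual rte_dev_reg_info definition. */
struct dev_reg_info {
    void    *data;    /* NULL => parameter request */
    uint32_t length;  /* number of registers */
    uint32_t width;   /* size of one register in bytes */
};

static int dev_get_regs(struct dev_reg_info *info)
{
    const uint32_t nregs = 4;
    const uint32_t width = (uint32_t)sizeof(uint32_t);
    uint32_t i;

    if (info->data == NULL) {        /* first call: query sizes */
        info->length = nregs;
        info->width = width;
        return 0;
    }
    for (i = 0; i < nregs; i++)      /* second call: fill the buffer */
        ((uint32_t *)info->data)[i] = 0xcafe0000u | i;
    return 0;
}
```

The caller makes the first call with data NULL, allocates length * width bytes, then calls again with the buffer set, which is the two-call pattern described above.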


[dpdk-dev] [PATCH] examples: add a new example for link reset

2016-06-08 Thread Thomas Monjalon
2016-06-08 08:37, Ananyev, Konstantin:
> > From: Ananyev, Konstantin
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > > > Add a new example to show when the PF is down and up, VF port can be
> > > > reset and recover.
> > >
> > > Do we really need a totally new example for it?
> > > Can't we put it in one of already existing ones?
> > > Let say we have l3fwd-vf... wouldn't that suit your needs?
> > > Konstantin
> > I thought about just modifying an existing example. But I chose to add a 
> > new one in the end. The benefit of a totally new example is that we
> > can make it simple enough and focus on the reset function.
> > So it's easier for the users to find what we want to show. And it's also 
> > easier for us, as we don't need to care about whether our modification
> > will break some function of the original example :)
> 
> I still think that adding a new example for each new feature/function in 
> the rte_ethdev API is way too expensive.
> If your change is not good enough and will break the original example, then you 
> should probably re-work your feature patch
> to make it stable enough.
> After all, people will use it in their existing apps, not write new ones, 
> right?
> BTW, why not make it work with testpmd?
> After all, it is a new PMD API, and that's what we have testpmd for, right?

+1 for testpmd


[dpdk-dev] [PATCH v4 2/8] lib/librte_ether: define RX/TX lock mode

2016-06-08 Thread Ananyev, Konstantin


> 
> Hi Konstantin,
> 
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Tuesday, June 7, 2016 5:59 PM
> > To: Tao, Zhe; dev at dpdk.org
> > Cc: Lu, Wenzhuo; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, 
> > Jingjing;
> > Zhang, Helin
> > Subject: RE: [PATCH v4 2/8] lib/librte_ether: define RX/TX lock mode
> >
> >
> > Hi Zhe & Wenzhuo,
> >
> > Please find my comments below.
> > BTW, for clarification - is that patch for 16.11?
> > I believe it's too late to introduce such significant change in 16.07.
> > Thanks
> > Konstantin
> Thanks for the comments.
> Honestly, our target is 16.07. Realizing the big impact, we use NEXT_ABI to 
> guard our change. So, although we want to
> merge it in 16.07, this change will only become effective after we remove NEXT_ABI 
> in 16.11.

I don't think that is achievable.
First, I think your code is not in proper shape yet.
Second, as you said, it is a significant change and I would like to hear 
opinions from the rest of the community.

> 
> >
> > > Define lock mode for RX/TX queue. Because when resetting the device we
> > > want the resetting thread to get the lock of the RX/TX queue to make
> > > sure the RX/TX is stopped.
> > >
> > > Using next ABI macro for this ABI change as it has too much impact. 7
> > > APIs and 1 global variable are impacted.
> > >
> > > Signed-off-by: Wenzhuo Lu 
> > > Signed-off-by: Zhe Tao 
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 62
> > > +++
> > >  1 file changed, 62 insertions(+)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 74e895f..4efb5e9 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -354,7 +354,12 @@ struct rte_eth_rxmode {
> > >   jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
> > >   hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
> > >   enable_scatter   : 1, /**< Enable scatter packets rx handler */
> > > +#ifndef RTE_NEXT_ABI
> > >   enable_lro   : 1; /**< Enable LRO */
> > > +#else
> > > + enable_lro   : 1, /**< Enable LRO */
> > > + lock_mode: 1; /**< Using lock path */
> > > +#endif
> > >  };
> > >
> > >  /**
> > > @@ -634,11 +639,68 @@ struct rte_eth_txmode {
> > >   /**< If set, reject sending out tagged pkts */
> > >   hw_vlan_reject_untagged : 1,
> > >   /**< If set, reject sending out untagged pkts */
> > > +#ifndef RTE_NEXT_ABI
> > >   hw_vlan_insert_pvid : 1;
> > >   /**< If set, enable port based VLAN insertion */
> > > +#else
> > > + hw_vlan_insert_pvid : 1,
> > > + /**< If set, enable port based VLAN insertion */
> > > + lock_mode : 1;
> > > + /**< If set, using lock path */
> > > +#endif
> > >  };
> > >
> > >  /**
> > > + * The macros for the RX/TX lock mode functions
> > > + */
> > > +#ifdef RTE_NEXT_ABI
> > > +#define RX_LOCK_FUNCTION(dev, func) \
> > > + (dev->data->dev_conf.rxmode.lock_mode ? \
> > > + func ## _lock : func)
> > > +
> > > +#define TX_LOCK_FUNCTION(dev, func) \
> > > + (dev->data->dev_conf.txmode.lock_mode ? \
> > > + func ## _lock : func)
> > > +#else
> > > +#define RX_LOCK_FUNCTION(dev, func) func
> > > +
> > > +#define TX_LOCK_FUNCTION(dev, func) func
> > > +#endif
> > > +
> > > +/* Add the lock RX/TX function for VF reset */
> > > +#define GENERATE_RX_LOCK(func, nic) \
> > > +uint16_t func ## _lock(void *rx_queue, \
> > > +   struct rte_mbuf **rx_pkts, \
> > > +   uint16_t nb_pkts) \
> > > +{\
> > > + struct nic ## _rx_queue *rxq = rx_queue; \
> > > + uint16_t nb_rx = 0; \
> > > + \
> > > + if (rte_spinlock_trylock(&rxq->rx_lock)) { \
> > > + nb_rx = func(rx_queue, rx_pkts, nb_pkts); \
> > > + rte_spinlock_unlock(&rxq->rx_lock); \
> > > + } \
> > > + \
> > > + return nb_rx; \
> > > +}
> > > +
> > > +#define GENERATE_TX_LOCK(func, nic) \
> > > +uint16_t func ## _lock(void *tx_queue, \
> > > +   struct rte_mbuf **tx_pkts, \
> > > +   uint16_t nb_pkts) \
> > > +{\
> > > + struct nic ## _tx_queue *txq = tx_queue; \
> > > + uint16_t nb_tx = 0; \
> > > + \
> > > + if (rte_spinlock_trylock(&txq->tx_lock)) { \
> > > + nb_tx = func(tx_queue, tx_pkts, nb_pkts); \
> > > + rte_spinlock_unlock(&txq->tx_lock); \
> > > + } \
> > > + \
> > > + return nb_tx; \
> > > +}
> >
> > 1. As I said in the off-line discussion, I think this locking could (and I 
> > think better be)
> > implemented completely on the rte_ethdev layer.
> > So actual PMD code will be unaffected.
> > Again, that avoids us having to introduce a _lock version of every RX/TX function 
> > in each
> > PMD.
> One purpose of implementing the lock in PMD layer is to avoid ABI change. But 
> we introduce the field 

[dpdk-dev] [PATCH v4 4/8] ixgbe: implement device reset on VF

2016-06-08 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Wednesday, June 08, 2016 9:42 AM
> To: Lu, Wenzhuo; Tao, Zhe; dev at dpdk.org
> Cc: Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, 
> Helin
> Subject: Re: [dpdk-dev] [PATCH v4 4/8] ixgbe: implement device reset on VF
> 
> 
> 
> > -Original Message-
> > From: Lu, Wenzhuo
> > Sent: Wednesday, June 08, 2016 8:24 AM
> > To: Ananyev, Konstantin; Tao, Zhe; dev at dpdk.org
> > Cc: Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, 
> > Helin
> > Subject: RE: [PATCH v4 4/8] ixgbe: implement device reset on VF
> >
> > Hi Konstantin,
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, June 7, 2016 6:03 PM
> > > To: Tao, Zhe; dev at dpdk.org
> > > Cc: Lu, Wenzhuo; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, 
> > > Jingjing;
> > > Zhang, Helin
> > > Subject: RE: [PATCH v4 4/8] ixgbe: implement device reset on VF
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Tao, Zhe
> > > > Sent: Tuesday, June 07, 2016 7:53 AM
> > > > To: dev at dpdk.org
> > > > Cc: Lu, Wenzhuo; Tao, Zhe; Ananyev, Konstantin; Richardson, Bruce;
> > > > Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
> > > > Subject: [PATCH v4 4/8] ixgbe: implement device reset on VF
> > > >
> > > > Implement the device reset function.
> > > > 1, Add the fake RX/TX functions.
> > > > 2, The reset function tries to stop RX/TX by replacing
> > > >the RX/TX functions with the fake ones and getting the
> > > >locks to make sure the regular RX/TX finished.
> > > > 3, After the RX/TX stopped, reset the VF port, and then
> > > >release the locks and restore the RX/TX functions.
> > > >
> > > > Signed-off-by: Wenzhuo Lu 
> > > >
> > > >  static int
> > > > +ixgbevf_dev_reset(struct rte_eth_dev *dev)
> > > > +{
> > > > +   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > > +   struct ixgbe_adapter *adapter =
> > > > +   (struct ixgbe_adapter *)dev->data->dev_private;
> > > > +   int diag = 0;
> > > > +   uint32_t vteiam;
> > > > +   uint16_t i;
> > > > +   struct ixgbe_rx_queue *rxq;
> > > > +   struct ixgbe_tx_queue *txq;
> > > > +
> > > > +   /* Nothing needs to be done if the device is not started. */
> > > > +   if (!dev->data->dev_started)
> > > > +   return 0;
> > > > +
> > > > +   PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
> > > > +
> > > > +   /**
> > > > +* Stop RX/TX by fake functions and locks.
> > > > +* Fake functions are used to make RX/TX lock easier.
> > > > +*/
> > > > +   adapter->rx_backup = dev->rx_pkt_burst;
> > > > +   adapter->tx_backup = dev->tx_pkt_burst;
> > > > +   dev->rx_pkt_burst = ixgbevf_recv_pkts_fake;
> > > > +   dev->tx_pkt_burst = ixgbevf_xmit_pkts_fake;
> > >
> > > If you have locking over each queue underneath, why do you still need fake
> > > functions?
> > The fake functions are used to help save the time spent waiting for the 
> > locks.
> > As you see, we want to lock every queue. If we don't use fake functions, we 
> > have to wait for every queue.
> > But if the real functions are replaced by the fake ones, then ideally, 
> > while we're waiting for the release of the first queue's lock,
> > the other queues will run into the fake functions. So we need not wait for 
> > them and can take their locks directly.
> 
> Well, the data path invokes only try_lock(), so it shouldn't be affected 
> significantly, right?
> The control path still has to spin on the lock and grab it before it can proceed; 
> if it spins a bit longer,
> I wouldn't see a big deal here.
> What I am trying to say: if we'll go that way, introduce a sync 
> control/datapath API anyway;
> we don't need any additional tricks here with rx/tx function replacement, 
> correct?
> So let's keep it clean and simple; after all it is a control path and does 
> not need to be lightning fast.
> Konstantin
> 
> >
> > >
> > > > +
> > > > +   if (dev->data->rx_queues)
> > > > +   for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > > > +   rxq = dev->data->rx_queues[i];
> > > > +   rte_spinlock_lock(&rxq->rx_lock);
> > > > +   }
> > > > +
> > > > +   if (dev->data->tx_queues)
> > > > +   for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > > > +   txq = dev->data->tx_queues[i];
> > > > +   rte_spinlock_lock(&txq->tx_lock);
> > > > +   }
> > >
> > > Probably worth to create a separate function for the lines above:
> > > lock_all_queues(), unlock_all_queues.
> > > But as I said in previous mail - I think that code better be in 
> > > rte_ethdev.
> > We're discussing it in the previous thread :)
> >
> > > >
> > > > @@ -5235,11 +5243,21 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
> > > >   

[dpdk-dev] [PATCH v3 01/10] rte: change xstats to use integer ids

2016-06-08 Thread Thomas Monjalon
2016-05-30 11:48, Remy Horton:
>  struct rte_eth_xstats {
> + /* FIXME: Remove name[] once remaining drivers converted */
>   char name[RTE_ETH_XSTATS_NAME_SIZE];

What is the plan? This field must be deprecated with an attribute.
We cannot have 2 different APIs depending of the driver.
What are the remaining drivers to convert?

> + uint64_t id;
>   uint64_t value;
>  };
>  
> +/**
> + * A name-key lookup element for extended statistics.
> + *
> + * This structure is used to map between names and ID numbers
> + * for extended ethernet statistics.
> + */
> +struct rte_eth_xstats_name {
> + char name[RTE_ETH_XSTATS_NAME_SIZE];
> + uint64_t id;
> +};

This structure and the other one (rte_eth_xstats) are badly named.
There is only one stat in each. So they should not have the plural form.
rte_eth_xstat and rte_eth_xstat_name would be better.

[...]
>  /**
> + * Retrieve names of extended statistics of an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param ptr_names
> + *  Block of memory to insert names into. Must be at least limit in size.

"xstat_names" would be a better name than "ptr_names".
We don't use "ptr" in variable names because it doesn't really convey any
semantic information.

> + * @param limit
> + *  Capacity of ptr_strings (number of names).

We are more used to "size" than "limit".

> + * @return
> + *  If successful, number of statistics; negative on error.
> + */
> +int rte_eth_xstats_names(uint8_t port_id, struct rte_eth_xstats_name 
> *ptr_names,

Why not rte_eth_xstats_get_names?

> + unsigned limit);

A (double) indent tab is missing.

> +
> +/**
> + * Retrieve number of extended statistics of an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @return
> + *  If successful, number of statistics; negative on error.
> + */
> +int rte_eth_xstats_count(uint8_t port_id);

This function is useless because we can get the count with
rte_eth_xstats_get(p, NULL, 0).
By the way it would be more consistent to have the same behaviour
in rte_eth_xstats_names().
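The NULL-buffer convention Thomas refers to, where rte_eth_xstats_get(p, NULL, 0) returns the required count, follows a common C API pattern. A minimal sketch with made-up names and stats:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for struct rte_eth_xstat. */
struct eth_xstat {
    char name[64];
    uint64_t id;
    uint64_t value;
};

/* Returns the number of available stats. When stats is NULL or n is
 * too small, nothing is written: the caller learns the required size. */
static int xstats_get(struct eth_xstat *stats, unsigned int n)
{
    static const char *known[] = { "rx_good_packets", "tx_good_packets" };
    const unsigned int count = 2;
    unsigned int i;

    if (stats == NULL || n < count)
        return (int)count;           /* size query / buffer too small */
    for (i = 0; i < count; i++) {
        strncpy(stats[i].name, known[i], sizeof(stats[i].name) - 1);
        stats[i].name[sizeof(stats[i].name) - 1] = '\0';
        stats[i].id = i;
        stats[i].value = 0;
    }
    return (int)count;
}
```

The same shape would give rte_eth_xstats_names() consistent behaviour, as the review suggests: one entry point for both the size query and the data retrieval.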


[dpdk-dev] [PATCH 0/7] Miscellaneous fixes for mlx4 and mlx5

2016-06-08 Thread Nelio Laranjeiro
Various minor fixes for mlx4 (ConnectX-3) and mlx5 (ConnectX-4).

Adrien Mazarguil (4):
  mlx: ensure MTU update is effective
  mlx: retrieve mbuf size through proper API
  mlx5: fix RX VLAN stripping capability check
  mlx5: cosmetic changes (coding style)

Nelio Laranjeiro (3):
  mlx: remove unused memory region property
  mlx5: enhance SR-IOV detection
  mlx5: update documentation part related to features and limitations

 doc/guides/nics/mlx5.rst   |  2 +-
 drivers/net/mlx4/mlx4.c| 43 --
 drivers/net/mlx5/mlx5.c| 14 +++--
 drivers/net/mlx5/mlx5.h|  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c | 47 --
 drivers/net/mlx5/mlx5_rxq.c| 23 +++--
 drivers/net/mlx5/mlx5_rxtx.c   |  7 +++
 drivers/net/mlx5/mlx5_rxtx.h   | 16 ++
 drivers/net/mlx5/mlx5_txq.c|  2 +-
 drivers/net/mlx5/mlx5_vlan.c   |  2 +-
 10 files changed, 98 insertions(+), 61 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH 1/7] mlx: remove unused memory region property

2016-06-08 Thread Nelio Laranjeiro
Memory regions are always local with raw Ethernet queues, drop the remote
property as it adds extra processing on the hardware side.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c  | 4 ++--
 drivers/net/mlx5/mlx5_rxtx.c | 2 +-
 drivers/net/mlx5/mlx5_txq.c  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 9ed1491..661c49f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -998,7 +998,7 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
}
mr_linear =
ibv_reg_mr(txq->priv->pd, elts_linear, sizeof(*elts_linear),
-  (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
+  IBV_ACCESS_LOCAL_WRITE);
if (mr_linear == NULL) {
ERROR("%p: unable to configure MR, ibv_reg_mr() failed",
  (void *)txq);
@@ -1310,7 +1310,7 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
return ibv_reg_mr(pd,
  (void *)start,
  end - start,
- IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
+ IBV_ACCESS_LOCAL_WRITE);
 }

 /**
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 29bfcec..7f02641 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -252,7 +252,7 @@ mlx5_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
return ibv_reg_mr(pd,
  (void *)start,
  end - start,
- IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
+ IBV_ACCESS_LOCAL_WRITE);
 }

 /**
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 31ce53a..e20df21 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -95,7 +95,7 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
}
mr_linear =
ibv_reg_mr(txq->priv->pd, elts_linear, sizeof(*elts_linear),
-  (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
+  IBV_ACCESS_LOCAL_WRITE);
if (mr_linear == NULL) {
ERROR("%p: unable to configure MR, ibv_reg_mr() failed",
  (void *)txq);
-- 
2.1.4



[dpdk-dev] [PATCH 2/7] mlx: ensure MTU update is effective

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

There is no guarantee that the new MTU is effective after writing its value
to sysfs. Retrieve it to be sure.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c| 10 +-
 drivers/net/mlx5/mlx5_ethdev.c | 10 +-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 661c49f..6174e4b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -659,7 +659,15 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
 static int
 priv_set_mtu(struct priv *priv, uint16_t mtu)
 {
-   return priv_set_sysfs_ulong(priv, "mtu", mtu);
+   uint16_t new_mtu;
+
+   if (priv_set_sysfs_ulong(priv, "mtu", mtu) ||
+   priv_get_mtu(priv, &new_mtu))
+   return -1;
+   if (new_mtu == mtu)
+   return 0;
+   errno = EINVAL;
+   return -1;
 }

 /**
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 36b369e..25926cb 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -398,7 +398,15 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
 static int
 priv_set_mtu(struct priv *priv, uint16_t mtu)
 {
-   return priv_set_sysfs_ulong(priv, "mtu", mtu);
+   uint16_t new_mtu;
+
+   if (priv_set_sysfs_ulong(priv, "mtu", mtu) ||
+   priv_get_mtu(priv, &new_mtu))
+   return -1;
+   if (new_mtu == mtu)
+   return 0;
+   errno = EINVAL;
+   return -1;
 }

 /**
-- 
2.1.4
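The set-then-verify idea in this patch generalizes to any sysfs attribute: write the value, read it back, and succeed only when the readback matches, because the kernel may silently clamp or refuse the request. A sketch using an ordinary file in place of the sysfs entry (the helper names are invented, not the mlx4/mlx5 functions):

```c
#include <stdio.h>

static int write_ulong(const char *path, unsigned long v)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    fprintf(f, "%lu\n", v);
    return fclose(f) ? -1 : 0;
}

static int read_ulong(const char *path, unsigned long *v)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%lu", v) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Mirror of the patch's priv_set_mtu() flow: write, read back, and
 * only report success when the effective value equals the request. */
static int set_mtu_verified(const char *path, unsigned long mtu)
{
    unsigned long new_mtu;

    if (write_ulong(path, mtu) || read_ulong(path, &new_mtu))
        return -1;
    return (new_mtu == mtu) ? 0 : -1;  /* kernel may have refused it */
}
```

With a plain file the readback always matches; against a real sysfs `mtu` entry the mismatch branch is what catches a driver that rejected the value without an error.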



[dpdk-dev] [PATCH 3/7] mlx: retrieve mbuf size through proper API

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

No need to allocate a mbuf for that.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c| 29 -
 drivers/net/mlx5/mlx5_ethdev.c |  5 -
 drivers/net/mlx5/mlx5_rxq.c| 20 ++--
 drivers/net/mlx5/mlx5_rxtx.c   |  1 -
 drivers/net/mlx5/mlx5_rxtx.h   |  1 -
 5 files changed, 22 insertions(+), 34 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 6174e4b..82b1c63 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -197,7 +197,6 @@ struct rxq {
unsigned int sp:1; /* Use scattered RX elements. */
unsigned int csum:1; /* Enable checksum offloading. */
unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
-   uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
struct ibv_exp_res_domain *rd; /* Resource Domain. */
@@ -3160,7 +3159,6 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
rep->ol_flags = -1;
 #endif
assert(rep->buf_len == seg->buf_len);
-   assert(rep->buf_len == rxq->mb_len);
/* Reconfigure sge to use rep instead of seg. */
assert(sge->lkey == rxq->mr->lkey);
sge->addr = ((uintptr_t)rep->buf_addr + seg_headroom);
@@ -3581,6 +3579,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
unsigned int i, k;
struct ibv_exp_qp_attr mod;
struct ibv_recv_wr *bad_wr;
+   unsigned int mb_len;
int err;
int parent = (rxq == &priv->rxq_parent);

@@ -3589,6 +3588,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
  (void *)dev, (void *)rxq);
return EINVAL;
}
+   mb_len = rte_pktmbuf_data_room_size(rxq->mp);
DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
/* Number of descriptors and mbufs currently allocated. */
desc_n = (tmpl.elts_n * (tmpl.sp ? MLX4_PMD_SGE_WR_N : 1));
@@ -3603,9 +3603,10 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
rxq->csum_l2tun = tmpl.csum_l2tun;
}
/* Enable scattered packets support for this queue if necessary. */
+   assert(mb_len >= RTE_PKTMBUF_HEADROOM);
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
-(tmpl.mb_len - RTE_PKTMBUF_HEADROOM))) {
+(mb_len - RTE_PKTMBUF_HEADROOM))) {
tmpl.sp = 1;
desc_n /= MLX4_PMD_SGE_WR_N;
} else
@@ -3796,7 +3797,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
} attr;
enum ibv_exp_query_intf_status status;
struct ibv_recv_wr *bad_wr;
-   struct rte_mbuf *buf;
+   unsigned int mb_len;
int ret = 0;
int parent = (rxq == &priv->rxq_parent);

@@ -3812,31 +3813,22 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
desc = 1;
goto skip_mr;
}
+   mb_len = rte_pktmbuf_data_room_size(mp);
if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
ERROR("%p: invalid number of RX descriptors (must be a"
  " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
-   /* Get mbuf length. */
-   buf = rte_pktmbuf_alloc(mp);
-   if (buf == NULL) {
-   ERROR("%p: unable to allocate mbuf", (void *)dev);
-   return ENOMEM;
-   }
-   tmpl.mb_len = buf->buf_len;
-   assert((rte_pktmbuf_headroom(buf) +
-   rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
-   assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
-   rte_pktmbuf_free(buf);
/* Toggle RX checksum offload if hardware supports it. */
if (priv->hw_csum)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
if (priv->hw_csum_l2tun)
tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
+   assert(mb_len >= RTE_PKTMBUF_HEADROOM);
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
-(tmpl.mb_len - RTE_PKTMBUF_HEADROOM))) {
+(mb_len - RTE_PKTMBUF_HEADROOM))) {
tmpl.sp = 1;
desc /= MLX4_PMD_SGE_WR_N;
}
@@ -4873,6 +4865,7 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
/* Reconfigure each RX queue. */
for (i = 0; (i != priv->rxqs_n); ++i) {
struct rxq *rxq = (*priv->rxqs)[i];
+   unsigned int mb_len;
unsigned int max_frame_len;
   

[dpdk-dev] [PATCH 4/7] mlx5: fix RX VLAN stripping capability check

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

A hardware capability check is missing before enabling RX VLAN stripping
during queue setup.

Also, while dev_conf.rxmode.hw_vlan_strip is currently a single bit that
can be stored in priv->hw_vlan_strip directly, it should be interpreted as
a boolean value for safety.

Fixes: f3db9489188a ("mlx5: support Rx VLAN stripping")

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxq.c  | 3 ++-
 drivers/net/mlx5/mlx5_vlan.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 469ba98..0bcf55b 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1222,7 +1222,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
DEBUG("priv->device_attr.max_sge is %d",
  priv->device_attr.max_sge);
/* Configure VLAN stripping. */
-   tmpl.vlan_strip = dev->data->dev_conf.rxmode.hw_vlan_strip;
+   tmpl.vlan_strip = (priv->hw_vlan_strip &&
+  !!dev->data->dev_conf.rxmode.hw_vlan_strip);
attr.wq = (struct ibv_exp_wq_init_attr){
.wq_context = NULL, /* Could be useful in the future. */
.wq_type = IBV_EXP_WQT_RQ,
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index ea7af1e..ff40538 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -218,7 +218,7 @@ mlx5_vlan_offload_set(struct rte_eth_dev *dev, int mask)
unsigned int i;

if (mask & ETH_VLAN_STRIP_MASK) {
-   int hw_vlan_strip = dev->data->dev_conf.rxmode.hw_vlan_strip;
+   int hw_vlan_strip = !!dev->data->dev_conf.rxmode.hw_vlan_strip;

if (!priv->hw_vlan_strip) {
ERROR("VLAN stripping is not supported");
-- 
2.1.4
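The fix above boils down to gating the application's request on the reported hardware capability, with `!!` normalising a possibly multi-bit request field to 0/1 before it is stored in a one-bit field. A tiny sketch (names invented):

```c
#include <stdbool.h>

/* Stand-in for the capability bits kept in struct priv. */
struct caps {
    unsigned int hw_vlan_strip:1;  /* hardware supports VLAN stripping */
};

/* A queue enables stripping only when the hardware can do it AND the
 * application asked for it; !! collapses any nonzero request to 1. */
static bool vlan_strip_enabled(const struct caps *priv,
                               unsigned int requested)
{
    return priv->hw_vlan_strip && !!requested;
}
```

Without the capability check, a configuration requesting stripping on hardware that lacks it would silently set the queue flag, which is the bug the patch closes.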



[dpdk-dev] [PATCH 5/7] mlx5: cosmetic changes (coding style)

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

Add consistency to mlx5_rxtx.h.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxtx.h | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index dd3003c..47f6299 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -231,7 +231,8 @@ struct hash_rxq {
struct ibv_qp *qp; /* Hash RX QP. */
enum hash_rxq_type type; /* Hash RX queue type. */
/* MAC flow steering rules, one per VLAN ID. */
-   struct ibv_exp_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
+   struct ibv_exp_flow *mac_flow
+   [MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
struct ibv_exp_flow *special_flow
[MLX5_MAX_SPECIAL_FLOWS][MLX5_MAX_VLAN_IDS];
 };
@@ -322,21 +323,17 @@ int rxq_setup(struct rte_eth_dev *, struct rxq *, 
uint16_t, unsigned int,
 int mlx5_rx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
const struct rte_eth_rxconf *, struct rte_mempool *);
 void mlx5_rx_queue_release(void *);
-uint16_t mlx5_rx_burst_secondary_setup(void *dpdk_rxq, struct rte_mbuf **pkts,
- uint16_t pkts_n);
-
+uint16_t mlx5_rx_burst_secondary_setup(void *, struct rte_mbuf **, uint16_t);

 /* mlx5_txq.c */

 void txq_cleanup(struct txq *);
-int txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
- unsigned int socket, const struct rte_eth_txconf *conf);
-
+int txq_setup(struct rte_eth_dev *, struct txq *, uint16_t, unsigned int,
+ const struct rte_eth_txconf *);
 int mlx5_tx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
const struct rte_eth_txconf *);
 void mlx5_tx_queue_release(void *);
-uint16_t mlx5_tx_burst_secondary_setup(void *dpdk_txq, struct rte_mbuf **pkts,
- uint16_t pkts_n);
+uint16_t mlx5_tx_burst_secondary_setup(void *, struct rte_mbuf **, uint16_t);

 /* mlx5_rxtx.c */

-- 
2.1.4



[dpdk-dev] [PATCH 6/7] mlx5: enhance SR-IOV detection

2016-06-08 Thread Nelio Laranjeiro
SR-IOV mode is currently set when dealing with VF devices. PF devices must
be taken into account as well if they have active VFs.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c| 14 --
 drivers/net/mlx5/mlx5.h|  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c | 32 
 drivers/net/mlx5/mlx5_rxtx.c   |  4 ++--
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 041cfc3..67a541c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -260,7 +260,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
int err = 0;
struct ibv_context *attr_ctx = NULL;
struct ibv_device_attr device_attr;
-   unsigned int vf;
+   unsigned int sriov;
unsigned int mps;
int idx;
int i;
@@ -303,17 +303,17 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
(pci_dev->addr.devid != pci_addr.devid) ||
(pci_dev->addr.function != pci_addr.function))
continue;
-   vf = ((pci_dev->id.device_id ==
+   sriov = ((pci_dev->id.device_id ==
   PCI_DEVICE_ID_MELLANOX_CONNECTX4VF) ||
  (pci_dev->id.device_id ==
   PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF));
/* Multi-packet send is only supported by ConnectX-4 Lx PF. */
mps = (pci_dev->id.device_id ==
   PCI_DEVICE_ID_MELLANOX_CONNECTX4LX);
-   INFO("PCI information matches, using device \"%s\" (VF: %s,"
-" MPS: %s)",
+   INFO("PCI information matches, using device \"%s\""
+" (SR-IOV: %s, MPS: %s)",
 list[i]->name,
-vf ? "true" : "false",
+sriov ? "true" : "false",
 mps ? "true" : "false");
attr_ctx = ibv_open_device(list[i]);
err = errno;
@@ -351,6 +351,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct ibv_exp_device_attr exp_device_attr;
 #endif /* HAVE_EXP_QUERY_DEVICE */
struct ether_addr mac;
+   uint16_t num_vfs = 0;

 #ifdef HAVE_EXP_QUERY_DEVICE
exp_device_attr.comp_mask =
@@ -464,7 +465,8 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
 #endif /* HAVE_EXP_QUERY_DEVICE */

-   priv->vf = vf;
+   priv_get_num_vfs(priv, &num_vfs);
+   priv->sriov = (num_vfs || sriov);
priv->mps = mps;
/* Allocate and register default RSS hash keys. */
priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 2487662..dccc18d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -105,7 +105,7 @@ struct priv {
unsigned int hw_vlan_strip:1; /* VLAN stripping is supported. */
unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
unsigned int hw_padding:1; /* End alignment padding is supported. */
-   unsigned int vf:1; /* This is a VF device. */
+   unsigned int sriov:1; /* This is a VF or PF with VF devices. */
unsigned int mps:1; /* Whether multi-packet send is supported. */
unsigned int pending_alarm:1; /* An alarm is pending. */
/* RX/TX queues. */
@@ -173,6 +173,7 @@ struct priv *mlx5_get_priv(struct rte_eth_dev *dev);
 int mlx5_is_secondary(void);
 int priv_get_ifname(const struct priv *, char (*)[IF_NAMESIZE]);
 int priv_ifreq(const struct priv *, int req, struct ifreq *);
+int priv_get_num_vfs(struct priv *, uint16_t *);
 int priv_get_mtu(struct priv *, uint16_t *);
 int priv_set_flags(struct priv *, unsigned int, unsigned int);
 int mlx5_dev_configure(struct rte_eth_dev *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ecbb49b..d2a63b8 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -363,6 +363,38 @@ priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
 }

 /**
+ * Return the number of active VFs for the current device.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[out] num_vfs
+ *   Number of active VFs.
+ *
+ * @return
+ *   0 on success, -1 on failure and errno is set.
+ */
+int
+priv_get_num_vfs(struct priv *priv, uint16_t *num_vfs)
+{
+   /* The sysfs entry name depends on the operating system. */
+   const char **name = (const char *[]){
+   "device/sriov_numvfs",
+   "device/mlx5_num_vfs",
+   NULL,
+   };
+   int ret;
+
+   do {
+   

[dpdk-dev] [PATCH 7/7] mlx5: update documentation part related to features and limitations

2016-06-08 Thread Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro 
---
 doc/guides/nics/mlx5.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b6f91e6..d9196d1 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -86,11 +86,11 @@ Features
 - Hardware checksum offloads.
 - Flow director (RTE_FDIR_MODE_PERFECT and RTE_FDIR_MODE_PERFECT_MAC_VLAN).
 - Secondary process TX is supported.
+- KVM and VMware ESX SR-IOV modes are supported.

 Limitations
-----------

-- KVM and VMware ESX SR-IOV modes are not supported yet.
 - Inner RSS for VXLAN frames is not supported yet.
 - Port statistics through software counters only.
 - Hardware checksum offloads for VXLAN inner header are not supported yet.
-- 
2.1.4



[dpdk-dev] [PATCH 00/24] Refactor mlx5 to improve performance

2016-06-08 Thread Nelio Laranjeiro
Enhance mlx5 with a data path that bypasses Verbs.

The first half of this patchset removes support for functionality (scatter/gather,
inline send) that is completely rewritten in the second half, where the data path
is refactored to bypass Verbs.

The PMD remains usable during the transition.

This patchset must be applied after "Miscellaneous fixes for mlx4 and mlx5".

Adrien Mazarguil (8):
  mlx5: replace countdown with threshold for TX completions
  mlx5: add debugging information about TX queues capabilities
  mlx5: check remaining space while processing TX burst
  mlx5: resurrect TX gather support
  mlx5: work around spurious compilation errors
  mlx5: remove redundant RX queue initialization code
  mlx5: make RX queue reinitialization safer
  mlx5: resurrect RX scatter support

Nelio Laranjeiro (15):
  mlx5: split memory registration function for better performance
  mlx5: remove TX gather support
  mlx5: remove RX scatter support
  mlx5: remove configuration variable for maximum number of segments
  mlx5: remove inline TX support
  mlx5: split TX queue structure
  mlx5: split RX queue structure
  mlx5: update prerequisites for upcoming enhancements
  mlx5: add definitions for data path without Verbs
  mlx5: add support for configuration through kvargs
  mlx5: add TX/RX burst function selection wrapper
  mlx5: refactor RX data path
  mlx5: refactor TX data path
  mlx5: handle RX CQE compression
  mlx5: add support for multi-packet send

Yaacov Hazan (1):
  mlx5: add support for inline send

 config/common_base |2 -
 doc/guides/nics/mlx5.rst   |   94 +-
 drivers/net/mlx5/Makefile  |   49 +-
 drivers/net/mlx5/mlx5.c|  158 ++-
 drivers/net/mlx5/mlx5.h|   10 +
 drivers/net/mlx5/mlx5_defs.h   |   26 +-
 drivers/net/mlx5/mlx5_ethdev.c |  188 +++-
 drivers/net/mlx5/mlx5_fdir.c   |   20 +-
 drivers/net/mlx5/mlx5_mr.c |  280 +
 drivers/net/mlx5/mlx5_prm.h|  155 +++
 drivers/net/mlx5/mlx5_rxmode.c |8 -
 drivers/net/mlx5/mlx5_rxq.c|  757 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 2206 +++-
 drivers/net/mlx5/mlx5_rxtx.h   |  176 ++--
 drivers/net/mlx5/mlx5_txq.c|  362 ---
 drivers/net/mlx5/mlx5_vlan.c   |6 +-
 16 files changed, 2578 insertions(+), 1919 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_mr.c
 create mode 100644 drivers/net/mlx5/mlx5_prm.h

-- 
2.1.4



[dpdk-dev] [PATCH 01/24] mlx5: split memory registration function for better performance

2016-06-08 Thread Nelio Laranjeiro
Except for the first time when memory registration occurs, the lkey is
always cached. Since memory registration is slow and performs system calls,
performance can be improved by moving that code to its own function outside
of the data path so only the lookup code is left in the original inlined
function.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/Makefile|   1 +
 drivers/net/mlx5/mlx5_mr.c   | 277 +++
 drivers/net/mlx5/mlx5_rxtx.c | 209 ++--
 drivers/net/mlx5/mlx5_rxtx.h |   8 +-
 4 files changed, 295 insertions(+), 200 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_mr.c

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 92bfa07..1dba3de 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -47,6 +47,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_vlan.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rss.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_fdir.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c

 # Dependencies.
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
new file mode 100644
index 000..7c3e87f
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -0,0 +1,277 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+
+struct mlx5_check_mempool_data {
+   int ret;
+   char *start;
+   char *end;
+};
+
+/* Called by mlx5_check_mempool() when iterating the memory chunks. */
+static void mlx5_check_mempool_cb(struct rte_mempool *mp,
+   void *opaque, struct rte_mempool_memhdr *memhdr,
+   unsigned mem_idx)
+{
+   struct mlx5_check_mempool_data *data = opaque;
+
+   (void)mp;
+   (void)mem_idx;
+
+   /* It already failed, skip the next chunks. */
+   if (data->ret != 0)
+   return;
+   /* It is the first chunk. */
+   if (data->start == NULL && data->end == NULL) {
+   data->start = memhdr->addr;
+   data->end = data->start + memhdr->len;
+   return;
+   }
+   if (data->end == memhdr->addr) {
+   data->end += memhdr->len;
+   return;
+   }
+   if (data->start == (char *)memhdr->addr + memhdr->len) {
+   data->start -= memhdr->len;
+   return;
+   }
+   /* Error, mempool is not virtually contiguous. */
+   data->ret = -1;
+}
+
+/**
+ * Check if a mempool can be used: it must be virtually contiguous.
+ *
+ * @param[in] mp
+ *   Pointer to memory pool.
+ * @param[out] start
+ *   Pointer to the start address of the mempool virtual memory area
+ * @param[out] end
+ *   Pointer to the end address of the mempool virtual memory area
+ *
+ * @return
+ *   0 on success 

[dpdk-dev] [PATCH 02/24] mlx5: remove TX gather support

2016-06-08 Thread Nelio Laranjeiro
This is done in preparation of bypassing Verbs entirely for the data path
as a performance improvement. TX gather cannot be maintained during the
transition and will be reimplemented later.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_ethdev.c |   2 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 315 -
 drivers/net/mlx5/mlx5_rxtx.h   |  17 ---
 drivers/net/mlx5/mlx5_txq.c|  49 ++-
 4 files changed, 69 insertions(+), 314 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index d2a63b8..29aec49 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1261,7 +1261,7 @@ mlx5_secondary_data_setup(struct priv *priv)
if (txq != NULL) {
if (txq_setup(priv->dev,
  txq,
- primary_txq->elts_n * MLX5_PMD_SGE_WR_N,
+ primary_txq->elts_n,
  primary_txq->socket,
  NULL) == 0) {
txq->stats.idx = primary_txq->stats.idx;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 616cf7a..6e184c3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -228,156 +228,6 @@ insert_vlan_sw(struct rte_mbuf *buf)
return 0;
 }

-#if MLX5_PMD_SGE_WR_N > 1
-
-/**
- * Copy scattered mbuf contents to a single linear buffer.
- *
- * @param[out] linear
- *   Linear output buffer.
- * @param[in] buf
- *   Scattered input buffer.
- *
- * @return
- *   Number of bytes copied to the output buffer or 0 if not large enough.
- */
-static unsigned int
-linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
-{
-   unsigned int size = 0;
-   unsigned int offset;
-
-   do {
-   unsigned int len = DATA_LEN(buf);
-
-   offset = size;
-   size += len;
-   if (unlikely(size > sizeof(*linear)))
-   return 0;
-   memcpy(&(*linear)[offset],
-  rte_pktmbuf_mtod(buf, uint8_t *),
-  len);
-   buf = NEXT(buf);
-   } while (buf != NULL);
-   return size;
-}
-
-/**
- * Handle scattered buffers for mlx5_tx_burst().
- *
- * @param txq
- *   TX queue structure.
- * @param segs
- *   Number of segments in buf.
- * @param elt
- *   TX queue element to fill.
- * @param[in] buf
- *   Buffer to process.
- * @param elts_head
- *   Index of the linear buffer to use if necessary (normally txq->elts_head).
- * @param[out] sges
- *   Array filled with SGEs on success.
- *
- * @return
- *   A structure containing the processed packet size in bytes and the
- *   number of SGEs. Both fields are set to (unsigned int)-1 in case of
- *   failure.
- */
-static struct tx_burst_sg_ret {
-   unsigned int length;
-   unsigned int num;
-}
-tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
-   struct rte_mbuf *buf, unsigned int elts_head,
-   struct ibv_sge (*sges)[MLX5_PMD_SGE_WR_N])
-{
-   unsigned int sent_size = 0;
-   unsigned int j;
-   int linearize = 0;
-
-   /* When there are too many segments, extra segments are
-* linearized in the last SGE. */
-   if (unlikely(segs > RTE_DIM(*sges))) {
-   segs = (RTE_DIM(*sges) - 1);
-   linearize = 1;
-   }
-   /* Update element. */
-   elt->buf = buf;
-   /* Register segments as SGEs. */
-   for (j = 0; (j != segs); ++j) {
-   struct ibv_sge *sge = &(*sges)[j];
-   uint32_t lkey;
-
-   /* Retrieve Memory Region key for this memory pool. */
-   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-   if (unlikely(lkey == (uint32_t)-1)) {
-   /* MR does not exist. */
-   DEBUG("%p: unable to get MP <-> MR association",
- (void *)txq);
-   /* Clean up TX element. */
-   elt->buf = NULL;
-   goto stop;
-   }
-   /* Update SGE. */
-   sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
-   if (txq->priv->sriov)
-   rte_prefetch0((volatile void *)
- (uintptr_t)sge->addr);
-   sge->length = DATA_LEN(buf);
-   sge->lkey = lkey;
-   sent_size += sge->length;
-   buf = NEXT(buf);
-   }
-   /* If buf is not NULL here and is not going to be linearized,
-* nb_segs is not valid. */
-   assert(j == segs);
-   assert((buf == NULL) || (linearize));
-   /* Linearize extra segments. */
-   if (linearize) {
-   struct ibv_sge *sge = &(*sges)[segs];
-   linear_t *linear = 

[dpdk-dev] [PATCH 03/24] mlx5: remove RX scatter support

2016-06-08 Thread Nelio Laranjeiro
This is done in preparation of bypassing Verbs entirely for the data path
as a performance improvement. RX scatter cannot be maintained during the
transition and will be reimplemented later.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_ethdev.c |  31 +---
 drivers/net/mlx5/mlx5_rxq.c| 314 ++---
 drivers/net/mlx5/mlx5_rxtx.c   | 211 +--
 drivers/net/mlx5/mlx5_rxtx.h   |  13 +-
 4 files changed, 53 insertions(+), 516 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 29aec49..bab826c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -624,8 +624,7 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev)

};

-   if (dev->rx_pkt_burst == mlx5_rx_burst ||
-   dev->rx_pkt_burst == mlx5_rx_burst_sp)
+   if (dev->rx_pkt_burst == mlx5_rx_burst)
return ptypes;
return NULL;
 }
@@ -763,19 +762,11 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
mb_len = rte_pktmbuf_data_room_size(rxq->mp);
assert(mb_len >= RTE_PKTMBUF_HEADROOM);
sp = (max_frame_len > (mb_len - RTE_PKTMBUF_HEADROOM));
-   /* Provide new values to rxq_setup(). */
-   dev->data->dev_conf.rxmode.jumbo_frame = sp;
-   dev->data->dev_conf.rxmode.max_rx_pkt_len = max_frame_len;
-   ret = rxq_rehash(dev, rxq);
-   if (ret) {
-   /* Force SP RX if that queue requires it and abort. */
-   if (rxq->sp)
-   rx_func = mlx5_rx_burst_sp;
-   break;
+   if (sp) {
+   ERROR("%p: RX scatter is not supported", (void *)dev);
+   ret = ENOTSUP;
+   goto out;
}
-   /* Scattered burst function takes priority. */
-   if (rxq->sp)
-   rx_func = mlx5_rx_burst_sp;
}
/* Burst functions can now be called again. */
rte_wmb();
@@ -1104,22 +1095,12 @@ priv_set_link(struct priv *priv, int up)
 {
struct rte_eth_dev *dev = priv->dev;
int err;
-   unsigned int i;

if (up) {
err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
if (err)
return err;
-   for (i = 0; i < priv->rxqs_n; i++)
-   if ((*priv->rxqs)[i]->sp)
-   break;
-   /* Check if an sp queue exists.
-* Note: Some old frames might be received.
-*/
-   if (i == priv->rxqs_n)
-   dev->rx_pkt_burst = mlx5_rx_burst;
-   else
-   dev->rx_pkt_burst = mlx5_rx_burst_sp;
+   dev->rx_pkt_burst = mlx5_rx_burst;
dev->tx_pkt_burst = mlx5_tx_burst;
} else {
err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0bcf55b..38ff9fd 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -634,145 +634,6 @@ priv_rehash_flows(struct priv *priv)
 }

 /**
- * Allocate RX queue elements with scattered packets support.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- * @param[in] pool
- *   If not NULL, fetch buffers from this array instead of allocating them
- *   with rte_pktmbuf_alloc().
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
- struct rte_mbuf **pool)
-{
-   unsigned int i;
-   struct rxq_elt_sp (*elts)[elts_n] =
-   rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
- rxq->socket);
-   int ret = 0;
-
-   if (elts == NULL) {
-   ERROR("%p: can't allocate packets array", (void *)rxq);
-   ret = ENOMEM;
-   goto error;
-   }
-   /* For each WR (packet). */
-   for (i = 0; (i != elts_n); ++i) {
-   unsigned int j;
-   struct rxq_elt_sp *elt = &(*elts)[i];
-   struct ibv_sge (*sges)[RTE_DIM(elt->sges)] = &elt->sges;
-
-   /* These two arrays must have the same size. */
-   assert(RTE_DIM(elt->sges) == RTE_DIM(elt->bufs));
-   /* For each SGE (segment). */
-   for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
-   struct ibv_sge *sge = &(*sges)[j];
-   struct rte_mbuf *buf;
-
-   if (pool != NULL) {
-   buf = *(pool++);
-   assert(buf != NULL);
-   rte_pktmbuf_reset(buf);
- 

[dpdk-dev] [PATCH 04/24] mlx5: remove configuration variable for maximum number of segments

2016-06-08 Thread Nelio Laranjeiro
Since scatter/gather support has been removed, CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
no longer has a purpose and can be removed.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 config/common_base   | 1 -
 doc/guides/nics/mlx5.rst | 7 ---
 drivers/net/mlx5/Makefile| 4 
 drivers/net/mlx5/mlx5_defs.h | 5 -
 drivers/net/mlx5/mlx5_rxq.c  | 4 
 drivers/net/mlx5/mlx5_txq.c  | 4 
 6 files changed, 25 deletions(-)

diff --git a/config/common_base b/config/common_base
index 47c26f6..a4a3a3a 100644
--- a/config/common_base
+++ b/config/common_base
@@ -207,7 +207,6 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
 #
 CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
-CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index d9196d1..84c35a0 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -114,13 +114,6 @@ These options can be modified in the ``.config`` file.
   adds additional run-time checks and debugging messages at the cost of
   lower performance.

-- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**)
-
-  Number of scatter/gather elements (SGEs) per work request (WR). Lowering
-  this number improves performance but also limits the ability to receive
-  scattered packets (packets that do not fit a single mbuf). The default
-  value is a safe tradeoff.
-
 - ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)

   Amount of data to be inlined during TX operations. Improves latency.
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 1dba3de..9a26269 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -84,10 +84,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif

-ifdef CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
-CFLAGS += -DMLX5_PMD_SGE_WR_N=$(CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE
 CFLAGS += -DMLX5_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE)
 endif
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 09207d9..da1c90e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -54,11 +54,6 @@
 /* RSS Indirection table size. */
 #define RSS_INDIRECTION_TABLE_SIZE 256

-/* Maximum number of Scatter/Gather Elements per Work Request. */
-#ifndef MLX5_PMD_SGE_WR_N
-#define MLX5_PMD_SGE_WR_N 4
-#endif
-
 /* Maximum size for inline data. */
 #ifndef MLX5_PMD_MAX_INLINE
 #define MLX5_PMD_MAX_INLINE 0
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 38ff9fd..4000624 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -976,10 +976,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
ERROR("%p: invalid number of RX descriptors", (void *)dev);
return EINVAL;
}
-   if (MLX5_PMD_SGE_WR_N > 1) {
-   ERROR("%p: RX scatter is not supported", (void *)dev);
-   return ENOTSUP;
-   }
/* Toggle RX checksum offload if hardware supports it. */
if (priv->hw_csum)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5a248c9..59974c5 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -264,10 +264,6 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
ERROR("%p: invalid number of TX descriptors", (void *)dev);
return EINVAL;
}
-   if (MLX5_PMD_SGE_WR_N > 1) {
-   ERROR("%p: TX gather is not supported", (void *)dev);
-   return EINVAL;
-   }
/* MRs will be registered in mp2mr[] later. */
attr.rd = (struct ibv_exp_res_domain_init_attr){
.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
-- 
2.1.4



[dpdk-dev] [PATCH 06/24] mlx5: split TX queue structure

2016-06-08 Thread Nelio Laranjeiro
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure txq_ctrl.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c|  21 +++--
 drivers/net/mlx5/mlx5_ethdev.c |  27 +++---
 drivers/net/mlx5/mlx5_mr.c |  39 
 drivers/net/mlx5/mlx5_rxtx.h   |   9 +-
 drivers/net/mlx5/mlx5_txq.c| 198 +
 5 files changed, 158 insertions(+), 136 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 67a541c..cc30463 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -98,7 +98,6 @@ static void
 mlx5_dev_close(struct rte_eth_dev *dev)
 {
struct priv *priv = mlx5_get_priv(dev);
-   void *tmp;
unsigned int i;

priv_lock(priv);
@@ -122,12 +121,13 @@ mlx5_dev_close(struct rte_eth_dev *dev)
/* XXX race condition if mlx5_rx_burst() is still running. */
usleep(1000);
for (i = 0; (i != priv->rxqs_n); ++i) {
-   tmp = (*priv->rxqs)[i];
-   if (tmp == NULL)
+   struct rxq *rxq = (*priv->rxqs)[i];
+
+   if (rxq == NULL)
continue;
(*priv->rxqs)[i] = NULL;
-   rxq_cleanup(tmp);
-   rte_free(tmp);
+   rxq_cleanup(rxq);
+   rte_free(rxq);
}
priv->rxqs_n = 0;
priv->rxqs = NULL;
@@ -136,12 +136,15 @@ mlx5_dev_close(struct rte_eth_dev *dev)
/* XXX race condition if mlx5_tx_burst() is still running. */
usleep(1000);
for (i = 0; (i != priv->txqs_n); ++i) {
-   tmp = (*priv->txqs)[i];
-   if (tmp == NULL)
+   struct txq *txq = (*priv->txqs)[i];
+   struct txq_ctrl *txq_ctrl;
+
+   if (txq == NULL)
continue;
+   txq_ctrl = container_of(txq, struct txq_ctrl, txq);
(*priv->txqs)[i] = NULL;
-   txq_cleanup(tmp);
-   rte_free(tmp);
+   txq_cleanup(txq_ctrl);
+   rte_free(txq_ctrl);
}
priv->txqs_n = 0;
priv->txqs = NULL;
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index bab826c..3710bba 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1233,28 +1233,31 @@ mlx5_secondary_data_setup(struct priv *priv)
/* TX queues. */
for (i = 0; i != nb_tx_queues; ++i) {
struct txq *primary_txq = (*sd->primary_priv->txqs)[i];
-   struct txq *txq;
+   struct txq_ctrl *primary_txq_ctrl;
+   struct txq_ctrl *txq_ctrl;

if (primary_txq == NULL)
continue;
-   txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0,
-   primary_txq->socket);
-   if (txq != NULL) {
+   primary_txq_ctrl = container_of(primary_txq,
+   struct txq_ctrl, txq);
+   txq_ctrl = rte_calloc_socket("TXQ", 1, sizeof(*txq_ctrl), 0,
+primary_txq_ctrl->socket);
+   if (txq_ctrl != NULL) {
if (txq_setup(priv->dev,
- txq,
+ primary_txq_ctrl,
  primary_txq->elts_n,
- primary_txq->socket,
+ primary_txq_ctrl->socket,
  NULL) == 0) {
-   txq->stats.idx = primary_txq->stats.idx;
-   tx_queues[i] = txq;
+   txq_ctrl->txq.stats.idx = primary_txq->stats.idx;
+   tx_queues[i] = &txq_ctrl->txq;
continue;
}
-   rte_free(txq);
+   rte_free(txq_ctrl);
}
while (i) {
-   txq = tx_queues[--i];
-   txq_cleanup(txq);
-   rte_free(txq);
+   txq_ctrl = tx_queues[--i];
+   txq_cleanup(txq_ctrl);
+   rte_free(txq_ctrl);
}
goto error;
}
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 7c3e87f..79d5568 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -183,33 +183,36 @@ mlx5_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 

[dpdk-dev] [PATCH 05/24] mlx5: remove inline TX support

2016-06-08 Thread Nelio Laranjeiro
Inline TX will be fully managed by the PMD after Verbs is bypassed in the
data path. Remove the current code until then.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 config/common_base   |  1 -
 doc/guides/nics/mlx5.rst | 10 --
 drivers/net/mlx5/Makefile|  4 ---
 drivers/net/mlx5/mlx5_defs.h |  5 ---
 drivers/net/mlx5/mlx5_rxtx.c | 73 +++-
 drivers/net/mlx5/mlx5_rxtx.h |  9 --
 drivers/net/mlx5/mlx5_txq.c  | 16 --
 7 files changed, 25 insertions(+), 93 deletions(-)

diff --git a/config/common_base b/config/common_base
index a4a3a3a..2d6832f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -207,7 +207,6 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
 #
 CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
-CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

 #
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 84c35a0..77fa957 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -114,16 +114,6 @@ These options can be modified in the ``.config`` file.
   adds additional run-time checks and debugging messages at the cost of
   lower performance.

-- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)
-
-  Amount of data to be inlined during TX operations. Improves latency.
-  Can improve PPS performance when PCI backpressure is detected and may be
-  useful for scenarios involving heavy traffic on many queues.
-
-  Since the additional software logic necessary to handle this mode can
-  lower performance when there is no backpressure, it is not enabled by
-  default.
-
 - ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**)

   Maximum number of cached memory pools (MPs) per TX queue. Each MP from
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 9a26269..798859c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -84,10 +84,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif

-ifdef CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE
-CFLAGS += -DMLX5_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE
 CFLAGS += -DMLX5_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE)
 endif
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index da1c90e..9a19835 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -54,11 +54,6 @@
 /* RSS Indirection table size. */
 #define RSS_INDIRECTION_TABLE_SIZE 256

-/* Maximum size for inline data. */
-#ifndef MLX5_PMD_MAX_INLINE
-#define MLX5_PMD_MAX_INLINE 0
-#endif
-
 /*
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
  * from which buffers are to be transmitted will have to be mapped by this
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 07d95eb..4ba88ea 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -329,56 +329,33 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
rte_prefetch0((volatile void *)
  (uintptr_t)buf_next_addr);
}
-   /* Put packet into send queue. */
-#if MLX5_PMD_MAX_INLINE > 0
-   if (length <= txq->max_inline) {
-#ifdef HAVE_VERBS_VLAN_INSERTION
-   if (insert_vlan)
-   err = txq->send_pending_inline_vlan
-   (txq->qp,
-(void *)addr,
-length,
-send_flags,
-&buf->vlan_tci);
-   else
-#endif /* HAVE_VERBS_VLAN_INSERTION */
-   err = txq->send_pending_inline
-   (txq->qp,
-(void *)addr,
-length,
-send_flags);
-   } else
-#endif
-   {
-   /* Retrieve Memory Region key for this
-* memory pool. */
-   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-   if (unlikely(lkey == (uint32_t)-1)) {
-   /* MR does not exist. */
-   DEBUG("%p: unable to get MP <-> MR"
- " association", (void *)txq);
-   /* Clean up TX element. */
-   elt->buf = NULL;
-   goto stop;
-   }
+   /* Retrieve Memory Region key for this memory pool. */
+   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
+   if (unlikely(lkey == (uint32_t)-1)) {
+   /* MR does not exist. */
+   DEBUG("%p: unable to 

[dpdk-dev] [PATCH 07/24] mlx5: split RX queue structure

2016-06-08 Thread Nelio Laranjeiro
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure rxq_ctrl.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c  |   6 +-
 drivers/net/mlx5/mlx5_fdir.c |   8 +-
 drivers/net/mlx5/mlx5_rxq.c  | 250 ++-
 drivers/net/mlx5/mlx5_rxtx.c |   1 -
 drivers/net/mlx5/mlx5_rxtx.h |  13 ++-
 5 files changed, 148 insertions(+), 130 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cc30463..95279bd 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -122,12 +122,14 @@ mlx5_dev_close(struct rte_eth_dev *dev)
usleep(1000);
for (i = 0; (i != priv->rxqs_n); ++i) {
struct rxq *rxq = (*priv->rxqs)[i];
+   struct rxq_ctrl *rxq_ctrl;

if (rxq == NULL)
continue;
+   rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
(*priv->rxqs)[i] = NULL;
-   rxq_cleanup(rxq);
-   rte_free(rxq);
+   rxq_cleanup(rxq_ctrl);
+   rte_free(rxq_ctrl);
}
priv->rxqs_n = 0;
priv->rxqs = NULL;
diff --git a/drivers/net/mlx5/mlx5_fdir.c b/drivers/net/mlx5/mlx5_fdir.c
index 63e43ad..e3b97ba 100644
--- a/drivers/net/mlx5/mlx5_fdir.c
+++ b/drivers/net/mlx5/mlx5_fdir.c
@@ -424,7 +424,9 @@ create_flow:
 static struct fdir_queue *
 priv_get_fdir_queue(struct priv *priv, uint16_t idx)
 {
-   struct fdir_queue *fdir_queue = &(*priv->rxqs)[idx]->fdir_queue;
+   struct rxq_ctrl *rxq_ctrl =
+   container_of((*priv->rxqs)[idx], struct rxq_ctrl, rxq);
+   struct fdir_queue *fdir_queue = &rxq_ctrl->fdir_queue;
struct ibv_exp_rwq_ind_table *ind_table = NULL;
struct ibv_qp *qp = NULL;
struct ibv_exp_rwq_ind_table_init_attr ind_init_attr;
@@ -629,8 +631,10 @@ priv_fdir_disable(struct priv *priv)
/* Run on every RX queue to destroy related flow director QP and
 * indirection table. */
for (i = 0; (i != priv->rxqs_n); i++) {
-   fdir_queue = &(*priv->rxqs)[i]->fdir_queue;
+   struct rxq_ctrl *rxq_ctrl =
+   container_of((*priv->rxqs)[i], struct rxq_ctrl, rxq);

+   fdir_queue = &rxq_ctrl->fdir_queue;
if (fdir_queue->qp != NULL) {
claim_zero(ibv_destroy_qp(fdir_queue->qp));
fdir_queue->qp = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4000624..8d32e74 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -636,7 +636,7 @@ priv_rehash_flows(struct priv *priv)
 /**
  * Allocate RX queue elements.
  *
- * @param rxq
+ * @param rxq_ctrl
  *   Pointer to RX queue structure.
  * @param elts_n
  *   Number of elements to allocate.
@@ -648,16 +648,17 @@ priv_rehash_flows(struct priv *priv)
  *   0 on success, errno value on failure.
  */
 static int
-rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
+rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
+  struct rte_mbuf **pool)
 {
unsigned int i;
struct rxq_elt (*elts)[elts_n] =
rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
- rxq->socket);
+ rxq_ctrl->socket);
int ret = 0;

if (elts == NULL) {
-   ERROR("%p: can't allocate packets array", (void *)rxq);
+   ERROR("%p: can't allocate packets array", (void *)rxq_ctrl);
ret = ENOMEM;
goto error;
}
@@ -672,10 +673,10 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
assert(buf != NULL);
rte_pktmbuf_reset(buf);
} else
-   buf = rte_pktmbuf_alloc(rxq->mp);
+   buf = rte_pktmbuf_alloc(rxq_ctrl->rxq.mp);
if (buf == NULL) {
assert(pool == NULL);
-   ERROR("%p: empty mbuf pool", (void *)rxq);
+   ERROR("%p: empty mbuf pool", (void *)rxq_ctrl);
ret = ENOMEM;
goto error;
}
@@ -691,15 +692,15 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
sge->addr = (uintptr_t)
((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
-   sge->lkey = rxq->mr->lkey;
+   sge->lkey = rxq_ctrl->mr->lkey;
/* Redundant check for tailroom. */
assert(sge->length == rte_pktmbuf_tailroom(buf));

[dpdk-dev] [PATCH 08/24] mlx5: update prerequisites for upcoming enhancements

2016-06-08 Thread Nelio Laranjeiro
The latest version of Mellanox OFED exposes hardware definitions necessary
to implement data path operation bypassing Verbs. Update the minimum
version requirement to MLNX_OFED >= 3.3 and clean up compatibility checks
for previous releases.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 doc/guides/nics/mlx5.rst   | 44 +++---
 drivers/net/mlx5/Makefile  | 39 -
 drivers/net/mlx5/mlx5.c| 23 --
 drivers/net/mlx5/mlx5.h|  5 +
 drivers/net/mlx5/mlx5_defs.h   |  9 -
 drivers/net/mlx5/mlx5_fdir.c   | 10 --
 drivers/net/mlx5/mlx5_rxmode.c |  8 
 drivers/net/mlx5/mlx5_rxq.c| 30 
 drivers/net/mlx5/mlx5_rxtx.c   |  4 
 drivers/net/mlx5/mlx5_rxtx.h   |  8 
 drivers/net/mlx5/mlx5_txq.c|  2 --
 drivers/net/mlx5/mlx5_vlan.c   |  3 ---
 12 files changed, 16 insertions(+), 169 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 77fa957..3a07928 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -125,16 +125,6 @@ These options can be modified in the ``.config`` file.
 Environment variables
~~~~~~~~~~~~~~~~~~~~~

-- ``MLX5_ENABLE_CQE_COMPRESSION``
-
-  A nonzero value lets ConnectX-4 return smaller completion entries to
-  improve performance when PCI backpressure is detected. It is most useful
-  for scenarios involving heavy traffic on many queues.
-
-  Since the additional software logic necessary to handle this mode can
-  lower performance when there is no backpressure, it is not enabled by
-  default.
-
 - ``MLX5_PMD_ENABLE_PADDING``

   Enables HW packet padding in PCI bus transactions.
@@ -211,40 +201,12 @@ DPDK and must be installed separately:

 Currently supported by DPDK:

-- Mellanox OFED **3.1-1.0.3**, **3.1-1.5.7.1** or **3.2-2.0.0.0** depending
-  on usage.
-
-The following features are supported with version **3.1-1.5.7.1** and
-above only:
-
-- IPv6, UPDv6, TCPv6 RSS.
-- RX checksum offloads.
-- IBM POWER8.
-
-The following features are supported with version **3.2-2.0.0.0** and
-above only:
-
-- Flow director.
-- RX VLAN stripping.
-- TX VLAN insertion.
-- RX CRC stripping configuration.
+- Mellanox OFED **3.3-1.0.0.0**.

 - Minimum firmware version:

-  With MLNX_OFED **3.1-1.0.3**:
-
-  - ConnectX-4: **12.12.1240**
-  - ConnectX-4 Lx: **14.12.1100**
-
-  With MLNX_OFED **3.1-1.5.7.1**:
-
-  - ConnectX-4: **12.13.0144**
-  - ConnectX-4 Lx: **14.13.0144**
-
-  With MLNX_OFED **3.2-2.0.0.0**:
-
-  - ConnectX-4: **12.14.2036**
-  - ConnectX-4 Lx: **14.14.2036**
+  - ConnectX-4: **12.16.1006**
+  - ConnectX-4 Lx: **14.16.1006**

 Getting Mellanox OFED
~~~~~~~~~~~~~~~~~~~~~
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 798859c..a63d6b3 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -102,42 +102,19 @@ endif
 mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q sh -- '$<' '$@' \
-   HAVE_EXP_QUERY_DEVICE \
-   infiniband/verbs.h \
-   type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_FLOW_SPEC_IPV6 \
-   infiniband/verbs.h \
-   type 'struct ibv_exp_flow_spec_ipv6' $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
-   infiniband/verbs.h \
-   enum IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS \
-   infiniband/verbs.h \
-   enum IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_EXP_CQ_RX_TCP_PACKET \
+   HAVE_VERBS_VLAN_INSERTION \
infiniband/verbs.h \
-   enum IBV_EXP_CQ_RX_TCP_PACKET \
+   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_FCS \
-   infiniband/verbs.h \
-   enum IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS \
+   HAVE_VERBS_IBV_EXP_CQ_COMPRESSED_CQE \
+   infiniband/verbs_exp.h \
+   enum IBV_EXP_CQ_COMPRESSED_CQE \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_RX_END_PADDING \
-   infiniband/verbs.h \
-   enum IBV_EXP_CREATE_WQ_FLAG_RX_END_PADDING \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_VERBS_VLAN_INSERTION \
-   infiniband/verbs.h \
-   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
+   HAVE_VERBS_MLX5_ETH_VLAN_INLINE_HEADER_SIZE \
+   

[dpdk-dev] [PATCH 10/24] mlx5: add support for configuration through kvargs

2016-06-08 Thread Nelio Laranjeiro
The intent is to replace the remaining compile-time options and environment
variables with a common means of runtime configuration. This commit only
adds the kvargs handling code; subsequent commits will update the rest.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c | 72 +
 1 file changed, 72 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e9cc38a..62e6e16 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 /* Verbs header. */
@@ -57,6 +58,7 @@
 #include 
 #include 
 #include 
+#include <rte_kvargs.h>
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-pedantic"
 #endif
@@ -237,6 +239,70 @@ mlx5_dev_idx(struct rte_pci_addr *pci_addr)
return ret;
 }

+/**
+ * Verify and store value for device argument.
+ *
+ * @param[in] key
+ *   Key argument to verify.
+ * @param[in] val
+ *   Value associated with key.
+ * @param opaque
+ *   User data.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+static int
+mlx5_args_check(const char *key, const char *val, void *opaque)
+{
+   struct priv *priv = opaque;
+
+   /* No parameters are expected at the moment. */
+   (void)priv;
+   (void)val;
+   WARN("%s: unknown parameter", key);
+   return EINVAL;
+}
+
+/**
+ * Parse device parameters.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param devargs
+ *   Device arguments structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+static int
+mlx5_args(struct priv *priv, struct rte_devargs *devargs)
+{
+   static const char *params[] = {
+   NULL,
+   };
+   struct rte_kvargs *kvlist;
+   int ret = 0;
+   int i;
+
+   if (devargs == NULL)
+   return 0;
+   kvlist = rte_kvargs_parse(devargs->args, params);
+   if (kvlist == NULL)
+   return 0;
+   /* Process parameters. */
+   for (i = 0; (i != RTE_DIM(params)); ++i) {
+   if (rte_kvargs_count(kvlist, params[i])) {
+   ret = rte_kvargs_process(kvlist, params[i],
+mlx5_args_check, priv);
+   if (ret != 0)
+   return ret;
+   }
+   }
+   rte_kvargs_free(kvlist);
+   return 0;
+}
+
 static struct eth_driver mlx5_driver;

 /**
@@ -408,6 +474,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
+   err = mlx5_args(priv, pci_dev->devargs);
+   if (err) {
+   ERROR("failed to process device arguments: %s",
+ strerror(err));
+   goto port_error;
+   }
if (ibv_exp_query_device(ctx, &exp_device_attr)) {
ERROR("ibv_exp_query_device() failed");
goto port_error;
-- 
2.1.4



[dpdk-dev] [PATCH 09/24] mlx5: add definitions for data path without Verbs

2016-06-08 Thread Nelio Laranjeiro
These structures and macros extend those exposed by libmlx5 (in mlx5_hw.h)
to let the PMD manage work queue and completion queue elements directly.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_prm.h | 155 
 1 file changed, 155 insertions(+)
 create mode 100644 drivers/net/mlx5/mlx5_prm.h

diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
new file mode 100644
index 000..c4fb1c2
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -0,0 +1,155 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_PRM_H_
+#define RTE_PMD_MLX5_PRM_H_
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/mlx5_hw.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* Get CQE owner bit. */
+#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
+
+/* Get CQE format. */
+#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
+
+/* Get CQE opcode. */
+#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
+
+/* Get CQE solicited event. */
+#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
+
+/* Invalidate a CQE. */
+#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
+
+/* CQE value to inform that VLAN is stripped. */
+#define MLX5_CQE_VLAN_STRIPPED 0x1
+
+/* Maximum number of packets a multi-packet WQE can handle. */
+#define MLX5_MPW_DSEG_MAX 5
+
+/* Room for inline data in regular work queue element. */
+#define MLX5_WQE64_INL_DATA 12
+
+/* Room for inline data in multi-packet WQE. */
+#define MLX5_MWQE64_INL_DATA 28
+
+/* Subset of struct mlx5_wqe_eth_seg. */
+struct mlx5_wqe_eth_seg_small {
+   uint32_t rsvd0;
+   uint8_t cs_flags;
+   uint8_t rsvd1;
+   uint16_t mss;
+   uint32_t rsvd2;
+   uint16_t inline_hdr_sz;
+};
+
+/* Regular WQE. */
+struct mlx5_wqe_regular {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg eseg;
+   struct mlx5_wqe_data_seg dseg;
+} __rte_aligned(64);
+
+/* Inline WQE. */
+struct mlx5_wqe_inl {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg eseg;
+   uint32_t byte_cnt;
+   uint8_t data[MLX5_WQE64_INL_DATA];
+} __rte_aligned(64);
+
+/* Multi-packet WQE. */
+struct mlx5_wqe_mpw {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg_small eseg;
+   struct mlx5_wqe_data_seg dseg[2];
+} __rte_aligned(64);
+
+/* Multi-packet WQE with inline. */
+struct mlx5_wqe_mpw_inl {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg_small eseg;
+   uint32_t byte_cnt;
+   uint8_t data[MLX5_MWQE64_INL_DATA];
+} __rte_aligned(64);
+
+/* Union of all WQE types. */
+union mlx5_wqe {
+   struct mlx5_wqe_regular wqe;
+   struct mlx5_wqe_inl inl;
+   struct mlx5_wqe_mpw mpw;
+   struct mlx5_wqe_mpw_inl mpw_inl;
+   uint8_t data[64];
+};
+
+/* MPW session status. */
+enum mlx5_mpw_state {

[dpdk-dev] [PATCH 11/24] mlx5: add TX/RX burst function selection wrapper

2016-06-08 Thread Nelio Laranjeiro
These wrappers are meant to prevent code duplication later.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.h|  2 ++
 drivers/net/mlx5/mlx5_ethdev.c | 34 --
 drivers/net/mlx5/mlx5_txq.c|  2 +-
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4170e3b..382aac5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -197,6 +197,8 @@ void priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
 int mlx5_set_link_down(struct rte_eth_dev *dev);
 int mlx5_set_link_up(struct rte_eth_dev *dev);
 struct priv *mlx5_secondary_data_setup(struct priv *priv);
+void priv_select_tx_function(struct priv *);
+void priv_select_rx_function(struct priv *);

 /* mlx5_mac.c */

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3710bba..c612b31 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1100,8 +1100,8 @@ priv_set_link(struct priv *priv, int up)
err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
if (err)
return err;
-   dev->rx_pkt_burst = mlx5_rx_burst;
-   dev->tx_pkt_burst = mlx5_tx_burst;
+   priv_select_tx_function(priv);
+   priv_select_rx_function(priv);
} else {
err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
if (err)
@@ -1290,13 +1290,11 @@ mlx5_secondary_data_setup(struct priv *priv)
rte_mb();
priv->dev->data = &sd->data;
rte_mb();
-   priv->dev->tx_pkt_burst = mlx5_tx_burst;
-   priv->dev->rx_pkt_burst = removed_rx_burst;
+   priv_select_tx_function(priv);
+   priv_select_rx_function(priv);
priv_unlock(priv);
 end:
/* More sanity checks. */
-   assert(priv->dev->tx_pkt_burst == mlx5_tx_burst);
-   assert(priv->dev->rx_pkt_burst == removed_rx_burst);
assert(priv->dev->data == &sd->data);
rte_spinlock_unlock(&sd->lock);
return priv;
@@ -1307,3 +1305,27 @@ error:
rte_spinlock_unlock(&sd->lock);
return NULL;
 }
+
+/**
+ * Configure the TX function to use.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_select_tx_function(struct priv *priv)
+{
+   priv->dev->tx_pkt_burst = mlx5_tx_burst;
+}
+
+/**
+ * Configure the RX function to use.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_select_rx_function(struct priv *priv)
+{
+   priv->dev->rx_pkt_burst = mlx5_rx_burst;
+}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9f3a33b..d7cc39d 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -477,7 +477,7 @@ mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
  (void *)dev, (void *)txq_ctrl);
(*priv->txqs)[idx] = &txq_ctrl->txq;
/* Update send callback. */
-   dev->tx_pkt_burst = mlx5_tx_burst;
+   priv_select_tx_function(priv);
}
priv_unlock(priv);
return -ret;
-- 
2.1.4



[dpdk-dev] [PATCH 12/24] mlx5: refactor RX data path

2016-06-08 Thread Nelio Laranjeiro
Bypass Verbs to improve RX performance.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_ethdev.c |   4 +-
 drivers/net/mlx5/mlx5_fdir.c   |   2 +-
 drivers/net/mlx5/mlx5_rxq.c| 291 +++--
 drivers/net/mlx5/mlx5_rxtx.c   | 288 +---
 drivers/net/mlx5/mlx5_rxtx.h   |  37 +++---
 drivers/net/mlx5/mlx5_vlan.c   |   3 +-
 6 files changed, 310 insertions(+), 315 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index c612b31..4cfcbd5 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1263,7 +1263,9 @@ mlx5_secondary_data_setup(struct priv *priv)
}
/* RX queues. */
for (i = 0; i != nb_rx_queues; ++i) {
-   struct rxq *primary_rxq = (*sd->primary_priv->rxqs)[i];
+   struct rxq_ctrl *primary_rxq =
+   container_of((*sd->primary_priv->rxqs)[i],
+struct rxq_ctrl, rxq);

if (primary_rxq == NULL)
continue;
diff --git a/drivers/net/mlx5/mlx5_fdir.c b/drivers/net/mlx5/mlx5_fdir.c
index 1850218..73eb00e 100644
--- a/drivers/net/mlx5/mlx5_fdir.c
+++ b/drivers/net/mlx5/mlx5_fdir.c
@@ -431,7 +431,7 @@ priv_get_fdir_queue(struct priv *priv, uint16_t idx)
ind_init_attr = (struct ibv_exp_rwq_ind_table_init_attr){
.pd = priv->pd,
.log_ind_tbl_size = 0,
-   .ind_tbl = &((*priv->rxqs)[idx]->wq),
+   .ind_tbl = &rxq_ctrl->wq,
.comp_mask = 0,
};

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 7db4ce7..ac2b69f 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -43,6 +43,8 @@
 #pragma GCC diagnostic ignored "-pedantic"
 #endif
 #include 
+#include 
+#include 
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-pedantic"
 #endif
@@ -373,8 +375,13 @@ priv_create_hash_rxqs(struct priv *priv)
DEBUG("indirection table extended to assume %u WQs",
  priv->reta_idx_n);
}
-   for (i = 0; (i != priv->reta_idx_n); ++i)
-   wqs[i] = (*priv->rxqs)[(*priv->reta_idx)[i]]->wq;
+   for (i = 0; (i != priv->reta_idx_n); ++i) {
+   struct rxq_ctrl *rxq_ctrl;
+
+   rxq_ctrl = container_of((*priv->rxqs)[(*priv->reta_idx)[i]],
+   struct rxq_ctrl, rxq);
+   wqs[i] = rxq_ctrl->wq;
+   }
/* Get number of hash RX queues to configure. */
for (i = 0, hash_rxqs_n = 0; (i != ind_tables_n); ++i)
hash_rxqs_n += ind_table_init[i].hash_types_n;
@@ -638,21 +645,13 @@ rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
   struct rte_mbuf **pool)
 {
unsigned int i;
-   struct rxq_elt (*elts)[elts_n] =
-   rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
- rxq_ctrl->socket);
int ret = 0;

-   if (elts == NULL) {
-   ERROR("%p: can't allocate packets array", (void *)rxq_ctrl);
-   ret = ENOMEM;
-   goto error;
-   }
/* For each WR (packet). */
for (i = 0; (i != elts_n); ++i) {
-   struct rxq_elt *elt = &(*elts)[i];
-   struct ibv_sge *sge = &(*elts)[i].sge;
struct rte_mbuf *buf;
+   volatile struct mlx5_wqe_data_seg *scat =
+   &(*rxq_ctrl->rxq.wqes)[i];

if (pool != NULL) {
buf = *(pool++);
@@ -666,40 +665,36 @@ rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
ret = ENOMEM;
goto error;
}
-   elt->buf = buf;
/* Headroom is reserved by rte_pktmbuf_alloc(). */
assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
/* Buffer is supposed to be empty. */
assert(rte_pktmbuf_data_len(buf) == 0);
assert(rte_pktmbuf_pkt_len(buf) == 0);
-   /* sge->addr must be able to store a pointer. */
-   assert(sizeof(sge->addr) >= sizeof(uintptr_t));
-   /* SGE keeps its headroom. */
-   sge->addr = (uintptr_t)
-   ((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
-   sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
-   sge->lkey = rxq_ctrl->mr->lkey;
-   /* Redundant check for tailroom. */
-   assert(sge->length == rte_pktmbuf_tailroom(buf));
+   assert(!buf->next);
+   PORT(buf) = rxq_ctrl->rxq.port_id;
+   DATA_LEN(buf) = rte_pktmbuf_tailroom(buf);
+   PKT_LEN(buf) = DATA_LEN(buf);
+   NB_SEGS(buf) = 1;
+   /* 

[dpdk-dev] [PATCH 13/24] mlx5: refactor TX data path

2016-06-08 Thread Nelio Laranjeiro
Bypass Verbs to improve TX performance.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/Makefile  |   5 -
 drivers/net/mlx5/mlx5_ethdev.c |  10 +-
 drivers/net/mlx5/mlx5_mr.c |   4 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 359 ++---
 drivers/net/mlx5/mlx5_rxtx.h   |  53 +++---
 drivers/net/mlx5/mlx5_txq.c| 210 
 6 files changed, 334 insertions(+), 307 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a63d6b3..9b4455b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -102,11 +102,6 @@ endif
 mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_VLAN_INSERTION \
-   infiniband/verbs.h \
-   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
HAVE_VERBS_IBV_EXP_CQ_COMPRESSED_CQE \
infiniband/verbs_exp.h \
enum IBV_EXP_CQ_COMPRESSED_CQE \
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 4cfcbd5..aaa6c16 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1243,11 +1243,11 @@ mlx5_secondary_data_setup(struct priv *priv)
txq_ctrl = rte_calloc_socket("TXQ", 1, sizeof(*txq_ctrl), 0,
 primary_txq_ctrl->socket);
if (txq_ctrl != NULL) {
-   if (txq_setup(priv->dev,
- primary_txq_ctrl,
- primary_txq->elts_n,
- primary_txq_ctrl->socket,
- NULL) == 0) {
+   if (txq_ctrl_setup(priv->dev,
+  primary_txq_ctrl,
+  primary_txq->elts_n,
+  primary_txq_ctrl->socket,
+  NULL) == 0) {
txq_ctrl->txq.stats.idx = 
primary_txq->stats.idx;
tx_queues[i] = &txq_ctrl->txq;
continue;
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 79d5568..e5e8a04 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -189,7 +189,7 @@ txq_mp2mr_reg(struct txq *txq, struct rte_mempool *mp, unsigned int idx)
/* Add a new entry, register MR first. */
DEBUG("%p: discovered new memory pool \"%s\" (%p)",
  (void *)txq_ctrl, mp->name, (void *)mp);
-   mr = mlx5_mp2mr(txq_ctrl->txq.priv->pd, mp);
+   mr = mlx5_mp2mr(txq_ctrl->priv->pd, mp);
if (unlikely(mr == NULL)) {
DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
  (void *)txq_ctrl);
@@ -208,7 +208,7 @@ txq_mp2mr_reg(struct txq *txq, struct rte_mempool *mp, unsigned int idx)
/* Store the new entry. */
txq_ctrl->txq.mp2mr[idx].mp = mp;
txq_ctrl->txq.mp2mr[idx].mr = mr;
-   txq_ctrl->txq.mp2mr[idx].lkey = mr->lkey;
+   txq_ctrl->txq.mp2mr[idx].lkey = htonl(mr->lkey);
DEBUG("%p: new MR lkey for MP \"%s\" (%p): 0x%08" PRIu32,
  (void *)txq_ctrl, mp->name, (void *)mp,
  txq_ctrl->txq.mp2mr[idx].lkey);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 7d74074..cee6067 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -119,68 +119,52 @@ get_cqe64(volatile struct mlx5_cqe64 cqes[],
  *
  * @param txq
  *   Pointer to TX queue structure.
- *
- * @return
- *   0 on success, -1 on failure.
  */
-static int
+static void
 txq_complete(struct txq *txq)
 {
-   unsigned int elts_comp = txq->elts_comp;
-   unsigned int elts_tail = txq->elts_tail;
-   unsigned int elts_free = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
-   int wcs_n;
-
-   if (unlikely(elts_comp == 0))
-   return 0;
-#ifdef DEBUG_SEND
-   DEBUG("%p: processing %u work requests completions",
- (void *)txq, elts_comp);
-#endif
-   wcs_n = txq->poll_cnt(txq->cq, elts_comp);
-   if (unlikely(wcs_n == 0))
-   return 0;
-   if (unlikely(wcs_n < 0)) {
-   DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
- (void *)txq, wcs_n);
-   return -1;
+   const unsigned int cqe_n = txq->cqe_n;
+   uint16_t elts_free = txq->elts_tail;
+   uint16_t elts_tail;
+   uint16_t cq_ci = txq->cq_ci;
+   unsigned int wqe_ci = (unsigned int)-1;
+   int ret = 0;
+
+   while (ret == 0) {
+   volatile struct mlx5_cqe64 *cqe;
+
cqe = get_cqe64(*txq->cqes, cqe_n, &cq_ci);
+   

[dpdk-dev] [PATCH 14/24] mlx5: handle RX CQE compression

2016-06-08 Thread Nelio Laranjeiro
Mini (compressed) CQEs are returned by the NIC when PCI back pressure is
detected. In that case the first CQE64 contains common packet information,
followed by a number of CQE8 providing the rest, followed by a matching
number of empty CQE64 entries to be used by software for decompression.

Before decompression:

  0   1  2   6 7 8
  +---+  +-+ +---+   +---+ +---+ +---+
  | CQE64 |  |  CQE64  | | CQE64 |   | CQE64 | | CQE64 | | CQE64 |
  |---|  |-| |---|   |---| |---| |---|
  | . |  | cqe8[0] | |   | . |   | |   | | . |
  | . |  | cqe8[1] | |   | . |   | |   | | . |
  | . |  | ... | |   | . |   | |   | | . |
  | . |  | cqe8[7] | |   |   |   | |   | | . |
  +---+  +-+ +---+   +---+ +---+ +---+

After decompression:

  0  1 ... 8
  +---+  +---+ +---+
  | CQE64 |  | CQE64 | | CQE64 |
  |---|  |---| |---|
  | . |  | . |  .  | . |
  | . |  | . |  .  | . |
  | . |  | . |  .  | . |
  | . |  | . | | . |
  +---+  +---+ +---+

This patch does not perform the entire decompression step, as doing so
would be prohibitively expensive; instead, the first CQE64 is consumed and
an internal context is maintained to interpret the following CQE8 entries
directly.

Intermediate empty CQE64 entries are handed back to HW without further
processing.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Olga Shern 
---
 doc/guides/nics/mlx5.rst |   6 +
 drivers/net/mlx5/mlx5.c  |  25 -
 drivers/net/mlx5/mlx5.h  |   1 +
 drivers/net/mlx5/mlx5_rxq.c  |   9 +-
 drivers/net/mlx5/mlx5_rxtx.c | 259 ---
 drivers/net/mlx5/mlx5_rxtx.h |  11 ++
 drivers/net/mlx5/mlx5_txq.c  |   5 +
 7 files changed, 247 insertions(+), 69 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 3a07928..756153b 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -148,6 +148,12 @@ Run-time configuration

 - **ethtool** operations on related kernel interfaces also affect the PMD.

+- ``rxq_cqe_comp_en`` parameter [int]
+
+  A nonzero value enables the compression of CQE on RX side. This feature
+  allows to save PCI bandwidth and improve performance at the cost of a
+  slightly higher CPU usage.  Enabled by default.
+
 Prerequisites
 -

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 62e6e16..9bb08b6 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -69,6 +69,9 @@
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"

+/* Device parameter to enable RX completion queue compression. */
+#define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -256,12 +259,21 @@ static int
 mlx5_args_check(const char *key, const char *val, void *opaque)
 {
struct priv *priv = opaque;
+   unsigned long tmp;

-   /* No parameters are expected at the moment. */
-   (void)priv;
-   (void)val;
-   WARN("%s: unknown parameter", key);
-   return EINVAL;
+   errno = 0;
+   tmp = strtoul(val, NULL, 0);
+   if (errno) {
+   WARN("%s: \"%s\" is not a valid integer", key, val);
+   return errno;
+   }
+   if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0)
+   priv->cqe_comp = !!tmp;
+   else {
+   WARN("%s: unknown parameter", key);
+   return EINVAL;
+   }
+   return 0;
 }

 /**
@@ -279,7 +291,7 @@ static int
 mlx5_args(struct priv *priv, struct rte_devargs *devargs)
 {
static const char *params[] = {
-   NULL,
+   MLX5_RXQ_CQE_COMP_EN,
};
struct rte_kvargs *kvlist;
int ret = 0;
@@ -474,6 +486,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
+   priv->cqe_comp = 1; /* Enable compression by default. */
err = mlx5_args(priv, pci_dev->devargs);
if (err) {
ERROR("failed to process device arguments: %s",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 382aac5..3344360 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -112,6 +112,7 @@ struct priv {
unsigned int hw_padding:1; /* End alignment padding is supported. */
unsigned int sriov:1; /* This is a VF or PF with VF devices. */
unsigned int mps:1; /* Whether multi-packet send is supported. */
+   unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pending. */
/* RX/TX 

[dpdk-dev] [PATCH 15/24] mlx5: replace countdown with threshold for TX completions

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

Replacing the variable countdown (which depends on the number of
descriptors) with a fixed relative threshold known at compile time improves
performance by reducing the TX queue structure footprint and the amount of
code to manage completions during a burst.

Completions are now requested at most once per burst after threshold is
reached.

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_defs.h |  7 +--
 drivers/net/mlx5/mlx5_rxtx.c | 42 --
 drivers/net/mlx5/mlx5_rxtx.h |  5 ++---
 drivers/net/mlx5/mlx5_txq.c  | 19 ---
 4 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 8d2ec7a..cc2a6f3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -48,8 +48,11 @@
 /* Maximum number of special flows. */
 #define MLX5_MAX_SPECIAL_FLOWS 4

-/* Request send completion once in every 64 sends, might be less. */
-#define MLX5_PMD_TX_PER_COMP_REQ 64
+/*
+ * Request TX completion every time descriptors reach this threshold since
+ * the previous request. Must be a power of two for performance reasons.
+ */
+#define MLX5_TX_COMP_THRESH 32

 /* RSS Indirection table size. */
 #define RSS_INDIRECTION_TABLE_SIZE 256
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 05b9c88..1495a53 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -154,9 +154,6 @@ check_cqe64(volatile struct mlx5_cqe64 *cqe,
  * Manage TX completions.
  *
  * When sending a burst, mlx5_tx_burst() posts several WRs.
- * To improve performance, a completion event is only required once every
- * MLX5_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
- * for other WRs, but this information would not be used anyway.
  *
  * @param txq
  *   Pointer to TX queue structure.
@@ -170,14 +167,16 @@ txq_complete(struct txq *txq)
uint16_t elts_free = txq->elts_tail;
uint16_t elts_tail;
uint16_t cq_ci = txq->cq_ci;
-   unsigned int wqe_ci = (unsigned int)-1;
+   volatile struct mlx5_cqe64 *cqe = NULL;
+   volatile union mlx5_wqe *wqe;

do {
-   unsigned int idx = cq_ci & cqe_cnt;
-   volatile struct mlx5_cqe64 *cqe = &(*txq->cqes)[idx];
+   volatile struct mlx5_cqe64 *tmp;

-   if (check_cqe64(cqe, cqe_n, cq_ci) == 1)
+   tmp = &(*txq->cqes)[cq_ci & cqe_cnt];
+   if (check_cqe64(tmp, cqe_n, cq_ci))
break;
+   cqe = tmp;
 #ifndef NDEBUG
if (MLX5_CQE_FORMAT(cqe->op_own) == MLX5_COMPRESSED) {
if (!check_cqe64_seen(cqe))
@@ -191,14 +190,15 @@ txq_complete(struct txq *txq)
return;
}
 #endif /* NDEBUG */
-   wqe_ci = ntohs(cqe->wqe_counter);
++cq_ci;
} while (1);
-   if (unlikely(wqe_ci == (unsigned int)-1))
+   if (unlikely(cqe == NULL))
return;
+   wqe = &(*txq->wqes)[htons(cqe->wqe_counter) & (txq->wqe_n - 1)];
+   elts_tail = wqe->wqe.ctrl.data[3];
+   assert(elts_tail < txq->wqe_n);
/* Free buffers. */
-   elts_tail = (wqe_ci + 1) & (elts_n - 1);
-   do {
+   while (elts_free != elts_tail) {
struct rte_mbuf *elt = (*txq->elts)[elts_free];
unsigned int elts_free_next =
(elts_free + 1) & (elts_n - 1);
@@ -214,7 +214,7 @@ txq_complete(struct txq *txq)
/* Only one segment needs to be freed. */
rte_pktmbuf_free_seg(elt);
elts_free = elts_free_next;
-   } while (elts_free != elts_tail);
+   }
txq->cq_ci = cq_ci;
txq->elts_tail = elts_tail;
/* Update the consumer index. */
@@ -435,6 +435,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
const unsigned int elts_n = txq->elts_n;
unsigned int i;
unsigned int max;
+   unsigned int comp;
volatile union mlx5_wqe *wqe;
struct rte_mbuf *buf;

@@ -484,12 +485,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
buf->vlan_tci);
else
mlx5_wqe_write(txq, wqe, addr, length, lkey);
-   /* Request completion if needed. */
-   if (unlikely(--txq->elts_comp == 0)) {
-   wqe->wqe.ctrl.data[2] = htonl(8);
-   txq->elts_comp = txq->elts_comp_cd_init;
-   } else
-   wqe->wqe.ctrl.data[2] = 0;
+   wqe->wqe.ctrl.data[2] = 0;
/* Should we enable HW CKSUM offload */
if (buf->ol_flags &
(PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | 

[dpdk-dev] [PATCH 16/24] mlx5: add support for inline send

2016-06-08 Thread Nelio Laranjeiro
From: Yaacov Hazan 

Implement the inline send feature, which copies packet data directly into
WQEs for improved latency. The maximum packet size and the minimum number
of TX queues required to qualify for inline send are user-configurable.

This feature is effective when HW causes a performance bottleneck.

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Nelio Laranjeiro 
---
 doc/guides/nics/mlx5.rst   |  17 +++
 drivers/net/mlx5/mlx5.c|  13 ++
 drivers/net/mlx5/mlx5.h|   2 +
 drivers/net/mlx5/mlx5_ethdev.c |   5 +
 drivers/net/mlx5/mlx5_rxtx.c   | 271 +
 drivers/net/mlx5/mlx5_rxtx.h   |   2 +
 drivers/net/mlx5/mlx5_txq.c|   4 +
 7 files changed, 314 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 756153b..9ada221 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -154,6 +154,23 @@ Run-time configuration
   allows to save PCI bandwidth and improve performance at the cost of a
   slightly higher CPU usage.  Enabled by default.

+- ``txq_inline`` parameter [int]
+
+  Amount of data to be inlined during TX operations. Improves latency.
+  Can improve PPS performance when PCI back pressure is detected and may be
+  useful for scenarios involving heavy traffic on many queues.
+
+  It is not enabled by default (set to 0) since the additional software
+  logic necessary to handle this mode can lower performance when back
+  pressure is not expected.
+
+- ``txqs_min_inline`` parameter [int]
+
+  Enable inline send only when the number of TX queues is greater than or
+  equal to this value.
+
+  This option should be used in combination with ``txq_inline`` above.
+
 Prerequisites
 -------------

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9bb08b6..4213286 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -72,6 +72,13 @@
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"

+/* Device parameter to configure inline send. */
+#define MLX5_TXQ_INLINE "txq_inline"
+
+/* Device parameter to configure the number of TX queues threshold for
+ * enabling inline send. */
+#define MLX5_TXQS_MIN_INLINE "txqs_min_inline"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -269,6 +276,10 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
}
if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0)
priv->cqe_comp = !!tmp;
+   else if (strcmp(MLX5_TXQ_INLINE, key) == 0)
+   priv->txq_inline = tmp;
+   else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0)
+   priv->txqs_inline = tmp;
else {
WARN("%s: unknown parameter", key);
return EINVAL;
@@ -292,6 +303,8 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs)
 {
static const char *params[] = {
MLX5_RXQ_CQE_COMP_EN,
+   MLX5_TXQ_INLINE,
+   MLX5_TXQS_MIN_INLINE,
};
struct rte_kvargs *kvlist;
int ret = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3344360..c99ef7e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -114,6 +114,8 @@ struct priv {
unsigned int mps:1; /* Whether multi-packet send is supported. */
unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pending. */
+   unsigned int txq_inline; /* Maximum packet size for inlining. */
+   unsigned int txqs_inline; /* Queue number threshold for inlining. */
/* RX/TX queues. */
unsigned int rxqs_n; /* RX queues array size. */
unsigned int txqs_n; /* TX queues array size. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index aaa6c16..9dfb3ca 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1318,6 +1318,11 @@ void
 priv_select_tx_function(struct priv *priv)
 {
priv->dev->tx_pkt_burst = mlx5_tx_burst;
+   if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) {
+   priv->dev->tx_pkt_burst = mlx5_tx_burst_inline;
+   DEBUG("selected inline TX function (%u >= %u queues)",
+ priv->txqs_n, priv->txqs_inline);
+   }
 }

 /**
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 1495a53..1ccb69d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -374,6 +374,139 @@ mlx5_wqe_write_vlan(struct txq *txq, volatile union 
mlx5_wqe *wqe,
 }

 /**
+ * Write an inline WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param wqe
+ *   Pointer to the WQE to fill.
+ * @param addr
+ *   Buffer data address.
+ * @param length
+ *   Packet length.
+ * @param lkey
+ *   Memory region lkey.
+ */
+static inline void
+mlx5_wqe_write_inline(struct txq 

[dpdk-dev] [PATCH 17/24] mlx5: add support for multi-packet send

2016-06-08 Thread Nelio Laranjeiro
This feature enables the TX burst function to emit up to 5 packets using
only two WQEs on devices that support it. Saves PCI bandwidth and improves
performance.
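As a back-of-the-envelope illustration of the descriptor savings (a sketch based only on the 5-packets-in-2-WQEs figure above, assuming 64-byte WQEs; not taken from the driver code):

```python
WQE_SIZE = 64  # assumed bytes per send queue entry (WQEBB)

def descriptor_bytes(packets, mpw=False):
    """Bytes of send-queue descriptors consumed for `packets` packets."""
    if not mpw:
        return packets * WQE_SIZE  # one WQE per packet
    # Each MPW session covers up to 5 packets with 2 WQEs.
    sessions, rest = divmod(packets, 5)
    return (sessions * 2 + (2 if rest else 0)) * WQE_SIZE

print(descriptor_bytes(10))            # 640 bytes without MPW
print(descriptor_bytes(10, mpw=True))  # 256 bytes with MPW
```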

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Olga Shern 
---
 doc/guides/nics/mlx5.rst   |  10 ++
 drivers/net/mlx5/mlx5.c|  14 +-
 drivers/net/mlx5/mlx5_ethdev.c |  15 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 400 +
 drivers/net/mlx5/mlx5_rxtx.h   |   2 +
 drivers/net/mlx5/mlx5_txq.c|   2 +-
 6 files changed, 439 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9ada221..063c4a5 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -171,6 +171,16 @@ Run-time configuration

   This option should be used in combination with ``txq_inline`` above.

+- ``txq_mpw_en`` parameter [int]
+
+  A nonzero value enables multi-packet send. This feature allows the TX
+  burst function to pack up to five packets in two descriptors in order to
+  save PCI bandwidth and improve performance at the cost of a slightly
+  higher CPU usage.
+
+  It is currently only supported on the ConnectX-4 Lx family of adapters.
+  Enabled by default.
+
 Prerequisites
 -------------

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4213286..411486d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -79,6 +79,9 @@
  * enabling inline send. */
 #define MLX5_TXQS_MIN_INLINE "txqs_min_inline"

+/* Device parameter to enable multi-packet send WQEs. */
+#define MLX5_TXQ_MPW_EN "txq_mpw_en"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -280,6 +283,8 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
priv->txq_inline = tmp;
else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0)
priv->txqs_inline = tmp;
+   else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0)
+   priv->mps = !!tmp;
else {
WARN("%s: unknown parameter", key);
return EINVAL;
@@ -305,6 +310,7 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs)
MLX5_RXQ_CQE_COMP_EN,
MLX5_TXQ_INLINE,
MLX5_TXQS_MIN_INLINE,
+   MLX5_TXQ_MPW_EN,
};
struct rte_kvargs *kvlist;
int ret = 0;
@@ -499,6 +505,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
+   priv->mps = mps; /* Enable MPW by default if supported. */
priv->cqe_comp = 1; /* Enable compression by default. */
err = mlx5_args(priv, pci_dev->devargs);
if (err) {
@@ -547,7 +554,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)

	priv_get_num_vfs(priv, &num_vfs);
priv->sriov = (num_vfs || sriov);
-   priv->mps = mps;
+   if (priv->mps && !mps) {
+   ERROR("multi-packet send not supported on this device"
+ " (" MLX5_TXQ_MPW_EN ")");
+   err = ENOTSUP;
+   goto port_error;
+   }
/* Allocate and register default RSS hash keys. */
priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n,
sizeof((*priv->rss_conf)[0]), 0);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 9dfb3ca..1767fe4 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -585,7 +585,8 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
  DEV_RX_OFFLOAD_UDP_CKSUM |
  DEV_RX_OFFLOAD_TCP_CKSUM) :
 0);
-   info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT;
+   if (!priv->mps)
+   info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT;
if (priv->hw_csum)
info->tx_offload_capa |=
(DEV_TX_OFFLOAD_IPV4_CKSUM |
@@ -1318,7 +1319,17 @@ void
 priv_select_tx_function(struct priv *priv)
 {
priv->dev->tx_pkt_burst = mlx5_tx_burst;
-   if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) {
+   /* Display warning for unsupported configurations. */
+   if (priv->sriov && priv->mps)
+   WARN("multi-packet send WQE cannot be used on a SR-IOV setup");
+   /* Select appropriate TX function. */
+   if ((priv->sriov == 0) && priv->mps && priv->txq_inline) {
+   priv->dev->tx_pkt_burst = mlx5_tx_burst_mpw_inline;
+   DEBUG("selected MPW inline TX function");
+   } else if ((priv->sriov == 0) && priv->mps) {
+   priv->dev->tx_pkt_burst = mlx5_tx_burst_mpw;
+   DEBUG("selected MPW TX function");
+  

[dpdk-dev] [PATCH 18/24] mlx5: add debugging information about TX queues capabilities

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_txq.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 15c8f73..d013230 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -334,6 +334,11 @@ txq_ctrl_setup(struct rte_eth_dev *dev, struct txq_ctrl 
*txq_ctrl,
  (void *)dev, strerror(ret));
goto error;
}
+   DEBUG("TX queue capabilities: max_send_wr=%u, max_send_sge=%u,"
+ " max_inline_data=%u",
+ attr.init.cap.max_send_wr,
+ attr.init.cap.max_send_sge,
+ attr.init.cap.max_inline_data);
attr.mod = (struct ibv_exp_qp_attr){
/* Move the QP to this state. */
.qp_state = IBV_QPS_INIT,
-- 
2.1.4



[dpdk-dev] [PATCH 19/24] mlx5: check remaining space while processing TX burst

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

The space necessary to store segmented packets cannot be known in advance
and must be verified for each of them.
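The free-slot computation used by the reworked loop can be modeled as follows (a sketch: indices stay in `[0, elts_n)` and the ring size is a power of two, matching the driver's masking; one slot is always kept unused by the `max < segs + 1` style check):

```python
def tx_ring_free(elts_head, elts_tail, elts_n):
    """Free slots in the TX elements ring.

    The difference head - tail may be "negative" when the head has
    wrapped past the end of the ring, hence the correction step.
    """
    free = elts_n - (elts_head - elts_tail)
    if free > elts_n:
        free -= elts_n
    return free

print(tx_ring_free(10, 4, 256))   # 250 free slots
print(tx_ring_free(3, 250, 256))  # 247 free slots (head wrapped)
```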

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxtx.c | 136 ++-
 1 file changed, 70 insertions(+), 66 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index b6ee47b..1478b2d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -583,50 +583,49 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
struct txq *txq = (struct txq *)dpdk_txq;
uint16_t elts_head = txq->elts_head;
const unsigned int elts_n = txq->elts_n;
-   unsigned int i;
+   unsigned int i = 0;
unsigned int max;
unsigned int comp;
volatile union mlx5_wqe *wqe;
-   struct rte_mbuf *buf;

if (unlikely(!pkts_n))
return 0;
-   buf = pkts[0];
/* Prefetch first packet cacheline. */
tx_prefetch_cqe(txq, txq->cq_ci);
tx_prefetch_cqe(txq, txq->cq_ci + 1);
-   rte_prefetch0(buf);
+   rte_prefetch0(*pkts);
/* Start processing. */
txq_complete(txq);
max = (elts_n - (elts_head - txq->elts_tail));
if (max > elts_n)
max -= elts_n;
-   assert(max >= 1);
-   assert(max <= elts_n);
-   /* Always leave one free entry in the ring. */
-   --max;
-   if (max == 0)
-   return 0;
-   if (max > pkts_n)
-   max = pkts_n;
-   for (i = 0; (i != max); ++i) {
-   unsigned int elts_head_next = (elts_head + 1) & (elts_n - 1);
+   do {
+   struct rte_mbuf *buf;
+   unsigned int elts_head_next;
uintptr_t addr;
uint32_t length;
uint32_t lkey;

+   /* Make sure there is enough room to store this packet and
+* that one ring entry remains unused. */
+   if (max < 1 + 1)
+   break;
+   --max;
+   --pkts_n;
+   buf = *(pkts++);
+   elts_head_next = (elts_head + 1) & (elts_n - 1);
wqe = &(*txq->wqes)[txq->wqe_ci & (txq->wqe_n - 1)];
rte_prefetch0(wqe);
-   if (i + 1 < max)
-   rte_prefetch0(pkts[i + 1]);
+   if (pkts_n)
+   rte_prefetch0(*pkts);
/* Retrieve buffer information. */
addr = rte_pktmbuf_mtod(buf, uintptr_t);
length = DATA_LEN(buf);
/* Update element. */
(*txq->elts)[elts_head] = buf;
/* Prefetch next buffer data. */
-   if (i + 1 < max)
-   rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 1],
+   if (pkts_n)
+   rte_prefetch0(rte_pktmbuf_mtod(*pkts,
   volatile void *));
/* Retrieve Memory Region key for this memory pool. */
lkey = txq_mp2mr(txq, txq_mb2mp(buf));
@@ -649,8 +648,8 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
txq->stats.obytes += length;
 #endif
elts_head = elts_head_next;
-   buf = pkts[i + 1];
-   }
+   ++i;
+   } while (pkts_n);
/* Take a shortcut if nothing must be sent. */
if (unlikely(i == 0))
return 0;
@@ -693,44 +692,43 @@ mlx5_tx_burst_inline(void *dpdk_txq, struct rte_mbuf 
**pkts, uint16_t pkts_n)
struct txq *txq = (struct txq *)dpdk_txq;
uint16_t elts_head = txq->elts_head;
const unsigned int elts_n = txq->elts_n;
-   unsigned int i;
+   unsigned int i = 0;
unsigned int max;
unsigned int comp;
volatile union mlx5_wqe *wqe;
-   struct rte_mbuf *buf;
unsigned int max_inline = txq->max_inline;

if (unlikely(!pkts_n))
return 0;
-   buf = pkts[0];
/* Prefetch first packet cacheline. */
tx_prefetch_cqe(txq, txq->cq_ci);
tx_prefetch_cqe(txq, txq->cq_ci + 1);
-   rte_prefetch0(buf);
+   rte_prefetch0(*pkts);
/* Start processing. */
txq_complete(txq);
max = (elts_n - (elts_head - txq->elts_tail));
if (max > elts_n)
max -= elts_n;
-   assert(max >= 1);
-   assert(max <= elts_n);
-   /* Always leave one free entry in the ring. */
-   --max;
-   if (max == 0)
-   return 0;
-   if (max > pkts_n)
-   max = pkts_n;
-   for (i = 0; (i != max); ++i) {
-   unsigned int elts_head_next = (elts_head + 1) & (elts_n - 1);
+   do {
+   struct rte_mbuf *buf;
+   unsigned int elts_head_next;
uintptr_t addr;
uint32_t 

[dpdk-dev] [PATCH 20/24] mlx5: resurrect TX gather support

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

Compared to its previous incarnation, the software limit on the number of
mbuf segments is gone (previously MLX5_PMD_SGE_WR_N, set to 4 by default),
hence there is no need for the linearization code and related buffers that
permanently consumed a non-negligible amount of memory to handle oversized
mbufs.

The resulting code is both lighter and faster.

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.c | 231 +--
 drivers/net/mlx5/mlx5_txq.c  |   6 +-
 2 files changed, 182 insertions(+), 55 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 1478b2d..53d2a57 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -301,6 +301,7 @@ mlx5_wqe_write(struct txq *txq, volatile union mlx5_wqe 
*wqe,
 {
wqe->wqe.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->wqe.ctrl.data[1] = htonl((txq->qp_num_8s) | 4);
+   wqe->wqe.ctrl.data[2] = 0;
wqe->wqe.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -346,6 +347,7 @@ mlx5_wqe_write_vlan(struct txq *txq, volatile union 
mlx5_wqe *wqe,

wqe->wqe.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->wqe.ctrl.data[1] = htonl((txq->qp_num_8s) | 4);
+   wqe->wqe.ctrl.data[2] = 0;
wqe->wqe.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -423,6 +425,7 @@ mlx5_wqe_write_inline(struct txq *txq, volatile union 
mlx5_wqe *wqe,
assert(size < 64);
wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size);
+   wqe->inl.ctrl.data[2] = 0;
wqe->inl.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -496,6 +499,7 @@ mlx5_wqe_write_inline_vlan(struct txq *txq, volatile union 
mlx5_wqe *wqe,
assert(size < 64);
wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size);
+   wqe->inl.ctrl.data[2] = 0;
wqe->inl.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -584,6 +588,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
uint16_t elts_head = txq->elts_head;
const unsigned int elts_n = txq->elts_n;
unsigned int i = 0;
+   unsigned int j = 0;
unsigned int max;
unsigned int comp;
volatile union mlx5_wqe *wqe;
@@ -600,21 +605,25 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
if (max > elts_n)
max -= elts_n;
do {
-   struct rte_mbuf *buf;
+   struct rte_mbuf *buf = *(pkts++);
unsigned int elts_head_next;
uintptr_t addr;
uint32_t length;
uint32_t lkey;
+   unsigned int segs_n = buf->nb_segs;
+   volatile struct mlx5_wqe_data_seg *dseg;
+   unsigned int ds = sizeof(*wqe) / 16;

/* Make sure there is enough room to store this packet and
 * that one ring entry remains unused. */
-   if (max < 1 + 1)
+   assert(segs_n);
+   if (max < segs_n + 1)
break;
-   --max;
+   max -= segs_n;
--pkts_n;
-   buf = *(pkts++);
elts_head_next = (elts_head + 1) & (elts_n - 1);
wqe = &(*txq->wqes)[txq->wqe_ci & (txq->wqe_n - 1)];
+   dseg = &wqe->wqe.dseg;
rte_prefetch0(wqe);
if (pkts_n)
rte_prefetch0(*pkts);
@@ -634,7 +643,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
buf->vlan_tci);
else
mlx5_wqe_write(txq, wqe, addr, length, lkey);
-   wqe->wqe.ctrl.data[2] = 0;
/* Should we enable HW CKSUM offload */
if (buf->ol_flags &
(PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
@@ -643,6 +651,35 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
MLX5_ETH_WQE_L4_CSUM;
} else
wqe->wqe.eseg.cs_flags = 0;
+   while (--segs_n) {
+   /* Spill on next WQE when the current one does not have
+* enough room left. The size of a WQE must be a multiple
+* of data segment size. */
+   assert(!(sizeof(*wqe) % sizeof(*dseg)));
+   if (!(ds % (sizeof(*wqe) / 16)))
+   dseg = (volatile void *)
+   

[dpdk-dev] [PATCH 21/24] mlx5: work around spurious compilation errors

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

Since commit "mlx5: resurrect TX gather support", older GCC versions (such
as 4.8.5) may complain about the following:

 mlx5_rxtx.c: In function `mlx5_tx_burst':
 mlx5_rxtx.c:705:25: error: `wqe' may be used uninitialized in this
 function [-Werror=maybe-uninitialized]

 mlx5_rxtx.c: In function `mlx5_tx_burst_inline':
 mlx5_rxtx.c:864:25: error: `wqe' may be used uninitialized in this
 function [-Werror=maybe-uninitialized]

In both cases, this code cannot be reached when wqe is not initialized.

Considering older GCC versions are still widely used, work around this
issue by initializing wqe preemptively, even if it should not be necessary.
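A standalone reproduction of the warning pattern (a simplified sketch, not the driver code): `last` is assigned only inside the loop and read only when the loop ran at least once, yet GCC 4.8 cannot prove that and may emit `-Wmaybe-uninitialized` unless the pointer is preemptively initialized.

```c
#include <assert.h>
#include <stddef.h>

int burst(const int *pkts, unsigned int n)
{
	const int *last = NULL; /* preemptive init, as in the patch */
	unsigned int i;

	for (i = 0; i != n; ++i)
		last = &pkts[i];
	if (last == NULL) /* loop never ran; `last` is never dereferenced */
		return 0;
	return *last;
}
```

Without the `= NULL`, the guard still makes the dereference unreachable when the loop did not run, but older GCC's flow analysis cannot see it, which is exactly the situation described above.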

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxtx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 53d2a57..f4af769 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -591,7 +591,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
unsigned int j = 0;
unsigned int max;
unsigned int comp;
-   volatile union mlx5_wqe *wqe;
+   volatile union mlx5_wqe *wqe = NULL;

if (unlikely(!pkts_n))
return 0;
@@ -733,7 +733,7 @@ mlx5_tx_burst_inline(void *dpdk_txq, struct rte_mbuf 
**pkts, uint16_t pkts_n)
unsigned int j = 0;
unsigned int max;
unsigned int comp;
-   volatile union mlx5_wqe *wqe;
+   volatile union mlx5_wqe *wqe = NULL;
unsigned int max_inline = txq->max_inline;

if (unlikely(!pkts_n))
-- 
2.1.4



[dpdk-dev] [PATCH 22/24] mlx5: remove redundant RX queue initialization code

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

Toggling RX checksum offloads is already done at initialization time. This
code does not belong in rxq_rehash().

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxq.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b3972ff..20a236e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -798,7 +798,6 @@ rxq_cleanup(struct rxq_ctrl *rxq_ctrl)
 int
 rxq_rehash(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl)
 {
-   struct priv *priv = rxq_ctrl->priv;
struct rxq_ctrl tmpl = *rxq_ctrl;
unsigned int mbuf_n;
unsigned int desc_n;
@@ -811,15 +810,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq_ctrl 
*rxq_ctrl)
/* Number of descriptors and mbufs currently allocated. */
desc_n = tmpl.rxq.elts_n;
mbuf_n = desc_n;
-   /* Toggle RX checksum offload if hardware supports it. */
-   if (priv->hw_csum) {
-   tmpl.rxq.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-   rxq_ctrl->rxq.csum = tmpl.rxq.csum;
-   }
-   if (priv->hw_csum_l2tun) {
-   tmpl.rxq.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-   rxq_ctrl->rxq.csum_l2tun = tmpl.rxq.csum_l2tun;
-   }
/* From now on, any failure will render the queue unusable.
 * Reinitialize WQ. */
mod = (struct ibv_exp_wq_attr){
-- 
2.1.4



[dpdk-dev] [PATCH 23/24] mlx5: make RX queue reinitialization safer

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

The primary purpose of rxq_rehash() function is to stop and restart
reception on a queue after re-posting buffers. This may fail if the array
that temporarily stores existing buffers for reuse cannot be allocated.

Update rxq_rehash() to work on the target queue directly (not through a
template copy) and avoid this allocation.

rxq_alloc_elts() is modified accordingly to take buffers from an existing
queue directly and update their refcount.

Unlike rxq_rehash(), rxq_setup() must work on a temporary structure but
should not allocate new mbufs from the pool while reinitializing an
existing queue. This is achieved by using the refcount-aware
rxq_alloc_elts() before overwriting queue data.
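A toy model of the refcount trick (simplified and hypothetical `Mbuf` class; real mbufs live in a mempool): bumping each reused buffer's refcount to 2 means the subsequent `rte_pktmbuf_free_seg()` only drops the queue's original reference instead of returning the buffer to the pool.

```python
class Mbuf:
    """Minimal stand-in for rte_mbuf refcounting."""
    def __init__(self):
        self.refcnt = 1
        self.in_pool = False

    def free_seg(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.in_pool = True  # returned to the mempool

def rehash(elts):
    for buf in elts:        # rxq_alloc_elts() reusing an existing queue
        buf.refcnt += 1     # rte_pktmbuf_refcnt_update(buf, 1)
    for buf in elts:        # drop the queue's original reference
        assert buf.refcnt == 2
        buf.free_seg()

elts = [Mbuf() for _ in range(4)]
rehash(elts)
# Buffers survive with a single owner, never touching the pool.
assert all(b.refcnt == 1 and not b.in_pool for b in elts)
```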

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxq.c | 94 -
 1 file changed, 51 insertions(+), 43 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 20a236e..17a28e4 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -642,7 +642,7 @@ priv_rehash_flows(struct priv *priv)
  */
 static int
 rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
-  struct rte_mbuf **pool)
+  struct rte_mbuf *(*pool)[])
 {
unsigned int i;
int ret = 0;
@@ -654,9 +654,10 @@ rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int 
elts_n,
&(*rxq_ctrl->rxq.wqes)[i];

if (pool != NULL) {
-   buf = *(pool++);
+   buf = (*pool)[i];
assert(buf != NULL);
rte_pktmbuf_reset(buf);
+   rte_pktmbuf_refcnt_update(buf, 1);
} else
buf = rte_pktmbuf_alloc(rxq_ctrl->rxq.mp);
if (buf == NULL) {
@@ -781,7 +782,7 @@ rxq_cleanup(struct rxq_ctrl *rxq_ctrl)
 }

 /**
- * Reconfigure a RX queue with new parameters.
+ * Reconfigure RX queue buffers.
  *
  * rxq_rehash() does not allocate mbufs, which, if not done from the right
  * thread (such as a control thread), may corrupt the pool.
@@ -798,67 +799,48 @@ rxq_cleanup(struct rxq_ctrl *rxq_ctrl)
 int
 rxq_rehash(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl)
 {
-   struct rxq_ctrl tmpl = *rxq_ctrl;
-   unsigned int mbuf_n;
-   unsigned int desc_n;
-   struct rte_mbuf **pool;
-   unsigned int i, k;
+   unsigned int elts_n = rxq_ctrl->rxq.elts_n;
+   unsigned int i;
struct ibv_exp_wq_attr mod;
int err;

DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq_ctrl);
-   /* Number of descriptors and mbufs currently allocated. */
-   desc_n = tmpl.rxq.elts_n;
-   mbuf_n = desc_n;
/* From now on, any failure will render the queue unusable.
 * Reinitialize WQ. */
mod = (struct ibv_exp_wq_attr){
.attr_mask = IBV_EXP_WQ_ATTR_STATE,
.wq_state = IBV_EXP_WQS_RESET,
};
-   err = ibv_exp_modify_wq(tmpl.wq, &mod);
+   err = ibv_exp_modify_wq(rxq_ctrl->wq, &mod);
if (err) {
ERROR("%p: cannot reset WQ: %s", (void *)dev, strerror(err));
assert(err > 0);
return err;
}
-   /* Allocate pool. */
-   pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
-   if (pool == NULL) {
-   ERROR("%p: cannot allocate memory", (void *)dev);
-   return ENOBUFS;
-   }
/* Snatch mbufs from original queue. */
-   k = 0;
-   for (i = 0; (i != desc_n); ++i)
-   pool[k++] = (*rxq_ctrl->rxq.elts)[i];
-   assert(k == mbuf_n);
-   rte_free(pool);
+   claim_zero(rxq_alloc_elts(rxq_ctrl, elts_n, rxq_ctrl->rxq.elts));
+   for (i = 0; i != elts_n; ++i) {
+   struct rte_mbuf *buf = (*rxq_ctrl->rxq.elts)[i];
+
+   assert(rte_mbuf_refcnt_read(buf) == 2);
+   rte_pktmbuf_free_seg(buf);
+   }
/* Change queue state to ready. */
mod = (struct ibv_exp_wq_attr){
.attr_mask = IBV_EXP_WQ_ATTR_STATE,
.wq_state = IBV_EXP_WQS_RDY,
};
-   err = ibv_exp_modify_wq(tmpl.wq, &mod);
+   err = ibv_exp_modify_wq(rxq_ctrl->wq, &mod);
if (err) {
ERROR("%p: WQ state to IBV_EXP_WQS_RDY failed: %s",
  (void *)dev, strerror(err));
goto error;
}
-   /* Post SGEs. */
-   err = rxq_alloc_elts(&tmpl, desc_n, pool);
-   if (err) {
-   ERROR("%p: cannot reallocate WRs, aborting", (void *)dev);
-   rte_free(pool);
-   assert(err > 0);
-   return err;
-   }
/* Update doorbell counter. */
-   rxq_ctrl->rxq.rq_ci = desc_n;
+   rxq_ctrl->rxq.rq_ci = elts_n;
rte_wmb();
*rxq_ctrl->rxq.rq_db = htonl(rxq_ctrl->rxq.rq_ci);
 error:
-   *rxq_ctrl = tmpl;
   

[dpdk-dev] [PATCH 24/24] mlx5: resurrect RX scatter support

2016-06-08 Thread Nelio Laranjeiro
From: Adrien Mazarguil 

This commit brings back RX scatter and related support by the MTU update
function. The maximum number of segments per packet is not a fixed value
anymore (previously MLX5_PMD_SGE_WR_N, set to 4 by default) as it caused
performance issues when fewer segments were actually needed as well as
limitations on the maximum packet size that could be received with the
default mbuf size (supporting at most 8576 bytes).

These limitations are now lifted as the number of SGEs is derived from the
MTU (which implies MRU) at queue initialization and during MTU update.
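The SGE derivation described above can be sketched as follows (a sketch with simplified constants and headroom handling; the diff itself computes `log2above(size / mb_len + !!(size % mb_len))` with `size = RTE_PKTMBUF_HEADROOM + max_frame_len`):

```python
def log2above(v):
    """Smallest n such that 2**n >= v (mirrors the driver helper's intent)."""
    return max(v - 1, 0).bit_length()

def rx_sges_n(mtu, mb_len, headroom=128):
    # Frame = Ethernet header (14) + MTU + CRC (4); in this sketch the
    # headroom is charged against the whole frame for simplicity.
    size = headroom + 14 + mtu + 4
    segs = size // mb_len + (1 if size % mb_len else 0)  # ceil division
    return log2above(segs)  # SGE count rounded up to a power of two

print(2 ** rx_sges_n(1500, 2048))  # 1 SGE for a standard MTU
print(2 ** rx_sges_n(9000, 2048))  # 8 SGEs for jumbo frames
```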

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_ethdev.c |  84 ++
 drivers/net/mlx5/mlx5_rxq.c|  73 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 133 -
 drivers/net/mlx5/mlx5_rxtx.h   |   1 +
 4 files changed, 211 insertions(+), 80 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 1767fe4..32af304 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -726,6 +726,9 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
unsigned int i;
uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
mlx5_rx_burst;
+   unsigned int max_frame_len;
+   int rehash;
+   int restart = priv->started;

if (mlx5_is_secondary())
return -E_RTE_SECONDARY;
@@ -739,7 +742,6 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
goto out;
} else
DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
-   priv->mtu = mtu;
/* Temporarily replace RX handler with a fake one, assuming it has not
 * been copied elsewhere. */
dev->rx_pkt_burst = removed_rx_burst;
@@ -747,28 +749,88 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 * removed_rx_burst() instead. */
rte_wmb();
usleep(1000);
+   /* MTU does not include header and CRC. */
+   max_frame_len = ETHER_HDR_LEN + mtu + ETHER_CRC_LEN;
+   /* Check if at least one queue is going to need a SGE update. */
+   for (i = 0; i != priv->rxqs_n; ++i) {
+   struct rxq *rxq = (*priv->rxqs)[i];
+   unsigned int mb_len;
+   unsigned int size = RTE_PKTMBUF_HEADROOM + max_frame_len;
+   unsigned int sges_n;
+
+   if (rxq == NULL)
+   continue;
+   mb_len = rte_pktmbuf_data_room_size(rxq->mp);
+   assert(mb_len >= RTE_PKTMBUF_HEADROOM);
+   /* Determine the number of SGEs needed for a full packet
+* and round it to the next power of two. */
+   sges_n = log2above((size / mb_len) + !!(size % mb_len));
+   if (sges_n != rxq->sges_n)
+   break;
+   }
+   /* If all queues have the right number of SGEs, a simple rehash
+* of their buffers is enough, otherwise SGE information can only
+* be updated in a queue by recreating it. All resources that depend
+* on queues (flows, indirection tables) must be recreated as well in
+* that case. */
+   rehash = (i == priv->rxqs_n);
+   if (!rehash) {
+   /* Clean up everything as with mlx5_dev_stop(). */
+   priv_special_flow_disable_all(priv);
+   priv_mac_addrs_disable(priv);
+   priv_destroy_hash_rxqs(priv);
+   priv_fdir_disable(priv);
+   priv_dev_interrupt_handler_uninstall(priv, dev);
+   }
+recover:
/* Reconfigure each RX queue. */
for (i = 0; (i != priv->rxqs_n); ++i) {
struct rxq *rxq = (*priv->rxqs)[i];
-   unsigned int mb_len;
-   unsigned int max_frame_len;
+   struct rxq_ctrl *rxq_ctrl =
+   container_of(rxq, struct rxq_ctrl, rxq);
int sp;
+   unsigned int mb_len;
+   unsigned int tmp;

if (rxq == NULL)
continue;
-   /* Calculate new maximum frame length according to MTU and
-* toggle scattered support (sp) if necessary. */
-   max_frame_len = (priv->mtu + ETHER_HDR_LEN +
-(ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN));
mb_len = rte_pktmbuf_data_room_size(rxq->mp);
assert(mb_len >= RTE_PKTMBUF_HEADROOM);
+   /* Toggle scattered support (sp) if necessary. */
sp = (max_frame_len > (mb_len - RTE_PKTMBUF_HEADROOM));
-   if (sp) {
-   ERROR("%p: RX scatter is not supported", (void *)dev);
-   ret = ENOTSUP;
-   goto out;
+   /* Provide new values to rxq_setup(). */
+   dev->data->dev_conf.rxmode.jumbo_frame = sp;
+  

[dpdk-dev] [PATCH v2 3/3] doc: add keepalive enhancement documentation

2016-06-08 Thread Thomas Monjalon
2016-05-18 10:30, Remy Horton:

There is no explanation and it is totally normal, because this patch
must be squashed with the code change.

> Signed-off-by: Remy Horton 
> ---
>  doc/guides/rel_notes/release_16_07.rst | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_16_07.rst 
> b/doc/guides/rel_notes/release_16_07.rst
> index f6d543c..bc269b0 100644
> --- a/doc/guides/rel_notes/release_16_07.rst
> +++ b/doc/guides/rel_notes/release_16_07.rst
> @@ -34,6 +34,11 @@ This section should contain new features added in this 
> release. Sample format:
>  
>Refer to the previous release notes for examples.
>  
> +* **Added keepalive enhancements.**
> +
> +   Adds support for reporting LCore liveness to secondary processes and
> +   support for idled CPUs.
> +



[dpdk-dev] [PATCH v2 1/3] eal: add new keepalive states & callback hooks

2016-06-08 Thread Thomas Monjalon
2016-05-18 10:30, Remy Horton:

The explanations are missing.
Probably you should split this patch.

> Signed-off-by: Remy Horton 
[...]
> +enum rte_keepalive_state {
> + UNUSED = 0,
> + ALIVE = 1,
> + MISSING = 4,
> + DEAD = 2,
> + GONE = 3,
> + DOZING = 5,
> + SLEEP = 6
> +};

Please use RTE_ prefix.

[...]
>  /**
> + * Keepalive relay callback.
> + *
> + *  Receives a data pointer passed to rte_keepalive_register_relay_callback()
> + *  and the id of the core for which state is to be forwarded.
> + */

Please document each parameter.

> +typedef void (*rte_keepalive_relay_callback_t)(
> + void *data,
> + const int id_core,
> + enum rte_keepalive_state core_state,
> + uint64_t last_seen
> + );

[...]
> +/**
> + * Per-core sleep-time indication.
> + * @param *keepcfg
> + *   Keepalive structure pointer
> + *
> + * This function needs to be called from within the main process loop of
> + * the LCore going to sleep.

Why? Please add more comments.

> + */



[dpdk-dev] [PATCH v2 0/7] examples/ip_pipeline: CLI rework and improvements

2016-06-08 Thread Azarewicz, PiotrX T
> > Piotr Azarewicz (7):
> >   examples/ip_pipeline: add helper functions for parsing string
> >   examples/ip_pipeline: modifies common pipeline CLI
> >   examples/ip_pipeline: modifies firewall pipeline CLI
> >   examples/ip_pipeline: modifies flow classifications pipeline CLI
> >   examples/ip_pipeline: modifies flow action pipeline CLI
> >   examples/ip_pipeline: modifies routing pipeline CLI
> >   examples/ip_pipeline: update edge router usecase
> 
> Please take care of the authorship in patches 2, 3 and 4.
> It is probably wrong. You can fix it with git commit --amend --author in an
> interactive rebase.
> To avoid such issue, you must use git-am to apply patches.

Thanks Thomas for hints.


[dpdk-dev] [PATCH v4 30/39] bnxt: add start/stop/link update operations

2016-06-08 Thread Bruce Richardson
On Mon, Jun 06, 2016 at 03:08:34PM -0700, Stephen Hurd wrote:
> From: Ajit Khaparde 
> 
> This patch adds code to add the start, stop and link update dev_ops.
> The BNXT driver will now minimally pass traffic with testpmd.
> 
> v4:
> - Fix issues pointed out by checkpatch.
> - Shorten the string passed for reserving memzone
> when default completion ring is created.
> 
> Signed-off-by: Ajit Khaparde 
> Reviewed-by: David Christensen 
> Signed-off-by: Stephen Hurd 
> ---
>  drivers/net/bnxt/bnxt_ethdev.c | 269 
> +
>  1 file changed, 269 insertions(+)
> 
I get compilation errors after applying this patch:

== Build drivers/net/bnxt
  CC bnxt_ethdev.o
/home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c: In function 
'bnxt_init_chip':
/home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:135:7: error: 
implicit declaration of function 'bnxt_alloc_hwrm_rings' 
[-Werror=implicit-function-declaration]
  rc = bnxt_alloc_hwrm_rings(bp);
       ^
/home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:135:2: error: 
nested extern declaration of 'bnxt_alloc_hwrm_rings' [-Werror=nested-externs]
  rc = bnxt_alloc_hwrm_rings(bp);
  ^
/home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c: In function 
'bnxt_init_nic':
/home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:233:2: error: 
implicit declaration of function 'bnxt_init_ring_grps' 
[-Werror=implicit-function-declaration]
  bnxt_init_ring_grps(bp);
  ^
/home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:233:2: error: 
nested extern declaration of 'bnxt_init_ring_grps' [-Werror=nested-externs]
cc1: all warnings being treated as errors
cc1: all warnings being treated as errors
/home/bruce/next-net/dpdk-next-net/mk/internal/rte.compile-pre.mk:126: recipe 
for target 'bnxt_ethdev.o' failed
make[5]: *** [bnxt_ethdev.o] Error 1



[dpdk-dev] [PATCH v4 01/39] bnxt: new driver for Broadcom NetXtreme-C devices

2016-06-08 Thread Bruce Richardson
On Mon, Jun 06, 2016 at 03:08:05PM -0700, Stephen Hurd wrote:
> From: Ajit Khaparde 
> 
> This patch adds the initial skeleton for bnxt driver along with the
> nic guide to tie into the build system.
> At this point, the driver simply fails init.
> 
> v4:
> Fix a warning that the document isn't included in any toctree
> Also remove a PCI ID added erroneously.
> 
> Signed-off-by: Ajit Khaparde 
> Reviewed-by: David Christensen 
> Signed-off-by: Stephen Hurd 
> ---
Hi Stephen, Ajit,

in the absence of a cover letter, I'll post my overall comments on this set
here.

Thanks for the updated v4; I'm not seeing any checkpatch issues with the patches
that have applied and compiled cleanly. However,

* the build is broken by patch 30, and none of the later patches 31-38 seem
to fix it for me. Is there a header file include missing in that patch or 
something? [I'm using gcc 5.3.1 on Fedora 23]
* patch 39 fails to apply for me with rejects on other files in the driver,
which is very strange. [drivers/net/bnxt/bnxt_hwrm.c, 
drivers/net/bnxt/bnxt_ring.c and drivers/net/bnxt/bnxt_ring.h]

Apart from this, the other concern I still have is with the explanations
accompanying some of the patches, especially those to do with rings. There are
many patches throughout the set which seem to be doing the same thing, adding
allocate and free functions for rings. 

For example:
Patch 28 is titled "add ring alloc, free and group init". For a start, it's
unclear from the title whether the alloc and free refer to individual rings
or to the groups. If it's referring to the rings themselves, then how is this
different functionality from:
Patch 7: add ring structs and free() func
Patch 10/11: add TX/RX queue create/destroy operations
Patch 15: code to alloc/free ring
Patch 24: add HWRM ring alloc/free functions

Or if it's to do with allocating and freeing the groups, it would seem to be
the same functionality as patch 25: "add ring group alloc/free functions".

In some cases, the commit message does add some detail, e.g. patches 7 and 10
point out what they don't cover, but the rest is still very unclear as to what
each of the 5/6 patches for ring create/free is really doing and how they
work together. I'm not sure exactly how best to do this without understanding
the details of these patches, but one way might be to list out the different
part of the ring allocation/free in each patch and then explain what part of
that process this patch is doing and how it fits in the sequence. Otherwise,
maybe some of the patches may need to be merged if they are very closely 
related.

Can you please look to improve the commit messages when you do rework to fix
the compilation and patch application errors.

Thanks,
/Bruce


[dpdk-dev] [PATCH v3 0/7] examples/ip_pipeline: CLI rework and improvements

2016-06-08 Thread Piotr Azarewicz
Using the latest librte_cmdline improvements, the CLI implementation of the
ip_pipeline application is streamlined and improved, which results in
eliminating thousands of lines of code from the application, thus leading to
code that is easier to maintain and extend.

v3 changes:
- fix the authorship in patches

v2 changes:
- added functions for parsing hex values
- added standard error messages for CLI and file bulk
- for all CLI commands: separate code paths for each flavor of each command
(e.g. route add, route add default, route ls, route del, route del default,
etc do not share any line of code)
- for bulk commands: simplified error checking
- added additional config files

Acked-by: Cristian Dumitrescu 

Daniel Mrzyglod (1):
  examples/ip_pipeline: modifies firewall pipeline CLI

Piotr Azarewicz (4):
  examples/ip_pipeline: add helper functions for parsing string
  examples/ip_pipeline: modifies flow action pipeline CLI
  examples/ip_pipeline: modifies routing pipeline CLI
  examples/ip_pipeline: update edge router usecase

Tomasz Kulasek (2):
  examples/ip_pipeline: modifies common pipeline CLI
  examples/ip_pipeline: modifies flow classifications pipeline CLI

 examples/ip_pipeline/Makefile  |1 +
 examples/ip_pipeline/config/action.cfg |   68 +
 examples/ip_pipeline/config/action.sh  |  119 ++
 examples/ip_pipeline/config/action.txt |8 +
 .../ip_pipeline/config/edge_router_downstream.cfg  |   30 +-
 .../ip_pipeline/config/edge_router_downstream.sh   |7 +-
 .../ip_pipeline/config/edge_router_upstream.cfg|   36 +-
 .../ip_pipeline/config/edge_router_upstream.sh |   37 +-
 examples/ip_pipeline/config/firewall.cfg   |   68 +
 examples/ip_pipeline/config/firewall.sh|   13 +
 examples/ip_pipeline/config/firewall.txt   |9 +
 examples/ip_pipeline/config/flow.cfg   |   72 +
 examples/ip_pipeline/config/flow.sh|   25 +
 examples/ip_pipeline/config/flow.txt   |   17 +
 examples/ip_pipeline/config/l2fwd.cfg  |5 +-
 examples/ip_pipeline/config/l3fwd.cfg  |9 +-
 examples/ip_pipeline/config/l3fwd.sh   |   32 +-
 examples/ip_pipeline/config/l3fwd_arp.cfg  |   70 +
 examples/ip_pipeline/config/l3fwd_arp.sh   |   43 +
 examples/ip_pipeline/config_parse.c|  257 +--
 examples/ip_pipeline/parser.c  |  745 +++
 examples/ip_pipeline/parser.h  |   54 +-
 examples/ip_pipeline/pipeline/pipeline_common_fe.c |  452 ++---
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |9 +
 examples/ip_pipeline/pipeline/pipeline_firewall.c  | 1461 +-
 examples/ip_pipeline/pipeline/pipeline_firewall.h  |   12 +
 .../ip_pipeline/pipeline/pipeline_flow_actions.c   | 1505 +-
 .../ip_pipeline/pipeline/pipeline_flow_actions.h   |   11 +
 .../pipeline/pipeline_flow_classification.c| 2082 +---
 .../pipeline/pipeline_flow_classification.h|   28 +
 examples/ip_pipeline/pipeline/pipeline_routing.c   | 1636 ---
 examples/ip_pipeline/thread_fe.c   |   36 +-
 32 files changed, 4009 insertions(+), 4948 deletions(-)
 create mode 100644 examples/ip_pipeline/config/action.cfg
 create mode 100644 examples/ip_pipeline/config/action.sh
 create mode 100644 examples/ip_pipeline/config/action.txt
 create mode 100644 examples/ip_pipeline/config/firewall.cfg
 create mode 100644 examples/ip_pipeline/config/firewall.sh
 create mode 100644 examples/ip_pipeline/config/firewall.txt
 create mode 100644 examples/ip_pipeline/config/flow.cfg
 create mode 100644 examples/ip_pipeline/config/flow.sh
 create mode 100644 examples/ip_pipeline/config/flow.txt
 create mode 100644 examples/ip_pipeline/config/l3fwd_arp.cfg
 create mode 100644 examples/ip_pipeline/config/l3fwd_arp.sh
 create mode 100644 examples/ip_pipeline/parser.c

-- 
1.7.9.5



[dpdk-dev] [PATCH v3 1/7] examples/ip_pipeline: add helper functions for parsing string

2016-06-08 Thread Piotr Azarewicz
Add a couple of helper functions that allow parsing many types of input
parameters, e.g. bool, 16/32/64-bit integers, hex values, etc.

Signed-off-by: Piotr Azarewicz 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/Makefile   |1 +
 examples/ip_pipeline/config_parse.c |  257 +---
 examples/ip_pipeline/parser.c   |  745 +++
 examples/ip_pipeline/parser.h   |   54 ++-
 4 files changed, 791 insertions(+), 266 deletions(-)
 create mode 100644 examples/ip_pipeline/parser.c

diff --git a/examples/ip_pipeline/Makefile b/examples/ip_pipeline/Makefile
index 10fe1ba..5827117 100644
--- a/examples/ip_pipeline/Makefile
+++ b/examples/ip_pipeline/Makefile
@@ -50,6 +50,7 @@ INC += $(wildcard *.h) $(wildcard pipeline/*.h)
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) := main.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_parse.c
+SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += parser.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_parse_tm.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_check.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += init.c
diff --git a/examples/ip_pipeline/config_parse.c 
b/examples/ip_pipeline/config_parse.c
index e5efd03..ff917f3 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -30,6 +30,7 @@
  *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
+
 #include 
 #include 
 #include 
@@ -229,13 +230,6 @@ app_print_usage(char *prgname)
rte_exit(0, app_usage, prgname, app_params_default.config_file);
 }

-#define skip_white_spaces(pos) \
-({ \
-   __typeof__(pos) _p = (pos); \
-   for ( ; isspace(*_p); _p++);\
-   _p; \
-})
-
 #define PARSER_PARAM_ADD_CHECK(result, params_array, section_name) \
 do {   \
APP_CHECK((result != -EINVAL),  \
@@ -248,44 +242,6 @@ do {   
\
"Parse error in section \"%s\"", section_name); \
 } while (0)

-int
-parser_read_arg_bool(const char *p)
-{
-   p = skip_white_spaces(p);
-   int result = -EINVAL;
-
-   if (((p[0] == 'y') && (p[1] == 'e') && (p[2] == 's')) ||
-   ((p[0] == 'Y') && (p[1] == 'E') && (p[2] == 'S'))) {
-   p += 3;
-   result = 1;
-   }
-
-   if (((p[0] == 'o') && (p[1] == 'n')) ||
-   ((p[0] == 'O') && (p[1] == 'N'))) {
-   p += 2;
-   result = 1;
-   }
-
-   if (((p[0] == 'n') && (p[1] == 'o')) ||
-   ((p[0] == 'N') && (p[1] == 'O'))) {
-   p += 2;
-   result = 0;
-   }
-
-   if (((p[0] == 'o') && (p[1] == 'f') && (p[2] == 'f')) ||
-   ((p[0] == 'O') && (p[1] == 'F') && (p[2] == 'F'))) {
-   p += 3;
-   result = 0;
-   }
-
-   p = skip_white_spaces(p);
-
-   if (p[0] != '\0')
-   return -EINVAL;
-
-   return result;
-}
-
 #define PARSE_ERROR(exp, section, entry)   \
 APP_CHECK(exp, "Parse error in section \"%s\": entry \"%s\"\n", section, entry)

@@ -318,217 +274,6 @@ APP_CHECK(exp, "Parse error in section \"%s\": 
unrecognized entry \"%s\"\n",\
 APP_CHECK(exp, "Parse error in section \"%s\": duplicate entry \"%s\"\n",\
section, entry)

-int
-parser_read_uint64(uint64_t *value, const char *p)
-{
-   char *next;
-   uint64_t val;
-
-   p = skip_white_spaces(p);
-   if (!isdigit(*p))
-   return -EINVAL;
-
-   val = strtoul(p, , 10);
-   if (p == next)
-   return -EINVAL;
-
-   p = next;
-   switch (*p) {
-   case 'T':
-   val *= 1024ULL;
-   /* fall through */
-   case 'G':
-   val *= 1024ULL;
-   /* fall through */
-   case 'M':
-   val *= 1024ULL;
-   /* fall through */
-   case 'k':
-   case 'K':
-   val *= 1024ULL;
-   p++;
-   break;
-   }
-
-   p = skip_white_spaces(p);
-   if (*p != '\0')
-   return -EINVAL;
-
-   *value = val;
-   return 0;
-}
-
-int
-parser_read_uint32(uint32_t *value, const char *p)
-{
-   uint64_t val = 0;
-   int ret = parser_read_uint64(, p);
-
-   if (ret < 0)
-   return ret;
-
-   if (val > UINT32_MAX)
-   return -ERANGE;
-
-   *value = val;
-   return 0;
-}
-
-int
-parse_pipeline_core(uint32_t *socket,
-   uint32_t *core,
-   uint32_t *ht,
-   const char *entry)
-{
-   size_t num_len;
-   char num[8];
-
-   uint32_t s = 0, c = 0, h = 0, val;
-   

[dpdk-dev] [PATCH v3 2/7] examples/ip_pipeline: modifies common pipeline CLI

2016-06-08 Thread Piotr Azarewicz
From: Tomasz Kulasek 

All link commands are merged into one command:
cmd_link_parsed.
The run command is improved to allow running periodically.
The static keyword is added to many token declarations.

Signed-off-by: Tomasz Kulasek 
Signed-off-by: Michal Kobylinski 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/pipeline/pipeline_common_fe.c |  452 ++--
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |9 +
 examples/ip_pipeline/thread_fe.c   |   36 +-
 3 files changed, 244 insertions(+), 253 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_common_fe.c 
b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
index a691d42..dc37a5f 100644
--- a/examples/ip_pipeline/pipeline/pipeline_common_fe.c
+++ b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -42,12 +42,10 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 #include 

 #include "pipeline_common_fe.h"
+#include "parser.h"

 int
 app_pipeline_ping(struct app_params *app,
@@ -464,16 +462,16 @@ cmd_ping_parsed(
printf("Command failed\n");
 }

-cmdline_parse_token_string_t cmd_ping_p_string =
+static cmdline_parse_token_string_t cmd_ping_p_string =
TOKEN_STRING_INITIALIZER(struct cmd_ping_result, p_string, "p");

-cmdline_parse_token_num_t cmd_ping_pipeline_id =
+static cmdline_parse_token_num_t cmd_ping_pipeline_id =
TOKEN_NUM_INITIALIZER(struct cmd_ping_result, pipeline_id, UINT32);

-cmdline_parse_token_string_t cmd_ping_ping_string =
+static cmdline_parse_token_string_t cmd_ping_ping_string =
TOKEN_STRING_INITIALIZER(struct cmd_ping_result, ping_string, "ping");

-cmdline_parse_inst_t cmd_ping = {
+static cmdline_parse_inst_t cmd_ping = {
.f = cmd_ping_parsed,
.data = NULL,
.help_str = "Pipeline ping",
@@ -498,6 +496,7 @@ struct cmd_stats_port_in_result {
uint32_t port_in_id;

 };
+
 static void
 cmd_stats_port_in_parsed(
void *parsed_result,
@@ -531,23 +530,23 @@ cmd_stats_port_in_parsed(
stats.stats.n_pkts_drop);
 }

-cmdline_parse_token_string_t cmd_stats_port_in_p_string =
+static cmdline_parse_token_string_t cmd_stats_port_in_p_string =
TOKEN_STRING_INITIALIZER(struct cmd_stats_port_in_result, p_string,
"p");

-cmdline_parse_token_num_t cmd_stats_port_in_pipeline_id =
+static cmdline_parse_token_num_t cmd_stats_port_in_pipeline_id =
TOKEN_NUM_INITIALIZER(struct cmd_stats_port_in_result, pipeline_id,
UINT32);

-cmdline_parse_token_string_t cmd_stats_port_in_stats_string =
+static cmdline_parse_token_string_t cmd_stats_port_in_stats_string =
TOKEN_STRING_INITIALIZER(struct cmd_stats_port_in_result, stats_string,
"stats");

-cmdline_parse_token_string_t cmd_stats_port_in_port_string =
+static cmdline_parse_token_string_t cmd_stats_port_in_port_string =
TOKEN_STRING_INITIALIZER(struct cmd_stats_port_in_result, port_string,
"port");

-cmdline_parse_token_string_t cmd_stats_port_in_in_string =
+static cmdline_parse_token_string_t cmd_stats_port_in_in_string =
TOKEN_STRING_INITIALIZER(struct cmd_stats_port_in_result, in_string,
"in");

@@ -555,7 +554,7 @@ cmdline_parse_token_string_t cmd_stats_port_in_in_string =
TOKEN_NUM_INITIALIZER(struct cmd_stats_port_in_result, port_in_id,
UINT32);

-cmdline_parse_inst_t cmd_stats_port_in = {
+static cmdline_parse_inst_t cmd_stats_port_in = {
.f = cmd_stats_port_in_parsed,
.data = NULL,
.help_str = "Pipeline input port stats",
@@ -617,31 +616,31 @@ cmd_stats_port_out_parsed(
stats.stats.n_pkts_drop);
 }

-cmdline_parse_token_string_t cmd_stats_port_out_p_string =
+static cmdline_parse_token_string_t cmd_stats_port_out_p_string =
TOKEN_STRING_INITIALIZER(struct cmd_stats_port_out_result, p_string,
"p");

-cmdline_parse_token_num_t cmd_stats_port_out_pipeline_id =
+static cmdline_parse_token_num_t cmd_stats_port_out_pipeline_id =
TOKEN_NUM_INITIALIZER(struct cmd_stats_port_out_result, pipeline_id,
UINT32);

-cmdline_parse_token_string_t cmd_stats_port_out_stats_string =
+static cmdline_parse_token_string_t cmd_stats_port_out_stats_string =
TOKEN_STRING_INITIALIZER(struct cmd_stats_port_out_result, stats_string,
"stats");

-cmdline_parse_token_string_t cmd_stats_port_out_port_string =
+static cmdline_parse_token_string_t cmd_stats_port_out_port_string =
TOKEN_STRING_INITIALIZER(struct cmd_stats_port_out_result, port_string,
"port");

-cmdline_parse_token_string_t 

[dpdk-dev] [PATCH v4 01/39] bnxt: new driver for Broadcom NetXtreme-C devices

2016-06-08 Thread Bruce Richardson
On Tue, Jun 07, 2016 at 08:25:44AM +0200, Thomas Monjalon wrote:
> Hi Stephen,
> Reminder from http://dpdk.org/dev#send:
> 
> git send-email -39 -v4 --cover-letter --annotate
>   --in-reply-to 
> 
> Please do not forget --in-reply-to. Thanks

Three other minor style updates as well that would make my life a little easier:

* The sign-off type tags at the bottom of the email are kept in logical
chronological order (irrespective of actual chronological order). This means
that all signed-off-by tags go together followed by a reviewed-by tag 
afterwards.

* Watch out for initial caps at the start of a commit title. [This will be 
caught
by the check-git-log.sh script if you want to run that]

* The history updates from vN to vN+1 are best put after the signoffs and 
prefixed
with a cutline, so that they get stripped automatically when the patch is 
applied.
Otherwise I have to delete them manually from the commit message. The format of
the messages is therefore best done as:

bnxt: 



Signed-off-by: 
Signed-off-by: 
Reviewed-by: 

---
vN changes


Thanks,
/Bruce


[dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements

2016-06-08 Thread Ananyev, Konstantin
Hi Adrien,

> 
> An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> applications with a log level high enough to trigger it.
> 
> While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> expects it to be 0. 
> Considering users are not expected to reset the
> reference count to satisfy assert() and that raw functions are designed on
> purpose without safety belts, remove these checks.

Yes, the refcnt is supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
Right now, it is the user's responsibility to make sure refcnt==0 before pushing
the mbuf back to the pool.
Not sure why you consider that wrong?
If the user calls __rte_mbuf_raw_free() manually, it is his responsibility to make
sure the mbuf's refcnt==0.
BTW, why are you doing it?
The comment clearly states that the function is for internal use:
/**
 * @internal Put mbuf back into its original mempool.
 * The use of that function is reserved for RTE internal needs.
 * Please use rte_pktmbuf_free().
 *
 * @param m
 *   The mbuf to be freed.
 */
static inline void __attribute__((always_inline))
__rte_mbuf_raw_free(struct rte_mbuf *m)

Konstantin

> 
> Signed-off-by: Adrien Mazarguil 
> ---
>  lib/librte_mbuf/rte_mbuf.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 11fa06d..7070bb8 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf 
> *rte_mbuf_raw_alloc(struct rte_mempool *mp)
>   if (rte_mempool_get(mp, ) < 0)
>   return NULL;
>   m = (struct rte_mbuf *)mb;
> - RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
>   rte_mbuf_refcnt_set(m, 1);
>   __rte_mbuf_sanity_check(m, 0);
> 
> @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
>  static inline void __attribute__((always_inline))
>  __rte_mbuf_raw_free(struct rte_mbuf *m)
>  {
> - RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
>   rte_mempool_put(m->pool, m);
>  }
> 
> --
> 2.1.4



[dpdk-dev] [PATCH v3 3/7] examples/ip_pipeline: modifies firewall pipeline CLI

2016-06-08 Thread Piotr Azarewicz
From: Daniel Mrzyglod 

All commands are merged into one: cmd_firewall_parsed.
ADD command format is changed:
p  firewall add priority  ipv4 
  
  port 

and bulk command was modified:
1. firewall add bulk
File line format:
priority  ipv4
  port 
(protomask is a hex value)
File line example:
priority 0 ipv4 1.2.3.0 24 10.20.30.40 32 0 63 64 127 6 0xF port 3

2. firewall del bulk
File line format:
ipv4  
   
File line example:
ipv4 1.2.3.0 24 10.20.30.40 32 0 63 64 127 6 0xF

Signed-off-by: Daniel Mrzyglod 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/config/firewall.cfg  |   68 +
 examples/ip_pipeline/config/firewall.sh   |   13 +
 examples/ip_pipeline/config/firewall.txt  |9 +
 examples/ip_pipeline/pipeline/pipeline_firewall.c | 1461 -
 examples/ip_pipeline/pipeline/pipeline_firewall.h |   12 +
 5 files changed, 622 insertions(+), 941 deletions(-)
 create mode 100644 examples/ip_pipeline/config/firewall.cfg
 create mode 100644 examples/ip_pipeline/config/firewall.sh
 create mode 100644 examples/ip_pipeline/config/firewall.txt

diff --git a/examples/ip_pipeline/config/firewall.cfg 
b/examples/ip_pipeline/config/firewall.cfg
new file mode 100644
index 000..2f5dd9f
--- /dev/null
+++ b/examples/ip_pipeline/config/firewall.cfg
@@ -0,0 +1,68 @@
+;   BSD LICENSE
+;
+;   Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
+;   All rights reserved.
+;
+;   Redistribution and use in source and binary forms, with or without
+;   modification, are permitted provided that the following conditions
+;   are met:
+;
+; * Redistributions of source code must retain the above copyright
+;   notice, this list of conditions and the following disclaimer.
+; * Redistributions in binary form must reproduce the above copyright
+;   notice, this list of conditions and the following disclaimer in
+;   the documentation and/or other materials provided with the
+;   distribution.
+; * Neither the name of Intel Corporation nor the names of its
+;   contributors may be used to endorse or promote products derived
+;   from this software without specific prior written permission.
+;
+;   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+;   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+;   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+;   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+;   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+;   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+;   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+;   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+;   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+;   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+;   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+; ___
+; RXQ0.0 --->|   |---> TXQ0.0
+;|   |
+; RXQ1.0 --->|   |---> TXQ1.0
+;|   Firewall|
+; RXQ2.0 --->|   |---> TXQ2.0
+;|   |
+; RXQ3.0 --->|   |---> TXQ3.0
+;|___|
+;|
+;+---> SINK0 (default rule)
+;
+; Input packet: Ethernet/IPv4
+;
+; Packet buffer layout:
+; #Field Name  Offset (Bytes)  Size (Bytes)
+; 0Mbuf0   128
+; 1Headroom128 128
+; 2Ethernet header 256 14
+; 3IPv4 header 270 20
+
+[EAL]
+log_level = 0
+
+[PIPELINE0]
+type = MASTER
+core = 0
+
+[PIPELINE1]
+type = FIREWALL
+core = 1
+pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
+pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0 SINK0
+n_rules = 4096
+pkt_type = ipv4
+;pkt_type = vlan_ipv4
+;pkt_type = qinq_ipv4
diff --git a/examples/ip_pipeline/config/firewall.sh 
b/examples/ip_pipeline/config/firewall.sh
new file mode 100644
index 000..c83857e
--- /dev/null
+++ b/examples/ip_pipeline/config/firewall.sh
@@ -0,0 +1,13 @@
+#
+# run ./config/firewall.sh
+#
+
+p 1 firewall add default 4 #SINK0
+p 1 firewall add priority 1 ipv4 0.0.0.0 0 100.0.0.0 10 0 65535 0 65535 6 0xF 
port 0
+p 1 firewall add priority 1 ipv4 0.0.0.0 0 100.64.0.0 10 0 65535 0 65535 6 0xF 
port 1
+p 1 firewall add priority 1 ipv4 0.0.0.0 0 100.128.0.0 10 0 65535 0 65535 6 
0xF port 2
+p 1 firewall add priority 1 ipv4 0.0.0.0 0 100.192.0.0 10 0 65535 0 65535 6 
0xF port 3
+
+#p 1 firewall add bulk ./config/firewall.txt
+
+p 1 firewall ls
diff --git a/examples/ip_pipeline/config/firewall.txt 
b/examples/ip_pipeline/config/firewall.txt
new file mode 100644
index 000..54cfffd
--- /dev/null
+++ b/examples/ip_pipeline/config/firewall.txt
@@ -0,0 +1,9 @@
+#

[dpdk-dev] [PATCH v3 5/7] examples/ip_pipeline: modifies flow action pipeline CLI

2016-06-08 Thread Piotr Azarewicz
All commands are merged into one: cmd_action_parsed.

The bulk command is modified:
action flow bulk
File line format:
flow 
meter 0 meter 1 meter 2
meter 3
policer 0policer 1  
 policer 2policer 3 
 
port 

Signed-off-by: Marcin Kerlin 
Signed-off-by: Piotr Azarewicz 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/config/action.cfg |   68 +
 examples/ip_pipeline/config/action.sh  |  119 ++
 examples/ip_pipeline/config/action.txt |8 +
 .../ip_pipeline/pipeline/pipeline_flow_actions.c   | 1505 +++-
 .../ip_pipeline/pipeline/pipeline_flow_actions.h   |   11 +
 5 files changed, 707 insertions(+), 1004 deletions(-)
 create mode 100644 examples/ip_pipeline/config/action.cfg
 create mode 100644 examples/ip_pipeline/config/action.sh
 create mode 100644 examples/ip_pipeline/config/action.txt

diff --git a/examples/ip_pipeline/config/action.cfg 
b/examples/ip_pipeline/config/action.cfg
new file mode 100644
index 000..994ae94
--- /dev/null
+++ b/examples/ip_pipeline/config/action.cfg
@@ -0,0 +1,68 @@
+;   BSD LICENSE
+;
+;   Copyright(c) 2016 Intel Corporation. All rights reserved.
+;   All rights reserved.
+;
+;   Redistribution and use in source and binary forms, with or without
+;   modification, are permitted provided that the following conditions
+;   are met:
+;
+; * Redistributions of source code must retain the above copyright
+;   notice, this list of conditions and the following disclaimer.
+; * Redistributions in binary form must reproduce the above copyright
+;   notice, this list of conditions and the following disclaimer in
+;   the documentation and/or other materials provided with the
+;   distribution.
+; * Neither the name of Intel Corporation nor the names of its
+;   contributors may be used to endorse or promote products derived
+;   from this software without specific prior written permission.
+;
+;   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+;   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+;   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+;   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+;   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+;   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+;   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+;   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+;   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+;   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+;   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+; 
+; RXQ0.0 --->||---> TXQ0.0
+;||
+; RXQ1.0 --->||---> TXQ1.0
+;|  Flow  |
+; RXQ2.0 --->| Actions|---> TXQ2.0
+;||
+; RXQ3.0 --->||---> TXQ3.0
+;||
+;
+;
+; Input packet: Ethernet/IPv4
+;
+; Packet buffer layout:
+; #Field Name  Offset (Bytes)  Size (Bytes)
+; 0Mbuf0   128
+; 1Headroom128 128
+; 2Ethernet header 256 14
+; 3IPv4 header 270 20
+
+[EAL]
+log_level = 0
+
+[PIPELINE0]
+type = MASTER
+core = 0
+
+[PIPELINE1]
+type = FLOW_ACTIONS
+core = 1
+pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
+pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0
+n_flows = 65536
+n_meters_per_flow = 4
+flow_id_offset = 286; ipdaddr
+ip_hdr_offset = 270
+color_offset = 128
diff --git a/examples/ip_pipeline/config/action.sh 
b/examples/ip_pipeline/config/action.sh
new file mode 100644
index 000..2986ae6
--- /dev/null
+++ b/examples/ip_pipeline/config/action.sh
@@ -0,0 +1,119 @@
+#
+# run ./config/action.sh
+#
+
+p 1 action flow 0 meter 0 trtcm 125000 125000 100 100
+p 1 action flow 0 policer 0 g G y Y r R
+p 1 action flow 0 meter 1 trtcm 125000 125000 100 100
+p 1 action flow 0 policer 1 g G y Y r R
+p 1 action flow 0 meter 2 trtcm 125000 125000 100 100
+p 1 action flow 0 policer 2 g G y Y r R
+p 1 action flow 0 meter 3 trtcm 125000 125000 100 100
+p 1 action flow 0 policer 3 g G y Y r R
+p 1 action flow 0 port 0
+
+p 1 action flow 1 meter 0 trtcm 125000 125000 100 100
+p 1 action flow 1 policer 0 g G y Y r R
+p 1 action flow 1 meter 1 trtcm 125000 125000 100 100
+p 1 action flow 1 policer 1 g G y Y r R
+p 1 action flow 1 meter 2 trtcm 125000 125000 100 100
+p 1 action flow 1 policer 2 g G y Y r R
+p 1 action flow 1 meter 3 trtcm 125000 125000 100 100
+p 1 action flow 1 policer 3 g G y Y r R
+p 1 action flow 1 port 1
+
+p 1 action flow 2 meter 0 trtcm 125000 125000 

[dpdk-dev] [PATCH v3 6/7] examples/ip_pipeline: modifies routing pipeline CLI

2016-06-08 Thread Piotr Azarewicz
Several routing commands are merged into two commands, route and arp; these
two are handled by the CLI library, while the rest of the commands are handled
internally by the pipeline code.

Signed-off-by: Piotr Azarewicz 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/config/l2fwd.cfg|5 +-
 examples/ip_pipeline/config/l3fwd.cfg|9 +-
 examples/ip_pipeline/config/l3fwd.sh |   32 +-
 examples/ip_pipeline/config/l3fwd_arp.cfg|   70 +
 examples/ip_pipeline/config/l3fwd_arp.sh |   43 +
 examples/ip_pipeline/pipeline/pipeline_routing.c | 1636 ++
 6 files changed, 551 insertions(+), 1244 deletions(-)
 create mode 100644 examples/ip_pipeline/config/l3fwd_arp.cfg
 create mode 100644 examples/ip_pipeline/config/l3fwd_arp.sh

diff --git a/examples/ip_pipeline/config/l2fwd.cfg 
b/examples/ip_pipeline/config/l2fwd.cfg
index c743a14..a1df9e6 100644
--- a/examples/ip_pipeline/config/l2fwd.cfg
+++ b/examples/ip_pipeline/config/l2fwd.cfg
@@ -1,6 +1,6 @@
 ;   BSD LICENSE
 ;
-;   Copyright(c) 2015 Intel Corporation. All rights reserved.
+;   Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
 ;   All rights reserved.
 ;
 ;   Redistribution and use in source and binary forms, with or without
@@ -44,6 +44,9 @@
 ;||
 ;

+[EAL]
+log_level = 0
+
 [PIPELINE0]
 type = MASTER
 core = 0
diff --git a/examples/ip_pipeline/config/l3fwd.cfg 
b/examples/ip_pipeline/config/l3fwd.cfg
index 5449dc3..02c8f36 100644
--- a/examples/ip_pipeline/config/l3fwd.cfg
+++ b/examples/ip_pipeline/config/l3fwd.cfg
@@ -1,6 +1,6 @@
 ;   BSD LICENSE
 ;
-;   Copyright(c) 2015 Intel Corporation. All rights reserved.
+;   Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
 ;   All rights reserved.
 ;
 ;   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,9 @@
 ; 2Ethernet header 256 14
 ; 3IPv4 header 270 20

+[EAL]
+log_level = 0
+
 [PIPELINE0]
 type = MASTER
 core = 0
@@ -59,5 +62,7 @@ type = ROUTING
 core = 1
 pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
 pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0 SINK0
-encap = ethernet; encap = ethernet / ethernet_qinq / ethernet_mpls
+encap = ethernet
+;encap = ethernet_qinq
+;encap = ethernet_mpls
 ip_hdr_offset = 270
diff --git a/examples/ip_pipeline/config/l3fwd.sh 
b/examples/ip_pipeline/config/l3fwd.sh
index 2774010..47406aa 100644
--- a/examples/ip_pipeline/config/l3fwd.sh
+++ b/examples/ip_pipeline/config/l3fwd.sh
@@ -1,9 +1,33 @@
+#
+# run ./config/l3fwd.sh
+#
+
 

 # Routing: encap = ethernet, arp = off
 

 p 1 route add default 4 #SINK0
-p 1 route add 0.0.0.0 10 port 0 ether a0:b0:c0:d0:e0:f0
-p 1 route add 0.64.0.0 10 port 1 ether a1:b1:c1:d1:e1:f1
-p 1 route add 0.128.0.0 10 port 2 ether a2:b2:c2:d2:e2:f2
-p 1 route add 0.192.0.0 10 port 3 ether a3:b3:c3:d3:e3:f3
+p 1 route add 100.0.0.0 10 port 0 ether a0:b0:c0:d0:e0:f0
+p 1 route add 100.64.0.0 10 port 1 ether a1:b1:c1:d1:e1:f1
+p 1 route add 100.128.0.0 10 port 2 ether a2:b2:c2:d2:e2:f2
+p 1 route add 100.192.0.0 10 port 3 ether a3:b3:c3:d3:e3:f3
 p 1 route ls
+
+
+# Routing: encap = ethernet_qinq, arp = off
+
+#p 1 route add default 4 #SINK0
+#p 1 route add 100.0.0.0 10 port 0 ether a0:b0:c0:d0:e0:f0 qinq 1000 2000
+#p 1 route add 100.64.0.0 10 port 1 ether a1:b1:c1:d1:e1:f1 qinq 1001 2001
+#p 1 route add 100.128.0.0 10 port 2 ether a2:b2:c2:d2:e2:f2 qinq 1002 2002
+#p 1 route add 100.192.0.0 10 port 3 ether a3:b3:c3:d3:e3:f3 qinq 1003 2003
+#p 1 route ls
+
+
+# Routing: encap = ethernet_mpls, arp = off
+
+#p 1 route add default 4 #SINK0
+#p 1 route add 100.0.0.0 10 port 0 ether a0:b0:c0:d0:e0:f0 mpls 1000:2000
+#p 1 route add 100.64.0.0 10 port 1 ether a1:b1:c1:d1:e1:f1 mpls 1001:2001
+#p 1 route add 100.128.0.0 10 port 2 ether a2:b2:c2:d2:e2:f2 mpls 1002:2002
+#p 1 route add 100.192.0.0 10 port 3 ether a3:b3:c3:d3:e3:f3 mpls 1003:2003
+#p 1 route ls
diff --git a/examples/ip_pipeline/config/l3fwd_arp.cfg 
b/examples/ip_pipeline/config/l3fwd_arp.cfg
new file mode 100644
index 000..2c63c8f
--- /dev/null
+++ b/examples/ip_pipeline/config/l3fwd_arp.cfg
@@ -0,0 +1,70 @@
+;   BSD LICENSE
+;
+;   Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
+;   All rights reserved.
+;
+;   Redistribution and use in source and binary forms, with or without
+;   modification, are permitted provided that the following conditions
+;   are met:
+;
+; * Redistributions of source code 

[dpdk-dev] [PATCH v2 0/3] testpmd: extend commands for better scatter-gather tests

2016-06-08 Thread Thomas Monjalon
> > Maciej Czekaj (3):
> >   app/testpmd: add "enable-scatter" parameter
> >   app/testpmd: extend port config with scatter parameter
> >   app/testpmd: support setting up txq_flags value in command line
> 
> Series-acked-by: Pablo de Lara 

Applied, thanks


[dpdk-dev] [PATCH v3 7/7] examples/ip_pipeline: update edge router usecase

2016-06-08 Thread Piotr Azarewicz
Update the edge router use case config files to use bulk commands.

Signed-off-by: Piotr Azarewicz 
Acked-by: Cristian Dumitrescu 
---
 .../ip_pipeline/config/edge_router_downstream.cfg  |   30 +++-
 .../ip_pipeline/config/edge_router_downstream.sh   |7 ++--
 .../ip_pipeline/config/edge_router_upstream.cfg|   36 +--
 .../ip_pipeline/config/edge_router_upstream.sh |   37 +---
 4 files changed, 67 insertions(+), 43 deletions(-)

diff --git a/examples/ip_pipeline/config/edge_router_downstream.cfg 
b/examples/ip_pipeline/config/edge_router_downstream.cfg
index 85bbab8..c6b4e1f 100644
--- a/examples/ip_pipeline/config/edge_router_downstream.cfg
+++ b/examples/ip_pipeline/config/edge_router_downstream.cfg
@@ -1,6 +1,6 @@
 ;   BSD LICENSE
 ;
-;   Copyright(c) 2015 Intel Corporation. All rights reserved.
+;   Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
 ;   All rights reserved.
 ;
 ;   Redistribution and use in source and binary forms, with or without
@@ -36,9 +36,9 @@
 ;   network) contains the following functional blocks: Packet RX & Routing,
 ;   Traffic management and Packet TX. The input packets are assumed to be
 ;   IPv4, while the output packets are Q-in-Q IPv4.
-
+;
 ;  A simple implementation for this functional pipeline is presented below.
-
+;
 ;  Packet Rx &Traffic Management   
Packet Tx
 ;   Routing(Pass-Through)
(Pass-Through)
 ; _  SWQ0  __  SWQ4  
_
@@ -50,11 +50,23 @@
 ;| | SWQ3 |  | SWQ7 |  
   |
 ; RXQ3.0 --->| |->|  |->|  
   |---> TXQ3.0
 ;|_|  |__|  
|_|
-;   | _|_ ^ _|_ ^ _|_ ^ _|_ ^
-;   ||___|||___|||___|||___||
-;   +--> SINK0   |___|||___|||___|||___||
-;  (route miss)|__|  |__|  |__|  |__|
-;  TM0   TM1   TM2   TM3
+;   |  |  ^  |  ^  |  ^  |  ^
+;   |  |__|  |__|  |__|  |__|
+;   +--> SINK0  TM0   TM1   TM2   TM3
+;  (Default)
+;
+; Input packet: Ethernet/IPv4
+; Output packet: Ethernet/QinQ/IPv4
+;
+; Packet buffer layout:
+; #Field Name  Offset (Bytes)  Size (Bytes)
+; 0Mbuf0   128
+; 1Headroom128 128
+; 2Ethernet header 256 14
+; 3IPv4 header 270 20
+
+[EAL]
+log_level = 0

 [PIPELINE0]
 type = MASTER
@@ -67,7 +79,7 @@ pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
 pktq_out = SWQ0 SWQ1 SWQ2 SWQ3 SINK0
 encap = ethernet_qinq
 qinq_sched = test
-ip_hdr_offset = 270; mbuf (128) + headroom (128) + ethernet header (14) = 270
+ip_hdr_offset = 270

 [PIPELINE2]
 type = PASS-THROUGH
diff --git a/examples/ip_pipeline/config/edge_router_downstream.sh 
b/examples/ip_pipeline/config/edge_router_downstream.sh
index ce46beb..67c3a0d 100644
--- a/examples/ip_pipeline/config/edge_router_downstream.sh
+++ b/examples/ip_pipeline/config/edge_router_downstream.sh
@@ -1,3 +1,7 @@
+#
+# run ./config/edge_router_downstream.sh
+#
+
 

 # Routing: Ether QinQ, ARP off
 

@@ -6,5 +10,4 @@ p 1 route add 0.0.0.0 10 port 0 ether a0:b0:c0:d0:e0:f0 qinq 
256 257
 p 1 route add 0.64.0.0 10 port 1 ether a1:b1:c1:d1:e1:f1 qinq 258 259
 p 1 route add 0.128.0.0 10 port 2 ether a2:b2:c2:d2:e2:f2 qinq 260 261
 p 1 route add 0.192.0.0 10 port 3 ether a3:b3:c3:d3:e3:f3 qinq 262 263
-
-p 1 route ls
+#p 1 route ls
diff --git a/examples/ip_pipeline/config/edge_router_upstream.cfg 
b/examples/ip_pipeline/config/edge_router_upstream.cfg
index a08c5cc..dea42b9 100644
--- a/examples/ip_pipeline/config/edge_router_upstream.cfg
+++ b/examples/ip_pipeline/config/edge_router_upstream.cfg
@@ -1,6 +1,6 @@
 ;   BSD LICENSE
 ;
-;   Copyright(c) 2015 Intel Corporation. All rights reserved.
+;   Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
 ;   All rights reserved.
 ;
 ;   Redistribution and use in source and binary forms, with or without
@@ -29,6 +29,7 @@
 ;   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 ;   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

+
 ;   An edge router typically sits between two networks such as the provider
 ;   core network and the provider access network. A typical packet processing
 ;   pipeline for the upstream traffic (i.e. traffic from 

[dpdk-dev] [PATCH] app/test: fix bond device name too long

2016-06-08 Thread Thomas Monjalon
2016-05-27 18:38, Thomas Monjalon:
> 2016-05-27 17:20, Michal Jastrzebski:
> > > Bond device name was too long (greater than 32 characters) which
> > > caused mempool allocation to fail.
> 
> Maybe that this kind of failure would be avoided if the test
> was added to autotests (app/test/autotest_data.py).
> 
> Generally speaking, it would be a good idea to make an audit
> on which tests are missing in "make fast_test" and "make test".

Any comment please?


[dpdk-dev] [PATCH v4 01/39] bnxt: new driver for Broadcom NetXtreme-C devices

2016-06-08 Thread Bruce Richardson
On Wed, Jun 08, 2016 at 11:21:23AM +0100, Bruce Richardson wrote:
> On Mon, Jun 06, 2016 at 03:08:05PM -0700, Stephen Hurd wrote:
> > From: Ajit Khaparde 
> > 
> > This patch adds the initial skeleton for bnxt driver along with the
> > nic guide to tie into the build system.
> > At this point, the driver simply fails init.
> > 
> > v4:
> > Fix a warning that the document isn't included in any toctree
> > Also remove a PCI ID added erroneously.
> > 
> > Signed-off-by: Ajit Khaparde 
> > Reviewed-by: David Christensen 
> > Signed-off-by: Stephen Hurd 
> > ---
> Hi Stephen, Ajit,
> 
> in the absence of a cover letter, I'll post my overall comments on this set 
> here.
> 
> Thanks for the updated v4, I'm not seeing any checkpatch issues with the 
> patches
> that have applied and compiled up cleanly. However,
> 
> * the build is broken by patch 30, and none of the later patches 31-38 seem
> to fix it for me. Is there a header file include missing in that patch or 
> something? [I'm using gcc 5.3.1 on Fedora 23]
> * patch 39 fails to apply for me with rejects on other files in the driver,
> which is very strange. [drivers/net/bnxt/bnxt_hwrm.c, 
> drivers/net/bnxt/bnxt_ring.c and drivers/net/bnxt/bnxt_ring.h]

Sorry, my bad here - the build is not broken; patch 30 now applies fine
and compiles OK. I had somehow missed applying an earlier patch (patch 28)
after reviewing it, and that caused the failures.

Please ignore the above comments.

> 
> Apart from this, the other concern I still have is with the explanation
> accompaning some of the patches, especially for those to with rings. There are
> many patches throughout the set which seem to be doing the same thing, adding
> allocate and free functions for rings. 
> 
> For example:
> Patch 28 is titled "add ring alloc, free and group init". For a start it's
> unclear from the title, whether the alloc and free refers to individual rings
> or to the groups. If it's referring to the rings themselves, then how is this
> different functionality from:
> Patch 7: add ring structs and free() func
> Patch 10/11: add TX/RX queue create/destroy operations
> Patch 15: code to alloc/free ring
> Patch 24: add HWRM ring alloc/free functions
> 
> Or if it's to do with allocating and freeing the groups, it would seem to be
> the same functionality as patch 25: "add ring group alloc/free functions".
> 
> In some cases, the commit message does add some detail, e.g. patches 7 and 10
> point out what they don't cover, but the rest is still very unclear, as to 
> what
> each of the 5/6 patches for ring create/free are really doing and how they
> work together. I'm not sure exactly how best to do this without understanding
> the details of these patches, but one way might be to list out the different
> part of the ring allocation/free in each patch and then explain what part of
> that process this patch is doing and how it fits in the sequence. Otherwise,
> maybe some of the patches may need to be merged if they are very closely 
> related.
> 
> Can you please look to improve the commit messages when you do rework to fix
> the compilation and patch application errors.

Since the build break was my mistake, a new rev of the patches may not be
absolutely necessary. If it's more convenient and is not too complicated, you
can perhaps just post updated comments for the above-mentioned patches to the
list and I can fix up the commit messages on patch apply. However, the patches
and commit messages are quite confusing to read as they are right now.

Thanks,
/Bruce
> 
> Thanks,
> /Bruce


[dpdk-dev] [PATCH v4 30/39] bnxt: add start/stop/link update operations

2016-06-08 Thread Bruce Richardson
On Wed, Jun 08, 2016 at 11:02:08AM +0100, Bruce Richardson wrote:
> On Mon, Jun 06, 2016 at 03:08:34PM -0700, Stephen Hurd wrote:
> > From: Ajit Khaparde 
> > 
> > This patch adds code to add the start, stop and link update dev_ops.
> > The BNXT driver will now minimally pass traffic with testpmd.
> > 
> > v4:
> > - Fix issues pointed out by checkpatch.
> > - Shorten the string passed for reserving memzone
> > when default completion ring is created.
> > 
> > Signed-off-by: Ajit Khaparde 
> > Reviewed-by: David Christensen 
> > Signed-off-by: Stephen Hurd 
> > ---
> >  drivers/net/bnxt/bnxt_ethdev.c | 269 
> > +
> >  1 file changed, 269 insertions(+)
> > 
> I get compilation errors after applying this patch:
> 
> == Build drivers/net/bnxt
>   CC bnxt_ethdev.o
> /home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c: In 
> function 'bnxt_init_chip':
> /home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:135:7: 
> error: implicit declaration of function 'bnxt_alloc_hwrm_rings' 
> [-Werror=implicit-function-declaration]
>   rc = bnxt_alloc_hwrm_rings(bp);
>^
> /home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:135:2: 
> error: nested extern declaration of 'bnxt_alloc_hwrm_rings' 
> [-Werror=nested-externs]
>   rc = bnxt_alloc_hwrm_rings(bp);
>   ^
> /home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c: In 
> function 'bnxt_init_nic':
> /home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:233:2: 
> error: implicit declaration of function 'bnxt_init_ring_grps' 
> [-Werror=implicit-function-declaration]
>   bnxt_init_ring_grps(bp);
>   ^
> /home/bruce/next-net/dpdk-next-net/drivers/net/bnxt/bnxt_ethdev.c:233:2: 
> error: nested extern declaration of 'bnxt_init_ring_grps' 
> [-Werror=nested-externs]
> cc1: all warnings being treated as errors
> /home/bruce/next-net/dpdk-next-net/mk/internal/rte.compile-pre.mk:126: recipe 
> for target 'bnxt_ethdev.o' failed
> make[5]: *** [bnxt_ethdev.o] Error 1
> 
Please ignore, my mistake, as I was missing patch 28.

/Bruce


[dpdk-dev] [PATCH v3 01/10] rte: change xstats to use integer ids

2016-06-08 Thread Remy Horton
'noon,

On 08/06/2016 10:37, Thomas Monjalon wrote:
> 2016-05-30 11:48, Remy Horton:
>>   struct rte_eth_xstats {
>> +/* FIXME: Remove name[] once remaining drivers converted */
>>  char name[RTE_ETH_XSTATS_NAME_SIZE];
>
> What is the plan? This field must be deprecated with an attribute.
> We cannot have 2 different APIs depending of the driver.

This is where it gets logistically tricky..

Since there's an API/ABI breakage notice in place on this, my own 
preference would be to have the entire patchset squashed into a single 
patch. The problem is that rte/app changes (patches 1 & 7-9) are normally 
applied via master whereas driver changes (patches 2-6) go in via 
dpdk-next-net - it is not clear to me how patches should be submitted 
for this case.


> What are the remaining drivers to convert?

Oops, none. All relevant drivers are converted.


> This structure and the other one (rte_eth_xstats) are badly named.
> There is only one stat in each. So they should not have the plural form.
> rte_eth_xstat and rte_eth_xstat_name would be better.

I kept rte_eth_xstats as it was the name already in use within DPDK. 
Will change the other.


>> +int rte_eth_xstats_count(uint8_t port_id);
>
> This function is useless because we can have the count with
> rte_eth_xstats_get(p, NULL, 0)
> By the way it would be more consistent to have the same behaviour
> in rte_eth_xstats_names().

Feedback I got with earlier patches was that a separate count function 
was preferable to overloading the fetch function using *data==NULL - is 
the use of the latter specifically preferred?

Other comments noted.

..Remy


[dpdk-dev] [PATCH v5] eal: fix allocating all free hugepages

2016-06-08 Thread Sergio Gonzalez Monroy
On 31/05/2016 04:37, Jianfeng Tan wrote:
> EAL memory init allocates all free hugepages of the whole system,
> as seen from sysfs, even when applications do not ask for so many.
> When there is a limitation on how many hugepages an application can
> use (such as cgroup.hugetlb), or hugetlbfs is specified with an
> option of size (exceeding the quota of the fs), it just fails to
> start even though there are enough hugepages allocated.
>
> To fix above issue, this patch:
>   - Changes the logic to continue memory init to see if hugetlb
> requirement of application can be addressed by already allocated
> hugepages.
>   - To make sure each hugepage is allocated successfully, we add a
> recover mechanism, which relies on a mem access to fault-in
> hugepages, and if it fails with SIGBUS, recover to previously
> saved stack environment with siglongjmp().
>
> For the case of CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS (enabled by
> default when compiling IVSHMEM target), it's indispensable to
> map all free hugepages in the system. In this case, it fails
> to start when allocation fails.
>
> Test example:
>a. cgcreate -g hugetlb:/test-subgroup
>b. cgset -r hugetlb.1GB.limit_in_bytes=2147483648 test-subgroup
>c. cgexec -g hugetlb:test-subgroup \
>./examples/helloworld/build/helloworld -c 0x2 -n 4
>
> 
> Fixes: af75078fece ("first public release")
>
> Signed-off-by: Jianfeng Tan
> Acked-by: Neil Horman
> ---
> v5:
>   - Make this method as default instead of using an option.
>   - When SIGBUS is triggered in the case of RTE_EAL_SINGLE_FILE_SEGMENTS,
> just return error.
>   - Add prefix "huge_" to newly added function and static variables.
>   - Move the internal_config.memory assignment after the page allocations.
> v4:
>   - Change map_all_hugepages to return unsigned instead of int.
> v3:
>   - Reword commit message to include it fixes the hugetlbfs quota issue.
>   - setjmp -> sigsetjmp.
>   - Fix RTE_LOG complaint from ERR to DEBUG as it does not mean init error
> so far.
>   - Fix the second map_all_hugepages's return value check.
> v2:
>   - Address the compiling error by move setjmp into a wrap method.
>
>   lib/librte_eal/linuxapp/eal/eal.c|  20 -
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 138 
> ---
>   2 files changed, 125 insertions(+), 33 deletions(-)
>

Acked-by: Sergio Gonzalez Monroy 


[dpdk-dev] [PATCH] doc: remove reference to MATCH

2016-06-08 Thread Mcnamara, John


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chen Jing D(Mark)
> Sent: Wednesday, June 8, 2016 9:44 AM
> To: thomas.monjalon at 6wind.com
> Cc: dev at dpdk.org; Chen, Jing D 
> Subject: [dpdk-dev] [PATCH] doc: remove reference to MATCH
> 
> From: "Chen Jing D(Mark)" 
> 
> Intel stopped supporting MATCH, remove reference of MATCH in the document.
> 
> Signed-off-by: Chen Jing D(Mark) 

Acked-by: John McNamara 




[dpdk-dev] [PATCH] mem: fix overflowed return value

2016-06-08 Thread Mrzyglod, DanielX T


>-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
>Sent: Friday, April 22, 2016 6:25 PM
>To: Kobylinski, MichalX 
>Cc: thomas.monjalon at 6wind.com; dev at dpdk.org
>Subject: Re: [dpdk-dev] [PATCH] mem: fix overflowed return value
>
>On Fri, 22 Apr 2016 12:44:18 +0200
>Michal Kobylinski  wrote:
>
>> Fix issue reported by Coverity.
>>
>> Coverity ID 13255: Overflowed return value: The return value will be too
>> small or even negative, likely resulting in unexpected behavior in a
>> caller that uses the return value. In rte_mem_virt2phy: An integer
>> overflow occurs, with the overflowed value used as the return value of
>> the function
>>
>> Fixes: 3097de6e6bfb ("mem: get physical address of any pointer")
>>
>> Signed-off-by: Michal Kobylinski 
>> ---
>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
>b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index 5b9132c..6ceca5b 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -195,7 +195,7 @@ rte_mem_virt2phy(const void *virtaddr)
>>   * the pfn (page frame number) are bits 0-54 (see
>>   * pagemap.txt in linux Documentation)
>>   */
>> -physaddr = ((page & 0x7fULL) * page_size)
>> +physaddr = (uint64_t)((page & 0x7fULL) * page_size)
>>  + ((unsigned long)virtaddr % page_size);
>>  close(fd);
>>  return physaddr;
>
>I am not trusting any of these Coverity patches you are sending.
>It seems you think wraparound can be just fixed by casting, it can't

From my point of view it's a false positive; there is no chance that page_size
will be bigger than a long. Coverity assumes that page_size may be
18446744071562067968, but it can't.

Only for glibc < 2.1 should we probably change page_size = getpagesize(); to
page_size = sysconf(_SC_PAGESIZE);
May I mark this Coverity issue as a false positive, or did I miss something?
What's your opinion?



[dpdk-dev] [PATCH] app/test: fix bond device name too long

2016-06-08 Thread Jastrzebski, MichalX K
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, June 08, 2016 12:40 PM
> To: Jastrzebski, MichalX K ; Iremonger,
> Bernard 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] app/test: fix bond device name too long
> 
> 2016-05-27 18:38, Thomas Monjalon:
> > 2016-05-27 17:20, Michal Jastrzebski:
> > > Bond device name was too long (greater than 32 characters) which
> > > caused mempool allocation to fail.
> >
> > Maybe that this kind of failure would be avoided if the test
> > was added to autotests (app/test/autotest_data.py).
> >
> > Generally speaking, it would be a good idea to make an audit
> > on which tests are missing in "make fast_test" and "make test".
> 
> Any comment please?

Hi Thomas,

There is a small timeout in test_tlb_tx_burst - a big burst has to be
generated to detect balancing and a small timeout has to be included between
each burst, thus I am not sure link_bonding_autotest can be classified as a
fast test (the test takes about 3-4 seconds).
We can add this test to the autotests script, for which time is not so critical.

Best regards
Michal


[dpdk-dev] [PATCH v3 19/20] thunderx/nicvf: updated driver documentation and release notes

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> Updated doc/guides/nics/overview.rst, doc/guides/nics/thunderx.rst
> and release notes
> 
> Changed "*" to "P" in overview.rst to capture the partially supported
> feature as "*" creating alignment issues with Sphinx table
> 
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Slawomir Rosek 
> Acked-by: John McNamara 
> ---
>  doc/guides/nics/index.rst  |   1 +
>  doc/guides/nics/overview.rst   |  96 -
>  doc/guides/nics/thunderx.rst   | 354 
> +
>  doc/guides/rel_notes/release_16_07.rst |   1 +
>  4 files changed, 404 insertions(+), 48 deletions(-)
>  create mode 100644 doc/guides/nics/thunderx.rst

Hi Jerin,

This patch doesn't apply on top of origin/rel_16_07:

Applying: thunderx/nicvf: updated driver documentation and release notes
Using index info to reconstruct a base tree...
M   doc/guides/nics/overview.rst
Falling back to patching base and 3-way merge...
Auto-merging doc/guides/nics/overview.rst
CONFLICT (content): Merge conflict in doc/guides/nics/overview.rst

Regards,
ferruh


[dpdk-dev] [PATCH v3 12/20] thunderx/nicvf: add single and multi segment tx functions

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Maciej Czekaj 
> Signed-off-by: Kamil Rytarowski 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Slawomir Rosek 
> Signed-off-by: Radoslaw Biernacki 
> ---
>  drivers/net/thunderx/Makefile   |   2 +
>  drivers/net/thunderx/nicvf_ethdev.c |   5 +-
>  drivers/net/thunderx/nicvf_rxtx.c   | 256 
> 
>  drivers/net/thunderx/nicvf_rxtx.h   |  93 +
>  4 files changed, 355 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/thunderx/nicvf_rxtx.c
>  create mode 100644 drivers/net/thunderx/nicvf_rxtx.h

Patch is generating following checkpatch warnings:

CHECK:BRACES: Blank lines aren't necessary after an open brace '{'
#234: FILE: drivers/net/thunderx/nicvf_rxtx.c:154:
+   sq->xmit_bufs > sq->tx_free_thresh) {
+

ERROR:CODE_INDENT: code indent should use tabs where possible
#421: FILE: drivers/net/thunderx/nicvf_rxtx.h:79:
+entry->buff[0] = (uint64_t)SQ_DESC_TYPE_GATHER << 60 |$

WARNING:LEADING_SPACE: please, no spaces at the start of a line
#421: FILE: drivers/net/thunderx/nicvf_rxtx.h:79:
+entry->buff[0] = (uint64_t)SQ_DESC_TYPE_GATHER << 60 |$

ERROR:CODE_INDENT: code indent should use tabs where possible
#424: FILE: drivers/net/thunderx/nicvf_rxtx.h:82:
+entry->buff[1] = rte_mbuf_data_dma_addr(pkt);$

WARNING:LEADING_SPACE: please, no spaces at the start of a line
#424: FILE: drivers/net/thunderx/nicvf_rxtx.h:82:
+entry->buff[1] = rte_mbuf_data_dma_addr(pkt);$




[dpdk-dev] [PATCH v8 1/3] mempool: support external mempool operations

2016-06-08 Thread Olivier Matz
Hi David,

Please find some comments below.

On 06/03/2016 04:58 PM, David Hunt wrote:

> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h

> +/**
> + * Prototype for implementation specific data provisioning function.
> + *
> + * The function should provide the implementation specific memory for
> + * for use by the other mempool ops functions in a given mempool ops struct.
> + * E.g. the default ops provides an instance of the rte_ring for this 
> purpose.
> + * it will mostlikely point to a different type of data structure, and
> + * will be transparent to the application programmer.
> + */
> +typedef void *(*rte_mempool_alloc_t)(struct rte_mempool *mp);

In http://dpdk.org/ml/archives/dev/2016-June/040233.html, I suggested
to change the prototype to return an int (-errno) and directly set
mp->pool_data (void *) or mp->pool_if (uint64_t). No cast
would be required in the latter case.

By the way, there is a typo in the comment:
"mostlikely" -> "most likely"

> --- /dev/null
> +++ b/lib/librte_mempool/rte_mempool_default.c

> +static void
> +common_ring_free(struct rte_mempool *mp)
> +{
> + rte_ring_free((struct rte_ring *)mp->pool_data);
> +}

I don't think the cast is needed here.
(same in other functions)


> --- /dev/null
> +++ b/lib/librte_mempool/rte_mempool_ops.c

> +/* add a new ops struct in rte_mempool_ops_table, return its index */
> +int
> +rte_mempool_ops_register(const struct rte_mempool_ops *h)
> +{
> + struct rte_mempool_ops *ops;
> + int16_t ops_index;
> +
> + rte_spinlock_lock(&rte_mempool_ops_table.sl);
> +
> + if (rte_mempool_ops_table.num_ops >=
> + RTE_MEMPOOL_MAX_OPS_IDX) {
> + rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> + RTE_LOG(ERR, MEMPOOL,
> + "Maximum number of mempool ops structs exceeded\n");
> + return -ENOSPC;
> + }
> +
> + if (h->put == NULL || h->get == NULL || h->get_count == NULL) {
> + rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> + RTE_LOG(ERR, MEMPOOL,
> + "Missing callback while registering mempool ops\n");
> + return -EINVAL;
> + }
> +
> + if (strlen(h->name) >= sizeof(ops->name) - 1) {
> + RTE_LOG(DEBUG, EAL, "%s(): mempool_ops <%s>: name too long\n",
> + __func__, h->name);
> + rte_errno = EEXIST;
> + return NULL;
> + }

This has already been noticed by Shreyansh, but in case of:

rte_mempool_ops.c:75:10: error: return makes integer from pointer
without a cast [-Werror=int-conversion]
   return NULL;
  ^


> +/* sets mempool ops previously registered by rte_mempool_ops_register */
> +int
> +rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name)


When I compile with shared libraries enabled, I get the following error:

librte_reorder.so: undefined reference to `rte_mempool_ops_table'
librte_mbuf.so: undefined reference to `rte_mempool_set_ops_byname'
...

The new functions and global variables must be in
rte_mempool_version.map. This was in v5
( http://dpdk.org/ml/archives/dev/2016-May/039365.html ) but
was dropped in v6.




Regards,
Olivier


[dpdk-dev] [PATCH v3 01/20] thunderx/nicvf/base: add hardware API for ThunderX nicvf inbuilt NIC

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> Adds hardware specific API for ThunderX nicvf inbuilt NIC device under
> drivers/net/thunderx/nicvf/base directory.
> 
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Maciej Czekaj 
> Signed-off-by: Kamil Rytarowski 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Slawomir Rosek 
> Signed-off-by: Radoslaw Biernacki 
> ---

...

> +
> +struct pf_rq_cfg { union { struct {
> +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> + uint64_t reserverd1:1;
Doesn't really matter but, as a detail, s/reserverd/reserved ? A few
more occurrences below.

> + uint64_t reserverd0:34;
> + uint64_t strip_pre_l2:1;
> + uint64_t caching:2;
> + uint64_t cq_qs:7;
> + uint64_t cq_idx:3;
> + uint64_t rbdr_cont_qs:7;
> + uint64_t rbdr_cont_idx:1;
> + uint64_t rbdr_strt_qs:7;
> + uint64_t rbdr_strt_idx:1;
> +#else
> + uint64_t rbdr_strt_idx:1;
> + uint64_t rbdr_strt_qs:7;
> + uint64_t rbdr_cont_idx:1;
> + uint64_t rbdr_cont_qs:7;
> + uint64_t cq_idx:3;
> + uint64_t cq_qs:7;
> + uint64_t caching:2;
> + uint64_t strip_pre_l2:1;
> + uint64_t reserverd0:34;
> + uint64_t reserverd1:1;
> +#endif
> + };
> + uint64_t value;
> +}; };
> +

...



[dpdk-dev] [PATCH v3 02/20] thunderx/nicvf: add pmd skeleton

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> Introduce driver initialization and enable build infrastructure for
> nicvf pmd driver.
> 
> By default, It is enabled only for defconfig_arm64-thunderx-*
> config as it is an inbuilt NIC device.
> 
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Maciej Czekaj 
> Signed-off-by: Kamil Rytarowski 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Slawomir Rosek 
> Signed-off-by: Radoslaw Biernacki 
> ---
>  config/common_base |  10 +
>  config/defconfig_arm64-thunderx-linuxapp-gcc   |  10 +
>  drivers/net/Makefile   |   1 +
>  drivers/net/thunderx/Makefile  |  63 ++
>  drivers/net/thunderx/nicvf_ethdev.c| 251 
> +
>  drivers/net/thunderx/nicvf_ethdev.h|  48 
>  drivers/net/thunderx/nicvf_logs.h  |  83 +++
>  drivers/net/thunderx/nicvf_struct.h| 124 ++
>  .../thunderx/rte_pmd_thunderx_nicvf_version.map|   4 +
>  mk/rte.app.mk  |   2 +
>  10 files changed, 596 insertions(+)
>  create mode 100644 drivers/net/thunderx/Makefile
>  create mode 100644 drivers/net/thunderx/nicvf_ethdev.c
>  create mode 100644 drivers/net/thunderx/nicvf_ethdev.h
>  create mode 100644 drivers/net/thunderx/nicvf_logs.h
>  create mode 100644 drivers/net/thunderx/nicvf_struct.h
>  create mode 100644 drivers/net/thunderx/rte_pmd_thunderx_nicvf_version.map
> 

...

> +
> + if (nic->sqs_mode) {
> + PMD_INIT_LOG(INFO, "Unsupported SQS VF detected, Detaching...");
> + /* Detach port by returning Postive error number */

s/Postive/Positive ?

...


[dpdk-dev] [PATCH v3 01/10] rte: change xstats to use integer ids

2016-06-08 Thread Thomas Monjalon
2016-06-08 12:16, Remy Horton:
> 'noon,
> 
> On 08/06/2016 10:37, Thomas Monjalon wrote:
> > 2016-05-30 11:48, Remy Horton:
> >>   struct rte_eth_xstats {
> >> +  /* FIXME: Remove name[] once remaining drivers converted */
> >>char name[RTE_ETH_XSTATS_NAME_SIZE];
> >
> > What is the plan? This field must be deprecated with an attribute.
> > We cannot have 2 different APIs depending of the driver.
> 
> This is where it gets logistically tricky..
> 
> Since there's an API/ABI breakage notice in place on this, my own 
> preference would be to have the entire patchset squashed into a single 
> patch. The problem is that rte/app changes (patches 1 & 7-9) are normally 
> applied via master whereas driver changes (patches 2-6) go in via 
> dpdk-next-net - it is not clear to me how patches should be submitted 
> for this case.

Misunderstanding here. Patches are fine and will be integrated in the
main tree because they are not only driver changes.
I was talking about the old API with the name field in rte_eth_xstats.
I had not seen patch 9, which removes it.

> >> +int rte_eth_xstats_count(uint8_t port_id);
> >
> > This function is useless because we can have the count with
> > rte_eth_xstats_get(p, NULL, 0)
> > By the way it would be more consistent to have the same behaviour
> > in rte_eth_xstats_names().
> 
> Feedback I got with earlier patches was that a separate count function 
> was preferable to overloading the fetch function using *data==NULL - is 
> the use of the latter specifically preferred?

I prefer the fetch/NULL style to get a count.
It also handles nicely the fetch error because of a too small buffer.


[dpdk-dev] [PATCH v3 09/10] remove name field from struct rte_eth_xstats

2016-06-08 Thread Thomas Monjalon
2016-05-30 11:48, Remy Horton:
>  struct rte_eth_xstats {
> - /* FIXME: Remove name[] once remaining drivers converted */
> - char name[RTE_ETH_XSTATS_NAME_SIZE];
>   uint64_t id;
>   uint64_t value;
>  };

While changing the content of this struct, it is an opportunity
to fix its name from rte_eth_xstats to rte_eth_xstat.




[dpdk-dev] [PATCH v3 08/20] thunderx/nicvf: add tx_queue_setup/release support

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Maciej Czekaj 
> Signed-off-by: Kamil Rytarowski 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Slawomir Rosek 
> Signed-off-by: Radoslaw Biernacki 
> ---
...

> +
> + /* Roundup nb_desc to avilable qsize and validate max number of desc */
s/avilable/available ?

> + nb_desc = nicvf_qsize_sq_roundup(nb_desc);
> + if (nb_desc == 0) {
> + PMD_INIT_LOG(ERR, "Value of nb_desc beyond available sq qsize");
> + return -EINVAL;
> + }
...


[dpdk-dev] [PATCH v3 17/20] thunderx/nicvf: add device start, stop and close support

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Maciej Czekaj 
> Signed-off-by: Kamil Rytarowski 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Slawomir Rosek 
> Signed-off-by: Radoslaw Biernacki 
> ---
...
> +
> + /* Userspace process exited witout proper shutdown in last run */
s/witout/without

> + if (nicvf_qset_rbdr_active(nic, 0))
> + nicvf_dev_stop(dev);
> +
...


[dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements

2016-06-08 Thread Adrien Mazarguil
Hi Konstantin,

On Wed, Jun 08, 2016 at 10:34:17AM +, Ananyev, Konstantin wrote:
> Hi Adrien,
> 
> > 
> > An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> > when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> > applications with a log level high enough to trigger it.
> > 
> > While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> > expects it to be 0. 
> >Considering users are not expected to reset the
> > reference count to satisfy assert() and that raw functions are designed on
> > purpose without safety belts, remove these checks.
> 
> Yes, the refcnt is supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> Right now, it is a user responsibility to make sure refcnt==0 before pushing
> mbuf back to the pool.
> Not sure why do you consider that wrong?

I do not consider this wrong and I'm all for using assert() to catch
programming errors, however in this specific case, I think they are
inconsistent and misleading.

> If the user calls __rte_mbuf_raw_free() manually it is his responsibility to 
> make
> sure the mbuf's refcnt==0.

Sure, however what harm does it cause (besides assert() to fail), since the
allocation function sets refcount to 1?

Why have the allocation function set the refcount if we are sure it is
already 0 (assert() proves it)? Removing rte_mbuf_refcnt_set(m, 1) could
surely improve performance.

> BTW, why are you doing it?
> The comment clearly states that the function is for internal use:
> /**
>  * @internal Put mbuf back into its original mempool.
>  * The use of that function is reserved for RTE internal needs.
>  * Please use rte_pktmbuf_free().
>  *
>  * @param m
>  *   The mbuf to be freed.
>  */
> static inline void __attribute__((always_inline))
> __rte_mbuf_raw_free(struct rte_mbuf *m)

Several PMDs are using it anyway (won't name names, but I know one of them
quite well). I chose to modify this code instead of its users for the
following reasons:

- Considering their names, these functions should be opposites and able to
  work together like malloc()/free().

- PMDs are relying on these functions for performance reasons, we can assume
  they took the extra care necessary to make sure it would work properly.

- Preventing it would make these PMDs slower and is not acceptable either.

What remains is the consistency issue; I think these statements were only
added to catch multiple frees, and those should be caught at a higher
level, where other consistency checks are also performed.

> > Signed-off-by: Adrien Mazarguil 
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 11fa06d..7070bb8 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf 
> > *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> > if (rte_mempool_get(mp, &mb) < 0)
> > return NULL;
> > m = (struct rte_mbuf *)mb;
> > -   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> > rte_mbuf_refcnt_set(m, 1);
> > __rte_mbuf_sanity_check(m, 0);
> > 
> > @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >  static inline void __attribute__((always_inline))
> >  __rte_mbuf_raw_free(struct rte_mbuf *m)
> >  {
> > -   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> > rte_mempool_put(m->pool, m);
> >  }
> > 
> > --
> > 2.1.4
> 

-- 
Adrien Mazarguil
6WIND


[dpdk-dev] [PATCH v3 19/20] thunderx/nicvf: updated driver documentation and release notes

2016-06-08 Thread Jerin Jacob
On Wed, Jun 08, 2016 at 01:08:35PM +0100, Ferruh Yigit wrote:
> On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> > Updated doc/guides/nics/overview.rst, doc/guides/nics/thunderx.rst
> > and release notes
> > 
> > Changed "*" to "P" in overview.rst to capture the partially supported
> > feature as "*" creating alignment issues with Sphinx table
> > 
> > Signed-off-by: Jerin Jacob 
> > Signed-off-by: Slawomir Rosek 
> > Acked-by: John McNamara 
> > ---
> >  doc/guides/nics/index.rst  |   1 +
> >  doc/guides/nics/overview.rst   |  96 -
> >  doc/guides/nics/thunderx.rst   | 354 
> > +
> >  doc/guides/rel_notes/release_16_07.rst |   1 +
> >  4 files changed, 404 insertions(+), 48 deletions(-)
> >  create mode 100644 doc/guides/nics/thunderx.rst
> 
> Hi Jerin,
> 
> This patch doesn't apply on top of origin/rel_16_07:
> 
> Applying: thunderx/nicvf: updated driver documentation and release notes
> Using index info to reconstruct a base tree...
> M doc/guides/nics/overview.rst
> Falling back to patching base and 3-way merge...
> Auto-merging doc/guides/nics/overview.rst
> CONFLICT (content): Merge conflict in doc/guides/nics/overview.rst

Hi Ferruh,

Since the doc files keep changing, this patch set
has been re-based on the latest change-set, i.e.
ca173a909538a2f1082cd0dcb4d778a97dab69c3,
not origin/rel_16_07.

> 
> Regards,
> ferruh


[dpdk-dev] [PATCH v3 00/20] DPDK PMD for ThunderX NIC device

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> This patch set provides the initial version of DPDK PMD for the
> built-in NIC device in Cavium ThunderX SoC family.
> 
> Implemented features and ThunderX nicvf PMD documentation added
> in doc/guides/nics/overview.rst and doc/guides/nics/thunderx.rst
> respectively in this patch set.
> 
> These patches are checked using checkpatch.sh with following
> additional ignore option:
> options="$options --ignore=CAMELCASE,BRACKET_SPACE"
> CAMELCASE - To accommodate PRIx64
> BRACKET_SPACE - To accommodate AT inline line assembly in two places
> 
> This patch set is based on DPDK 16.07-RC1
> and tested with today's git HEAD change-set
> ca173a909538a2f1082cd0dcb4d778a97dab69c3 along with
> following depended patch
> 
> http://dpdk.org/dev/patchwork/patch/11826/
> ethdev: add tunnel and port RSS offload types
> 
> V1->V2
> 
> http://dpdk.org/dev/patchwork/patch/12609/
> -- added const for the const struct tables
> -- remove multiple blank lines
> -- addressed style comments
> http://dpdk.org/dev/patchwork/patch/12610/
> -- removed DEPDIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += lib/librte_net 
> lib/librte_malloc
> -- add const for table structs
> -- addressed style comments
> http://dpdk.org/dev/patchwork/patch/12614/
> -- s/DEFAULT_*/NICVF_DEFAULT_*/gc
> http://dpdk.org/dev/patchwork/patch/12615/
> -- Fix typos
> -- addressed style comments
> http://dpdk.org/dev/patchwork/patch/12616/
> -- removed redundant txq->tail = 0 and txq->head = 0
> http://dpdk.org/dev/patchwork/patch/12627/
> -- fixed the documentation changes
> 
> -- fixed TAB+space occurrences in functions
> -- rebased to c8c33ad7f94c59d1c0676af0cfd61207b3e808db
> 
> V2->V3
> 
> http://dpdk.org/dev/patchwork/patch/13060/
> -- Changed polling infrastructure to use rte_eal_alarm* instead of 
> timerfd_create API
> -- rebased to ca173a909538a2f1082cd0dcb4d778a97dab69c3
> 
> Jerin Jacob (20):
>   thunderx/nicvf/base: add hardware API for ThunderX nicvf inbuilt NIC
>   thunderx/nicvf: add pmd skeleton
>   thunderx/nicvf: add link status and link update support
>   thunderx/nicvf: add get_reg and get_reg_length support
>   thunderx/nicvf: add dev_configure support
>   thunderx/nicvf: add dev_infos_get support
>   thunderx/nicvf: add rx_queue_setup/release support
>   thunderx/nicvf: add tx_queue_setup/release support
>   thunderx/nicvf: add rss and reta query and update support
>   thunderx/nicvf: add mtu_set and promiscuous_enable support
>   thunderx/nicvf: add stats support
>   thunderx/nicvf: add single and multi segment tx functions
>   thunderx/nicvf: add single and multi segment rx functions
>   thunderx/nicvf: add dev_supported_ptypes_get and rx_queue_count
> support
>   thunderx/nicvf: add rx queue start and stop support
>   thunderx/nicvf: add tx queue start and stop support
>   thunderx/nicvf: add device start,stop and close support
>   thunderx/config: set max numa node to two
>   thunderx/nicvf: updated driver documentation and release notes
>   maintainers: claim responsibility for the ThunderX nicvf PMD
> 

Hi Jerin,

In patch subject, as tag, other drivers are using only driver name, and
Intel drivers also has "driver/base", since base code has some special
case. For thunderx, what do you think about keeping subject as:
 "thunderx: "

Thanks,
ferruh


[dpdk-dev] [PATCH] mk: generate internal library dependencies from DEPDIRS-y automatically

2016-06-08 Thread Olivier Matz


On 06/07/2016 04:40 PM, Wiles, Keith wrote:
> On 6/7/16, 9:19 AM, "dev on behalf of Thomas Monjalon"  dpdk.org on behalf of thomas.monjalon at 6wind.com> wrote:
> 
>> 2016-06-07 15:07, Bruce Richardson:
>>> On Tue, Jun 07, 2016 at 03:00:45PM +0200, Thomas Monjalon wrote:
 2016-06-07 14:36, Christian Ehrhardt:
> But I still struggle to see how to fix the circular dependency between
> librte_eal and librte_mempool.

 Why is there a circular dependency?
 Only because of logs using mempool?

> Maybe now is a time to look at this part of the original threads again to
> eventually get apps less overlinked?
> => http://www.dpdk.org/ml/archives/dev/2016-May/039441.html
> My naive suggestions in generalized form can be found there (no answer 
> yet):
> =>
> http://stackoverflow.com/questions/37351699/how-to-create-both-so-files-for-two-circular-depending-libraries

 I would prefer removing the circular dependency.
 Maybe we can rewrite the code to not use mempool or move it outside of EAL.

Indeed, mempools are used in EAL for historical reasons. Is this feature still
useful now that logs are sent to syslog? Maybe we could deprecate this
API, and remove the mempool calls in a future release?


>>> Or else we can take the attitude that the mempools and the rings are just a 
>>> core
>>> part of DPDK and move them and the EAL into a dpdk_core library at link 
>>> time.
>>> Having the code separate in the git tree is good, but I'm not sure having
>>> the resulting object files being in separate .a/.so files is particularly 
>>> useful.
>>> I can't see someone wanting to use one without the other.
>>
>> EAL could be used as an abstraction layer on top of systems and platforms.
>> And I think keeping things separated and layered help to maintain a design
>> easy to understand.

I like the idea to have one lib per directory (for consistency). It
may also simplify the Makefiles.
I'm in favor of keeping mempool and ring separated from eal if we
can remove the circular dep.

Olivier


[dpdk-dev] [PATCH] eal/linux: fix undefined allocation of 0 bytes (CERT MEM04-C; CWE-131)

2016-06-08 Thread Sergio Gonzalez Monroy
On 27/04/2016 18:06, Daniel Mrzyglod wrote:
> Fix issue reported by clang scan-build
>
> there is a chance that nr_hugepages will be 0 if the conditions for the loop
> for (i = 0; i < (int) internal_config.num_hugepage_sizes; i++)
> will be unmet.
>
> Fixes: b6a468ad41d5 ("memory: add --socket-mem option")
>
> Signed-off-by: Daniel Mrzyglod 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..e94538e 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1114,6 +1114,8 @@ rte_eal_hugepage_init(void)
>* processing done on these pages, shared memory will be created
>* at a later stage.
>*/
> + if (nr_hugepages == 0)
> + goto fail;
>   tmp_hp = malloc(nr_hugepages * sizeof(struct hugepage_file));
>   if (tmp_hp == NULL)
>   goto fail;

The behavior of malloc(0) is implementation-defined, but the Linux man 
page says that it returns NULL.
So strictly speaking, without the patch the outcome is the same, because 
malloc(0) will return NULL.

Now, I'd consider the patch not needed but it doesn't really harm either.
Anyone else has comments/thoughts about it?

Regarding the patch itself, I think the title and commit message need to 
be modified to reflect that the patch's
goal is to handle the nr_hugepages == 0 case without relying on malloc to 
return NULL.

Sergio




[dpdk-dev] [PATCH v3 00/20] DPDK PMD for ThunderX NIC device

2016-06-08 Thread Jerin Jacob
On Wed, Jun 08, 2016 at 01:30:28PM +0100, Ferruh Yigit wrote:
> On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> > Jerin Jacob (20):
> >   thunderx/nicvf/base: add hardware API for ThunderX nicvf inbuilt NIC
> >   thunderx/nicvf: add pmd skeleton
> >   thunderx/nicvf: add link status and link update support
> >   thunderx/nicvf: add get_reg and get_reg_length support
> >   thunderx/nicvf: add dev_configure support
> >   thunderx/nicvf: add dev_infos_get support
> >   thunderx/nicvf: add rx_queue_setup/release support
> >   thunderx/nicvf: add tx_queue_setup/release support
> >   thunderx/nicvf: add rss and reta query and update support
> >   thunderx/nicvf: add mtu_set and promiscuous_enable support
> >   thunderx/nicvf: add stats support
> >   thunderx/nicvf: add single and multi segment tx functions
> >   thunderx/nicvf: add single and multi segment rx functions
> >   thunderx/nicvf: add dev_supported_ptypes_get and rx_queue_count
> > support
> >   thunderx/nicvf: add rx queue start and stop support
> >   thunderx/nicvf: add tx queue start and stop support
> >   thunderx/nicvf: add device start,stop and close support
> >   thunderx/config: set max numa node to two
> >   thunderx/nicvf: updated driver documentation and release notes
> >   maintainers: claim responsibility for the ThunderX nicvf PMD
> > 
> 
> Hi Jerin,
> 
> In patch subject, as tag, other drivers are using only driver name, and
> Intel drivers also has "driver/base", since base code has some special
> case. For thunderx, what do you think about keeping subject as:
>  "thunderx: "
> 

Hi Ferruh,

We may add crypto or other built-in ThunderX HW-accelerated block drivers
to DPDK in the future.
That is the reason why I thought of keeping the subject as thunderx/nicvf.
If you don't have any objection then I would like to keep it as
thunderx/nicvf or just nicvf.

> Thanks,
> ferruh


[dpdk-dev] [PATCH v3 12/20] thunderx/nicvf: add single and multi segment tx functions

2016-06-08 Thread Ferruh Yigit
On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Maciej Czekaj 
> Signed-off-by: Kamil Rytarowski 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Slawomir Rosek 
> Signed-off-by: Radoslaw Biernacki 
> ---
>  drivers/net/thunderx/Makefile   |   2 +
>  drivers/net/thunderx/nicvf_ethdev.c |   5 +-
>  drivers/net/thunderx/nicvf_rxtx.c   | 256 
> 
>  drivers/net/thunderx/nicvf_rxtx.h   |  93 +
>  4 files changed, 355 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/thunderx/nicvf_rxtx.c
>  create mode 100644 drivers/net/thunderx/nicvf_rxtx.h
> 
> diff --git a/drivers/net/thunderx/Makefile b/drivers/net/thunderx/Makefile

...

> +
> +static inline uint32_t __hot
> +nicvf_free_xmittted_buffers(struct nicvf_txq *sq, struct rte_mbuf **tx_pkts,

again, although this is perfectly fine, any intention to say xmitted
instead of xmittted?

...


[dpdk-dev] about rx checksum flags

2016-06-08 Thread Olivier Matz
Hi,

On 06/08/2016 10:22 AM, Chandran, Sugesh wrote:
>>> I guess the IP checksum is as important as L4. In some cases, the UDP
>>> checksum is zero and there is no need to validate it. But the IP checksum is
>>> present on all the packets and must be validated all the time.
>>> At higher packet rates, the IP checksum offload can offer a slight performance
>> improvement. What do you think??
>>>
>>
>> Agree, in some situations (and this is even more true with packet types /
>> smartnics), the application could process without accessing the packet data 
>> if
>> we keep the IP cksum flags.
> [Sugesh] True. If that's the case, will you consider implementing IP
> checksum flags as well along with L4?
> As you said, this will be useful when we offload the packet lookup itself into 
> the NICs (maybe
> when using Flow Director)? 

Yes, I plan to implement the same rx status flags (good, bad, unknown,
none) for rx IP checksum too.

Regards,
Olivier


[dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements

2016-06-08 Thread Ananyev, Konstantin
> 
> Hi Konstantin,
> 
> On Wed, Jun 08, 2016 at 10:34:17AM +, Ananyev, Konstantin wrote:
> > Hi Adrien,
> >
> > >
> > > An assertion failure occurs in __rte_mbuf_raw_free() (called by a few 
> > > PMDs)
> > > when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> > > applications with a log level high enough to trigger it.
> > >
> > > While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> > > expects it to be 0.
> > >Considering users are not expected to reset the
> > > reference count to satisfy assert() and that raw functions are designed on
> > > purpose without safety belts, remove these checks.
> >
> > Yes, the refcnt is supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> > Right now, it is a user responsibility to make sure refcnt==0 before 
> > pushing
> > mbuf back to the pool.
> > Not sure why do you consider that wrong?
> 
> I do not consider this wrong and I'm all for using assert() to catch
> programming errors, however in this specific case, I think they are
> inconsistent and misleading.

Honestly, I don't understand why.
Right now the rule of thumb is: when an mbuf is in the pool, its refcnt should 
be equal to zero.
Yes, as you pointed out below, that rule probably can be changed to: 
when an mbuf is in the pool, its refcnt should equal one, and that would probably 
allow us
to speed things up a bit, but I suppose that's the matter of another 
patch/discussion.

> 
> > If the user calls __rte_mbuf_raw_free() manually it is his responsibility to 
> > make
> > sure the mbuf's refcnt==0.
> 
> Sure, however what harm does it cause (besides assert() to fail), since the
> allocation function sets refcount to 1?
> 
> Why having the allocation function set the refcount if we are sure it is
> already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can
> surely improve performance.

That's just an assert() enabled when MBUF_DEBUG is on.
Its sole purpose is to help troubleshoot bugs and to catch situations
when someone silently updates mbufs that are supposed to be free.  

> 
> > BTW, why are you doing it?
> > The comment clearly states that the function is for internal use:
> > /**
> >  * @internal Put mbuf back into its original mempool.
> >  * The use of that function is reserved for RTE internal needs.
> >  * Please use rte_pktmbuf_free().
> >  *
> >  * @param m
> >  *   The mbuf to be freed.
> >  */
> > static inline void __attribute__((always_inline))
> > __rte_mbuf_raw_free(struct rte_mbuf *m)
> 
> Several PMDs are using it anyway (won't name names, but I know one of them
> quite well).

Then it is probably a bug in these PMDs that needs to be fixed.


> I chose to modify this code instead of its users for the
> following reasons:
> 
> - Considering their names, these functions should be opposites and able to
>   work together like malloc()/free().

These are internal functions.
Comments in mbuf clearly state that library users shouldn't call them directly.
They are written to fit internal librte_mbuf needs, and no one ever promised
malloc()/free() compatibility here. 

> 
> - PMDs are relying on these functions for performance reasons, we can assume
>   they took the extra care necessary to make sure it would work properly.

That just doesn't seem correct to me.
The proper way to free an mbuf segment is:

static inline void __attribute__((always_inline))
rte_pktmbuf_free_seg(struct rte_mbuf *m)
{
	if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
		m->next = NULL;
		__rte_mbuf_raw_free(m);
	}
}

If by some reason you choose not to use this function, then it is your
responsibility to perform similar actions on your own before pushing mbuf into 
the pool.
That's what some TX functions for some Intel NICs do to improve performance:
they call _prefree_seg() manually and try to put mbufs into the pool in groups.

> 
> - Preventing it would make these PMDs slower and is not acceptable either.

I can hardly imagine that the __rte_pktmbuf_prefree_seg() impact would be that 
severe...
But ok, probably you do have some very specific case, but then why doesn't your 
PMD just call:
rte_mempool_put(m->pool, m); 
directly?
Why instead you choose to change common functions and compromise
librte_mbuf debug ability?

> 
> What remains is the consistency issue, I think these statements were only
> added to catch multiple frees,

Yes these asserts() here to help catch bugs,
and I think it is a good thing to have them here. 

> and those should be caught at a higher
> level, where other consistency checks are also performed.

Like where?

Konstantin


> 
> > > Signed-off-by: Adrien Mazarguil 
> > > ---
> > >  lib/librte_mbuf/rte_mbuf.h | 2 --
> > >  1 file changed, 2 deletions(-)
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index 11fa06d..7070bb8 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -1108,7 +1108,6 @@ static inline struct 

[dpdk-dev] [PATCH v3 00/20] DPDK PMD for ThunderX NIC device

2016-06-08 Thread Ferruh Yigit
On 6/8/2016 1:43 PM, Jerin Jacob wrote:
> On Wed, Jun 08, 2016 at 01:30:28PM +0100, Ferruh Yigit wrote:
>> On 6/7/2016 5:40 PM, Jerin Jacob wrote:
>>> Jerin Jacob (20):
>>>   thunderx/nicvf/base: add hardware API for ThunderX nicvf inbuilt NIC
>>>   thunderx/nicvf: add pmd skeleton
>>>   thunderx/nicvf: add link status and link update support
>>>   thunderx/nicvf: add get_reg and get_reg_length support
>>>   thunderx/nicvf: add dev_configure support
>>>   thunderx/nicvf: add dev_infos_get support
>>>   thunderx/nicvf: add rx_queue_setup/release support
>>>   thunderx/nicvf: add tx_queue_setup/release support
>>>   thunderx/nicvf: add rss and reta query and update support
>>>   thunderx/nicvf: add mtu_set and promiscuous_enable support
>>>   thunderx/nicvf: add stats support
>>>   thunderx/nicvf: add single and multi segment tx functions
>>>   thunderx/nicvf: add single and multi segment rx functions
>>>   thunderx/nicvf: add dev_supported_ptypes_get and rx_queue_count
>>> support
>>>   thunderx/nicvf: add rx queue start and stop support
>>>   thunderx/nicvf: add tx queue start and stop support
>>>   thunderx/nicvf: add device start,stop and close support
>>>   thunderx/config: set max numa node to two
>>>   thunderx/nicvf: updated driver documentation and release notes
>>>   maintainers: claim responsibility for the ThunderX nicvf PMD
>>>
>>
>> Hi Jerin,
>>
>> In patch subject, as tag, other drivers are using only driver name, and
>> Intel drivers also has "driver/base", since base code has some special
>> case. For thunderx, what do you think about keeping subject as:
>>  "thunderx: "
>>
> 
> Hi Ferruh,
> 
> We may add crypto or other builtin ThunderX HW accelerated block drivers
> in future to DPDK.
> So that is the reason why I thought of keeping the subject as thunderx/nicvf.
> If you don't have any objection then I would like to keep it as
> thunderx/nicvf or just nicvf.
> 

Ring has similar problem, but we are using same tag "ring:" for both
ring_pmd and ring library.

For this case perhaps we can use net/thunderx, crypto/thunderx ?

I am not aware of any defined convention for the case.

Thanks,
ferruh





[dpdk-dev] [PATCH v3 19/20] thunderx/nicvf: updated driver documentation and release notes

2016-06-08 Thread Bruce Richardson
On Wed, Jun 08, 2016 at 05:57:16PM +0530, Jerin Jacob wrote:
> On Wed, Jun 08, 2016 at 01:08:35PM +0100, Ferruh Yigit wrote:
> > On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> > > Updated doc/guides/nics/overview.rst, doc/guides/nics/thunderx.rst
> > > and release notes
> > > 
> > > Changed "*" to "P" in overview.rst to capture the partially supported
> > > feature as "*" creating alignment issues with Sphinx table
> > > 
> > > Signed-off-by: Jerin Jacob 
> > > Signed-off-by: Slawomir Rosek 
> > > Acked-by: John McNamara 
> > > ---
> > >  doc/guides/nics/index.rst  |   1 +
> > >  doc/guides/nics/overview.rst   |  96 -
> > >  doc/guides/nics/thunderx.rst   | 354 
> > > +
> > >  doc/guides/rel_notes/release_16_07.rst |   1 +
> > >  4 files changed, 404 insertions(+), 48 deletions(-)
> > >  create mode 100644 doc/guides/nics/thunderx.rst
> > 
> > Hi Jerin,
> > 
> > This patch doesn't apply on top of origin/rel_16_07:
> > 
> > Applying: thunderx/nicvf: updated driver documentation and release notes
> > Using index info to reconstruct a base tree...
> > M   doc/guides/nics/overview.rst
> > Falling back to patching base and 3-way merge...
> > Auto-merging doc/guides/nics/overview.rst
> > CONFLICT (content): Merge conflict in doc/guides/nics/overview.rst
> 
> Hi Ferruh,
> 
> Since the doc files keep changing, this patch set
> has been re-based on the latest change-set, i.e. 
> ca173a909538a2f1082cd0dcb4d778a97dab69c3,
> not origin/rel_16_07.
>

The nic overview.rst doc causes lots of conflicts when merging, and those
I just fix on apply. There is no need to do a new patch revision solely for 
that.

So nothing to see here, move along... :-)

Regards,
/Bruce


[dpdk-dev] [PATCH v3 00/20] DPDK PMD for ThunderX NIC device

2016-06-08 Thread Bruce Richardson
On Wed, Jun 08, 2016 at 06:13:21PM +0530, Jerin Jacob wrote:
> On Wed, Jun 08, 2016 at 01:30:28PM +0100, Ferruh Yigit wrote:
> > On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> > > Jerin Jacob (20):
> > >   thunderx/nicvf/base: add hardware API for ThunderX nicvf inbuilt NIC
> > >   thunderx/nicvf: add pmd skeleton
> > >   thunderx/nicvf: add link status and link update support
> > >   thunderx/nicvf: add get_reg and get_reg_length support
> > >   thunderx/nicvf: add dev_configure support
> > >   thunderx/nicvf: add dev_infos_get support
> > >   thunderx/nicvf: add rx_queue_setup/release support
> > >   thunderx/nicvf: add tx_queue_setup/release support
> > >   thunderx/nicvf: add rss and reta query and update support
> > >   thunderx/nicvf: add mtu_set and promiscuous_enable support
> > >   thunderx/nicvf: add stats support
> > >   thunderx/nicvf: add single and multi segment tx functions
> > >   thunderx/nicvf: add single and multi segment rx functions
> > >   thunderx/nicvf: add dev_supported_ptypes_get and rx_queue_count
> > > support
> > >   thunderx/nicvf: add rx queue start and stop support
> > >   thunderx/nicvf: add tx queue start and stop support
> > >   thunderx/nicvf: add device start,stop and close support
> > >   thunderx/config: set max numa node to two
> > >   thunderx/nicvf: updated driver documentation and release notes
> > >   maintainers: claim responsibility for the ThunderX nicvf PMD
> > > 
> > 
> > Hi Jerin,
> > 
> > In patch subject, as tag, other drivers are using only driver name, and
> > Intel drivers also has "driver/base", since base code has some special
> > case. For thunderx, what do you think about keeping subject as:
> >  "thunderx: "
> > 
> 
> Hi Ferruh,
> 
> We may add crypto or other builtin ThunderX HW accelerated block drivers
> in future to DPDK.
> So that is the reason why I thought of keeping the subject as thunderx/nicvf.
> If you don't have any objection then I would like to keep it as
> thunderx/nicvf or just nicvf.

Are you upstreaming kernel modules for this device? If so, what is the Linux
kernel module-name for this device going to be, as perhaps that can help us
here?

Regards,
/Bruce


[dpdk-dev] [PATCH] app/test: fix bond device name too long

2016-06-08 Thread Thomas Monjalon
2016-06-08 11:50, Jastrzebski, MichalX K:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-05-27 18:38, Thomas Monjalon:
> > > 2016-05-27 17:20, Michal Jastrzebski:
> > > > Bond device name was too long (greater than 32 characters), which
> > > > caused mempool allocation to fail.
> > >
> > > Maybe that this kind of failure would be avoided if the test
> > > was added to autotests (app/test/autotest_data.py).
> > >
> > > Generally speaking, it would be a good idea to make an audit
> > > on which tests are missing in "make fast_test" and "make test".
> > 
> > Any comment please?
> 
> Hi Thomas,
> 
> There is a small timeout in test_tlb_tx_burst - a big burst has to be generated 
> to
> detect balancing and a small timeout has to be included between bursts, 
> thus I am not sure if link_bonding_autotest can be classified as a fast test 
> (it takes about 3-4 seconds).
> We can add this test to the autotests script, for which time is not so critical.

The bug we see here could be detected by just initializing bonding.
Maybe we can consider having some basic/fast tests and others longer.


[dpdk-dev] [PATCH v3 00/20] DPDK PMD for ThunderX NIC device

2016-06-08 Thread Jerin Jacob
On Wed, Jun 08, 2016 at 02:22:55PM +0100, Bruce Richardson wrote:
> On Wed, Jun 08, 2016 at 06:13:21PM +0530, Jerin Jacob wrote:
> > On Wed, Jun 08, 2016 at 01:30:28PM +0100, Ferruh Yigit wrote:
> > > On 6/7/2016 5:40 PM, Jerin Jacob wrote:
> > > > Jerin Jacob (20):
> > > >   thunderx/nicvf/base: add hardware API for ThunderX nicvf inbuilt NIC
> > > >   thunderx/nicvf: add pmd skeleton
> > > >   thunderx/nicvf: add link status and link update support
> > > >   thunderx/nicvf: add get_reg and get_reg_length support
> > > >   thunderx/nicvf: add dev_configure support
> > > >   thunderx/nicvf: add dev_infos_get support
> > > >   thunderx/nicvf: add rx_queue_setup/release support
> > > >   thunderx/nicvf: add tx_queue_setup/release support
> > > >   thunderx/nicvf: add rss and reta query and update support
> > > >   thunderx/nicvf: add mtu_set and promiscuous_enable support
> > > >   thunderx/nicvf: add stats support
> > > >   thunderx/nicvf: add single and multi segment tx functions
> > > >   thunderx/nicvf: add single and multi segment rx functions
> > > >   thunderx/nicvf: add dev_supported_ptypes_get and rx_queue_count
> > > > support
> > > >   thunderx/nicvf: add rx queue start and stop support
> > > >   thunderx/nicvf: add tx queue start and stop support
> > > >   thunderx/nicvf: add device start,stop and close support
> > > >   thunderx/config: set max numa node to two
> > > >   thunderx/nicvf: updated driver documentation and release notes
> > > >   maintainers: claim responsibility for the ThunderX nicvf PMD
> > > > 
> > > 
> > > Hi Jerin,
> > > 
> > > In the patch subject, as the tag, other drivers use only the driver name;
> > > Intel drivers also have "driver/base", since the base code is a special
> > > case. For thunderx, what do you think about keeping the subject as:
> > >  "thunderx: "
> > > 
> > 
> > Hi Ferruh,
> > 
> > We may add crypto or other built-in ThunderX HW-accelerated block drivers
> > to DPDK in the future.
> > That is why I thought of keeping the subject as thunderx/nicvf.
> > If you don't have any objection, I would like to keep it as
> > thunderx/nicvf or just nicvf.
> 
> Are you upstreaming kernel modules for this device? If so, what is the Linux
> kernel module-name for this device going to be, as perhaps that can help us
> here?

Yes, the kernel module has been upstreamed.
The commit log prefix in the Linux kernel is "net: thunderx: ..."

> 
> Regards,
> /Bruce


[dpdk-dev] [PATCH] mk: generate internal library dependencies from DEPDIRS-y automatically

2016-06-08 Thread Thomas Monjalon
2016-06-08 14:34, Olivier Matz:
> On 06/07/2016 04:40 PM, Wiles, Keith wrote:
> > On 6/7/16, 9:19 AM, "dev on behalf of Thomas Monjalon"  > dpdk.org on behalf of thomas.monjalon at 6wind.com> wrote:
> > 
> >> 2016-06-07 15:07, Bruce Richardson:
> >>> On Tue, Jun 07, 2016 at 03:00:45PM +0200, Thomas Monjalon wrote:
>  2016-06-07 14:36, Christian Ehrhardt:
> > But I still struggle to see how to fix the circular dependency between
> > librte_eal and librte_mempool.
> 
>  Why is there a circular dependency?
>  Only because of logs using mempool?
> 
> > Maybe now is a time to look at this part of the original threads again 
> > to
> > eventually get apps less overlinked?
> > => http://www.dpdk.org/ml/archives/dev/2016-May/039441.html
> > My naive suggestions in generalized form can be found there (no answer 
> > yet):
> > =>
> > http://stackoverflow.com/questions/37351699/how-to-create-both-so-files-for-two-circular-depending-libraries
> 
>  I would prefer removing the circular dependency.
>  Maybe we can rewrite the code to not use mempool or move it outside of 
>  EAL.
> 
> Indeed, mempools are used in EAL for the log history. Is this feature still
> useful now that logs are sent to syslog? Maybe we could deprecate this
> API, and remove the mempool calls in a future release?

+1 to deprecate log history

> >>> Or else we can take the attitude that the mempools and the rings are just 
> >>> a core
> >>> part of DPDK and move them and the EAL into a dpdk_core library at link 
> >>> time.
> >>> Having the code separate in the git tree is good, but I'm not sure having
> >>> the resulting object files being in separate .a/.so files is particularly 
> >>> useful.
> >>> I can't see someone wanting to use one without the other.
> >>
> >> EAL could be used as an abstraction layer on top of systems and platforms.
> >> And I think keeping things separated and layered help to maintain a design
> >> easy to understand.
> 
> I like the idea to have one lib per directory (for consistency). It
> may also simplify the Makefiles.
> I'm in favor of keeping mempool and ring separated from eal if we
> can remove the circular dep.

+1 to keep bijectivity in src/lib.


[dpdk-dev] [PATCH v5 0/9] add packet capture framework

2016-06-08 Thread Reshma Pattan
This patch set includes the below changes:

1) Changes to librte_ether.
2) A new library, librte_pdump, added for the packet capture framework.
3) A new app/pdump tool added for packet capturing.
4) Test-pmd changes done to initialize the packet capture framework.
5) Documentation update.

1)librte_pdump
==
To support packet capturing on DPDK Ethernet devices, a new library, librte_pdump,
is added. Users can develop their own packet capture applications using the new
library APIs.

Operation:
--
The pdump library provides APIs to support packet capturing on DPDK Ethernet
devices: APIs to initialize the packet capture framework, enable/disable
packet capture, and uninitialize the framework.

The pdump library works on a client/server model.

The server is responsible for enabling/disabling the packet captures.
Clients are responsible for requesting enabling/disabling of the
packet captures.

As part of packet capture framework initialization, a pthread and the
server socket are created. Only one server socket is allowed on the system.
As part of enabling/disabling the packet capture, client sockets are created;
multiple client sockets are allowed.
Whoever calls initialization first succeeds; subsequent initialization calls
are rejected, so later users can only request enabling/disabling of the
packet capture.

Applications using the below APIs need to pass port/device_id, queue, mempool and
ring parameters. The library uses the user-provided ring and mempool to mirror the
rx/tx packets of the port. Users need to dequeue the ring and write the packets
to a vdev (pcap/tuntap) to view them using any standard tools.

Note:
The mempool and ring should support multi-consumer/multi-producer operation.
The mempool mbuf size should be big enough to handle the rx/tx packets of a port.

APIs:
-
rte_pdump_init()
rte_pdump_enable()
rte_pdump_enable_by_deviceid()
rte_pdump_disable()
rte_pdump_disable_by_deviceid()
rte_pdump_uninit()

2)app/pdump tool

Tool app/pdump is designed based on librte_pdump for packet capturing in DPDK.
This tool by default runs as secondary process, and provides the support for
the command line options for packet capture.

./build/app/dpdk_pdump --
   --pdump '(port=<port id> | device_id=<pci address or vdev name>),
            (queue=<queue id>),
            (rx-dev=<iface or pcap file> |
             tx-dev=<iface or pcap file>),
            [ring-size=<ring size>],
            [mbuf-size=<mbuf data size>],
            [total-num-mbufs=<number of mbufs>]'

Parameters inside the parentheses are mandatory; parameters inside the square
brackets are optional.
Users pass the packet capture parameters under the --pdump option; multiple
--pdump options can be passed to capture packets on different port and queue
combinations.

Operation:
--
* The tool parses the user command line arguments and
creates the mempool, ring and the PCAP PMD vdev, with 'tx_stream' set to the
device passed in the rx-dev|tx-dev parameters.

* The tool then calls the librte_pdump APIs, i.e.
rte_pdump_enable()/rte_pdump_enable_by_deviceid(),
to enable packet capturing on a specific port/device_id and queue, passing the
port|device_id, queue, mempool and ring info.

* The tool runs a while loop to dequeue the packets from the ring and write them
to the pcap device.

* The tool can be stopped using SIGINT, upon which it calls
rte_pdump_disable()/rte_pdump_disable_by_deviceid() and frees the allocated
resources.

Note:
The CONFIG_RTE_LIBRTE_PMD_PCAP flag should be set to 'y' to compile and run the
pdump tool.

3)Test-pmd changes
==
Changes are done to the test-pmd application to initialize/uninitialize the
packet capture framework, so the app/pdump tool can be run to see packets of the
DPDK ports used by test-pmd.

Similarly, any application that needs packet capture should call the
initialize/uninitialize APIs of librte_pdump and use the pdump tool to start the
capture.

4)Packet capture flow between pdump tool and librte_pdump
=
* The pdump tool (secondary process) requests packet capture
for specific port|device_id and queue combinations.

* The library, in the secondary process context, creates a client socket and
communicates the port|device_id, queue, ring and mempool to the server.

* The library initializes the server in the primary process ('test-pmd') context,
and the server serves the client request to enable the Ethernet rx/tx callbacks
for the given port|device_id and queue.

* The callbacks copy the rx/tx packets into the passed mempool and enqueue them
to the ring for the secondary process.

* The pdump tool dequeues the packets from the ring and writes them to the PCAP
PMD vdev, so ultimately the packets are seen on the device passed in
rx-dev|tx-dev.

* Once the pdump tool is terminated with SIGINT, it disables the packet
capturing.

* The library receives the disable packet capture request, communicates the info
to the server, and the server removes the Ethernet rx/tx callbacks for that
port|device_id and queue.