[dpdk-dev] [PATCH] kni: add support for core_id param in single threaded mode

2016-10-13 Thread Thomas Monjalon
> > Allow binding KNI thread to specific core in single threaded mode
> > by setting core_id and force_bind config parameters.
> > 
> > Signed-off-by: Vladyslav Buslov 
> > Acked-by: Ferruh Yigit 
> 
> Tested-by: Ferruh Yigit 

Applied, thanks


[dpdk-dev] [PATCH v4 2/3] app/test: add a case to verify lpm tlb8 recycle

2016-10-13 Thread Thomas Monjalon
2016-08-08 14:40, Wei Dai:
> As a bug-fix for lpm tlb8 recycle is introduced,
> add a test case to verify tlb8 group is correctly
> freed when it only includes a rule with depth=24.
> 
> Signed-off-by: Wei Dai 
> Acked-by: Bruce Richardson 

Series applied with typo tlb/tbl fixed


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-13 Thread Yuanhan Liu
On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> 
> 
> On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> >>+/* When doing TSO, the IP length is not included in the pseudo header
> >>+ * checksum of the packet given to the PMD, but for virtio it is
> >>+ * expected.
> >>+ */
> >>+static void
> >>+virtio_tso_fix_cksum(struct rte_mbuf *m)
> >>+{
> >>+   /* common case: header is not fragmented */
> >>+   if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> >>+   m->l4_len)) {
> >...
> >>+   /* replace it in the packet */
> >>+   th->cksum = new_cksum;
> >>+   } else {
> >...
> >>+   /* replace it in the packet */
> >>+   *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>+   m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> >>+   *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>+   m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> >>+   }
> >
> >The tcp header will always be in the mbuf, right? Otherwise, you can't
> >update the cksum field here. What's the point of introducing the "else
> >clause" then?
> 
> Sorry, I don't see the problem you're pointing out here.
> 
> What I want to solve here is to support the cases where the mbuf is
> segmented in the middle of the network header (which is probably a rare
> case).

How it's gonna segmented?

> 
> In the "else" part, I only access the mbuf byte by byte using the
> rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> the header in a linear buffer, fix the checksum, then copy it again in the
> packet, but there is no mbuf helpers to do these copies for now.

In the "else" clause, the ip header is still in the mbuf, right?
Why do you have to access it the way like:

ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
m->l2_len) >> 4;

Why can't you just use

iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
iph->version_ihl ;

Sorry, I'm just a bit lost.

--yliu


[dpdk-dev] [PATCH v4] log: respect rte_openlog_stream calls before rte_eal_init

2016-10-13 Thread Thomas Monjalon
2016-10-12 12:38, John Ousterhout:
> Before this patch, application-specific loggers could not be
> installed before rte_eal_init completed (the initialization process
> called rte_openlog_stream, overwriting any previously installed
> logger). This made it impossible for an application to capture the
> initial log messages generated during rte_eal_init. This patch changes
> initialization so that information from a previous call to
> rte_openlog_stream is not lost. Specifically:
> * The default log stream is now maintained separately from an
>   application-specific log stream installed with rte_openlog_stream.
> * rte_eal_common_log_init has been renamed to eal_log_set_default,
>   since this is all it does. It no longer invokes rte_openlog_stream; it
>   just updates the default stream. Also, this method now returns void,
>   rather than int, since there are no errors.
> 
> This patch also removes the "early log" mechanism and cleans up the
> log initialization mechanism:
> * The default log stream defaults to stderr on all platforms if
>   eal_log_set_default hasn't been invoked (Linux used to use stdout
>   during the first part of initialization).
> * Removed rte_eal_log_early_init; all of the desired functionality can
>   be achieved by calling eal_log_set_default.
> * Removed lib/librte_eal/bsdapp/eal/eal_log.c: it contained only one
>   function, rte_eal_log_init, which is not needed or invoked for BSD.
> * Removed declaration for eal_default_log_stream in rte_log.h (it's now
>   private to eal_common_log.c).
> * Moved call to rte_eal_log_init earlier in rte_eal_init for Linux, so
>   that it starts using the preferrred log ASAP.
> 
> Signed-off-by: John Ousterhout 

Applied, thanks



[dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support

2016-10-13 Thread Olivier Matz


On 10/13/2016 08:50 PM, Thomas Monjalon wrote:
> 2016-10-14 00:05, Yuanhan Liu:
>> On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote:
>>> +/* When doing TSO, the IP length is not included in the pseudo header
>>> + * checksum of the packet given to the PMD, but for virtio it is
>>> + * expected.
>>> + */
>>> +static void
>>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>>> +{
>>> +   /* common case: header is not fragmented */
>>> +   if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>>> +   m->l4_len)) {
>>> +   struct ipv4_hdr *iph;
>>> +   struct ipv6_hdr *ip6h;
>>> +   struct tcp_hdr *th;
>>> +   uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
>>> +   uint32_t tmp;
>> ...
>>> +   } else {
>>
>> As discussed just now, if you drop the else part, you could add my
>> ACK for the whole virtio changes, and Review-ed by for all mbuf and
>> other changes.
>>
>> Thoams, please pick them by youself directly: since it depends on
>> other patches and they will be picked (or already be picked?) by you.
> 
> Applied
>   - without TSO checksum on fragmented header
>   - with some release notes changes
>   - with Yuanhan acked/reviewed
> Thanks
> 

Thanks Thomas, and also to Xiao, Maxime and Yuanhan for the review!


[dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over rte_config file

2016-10-13 Thread Tahhan, Maryam
> Hi John,
> 
> > Before this patch, DPDK used the file ~/.rte_config as a lock to
> > detect potential interference between multiple DPDK applications
> > running on the same machine. However, if a single user ran DPDK
> > applications concurrently on several different machines, and if the
> > user's home directory was shared between the machines via NFS, DPDK
> > would incorrectly detect conflicts for all but the first application
> > and abort them. This patch fixes the problem by incorporating the
> > machine name into the config file name (e.g., ~/.rte_hostname_config).
> >
> > Signed-off-by: John Ousterhout 
> > ---
> >  doc/guides/prog_guide/multi_proc_support.rst | 11 +++
> >  lib/librte_eal/common/eal_common_proc.c  |  8 ++--
> >  lib/librte_eal/common/eal_filesystem.h   | 15 +--
> >  3 files changed, 22 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/prog_guide/multi_proc_support.rst
> > b/doc/guides/prog_guide/multi_proc_support.rst
> > index badd102..a54fa1c 100644
> > --- a/doc/guides/prog_guide/multi_proc_support.rst
> > +++ b/doc/guides/prog_guide/multi_proc_support.rst
> > @@ -129,10 +129,13 @@ Support for this usage scenario is provided
> > using the ``--file-prefix`` paramete
> >
> >  By default, the EAL creates hugepage files on each hugetlbfs
> > filesystem using the rtemap_X filename,  where X is in the range 0 to the
> maximum number of hugepages -1.
> > -Similarly, it creates shared configuration files, memory mapped in
> > each process, using the /var/run/.rte_config filename, -when run as
> > root (or $HOME/.rte_config when run as a non-root user; -if filesystem and
> device permissions are set up to allow this).
> > -The rte part of the filenames of each of the above is configurable using 
> > the
> file-prefix parameter.
> > +Similarly, it creates shared configuration files, memory mapped in each
> process.
> > +When run as root, the name of the configuration file will be
> > +/var/run/.rte_*host*_config, where *host* is the name of the machine.
> > +When run as a non-root user, the the name of the configuration file
> > +will be $HOME/.rte_*host*_config (if filesystem and device permissions
> are set up to allow this).
> > +If the ``--file-prefix`` parameter has been specified, its value will
> > +be used in place of "rte" in the file names.
> 
> I am not sure that we need to handle all such cases inside EAL.
> User can easily overcome that problem by just adding something like:
> --file-prefix=`uname -n`
> to his command-line.
> Konstantin
> 

I agree with Konstantin, there's no need to include the hostname in the rte 
config file + I'm not sure this will be backward compatible with existing DPDK 
applications that use a secondary processes that use the config file (as in, 
multiprocess DPDK applications in use would all need to be updated). What 
Konstantin suggests fixes the issue you were encountering without breaking 
backward compatibility.
Is addition, the hostname is not unique... you could in theory have 2 hosts 
with the same hostname and encounter the issue you were seeing again.

> >
> >  In addition to specifying the file-prefix parameter,  any DPDK
> > applications that are to be run side-by-side must explicitly limit their
> memory use.
> > diff --git a/lib/librte_eal/common/eal_common_proc.c
> > b/lib/librte_eal/common/eal_common_proc.c
> > index 12e0fca..517aa0c 100644
> > --- a/lib/librte_eal/common/eal_common_proc.c
> > +++ b/lib/librte_eal/common/eal_common_proc.c
> > @@ -45,12 +45,8 @@ rte_eal_primary_proc_alive(const char
> > *config_file_path)
> >
> > if (config_file_path)
> > config_fd = open(config_file_path, O_RDONLY);
> > -   else {
> > -   char default_path[PATH_MAX+1];
> > -   snprintf(default_path, PATH_MAX, RUNTIME_CONFIG_FMT,
> > -default_config_dir, "rte");
> > -   config_fd = open(default_path, O_RDONLY);
> > -   }
> > +   else
> > +   config_fd = open(eal_runtime_config_path(), O_RDONLY);
> > if (config_fd < 0)
> > return 0;
> >
> > diff --git a/lib/librte_eal/common/eal_filesystem.h
> > b/lib/librte_eal/common/eal_filesystem.h
> > index fdb4a70..4929aa3 100644
> > --- a/lib/librte_eal/common/eal_filesystem.h
> > +++ b/lib/librte_eal/common/eal_filesystem.h
> > @@ -41,7 +41,7 @@
> >  #define EAL_FILESYSTEM_H
> >
> >  /** Path of rte config file. */
> > -#define RUNTIME_CONFIG_FMT "%s/.%s_config"
> > +#define RUNTIME_CONFIG_FMT "%s/.%s_%s_config"
> >
> >  #include 
> >  #include 
> > @@ -59,11 +59,22 @@ eal_runtime_config_path(void)
> > static char buffer[PATH_MAX]; /* static so auto-zeroed */
> > const char *directory = default_config_dir;
> > const char *home_dir = getenv("HOME");
> > +   static char nameBuffer[1000];
> > +   int result;
> >
> > if (getuid() != 0 && home_dir != NULL)
> > directory = home_dir;
> > +
> > +   /*
> > +* Include the name of the host in the config file 

[dpdk-dev] [PATCH] fix documentation error on debug functions

2016-10-13 Thread Thomas Monjalon
2016-09-26 12:28, Mcnamara, John:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mauricio Vasquez B
> > Sent: Friday, September 2, 2016 12:02 PM
> > To: dev at dpdk.org
> > Cc: stephen at networkplumber.org
> > Subject: [dpdk-dev] [PATCH] fix documentation error on debug functions
> > 
> > Previous patch updated the functions without updating all the comments.
> > 
> > Fixes: 591a9d7985c1 ("add FILE argument to debug functions")
> > 
> > Signed-off-by: Mauricio Vasquez B 
> 
> Well spotted.
> 
> Acked-by: John McNamara 

Applied, thanks


[dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation

2016-10-13 Thread Thomas Monjalon
Hi,

2016-10-13 19:36, Tomasz Kulasek:
> Added API for `rte_eth_tx_prep`
> 
> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
>   struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
>   uint16_t nb_seg_max;
>   /**< Max number of segments per whole packet. */
> 
>   uint16_t nb_mtu_seg_max;
>   /**< Max number of segments per one MTU */
> 
> Created `rte_pkt.h` header with common used functions:

Same comment as in previous revision:
this description lacks the usability and performance considerations.

> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> + struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)

Doxygen still do not parse it well (same issue as previous revision).

> +/**
> + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
> + * hardware tx checksum.
> + * For non-TSO tcp/udp packets full pseudo-header checksum is counted and 
> set.
> + * For TSO the IP payload length is not included.
> + */
> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)

You probably don't need this function since the recent improvements from 
Olivier.



[dpdk-dev] dpdk/vpp and cross-version migration for vhost

2016-10-13 Thread Michael S. Tsirkin
Hi!
So it looks like we face a problem with cross-version
migration when using vhost. It's not new but became more
acute with the advent of vhost user.

For users to be able to migrate between different versions
of the hypervisor the interface exposed to guests
by hypervisor must stay unchanged.

The problem is that a qemu device is connected
to a backend in another process, so the interface
exposed to guests depends on the capabilities of that
process.

Specifically, for vhost user interface based on virtio, this includes
the "host features" bitmap that defines the interface, as well as more
host values such as the max ring size.  Adding new features/changing
values to this interface is required to make progress, but on the other
hand we need ability to get the old host features to be compatible.

To solve this problem within qemu, qemu has a versioning system based on
a machine type concept which fundamentally is a version string, by
specifying that string one can get hardware compatible with a previous
qemu version. QEMU also reports the latest version and list of versions
supported so libvirt records the version at VM creation and then is
careful to use this machine version whenever it migrates a VM.

One might wonder how is this solved with a kernel vhost backend. The
answer is that it mostly isn't - instead an assumption is made, that
qemu versions are deployed together with the kernel - this is generally
true for downstreams.  Thus whenever qemu gains a new feature, it is
already supported by the kernel as well.  However, if one attempts
migration with a new qemu from a system with a new to old kernel, one
would get a failure.

In the world where we have multiple userspace backends, with some of
these supplied by ISVs, this seems non-realistic.

IMO we need to support vhost backend versioning, ideally
in a way that will also work for vhost kernel backends.

So I'd like to get some input from both backend and management
developers on what a good solution would look like.

If we want to emulate the qemu solution, this involves adding the
concept of interface versions to dpdk.  For example, dpdk could supply a
file (or utility printing?) with list of versions: latest and versions
supported. libvirt could read that and
- store latest version at vm creation
- pass it around with the vm
- pass it to qemu

>From here, qemu could pass this over the vhost-user channel,
thus making sure it's initialized with the correct
compatible interface.

As version here is an opaque string for libvirt and qemu,
anything can be used - but I suggest either a list
of values defining the interface, e.g.
any_layout=on,max_ring=256
or a version including the name and vendor of the backend,
e.g. "org.dpdk.v4.5.6".

Note that typically the list of supported versions can only be
extended, not shrunk. Also, if the host/guest interface
does not change, don't change the current version as
this just creates work for everyone.

Thoughts? Would this work well for management? dpdk? vpp?

Thanks!

-- 
MST


[dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support

2016-10-13 Thread Thomas Monjalon
2016-10-14 00:05, Yuanhan Liu:
> On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote:
> > +/* When doing TSO, the IP length is not included in the pseudo header
> > + * checksum of the packet given to the PMD, but for virtio it is
> > + * expected.
> > + */
> > +static void
> > +virtio_tso_fix_cksum(struct rte_mbuf *m)
> > +{
> > +   /* common case: header is not fragmented */
> > +   if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> > +   m->l4_len)) {
> > +   struct ipv4_hdr *iph;
> > +   struct ipv6_hdr *ip6h;
> > +   struct tcp_hdr *th;
> > +   uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
> > +   uint32_t tmp;
> ...
> > +   } else {
> 
> As discussed just now, if you drop the else part, you could add my
> ACK for the whole virtio changes, and Review-ed by for all mbuf and
> other changes.
> 
> Thoams, please pick them by youself directly: since it depends on
> other patches and they will be picked (or already be picked?) by you.

Applied
- without TSO checksum on fragmented header
- with some release notes changes
- with Yuanhan acked/reviewed
Thanks


[dpdk-dev] [PATCH] doc: ZUC PMD cannot be built as a shared library

2016-10-13 Thread Pablo de Lara
ZUC PMD cannot be built as a shared library, due to
the fact that some assembly code in the underlying libsso
library is not relocatable.
This will be fixed in the future, but for the moment,
it is added as a limitation of the PMD.

Signed-off-by: Pablo de Lara 
---
 doc/guides/cryptodevs/zuc.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/cryptodevs/zuc.rst b/doc/guides/cryptodevs/zuc.rst
index 7267120..38d5068 100644
--- a/doc/guides/cryptodevs/zuc.rst
+++ b/doc/guides/cryptodevs/zuc.rst
@@ -53,6 +53,9 @@ Limitations
 * Chained mbufs are not supported.
 * ZUC (EIA3) supported only if hash offset field is byte-aligned.
 * ZUC (EEA3) supported only if cipher length, cipher offset fields are 
byte-aligned.
+* ZUC PMD cannot be built as a shared library, due to limitations in
+  in the underlying library.
+

 Installation
 
-- 
2.7.4



[dpdk-dev] [PATCH] doc: how to build KASUMI as shared library

2016-10-13 Thread Pablo de Lara
Libsso KASUMI library has to be built with specific
parameters to make the KASUMI PMD be built as
a shared library, so a note has been added in
its documentation.

Signed-off-by: Pablo de Lara 
---
 doc/guides/cryptodevs/kasumi.rst | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/guides/cryptodevs/kasumi.rst b/doc/guides/cryptodevs/kasumi.rst
index c122f00..90d8e7b 100644
--- a/doc/guides/cryptodevs/kasumi.rst
+++ b/doc/guides/cryptodevs/kasumi.rst
@@ -68,7 +68,13 @@ and click on "Kasumi Bit Stream crypto library" link, to 
download the library.
 After downloading the library, the user needs to unpack and compile it
 on their system before building DPDK::

-   make kasumi
+   make
+
+**Note**: To build the PMD as a shared library, the libsso_kasumi
+library must be built as follows::
+
+  make KASUMI_CFLAGS=-DKASUMI_C
+

 Initialization
 --
-- 
2.7.4



[dpdk-dev] [PATCH] doc: fix libcrypto title

2016-10-13 Thread Pablo de Lara
Libcrypto documentation was missing the equal signs ("="),
in its title, so it was not present in the documentation
generated.

Fixes: d61f70b4c918 ("crypto/libcrypto: add driver for OpenSSL library")

Signed-off-by: Pablo de Lara 
---
 doc/guides/cryptodevs/libcrypto.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/cryptodevs/libcrypto.rst 
b/doc/guides/cryptodevs/libcrypto.rst
index 22d81b7..5e9c76a 100644
--- a/doc/guides/cryptodevs/libcrypto.rst
+++ b/doc/guides/cryptodevs/libcrypto.rst
@@ -28,6 +28,7 @@
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 LibCrypto Crypto Poll Mode Driver
+=

 This code provides the initial implementation of the libcrypto poll mode
 driver. All cryptography operations are using Openssl library crypto API.
-- 
2.7.4



[dpdk-dev] [PATCH] doc: ZUC PMD cannot be built as a shared library

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Thursday, October 13, 2016 12:35 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo
> Subject: [PATCH] doc: ZUC PMD cannot be built as a shared library
> 
> ZUC PMD cannot be built as a shared library, due to
> the fact that some assembly code in the underlying libsso
> library is not relocatable.
> This will be fixed in the future, but for the moment,
> it is added as a limitation of the PMD.
> 
> Signed-off-by: Pablo de Lara 

Applied to dpdk-next-crypto.

Pablo


[dpdk-dev] [PATCH] doc: how to build KASUMI as shared library

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Thursday, October 13, 2016 12:34 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo
> Subject: [PATCH] doc: how to build KASUMI as shared library
> 
> Libsso KASUMI library has to be built with specific
> parameters to make the KASUMI PMD be built as
> a shared library, so a note has been added in
> its documentation.
> 
> Signed-off-by: Pablo de Lara 

Applied to dpdk-next-crypto.

Pablo


[dpdk-dev] [PATCH] doc: fix libcrypto title

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Thursday, October 13, 2016 12:34 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo
> Subject: [dpdk-dev] [PATCH] doc: fix libcrypto title
> 
> Libcrypto documentation was missing the equal signs ("="),
> in its title, so it was not present in the documentation
> generated.
> 
> Fixes: d61f70b4c918 ("crypto/libcrypto: add driver for OpenSSL library")
> 
> Signed-off-by: Pablo de Lara 

Applied to dpdk-next-crypto.

Pablo


[dpdk-dev] [PATCH] Revert "bonding: use existing enslaved device queues"

2016-10-13 Thread Eric Kinzie
On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
> > On 07.10.2016 05:02, Eric Kinzie wrote:
> > > On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
> > >> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> > >>
> > >> It is necessary to reconfigure all queues every time because 
> > >> configuration
> > >> can be changed.
> > >>
> > >> For example, if we're reconfiguring bonding device with new memory pool,
> > >> already configured queues will still use the old one. And if the old
> > >> mempool be freed, application likely will panic in attempt to use
> > >> freed mempool.
> > >>
> > >> This happens when we use the bonding device with OVS 2.6 while MTU
> > >> reconfiguration:
> > >>
> > >> PANIC in rte_mempool_get_ops():
> > >> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
> > >>
> > >> Cc: 
> > >> Signed-off-by: Ilya Maximets 
> > >> ---
> > >>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++
> > >>  1 file changed, 2 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
> > >> b/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> index b20a272..eb5b6d1 100644
> > >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> > >>  struct bond_rx_queue *bd_rx_q;
> > >>  struct bond_tx_queue *bd_tx_q;
> > >>  
> > >> -uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> > >> -uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
> > >>  int errval;
> > >>  uint16_t q_id;
> > >>  
> > >> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> > >>  }
> > >>  
> > >>  /* Setup Rx Queues */
> > >> -/* Use existing queues, if any */
> > >> -for (q_id = old_nb_rx_queues;
> > >> - q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> > >> +for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; 
> > >> q_id++) {
> > >>  bd_rx_q = (struct bond_rx_queue 
> > >> *)bonded_eth_dev->data->rx_queues[q_id];
> > >>  
> > >>  errval = 
> > >> rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > >> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> > >>  }
> > >>  
> > >>  /* Setup Tx Queues */
> > >> -/* Use existing queues, if any */
> > >> -for (q_id = old_nb_tx_queues;
> > >> - q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> > >> +for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; 
> > >> q_id++) {
> > >>  bd_tx_q = (struct bond_tx_queue 
> > >> *)bonded_eth_dev->data->tx_queues[q_id];
> > >>  
> > >>  errval = 
> > >> rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > >> -- 
> > >> 2.7.4
> > >>
> > > 
> > > NAK
> > > 
> > > There are still some users of this code.  Let's give them a chance to
> > > comment before removing it.
> > 
> > Hi Eric,
> > 
> > Are these users in CC-list? If not, could you, please, add them?
> > This patch awaits in mail-list already more than a month. I think, it's 
> > enough
> > time period for all who wants to say something. Patch fixes a real bug that
> > prevent using of DPDK bonding in all applications that reconfigures devices
> > in runtime including OVS.
> > 
> Agreed.
> 
> Eric, does reverting this patch cause you problems directly, or is your 
> concern
> just with regards to potential impact to others?
> 
> Thanks,
> /Bruce

This won't impact me directly.  The users are CCed (different thread)
and I haven't seen any comment, so I no longer have any objection to
reverting this change.

Eric



[dpdk-dev] [PATCH v5 6/6] testpmd: use Tx preparation in csum engine

2016-10-13 Thread Tomasz Kulasek
Removed pseudo header calculation for udp/tcp/tso packets from
application and used Tx preparation API for packet preparation and
verification.

Adding aditional step to the csum engine costs about 3-4% of performance
drop, on my setup with ixgbe driver. It's caused mostly by the need
of reaccessing and modification of packet data.

Signed-off-by: Tomasz Kulasek 
---
 app/test-pmd/csumonly.c |   36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index f9e65b6..3354b3d 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));

 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-   if (ethertype == _htons(ETHER_TYPE_IPv4))
-   return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-   else /* assume ethertype == ETHER_TYPE_IPv6 */
-   return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -372,32 +363,24 @@ process_inner_cksums(void *l3_hdr, const struct 
testpmd_offload_info *info,
/* do not recalculate udp cksum if it was 0 */
if (udp_hdr->dgram_cksum != 0) {
udp_hdr->dgram_cksum = 0;
-   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
ol_flags |= PKT_TX_UDP_CKSUM;
-   udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-   info->ethertype, ol_flags);
-   } else {
+   else
udp_hdr->dgram_cksum =
get_udptcp_checksum(l3_hdr, udp_hdr,
info->ethertype);
-   }
}
} else if (info->l4_proto == IPPROTO_TCP) {
tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
tcp_hdr->cksum = 0;
-   if (tso_segsz) {
+   if (tso_segsz)
ol_flags |= PKT_TX_TCP_SEG;
-   tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-   ol_flags);
-   } else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+   else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
ol_flags |= PKT_TX_TCP_CKSUM;
-   tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-   ol_flags);
-   } else {
+   else
tcp_hdr->cksum =
get_udptcp_checksum(l3_hdr, tcp_hdr,
info->ethertype);
-   }
} else if (info->l4_proto == IPPROTO_SCTP) {
sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
sctp_hdr->cksum = 0;
@@ -650,6 +633,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
uint16_t nb_rx;
uint16_t nb_tx;
+   uint16_t nb_prep;
uint16_t i;
uint64_t rx_ol_flags, tx_ol_flags;
uint16_t testpmd_ol_flags;
@@ -855,7 +839,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
printf("\n");
}
}
-   nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+   nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+   nb_rx);
+   if (nb_prep != nb_rx)
+   printf("Preparing packet burst to transmit failed: %s\n",
+   rte_strerror(rte_errno));
+
+   nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, 
nb_prep);
/*
 * Retry if necessary
 */
-- 
1.7.9.5



[dpdk-dev] [PATCH v5 5/6] ixgbe: add Tx preparation

2016-10-13 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/ixgbe/ixgbe_ethdev.c |3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +-
 drivers/net/ixgbe/ixgbe_rxtx.h   |2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 6b3d4fa..1d740f0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
.nb_max = IXGBE_MAX_RING_DESC,
.nb_min = IXGBE_MIN_RING_DESC,
.nb_align = IXGBE_TXD_ALIGN,
+   .nb_seg_max = IXGBE_TX_MAX_SEG,
+   .nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };

 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
eth_dev->dev_ops = _eth_dev_ops;
eth_dev->rx_pkt_burst = _recv_pkts;
eth_dev->tx_pkt_burst = _xmit_pkts;
+   eth_dev->tx_pkt_prep = _prep_pkts;

/*
 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 8b99282..dd4f5ba 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
PKT_TX_TCP_SEG | \
PKT_TX_OUTER_IP_CKSUM)

+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:

 /*
  *
+ *  TX prep functions
+ *
+ **/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+   int i, ret;
+   uint64_t ol_flags;
+   struct rte_mbuf *m;
+   struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+   ol_flags = m->ol_flags;
+
+   /**
+* Check if packet meets requirements for number of segments
+*
+* NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and 
non-TSO
+*/
+
+   if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+   if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+   }
+
+   return i;
+}
+
+/*
+ *
  *  RX functions
  *
  **/
@@ -2280,6 +2334,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct 
ixgbe_tx_queue *txq)
if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+   dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
if (txq->tx_rs_thresh <= 

[dpdk-dev] [PATCH v5 4/6] i40e: add Tx preparation

2016-10-13 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/i40e/i40e_ethdev.c |3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++-
 drivers/net/i40e/i40e_rxtx.h   |8 +
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index d0640b9..c6d61c0 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
dev->dev_ops = _eth_dev_ops;
dev->rx_pkt_burst = i40e_recv_pkts;
dev->tx_pkt_burst = i40e_xmit_pkts;
+   dev->tx_pkt_prep = i40e_prep_pkts;

/* for secondary processes, we don't initialise any further as primary
 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.nb_max = I40E_MAX_RING_DESC,
.nb_min = I40E_MIN_RING_DESC,
.nb_align = I40E_ALIGN_RING_DESC,
+   .nb_seg_max = I40E_TX_MAX_SEG,
+   .nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
};

if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 2cb2e30..499217d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
PKT_TX_TCP_SEG | \
PKT_TX_OUTER_IP_CKSUM)

+#define I40E_TX_OFFLOAD_MASK (  \
+   PKT_TX_IP_CKSUM |   \
+   PKT_TX_L4_MASK |\
+   PKT_TX_OUTER_IP_CKSUM | \
+   PKT_TX_TCP_SEG |\
+   PKT_TX_QINQ_PKT |   \
+   PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
  struct rte_mbuf **tx_pkts,
  uint16_t nb_pkts);
@@ -1968,6 +1981,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
return nb_tx;
 }

+/*
+ *
+ *  TX prep functions
+ *
+ **/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   int i, ret;
+   uint64_t ol_flags;
+   struct rte_mbuf *m;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+   ol_flags = m->ol_flags;
+
+   /**
+* m->nb_segs is uint8_t, so m->nb_segs is always less than
+* I40E_TX_MAX_SEG.
+* We check only a condition for m->nb_segs > 
I40E_TX_MAX_MTU_SEG.
+*/
+   if (!(ol_flags & PKT_TX_TCP_SEG)) {
+   if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+   rte_errno = -1;
+   return i;
+   }
+   } else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+   (m->tso_segsz > I40E_MAX_TSO_MSS)) {
+   /* MSS outside the range (256B - 9674B) are considered 
malicious */
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+   if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+   }
+   return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -3318,9 +3386,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
dev->tx_pkt_burst = i40e_xmit_pkts_simple;
}
+   dev->tx_pkt_prep = NULL;
} else {
PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
dev->tx_pkt_burst = i40e_xmit_pkts;
+   

[dpdk-dev] [PATCH v5 3/6] fm10k: add Tx preparation

2016-10-13 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/fm10k/fm10k.h|6 +
 drivers/net/fm10k/fm10k_ethdev.c |5 
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct 
fm10k_tx_desc))

+#define FM10K_TX_MAX_SEG UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t 
offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 372564b..b9060c7 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
.nb_max = FM10K_MAX_TX_DESC,
.nb_min = FM10K_MIN_TX_DESC,
.nb_align = FM10K_MULT_TX_DESC,
+   .nb_seg_max = FM10K_TX_MAX_SEG,
+   .nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
};

dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
fm10k_txq_vec_setup(txq);
}
dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+   dev->tx_pkt_prep = NULL;
} else {
dev->tx_pkt_burst = fm10k_xmit_pkts;
+   dev->tx_pkt_prep = fm10k_prep_pkts;
PMD_INIT_LOG(DEBUG, "Use regular Tx func");
}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
dev->dev_ops = _eth_dev_ops;
dev->rx_pkt_burst = _recv_pkts;
dev->tx_pkt_burst = _xmit_pkts;
+   dev->tx_pkt_prep = _prep_pkts;

/* only initialize in the primary process */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 5b2d04b..76b5705 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@

 #include 
 #include 
+#include 
 #include "fm10k.h"
 #include "base/fm10k_type.h"

@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif

+#define FM10K_TX_OFFLOAD_MASK (  \
+   PKT_TX_VLAN_PKT |\
+   PKT_TX_IP_CKSUM |\
+   PKT_TX_L4_MASK | \
+   PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -583,3 +593,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,

return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   int i, ret;
+   struct rte_mbuf *m;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+
+   if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+   (m->tso_segsz < FM10K_TSO_MINMSS)) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+   if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+   }
+
+   return i;
+}
-- 
1.7.9.5



[dpdk-dev] [PATCH v5 2/6] e1000: add Tx preparation

2016-10-13 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/e1000/e1000_ethdev.h |   11 
 drivers/net/e1000/em_ethdev.c|5 +++-
 drivers/net/e1000/em_rxtx.c  |   48 ++-
 drivers/net/e1000/igb_ethdev.c   |4 +++
 drivers/net/e1000/igb_rxtx.c |   52 +-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID   RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START  RTE_INTR_VEC_RXTX_OFFSET

+#define IGB_TX_MAX_SEG UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG  UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index f767e1c..6d2c5fc 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
eth_dev->dev_ops = _em_ops;
eth_dev->rx_pkt_burst = (eth_rx_burst_t)_em_recv_pkts;
eth_dev->tx_pkt_burst = (eth_tx_burst_t)_em_xmit_pkts;
+   eth_dev->tx_pkt_prep = (eth_tx_prep_t)_em_prep_pkts;

/* for secondary processes, we don't initialise any further as primary
 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.nb_max = E1000_MAX_RING_DESC,
.nb_min = E1000_MIN_RING_DESC,
.nb_align = EM_TXD_ALIGN,
+   .nb_seg_max = EM_TX_MAX_SEG,
+   .nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
};

dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..3af2f69 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 #include "e1000_logs.h"
@@ -77,6 +78,14 @@

 #define E1000_RXDCTL_GRAN  0x0100 /* RXDCTL Granularity */

+#define E1000_TX_OFFLOAD_MASK ( \
+   PKT_TX_IP_CKSUM |   \
+   PKT_TX_L4_MASK |\
+   PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:

 /*
  *
+ *  TX prep functions
+ *
+ **/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   int i, ret;
+   struct rte_mbuf *m;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+
+   if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;

[dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation

2016-10-13 Thread Tomasz Kulasek
Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

uint16_t nb_seg_max;
/**< Max number of segments per whole packet. */

uint16_t nb_mtu_seg_max;
/**< Max number of segments per one MTU */

Created `rte_pkt.h` header with common used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
to validate general requirements for tx offload in packet such a
flag completness. In current implementation this function is called
optionaly when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
before hardware tx checksum offload.
 - for non-TSO tcp/udp packets full pseudo-header checksum is
   counted and set.
 - for TSO the IP payload length is not included.

Signed-off-by: Tomasz Kulasek 
---
 config/common_base|1 +
 lib/librte_ether/rte_ethdev.h |   85 +
 lib/librte_mbuf/rte_mbuf.h|9 +++
 lib/librte_net/Makefile   |3 +-
 lib/librte_net/rte_pkt.h  |  139 +
 5 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/config/common_base b/config/common_base
index f5d2eff..0af6481 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y

 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0a32ebb..db5dc93 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include 
 #include 
 #include 
+#include 
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
uint16_t nb_max;   /**< Max allowed number of descriptors. */
uint16_t nb_min;   /**< Min allowed number of descriptors. */
uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+   uint16_t nb_seg_max; /**< Max number of segments per whole packet. 
*/
+   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };

 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. 
*/

+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+  struct rte_mbuf **tx_pkts,
+  uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet 
device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
   struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+   eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare 
function. */
struct rte_eth_dev_data *data;  /**< Pointer to device data */
const struct eth_driver *driver;/**< Driver for this device */
const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, 
nb_pkts);
 }

+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if packet meets devices requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the 

[dpdk-dev] [PATCH v5 0/6] add Tx preparation

2016-10-13 Thread Tomasz Kulasek
As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose different 
requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
---

1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now, it's an
   application issue.

2) Different hardware may have different requirements for TX offloads,
   other subset can be supported and so on.

3) Some parameters (e.g. number of segments in ixgbe driver) may hung
   device. These parameters may be vary for different devices.

   For example i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.

4) Fields in packet may require different initialization (like e.g. will
   require pseudo-header checksum precalculation, sometimes in a
   different way depending on packet type, and so on). Now application
   needs to care about it.

5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
   prepare packet burst in acceptable form for specific device.

6) Some additional checks may be done in debug mode keeping tx_burst
   implementation clean.


PROPOSAL:
-

To help user to deal with all these varieties we propose to:

1) Introduce rte_eth_tx_prep() function to do necessary preparations of
   packet burst to be safely transmitted on device for desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   While the limitations and requirements may differ for devices, it
   requires to extend rte_eth_dev structure with new function pointer
   "tx_pkt_prep" which can be implemented in the driver to prepare and
   verify packets, in devices specific way, before burst, what should to
   prevent application to send malformed packets.

2) Also new fields will be introduced in rte_eth_desc_lim: 
   nb_seg_max and nb_mtu_seg_max, providing an information about max
   segments in TSO and non-TSO packets acceptable by device.

   This information is useful for application to not create/limit
   malicious packet.


APPLICATION (CASE OF USE):
--

1) Application should to initialize burst of packets to send, set
   required tx offload flags and required fields, like l2_len, l3_len,
   l4_len, and tso_segsz

2) Application passes burst to the rte_eth_tx_prep to check conditions
   required to send packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send valid packets
   and/or restore invalid if function fails.

e.g.

for (i = 0; i < nb_pkts; i++) {

/* initialize or process packet */

bufs[i]->tso_segsz = 800;
bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
| PKT_TX_IP_CKSUM;
bufs[i]->l2_len = sizeof(struct ether_hdr);
bufs[i]->l3_len = sizeof(struct ipv4_hdr);
bufs[i]->l4_len = sizeof(struct tcp_hdr);
}

/* Prepare burst of TX packets */
nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

if (nb_prep < nb_pkts) {
printf("tx_prep failed\n");

/* nb_prep indicates here first invalid packet. rte_eth_tx_prep
 * can be used on remaining packets to find another ones.
 */

}

/* Send burst of TX packets */
nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

/* Free any unsent packets. */


v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance, when
   Tx offloads are not intentionally available

v3 changes:
 - reworked csum testpmd engine instead adding new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formattings and optimalizations

v2 changes:
 - rte_eth_tx_prep() returns number of packets when device doesn't
   support tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP allowing to turn off tx_prep


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  csum: fixup

 app/test-pmd/csumonly.c  |   36 --
 config/common_base   |1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++
 drivers/net/e1000/em_ethdev.c|5 +-
 drivers/net/e1000/em_rxtx.c  |   48 -
 

[dpdk-dev] [PATCH] crypto/zuc: fix init function names

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Trahe, Fiona
> Sent: Thursday, October 13, 2016 10:57 AM
> To: De Lara Guarch, Pablo; dev at dpdk.org
> Cc: De Lara Guarch, Pablo
> Subject: RE: [dpdk-dev] [PATCH] crypto/zuc: fix init function names
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> > Sent: Tuesday, October 11, 2016 2:15 AM
> > To: dev at dpdk.org
> > Cc: De Lara Guarch, Pablo 
> > Subject: [dpdk-dev] [PATCH] crypto/zuc: fix init function names
> >
> > All init/uninit function names in the virtual devices have been renamed, so
> > they finish with probe or remove, so to keep consistency, same thing should
> be
> > done in this PMD.
> >
> > Fixes: cf7685d68f00 ("crypto/zuc: add driver for ZUC library")
> >
> > Signed-off-by: Pablo de Lara 
> Acked-by: Fiona Trahe 

Applied to dpdk-next-crypto.
Thanks,

Pablo


[dpdk-dev] [PATCH] crypto/libcrypto: fix init function names

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Trahe, Fiona
> Sent: Thursday, October 13, 2016 10:56 AM
> To: De Lara Guarch, Pablo; dev at dpdk.org
> Cc: De Lara Guarch, Pablo
> Subject: RE: [dpdk-dev] [PATCH] crypto/libcrypto: fix init function names
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> > Sent: Tuesday, October 11, 2016 2:15 AM
> > To: dev at dpdk.org
> > Cc: De Lara Guarch, Pablo 
> > Subject: [dpdk-dev] [PATCH] crypto/libcrypto: fix init function names
> >
> > All init/uninit function names in the virtual devices have been renamed, so
> > they finish with probe or remove, so to keep consistency, same thing should
> be
> > done in this PMD.
> >
> > Fixes: d61f70b4c918 ("crypto/libcrypto: add driver for OpenSSL library")
> >
> > Signed-off-by: Pablo de Lara 
> Acked-by: Fiona Trahe 

Applied to dpdk-next-crypto.

Pablo


[dpdk-dev] [PATCH v3] test_cryptodev_perf: IV and digest should be stored at a DMAeble address

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Trahe, Fiona
> Sent: Wednesday, October 12, 2016 11:26 AM
> To: akhil.goyal at nxp.com; Kusztal, ArkadiuszX; Doherty, Declan
> Cc: Griffin, John; Jain, Deepak K; dev at dpdk.org; Trahe, Fiona
> Subject: Re: [dpdk-dev] [PATCH v3] test_cryptodev_perf: IV and digest should
> be stored at a DMAeble address
> 
> 
> 
> > -Original Message-
> > From: akhil.goyal at nxp.com [mailto:akhil.goyal at nxp.com]
> > Sent: Wednesday, October 12, 2016 12:16 PM
> > To: Kusztal, ArkadiuszX ; Doherty, Declan
> > 
> > Cc: Griffin, John ; Trahe, Fiona
> > ; Jain, Deepak K ;
> > dev at dpdk.org; Akhil Goyal 
> > Subject: [PATCH v3] test_cryptodev_perf: IV and digest should be stored at a
> > DMAeble address
> >
> > From: Akhil Goyal 
> >
> > For physical crypto devices, IV and digest are processed by the crypto 
> > device
> > which need the contents to be written on some DMA able address.
> >
> > So in order to do that, IV and digest are accomodated in the packet.
> >
> > Signed-off-by: Akhil Goyal 
> Acked-by: Fiona Trahe 

Applied to dpdk-next-crypto.
Thanks,

Pablo


[dpdk-dev] [PATCH v2] examples/l3fwd: em: use hw accelerated crc hash function for arm64

2016-10-13 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 12:17:05AM +0530, Hemant Agrawal wrote:
> if machine level CRC extension are available, offload the
> hash to machine provide functions e.g. armv8-a CRC extensions
> support it
> 
> Signed-off-by: Hemant Agrawal 
> Reviewed-by: Jerin Jacob 
> ---
>  examples/l3fwd/l3fwd_em.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> index 89a68e6..d92d0aa 100644
> --- a/examples/l3fwd/l3fwd_em.c
> +++ b/examples/l3fwd/l3fwd_em.c
> @@ -57,13 +57,17 @@
>  
>  #include "l3fwd.h"
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) && defined(RTE_MACHINE_CPUFLAG_CRC32)

The will evaluate as FALSE always.

Please change to logical OR operation here. ie #if 
defined(RTE_MACHINE_CPUFLAG_SSE4_2) ||
defined(RTE_MACHINE_CPUFLAG_CRC32)

> +#define EM_HASH_CRC 1
> +#endif


[dpdk-dev] [PATCH v3 2/2] mempool: pktmbuf pool default fallback for mempool ops error

2016-10-13 Thread Hemant Agrawal
Hi Olivier,
Any updates w.r.t this patch set?

Regards
Hemant
On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
> Hi Olivier
>
> On 9/19/2016 7:27 PM, Olivier Matz wrote:
>> Hi Hemant,
>>
>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
>>> In the rte_pktmbuf_pool_create, if the default external mempool is
>>> not available, the implementation can default to "ring_mp_mc", which
>>> is an software implementation.
>>>
>>> Signed-off-by: Hemant Agrawal 
>>> ---
>>> Changes in V3:
>>> * adding warning message to say that falling back to default sw pool
>>> ---
>>>  lib/librte_mbuf/rte_mbuf.c | 8 
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>> index 4846b89..8ab0eb1 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
>>> unsigned n,
>>>
>>>  rte_errno = rte_mempool_set_ops_byname(mp,
>>>  RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
>>> +
>>> +/* on error, try falling back to the software based default pool */
>>> +if (rte_errno == -EOPNOTSUPP) {
>>> +RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
>>> +"falling back to sw mempool \"ring_mp_mc\"");
>>> +rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
>>> +}
>>> +
>>>  if (rte_errno != 0) {
>>>  RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>>  return NULL;
>>>
>>
>> Without adding a new method ".supported()", the first call to
>> rte_mempool_populate() could return the same error ENOTSUP. In this
>> case, it is still possible to fallback.
>>
> It will be bit late.
>
> On failure, than we have to set the default ops and do a goto before
> rte_pktmbuf_pool_init(mp, _priv);
>
>
>> I've just submitted an RFC, which I think is quite linked:
>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
>> Assuming a new parameter "mempool_ops" is added to
>> rte_pktmbuf_pool_create(), would it make sense to fallback to
>> "ring_mp_mc"? What about just returning ENOTSUP? The application could
>> do the job and decide which sw fallback to use.
>
> We ran into this issue when trying to run the standard DPDK examples
> (l3fwd) in VM. Do you think, is it practical to add fallback handling in
> each of the DPDK examples?
>
> Typically when someone is writing a application on host, he need not
> worry non-availability of the hw offloaded mempool. He may also want to
> run the same binary in virtual machine. In VM, it is not guaranteed that
> hw offloaded mempools will be available.
>
> w.r.t your RFC, we can do this:
> if the user has specified a mempool_ops in rte_pktmbuf_pool_create(),
> don't fallback. It will be responsibility for application to decide on
> calling again rte_pktmbuf_pool_create() with different mempool_ops.
>
>>
>>
>> Regards,
>> Olivier
>>
>
>
>




[dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over rte_config file

2016-10-13 Thread Thomas Monjalon
2016-10-13 09:20, John Ousterhout:
> Hi Harry,
> 
> But, given the existence of the --file-prefix option, isn't it already
> unsafe for Collectd to check only for .rte_config? If it's important for
> other programs to be able to find the config files, it seems to me that a
> more robust mechanism is needed than the current one.

+1

And the default location of the file should respect the Linux standards,
e.g. .config/dpdk/...


[dpdk-dev] ZUC PMD as shared library

2016-10-13 Thread De Lara Guarch, Pablo
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, October 12, 2016 12:23 AM
> To: De Lara Guarch, Pablo
> Cc: dev at dpdk.org
> Subject: Re: ZUC PMD as shared library
> 
> 2016-10-12 02:23, De Lara Guarch, Pablo:
> > Hi Thomas,
> >
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > >
> > > Hi Pablo,
> > >
> > > You are probably aware of the issue, but I would like to make it clear
> > > in case someone else run into the same trouble:
> > >
> > > It is impossible to build the ZUC crypto PMD as a shared library:
> > >
> > > libsso-zuc-0.1.1/build/libsso_zuc.a(sso_zuc_yasm.o):
> > >   relocation R_X86_64_32 against `EK_d' can not be used
> > >   when making a shared object; recompile with -fPIC
> > >
> > > The library libsso-zuc-0.1.1 needs an update to make the asm code
> > > relocatable.
> > > Should we explicit this limitation in the PMD doc?
> >
> > Sorry for not replying to this earlier.
> > Yes, you are right, thanks for pointing it out.
> > I think the same problem is in KASUMI, so I will send a doc update for both
> PMDs.
> 
> No, KASUMI works as shared library, because there is no asm code.
> 
> Documenting the bug would be nice as a first step.
> But it is a serious bug, so a fix in ZUC library is highly desirable.

Will document it. Once the library is fixed, it will be announce and I will 
remove the note.

Thanks,
Pablo


[dpdk-dev] [PATCH v2] app/test: fix crypto mbuf pool size

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jastrzebski, MichalX
> K
> Sent: Thursday, October 13, 2016 1:48 AM
> To: Azarewicz, PiotrX T; Trahe, Fiona
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] app/test: fix crypto mbuf pool size
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Piotr Azarewicz
> > Sent: Tuesday, October 11, 2016 12:07 PM
> > To: Trahe, Fiona 
> > Cc: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2] app/test: fix crypto mbuf pool size
> >
> > This patch fix that created pool for crypto mbufs may be too big in some
> > environments.
> > To avoid the issue, mbuf pool is reverted to its previous size.
> > Moreover, here is added additional small pool with one large mbuf, what
> > is needed in large data test scenarios.
> >
> > Fixes: ffbe3be0d4b5 ("app/test: add libcrypto")
> >
> > Signed-off-by: Piotr Azarewicz 
> 
> Acked-by: Michal Jastrzebski 

Applied to dpdk-next-crypto.
Thanks,

Pablo


[dpdk-dev] [PATCH] examples/l2fwd-crypto: fix verify with decrypt in chain

2016-10-13 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Jastrzebski, MichalX K
> Sent: Thursday, October 13, 2016 1:48 AM
> To: Azarewicz, PiotrX T; De Lara Guarch, Pablo; Yang, GangX
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l2fwd-crypto: fix verify with
> decrypt in chain
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Piotr Azarewicz
> > Sent: Wednesday, October 12, 2016 10:59 AM
> > To: De Lara Guarch, Pablo ; Yang, GangX
> > 
> > Cc: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] examples/l2fwd-crypto: fix verify with decrypt
> > in chain
> >
> > This patch fix setting crypto operation data parameters in l2fwd-crypto
> > application.
> > From now decryption in chain with auth verify work fine.
> >
> > How to reproduce the issue:
> >
> > 1. Run l2fwd_crypto with command:
> > -c 0x3 -n 4 --vdev "crypto_aesni_mb" \
> > --vdev "crypto_aesni_mb" \
> > -- -p 0x3 --chain CIPHER_HASH \
> > --cipher_op ENCRYPT --cipher_algo AES_CBC \
> > --cipher_key 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f \
> > --iv 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:ff \
> > --auth_op GENERATE --auth_algo SHA1_HMAC \
> > --auth_key
> > 11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:
> > 11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:
> > 11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11
> >
> > 2. Send packet with payload and capture forwarded packet.
> > Payload in forwarded packet is encrypted, what is good.
> >
> > 3. Run l2fwd_crypto with command:
> > -c 0x3 -n 4 --vdev "crypto_aesni_mb" \
> > --vdev "crypto_aesni_mb" \
> > -- -p 0x3 --chain HASH_CIPHER \
> > --cipher_op DECRYPT --cipher_algo AES_CBC \
> > --cipher_key 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f \
> > --iv 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:ff \
> > --auth_op VERIFY --auth_algo SHA1_HMAC \
> > --auth_key
> > 11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:
> > 11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:
> > 11:11:11:11:11:11:11:11:11:11:11:11:11:11:11:11
> >
> > 4. Send earlier captured packet and capture forwarded packet.
> > Payload in newly captured packet is not decrypted, what is wrong.
> >
> > Fixes: 387259bd6c67 ("examples/l2fwd-crypto: add sample application")
> >
> > Signed-off-by: Piotr Azarewicz 
> Acked-by: Michal Jastrzebski 

Applied to dpdk-next-crypto.
Thanks,

Pablo


[dpdk-dev] [PATCH 1/1] doc: fix errors in pdump doc

2016-10-13 Thread Thomas Monjalon
> > - Fix copy/paste error in description of how to capture both rx
> >   & tx traffic in a single pcap file
> > - Replace duplicate word with what original author presumably
> >   intended, such that description now makes sense
> > 
> > Signed-off-by: Mark Kavanagh 
> 
> Acked-by: John McNamara 

Applied, thanks


[dpdk-dev] [PATCH 1/1] doc: clarify usage of testpmd mac fwd mode

2016-10-13 Thread Thomas Monjalon
> > Explain default testpmd behavior in mac fwd mode to remove
> > amiguity/confusion regarding user's ability to specify Ethernet addresses.
> > 
> > Signed-off-by: Mark Kavanagh 
> 
> Acked-by: John McNamara 

Applied, thanks


[dpdk-dev] [PATCH] crypto/zuc: fix init function names

2016-10-13 Thread Trahe, Fiona


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Tuesday, October 11, 2016 2:15 AM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo 
> Subject: [dpdk-dev] [PATCH] crypto/zuc: fix init function names
> 
> All init/uninit function names in the virtual devices have been renamed, so
> they finish with probe or remove, so to keep consistency, same thing should be
> done in this PMD.
> 
> Fixes: cf7685d68f00 ("crypto/zuc: add driver for ZUC library")
> 
> Signed-off-by: Pablo de Lara 
Acked-by: Fiona Trahe 


[dpdk-dev] [PATCH] crypto/libcrypto: fix init function names

2016-10-13 Thread Trahe, Fiona


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Tuesday, October 11, 2016 2:15 AM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo 
> Subject: [dpdk-dev] [PATCH] crypto/libcrypto: fix init function names
> 
> All init/uninit function names in the virtual devices have been renamed, so
> they finish with probe or remove, so to keep consistency, same thing should be
> done in this PMD.
> 
> Fixes: d61f70b4c918 ("crypto/libcrypto: add driver for OpenSSL library")
> 
> Signed-off-by: Pablo de Lara 
Acked-by: Fiona Trahe 


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-13 Thread Olivier Matz


Le 13 octobre 2016 17:29:35 CEST, Yuanhan Liu  
a ?crit :
>On Thu, Oct 13, 2016 at 05:15:24PM +0200, Olivier MATZ wrote:
>> 
>> 
>> On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
>> >On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
>> >>
>> >>
>> >>On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
>> >>>On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
>> 
>> 
>> On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
>> >On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>> >>+/* When doing TSO, the IP length is not included in the pseudo
>header
>> >>+ * checksum of the packet given to the PMD, but for virtio it
>is
>> >>+ * expected.
>> >>+ */
>> >>+static void
>> >>+virtio_tso_fix_cksum(struct rte_mbuf *m)
>> >>+{
>> >>+  /* common case: header is not fragmented */
>> >>+  if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>> >>+  m->l4_len)) {
>> >...
>> >>+  /* replace it in the packet */
>> >>+  th->cksum = new_cksum;
>> >>+  } else {
>> >...
>> >>+  /* replace it in the packet */
>> >>+  *rte_pktmbuf_mtod_offset(m, uint8_t *,
>> >>+  m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>> >>+  *rte_pktmbuf_mtod_offset(m, uint8_t *,
>> >>+  m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>> >>+  }
>> >
>> >The tcp header will always be in the mbuf, right? Otherwise, you
>can't
>> >update the cksum field here. What's the point of introducing the
>"else
>> >clause" then?
>> 
>> Sorry, I don't see the problem you're pointing out here.
>> 
>> What I want to solve here is to support the cases where the mbuf
>is
>> segmented in the middle of the network header (which is probably
>a rare
>> case).
>> >>>
>> >>>How it's gonna segmented?
>> >>
>> >>The mbuf is given by the application. So if the application
>generates a
>> >>segmented mbuf, it should work.
>> >>
>> >>This could happen for instance if the application uses mbuf clones
>to share
>> >>the IP/TCP/data part of the mbuf and prepend a specific
>Ethernet/vlan for
>> >>different destination.
>> >>
>> >>
>> In the "else" part, I only access the mbuf byte by byte using the
>> rte_pktmbuf_mtod_offset() accessor. An alternative would have
>been to copy
>> the header in a linear buffer, fix the checksum, then copy it
>again in the
>> packet, but there is no mbuf helpers to do these copies for now.
>> >>>
>> >>>In the "else" clause, the ip header is still in the mbuf, right?
>> >>>Why do you have to access it the way like:
>> >>>
>> >>>  ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
>> >>>  m->l2_len) >> 4;
>> >>>
>> >>>Why can't you just use
>> >>>
>> >>>  iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
>> >>>  iph->version_ihl ;
>> >>
>> >>AFAIK, there is no requirement that each network header has to be
>contiguous
>> >>in a mbuf segment.
>> >>
>> >>Of course, a split in the middle of a network header probably never
>> >>happens... but we never knows, as it is not forbidden. I think the
>code
>> >>should be robust enough to avoid accesses to wrong addresses.
>> >>
>> >>Hope it's clear enough :)
>> >
>> >Thanks, but not really. Maybe let me ask this way: what wrong would
>> >happen if we use
>> >iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
>> >to access the IP header? Is it about the endian?
>> 
>> If you have a packet split like this:
>> 
>> mbuf segment 1 mbuf segment 2
>>    --
>> | Ethernet header |  IP hea|   |der | TCP header | data
>>    --
>>^
>>iph
>
>Thanks, that's clear. How could you be able to access the tcp header
>from the first mbuf then? I mean, how is the following code supposed
>to work?
>
>prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
>   m->l2_len + m->l3_len + 16);
>

Oh I see... Sorry there was a confusion on my side with another (internal) 
macro that browses the segments if the offset ils not in the first one.

If you agree, let's add the code without the else part, I'll fix it for the rc2.


>> The IP header is not contiguous. So accessing to the end of the
>structure
>> will access to a wrong location.
>> 
>> >One more question is do you have any case to trigger the "else"
>clause?
>> 
>> No, but I think it may happen.
>
>A piece of untest code is not trusted though ...
>
>   --yliu



[dpdk-dev] [PATCH] mempool: fix search of maximum contiguous pages

2016-10-13 Thread Wei Dai
paddr[i] + pg_sz always points to the start physical address of the
2nd page after pddr[i], so only up to 2 pages can be combinded to
be used. With this revision, more than 2 pages can be used.

Fixes: 84121f197187 ("mempool: store memory chunks in a list")

Signed-off-by: Wei Dai 
---
 lib/librte_mempool/rte_mempool.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 71017e1..e3e254a 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -426,9 +426,12 @@ rte_mempool_populate_phys_tab(struct rte_mempool *mp, char 
*vaddr,

for (i = 0; i < pg_num && mp->populated_size < mp->size; i += n) {

+   phys_addr_t paddr_next;
+   paddr_next = paddr[i] + pg_sz;
+
/* populate with the largest group of contiguous pages */
for (n = 1; (i + n) < pg_num &&
-paddr[i] + pg_sz == paddr[i+n]; n++)
+paddr_next == paddr[i+n]; n++, paddr_next += pg_sz)
;

ret = rte_mempool_populate_phys(mp, vaddr + i * pg_sz,
-- 
2.7.4



[dpdk-dev] [PATCH v2] app/testpmd: add support to configure 25G and 50G speeds

2016-10-13 Thread Thomas Monjalon
2016-10-12 16:26, Ajit Khaparde:
> Support to configure 25G and 50G speeds is missing from testpmd.
> This patch also updates the testpmd user guide accordingly.
> 
> Signed-off-by: Ajit Khaparde 
> --
> v2: shorten lines > 80 character limit
> ---

Applied, thanks


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-13 Thread Olivier MATZ


On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
> On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
>>
>>
>> On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
>>> On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:


 On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>> +/* When doing TSO, the IP length is not included in the pseudo header
>> + * checksum of the packet given to the PMD, but for virtio it is
>> + * expected.
>> + */
>> +static void
>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>> +{
>> +/* common case: header is not fragmented */
>> +if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>> +m->l4_len)) {
> ...
>> +/* replace it in the packet */
>> +th->cksum = new_cksum;
>> +} else {
> ...
>> +/* replace it in the packet */
>> +*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> +m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>> +*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> +m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>> +}
>
> The tcp header will always be in the mbuf, right? Otherwise, you can't
> update the cksum field here. What's the point of introducing the "else
> clause" then?

 Sorry, I don't see the problem you're pointing out here.

 What I want to solve here is to support the cases where the mbuf is
 segmented in the middle of the network header (which is probably a rare
 case).
>>>
>>> How it's gonna segmented?
>>
>> The mbuf is given by the application. So if the application generates a
>> segmented mbuf, it should work.
>>
>> This could happen for instance if the application uses mbuf clones to share
>> the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
>> different destination.
>>
>>
 In the "else" part, I only access the mbuf byte by byte using the
 rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
 the header in a linear buffer, fix the checksum, then copy it again in the
 packet, but there is no mbuf helpers to do these copies for now.
>>>
>>> In the "else" clause, the ip header is still in the mbuf, right?
>>> Why do you have to access it the way like:
>>>
>>> ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
>>> m->l2_len) >> 4;
>>>
>>> Why can't you just use
>>>
>>> iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
>>> iph->version_ihl ;
>>
>> AFAIK, there is no requirement that each network header has to be contiguous
>> in a mbuf segment.
>>
>> Of course, a split in the middle of a network header probably never
>> happens... but we never knows, as it is not forbidden. I think the code
>> should be robust enough to avoid accesses to wrong addresses.
>>
>> Hope it's clear enough :)
>
> Thanks, but not really. Maybe let me ask this way: what wrong would
> happen if we use
>   iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> to access the IP header? Is it about the endian?

If you have a packet split like this:

mbuf segment 1 mbuf segment 2
   --
| Ethernet header |  IP hea|   |der | TCP header | data
   --
^
iph

The IP header is not contiguous. So accessing to the end of the 
structure will access to a wrong location.

> One more question is do you have any case to trigger the "else" clause?

No, but I think it may happen.

Olivier


[dpdk-dev] [PATCH v2] app/testpmd: fix DCB config issue

2016-10-13 Thread Thomas Monjalon
> > An issue is found that DCB cannot be configured on ixgbe NICs. It's said the
> > TX queue number is not right.
> > On ixgbe the max TX queue number is not fixed, it depends on the multi-
> > queue mode.
> > 
> > This patch adds the device configuration before getting info in the DCB
> > configuration process. So the right info can be got depending on the
> > configuration.
> > 
> > Fixes: 1a572499beb6 (app/testpmd: setup DCB forwarding based on traffic
> > class)
> > Signed-off-by: Wenzhuo Lu 
> 
> Acked-by: Bernard Iremonger 

Applied, thanks


[dpdk-dev] [PATCH] mempool: fix search of maximum contiguous pages

2016-10-13 Thread Olivier MATZ
Hi Wei,

On 10/13/2016 02:31 PM, Ananyev, Konstantin wrote:
>
>>
> diff --git a/lib/librte_mempool/rte_mempool.c
> b/lib/librte_mempool/rte_mempool.c
> index 71017e1..e3e254a 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -426,9 +426,12 @@ rte_mempool_populate_phys_tab(struct
> rte_mempool *mp, char *vaddr,
>
>   for (i = 0; i < pg_num && mp->populated_size < mp->size; i += 
> n) {
>
> + phys_addr_t paddr_next;
> + paddr_next = paddr[i] + pg_sz;
> +
>   /* populate with the largest group of contiguous pages 
> */
>   for (n = 1; (i + n) < pg_num &&
> -  paddr[i] + pg_sz == paddr[i+n]; n++)
> +  paddr_next == paddr[i+n]; n++, paddr_next += pg_sz)
>   ;

 Good catch.
 Why not just paddr[i + n - 1] != paddr[i + n]?
>>>
>>> Sorry, I meant 'paddr[i + n - 1] + pg_sz == paddr[i+n]' off course.
>>>
 Then you don't need extra variable (paddr_next) here.
 Konstantin
>>
>> Thank you, Konstantin
>> 'paddr[i + n - 1] + pg_sz = paddr[i + n]' also can fix it and have straight 
>> meaning.
>> But I assume that my revision with paddr_next += pg_sz may have a bit better 
>> performance.
>
> I don't think there would be any real difference, again it is not performance 
> critical code-path.
>
>> By the way, paddr[i] + n * pg_sz = paddr[i + n] can also resolve it.
>
> Yes, that's one seems even better for me - make things more clear.

Thank you for fixing this.

My vote would go for "addr[i + n - 1] + pg_sz == paddr[i + n]"

If you feel "paddr[i] + n * pg_sz = paddr[i + n]" is clearer, I have no 
problem with it either.

Regards,
Olivier


[dpdk-dev] [PATCH v9] drivers/net:new PMD using tun/tap host interface

2016-10-13 Thread Keith Wiles
The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
on the local host. The PMD allows for DPDK and the host to
communicate using a raw device interface on the host and in
the DPDK application. The device created is a Tap device with
a L2 packet header.

v9 - Fix up the docs to use correct syntax
v8 - Fix issue with tap_tx_queue_setup() not return zero on success.
v7 - Reword the comment in common_base and fix the data->name issue
v6 - fixed the checkpatch issues
v5 - merge in changes from list review see related emails
 fixed many minor edits
v4 - merge with latest driver changes
v3 - fix includes by removing ifdef for other type besides Linux
 Fix the copyright notice in the Makefile
v2 - merge all of the patches into one patch
 Fix a typo on naming the tap device
 Update the maintainers list

Signed-off-by: Keith Wiles 
---
 MAINTAINERS |   5 +
 config/common_base  |   9 +
 config/common_linuxapp  |   1 +
 doc/guides/nics/index.rst   |   1 +
 doc/guides/nics/tap.rst | 136 ++
 drivers/net/Makefile|   1 +
 drivers/net/tap/Makefile|  57 +++
 drivers/net/tap/rte_eth_tap.c   | 756 
 drivers/net/tap/rte_pmd_tap_version.map |   4 +
 mk/rte.app.mk   |   1 +
 10 files changed, 971 insertions(+)
 create mode 100644 doc/guides/nics/tap.rst
 create mode 100644 drivers/net/tap/Makefile
 create mode 100644 drivers/net/tap/rte_eth_tap.c
 create mode 100644 drivers/net/tap/rte_pmd_tap_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 8f5fa82..433d402 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -394,6 +394,11 @@ F: doc/guides/nics/pcap_ring.rst
 F: app/test/test_pmd_ring.c
 F: app/test/test_pmd_ring_perf.c

+Tap PMD
+M: Keith Wiles 
+F: drivers/net/tap
+F: doc/guides/nics/tap.rst
+
 Null Networking PMD
 M: Tetsuya Mukawa 
 F: drivers/net/null/
diff --git a/config/common_base b/config/common_base
index f5d2eff..47ef843 100644
--- a/config/common_base
+++ b/config/common_base
@@ -592,3 +592,12 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the TAP PMD
+#
+# The TAP PMD is currently only built for Linux and the
+# option is enabled by default in common_linuxapp file,
+# set to 'n' in the common_base file.
+#
+CONFIG_RTE_LIBRTE_PMD_TAP=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2483dfa..782b503 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -44,3 +44,4 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
 CONFIG_RTE_VIRTIO_USER=y
+CONFIG_RTE_LIBRTE_PMD_TAP=y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 92d56a5..f676a52 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -56,6 +56,7 @@ Network Interface Controller Drivers
 vhost
 vmxnet3
 pcap_ring
+tap

 **Figures**

diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
new file mode 100644
index 000..622b9e7
--- /dev/null
+++ b/doc/guides/nics/tap.rst
@@ -0,0 +1,136 @@
+..  BSD LICENSE
+Copyright(c) 2016 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Tun/Tap Poll Mode Driver
+
+
+The ``rte_eth_tap.c`` PMD 

[dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over rte_config file

2016-10-13 Thread Ananyev, Konstantin

> 
> It's true that users can patch around this problem (and I started off doing 
> just that), but why impose this inconvenience on users when DPDK
> can just "do the right thing" to begin with? For example, it took me several 
> hours to figure out why the problem was occurring and then to
> hunt down the --file-prefix solution. Is there some reason why it would be a 
> bad idea to fix this in DPDK?


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-13 Thread Olivier MATZ


On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
>>
>>
>> On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
>>> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
 +/* When doing TSO, the IP length is not included in the pseudo header
 + * checksum of the packet given to the PMD, but for virtio it is
 + * expected.
 + */
 +static void
 +virtio_tso_fix_cksum(struct rte_mbuf *m)
 +{
 +  /* common case: header is not fragmented */
 +  if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
 +  m->l4_len)) {
>>> ...
 +  /* replace it in the packet */
 +  th->cksum = new_cksum;
 +  } else {
>>> ...
 +  /* replace it in the packet */
 +  *rte_pktmbuf_mtod_offset(m, uint8_t *,
 +  m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
 +  *rte_pktmbuf_mtod_offset(m, uint8_t *,
 +  m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
 +  }
>>>
>>> The tcp header will always be in the mbuf, right? Otherwise, you can't
>>> update the cksum field here. What's the point of introducing the "else
>>> clause" then?
>>
>> Sorry, I don't see the problem you're pointing out here.
>>
>> What I want to solve here is to support the cases where the mbuf is
>> segmented in the middle of the network header (which is probably a rare
>> case).
>
> How it's gonna segmented?

The mbuf is given by the application. So if the application generates a 
segmented mbuf, it should work.

This could happen for instance if the application uses mbuf clones to 
share the IP/TCP/data part of the mbuf and prepend a specific 
Ethernet/vlan for different destination.


>> In the "else" part, I only access the mbuf byte by byte using the
>> rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
>> the header in a linear buffer, fix the checksum, then copy it again in the
>> packet, but there is no mbuf helpers to do these copies for now.
>
> In the "else" clause, the ip header is still in the mbuf, right?
> Why do you have to access it the way like:
>
>   ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
>   m->l2_len) >> 4;
>
> Why can't you just use
>
>   iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
>   iph->version_ihl ;

AFAIK, there is no requirement that each network header has to be 
contiguous in a mbuf segment.

Of course, a split in the middle of a network header probably never 
happens... but we never knows, as it is not forbidden. I think the code 
should be robust enough to avoid accesses to wrong addresses.

Hope it's clear enough :)

Thanks
Olivier


[dpdk-dev] 17.02 Roadmap

2016-10-13 Thread Thomas Monjalon
2016-10-13 15:18, Hunt, David:
> Hi Thomas,
> 
> On 10/10/2016 9:42 PM, Thomas Monjalon wrote:
> > 2016-10-10 16:13, O'Driscoll, Tim:
> >> Packet Distributor Enhancements: Enhancements will be made to the Packet 
> >> Distributor library to improve performance:
> >> 1. Introduce burst functionality to allow batches of packets to be sent to 
> >> workers.
> >> 2. Improve the performance of the flow/core affinity through the use of 
> >> SSE/AVX instructions.
> > 
> > The packet distributor looks more and more like a DPDK application
> > at the same level of BESS, VPP, OVS, etc.
> 
> Lets discuss this further.  Would you see other libraries heading in 
> this direction also (reorder, acl, hash, etc)?

Not so easy to put things in a category.

> Do you think it would be an idea to add as an item of discussion for the 
> technical steering group when we're all at Userspace next week?

If others are interested to discuss about libraries categories and growth,
yes I'll jump in.


[dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support

2016-10-13 Thread Yuanhan Liu
On Mon, Oct 03, 2016 at 11:00:21AM +0200, Olivier Matz wrote:
> + /* Checksum Offload */
> + switch (cookie->ol_flags & PKT_TX_L4_MASK) {
> + case PKT_TX_UDP_CKSUM:
> + hdr->csum_start = cookie->l2_len + cookie->l3_len;
> + hdr->csum_offset = 6;
> + hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> + break;
> +
> + case PKT_TX_TCP_CKSUM:
> + hdr->csum_start = cookie->l2_len + cookie->l3_len;
> + hdr->csum_offset = 16;

I would suggest to use "offsetof(...)" here, instead of some magic
number like 16.

--yliu


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-13 Thread Stephen Hemminger
On Thu, 13 Oct 2016 16:18:39 +0800
Yuanhan Liu  wrote:

> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> > +/* When doing TSO, the IP length is not included in the pseudo header
> > + * checksum of the packet given to the PMD, but for virtio it is
> > + * expected.
> > + */
> > +static void
> > +virtio_tso_fix_cksum(struct rte_mbuf *m)
> > +{
> > +   /* common case: header is not fragmented */
> > +   if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> > +   m->l4_len)) {  
> ...
> > +   /* replace it in the packet */
> > +   th->cksum = new_cksum;
> > +   } else {  
> ...
> > +   /* replace it in the packet */
> > +   *rte_pktmbuf_mtod_offset(m, uint8_t *,
> > +   m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> > +   *rte_pktmbuf_mtod_offset(m, uint8_t *,
> > +   m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> > +   }  
> 
> The tcp header will always be in the mbuf, right? Otherwise, you can't
> update the cksum field here. What's the point of introducing the "else
> clause" then?
> 
>   --yliu

You need to check the reference count before updating any data in mbuf.


[dpdk-dev] [PATCH v8] drivers/net:new PMD using tun/tap host interface

2016-10-13 Thread Mcnamara, John
Hi,

Some doc comments below. Apologies, for the late review, I didn't see
the docs inline.



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Keith Wiles
> Sent: Thursday, October 13, 2016 5:11 PM
> To: dev at dpdk.org
> Cc: pmatilai at redhat.com; yuanhan.liu at linux.intel.com; Yigit, Ferruh
> 
> Subject: [dpdk-dev] [PATCH v8] drivers/net:new PMD using tun/tap host
> interface
> 
> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces on the
> local host. The PMD allows for DPDK and the host to communicate using a
> raw device interface on the host and in the DPDK application. The device
> created is a Tap device with a L2 packet header.
> 
> v8 - Fix issue with tap_tx_queue_setup() not return zero on success.
> v7 - Reword the comment in common_base and fix the data->name issue
> v6 - fixed the checkpatch issues
> v5 - merge in changes from list review see related emails
>  fixed many minor edits
> v4 - merge with latest driver changes
> v3 - fix includes by removing ifdef for other type besides Linux
>  Fix the copyright notice in the Makefile
> v2 - merge all of the patches into one patch
>  Fix a typo on naming the tap device
>  Update the maintainers list
> 
> Signed-off-by: Keith Wiles 
> ---
>  MAINTAINERS |   5 +
>  config/common_base  |   9 +
>  config/common_linuxapp  |   1 +
>  doc/guides/nics/tap.rst | 138 ++
>  drivers/net/Makefile|   1 +
>  drivers/net/tap/Makefile|  57 +++
>  drivers/net/tap/rte_eth_tap.c   | 756


You need to add "tap" to the doc/guides/nics/index.rst file to
include the tap.rst in the docs.


> +
> +Tun/Tap Poll Mode Driver
> +
> +
> +The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces on the
> +local host. The PMD allows for DPDK and the host to communicate using a
> +raw device interface on the host and in the DPDK application.
> +
> +The device created is a TAP device, which sends/receives packet in a
> +raw format with a L2 header. The usage for a TAP PMD is for
> +connectivity to the local host using a TAP interface. When the TAP PMD
> +is initialized it will create a number of tap devices in the host
> +accessed via 'ifconfig -a' or 'ip' command. The commands can be used to
> assign and query the virtual like device.

The apostrophes around the commands should be replaced with `` to render
them as fixed width.


> +
> +These TAP interfaces can be used with wireshark or tcpdump or
> +Pktgen-DPDK along with being able to be used as a network connection to
> +the DPDK application. The method enable one or more interfaces is to
> +use the --vdev=net_tap option on the DPDK application  command line.
> +Each --vdev=net_tap option give will create an interface named dtap0,
> dtap1, ... and so forth.

Same here, include any commands or variable names in backticks to render
them as fixed width in the text: ``--vdev=net_tap``


> +
> +.. code-block:: console
> +
> +   The interfaced name can be changed by adding the iface=foo0
> +   e.g. --vdev=net_tap,iface=foo0 --vdev=net_tap,iface=foo1, ...

This would be better formatted as follows:

The interfaced name can be changed by adding the ``iface=foo0``, for example::

   --vdev=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...



> +If you have a Network Stack in your DPDK application or something like
> +it you can utilize that stack to handle the network protocols. Plus you
> +would be able to address the interface using an IP address assigned to
> the internal interface.
> +
> +A very crude test you can do the following:
> +
> +Apply the patch below and make sure you have socat installed on your
> system.

There is no patch below in the docs. ;-) Also, this would probably be better
as a new section. Something like:


Example
---

The following is a simple example of using the TUN/TAP PMD with the Pktgen
packet generator. It requires that the ``socat`` utility is installed on the
test system.

Build DPDK, then pull down Pktgen and build pktgen using the DPDK SDK/Target
used to build the dpdk you pulled down.

...

> +
> +Build DPDK, then pull down Pktgen and build pktgen using the DPDK
> +SDK/Target used to build the dpdk you pulled down.
> +
> +Run pktgen from the pktgen repo directory in an xterm:
> +Note: change the -b options to blacklist all of your physical ports.
> The
> +  following command line is all one line.


The RST syntax for Note and the indentation are wrong here. Also the note
would be better after the example. Something like:


.. Note:

   Change the ``-b`` options to blacklist all of your physical ports. The
   following command line is all one line.

   Also, ``-f themes/black-yellow.theme`` is optional if the default colors
   work on your system configuration. See the Pktgen docs for more
   information.


Finally, if you want to include the 

[dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over rte_config file

2016-10-13 Thread Van Haaren, Harry
Hi,

> From: John Ousterhout [mailto:ouster at cs.stanford.edu] 

> But, given the existence of the --file-prefix option, isn't it already unsafe 
> for Collectd to check only for .rte_config? If it's important for other 
> programs to be able to find the config files, it seems to me that a more 
> robust mechanism is needed than the current one.

If the user is using the DPDK --file-prefix, we expect the user to provide that 
same --file-prefix to inform Collectd of the changed config path. Details on 
how to do so are available here: 
https://github.com/collectd/collectd/blob/master/src/collectd.conf.pod#plugin-dpdkstat

Keep in mind a for a simple setup the current defaults will just work, so 
changing the default seems a bad idea to me.

Regards, -Harry


> On Thu, Oct 13, 2016 at 9:07 AM, Van Haaren, Harry  intel.com> wrote:
>Hi John,

>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John Ousterhout
>> Subject: Re: [dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over 
>> rte_config file

> 

> For example, it took me several hours
> to figure out why the problem was occurring and then to hunt down the
> --file-prefix solution. Is there some reason why it would be a bad idea to
> fix this in DPDK?

Right now, DPDK will by default always use a consistent .rte_config location, 
which allows other applications to monitor that. For example, Collectd is able 
to monitor a DPDK primary process by checking if the .rte_config file exists at 
its default location[1].

If we change the default behavior of DPDK, other projects can no longer rely on 
that file being created, and these monitoring projects must be updated in sync 
with DPDK to avoid breakage if versions mismatch.

Although I see your use-case, I think using the --file-prefix as Konstantin 
suggests is a better solution in this case.

Regards, -Harry

[1] https://github.com/collectd/collectd/blob/master/src/dpdkstat.c#L60


> On Thu, Oct 13, 2016 at 3:53 AM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
>
> >
> > Hi John,
> >
> > > Before this patch, DPDK used the file ~/.rte_config as a lock to detect
> > > potential interference between multiple DPDK applications running on the
> > > same machine. However, if a single user ran DPDK applications
> > concurrently
> > > on several different machines, and if the user's home directory was
> > shared
> > > between the machines via NFS, DPDK would incorrectly detect conflicts
> > > for all but the first application and abort them. This patch fixes the
> > > problem by incorporating the machine name into the config file name
> > (e.g.,
> > > ~/.rte_hostname_config).
> > >
> > > Signed-off-by: John Ousterhout 
> > > ---
> > >? doc/guides/prog_guide/multi_proc_support.rst | 11 +++
> > >? lib/librte_eal/common/eal_common_proc.c? ? ? |? 8 ++--
> > >? lib/librte_eal/common/eal_filesystem.h? ? ? ?| 15 +--
> > >? 3 files changed, 22 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/doc/guides/prog_guide/multi_proc_support.rst
> > b/doc/guides/prog_guide/multi_proc_support.rst
> > > index badd102..a54fa1c 100644
> > > --- a/doc/guides/prog_guide/multi_proc_support.rst
> > > +++ b/doc/guides/prog_guide/multi_proc_support.rst
> > > @@ -129,10 +129,13 @@ Support for this usage scenario is provided using
> > the ``--file-prefix`` paramete
> > >
> > >? By default, the EAL creates hugepage files on each hugetlbfs filesystem
> > using the rtemap_X filename,
> > >? where X is in the range 0 to the maximum number of hugepages -1.
> > > -Similarly, it creates shared configuration files, memory mapped in each
> > process, using the /var/run/.rte_config filename,
> > > -when run as root (or $HOME/.rte_config when run as a non-root user;
> > > -if filesystem and device permissions are set up to allow this).
> > > -The rte part of the filenames of each of the above is configurable
> > using the file-prefix parameter.
> > > +Similarly, it creates shared configuration files, memory mapped in each
> > process.
> > > +When run as root, the name of the configuration file will be
> > > +/var/run/.rte_*host*_config, where *host* is the name of the machine.
> > > +When run as a non-root user, the the name of the configuration file
> > will be
> > > +$HOME/.rte_*host*_config (if filesystem and device permissions are set
> > up to allow this).
> > > +If the ``--file-prefix`` parameter has been specified, its value will
> > be used
> > > +in place of "rte" in the file names.
> >
> > I am not sure that we need to handle all such cases inside EAL.
> > User can easily overcome that problem by just adding something like:
> > --file-prefix=`uname -n`
> > to his command-line.
> > Konstantin
> >
> > >
> > >? In addition to specifying the file-prefix parameter,
> > >? any DPDK applications that are to be run side-by-side must explicitly
> > limit their memory use.
> > > diff --git a/lib/librte_eal/common/eal_common_proc.c
> > b/lib/librte_eal/common/eal_common_proc.c
> > > index 

[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-13 Thread Yuanhan Liu
On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> +/* When doing TSO, the IP length is not included in the pseudo header
> + * checksum of the packet given to the PMD, but for virtio it is
> + * expected.
> + */
> +static void
> +virtio_tso_fix_cksum(struct rte_mbuf *m)
> +{
> + /* common case: header is not fragmented */
> + if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> + m->l4_len)) {
...
> + /* replace it in the packet */
> + th->cksum = new_cksum;
> + } else {
...
> + /* replace it in the packet */
> + *rte_pktmbuf_mtod_offset(m, uint8_t *,
> + m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> + *rte_pktmbuf_mtod_offset(m, uint8_t *,
> + m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> + }

The tcp header will always be in the mbuf, right? Otherwise, you can't
update the cksum field here. What's the point of introducing the "else
clause" then?

--yliu


[dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support

2016-10-13 Thread Olivier Matz
Signed-off-by: Olivier Matz 
---
 drivers/net/virtio/virtio_ethdev.c |   6 ++
 drivers/net/virtio/virtio_ethdev.h |   2 +
 drivers/net/virtio/virtio_rxtx.c   | 133 +++--
 3 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 109f855..969edb6 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1572,6 +1572,7 @@ virtio_dev_link_update(struct rte_eth_dev *dev, 
__rte_unused int wait_to_complet
 static void
 virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
+   uint64_t tso_mask;
struct virtio_hw *hw = dev->data->dev_private;

if (dev->pci_dev)
@@ -1599,6 +1600,11 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_UDP_CKSUM |
DEV_TX_OFFLOAD_TCP_CKSUM;
}
+
+   tso_mask = (1ULL << VIRTIO_NET_F_HOST_TSO4) |
+   (1ULL << VIRTIO_NET_F_HOST_TSO6);
+   if ((hw->guest_features & tso_mask) == tso_mask)
+   dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
 }

 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index d55e7ed..f77f618 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -63,6 +63,8 @@
 1u << VIRTIO_NET_F_CTRL_RX   | \
 1u << VIRTIO_NET_F_CTRL_VLAN | \
 1u << VIRTIO_NET_F_CSUM  | \
+1u << VIRTIO_NET_F_HOST_TSO4 | \
+1u << VIRTIO_NET_F_HOST_TSO6 | \
 1u << VIRTIO_NET_F_MRG_RXBUF | \
 1u << VIRTIO_RING_F_INDIRECT_DESC |\
 1ULL << VIRTIO_F_VERSION_1)
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 0fa635a..4b01ea3 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -209,10 +209,117 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, 
struct rte_mbuf *cookie)
return 0;
 }

+/* When doing TSO, the IP length is not included in the pseudo header
+ * checksum of the packet given to the PMD, but for virtio it is
+ * expected.
+ */
+static void
+virtio_tso_fix_cksum(struct rte_mbuf *m)
+{
+   /* common case: header is not fragmented */
+   if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
+   m->l4_len)) {
+   struct ipv4_hdr *iph;
+   struct ipv6_hdr *ip6h;
+   struct tcp_hdr *th;
+   uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
+   uint32_t tmp;
+
+   iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
+   th = RTE_PTR_ADD(iph, m->l3_len);
+   if ((iph->version_ihl >> 4) == 4) {
+   iph->hdr_checksum = 0;
+   iph->hdr_checksum = rte_ipv4_cksum(iph);
+   ip_len = iph->total_length;
+   ip_paylen = rte_cpu_to_be_16(rte_be_to_cpu_16(ip_len) -
+   m->l3_len);
+   } else {
+   ip6h = (struct ipv6_hdr *)iph;
+   ip_paylen = ip6h->payload_len;
+   }
+
+   /* calculate the new phdr checksum not including ip_paylen */
+   prev_cksum = th->cksum;
+   tmp = prev_cksum;
+   tmp += ip_paylen;
+   tmp = (tmp & 0x) + (tmp >> 16);
+   new_cksum = tmp;
+
+   /* replace it in the packet */
+   th->cksum = new_cksum;
+   } else {
+   const struct ipv4_hdr *iph;
+   struct ipv4_hdr iph_copy;
+   union {
+   uint16_t u16;
+   uint8_t u8[2];
+   } prev_cksum, new_cksum, ip_len, ip_paylen, ip_csum;
+   uint32_t tmp;
+
+   /* Same code than above, but we use rte_pktmbuf_read()
+* or we read/write in mbuf data one byte at a time to
+* avoid issues if the packet is multi segmented.
+*/
+
+   uint8_t ip_version;
+
+   ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+   m->l2_len) >> 4;
+
+   /* calculate ip checksum (API imposes to set it to 0)
+* and get ip payload len */
+   if (ip_version == 4) {
+   *rte_pktmbuf_mtod_offset(m, uint8_t *,
+   m->l2_len + 10) = 0;
+   *rte_pktmbuf_mtod_offset(m, uint8_t *,
+   m->l2_len + 11) = 0;
+   iph = rte_pktmbuf_read(m, m->l2_len,
+   sizeof(*iph), _copy);
+   ip_csum.u16 = rte_ipv4_cksum(iph);
+   *rte_pktmbuf_mtod_offset(m, 

[dpdk-dev] [PATCH v3 11/12] net/virtio: add Lro support

2016-10-13 Thread Olivier Matz
Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 15 ++-
 drivers/net/virtio/virtio_ethdev.h |  9 -
 drivers/net/virtio/virtio_rxtx.c   | 25 -
 3 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index c3c53be..109f855 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1348,6 +1348,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
if (rxmode->hw_ip_checksum)
req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
+   if (rxmode->enable_lro)
+   req_features |=
+   (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
+   (1ULL << VIRTIO_NET_F_GUEST_TSO6);

/* if request features changed, reinit the device */
if (req_features != hw->req_guest_features) {
@@ -1363,6 +1367,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -ENOTSUP;
}

+   if (rxmode->enable_lro &&
+   (!vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) ||
+   !vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4))) {
+   PMD_DRV_LOG(NOTICE,
+   "lro not available on this host");
+   return -ENOTSUP;
+   }
+
/* Setup and start control queue */
if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
ret = virtio_dev_cq_queue_setup(dev,
@@ -1578,7 +1590,8 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
};
dev_info->rx_offload_capa =
DEV_RX_OFFLOAD_TCP_CKSUM |
-   DEV_RX_OFFLOAD_UDP_CKSUM;
+   DEV_RX_OFFLOAD_UDP_CKSUM |
+   DEV_RX_OFFLOAD_TCP_LRO;
dev_info->tx_offload_capa = 0;

if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index adca6ba..d55e7ed 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -117,13 +117,4 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct 
rte_mbuf **tx_pkts,

 int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);

-/*
- * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
- * frames larger than 1514 bytes. We do not yet support software LRO
- * via tcp_lro_rx().
- */
-#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
-   VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
-
-
 #endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 675dc43..0fa635a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -715,13 +715,36 @@ virtio_rx_offload(struct rte_mbuf *m, struct 
virtio_net_hdr *hdr)
m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
}

+   /* GSO request, save required information in mbuf */
+   if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+   /* Check unsupported modes */
+   if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
+   (hdr->gso_size == 0)) {
+   return -EINVAL;
+   }
+
+   /* Update mss lengthes in mbuf */
+   m->tso_segsz = hdr->gso_size;
+   switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+   case VIRTIO_NET_HDR_GSO_TCPV4:
+   case VIRTIO_NET_HDR_GSO_TCPV6:
+   m->ol_flags |= PKT_RX_LRO | \
+   PKT_RX_L4_CKSUM_NONE;
+   break;
+   default:
+   return -EINVAL;
+   }
+   }
+
return 0;
 }

 static inline int
 rx_offload_enabled(struct virtio_hw *hw)
 {
-   return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM);
+   return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) ||
+   vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) ||
+   vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6);
 }

 #define VIRTIO_MBUF_BURST_SZ 64
-- 
2.8.1



[dpdk-dev] [PATCH v3 10/12] net/virtio: add Tx checksum offload support

2016-10-13 Thread Olivier Matz
Signed-off-by: Olivier Matz 
---
 drivers/net/virtio/virtio_ethdev.c |  7 
 drivers/net/virtio/virtio_ethdev.h |  1 +
 drivers/net/virtio/virtio_rxtx.c   | 73 +++---
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 00b4c38..c3c53be 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1579,6 +1579,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->rx_offload_capa =
DEV_RX_OFFLOAD_TCP_CKSUM |
DEV_RX_OFFLOAD_UDP_CKSUM;
+   dev_info->tx_offload_capa = 0;
+
+   if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
+   dev_info->tx_offload_capa |=
+   DEV_TX_OFFLOAD_UDP_CKSUM |
+   DEV_TX_OFFLOAD_TCP_CKSUM;
+   }
 }

 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index fd29a7f..adca6ba 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -62,6 +62,7 @@
 1u << VIRTIO_NET_F_CTRL_VQ   | \
 1u << VIRTIO_NET_F_CTRL_RX   | \
 1u << VIRTIO_NET_F_CTRL_VLAN | \
+1u << VIRTIO_NET_F_CSUM  | \
 1u << VIRTIO_NET_F_MRG_RXBUF | \
 1u << VIRTIO_RING_F_INDIRECT_DESC |\
 1ULL << VIRTIO_F_VERSION_1)
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index fc0d84b..675dc43 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -53,6 +53,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -207,18 +209,27 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, 
struct rte_mbuf *cookie)
return 0;
 }

+static inline int
+tx_offload_enabled(struct virtio_hw *hw)
+{
+   return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM);
+}
+
 static inline void
 virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
   uint16_t needed, int use_indirect, int can_push)
 {
+   struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
struct vq_desc_extra *dxp;
struct virtqueue *vq = txvq->vq;
struct vring_desc *start_dp;
uint16_t seg_num = cookie->nb_segs;
uint16_t head_idx, idx;
uint16_t head_size = vq->hw->vtnet_hdr_size;
-   unsigned long offs;
+   struct virtio_net_hdr *hdr;
+   int offload;

+   offload = tx_offload_enabled(vq->hw);
head_idx = vq->vq_desc_head_idx;
idx = head_idx;
dxp = >vq_descx[idx];
@@ -228,10 +239,12 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct 
rte_mbuf *cookie,
start_dp = vq->vq_ring.desc;

if (can_push) {
-   /* put on zero'd transmit header (no offloads) */
-   void *hdr = rte_pktmbuf_prepend(cookie, head_size);
-
-   memset(hdr, 0, head_size);
+   /* prepend cannot fail, checked by caller */
+   hdr = (struct virtio_net_hdr *)
+   rte_pktmbuf_prepend(cookie, head_size);
+   /* if offload disabled, it is not zeroed below, do it now */
+   if (offload == 0)
+   memset(hdr, 0, head_size);
} else if (use_indirect) {
/* setup tx ring slot to point to indirect
 * descriptor list stored in reserved region.
@@ -239,14 +252,11 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct 
rte_mbuf *cookie,
 * the first slot in indirect ring is already preset
 * to point to the header in reserved region
 */
-   struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
-
-   offs = idx * sizeof(struct virtio_tx_region)
-   + offsetof(struct virtio_tx_region, tx_indir);
-
-   start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
+   start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
+   RTE_PTR_DIFF([idx].tx_indir, txr);
start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc);
start_dp[idx].flags = VRING_DESC_F_INDIRECT;
+   hdr = (struct virtio_net_hdr *)[idx].tx_hdr;

/* loop below will fill in rest of the indirect elements */
start_dp = txr[idx].tx_indir;
@@ -255,15 +265,43 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct 
rte_mbuf *cookie,
/* setup first tx ring slot to point to header
 * stored in reserved region.
 */
-   offs = idx * sizeof(struct virtio_tx_region)
-   + offsetof(struct virtio_tx_region, tx_hdr);
-
-   start_dp[idx].addr  = 

[dpdk-dev] [PATCH v3 09/12] net/virtio: add Rx checksum offload support

2016-10-13 Thread Olivier Matz
Signed-off-by: Olivier Matz 
---
 drivers/net/virtio/virtio_ethdev.c | 21 ++
 drivers/net/virtio/virtio_ethdev.h |  2 +-
 drivers/net/virtio/virtio_rxtx.c   | 79 ++
 drivers/net/virtio/virtqueue.h |  1 +
 4 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index b5bc0ee..00b4c38 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1262,7 +1262,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
eth_dev->data->dev_flags = dev_flags;

/* reset device and negotiate default features */
-   ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
+   ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
if (ret < 0)
return ret;

@@ -1345,13 +1345,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
int ret;

PMD_INIT_LOG(DEBUG, "configure");
+   req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
+   if (rxmode->hw_ip_checksum)
+   req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);

-   if (rxmode->hw_ip_checksum) {
-   PMD_DRV_LOG(ERR, "HW IP checksum not supported");
-   return -EINVAL;
-   }
-
-   req_features = VIRTIO_PMD_GUEST_FEATURES;
/* if request features changed, reinit the device */
if (req_features != hw->req_guest_features) {
ret = virtio_init_device(dev, req_features);
@@ -1359,6 +1356,13 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return ret;
}

+   if (rxmode->hw_ip_checksum &&
+   !vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM)) {
+   PMD_DRV_LOG(NOTICE,
+   "rx ip checksum not available on this host");
+   return -ENOTSUP;
+   }
+
/* Setup and start control queue */
if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
ret = virtio_dev_cq_queue_setup(dev,
@@ -1572,6 +1576,9 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->default_txconf = (struct rte_eth_txconf) {
.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
};
+   dev_info->rx_offload_capa =
+   DEV_RX_OFFLOAD_TCP_CKSUM |
+   DEV_RX_OFFLOAD_UDP_CKSUM;
 }

 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index dc18341..fd29a7f 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -54,7 +54,7 @@
 #define VIRTIO_MAX_RX_PKTLEN  9728

 /* Features desired/implemented by this driver. */
-#define VIRTIO_PMD_GUEST_FEATURES  \
+#define VIRTIO_PMD_DEFAULT_GUEST_FEATURES  \
(1u << VIRTIO_NET_F_MAC   | \
 1u << VIRTIO_NET_F_STATUS| \
 1u << VIRTIO_NET_F_MQ| \
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9ab441b..fc0d84b 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -51,6 +51,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -632,6 +634,63 @@ virtio_update_packet_stats(struct virtnet_stats *stats, 
struct rte_mbuf *mbuf)
}
 }

+/* Optionally fill offload information in structure */
+static int
+virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
+{
+   struct rte_net_hdr_lens hdr_lens;
+   uint32_t hdrlen, ptype;
+   int l4_supported = 0;
+
+   /* nothing to do */
+   if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+   return 0;
+
+   m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+   ptype = rte_net_get_ptype(m, _lens, RTE_PTYPE_ALL_MASK);
+   m->packet_type = ptype;
+   if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+   (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+   (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+   l4_supported = 1;
+
+   if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+   hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+   if (hdr->csum_start <= hdrlen && l4_supported) {
+   m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+   } else {
+   /* Unknown proto or tunnel, do sw cksum. We can assume
+* the cksum field is in the first segment since the
+* buffers we provided to the host are large enough.
+* In case of SCTP, this will be wrong since it's a CRC
+* but there's nothing we can do.
+*/
+   uint16_t csum, off;
+
+   rte_raw_cksum_mbuf(m, hdr->csum_start,
+   

[dpdk-dev] [PATCH v3 08/12] app/testpmd: display lro segment size

2016-10-13 Thread Olivier Matz
In csumonly engine, display the value of LRO segment if the
LRO flag is set.

Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---
 app/test-pmd/csumonly.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index da15185..57e6ae2 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -822,6 +822,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
"l4_proto=%d l4_len=%d flags=%s\n",
info.l2_len, rte_be_to_cpu_16(info.ethertype),
info.l3_len, info.l4_proto, info.l4_len, buf);
+   if (rx_ol_flags & PKT_RX_LRO)
+   printf("rx: m->lro_segsz=%u\n", m->tso_segsz);
if (info.is_tunnel == 1)
printf("rx: outer_l2_len=%d outer_ethertype=%x "
"outer_l3_len=%d\n", info.outer_l2_len,
-- 
2.8.1



[dpdk-dev] [PATCH v3 07/12] mbuf: new flag for LRO

2016-10-13 Thread Olivier Matz
When receiving coalesced packets in virtio, the original size of the
segments is provided. This is a useful information because it allows to
resegment with the same size.

Add a RX new flag in mbuf, that can be set when packets are coalesced by
a hardware or virtual driver when the m->tso_segsz field is valid and is
set to the segment size of original packets.

This flag is used in next commits in the virtio pmd.

Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---
 doc/guides/rel_notes/release_16_11.rst | 5 +
 lib/librte_mbuf/rte_mbuf.c | 2 ++
 lib/librte_mbuf/rte_mbuf.h | 7 +++
 3 files changed, 14 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_11.rst 
b/doc/guides/rel_notes/release_16_11.rst
index 2ec63b2..c9fcfb9 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -115,6 +115,11 @@ New Features
   good, bad, or not present (useful for virtual drivers). This modification
   was done for IP and L4.

+* **Added a LRO mbuf flag.**
+
+  Added a new RX LRO mbuf flag, used when packets are coalesced. This
+  flag indicates that the segment size of original packets is known.
+
 Resolved Issues
 ---

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 8d9b875..63f43c8 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -319,6 +319,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
+   case PKT_RX_LRO: return "PKT_RX_LRO";
default: return NULL;
}
 }
@@ -352,6 +353,7 @@ rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t 
buflen)
{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
{ PKT_RX_IEEE1588_TMST, PKT_RX_IEEE1588_TMST, NULL },
{ PKT_RX_QINQ_STRIPPED, PKT_RX_QINQ_STRIPPED, NULL },
+   { PKT_RX_LRO, PKT_RX_LRO, NULL },
};
const char *name;
unsigned int i;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 38022a3..f5eedda 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -170,6 +170,13 @@ extern "C" {
  */
 #define PKT_RX_QINQ_PKT  PKT_RX_QINQ_STRIPPED

+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO   (1ULL << 16)
+
 /* add new RX flags here */

 /* add new TX flags here */
-- 
2.8.1



[dpdk-dev] [PATCH v3 06/12] app/testpmd: adapt checksum stats in csum engine

2016-10-13 Thread Olivier Matz
Reviewed-by: Maxime Coquelin 
---
 app/test-pmd/csumonly.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 27d0f08..da15185 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -697,8 +697,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
rx_ol_flags = m->ol_flags;

/* Update the L3/L4 checksum error packet statistics */
-   rx_bad_ip_csum += ((rx_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-   rx_bad_l4_csum += ((rx_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+   if ((rx_ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD)
+   rx_bad_ip_csum += 1;
+   if ((rx_ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD)
+   rx_bad_l4_csum += 1;

/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
 * and inner headers */
-- 
2.8.1



[dpdk-dev] [PATCH v3 05/12] mbuf: add new Rx checksum mbuf flags

2016-10-13 Thread Olivier Matz
Following discussions in [1] and [2], introduce a new bit to
describe the Rx checksum status in mbuf.

Before this patch, only one flag was available:
  PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.

And same for L3:
  PKT_RX_IP_CKSUM_BAD: IP cksum of RX pkt. is not OK.

This had 2 issues:
- it was not possible to differentiate "checksum good" from
  "checksum unknown".
- it was not possible for a virtual driver to say "the checksum
  in packet may be wrong, but data integrity is valid".

This patch tries to solve this issue by having 4 states (2 bits)
for the IP and L4 Rx checksums. New values are:

 - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
   -> the application should verify the checksum by sw
 - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
   -> the application can drop the packet without additional check
 - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
   -> the application can accept the packet without verifying the
  checksum by sw
 - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
   data, but the integrity of the L4 data is verified.
   -> the application can process the packet but must not verify the
  checksum by sw. It has to take care to recalculate the cksum
  if the packet is transmitted (either by sw or using tx offload)

  And same for L3 (replace L4 by IP in description above).

This commit tries to be compatible with existing applications that
only check the existing flag (CKSUM_BAD).

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
[2] http://dpdk.org/ml/archives/dev/2016-June/040007.html

Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---
 doc/guides/rel_notes/release_16_11.rst |  6 
 lib/librte_mbuf/rte_mbuf.c | 16 +--
 lib/librte_mbuf/rte_mbuf.h | 51 --
 3 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_11.rst 
b/doc/guides/rel_notes/release_16_11.rst
index fbc0cbd..2ec63b2 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -109,6 +109,12 @@ New Features
   Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
   data embedded in an mbuf chain.

+* **Added new Rx checksum mbuf flags.**
+
+  Added new Rx checksum flags in mbufs to describe more states: unknown,
+  good, bad, or not present (useful for virtual drivers). This modification
+  was done for IP and L4.
+
 Resolved Issues
 ---

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 4e1fdd1..8d9b875 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -309,7 +309,11 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
case PKT_RX_FDIR: return "PKT_RX_FDIR";
case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+   case PKT_RX_L4_CKSUM_GOOD: return "PKT_RX_L4_CKSUM_GOOD";
+   case PKT_RX_L4_CKSUM_NONE: return "PKT_RX_L4_CKSUM_NONE";
case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+   case PKT_RX_IP_CKSUM_GOOD: return "PKT_RX_IP_CKSUM_GOOD";
+   case PKT_RX_IP_CKSUM_NONE: return "PKT_RX_IP_CKSUM_NONE";
case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
case PKT_RX_VLAN_STRIPPED: return "PKT_RX_VLAN_STRIPPED";
case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
@@ -333,8 +337,16 @@ rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t 
buflen)
{ PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT, NULL },
{ PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, NULL },
{ PKT_RX_FDIR, PKT_RX_FDIR, NULL },
-   { PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, NULL },
-   { PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD, NULL },
+   { PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_MASK, NULL },
+   { PKT_RX_L4_CKSUM_GOOD, PKT_RX_L4_CKSUM_MASK, NULL },
+   { PKT_RX_L4_CKSUM_NONE, PKT_RX_L4_CKSUM_MASK, NULL },
+   { PKT_RX_L4_CKSUM_UNKNOWN, PKT_RX_L4_CKSUM_MASK,
+ "PKT_RX_L4_CKSUM_UNKNOWN" },
+   { PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK, NULL },
+   { PKT_RX_IP_CKSUM_GOOD, PKT_RX_IP_CKSUM_MASK, NULL },
+   { PKT_RX_IP_CKSUM_NONE, PKT_RX_IP_CKSUM_MASK, NULL },
+   { PKT_RX_IP_CKSUM_UNKNOWN, PKT_RX_IP_CKSUM_MASK,
+ "PKT_RX_IP_CKSUM_UNKNOWN" },
{ PKT_RX_EIP_CKSUM_BAD, PKT_RX_EIP_CKSUM_BAD, NULL },
{ PKT_RX_VLAN_STRIPPED, PKT_RX_VLAN_STRIPPED, NULL },
{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7541070..38022a3 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,8 +91,25 @@ extern "C" {

 #define PKT_RX_RSS_HASH  (1ULL << 1)  /**< RX packet with RSS 

[dpdk-dev] [PATCH v3 04/12] net: add function to calculate a checksum in a mbuf

2016-10-13 Thread Olivier Matz
This function can be used to calculate the checksum of data embedded in
mbuf, that can be composed of several segments.

This function will be used by the virtio pmd in next commits to calculate
the checksum in software in case the protocol is not recognized.

Signed-off-by: Olivier Matz 
---
 doc/guides/rel_notes/release_16_11.rst |  5 +++
 lib/librte_net/rte_ip.h| 71 ++
 2 files changed, 76 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_11.rst 
b/doc/guides/rel_notes/release_16_11.rst
index 51fc707..fbc0cbd 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -104,6 +104,11 @@ New Features
   The config option ``RTE_MACHINE`` can be used to pass code names to the 
compiler as ``-march`` flag.


+* **Added a functions to calculate the checksum of data in a mbuf.**
+
+  Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
+  data embedded in an mbuf chain.
+
 Resolved Issues
 ---

diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 5b7554a..4491b86 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -230,6 +230,77 @@ rte_raw_cksum(const void *buf, size_t len)
 }

 /**
+ * Compute the raw (non complemented) checksum of a packet.
+ *
+ * @param m
+ *   The pointer to the mbuf.
+ * @param off
+ *   The offset in bytes to start the checksum.
+ * @param len
+ *   The length in bytes of the data to ckecksum.
+ * @param cksum
+ *   A pointer to the checksum, filled on success.
+ * @return
+ *   0 on success, -1 on error (bad length or offset).
+ */
+static inline int
+rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len,
+   uint16_t *cksum)
+{
+   const struct rte_mbuf *seg;
+   const char *buf;
+   uint32_t sum, tmp;
+   uint32_t seglen, done;
+
+   /* easy case: all data in the first segment */
+   if (off + len <= rte_pktmbuf_data_len(m)) {
+   *cksum = rte_raw_cksum(rte_pktmbuf_mtod_offset(m,
+   const char *, off), len);
+   return 0;
+   }
+
+   if (unlikely(off + len > rte_pktmbuf_pkt_len(m)))
+   return -1; /* invalid params, return a dummy value */
+
+   /* else browse the segment to find offset */
+   seglen = 0;
+   for (seg = m; seg != NULL; seg = seg->next) {
+   seglen = rte_pktmbuf_data_len(seg);
+   if (off < seglen)
+   break;
+   off -= seglen;
+   }
+   seglen -= off;
+   buf = rte_pktmbuf_mtod_offset(seg, const char *, off);
+   if (seglen >= len) {
+   /* all in one segment */
+   *cksum = rte_raw_cksum(buf, len);
+   return 0;
+   }
+
+   /* hard case: process checksum of several segments */
+   sum = 0;
+   done = 0;
+   for (;;) {
+   tmp = __rte_raw_cksum(buf, seglen, 0);
+   if (done & 1)
+   tmp = rte_bswap16(tmp);
+   sum += tmp;
+   done += seglen;
+   if (done == len)
+   break;
+   seg = seg->next;
+   buf = rte_pktmbuf_mtod(seg, const char *);
+   seglen = rte_pktmbuf_data_len(seg);
+   if (seglen > len - done)
+   seglen = len - done;
+   }
+
+   *cksum = __rte_raw_cksum_reduce(sum);
+   return 0;
+}
+
+/**
  * Process the IPv4 checksum of an IPv4 header.
  *
  * The checksum field must be set to 0 by the caller.
-- 
2.8.1



[dpdk-dev] [PATCH v3 03/12] net/virtio: reinitialize the device in configure callback

2016-10-13 Thread Olivier Matz
Add the ability to reset the virtio device in the configure callback
if the features flag changed since previous reset. This will be possible
with the introduction of offload support in next commits.

Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 26 +++---
 drivers/net/virtio/virtio_pci.h|  1 +
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index f3921ac..b5bc0ee 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1045,14 +1045,13 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, 
uint16_t vlan_id, int on)
 }

 static int
-virtio_negotiate_features(struct virtio_hw *hw)
+virtio_negotiate_features(struct virtio_hw *hw, uint64_t req_features)
 {
uint64_t host_features;

/* Prepare guest_features: feature that driver wants to support */
-   hw->guest_features = VIRTIO_PMD_GUEST_FEATURES;
PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %" PRIx64,
-   hw->guest_features);
+   req_features);

/* Read device(host) feature bits */
host_features = hw->vtpci_ops->get_features(hw);
@@ -1063,6 +1062,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 * Negotiate features: Subset of device feature bits are written back
 * guest feature bits.
 */
+   hw->guest_features = req_features;
hw->guest_features = vtpci_negotiate_features(hw, host_features);
PMD_INIT_LOG(DEBUG, "features after negotiate = %" PRIx64,
hw->guest_features);
@@ -1081,6 +1081,8 @@ virtio_negotiate_features(struct virtio_hw *hw)
}
}

+   hw->req_guest_features = req_features;
+
return 0;
 }

@@ -1121,8 +1123,9 @@ rx_func_get(struct rte_eth_dev *eth_dev)
eth_dev->rx_pkt_burst = _recv_pkts;
 }

+/* reset device and renegotiate features if needed */
 static int
-virtio_init_device(struct rte_eth_dev *eth_dev)
+virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
 {
struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
@@ -1137,7 +1140,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)

/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-   if (virtio_negotiate_features(hw) < 0)
+   if (virtio_negotiate_features(hw, req_features) < 0)
return -1;

/* If host does not support status then disable LSC */
@@ -1258,8 +1261,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

eth_dev->data->dev_flags = dev_flags;

-   /* reset device and negotiate features */
-   ret = virtio_init_device(eth_dev);
+   /* reset device and negotiate default features */
+   ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
if (ret < 0)
return ret;

@@ -1338,6 +1341,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = >data->dev_conf.rxmode;
struct virtio_hw *hw = dev->data->dev_private;
+   uint64_t req_features;
int ret;

PMD_INIT_LOG(DEBUG, "configure");
@@ -1347,6 +1351,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -EINVAL;
}

+   req_features = VIRTIO_PMD_GUEST_FEATURES;
+   /* if request features changed, reinit the device */
+   if (req_features != hw->req_guest_features) {
+   ret = virtio_init_device(dev, req_features);
+   if (ret < 0)
+   return ret;
+   }
+
/* Setup and start control queue */
if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
ret = virtio_dev_cq_queue_setup(dev,
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 6930cd6..bbf06ec 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -245,6 +245,7 @@ struct virtio_net_config;
 struct virtio_hw {
struct virtnet_ctl *cvq;
struct rte_pci_ioport io;
+   uint64_treq_guest_features;
uint64_tguest_features;
uint32_tmax_queue_pairs;
uint16_tvtnet_hdr_size;
-- 
2.8.1



[dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback

2016-10-13 Thread Olivier Matz
Move the configuration of control queue in the configure callback.
This is needed by next commit, which introduces the reinitialization
of the device in the configure callback to change the feature flags.
Therefore, the control queue will have to be restarted at the same
place.

As virtio_dev_cq_queue_setup() is called from a place where
config->max_virtqueue_pairs is not available, we need to store this in
the private structure. It replaces max_rx_queues and max_tx_queues which
have the same value. The log showing the value of max_rx_queues and
max_tx_queues is also removed since config->max_virtqueue_pairs is
already displayed above.

Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 43 +++---
 drivers/net/virtio/virtio_ethdev.h |  4 ++--
 drivers/net/virtio/virtio_pci.h|  3 +--
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 77ca569..f3921ac 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -552,6 +552,9 @@ virtio_dev_close(struct rte_eth_dev *dev)
if (hw->started == 1)
virtio_dev_stop(dev);

+   if (hw->cvq)
+   virtio_dev_queue_release(hw->cvq->vq);
+
/* reset the NIC */
if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
@@ -1191,16 +1194,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
config->max_virtqueue_pairs = 1;
}

-   hw->max_rx_queues =
-   (VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
-   VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-   hw->max_tx_queues =
-   (VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
-   VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
-
-   virtio_dev_cq_queue_setup(eth_dev,
-   config->max_virtqueue_pairs * 2,
-   SOCKET_ID_ANY);
+   hw->max_queue_pairs = config->max_virtqueue_pairs;

PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
config->max_virtqueue_pairs);
@@ -1211,19 +1205,15 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
config->mac[2], config->mac[3],
config->mac[4], config->mac[5]);
} else {
-   hw->max_rx_queues = 1;
-   hw->max_tx_queues = 1;
+   PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=1");
+   hw->max_queue_pairs = 1;
}

-   PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
-   hw->max_rx_queues, hw->max_tx_queues);
if (pci_dev)
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
eth_dev->data->port_id, pci_dev->id.vendor_id,
pci_dev->id.device_id);

-   virtio_dev_cq_start(eth_dev);
-
return 0;
 }

@@ -1285,7 +1275,6 @@ static int
 eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 {
struct rte_pci_device *pci_dev;
-   struct virtio_hw *hw = eth_dev->data->dev_private;

PMD_INIT_FUNC_TRACE();

@@ -1301,9 +1290,6 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
eth_dev->tx_pkt_burst = NULL;
eth_dev->rx_pkt_burst = NULL;

-   if (hw->cvq)
-   virtio_dev_queue_release(hw->cvq->vq);
-
rte_free(eth_dev->data->mac_addrs);
eth_dev->data->mac_addrs = NULL;

@@ -1352,6 +1338,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = >data->dev_conf.rxmode;
struct virtio_hw *hw = dev->data->dev_private;
+   int ret;

PMD_INIT_LOG(DEBUG, "configure");

@@ -1360,6 +1347,16 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -EINVAL;
}

+   /* Setup and start control queue */
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+   ret = virtio_dev_cq_queue_setup(dev,
+   hw->max_queue_pairs * 2,
+   SOCKET_ID_ANY);
+   if (ret < 0)
+   return ret;
+   virtio_dev_cq_start(dev);
+   }
+
hw->vlan_strip = rxmode->hw_vlan_strip;

if (rxmode->hw_vlan_filter
@@ -1553,8 +1550,10 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->driver_name = dev->driver->pci_drv.driver.name;
else
dev_info->driver_name = "virtio_user PMD";
-   dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
-   dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
+   dev_info->max_rx_queues =
+   

[dpdk-dev] [PATCH v3 01/12] net/virtio: move device initialization in a function

2016-10-13 Thread Olivier Matz
Move all code related to device initialization in a new function
virtio_init_device().

This commit brings no functional change, it prepares the next commits
that will add the offload support. For that, it will be needed to
reinitialize the device from ethdev->configure(), using this new
function.

Signed-off-by: Olivier Matz 
Reviewed-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 99 ++
 1 file changed, 58 insertions(+), 41 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index b4dfc0a..77ca569 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1118,46 +1118,13 @@ rx_func_get(struct rte_eth_dev *eth_dev)
eth_dev->rx_pkt_burst = _recv_pkts;
 }

-/*
- * This function is based on probe() function in virtio_pci.c
- * It returns 0 on success.
- */
-int
-eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+static int
+virtio_init_device(struct rte_eth_dev *eth_dev)
 {
struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
-   struct rte_pci_device *pci_dev;
-   uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
-   int ret;
-
-   RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct 
virtio_net_hdr_mrg_rxbuf));
-
-   eth_dev->dev_ops = _eth_dev_ops;
-   eth_dev->tx_pkt_burst = _xmit_pkts;
-
-   if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-   rx_func_get(eth_dev);
-   return 0;
-   }
-
-   /* Allocate memory for storing MAC addresses */
-   eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * 
ETHER_ADDR_LEN, 0);
-   if (eth_dev->data->mac_addrs == NULL) {
-   PMD_INIT_LOG(ERR,
-   "Failed to allocate %d bytes needed to store MAC 
addresses",
-   VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
-   return -ENOMEM;
-   }
-
-   pci_dev = eth_dev->pci_dev;
-
-   if (pci_dev) {
-   ret = vtpci_init(pci_dev, hw, _flags);
-   if (ret)
-   return ret;
-   }
+   struct rte_pci_device *pci_dev = eth_dev->pci_dev;

/* Reset the device although not necessary at startup */
vtpci_reset(hw);
@@ -1172,10 +1139,11 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

/* If host does not support status then disable LSC */
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
-   dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+   eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+   else
+   eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;

rte_eth_copy_pci_info(eth_dev, pci_dev);
-   eth_dev->data->dev_flags = dev_flags;

rx_func_get(eth_dev);

@@ -1254,12 +1222,61 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
eth_dev->data->port_id, pci_dev->id.vendor_id,
pci_dev->id.device_id);

+   virtio_dev_cq_start(eth_dev);
+
+   return 0;
+}
+
+/*
+ * This function is based on probe() function in virtio_pci.c
+ * It returns 0 on success.
+ */
+int
+eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+{
+   struct virtio_hw *hw = eth_dev->data->dev_private;
+   struct rte_pci_device *pci_dev;
+   uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
+   int ret;
+
+   RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct 
virtio_net_hdr_mrg_rxbuf));
+
+   eth_dev->dev_ops = _eth_dev_ops;
+   eth_dev->tx_pkt_burst = _xmit_pkts;
+
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+   rx_func_get(eth_dev);
+   return 0;
+   }
+
+   /* Allocate memory for storing MAC addresses */
+   eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * 
ETHER_ADDR_LEN, 0);
+   if (eth_dev->data->mac_addrs == NULL) {
+   PMD_INIT_LOG(ERR,
+   "Failed to allocate %d bytes needed to store MAC 
addresses",
+   VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
+   return -ENOMEM;
+   }
+
+   pci_dev = eth_dev->pci_dev;
+
+   if (pci_dev) {
+   ret = vtpci_init(pci_dev, hw, _flags);
+   if (ret)
+   return ret;
+   }
+
+   eth_dev->data->dev_flags = dev_flags;
+
+   /* reset device and negotiate features */
+   ret = virtio_init_device(eth_dev);
+   if (ret < 0)
+   return ret;
+
/* Setup interrupt callback  */
if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
rte_intr_callback_register(_dev->intr_handle,
-  virtio_interrupt_handler, eth_dev);
-
-   virtio_dev_cq_start(eth_dev);
+   virtio_interrupt_handler, eth_dev);

return 0;
 }
-- 
2.8.1



[dpdk-dev] [PATCH v3 00/12] net/virtio: add offload support

2016-10-13 Thread Olivier Matz
This patchset, targetted for 16.11, introduces the support of rx and tx
offload in virtio pmd.  To achieve this, some new mbuf flags must be
introduced, as discussed in [1].

It applies on master + a patch fixing the testpmd csum engine:
http://dpdk.org/dev/patchwork/patch/16538/

The new mbuf checksum flags are backward compatible for current
applications that assume that unknown_csum = good_cum (since there
was only a bad_csum flag). But it the patchset is integrated, we
should consider updating the PMDs to match the new API for 16.11.

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html

changes v2 -> v3

- fix typo in release note
- add unlikely() in cksum calculation error case
- add likely() in virtio rx function when cksum != 0x
- return an error code instead of the cksum in rte_raw_cksum_mbuf()
- do not access to the virtio header if no offload is negotiated (rx and tx)
- return an error if offload cannot be negotiated
- use offsetof() instead of magic hardcoded values for cksum offsets
- changefix some commit titles

changes v1 -> v2
- change mbuf checksum calculation static inline
- fix checksum calculation for protocol where csum=0 means no csum
- move mbuf checksum calculation in librte_net
- use RTE_MIN() to set max rx/tx queue
- rebase on top of head

Olivier Matz (12):
  virtio: move device initialization in a function
  virtio: setup and start cq in configure callback
  virtio: reinitialize the device in configure callback
  net: add function to calculate a checksum in a mbuf
  mbuf: add new Rx checksum mbuf flags
  app/testpmd: fix checksum stats in csum engine
  mbuf: new flag for LRO
  app/testpmd: display lro segment size
  virtio: add Rx checksum offload support
  virtio: add Tx checksum offload support
  virtio: add Lro support
  virtio: add Tso support

 app/test-pmd/csumonly.c|   8 +-
 doc/guides/rel_notes/release_16_11.rst |  16 ++
 drivers/net/virtio/virtio_ethdev.c | 197 ++
 drivers/net/virtio/virtio_ethdev.h |  18 +-
 drivers/net/virtio/virtio_pci.h|   4 +-
 drivers/net/virtio/virtio_rxtx.c   | 298 ++---
 drivers/net/virtio/virtqueue.h |   1 +
 lib/librte_mbuf/rte_mbuf.c |  18 +-
 lib/librte_mbuf/rte_mbuf.h |  58 ++-
 lib/librte_net/rte_ip.h|  71 
 10 files changed, 580 insertions(+), 109 deletions(-)

Test plan
=

(replayed on v3)

Platform description


  guest (dpdk)
  ++
  ||
  ||
  | port0  +-<---+
  |   ixgbe /  | |
  |   directio | |
  || |
  |port1   | ^ flow1
  ++ | (flow2 is the reverse)
 |   |
 | virtio|
 v   |
  ++ |
  | tap0   /   | |
  |1.1.1.1   / | |
  |ns-tap  /   | |
  |  / | |
  |/   ixgbe2  +-->--+
  |  /1.1.1.2  |
  |/  ns-ixgbe |
  ++
  host (linux, vhost-net)


flow1:
  host -(ixgbe)-> guest -(virtio)-> host
  1.1.1.2 -> 1.1.1.1

flow2:
  host -(virtio)-> guest -(ixgbe)-> host
  1.1.1.2 -> 1.1.1.1

Host configuration
--

Start qemu with:

- a ne2k management interface to avoi any conflict with dpdk
- 2 ixgbe interfaces given to with vm through vfio
- a virtio net device, connected to a tap interface through vhost-net

  /usr/bin/qemu-system-x86_64 -k fr -daemonize --enable-kvm -m 1G -cpu host \
-smp 3 -serial telnet::40564,server,nowait -serial null \
-qmp tcp::44340,server,nowait -monitor telnet::49229,server,nowait \
-device ne2k_pci,mac=de:ad:de:01:02:03,netdev=user.0,addr=03 \
-netdev user,id=user.0,hostfwd=tcp::34965-:22 \
-device vfio-pci,host=:04:00.0 -device vfio-pci,host=:04:00.1 \
-netdev type=tap,id=vhostnet0,script=no,vhost=on,queues=8 \
-device virtio-net-pci,netdev=vhostnet0,ioeventfd=on,mq=on,vectors=17 \
-hda "/path/to/ubuntu-14.04-template.qcow2" \
-snapshot -vga none -display none

Move the tap interface in a netns, and configure it:

  ip netns add ns-tap
  ip netns exec ns-tap ip l set lo up
  ip link set tap0 netns ns-tap
  ip netns exec ns-tap ip l set tap0 down
  ip netns exec ns-tap ip l set addr 02:00:00:00:00:01 dev tap0
  ip netns exec ns-tap ip l set tap0 up
  ip netns exec ns-tap ip a a 1.1.1.1/24 dev tap0
  ip netns exec ns-tap arp -s 1.1.1.2 02:00:00:00:00:00
  ip netns exec ns-tap ip a

Move the ixgbe interface in a netns, and configure it:

  IXGBE=ixgbe2
  ip netns add ns-ixgbe
  ip netns exec ns-ixgbe ip l set lo up
  ip link set ${IXGBE} netns ns-ixgbe
  ip netns exec ns-ixgbe ip l set ${IXGBE} down
  ip netns exec ns-ixgbe ip l set addr 02:00:00:00:00:00 dev ${IXGBE}
  ip netns exec ns-ixgbe ip l set ${IXGBE} up
  ip netns exec ns-ixgbe ip a a 1.1.1.2/24 dev 

[dpdk-dev] [PATCH] doc: add note on primary process dependency

2016-10-13 Thread Reshma Pattan
The note i.e. "The dpdk-pdump tool can only be used in
conjunction with a primary process which has the packet
capture framework initialized already" is added to
doc/guides/sample_app_ug/pdump.rst to facilitate
easy understanding on the usage of the tool.

Suggested-by: Jianfeng Tan 
Signed-off-by: Reshma Pattan 
---
 doc/guides/sample_app_ug/pdump.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/guides/sample_app_ug/pdump.rst 
b/doc/guides/sample_app_ug/pdump.rst
index ac0e7c9..4098e50 100644
--- a/doc/guides/sample_app_ug/pdump.rst
+++ b/doc/guides/sample_app_ug/pdump.rst
@@ -37,6 +37,8 @@ The ``dpdk-pdump`` tool is a Data Plane Development Kit 
(DPDK) tool that runs as
 a DPDK secondary process and is capable of enabling packet capture on dpdk 
ports.

.. Note::
+  * The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+application which has the packet capture framework initialized already.

   * The ``dpdk-pdump`` tool depends on libpcap based PMD which is disabled
 by default in the build configuration files,
-- 
2.7.4



[dpdk-dev] [PATCH v2] app/testpmd: fix RSS-hash-key size

2016-10-13 Thread Thomas Monjalon
> > RSS hash-key-size is retrieved from device configuration instead of
> > using a fixed size of 40 bytes.
> > 
> > Fixes: f79959ea1504 ("app/testpmd: allow to configure RSS hash key")
> > 
> > Signed-off-by: Mohammad Abdul Awal 
> 
> Acked-by: Konstantin Ananyev 

Applied, thanks


[dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over rte_config file

2016-10-13 Thread Van Haaren, Harry
Hi John,

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John Ousterhout
> Subject: Re: [dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over 
> rte_config file



> For example, it took me several hours
> to figure out why the problem was occurring and then to hunt down the
> --file-prefix solution. Is there some reason why it would be a bad idea to
> fix this in DPDK?

Right now, DPDK will by default always use a consistent .rte_config location, 
which allows other applications to monitor that. For example, Collectd is able 
to monitor a DPDK primary process by checking if the .rte_config file exists at 
its default location[1].

If we change the default behavior of DPDK, other projects can no longer rely on 
that file being created, and these monitoring projects must be updated in sync 
with DPDK to avoid breakage if versions mismatch. 

Although I see your use-case, I think using the --file-prefix as Konstantin 
suggests is a better solution in this case.

Regards, -Harry

[1] https://github.com/collectd/collectd/blob/master/src/dpdkstat.c#L60


> On Thu, Oct 13, 2016 at 3:53 AM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
> 
> >
> > Hi John,
> >
> > > Before this patch, DPDK used the file ~/.rte_config as a lock to detect
> > > potential interference between multiple DPDK applications running on the
> > > same machine. However, if a single user ran DPDK applications
> > concurrently
> > > on several different machines, and if the user's home directory was
> > shared
> > > between the machines via NFS, DPDK would incorrectly detect conflicts
> > > for all but the first application and abort them. This patch fixes the
> > > problem by incorporating the machine name into the config file name
> > (e.g.,
> > > ~/.rte_hostname_config).
> > >
> > > Signed-off-by: John Ousterhout 
> > > ---
> > >  doc/guides/prog_guide/multi_proc_support.rst | 11 +++
> > >  lib/librte_eal/common/eal_common_proc.c  |  8 ++--
> > >  lib/librte_eal/common/eal_filesystem.h   | 15 +--
> > >  3 files changed, 22 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/doc/guides/prog_guide/multi_proc_support.rst
> > b/doc/guides/prog_guide/multi_proc_support.rst
> > > index badd102..a54fa1c 100644
> > > --- a/doc/guides/prog_guide/multi_proc_support.rst
> > > +++ b/doc/guides/prog_guide/multi_proc_support.rst
> > > @@ -129,10 +129,13 @@ Support for this usage scenario is provided using
> > the ``--file-prefix`` paramete
> > >
> > >  By default, the EAL creates hugepage files on each hugetlbfs filesystem
> > using the rtemap_X filename,
> > >  where X is in the range 0 to the maximum number of hugepages -1.
> > > -Similarly, it creates shared configuration files, memory mapped in each
> > process, using the /var/run/.rte_config filename,
> > > -when run as root (or $HOME/.rte_config when run as a non-root user;
> > > -if filesystem and device permissions are set up to allow this).
> > > -The rte part of the filenames of each of the above is configurable
> > using the file-prefix parameter.
> > > +Similarly, it creates shared configuration files, memory mapped in each
> > process.
> > > +When run as root, the name of the configuration file will be
> > > +/var/run/.rte_*host*_config, where *host* is the name of the machine.
> > > +When run as a non-root user, the the name of the configuration file
> > will be
> > > +$HOME/.rte_*host*_config (if filesystem and device permissions are set
> > up to allow this).
> > > +If the ``--file-prefix`` parameter has been specified, its value will
> > be used
> > > +in place of "rte" in the file names.
> >
> > I am not sure that we need to handle all such cases inside EAL.
> > User can easily overcome that problem by just adding something like:
> > --file-prefix=`uname -n`
> > to his command-line.
> > Konstantin
> >
> > >
> > >  In addition to specifying the file-prefix parameter,
> > >  any DPDK applications that are to be run side-by-side must explicitly
> > limit their memory use.
> > > diff --git a/lib/librte_eal/common/eal_common_proc.c
> > b/lib/librte_eal/common/eal_common_proc.c
> > > index 12e0fca..517aa0c 100644
> > > --- a/lib/librte_eal/common/eal_common_proc.c
> > > +++ b/lib/librte_eal/common/eal_common_proc.c
> > > @@ -45,12 +45,8 @@ rte_eal_primary_proc_alive(const char
> > *config_file_path)
> > >
> > >   if (config_file_path)
> > >   config_fd = open(config_file_path, O_RDONLY);
> > > - else {
> > > - char default_path[PATH_MAX+1];
> > > - snprintf(default_path, PATH_MAX, RUNTIME_CONFIG_FMT,
> > > -  default_config_dir, "rte");
> > > - config_fd = open(default_path, O_RDONLY);
> > > - }
> > > + else
> > > + config_fd = open(eal_runtime_config_path(), O_RDONLY);
> > >   if (config_fd < 0)
> > >   return 0;
> > >
> > > diff --git a/lib/librte_eal/common/eal_filesystem.h
> > 

[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-13 Thread Olivier MATZ


On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>> +/* When doing TSO, the IP length is not included in the pseudo header
>> + * checksum of the packet given to the PMD, but for virtio it is
>> + * expected.
>> + */
>> +static void
>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>> +{
>> +/* common case: header is not fragmented */
>> +if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>> +m->l4_len)) {
> ...
>> +/* replace it in the packet */
>> +th->cksum = new_cksum;
>> +} else {
> ...
>> +/* replace it in the packet */
>> +*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> +m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>> +*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> +m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>> +}
>
> The tcp header will always be in the mbuf, right? Otherwise, you can't
> update the cksum field here. What's the point of introducing the "else
> clause" then?

Sorry, I don't see the problem you're pointing out here.

What I want to solve here is to support the cases where the mbuf is 
segmented in the middle of the network header (which is probably a rare 
case).

In the "else" part, I only access the mbuf byte by byte using the 
rte_pktmbuf_mtod_offset() accessor. An alternative would have been to 
copy the header in a linear buffer, fix the checksum, then copy it again 
in the packet, but there is no mbuf helpers to do these copies for now.

Regards,
Olivier


[dpdk-dev] [PATCH] testpmd: fix tso with csum engine

2016-10-13 Thread Thomas Monjalon
2016-10-13 15:40, Olivier Matz:
> The commit that disabled tso for small packets was broken during the
> rebase. The problem is the IP checksum is not calculated in software if:
> - TX IP checksum is disabled
> - TSO is enabled
> - the current packet is smaller than tso segment size
> 
> When checking if the PKT_TX_IP_CKSUM flag should be set (in case
> of tso), use the local tso_segsz variable, which is set to 0 when the
> packet is too small to require tso. Therefore the IP checksum will be
> correctly calculated in software.
> 
> Moreover, we should not use tunnel segment size for non-tunnel tso, else
> TSO will stay disabled for all packets.
> 
> Fixes: 97c21329d42b ("app/testpmd: do not use TSO for small packets")
> 
> Signed-off-by: Olivier Matz 

Applied, thanks


[dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support

2016-10-13 Thread Olivier MATZ


On 10/13/2016 10:38 AM, Yuanhan Liu wrote:
> On Mon, Oct 03, 2016 at 11:00:21AM +0200, Olivier Matz wrote:
>> +/* Checksum Offload */
>> +switch (cookie->ol_flags & PKT_TX_L4_MASK) {
>> +case PKT_TX_UDP_CKSUM:
>> +hdr->csum_start = cookie->l2_len + cookie->l3_len;
>> +hdr->csum_offset = 6;
>> +hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>> +break;
>> +
>> +case PKT_TX_TCP_CKSUM:
>> +hdr->csum_start = cookie->l2_len + cookie->l3_len;
>> +hdr->csum_offset = 16;
>
> I would suggest to use "offsetof(...)" here, instead of some magic
> number like 16.

Will do, it's actually clearer.

Olivier


[dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback

2016-10-13 Thread Olivier MATZ


On 10/13/2016 09:54 AM, Yuanhan Liu wrote:
> On Wed, Oct 12, 2016 at 06:01:25PM +0200, Olivier MATZ wrote:
>> Hello Yuanhan,
>>
>> On 10/12/2016 04:41 PM, Yuanhan Liu wrote:
>>> On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
 @@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
   {
const struct rte_eth_rxmode *rxmode = >data->dev_conf.rxmode;
struct virtio_hw *hw = dev->data->dev_private;
 +  uint64_t req_features;
int ret;

PMD_INIT_LOG(DEBUG, "configure");
 @@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -EINVAL;
}

 +  req_features = VIRTIO_PMD_GUEST_FEATURES;
 +  /* if request features changed, reinit the device */
 +  if (req_features != hw->req_guest_features) {
 +  ret = virtio_init_device(dev, req_features);
 +  if (ret < 0)
 +  return ret;
 +  }
>>>
>>> Why do you have to reset virtio here? This doesn't make too much sense
>>> to me.
>>>
>>> IIUC, you want to make sure those TSO related features being unset at
>>> init time, and enable it (by doing reset) when it's asked to be enabled
>>> (by rte_eth_dev_configure)?
>>>
>>> Why not always setting those features? We could do the actual offloads
>>> when:
>>>
>>> - those features have been negoiated
>>>
>>> - they are enabled through rte_eth_dev_configure
>>>
>>> With that, I think we could avoid the reset here?
>>
>> It would work for TX, since you decide to use or not the feature. But I
>> think this won't work for RX: if you negociate LRO at init, the host may
>> send you large packets, even if LRO is disabled in dev_configure.
>
> I see. Thanks.
>
> Besides, I think you should return error when LRO is not negoiated
> after the reset (say, when it's disabled through qemu command line)?

Good idea, I now return an error if offload cannot be negotiated.

Olivier


[dpdk-dev] [PATCH 2/2] app/testpmd: use consistent vdev names

2016-10-13 Thread Thomas Monjalon
2016-10-07 02:27, De Lara Guarch, Pablo:
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Thursday, October 06, 2016 3:34 AM
> > To: De Lara Guarch, Pablo
> > Cc: dev at dpdk.org
> > Subject: [PATCH 2/2] app/testpmd: use consistent vdev names
> > 
> > The vdev eth_bond has been renamed to net_bond.
> > testpmd is creating a bonding device with the old prefix.
> > It is changed for consistency.
> > 
> > The script test-null.sh was failing because using the old name
> > for the null vdev.
> > 
> > Fixes also the bonding and testpmd doc.
> > 
> > Fixes: 2f45703c17ac ("drivers: make driver names consistent")
> > 
> > Signed-off-by: Thomas Monjalon 
> 
> Acked-by: Pablo de Lara 

Series applied


[dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback

2016-10-13 Thread Yuanhan Liu
On Wed, Oct 12, 2016 at 06:01:25PM +0200, Olivier MATZ wrote:
> Hello Yuanhan,
> 
> On 10/12/2016 04:41 PM, Yuanhan Liu wrote:
> >On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
> >>@@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> >>  {
> >>const struct rte_eth_rxmode *rxmode = >data->dev_conf.rxmode;
> >>struct virtio_hw *hw = dev->data->dev_private;
> >>+   uint64_t req_features;
> >>int ret;
> >>
> >>PMD_INIT_LOG(DEBUG, "configure");
> >>@@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> >>return -EINVAL;
> >>}
> >>
> >>+   req_features = VIRTIO_PMD_GUEST_FEATURES;
> >>+   /* if request features changed, reinit the device */
> >>+   if (req_features != hw->req_guest_features) {
> >>+   ret = virtio_init_device(dev, req_features);
> >>+   if (ret < 0)
> >>+   return ret;
> >>+   }
> >
> >Why do you have to reset virtio here? This doesn't make too much sense
> >to me.
> >
> >IIUC, you want to make sure those TSO related features being unset at
> >init time, and enable it (by doing reset) when it's asked to be enabled
> >(by rte_eth_dev_configure)?
> >
> >Why not always setting those features? We could do the actual offloads
> >when:
> >
> >- those features have been negoiated
> >
> >- they are enabled through rte_eth_dev_configure
> >
> >With that, I think we could avoid the reset here?
> 
> It would work for TX, since you decide to use or not the feature. But I
> think this won't work for RX: if you negociate LRO at init, the host may
> send you large packets, even if LRO is disabled in dev_configure.

I see. Thanks.

Besides, I think you should return error when LRO is not negoiated
after the reset (say, when it's disabled through qemu command line)?

--yliu


[dpdk-dev] [PATCH] app/test: reduce lpm6 test case size

2016-10-13 Thread Thomas Monjalon
2016-09-30 02:11, Wei Dai:
> copy app/test/test_lpm6_routes.h to app/test/test_lpm6_data.h .
> and then delete app/test/test_lpm6_routes.h and clear the
> large_ips_table[ ] to make LPM6 test case size much smaller than
> before. Also add codes in app/test/test_lpm6_data.h to generate test
> data in large_ips_table[ ] at run time.
> 
> Signed-off-by: Wei Dai 

Applied, thanks


[dpdk-dev] [PATCH v6] drivers/net:new PMD using tun/tap host interface

2016-10-13 Thread Ferruh Yigit
On 10/12/2016 9:54 PM, Keith Wiles wrote:
> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
> on the local host. The PMD allows for DPDK and the host to
> communicate using a raw device interface on the host and in
> the DPDK application. The device created is a Tap device with
> a L2 packet header.
> 
> v6 - fixed the checkpatch issues
> v5 - merge in changes from list review see related emails
>  fixed many minor edits
> v4 - merge with latest driver changes
> v3 - fix includes by removing ifdef for other type besides Linux
>  Fix the copyright notice in the Makefile
> v2 - merge all of the patches into one patch
>  Fix a typo on naming the tap device
>  Update the maintainers list
> 
> Signed-off-by: Keith Wiles 
> ---
<...>

> diff --git a/config/common_base b/config/common_base
...
> +
> +#
> +# Set TAP PMD to 'n' as it is only supported in Linux for now.
> +#

In final .config file, this comment says TAP PMD set to "n", but it is
set to "y"
Also this behavior is not unique to this config option, many config
items used as described in the comment, and they don't have same comment.

<...>

> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
...
> +
> +   Also the speed of the interface can be changed from 10G to whatever number
> +   needed, but the interface does not enforce that speed.
> +   e.g. --vdev=eth_tap,iface=foo0,speed=25000

Same comment with previous review, eth_tap should be net_tap, for all
occurrences in this document

<...>

> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
...
> +struct tap_info {
> + char name[RTE_ETH_NAME_MAX_LEN]; /* Interface name supplied/given */
> + int speed;   /* Speed of interface */
> +};

Same comment with previous review, this struct is not used at all.

<...>

> +static int
> +eth_dev_tap_create(const char *name, char *tap_name)
...
> + dev->data = data;
> + dev->dev_ops = 
> + dev->driver = NULL;
> + dev->rx_pkt_burst = pmd_rx_burst;
> + dev->tx_pkt_burst = pmd_tx_burst;
> + snprintf(dev->data->name, sizeof(dev->data->name), "%s", tap_name);

[1]

<...>

> + */
> +static int
> +rte_pmd_tap_remove(const char *name)
...
> +
> + /* find the ethdev entry */
> + eth_dev = rte_eth_dev_allocated(name);

This still won't work, please see [1], dev->name overwritten by
tap_name, and this function does simply a strcmp with name and dev->name.

<...>

Thanks,
ferruh


[dpdk-dev] [PATCH] testpmd: fix tso with csum engine

2016-10-13 Thread Olivier Matz
The commit that disabled tso for small packets was broken during the
rebase. The problem is the IP checksum is not calculated in software if:
- TX IP checksum is disabled
- TSO is enabled
- the current packet is smaller than tso segment size

When checking if the PKT_TX_IP_CKSUM flag should be set (in case
of tso), use the local tso_segsz variable, which is set to 0 when the
packet is too small to require tso. Therefore the IP checksum will be
correctly calculated in software.

Moreover, we should not use tunnel segment size for non-tunnel tso, else
TSO will stay disabled for all packets.

Fixes: 97c21329d42b ("app/testpmd: do not use TSO for small packets")

Signed-off-by: Olivier Matz 
---
 app/test-pmd/csumonly.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index f9e65b6..27d0f08 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -336,7 +336,7 @@ process_inner_cksums(void *l3_hdr, const struct 
testpmd_offload_info *info,
if (!info->is_tunnel) {
max_pkt_len = info->l2_len + info->l3_len + info->l4_len +
info->tso_segsz;
-   if (info->tunnel_tso_segsz != 0 && info->pkt_len > max_pkt_len)
+   if (info->tso_segsz != 0 && info->pkt_len > max_pkt_len)
tso_segsz = info->tso_segsz;
} else {
max_pkt_len = info->outer_l2_len + info->outer_l3_len +
@@ -351,9 +351,7 @@ process_inner_cksums(void *l3_hdr, const struct 
testpmd_offload_info *info,
ipv4_hdr->hdr_checksum = 0;

ol_flags |= PKT_TX_IPV4;
-   if (info->l4_proto == IPPROTO_TCP &&
-   ((info->is_tunnel && info->tunnel_tso_segsz != 0) ||
-(!info->is_tunnel && info->tso_segsz != 0))) {
+   if (info->l4_proto == IPPROTO_TCP && tso_segsz) {
ol_flags |= PKT_TX_IP_CKSUM;
} else {
if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM)
-- 
2.8.1



[dpdk-dev] [PATCH v2] app/test: remove large lpm test head file

2016-10-13 Thread Thomas Monjalon
2016-09-28 01:38, Wei Dai:
> remove the large file app/test/test_lpm_routes.h and add codes to
> auto-generate similar large route rule talbe which keeps same depth
> and IP class distribution as previous one in test_lpm_routes.h .
> With the rule table auto-generated at run time, the performance
> of looking up keep similar to that from pervious constant talbe.
> 
> Signed-off-by: Wei Dai 

Applied, thanks

There is a comment which may be difficult to understand for newcomers.
Indeed it refers to the old file app/test/test_lpm_routes.h.
However it is not really an issue as this file is a *huge* part of the
DPDK history ;)


[dpdk-dev] 17.02 Roadmap

2016-10-13 Thread Hunt, David
Hi Thomas,


On 10/10/2016 9:42 PM, Thomas Monjalon wrote:
> Thanks Tim for the interesting inputs.
> Some of them may require a dedicated thread to continue the discussion
> based on some preliminary specifications or drafts.
>
> 2016-10-10 16:13, O'Driscoll, Tim:
>> Packet Distributor Enhancements: Enhancements will be made to the Packet 
>> Distributor library to improve performance:
>> 1. Introduce burst functionality to allow batches of packets to be sent to 
>> workers.
>> 2. Improve the performance of the flow/core affinity through the use of 
>> SSE/AVX instructions.
> The packet distributor looks more and more like a DPDK application
> at the same level of BESS, VPP, OVS, etc.

Lets discuss this further.  Would you see other libraries heading in 
this direction also (reorder, acl, hash, etc)?
Do you think it would be an idea to add as an item of discussion for the 
technical steering group when we're all at Userspace next week?

Regards,
Dave.



[dpdk-dev] [PATCH] test: fix hash multiwriter test

2016-10-13 Thread Thomas Monjalon
2016-10-06 23:34, Pablo de Lara:
> Hash multiwriter test consists of two subtests.
> If the any of the subtests fails, the overall test should fail,
> but the overall test only passed if the second subtest passed,
> because the return of the first subtest was being overwritten.
> 
> Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel 
> TSX")
> 
> Signed-off-by: Pablo de Lara 

Applied, thanks


[dpdk-dev] [PATCH v9 0/2] add API's for VF management

2016-10-13 Thread Bruce Richardson
On Wed, Oct 12, 2016 at 06:54:10PM +0100, Bernard Iremonger wrote:
> This patchset contains new DPDK API's for use
> with the Virtual Function Daemon (VFD).
> 
> The need to configure and manage VF's on a NIC has grown to the
> point where a DPDK based tool, VFD, has been developed to do this.
> 
> This patch set adds API extensions to DPDK for VF configuration.
> 
> Eight new API's have been added for the Intel 82559 NIC.
> 
> Changes have been made to testpmd to facilitate testing of the new API's.
> The testpmd documentation has been updated to document the testpmd changes.
> 
Patchset applied to dpdk-next-net/rel_16_11

/Bruce


[dpdk-dev] [PATCH] net/ring: fix param string info

2016-10-13 Thread Bruce Richardson
On Thu, Oct 13, 2016 at 11:22:25AM +0100, Bruce Richardson wrote:
> On Wed, Aug 31, 2016 at 05:53:51PM +0100, Ferruh Yigit wrote:
> > Fixes: 65eca099f405 ("drivers: split parameters infos in multiple lines")
> > 
> > Signed-off-by: Ferruh Yigit 
> > ---
> Acked-by: Bruce Richardson 
Applied to dpdk-next-net/rel_16_11

/Bruce


[dpdk-dev] [PATCH] net/ring: fix ring eth dev creation via devargs

2016-10-13 Thread Bruce Richardson
On Thu, Oct 13, 2016 at 11:21:57AM +0100, Bruce Richardson wrote:
> On Wed, Aug 31, 2016 at 05:51:11PM +0100, Ferruh Yigit wrote:
> > Using nodeaction devarg lets creating multiple ring eth devices:
> > "eth_ring0,nodeaction=R0:0:CREATE,nodeaction=R1:0:CREATE"
> > 
> > Trying to create all devices with same name fails. Since first part of
> > the nodeaction devarg is name (in above sample R0,R1), this name field
> > can be used as eth dev name.
> > 
> > Fixes: 61934c0956d4 ("ring: convert to use of PMD_REGISTER_DRIVER and fix 
> > linking")
> > 
> > Signed-off-by: Ferruh Yigit 
> > ---
> Acked-by: Bruce Richardson 
> 
Applied to dpdk-next-net/rel_16_11

/Bruce


[dpdk-dev] [PATCH v3 1/3] lib/librte_port: enable file descriptor port support

2016-10-13 Thread Thomas Monjalon
2016-10-13 10:17, Jasvinder Singh:
> This patch adds File Descriptor(FD) port type (e.g. TAP port) to the
> packet framework library that allows interface with the kernel network
> stack. The FD port APIs are defined that allow port creation, writing
> and reading packet from the kernel interface.
> 
> Signed-off-by: Jasvinder Singh 
> Acked-by: Cristian Dumitrescu 

Series applied, thanks


[dpdk-dev] [PATCH] net/enic: remove assert which causes compile error

2016-10-13 Thread Bruce Richardson
On Thu, Oct 13, 2016 at 02:22:16PM +0100, Bruce Richardson wrote:
> On Wed, Oct 12, 2016 at 11:09:35AM -0700, John Daley wrote:
> > Remove an RTE_ASSERT which will not compile if enabled and is not needed.
> > 
> > Fixes: a1f7c7b3b5b2 ("net/enic: extend fdir support for 1300 series 
> > adapters")
> > 
> > Signed-off-by: John Daley 
> > ---
> > Would have been nice if I caught this yesterday before you applied a1f7c7b3 
> > :(
> > 
> Since commit a1f7c7b3 has not been pulled into the dpdk.org mainline, I can 
> still
> merge this commit into the origin one that has the bug. Unless you object, I 
> shall do so now.
> 
> /Bruce
Done. Fix merged into original commit

/Bruce


[dpdk-dev] [PATCH 3/3] net/mlx5: implementation of Rx packet timestamping support

2016-10-13 Thread Oleg Kuporosov
Accurate RX timestaping is minimal requirement for several important
financial services industry requirement for several applications like
packet capture.

Support of periodic time synchronization with system time and adjusting
clock deviation was also added. We assume the system is synchronized by
PTP itself, so there is no links to PTP client implementation in DPDK.
Time synchronization is run by control thread by rte_alarm for each 5
seconds.  New synchronization object is copied to each configured RXQ
for TS calculation.

RX timestamp is calculated in nanoseconds and stored in rte_mbuf.
RX timestamp doesn't require additional API as simply can be
read from timestamp field of rte_mbuf.

TX timestamps were not added due to several reasons - there is no API yet,
issue with burst mode as TS will be provided only for the latest packet in
the burst and rte_mbuf will be discarded immediately after completion.

Enabling/disabling time synchronization DPDK API was added.

Signed-off-by: Oleg Kuporosov 
---
 drivers/net/mlx5/mlx5.c  |   7 +-
 drivers/net/mlx5/mlx5.h  |  10 +-
 drivers/net/mlx5/mlx5_defs.h |   4 +
 drivers/net/mlx5/mlx5_ethdev.c   | 222 ++-
 drivers/net/mlx5/mlx5_rxq.c  |   2 +
 drivers/net/mlx5/mlx5_rxtx.c |  19 ++-
 drivers/net/mlx5/mlx5_rxtx.h |   7 +-
 drivers/net/mlx5/mlx5_time.h |  53 
 drivers/net/mlx5/mlx5_trigger.c  |   1 +
 lib/librte_eal/common/include/rte_time.h |  45 +++
 10 files changed, 360 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_time.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 83bdf65..64dab28 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -122,6 +122,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
  (void *)dev,
  ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
/* In case mlx5_dev_stop() has not been called. */
+   mlx5_timesync_disable(dev);
priv_dev_interrupt_handler_uninstall(priv, dev);
priv_special_flow_disable_all(priv);
priv_mac_addrs_disable(priv);
@@ -219,6 +220,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.rss_hash_update = mlx5_rss_hash_update,
.rss_hash_conf_get = mlx5_rss_hash_conf_get,
.filter_ctrl = mlx5_dev_filter_ctrl,
+   .timesync_enable = mlx5_timesync_enable,
+   .timesync_disable = mlx5_timesync_disable,
 };

 static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d4fb5ff..49447c4 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2015 6WIND S.A.
- *   Copyright 2015 Mellanox.
+ *   Copyright 2015-2016 6WIND S.A.
+ *   Copyright 2015-2016 Mellanox.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -54,6 +54,7 @@
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-pedantic"
 #endif
+#include 
 #include 
 #include 
 #include 
@@ -64,6 +65,7 @@
 #endif

 #include "mlx5_utils.h"
+#include "mlx5_time.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -113,6 +115,7 @@ struct priv {
unsigned int mps:1; /* Whether multi-packet send is supported. */
unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pending. */
+   unsigned int timesync_en:1; /* Timesync (timestamping) enabled */
unsigned int txq_inline; /* Maximum packet size for inlining. */
unsigned int txqs_inline; /* Queue number threshold for inlining. */
/* RX/TX queues. */
@@ -120,6 +123,7 @@ struct priv {
unsigned int txqs_n; /* TX queues array size. */
struct rxq *(*rxqs)[]; /* RX queues. */
struct txq *(*txqs)[]; /* TX queues. */
+   struct mlx5_timesync timesync; /* time synronization object */
/* Indirection tables referencing all RX WQs. */
struct ibv_exp_rwq_ind_table *(*ind_tables)[];
unsigned int ind_tables_n; /* Number of indirection tables. */
@@ -203,6 +207,8 @@ int mlx5_set_link_up(struct rte_eth_dev *dev);
 struct priv *mlx5_secondary_data_setup(struct priv *priv);
 void priv_select_tx_function(struct priv *);
 void priv_select_rx_function(struct priv *);
+int mlx5_timesync_enable(struct rte_eth_dev *dev);
+int mlx5_timesync_disable(struct rte_eth_dev *dev);

 /* mlx5_mac.c */

diff --git a/drivers/net/mlx5/mlx5_defs.h 

[dpdk-dev] [PATCH 2/3] app/testpmd: enabled control for packet timestamps

2016-10-13 Thread Oleg Kuporosov
Implemented two methods of control

- by --enable-timestamps CL testpmd application we can enable timestamping
  for all ports;
- in interactive mode port config  timestamps on|off is able to
  configure timestamping per specific port.

The control doesn't interact with IEEE1588 PTP implementation there as
it is under macro compilation but can be extended in the future.

This feature is required for debugging/testing purposes for real time HW
packet timestamping.

Signed-off-by: Oleg Kuporosov 
---
 app/test-pmd/cmdline.c  | 122 
 app/test-pmd/parameters.c   |   4 +
 app/test-pmd/testpmd.c  |   5 ++
 app/test-pmd/testpmd.h  |   1 +
 doc/guides/testpmd_app_ug/run_app.rst   |   4 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 ++
 6 files changed, 143 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 17d238f..9b202ce 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -561,6 +561,9 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set 
crc-strip/scatter/rx-checksum/hardware-vlan/drop_en"
" for ports.\n\n"

+   "port config (port_id|all) timestamps (on|off)\n"
+   "Enable/disable packet timestamps.\n\n"
+
"port config all rss 
(all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none)\n"
"Set the RSS mode.\n\n"

@@ -10188,6 +10191,123 @@ cmdline_parse_inst_t 
cmd_config_l2_tunnel_en_dis_specific = {
},
 };

+/* Configure port timestamping */
+struct cmd_config_timestamps_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t config;
+   cmdline_fixed_string_t all;
+   uint8_t id;
+   cmdline_fixed_string_t timestamps;
+   cmdline_fixed_string_t mode;
+};
+
+cmdline_parse_token_string_t cmd_config_timestamps_port =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+port, "port");
+cmdline_parse_token_string_t cmd_config_timestamps_config =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+config, "config");
+cmdline_parse_token_string_t cmd_config_timestamps_all_str =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+all, "all");
+cmdline_parse_token_num_t cmd_config_timestamps_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_timestamps_result,
+id, UINT8);
+cmdline_parse_token_string_t cmd_config_timestamps_ts_str =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+   timestamps, "timestamps");
+cmdline_parse_token_string_t cmd_config_timestamps_path =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_timestamps_result,
+   mode, "on#off");
+
+/* enable/disable timestamps (on/off) for a port */
+static void
+cmd_config_timestamps_all_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_config_timestamps_result *res = parsed_result;
+   portid_t pid;
+   uint8_t mode = 0;
+
+   if (!strcmp("on", res->mode))
+   mode = 1;
+   else if (!strcmp("off", res->mode))
+   mode = 0;
+   else {
+   printf("Unknown timestamps mode parameter\n");
+   return;
+   }
+   FOREACH_PORT(pid, ports) {
+   if (mode)
+   rte_eth_timesync_enable(pid);
+   else
+   rte_eth_timesync_disable(pid);
+   }
+}
+
+cmdline_parse_inst_t cmd_config_timestamps_all = {
+   .f = cmd_config_timestamps_all_parsed,
+   .data = NULL,
+   .help_str = "port config all timestamps on|off",
+   .tokens = {
+   (void *)_config_timestamps_port,
+   (void *)_config_timestamps_config,
+   (void *)_config_timestamps_all_str,
+   (void *)_config_timestamps_ts_str,
+   (void *)_config_timestamps_path,
+   NULL,
+   },
+};
+
+/* enable/disable timestamps (rx/tx/both) for a port */
+static void
+cmd_config_timestamps_specific_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_config_timestamps_result *res =
+   parsed_result;
+   uint8_t mode = 0;
+
+   if (port_id_is_invalid(res->id, ENABLED_WARN))
+   return;
+   if (!strcmp("on", res->mode))
+   mode = 1;
+   else if (!strcmp("off", res->mode))
+   mode = 0;
+   else {
+   printf("Unknown timestamps mode parameter\n");
+   return;
+   }
+   if (mode)

[dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet

2016-10-13 Thread Oleg Kuporosov
The hard requirement of financial services industry is accurate
timestamping aligned with the packet itself. This patch is to satisfy
this requirement:

- include uint64_t timestamp field into rte_mbuf with minimal impact to
  throughput/latency. Keep it just simple uint64_t in ns (more than 580
  years) would be enough for immediate needs while using full
  struct timespec with twice bigger size would have much stronger
  performance impact as missed cacheline0.

- it is possible as there is 6-bytes gap in 1st cacheline (fast path)
  and moving uint16_t vlan_tci_outer field to 2nd cacheline.

- such move will only impact for pretty rare usable VLAN RX stripping
  mode for outer TCI (it used only for one NIC i40e from the whole set and
  allows to keep minimal performance impact for RX/TX timestamps.

Signed-off-by: Oleg Kuporosov 
---
 lib/librte_mbuf/rte_mbuf.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 23b7bf8..1e1f2ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -851,8 +851,7 @@ struct rte_mbuf {

uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */

-   /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
-   uint16_t vlan_tci_outer;
+   uint64_t timestamp;   /**< Packet's timestamp, usually in ns */

/* second cache line - fields only used in slow path or on TX */
MARKER cacheline1 __rte_cache_min_aligned;
@@ -885,6 +884,9 @@ struct rte_mbuf {
};
};

+   /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+   uint16_t vlan_tci_outer;
+
/** Size of the application private data. In case of an indirect
 * mbuf, it stores the direct mbuf private data size. */
uint16_t priv_size;
-- 
1.8.3.1



[dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support

2016-10-13 Thread Oleg Kuporosov

Hello DPDK Developers,

Financial Services Industry which is pretty eager for several DPDK
features especially low latency while high throughput. The major issue
so far for increasing DPDK adoption there is requirement for several
applications (capturers, some trading algorithms) to support of accurate
timestamping. The requirement mostly came from regulatory and customers
need strictly follow it.

Current state of timestamping support in DPDK looks pretty good:
 - there is API to enable/disable timestamping acquisition by 
   rte_eth_timesync_enable/disable
 - get timestamps itself by timesync_read_rx/tx_timestamp
 - and even implementation of NTP IEEE 1588 for time synchronization
   by rte_eth_timesync_adjust_read/write_time APIs.

But it misses the most important feature there ? embedding timestamp
in rte_mbuf aligning it with packet.

We would like to change this to increase DPDK adoption for several new
DPDK-based applications for FSI segment. It also might be very
applicable for several RT media and debugging purposes of network
flows/streams in other segments like HPC.

There are several thoughts how to improve there:
 - include uint64_t timestamp field into rte_mbuf with minimal impact
   to throughput/latency. Keep it just simple uint64_t in ns (more than
   580 years) would be enough for immediate needs while using full
   struct timespec with twice bigger size would have much stronger
   performance impact as missed cacheline0. It is possible as there is
   6-bytes gap in 1st cacheline (fast path) and moving uint16_t
   vlan_tci_outer field to 2nd cacheline. 
 - such move will only impact for pretty rare usable VLAN RX stripping
   mode for outer TCI (it used only for one NIC i40e from the whole
   set and keep minimal performance impact for timestamps.  
 - PMD can fill the field in their callback completion routines
   depending on enabling this feature in runtime.

We evaluated other possible options but looks it will have even worse
performance impact.


Oleg Kuporosov (3):
  mbuf: embedding timestamp into the packet
  app/testpmd: enabled control for packet timestamps
  net/mlx5: implementation of Rx packet timestamping support

 app/test-pmd/cmdline.c  | 122 +++
 app/test-pmd/parameters.c   |   4 +
 app/test-pmd/testpmd.c  |   5 +
 app/test-pmd/testpmd.h  |   1 +
 doc/guides/testpmd_app_ug/run_app.rst   |   4 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 +
 drivers/net/mlx5/mlx5.c |   7 +-
 drivers/net/mlx5/mlx5.h |  10 +-
 drivers/net/mlx5/mlx5_defs.h|   4 +
 drivers/net/mlx5/mlx5_ethdev.c  | 222 +++-
 drivers/net/mlx5/mlx5_rxq.c |   2 +
 drivers/net/mlx5/mlx5_rxtx.c|  19 ++-
 drivers/net/mlx5/mlx5_rxtx.h|   7 +-
 drivers/net/mlx5/mlx5_time.h|  53 +++
 drivers/net/mlx5/mlx5_trigger.c |   1 +
 lib/librte_eal/common/include/rte_time.h|  45 ++
 lib/librte_mbuf/rte_mbuf.h  |   6 +-
 17 files changed, 507 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_time.h

Thanks,
Oleg.
-- 
1.8.3.1



[dpdk-dev] [PATCH] net/enic: remove assert which causes compile error

2016-10-13 Thread Bruce Richardson
On Wed, Oct 12, 2016 at 11:09:35AM -0700, John Daley wrote:
> Remove an RTE_ASSERT which will not compile if enabled and is not needed.
> 
> Fixes: a1f7c7b3b5b2 ("net/enic: extend fdir support for 1300 series adapters")
> 
> Signed-off-by: John Daley 
> ---
> Would have been nice if I caught this yesterday before you applied a1f7c7b3 :(
> 
Since commit a1f7c7b3 has not been pulled into the dpdk.org mainline, I can 
still
merge this commit into the origin one that has the bug. Unless you object, I 
shall do so now.

/Bruce


[dpdk-dev] [dpdk-stable] [PATCH v6 1/6] vhost: fix windows vm hang

2016-10-13 Thread Yuanhan Liu
On Mon, Sep 19, 2016 at 10:00:12PM -0400, Zhihong Wang wrote:
> This patch fixes a Windows VM compatibility issue in DPDK 16.07 vhost code
> which causes the guest to hang once any packets are enqueued when mrg_rxbuf
> is turned on by setting the right id and len in the used ring.
> 
> As defined in virtio spec 0.95 and 1.0, in each used ring element, id means
> index of start of used descriptor chain, and len means total length of the
> descriptor chain which was written to. While in 16.07 code, index of the
> last descriptor is assigned to id, and the length of the last descriptor is
> assigned to len.
> 
> How to test?
> 
>  1. Start testpmd in the host with a vhost port.
> 
>  2. Start a Windows VM image with qemu and connect to the vhost port.
> 
>  3. Start io forwarding with tx_first in host testpmd.
> 
> For 16.07 code, the Windows VM will hang once any packets are enqueued.
> 
> Cc: 
> Signed-off-by: Zhihong Wang 

Applied to dpdk-next-virtio (this patch only).

Thanks.

--yliu


[dpdk-dev] [PATCH 2/2] net/i40e: fix VF bonded device link down

2016-10-13 Thread Qiming Yang
Originally, using DPDK as host driver, when VF bonded device
uses I40E_VIRTCHNL_OP_GET_LINK_STAT to query PF the link status,
VF will wait for the interrupt from PF to get this link status.
PF uses interrupt which already exists to communication with VF.
These two kinds of interrupt will be confusing and fail to converge.

This patch uses PF to notify link status instead of VF query.

Fixes: 5c9222058df7 ("i40e: move to drivers/net/")

Signed-off-by: Qiming Yang 
---
 drivers/net/i40e/i40e_ethdev.c| 22 ++-
 drivers/net/i40e/i40e_ethdev.h|  4 +-
 drivers/net/i40e/i40e_ethdev_vf.c | 81 +--
 drivers/net/i40e/i40e_pf.c| 33 
 drivers/net/i40e/i40e_pf.h|  3 +-
 5 files changed, 67 insertions(+), 76 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 9c1f542..13060db 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5418,6 +5418,24 @@ i40e_dev_handle_vfr_event(struct rte_eth_dev *dev)
 }

 static void
+i40e_notify_all_vfs_link_status(struct rte_eth_dev *dev)
+{
+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+   struct i40e_virtchnl_pf_event event;
+   int i;
+
+   event.event = I40E_VIRTCHNL_EVENT_LINK_CHANGE;
+   event.event_data.link_event.link_status =
+   dev->data->dev_link.link_status;
+   event.event_data.link_event.link_speed =
+   dev->data->dev_link.link_speed;
+
+   for (i = 0; i < pf->vf_num; i++)
+   i40e_pf_host_send_msg_to_vf(>vfs[i], I40E_VIRTCHNL_OP_EVENT,
+   I40E_SUCCESS, (uint8_t *), sizeof(event));
+}
+
+static void
 i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
 {
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -5455,9 +5473,11 @@ i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
break;
case i40e_aqc_opc_get_link_status:
ret = i40e_dev_link_update(dev, 0);
-   if (!ret)
+   if (!ret) {
+   i40e_notify_all_vfs_link_status(dev);
_rte_eth_dev_callback_process(dev,
RTE_ETH_EVENT_INTR_LSC);
+   }
break;
default:
PMD_DRV_LOG(ERR, "Request %u is not supported yet",
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 92c8fad..61dfa93 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -599,7 +599,9 @@ int i40e_hash_filter_inset_select(struct i40e_hw *hw,
 struct rte_eth_input_set_conf *conf);
 int i40e_fdir_filter_inset_select(struct i40e_pf *pf,
 struct rte_eth_input_set_conf *conf);
-
+int i40e_pf_host_send_msg_to_vf(struct i40e_pf_vf *vf, uint32_t opcode,
+   uint32_t retval, uint8_t *msg,
+   uint16_t msglen);
 void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
struct rte_eth_rxq_info *qinfo);
 void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index a616ae0..ba63a7f 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -126,8 +126,6 @@ static void i40evf_dev_promiscuous_enable(struct 
rte_eth_dev *dev);
 static void i40evf_dev_promiscuous_disable(struct rte_eth_dev *dev);
 static void i40evf_dev_allmulticast_enable(struct rte_eth_dev *dev);
 static void i40evf_dev_allmulticast_disable(struct rte_eth_dev *dev);
-static int i40evf_get_link_status(struct rte_eth_dev *dev,
- struct rte_eth_link *link);
 static int i40evf_init_vlan(struct rte_eth_dev *dev);
 static int i40evf_dev_rx_queue_start(struct rte_eth_dev *dev,
 uint16_t rx_queue_id);
@@ -1084,31 +1082,6 @@ i40evf_del_vlan(struct rte_eth_dev *dev, uint16_t vlanid)
return err;
 }

-static int
-i40evf_get_link_status(struct rte_eth_dev *dev, struct rte_eth_link *link)
-{
-   struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
-   int err;
-   struct vf_cmd_info args;
-   struct rte_eth_link *new_link;
-
-   args.ops = (enum i40e_virtchnl_ops)I40E_VIRTCHNL_OP_GET_LINK_STAT;
-   args.in_args = NULL;
-   args.in_args_size = 0;
-   args.out_buffer = vf->aq_resp;
-   args.out_size = I40E_AQ_BUF_SZ;
-   err = i40evf_execute_vf_cmd(dev, );
-   if (err) {
-   PMD_DRV_LOG(ERR, "fail to execute command OP_GET_LINK_STAT");
-   return err;
-   }
-
-   new_link = (struct rte_eth_link *)args.out_buffer;
-   (void)rte_memcpy(link, new_link, sizeof(*link));
-
-   

[dpdk-dev] [PATCH 1/2] net/i40e: fix link status change interrupt

2016-10-13 Thread Qiming Yang
Previously, link status interrupt in i40e is achieved by checking
LINK_STAT_CHANGE_MASK in PFINT_ICR0 register which is provided only
for diagnostic use. Instead, drivers need to get the link status
change notification by using LSE (Link Status Event).

This patch enables LSE and calls LSC callback when the event is
received. This patch also removes the processing on
LINK_STAT_CHANGE_MASK.

Fixes: 5c9222058df7 ("i40e: move to drivers/net/")

Signed-off-by: Qiming Yang 
---
 drivers/net/i40e/i40e_ethdev.c | 97 +-
 1 file changed, 19 insertions(+), 78 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index d0aeb70..9c1f542 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -108,7 +108,6 @@
I40E_PFINT_ICR0_ENA_GRST_MASK | \
I40E_PFINT_ICR0_ENA_PCI_EXCEPTION_MASK | \
I40E_PFINT_ICR0_ENA_STORM_DETECT_MASK | \
-   I40E_PFINT_ICR0_ENA_LINK_STAT_CHANGE_MASK | \
I40E_PFINT_ICR0_ENA_HMC_ERR_MASK | \
I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK | \
I40E_PFINT_ICR0_ENA_VFLR_MASK | \
@@ -1768,6 +1767,16 @@ i40e_dev_start(struct rte_eth_dev *dev)
if (dev->data->dev_conf.intr_conf.lsc != 0)
PMD_INIT_LOG(INFO, "lsc won't enable because of"
 " no intr multiplex\n");
+   } else if (dev->data->dev_conf.intr_conf.lsc != 0) {
+   ret = i40e_aq_set_phy_int_mask(hw,
+  ~(I40E_AQ_EVENT_LINK_UPDOWN |
+  I40E_AQ_EVENT_MODULE_QUAL_FAIL |
+  I40E_AQ_EVENT_MEDIA_NA), NULL);
+   if (ret != I40E_SUCCESS)
+   PMD_DRV_LOG(WARNING, "Fail to set phy mask");
+
+   /* Call get_link_info aq commond to enable LSE */
+   i40e_dev_link_update(dev, 0);
}

/* enable uio intr after callback register */
@@ -1984,6 +1993,7 @@ i40e_dev_link_update(struct rte_eth_dev *dev,
struct rte_eth_link link, old;
int status;
unsigned rep_cnt = MAX_REPEAT_TIME;
+   bool enable_lse = dev->data->dev_conf.intr_conf.lsc ? true : false;

memset(, 0, sizeof(link));
memset(, 0, sizeof(old));
@@ -1992,7 +2002,8 @@ i40e_dev_link_update(struct rte_eth_dev *dev,

do {
/* Get link status information from hardware */
-   status = i40e_aq_get_link_info(hw, false, _status, NULL);
+   status = i40e_aq_get_link_info(hw, enable_lse,
+   _status, NULL);
if (status != I40E_SUCCESS) {
link.link_speed = ETH_SPEED_NUM_100M;
link.link_duplex = ETH_LINK_FULL_DUPLEX;
@@ -5442,6 +5453,12 @@ i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
info.msg_buf,
info.msg_len);
break;
+   case i40e_aqc_opc_get_link_status:
+   ret = i40e_dev_link_update(dev, 0);
+   if (!ret)
+   _rte_eth_dev_callback_process(dev,
+   RTE_ETH_EVENT_INTR_LSC);
+   break;
default:
PMD_DRV_LOG(ERR, "Request %u is not supported yet",
opcode);
@@ -5451,57 +5468,6 @@ i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
rte_free(info.msg_buf);
 }

-/*
- * Interrupt handler is registered as the alarm callback for handling LSC
- * interrupt in a definite of time, in order to wait the NIC into a stable
- * state. Currently it waits 1 sec in i40e for the link up interrupt, and
- * no need for link down interrupt.
- */
-static void
-i40e_dev_interrupt_delayed_handler(void *param)
-{
-   struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
-   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   uint32_t icr0;
-
-   /* read interrupt causes again */
-   icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);
-
-#ifdef RTE_LIBRTE_I40E_DEBUG_DRIVER
-   if (icr0 & I40E_PFINT_ICR0_ECC_ERR_MASK)
-   PMD_DRV_LOG(ERR, "ICR0: unrecoverable ECC error\n");
-   if (icr0 & I40E_PFINT_ICR0_MAL_DETECT_MASK)
-   PMD_DRV_LOG(ERR, "ICR0: malicious programming detected\n");
-   if (icr0 & I40E_PFINT_ICR0_GRST_MASK)
-   PMD_DRV_LOG(INFO, "ICR0: global reset requested\n");
-   if (icr0 & I40E_PFINT_ICR0_PCI_EXCEPTION_MASK)
-   PMD_DRV_LOG(INFO, "ICR0: PCI exception\n activated\n");
-   if (icr0 & I40E_PFINT_ICR0_STORM_DETECT_MASK)
-   PMD_DRV_LOG(INFO, "ICR0: a change in the storm control "
- 

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-13 Thread Yuanhan Liu
On Wed, Oct 12, 2016 at 12:22:08PM +, Wang, Zhihong wrote:
> > > >> >  3. How many percentage of drop are you seeing?
> > > The testing result:
> > > size (bytes) improvement (%)
> > > 64   3.92
> > > 128 11.51
> > > 256  24.16
> > > 512  -13.79
> > > 1024-22.51
> > > 1500-12.22
> > > A correction is that performance is dropping if byte size is larger than 
> > > 512.
> > 
> > I have thought of this twice. Unfortunately, I think I may need NACK this
> > series.
> > 
> > Merging two code path into one is really good: as you stated, it improves
> > the maintainability. But only if we see no performance regression on both
> > path after the refactor. Unfortunately, that's not the case here: it hurts
> > the performance for one code path (non-mrg Rx).
> > 
> > That makes me think we may should not do the code path merge at all. I think
> > that also aligns with what you have said before (internally): we could do 
> > the
> > merge if it gives comparable performance before and after that.
> > 
> > Besides that, I don't quite like the way you did in patch 2 (rewrite 
> > enqueue):
> > you made a lot of changes in one patch. That means if something wrong
> > happened,
> > it is hard to narrow down which change introduces that regression. Badly,
> > that's exactly what we met here. Weeks have been passed, I see no progress.
> > 
> > That's the reason we like the idea of "one patch only does one thing, an
> > atomic thing".
> 
> 
> Yuanhan, folks,
> 
> Thanks for the analysis. I disagree here though.
> 
> I analyze, develop, benchmark on x86 platforms, where this patch
> works great.

Yes, that's great effort! With your hardwork, we know what the bottleneck
is and how it could be improved.

However, you don't have to do code refactor (merge two code path to one)
to apply those improvements. From what I know, in this patchset, there
are two factors could improve the performance:

- copy hdr together with packet data

- shadow used ring update and update at once

The overall performance boost I got with your v6 patchset with mrg-Rx
code path is about 27% (in PVP case). And I have just applied the 1st
optimization, it yields about 20% boosts. The left could be covered if
we apply the 2nd optimization (I guess).

That would be a clean way to optimize vhost mergeable Rx path:

- you don't touch non-mrg Rx path (well, you may could apply the
  shadow_used_ring trick to it as wel)

  This would at least make sure we will have no such performance
  regression issue reported by ARM guys.

- you don't refactor the code

  The rewrite from scratch could introduce other issues, besides the
  performance regression. We may just don't know it yet.


Make sense to you? If you agree, I think we could still make it in
this release: they would be some small changes after all. For example,
below is the patch applies the 1st optimization tip on top of
dpdk-next-virtio

--yliu

---
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8a151af..0ddb5af 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -379,7 +379,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
uint16_t end_idx, struct rte_mbuf *m,
struct buf_vector *buf_vec)
 {
-   struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
+   struct virtio_net_hdr_mrg_rxbuf *virtio_hdr;
uint32_t vec_idx = 0;
uint16_t start_idx = vq->last_used_idx;
uint16_t cur_idx = start_idx;
@@ -388,6 +388,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
uint32_t desc_offset, desc_avail;
uint32_t cpy_len;
uint16_t desc_idx, used_idx;
+   uint16_t num_buffers = end_idx - start_idx;
+   int hdr_copied = 0;

if (unlikely(m == NULL))
return 0;
@@ -399,16 +401,11 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, 
struct vhost_virtqueue *vq,
if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
return 0;

-   rte_prefetch0((void *)(uintptr_t)desc_addr);
+   virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr;
+   rte_prefetch0(virtio_hdr);

-   virtio_hdr.num_buffers = end_idx - start_idx;
LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n",
-   dev->vid, virtio_hdr.num_buffers);
-
-   virtio_enqueue_offload(m, _hdr.hdr);
-   copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
-   vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
-   PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
+   dev->vid, num_buffers);

desc_avail  = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
desc_offset = 

[dpdk-dev] [PATCH] app/test: add crypto continual tests

2016-10-13 Thread Arek Kusztal
This commit adds continual performace tests to Intel(R) QuickAssist
Technology tests suite. Performance tests are run continually with
some number of repeating loops.

Signed-off-by: Arek Kusztal 
---
 app/test/test_cryptodev_perf.c | 133 -
 1 file changed, 119 insertions(+), 14 deletions(-)

diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
index 43a7166..dd741fa 100644
--- a/app/test/test_cryptodev_perf.c
+++ b/app/test/test_cryptodev_perf.c
@@ -3876,9 +3876,9 @@ perf_AES_GCM(uint8_t dev_id, uint16_t queue_id,
 }

 static int
-test_perf_AES_GCM(void)
+test_perf_AES_GCM(int continual_buf_len, int continual_size)
 {
-   uint16_t i, j;
+   uint16_t i, j, k, loops = 1;

uint16_t buf_lengths[] = { 64, 128, 256, 512, 1024, 1536, 2048 };

@@ -3886,6 +3886,9 @@ test_perf_AES_GCM(void)
_GCM_128_12IV_0AAD
};

+   if (continual_buf_len)
+   loops = continual_size;
+
int TEST_CASES_GCM = RTE_DIM(gcm_tests);

const unsigned burst_size = 32;
@@ -3930,21 +3933,42 @@ test_perf_AES_GCM(void)
params_set[i].chain = CIPHER_HASH;
params_set[i].session_attrs = _attrs[i];
params_set[i].symmetric_op = _set[i];
-   params_set[i].total_operations = 100;
+   if (continual_buf_len)
+   params_set[i].total_operations = 0xFF;
+   else
+   params_set[i].total_operations = 100;
+
params_set[i].burst_size = burst_size;

}

+   if (continual_buf_len)
+   printf("\nCipher algo: %s Cipher hash: %s cipher key size: %ub"
+   " burst size: %u", "AES_GCM", "AES_GCM",
+   gcm_test->key.len << 3, burst_size);
+
for (i = 0; i < RTE_DIM(gcm_tests); i++) {

-   printf("\nCipher algo: %s Cipher hash: %s cipher key size: %ub"
+   if (!continual_buf_len) {
+   printf("\nCipher algo: %s Cipher hash: %s cipher key 
size: %ub"
" burst size: %u", "AES_GCM", "AES_GCM",
-   gcm_test->key.len << 3, burst_size
-   );
-   printf("\nBuffer Size(B)\tOPS(M)\tThroughput(Gbps)\t"
-   " Retries\tEmptyPolls");
+   gcm_test->key.len << 3, burst_size);
+   printf("\nBuffer Size(B)\tOPS(M)\tThroughput(Gbps)\t"
+   " Retries\tEmptyPolls");
+   }

-   for (j = 0; j < RTE_DIM(buf_lengths); ++j) {
+   uint16_t len = RTE_DIM(buf_lengths);
+   uint16_t p = 0;
+
+   if (continual_buf_len) {
+   for (k = 0; k < RTE_DIM(buf_lengths); k++)
+   if (buf_lengths[k] == continual_buf_len) {
+   len = k + 1;
+   p = k;
+   break;
+   }
+   }
+   for (j = p; j < len; ++j) {

params_set[i].symmetric_op->c_len = buf_lengths[j];
params_set[i].symmetric_op->p_len = buf_lengths[j];
@@ -3959,9 +3983,16 @@ test_perf_AES_GCM(void)
_set[i], 1))
return TEST_FAILED;

-   if (perf_AES_GCM(testsuite_params.dev_id, 0,
-   _set[i], 0))
-   return TEST_FAILED;
+   for (k = 0; k < loops; k++) {
+   if (continual_buf_len)
+   printf("\n\nBuffer 
Size(B)\tOPS(M)\tThroughput(Gbps)\t"
+   " Retries\tEmptyPolls");
+   if (perf_AES_GCM(testsuite_params.dev_id, 0,
+   _set[i], 0))
+   return TEST_FAILED;
+   if (continual_buf_len)
+   printf("\n\nCompleted loop %i of %i 
...", k+1, loops);
+   }
}

}
@@ -3969,6 +4000,70 @@ test_perf_AES_GCM(void)
return 0;
 }

+static int test_cryptodev_perf_AES_GCM(void)
+{
+   return test_perf_AES_GCM(0, 0);
+}
+/*
+ * This function calls AES GCM performance tests providing
+ * size of packet as an argument. If size of packet is not
+ * in the buf_lengths array, all sizes will be used
+ */
+static int test_continual_perf_AES_GCM(void)
+{
+   return test_perf_AES_GCM(1024, 10);
+}
+
+static int
+test_perf_continual_performance_test(void)
+{
+   unsigned total_operations = 0xFF;
+   unsigned total_loops = 10;
+   unsigned burst_size = 32;
+  

[dpdk-dev] [PATCH] mempool: fix search of maximum contiguous pages

2016-10-13 Thread Ananyev, Konstantin

> 
> > > > diff --git a/lib/librte_mempool/rte_mempool.c
> > > > b/lib/librte_mempool/rte_mempool.c
> > > > index 71017e1..e3e254a 100644
> > > > --- a/lib/librte_mempool/rte_mempool.c
> > > > +++ b/lib/librte_mempool/rte_mempool.c
> > > > @@ -426,9 +426,12 @@ rte_mempool_populate_phys_tab(struct
> > > > rte_mempool *mp, char *vaddr,
> > > >
> > > > for (i = 0; i < pg_num && mp->populated_size < mp->size; i += 
> > > > n) {
> > > >
> > > > +   phys_addr_t paddr_next;
> > > > +   paddr_next = paddr[i] + pg_sz;
> > > > +
> > > > /* populate with the largest group of contiguous pages 
> > > > */
> > > > for (n = 1; (i + n) < pg_num &&
> > > > -paddr[i] + pg_sz == paddr[i+n]; n++)
> > > > +paddr_next == paddr[i+n]; n++, paddr_next 
> > > > += pg_sz)
> > > > ;
> > >
> > > Good catch.
> > > Why not just paddr[i + n - 1] != paddr[i + n]?
> >
> > Sorry, I meant 'paddr[i + n - 1] + pg_sz == paddr[i+n]' off course.
> >
> > > Then you don't need extra variable (paddr_next) here.
> > > Konstantin
> 
> Thank you, Konstantin
> 'paddr[i + n - 1] + pg_sz = paddr[i + n]' also can fix it and have straight 
> meaning.
> But I assume that my revision with paddr_next += pg_sz may have a bit better 
> performance.

I don't think there would be any real difference, again it is not performance 
critical code-path.

> By the way, paddr[i] + n * pg_sz = paddr[i + n] can also resolve it.

Yes, that's one seems even better for me - make things more clear.
Konstantin

> 
> /Wei
> 
> > >
> > > >
> > > > ret = rte_mempool_populate_phys(mp, vaddr + i * pg_sz,
> > > > --
> > > > 2.7.4



[dpdk-dev] [PATCH v1 0/4] Generalize PCI specific EAL function/structures

2016-10-13 Thread Shreyansh Jain
Hi David,

On Friday 30 September 2016 09:01 PM, David Marchand wrote:
> On Tue, Sep 27, 2016 at 4:12 PM, Shreyansh Jain  
> wrote:
>> (I rebased these over HEAD 7b3c4f3)
>>
>> These patches were initially part of Jan's original series on SoC
>> Framework ([1],[2]). An update to that series, without these patches,
>> was posted here [3].
>>
>> Main motivation for these is aim of introducing a non-PCI centric
>> subsystem in EAL. As of now the first usecase is SoC, but not limited to
>> it.
>>
>> 4 patches in this series are independent of each other, as well as SoC
>> framework. All these focus on generalizing some structure or functions
>> present with the PCI specific code to EAL Common area (or splitting a
>> function to be more userful).
>
> Those patches move linux specifics (binding pci devices using sysfs)
> to common infrastucture.
> We have no proper hotplug support on bsd, but if we had some common
> code we should at least try to make the apis generic.
>

rte_eal_unbind_kernel_driver() defined in the patch is essentially a 
wrapper which can be implemented for Linux as well as BSD. Just that in 
this patch a ENOTSUP implementation of this was not given for BSD.

I can think of two options:

1. Implement a ENOTSUPP code for BSD and allow 
rte_eal_unbind_kernel_driver() as part of common/eal_private header.
implement rte_eal_unbind_kernel_driver in BSD as not supported.

2. That we keep this contained within Linux area:
   linux_unbind_kernel_driver <-- pci_unbind_kernel_driver
^
 `-- xxx_unbind_kernel_driver

   bsd_unbind_kernel_driver <-- pci_unbind_kernel_driver
 `-> this returns ENOTSUPP


@Jan: any additions/suggestions?

-
Shreyansh


[dpdk-dev] [PATCH] net/i40e: Fix a typo in a comment (Fortville instead of IXGBE).

2016-10-13 Thread Ferruh Yigit
On 9/25/2016 11:03 AM, Wu, Jingjing wrote:
> 
> 
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rami Rosen
>> Sent: Monday, August 1, 2016 12:52 PM
>> To: dev at dpdk.org
>> Cc: Rosen, Rami 
>> Subject: [dpdk-dev] [PATCH] net/i40e: Fix a typo in a comment (Fortville 
>> instead of IXGBE).
>>
>> This patch fixes a typo in a comment in i40e_ethdev.c: use Fortville instead 
>> of
>> IXGBE.
>>
>> Signed-off-by: Rami Rosen 
> 
> Thanks!
> 
> Acked-by: Jingjing Wu 
> 

This patch is no more valid since following commit removes the comment
completely:

c830cb295411 ("drivers: use PCI registration macro")

dropping the patch.

Thanks,
ferruh


[dpdk-dev] [PATCH v2 2/3] app/testpmd: fix wrong flow director mask

2016-10-13 Thread Ferruh Yigit
Hi Wenzhuo,

On 10/12/2016 7:24 AM, Wenzhuo Lu wrote:
> In mac-vlan mode, MAC address mask is not supported by HW.
> The MAC address mask should not be set in mac-vlan mode.
> Remove this parameter from the CLI. Remove MAC address
> from mask print too.
> 
> Fixes: 53b2bb9b7ea7 ("app/testpmd: new flow director commands")
> Signed-off-by: Wenzhuo Lu 
> ---

I think it is good to require an ack from testpmd maintainer for this
patch. cc'ed Pablo.

Thanks,
ferruh



[dpdk-dev] [PATCH 1/5] i40e: extract non-x86 specific code from vector driver

2016-10-13 Thread Zhang, Qi Z
Hi Jianbo

> -Original Message-
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Thursday, October 13, 2016 11:00 AM
> To: Zhang, Qi Z 
> Cc: Zhang, Helin ; Wu, Jingjing
> ; jerin.jacob at caviumnetworks.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/5] i40e: extract non-x86 specific code from
> vector driver
> 
> On 12 October 2016 at 10:55, Zhang, Qi Z  wrote:
> > Hi Jianbo
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
> >> Sent: Wednesday, August 24, 2016 5:54 PM
> >> To: Zhang, Helin ; Wu, Jingjing
> >> ; jerin.jacob at caviumnetworks.com;
> dev at dpdk.org
> >> Cc: Jianbo Liu 
> >> Subject: [dpdk-dev] [PATCH 1/5] i40e: extract non-x86 specific code
> >> from vector driver
> >>
> >> move scalar code which does not use x86 intrinsic functions to new
> >> file "i40e_rxtx_vec_common.h", while keeping x86 code in
> i40e_rxtx_vec.c.
> >> This allows the scalar code to to be shared among vector drivers for
> >> different platforms.
> >>
> >> Signed-off-by: Jianbo Liu 
> >> ---
> ...
> >
> > Should we rename the function "_40e_rx_queue_release_mbufs_vec" to
> > "i40e_rx_queue_release_mbufs_vec_default", so functions be wrapped
> can follow a consistent rule?
> 
> I think these two ways are different.
> For func/_func, _func implements what func needs to do, they are same.
> We needs _func inline, to be called in different ARCHs.
> But for func/func_default, func_default is the default behavior, but you can
> use or not-use it in func.

Got your point, I also saw ixgbe is implemented in the same way.
Thanks!
Qi


  1   2   >