[dpdk-dev] API feature check _HAS_

2015-11-29 Thread Gleb Natapov
On Sun, Nov 29, 2015 at 11:07:44AM +0200, Vlad Zolotarov wrote:
> 
> 
> On 11/26/15 22:35, Thomas Monjalon wrote:
> >When introducing LRO, Vlad has defined the macro RTE_ETHDEV_HAS_LRO_SUPPORT:
> >http://dpdk.org/browse/dpdk/commit/lib/librte_ether/rte_ethdev.h?id=8eecb329
> >
> >It allows to use the feature without version check (before the release or
> >after a backport).
> >Do you think it is useful?
> >Should we define other macros RTE_[API]_HAS_[FEATURE] for each new feature
> >or API change?
> 
> The main purpose of the above macro was to identify the presence of the new
> field in the rte_eth_rxmode during the
> period of time when there was no other way to know it. Once this may be
> concluded based on the release version I see no
> reason to keep it.
> 
Concluding things based on the release version does not work so well for
backports.
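
To make the backport point concrete, here is a minimal sketch of a
compile-time feature check. The macro name is the one from the commit
quoted above; struct demo_rxmode and demo_request_lro() are hypothetical
stand-ins so that the snippet builds without the DPDK headers. In a real
application both the macro and the field would come from rte_ethdev.h.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for what rte_ethdev.h would provide (assumption: repeated
 * here only so that the sketch compiles on its own).
 */
#define RTE_ETHDEV_HAS_LRO_SUPPORT

struct demo_rxmode {
    unsigned int enable_lro : 1; /* the field the macro advertises */
};

/* Returns 1 if LRO was requested, 0 if these headers lack the field. */
static int
demo_request_lro(struct demo_rxmode *rxmode)
{
    assert(rxmode != NULL);
#ifdef RTE_ETHDEV_HAS_LRO_SUPPORT
    /* Works on any tree that carries the feature, backported or not. */
    rxmode->enable_lro = 1;
    return 1;
#else
    /* Older headers: the field does not exist, so do not touch it. */
    return 0;
#endif
}
```

A check on the release version alone would take the #else branch on an
older tree that carries the backported feature, which is the case the
macro is meant to cover.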

> >It's time to fix it before releasing the 2.2 version.
> 

--
Gleb.


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-02 Thread Gleb Natapov
On Fri, Oct 02, 2015 at 05:00:14PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
> > validation and translation would add 10s if not 100s of nanoseconds to the
> > time needed to process each packet.  In addition we are talking about doing
> > this in kernel space which means we wouldn't really be able to take
> > advantage of things like SSE or AVX instructions.
> 
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.
> 
Modern NICs have no fewer queues than most machines have cores. There is
no such thing as a free core to offload your processing to; otherwise you
designed your application wrong and are wasting CPU cycles.

> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.
> 
> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.
> 
People should not "get" things which are, let's be polite here, untrue.
The kernel never tried to protect itself from userspace running on
behalf of root. Secure boot, which is quite recent, may be the only
instance where the kernel tries to do so (unfortunately), and it does so by
disabling things if boot is secure. Linux was always a "jack of all
trades" and was suitable to run on a machine with secure boot, in a VM
that acts as an application container, or on an embedded device running
packet forwarding.

The only valid point is that nobody should debug crashes that may be
caused by buggy userspace, and tainting the kernel solves that.

> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.
> 
> -- 
> MST

--
Gleb.


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-01 Thread Gleb Natapov
On Wed, Sep 30, 2015 at 11:36:58PM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 11:00:49PM +0300, Gleb Natapov wrote:
> > > You are increasing interrupt latency by a huge factor by channeling
> > > interrupts through a scheduler.  Let user install an
> > > interrupt handler function, and be done with it.
> > > 
> > Interrupt latency is not always hugely important. If you enter interrupt
> > mode only when idle hundred more us on a first packet will not kill you.
> 
> It certainly affects worst-case latency.  And if you lower interupt
> latency, you can go idle faster, so it affects power too.
> 
We are polling 100% now. Going idle faster is the least of our concerns.

> > If
> > interrupt latency is important then uio may be not the right solution,
> > but then neither is vfio.
> 
> That's what I'm saying, if you don't need memory isolation you can do
> better than just slightly tweak existing drivers.
> 
No, you are forcing everyone to code in the kernel no matter whether it
makes sense or not. You decide for everyone what is good for them. Believe
me, people here know about the trade-offs and have made appropriate
considerations.

--
Gleb.


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-01 Thread Gleb Natapov
On Wed, Sep 30, 2015 at 09:50:08PM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 10:43:04AM -0700, Stephen Hemminger wrote:
> > On Wed, 30 Sep 2015 20:39:43 +0300
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Sep 30, 2015 at 10:28:07AM -0700, Stephen Hemminger wrote:
> > > > On Wed, 30 Sep 2015 13:37:22 +0300
> > > > Vlad Zolotarov  wrote:
> > > > 
> > > > > 
> > > > > 
> > > > > On 09/30/15 00:49, Michael S. Tsirkin wrote:
> > > > > > On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
> > > > > >> On Tue, 29 Sep 2015 23:54:54 +0300
> > > > > >> "Michael S. Tsirkin"  wrote:
> > > > > >>
> > > > > >>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
> > > > >  The security breach motivation u brought in "[RFC PATCH] uio:
> > > > >  uio_pci_generic: Add support for MSI interrupts" thread seems a 
> > > > >  bit weak
> > > > >  since one u let the userland access to the bar it may do any 
> > > > >  funny thing
> > > > >  using the DMA engine of the device. This kind of stuff should be 
> > > > >  prevented
> > > > >  using the iommu and if it's enabled then any funny tricks using 
> > > > >  MSI/MSI-X
> > > > >  configuration will be prevented too.
> > > > > 
> > > > >  I'm about to send the patch to main Linux mailing list. Let's 
> > > > >  continue this
> > > > >  discussion there.
> > > > > 
> > > > > >>> Basically UIO shouldn't be used with devices capable of DMA.
> > > > > >>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).
> > > > > 
> > > > > If there is an IOMMU in the picture there shouldn't be any problem to 
> > > > > use UIO with DMA capable devices.
> > > > > 
> > > > > >>> I don't think this can change.
> > > > > >> Given there is no PV IOMMU and even if there was it would be too 
> > > > > >> slow for DPDK
> > > > > >> use, I can't accept that.
> > > > > > QEMU does allow emulating an iommu.
> > > > > 
> > > > > Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an 
> > > > > option there. And again, it's a general issue not DPDK specific.
> > > > > Today one has to develop some proprietary modules (like igb_uio) to 
> > > > > workaround the issue and this is lame. IMHO uio_pci_generic should
> > > > > be fixed to be able to properly work within any virtualized 
> > > > > environment 
> > > > > and not only with KVM.
> > > > > 
> > > > 
> > > > Also VMware (bigger problem) has no IOMMU emulation.
> > > > Other environments as well (Windriver, GCE) have noe IOMMU.
> > > 
> > > Because the use-case of userspace drivers is not important enough?
> > > Without an IOMMU, there's no way to have secure userspace drivers.
> > 
> > Look at Cloudius, there is no necessity of security in guest.
> 
> It's an interesting concept, isn't it?
> 
It is.

> So why not do what Cloudius does, and run this task code in ring 0 then,
> allocating all memory in the kernel range?
> 
Except this is not what Cloudius does. The idea of OSv is that it can
run your regular userspace application, but removes an unneeded level of
indirection by bypassing userspace/kernelspace communication (among
other things). The application still uses virtual memory, not directly
mapped physical memory like Linux ring 0 has.

You can achieve most of the benefits of kernel bypass on Linux too, but
unlike OSv you need to code for it. UIO is one of those things that
allows that.

> You are increasing interrupt latency by a huge factor by channeling
> interrupts through a scheduler.  Let user install an
> interrupt handler function, and be done with it.
> 
Interrupt latency is not always hugely important. If you enter interrupt
mode only when idle, a hundred more microseconds on the first packet will
not kill you. If interrupt latency is important then uio may not be the
right solution, but then neither is vfio.

--
Gleb.


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Gleb Natapov
On Wed, Sep 30, 2015 at 08:39:43PM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 10:28:07AM -0700, Stephen Hemminger wrote:
> > On Wed, 30 Sep 2015 13:37:22 +0300
> > Vlad Zolotarov  wrote:
> > 
> > > 
> > > 
> > > On 09/30/15 00:49, Michael S. Tsirkin wrote:
> > > > On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
> > > >> On Tue, 29 Sep 2015 23:54:54 +0300
> > > >> "Michael S. Tsirkin"  wrote:
> > > >>
> > > >>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
> > >  The security breach motivation u brought in "[RFC PATCH] uio:
> > >  uio_pci_generic: Add support for MSI interrupts" thread seems a bit 
> > >  weak
> > >  since one u let the userland access to the bar it may do any funny 
> > >  thing
> > >  using the DMA engine of the device. This kind of stuff should be 
> > >  prevented
> > >  using the iommu and if it's enabled then any funny tricks using 
> > >  MSI/MSI-X
> > >  configuration will be prevented too.
> > > 
> > >  I'm about to send the patch to main Linux mailing list. Let's 
> > >  continue this
> > >  discussion there.
> > > 
> > > >>> Basically UIO shouldn't be used with devices capable of DMA.
> > > >>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).
> > > 
> > > If there is an IOMMU in the picture there shouldn't be any problem to 
> > > use UIO with DMA capable devices.
> > > 
> > > >>> I don't think this can change.
> > > >> Given there is no PV IOMMU and even if there was it would be too slow 
> > > >> for DPDK
> > > >> use, I can't accept that.
> > > > QEMU does allow emulating an iommu.
> > > 
> > > Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an 
> > > option there. And again, it's a general issue not DPDK specific.
> > > Today one has to develop some proprietary modules (like igb_uio) to 
> > > workaround the issue and this is lame. IMHO uio_pci_generic should
> > > be fixed to be able to properly work within any virtualized environment 
> > > and not only with KVM.
> > > 
> > 
> > Also VMware (bigger problem) has no IOMMU emulation.
> > Other environments as well (Windriver, GCE) have noe IOMMU.
> 
> Because the use-case of userspace drivers is not important enough?
Because "secure" userspace drivers are not important enough.

> Without an IOMMU, there's no way to have secure userspace drivers.
> 
People use VMs as application containers, not as machines that need
to be secured for a multiuser scenario.

--
Gleb.


[dpdk-dev] i40e and RSS woes

2015-04-28 Thread Gleb Natapov
Hi,

I haven't followed DPDK development too closely lately. Were those
problems fixed already?

On Thu, Mar 05, 2015 at 06:56:14AM +, Zhang, Helin wrote:
> 
> 
> > -Original Message-
> > From: Gleb Natapov [mailto:gleb at cloudius-systems.com]
> > Sent: Thursday, March 5, 2015 2:39 PM
> > To: Zhang, Helin
> > Cc: dev at dpdk.org
> > Subject: Re: i40e and RSS woes
> > 
> > On Thu, Mar 05, 2015 at 05:56:27AM +, Zhang, Helin wrote:
> > > Hi Gleb
> > >
> > > Sorry for late! I am struggling on my tasks for the following DPDK release
> > these days.
> > >
> > > > -Original Message-
> > > > From: Gleb Natapov [mailto:gleb at cloudius-systems.com]
> > > > Sent: Monday, March 2, 2015 8:56 PM
> > > > To: dev at dpdk.org
> > > > Cc: Zhang, Helin
> > > > Subject: Re: i40e and RSS woes
> > > >
> > > > Ping.
> > > >
> > > > On Thu, Feb 19, 2015 at 04:50:10PM +0200, Gleb Natapov wrote:
> > > > > CCing i40e driver author in a hope to get an answer.
> > > > >
> > > > > On Mon, Feb 16, 2015 at 03:36:54PM +0200, Gleb Natapov wrote:
> > > > > > I have an application that works reasonably well with ixgbe
> > > > > > driver, but when I try to use it with i40e I encounter various RSS 
> > > > > > related
> > issues.
> > > > > >
> > > > > > First one is that for some reason i40e, when it builds default
> > > > > > reta table, round down number of queues to power of two. Why is
> > > > > > this? If
> > > It seems because of i40e queue configuration. We will check it more
> > > and see if it can be changed or improved later.
> > >
> > Thanks, as I said below when I configure reta by myself everything work as
> > expected - traffic is received on all queues, so I am curious if in some 
> > scenarios
> > my code can break.
> > 
> > > > > > I configure reta by my own using all of the queues everything
> > > > > > seams to be working. To add insult to injury I do not get any
> > > > > > errors during configuration some queues just do not receive any 
> > > > > > traffic.
> > > > > >
> > > > > > The second problem is that for some reason i40e does not use 40
> > > > > > byte toeplitz hash key like any other driver, but it expects the
> > > > > > key to be 52 bytes. And it would have being fine (if we ignore
> > > > > > the fact that it contradicts MS spec), but how my high level
> > > > > > code suppose to know
> > > > that?
> > > Actually a rss_key_len was introduced in struct rte_eth_rss_conf
> > > recently. So the length should be indicated clearly. But I found the
> > > annotations of that structure should have been reworked. I will try to 
> > > rework
> > it with clear descriptions.
> > >
> > I saw rss_key_len of course, my question is how my code suppose to know
> > what value to set it to? Why required key length is not part of a device
> > capability query (or is it and I missed it)? The only way I found to get 
> > key length
> > is to quire device for a key, and check rss_key_len. If it is zero then key 
> > is 40
> > bytes, otherwise whatever rss_key_len says. This method is more of a hack
> > then proper way to do it.
> I think it was missed. I will add it soon later.
> 
> > 
> > > > > > And again, device configuration does not fail when wrong key
> > > > > > length is provided, it just uses some other key. Guys this kind
> > > > > > of error handling is completely unacceptable.
> > > If less length of key is provided, it will not be used at all, the 
> > > default key will be
> > used.
> > > So there is no issue as you said. But we need to add more clear
> > > description for the structure of rte_eth_rss_conf.
> > >
> > What you've said above is exactly the issue! My code does not work if a key
> > used by HW is not the same as was set by application, but since I get no 
> > error
> > when my setting is ignored the is not way for me to know that my application
> > will not work (short of querying key back and comparing which is again a 
> > hack).
> > Device configuration should fail if it cannot apply my settings.
> After I checked the code, different PMD may have different implementation.
> Returning with an error might be the best way for all PMDs. I will unify it 
> later.
> 
> Really good findings and suggestions from you! Thank you very much!
> 

--
Gleb.


[dpdk-dev] [PATCH] Add toeplitz hash algorithm

2015-04-09 Thread Gleb Natapov
On Wed, Apr 08, 2015 at 03:06:13PM -0400, Vladimir Medvedkin wrote:
> Software implementation of the Toeplitz hash function used by RSS.
> Can be used either for packet distribution on single queue NIC
> or for simulating of RSS computation on specific NIC (for example
> after GRE header decapsulating).
> 
> Signed-off-by: Vladimir Medvedkin 
> ---
>  lib/librte_hash/Makefile|   1 +
>  lib/librte_hash/rte_thash.h | 179 
> 
>  2 files changed, 180 insertions(+)
>  create mode 100644 lib/librte_hash/rte_thash.h
> 
> diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
> index 3696cb1..083a9e5 100644
> --- a/lib/librte_hash/Makefile
> +++ b/lib/librte_hash/Makefile
> @@ -50,6 +50,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
>  
>  # this lib needs eal
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_HASH) += lib/librte_eal lib/librte_malloc
> diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
> new file mode 100644
> index 000..1acfa3a
> --- /dev/null
> +++ b/lib/librte_hash/rte_thash.h
> @@ -0,0 +1,179 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_THASH_H
> +#define _RTE_THASH_H
> +
> +/**
> + * @file
> + *
> + * toeplitz hash functions.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Software implementation of the Toeplitz hash function used by RSS.
> + * Can be used either for packet distribution on single queue NIC
> + * or for simulating of RSS computation on specific NIC (for example
> + * after GRE header decapsulating)
> + */
> +
> +#include 
> +#include 
> +
> +enum rte_thash_flag {
> + RTE_THASH_L3 = 0,   //calculate hash taking into account only l3 header
> + RTE_THASH_L4        //calculate hash taking into account l3 + l4 headers
> +};
> +
> +/**
> + * Prepare special converted key to use with rte_softrss_be()
> + * @param orig
> + *   pointer to original RSS key
> + * @param targ
> + *   pointer to target RSS key
> + */
> +
> +static inline void
> +rte_convert_rss_key(uint32_t *orig, uint32_t *targ)
> +{
> + int i;
> + for (i = 0; i < 10; i++) {
> + targ[i] = rte_be_to_cpu_32(orig[i]);
> + }
> +}
> +
> +/**
> + * Generic implementation. Can be used with original rss_key
> + * All ip's and ports have to be CPU byte order.
> + * @param sip
> + *   Source ip address.
> + * @param dip
> + *   Destination ip address.
This is IPv4 only; what about IPv6? Why not define an RSS function that
works on a byte buffer and let the caller build it from whatever fields it
wants to hash?

> + * @param sp
> + *   Source TCP|UDP port.
> + * @param dp
> + *   Destination TCP|UDP port.
> + * @param flag
> + *   RTE_THASH_L3:   calculate hash tacking into account only sip and dip
> + *   RTE_THASH_L4:   calculate hash tacking into account sip, dip, sp and dp
> + * @param *rss_key
> + *   Pointer to 40-byte RSS hash key.
i40e has 52 byte RSS hash key.
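
As a rough sketch of what a buffer-based interface could look like (this
is not the code from the patch; demo_toeplitz_hash() and its signature are
made up for illustration), the function below hashes an arbitrary byte
string with a key of arbitrary length. The caller concatenates whatever
fields it wants (IPv4 or IPv6 addresses, ports) in network byte order, and
a 40-byte or 52-byte key works the same way as long as the key is at least
4 bytes longer than the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over an arbitrary byte buffer.
 * key_len must be at least data_len + 4 so that the 32-bit key window
 * can slide across every input bit.
 */
static uint32_t
demo_toeplitz_hash(const uint8_t *key, size_t key_len,
                   const uint8_t *data, size_t data_len)
{
    uint32_t window, hash = 0;
    size_t i;
    int bit;

    assert(key_len >= data_len + 4);

    /* The window starts as the first 32 bits of the key. */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < data_len; i++) {
        for (bit = 7; bit >= 0; bit--) {
            /* XOR in the current window for every set input bit. */
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1u;
        }
    }
    return hash;
}
```

For a TCP/IPv4 flow the input would be the 12 bytes source address,
destination address, source port, destination port; for IPv6 the same
function takes 36 bytes, so no per-protocol variant is needed.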

> + * @return
> + *   Calculated hash value.
> + */
> 

[dpdk-dev] i40e and RSS woes

2015-03-05 Thread Gleb Natapov
On Thu, Mar 05, 2015 at 05:56:27AM +, Zhang, Helin wrote:
> Hi Gleb
> 
> Sorry for late! I am struggling on my tasks for the following DPDK release 
> these days.
> 
> > -Original Message-
> > From: Gleb Natapov [mailto:gleb at cloudius-systems.com]
> > Sent: Monday, March 2, 2015 8:56 PM
> > To: dev at dpdk.org
> > Cc: Zhang, Helin
> > Subject: Re: i40e and RSS woes
> > 
> > Ping.
> > 
> > On Thu, Feb 19, 2015 at 04:50:10PM +0200, Gleb Natapov wrote:
> > > CCing i40e driver author in a hope to get an answer.
> > >
> > > On Mon, Feb 16, 2015 at 03:36:54PM +0200, Gleb Natapov wrote:
> > > > I have an application that works reasonably well with ixgbe driver,
> > > > but when I try to use it with i40e I encounter various RSS related 
> > > > issues.
> > > >
> > > > First one is that for some reason i40e, when it builds default reta
> > > > table, round down number of queues to power of two. Why is this? If
> It seems because of i40e queue configuration. We will check it more and see
> if it can be changed or improved later.
> 
Thanks. As I said below, when I configure the reta myself everything works
as expected (traffic is received on all queues), so I am curious whether my
code can break in some scenarios.
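
For reference, this is roughly what "configure the reta myself" amounts
to, next to the effect of rounding the queue count down to a power of two.
It is a standalone sketch with made-up helper names (demo_*), not the
driver code and not the ethdev API; the 128-entry table size in the note
below is also just an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of two that is <= n (n must be non-zero). */
static uint16_t
demo_round_down_pow2(uint16_t n)
{
    uint16_t p = 1;

    assert(n != 0);
    while (p <= n / 2)
        p = (uint16_t)(p * 2);
    return p;
}

/* Spread the redirection table entries round-robin over nb_queues. */
static void
demo_fill_reta(uint16_t *reta, size_t reta_size, uint16_t nb_queues)
{
    size_t i;

    assert(nb_queues != 0);
    for (i = 0; i < reta_size; i++)
        reta[i] = (uint16_t)(i % nb_queues);
}

/* Count how many of the first nb_queues queues appear in the table. */
static size_t
demo_queues_in_use(const uint16_t *reta, size_t reta_size, uint16_t nb_queues)
{
    size_t used = 0, i;
    uint16_t q;

    for (q = 0; q < nb_queues; q++) {
        for (i = 0; i < reta_size; i++) {
            if (reta[i] == q) {
                used++;
                break;
            }
        }
    }
    return used;
}
```

With 12 configured queues and a 128-entry table, filling with
demo_round_down_pow2(12) leaves 4 queues without any entry, while filling
with 12 directly puts every queue in the table.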

> > > > I configure reta by my own using all of the queues everything seams
> > > > to be working. To add insult to injury I do not get any errors
> > > > during configuration some queues just do not receive any traffic.
> > > >
> > > > The second problem is that for some reason i40e does not use 40 byte
> > > > toeplitz hash key like any other driver, but it expects the key to
> > > > be 52 bytes. And it would have being fine (if we ignore the fact
> > > > that it contradicts MS spec), but how my high level code suppose to know
> > that?
> Actually a rss_key_len was introduced in struct rte_eth_rss_conf recently. So 
> the
> length should be indicated clearly. But I found the annotations of that 
> structure
> should have been reworked. I will try to rework it with clear descriptions.
> 
I saw rss_key_len of course; my question is how is my code supposed to know
what value to set it to? Why is the required key length not part of a device
capability query (or is it and I missed it)? The only way I found to get
the key length is to query the device for a key and check rss_key_len. If it
is zero then the key is 40 bytes, otherwise it is whatever rss_key_len says.
This method is more of a hack than a proper way to do it.
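
Spelled out, the hack looks something like the sketch below. The struct is
a stand-in that mirrors only the two rte_eth_rss_conf fields discussed
here, and demo_effective_key_len() is a made-up helper; in the application
the struct would be filled in by querying the device for its current key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEFAULT_RSS_KEY_LEN 40 /* assumed when the driver reports 0 */

/* Stand-in mirroring the rte_eth_rss_conf fields discussed here. */
struct demo_rss_conf {
    uint8_t *rss_key;
    uint8_t rss_key_len;
};

/* Key length the hardware expects, inferred from a queried config. */
static size_t
demo_effective_key_len(const struct demo_rss_conf *conf)
{
    assert(conf != NULL);
    /* Zero means the driver did not fill the field in: assume 40 bytes. */
    if (conf->rss_key_len == 0)
        return DEMO_DEFAULT_RSS_KEY_LEN;
    return conf->rss_key_len;
}
```

The guess in the zero case is exactly why this is a hack: nothing
guarantees that a driver reporting 0 really uses a 40-byte key.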

> > > > And again, device configuration does not fail when wrong key length
> > > > is provided, it just uses some other key. Guys this kind of error
> > > > handling is completely unacceptable.
> If less length of key is provided, it will not be used at all, the default 
> key will be used.
> So there is no issue as you said. But we need to add more clear description 
> for the
> structure of rte_eth_rss_conf.
> 
What you've said above is exactly the issue! My code does not work if
the key used by HW is not the same as the one set by the application, but
since I get no error when my setting is ignored there is no way for me to
know that my application will not work (short of querying the key back and
comparing, which is again a hack). Device configuration should fail if it
cannot apply my settings.
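
To show what the read-back workaround costs the application, here is a
small simulation. Everything in it is hypothetical: struct demo_dev and the
demo_* functions only model a device that, as described above, keeps its
default key and still reports success when the supplied key has the wrong
length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HW_KEY_LEN 52 /* key length of the simulated device */

struct demo_dev {
    uint8_t key[DEMO_HW_KEY_LEN];
};

/* Simulated driver: a key of unexpected length is silently ignored. */
static int
demo_dev_set_key(struct demo_dev *dev, const uint8_t *key, size_t len)
{
    if (len == DEMO_HW_KEY_LEN)
        memcpy(dev->key, key, len);
    return 0; /* success is reported either way */
}

/* Simulated query: copies the active key out and returns its length. */
static size_t
demo_dev_get_key(const struct demo_dev *dev, uint8_t *out, size_t out_size)
{
    assert(out_size >= DEMO_HW_KEY_LEN);
    memcpy(out, dev->key, DEMO_HW_KEY_LEN);
    return DEMO_HW_KEY_LEN;
}

/* Set the key, then read it back and fail if it did not take effect. */
static int
demo_set_key_checked(struct demo_dev *dev, const uint8_t *key, size_t len)
{
    uint8_t readback[64];
    size_t got;

    if (demo_dev_set_key(dev, key, len) != 0)
        return -1;
    got = demo_dev_get_key(dev, readback, sizeof(readback));
    if (got != len || memcmp(readback, key, len) != 0)
        return -1;
    return 0;
}
```

If the configuration call itself returned an error for a key it cannot
apply, the whole of demo_set_key_checked() would be unnecessary.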

--
Gleb.


[dpdk-dev] i40e and RSS woes

2015-03-02 Thread Gleb Natapov
Ping.

On Thu, Feb 19, 2015 at 04:50:10PM +0200, Gleb Natapov wrote:
> CCing i40e driver author in a hope to get an answer.
> 
> On Mon, Feb 16, 2015 at 03:36:54PM +0200, Gleb Natapov wrote:
> > I have an application that works reasonably well with ixgbe driver, but
> > when I try to use it with i40e I encounter various RSS related issues.
> > 
> > First one is that for some reason i40e, when it builds default reta
> > table, round down number of queues to power of two. Why is this? If I
> > configure reta by my own using all of the queues everything seams to be
> > working. To add insult to injury I do not get any errors during
> > configuration some queues just do not receive any traffic.
> > 
> > The second problem is that for some reason i40e does not use 40 byte
> > toeplitz hash key like any other driver, but it expects the key to be 52
> > bytes. And it would have being fine (if we ignore the fact that it
> > contradicts MS spec), but how my high level code suppose to know that?
> > And again, device configuration does not fail when wrong key length is
> > provided, it just uses some other key. Guys this kind of error handling
> > is completely unacceptable.
> > 
> > The last one is more of a question. Why interface to change RSS hash
> > function (XOR or toeplitz) is part of a filter configuration and not rss
> > config?
> > 
> > --
> > Gleb.
> 
> --
>   Gleb.

--
Gleb.


[dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD

2015-03-01 Thread Gleb Natapov
On Fri, Feb 27, 2015 at 07:38:59PM +0100, Adrien Mazarguil wrote:
> On Thu, Feb 26, 2015 at 03:49:07PM +0200, Gleb Natapov wrote:
> > On Thu, Feb 26, 2015 at 02:36:27PM +0100, Thomas Monjalon wrote:
> > > 2015-02-26 13:51, Gleb Natapov:
> > > > Did git pull today. After enabling mlnx pmd compilation fails with:
> > > > 
> > > > dpdk/lib/librte_pmd_mlx4/mlx4.c: In function 'mlx4_pci_devinit':
> > > > dpdk/lib/librte_pmd_mlx4/mlx4.c:4636:14: error: too few arguments to function 'rte_eth_dev_allocate'
> > > > eth_dev = rte_eth_dev_allocate(name);
> > > 
> > > Yes, thanks for reporting.
> > > I didn't test the disabled mlx4 after hotplug integration:
> > >   dpdk.org/browse/dpdk/commit/?id=9f1653e7b7e1746e7c
> > > 
> > > Clearly, I have to improve my sanity checks.
> > > Sorry for the inconvenience.
> > No problem, I fixed that locally, but now I see another issue. I have
> > several PMDs statically compiled in with my application and I expect
> > dpdk to choose correct one depending on available HW, but mlnx pmd does
> > not behave nicely, if its initialization fails it kills entire
> > application:
> > 
> > EAL: PCI device :03:00.0 on NUMA socket 0
> > EAL:   probe driver: 15b3:1003 librte_pmd_mlx4
> > EAL: Error - exiting with code: 1
> >   Cause: Requested device :03:00.0 cannot be used
> 
> Forgot to set in-reply-to, but I just sent a patch to work around that
> issue and make mlx4 nicer:
> 
> http://dpdk.org/dev/patchwork/patch/3796/
> 
Works for me, thanks! It may be better to change
"cannot use device, are drivers up to date?" to "cannot use device, are
drivers and fw up to date?". On one of my machines that was the case.

I see that some features are missing from the PMD though. Some of them
are nice to have for good performance like providing rss hash value
in mbuf (PKT_RX_RSS_HASH), but others are absolutely required for
my application to work: setting (or at least getting) rss key and
redirection table. Are there any plans to support those?

--
Gleb.


[dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD

2015-02-26 Thread Gleb Natapov
On Thu, Feb 26, 2015 at 03:18:34PM +0100, Adrien Mazarguil wrote:
> On Thu, Feb 26, 2015 at 03:49:07PM +0200, Gleb Natapov wrote:
> > On Thu, Feb 26, 2015 at 02:36:27PM +0100, Thomas Monjalon wrote:
> > > 2015-02-26 13:51, Gleb Natapov:
> > > > Did git pull today. After enabling mlnx pmd compilation fails with:
> > > > 
> > > > dpdk/lib/librte_pmd_mlx4/mlx4.c: In function 'mlx4_pci_devinit':
> > > > dpdk/lib/librte_pmd_mlx4/mlx4.c:4636:14: error: too few arguments to function 'rte_eth_dev_allocate'
> > > > eth_dev = rte_eth_dev_allocate(name);
> > > 
> > > Yes, thanks for reporting.
> > > I didn't test the disabled mlx4 after hotplug integration:
> > >   dpdk.org/browse/dpdk/commit/?id=9f1653e7b7e1746e7c
> > > 
> > > Clearly, I have to improve my sanity checks.
> > > Sorry for the inconvenience.
> > No problem, I fixed that locally, but now I see another issue. I have
> > several PMDs statically compiled in with my application and I expect
> > dpdk to choose correct one depending on available HW, but mlnx pmd does
> > not behave nicely, if its initialization fails it kills entire
> > application:
> > 
> > EAL: PCI device :03:00.0 on NUMA socket 0
> > EAL:   probe driver: 15b3:1003 librte_pmd_mlx4
> > EAL: Error - exiting with code: 1
> >   Cause: Requested device :03:00.0 cannot be used
> 
> About this error, make sure you are using the kernel modules provided by the
> mlnx-ofed-kernel package from MOFED [1] as described in the documentation.
> 
The problem is not the error itself (I know how to get rid of it) but the
error handling. My program has a fallback, so an error initializing a PMD
should not kill it; no other PMD does that.

--
Gleb.


[dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD

2015-02-26 Thread Gleb Natapov
On Thu, Feb 26, 2015 at 02:36:27PM +0100, Thomas Monjalon wrote:
> 2015-02-26 13:51, Gleb Natapov:
> > Did git pull today. After enabling mlnx pmd compilation fails with:
> > 
> > dpdk/lib/librte_pmd_mlx4/mlx4.c: In function 'mlx4_pci_devinit':
> > dpdk/lib/librte_pmd_mlx4/mlx4.c:4636:14: error: too few arguments to function 'rte_eth_dev_allocate'
> > eth_dev = rte_eth_dev_allocate(name);
> 
> Yes, thanks for reporting.
> I didn't test the disabled mlx4 after hotplug integration:
>   dpdk.org/browse/dpdk/commit/?id=9f1653e7b7e1746e7c
> 
> Clearly, I have to improve my sanity checks.
> Sorry for the inconvenience.
No problem, I fixed that locally, but now I see another issue. I have
several PMDs statically compiled in with my application and I expect
dpdk to choose correct one depending on available HW, but mlnx pmd does
not behave nicely, if its initialization fails it kills entire
application:

EAL: PCI device :03:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1003 librte_pmd_mlx4
EAL: Error - exiting with code: 1
  Cause: Requested device :03:00.0 cannot be used

This is how other PMDs handle the situation when init cannot be done:
EAL: PCI device :02:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by known pt driver, skipped
EAL: PCI device :02:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by known pt driver, skipped
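
In code terms, the behaviour I expect from the probe loop is something
like the sketch below. It is a simplified model with hypothetical names
(struct demo_driver, demo_probe_all), not the actual EAL code: a driver
whose probe fails is skipped and the loop moves on, so the application can
fall back to whatever else is available.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_probe_fn)(void);

struct demo_driver {
    const char *name;
    demo_probe_fn probe; /* returns 0 on success, negative on failure */
};

/* Two toy probe callbacks used to exercise the loop. */
static int demo_probe_ok(void)   { return 0; }
static int demo_probe_fail(void) { return -1; }

/*
 * Probe every driver; a failure only skips that driver.
 * Returns the number of drivers that initialized successfully.
 */
static int
demo_probe_all(const struct demo_driver *drv, size_t n)
{
    int usable = 0;
    size_t i;

    assert(drv != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (drv[i].probe() == 0)
            usable++;
        /* On failure: skip this driver, do not exit the application. */
    }
    return usable;
}
```

The caller then decides what to do when the count is zero, instead of the
library making that decision by exiting.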

--
Gleb.


[dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD

2015-02-26 Thread Gleb Natapov
Did git pull today. After enabling mlnx pmd compilation fails with:

dpdk/lib/librte_pmd_mlx4/mlx4.c: In function 'mlx4_pci_devinit':
dpdk/lib/librte_pmd_mlx4/mlx4.c:4636:14: error: too few arguments to function 'rte_eth_dev_allocate'
eth_dev = rte_eth_dev_allocate(name);


On Wed, Feb 25, 2015 at 02:52:03PM +0100, Adrien Mazarguil wrote:
> This PMD adds support for Mellanox ConnectX-3-based adapters through the
> verbs framework. It relies on external libraries (libibverbs and user space
> driver libmlx4) and kernel support to do so.
> 
> While these libraries and kernel modules are available on OpenFabrics
> Alliance's website [1] and provided by package managers on most
> distributions, this PMD requires Ethernet extensions that may not be
> supported at the moment (this is a work in progress).
> 
> Mellanox OFED [2] includes the necessary support and should be used in the
> meantime. For DPDK, only libibverbs, libmlx4 and mlnx-ofed-kernel packages
> are required from that distribution.
> 
> The following kernel modules must be loaded before using this PMD:
> 
> - mlx4_core (hardware driver, does global initialization)
> - mlx4_en (Ethernet device driver)
> - mlx4_ib (InfiniBand device driver)
> - ib_uverbs (user space driver for verbs)
> 
> [1] https://www.openfabrics.org/
> [2] 
> http://www.mellanox.com/page/products_dyn?product_family=26=linux_sw_drivers
> 
> v2:
>  - Include minor bugfix for VLAN filtering.
>  - Add maintainers entry.
>  - Add documentation.
> 
> v3:
>  - Add script and documentation to MAINTAINERS.
>  - Make cosmetic changes to copyright notices.
>  - Remove unwanted executable bits.
>  - Fix coding style and typos found by checkpatch.
>  - Add shared library compilation support.
> 
> Adrien Mazarguil (3):
>   scripts: check features to generate configuration header
>   mlx4: new poll mode driver
>   doc: add librte_pmd_mlx4 documentation
> 
>  MAINTAINERS  |6 +
>  config/common_bsdapp |   11 +
>  config/common_linuxapp   |   11 +
>  doc/guides/prog_guide/index.rst  |1 +
>  doc/guides/prog_guide/mlx4_poll_mode_drv.rst |  326 ++
>  doc/guides/prog_guide/source_org.rst |1 +
>  lib/Makefile |1 +
>  lib/librte_pmd_mlx4/Makefile |  121 +
>  lib/librte_pmd_mlx4/mlx4.c   | 4749 
> ++
>  lib/librte_pmd_mlx4/mlx4.h   |  165 +
>  lib/librte_pmd_mlx4/rte_pmd_mlx4_version.map |4 +
>  mk/rte.app.mk|8 +
>  scripts/auto-config-h.sh |  136 +
>  13 files changed, 5540 insertions(+)
>  create mode 100644 doc/guides/prog_guide/mlx4_poll_mode_drv.rst
>  create mode 100644 lib/librte_pmd_mlx4/Makefile
>  create mode 100644 lib/librte_pmd_mlx4/mlx4.c
>  create mode 100644 lib/librte_pmd_mlx4/mlx4.h
>  create mode 100644 lib/librte_pmd_mlx4/rte_pmd_mlx4_version.map
>  create mode 100755 scripts/auto-config-h.sh
> 
> -- 
> 2.1.0
> 

--
Gleb.


[dpdk-dev] i40e and RSS woes

2015-02-19 Thread Gleb Natapov
CCing the i40e driver author in the hope of getting an answer.

On Mon, Feb 16, 2015 at 03:36:54PM +0200, Gleb Natapov wrote:
> I have an application that works reasonably well with ixgbe driver, but
> when I try to use it with i40e I encounter various RSS related issues.
> 
> First one is that for some reason i40e, when it builds default reta
> table, round down number of queues to power of two. Why is this? If I
> configure reta by my own using all of the queues everything seams to be
> working. To add insult to injury I do not get any errors during
> configuration some queues just do not receive any traffic.
> 
> The second problem is that for some reason i40e does not use 40 byte
> toeplitz hash key like any other driver, but it expects the key to be 52
> bytes. And it would have being fine (if we ignore the fact that it
> contradicts MS spec), but how my high level code suppose to know that?
> And again, device configuration does not fail when wrong key length is
> provided, it just uses some other key. Guys this kind of error handling
> is completely unacceptable.
> 
> The last one is more of a question. Why interface to change RSS hash
> function (XOR or toeplitz) is part of a filter configuration and not rss
> config?
> 
> --
>   Gleb.

--
Gleb.


[dpdk-dev] i40e and RSS woes

2015-02-16 Thread Gleb Natapov
I have an application that works reasonably well with the ixgbe driver, but
when I try to use it with i40e I encounter various RSS-related issues.

The first one is that for some reason i40e, when it builds the default reta
table, rounds the number of queues down to a power of two. Why is this? If I
configure the reta on my own using all of the queues, everything seems to be
working. To add insult to injury, I do not get any errors during
configuration; some queues just do not receive any traffic.

The second problem is that for some reason i40e does not use a 40-byte
Toeplitz hash key like every other driver, but expects the key to be 52
bytes. That would have been fine (if we ignore the fact that it
contradicts the MS spec), but how is my high-level code supposed to know
that? And again, device configuration does not fail when the wrong key
length is provided; it just uses some other key. Guys, this kind of error
handling is completely unacceptable.

The last one is more of a question. Why is the interface to change the RSS
hash function (XOR or Toeplitz) part of the filter configuration and not
the RSS config?

--
Gleb.