date:20161028

[dpdk-dev] [PATCH] net/qede: fix gcc compiler option checks

2016-10-28 Thread Mody, Rasesh

Hi Stephen,

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> Hemminger
> Sent: Friday, October 28, 2016 3:12 PM
> 
> On Thu, 27 Oct 2016 23:37:57 -0700
> Rasesh Mody  wrote:
> 
> > From: Rasesh Mody 
> >
> > Using GCC_VERSION to check gcc version and decide whether to include
> > that compiler option.
> >
> > Fixes: ec94dbc57362 ("qede: add base driver")
> > Fixes: ecc7a5a27ffe ("net/qede/base: fix 32-bit build")
> >
> > Signed-off-by: Rasesh Mody 
> > ---
> >  drivers/net/qede/Makefile | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/qede/Makefile b/drivers/net/qede/Makefile
> > index 39751e4..29b443d 100644
> > --- a/drivers/net/qede/Makefile
> > +++ b/drivers/net/qede/Makefile
> > @@ -46,11 +46,11 @@ endif
> >  endif
> >
> >  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> > -ifeq ($(shell gcc -Wno-unused-but-set-variable -Werror -E - < /dev/null > /dev/null 2>&1; echo $$?),0)
> > +ifeq ($(shell test $(GCC_VERSION) -ge 44 && echo 1), 1)
> >  CFLAGS_BASE_DRIVER += -Wno-unused-but-set-variable
> >  endif
> >  CFLAGS_BASE_DRIVER += -Wno-missing-declarations
> > -ifeq ($(shell gcc -Wno-maybe-uninitialized -Werror -E - < /dev/null > /dev/null 2>&1; echo $$?),0)
> > +ifeq ($(shell test $(GCC_VERSION) -ge 46 && echo 1), 1)
> >  CFLAGS_BASE_DRIVER += -Wno-maybe-uninitialized
> >  endif
> >  CFLAGS_BASE_DRIVER += -Wno-strict-prototypes
> 
> Does this mean that less compiler checking is done or more?

With newer compiler versions more checking is done; with older compilers less
checking is done, because some of the older compilers do not have the newly
added checking capabilities. Testing with the latest compilers ensures we do a
lot more checking.

Thanks!
-Rasesh

> It seems lots of drivers make the excuse:
>  "the base driver comes from another group and is known buggy but can't be
> fixed"
> That doesn't reflect well on the quality of the DPDK.
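
For illustration, the version comparison that the patch performs in the Makefile can be mirrored in C. This is only a sketch: it assumes GCC_VERSION encodes major and minor as two digits (4.4 becomes 44, 4.6 becomes 46), which is how the thresholds in the patch read, and the helper names are hypothetical rather than part of DPDK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: encode a GCC major.minor pair the way the
 * thresholds in the patch read (4.4 -> 44, 4.6 -> 46).
 * Assumes a single-digit minor version. */
static int gcc_version_code(int major, int minor)
{
	assert(major >= 0 && minor >= 0 && minor <= 9);
	return major * 10 + minor;
}

/* Thresholds copied from the patch; whether they match the release in
 * which each option first appeared is not checked here. */
static bool has_wno_unused_but_set_variable(int code)
{
	return code >= 44;
}

static bool has_wno_maybe_uninitialized(int code)
{
	return code >= 46;
}
```

With this encoding, a 4.4 compiler would get only the first option and a 4.8 compiler would get both, which is the behaviour the two `ifeq` tests in the patch express.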

[dpdk-dev] mbuf changes

2016-10-28 Thread Morten Brørup

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Richardson, Bruce
> Sent: Friday, October 28, 2016 7:01 PM
> To: Adrien Mazarguil; Morten Brørup
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] mbuf changes
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Friday, October 28, 2016 5:50 PM
> > To: Morten Brørup 
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] mbuf changes
> >
> > On Fri, Oct 28, 2016 at 04:11:45PM +0200, Morten Brørup wrote:
> > > Comments at the end.
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pattan,
> > > > Reshma
> > > > Sent: Friday, October 28, 2016 3:35 PM
> > > > To: Olivier Matz
> > > > Cc: dev at dpdk.org; Morten Brørup
> > > > Subject: Re: [dpdk-dev] mbuf changes
> > > >
> > > > Hi Olivier,
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier
> > > > > Matz
> > > > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > > > To: Richardson, Bruce ; Morten
> > > > > Brørup 
> > > > > Cc: Adrien Mazarguil ; Wiles, Keith
> > > > > ; dev at dpdk.org; Oleg Kuporosov
> > > > > 
> > > > > Subject: Re: [dpdk-dev] mbuf changes
> > > > >
> > > > >
> > > > >
> > > > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > > > >> Comments at the end.
> > > > > >>
> > > > > >> Med venlig hilsen / kind regards
> > > > > >> - Morten Brørup
> > > > > >>
> > > > > >>> -----Original Message-----
> > > > > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > > > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > > > >>> To: Morten Brørup
> > > > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier
> > > > > >>> Matz; Oleg Kuporosov
> > > > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > > > >>>
> > > > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > > > >  Comments inline.
> > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > > > > Richardson
> > > > > > > Sent: Tuesday, October 25, 2016 1:14 PM
> > > > > > > To: Adrien Mazarguil
> > > > > > > Cc: Morten Brørup; Wiles, Keith; dev at dpdk.org; Olivier
> > > > > > > Matz; Oleg Kuporosov
> > > > > > > Subject: Re: [dpdk-dev] mbuf changes
> > > > > > >
> > > > > > > On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> > > > > > >> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > > > > > >>> Comments inline.
> > > > > > >>>
> > > > > > >>> Med venlig hilsen / kind regards
> > > > > > >>> - Morten Brørup
> > > > > >>>
> > > > > >>>
> > > > > >  -----Original Message-----
> > > > > >  From: Adrien Mazarguil
> > > > > >  [mailto:adrien.mazarguil at 6wind.com]
> > > > > >  Sent: Tuesday, October 25, 2016 11:39 AM
> > > > > >  To: Bruce Richardson
> > > > > >  Cc: Wiles, Keith; Morten Brørup; dev at dpdk.org; Olivier
> > > > > >  Matz; Oleg Kuporosov
> > > > > >  Subject: Re: [dpdk-dev] mbuf changes
> > > > > 
> > > > >  On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce
> > > > >  Richardson
> > > > > >>> wrote:
> > > > > > On Mon, Oct 24, 2016 at 04:11:33PM +, Wiles, Keith
> > > > > >>> wrote:
> > > > >  [...]
> > > > > > >>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > > >   wrote:
> > > > >  [...]
> > > > > 
> > > > > > One other point I'll mention is that we need to have a
> > > > > > discussion on how/where to add in a timestamp value
> > > > > > into
> > > > > >>> the
> > > > > > mbuf. Personally, I think it can be in a union with
> > > > > > the
> > > > > > sequence
> > > > > > number value, but I also suspect that 32-bits of a
> > > > > >>> timestamp
> > > > > > is not going to be enough for
> > > > >  many.
> > > > > >
> > > > > > Thoughts?
> > > > > 
> > > > >  If we consider that timestamp representation should use
> > > > > > nanosecond
> > > > >  granularity, a 32-bit value may likely wrap around too
> > > > > >>> quickly
> > > > >  to be useful. We can also assume that applications
> > > > requesting
> > > > >  timestamps may care more about latency than throughput,
> > > > >  Oleg
> > > > > > found
> > > > >  that using the second cache line for this purpose had a
> > > > > > noticeable impact [1].
> > > > > 
> > > > > >   [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> > > > > >>>
> > > > > >>> I agree with Oleg about the latency vs. throughput
> > > > > >>> importance for
> > > > > > such

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Igor Ryzhov

On Fri, Oct 28, 2016 at 9:40 PM, Thomas Monjalon 
wrote:

> 2016-10-28 20:29, Igor Ryzhov:
> > On Fri, Oct 28, 2016 Thomas Monjalon wrote:
> > > 2016-10-28 15:51, Richardson, Bruce:
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > > > > 2016-10-28 15:31, Ferruh Yigit:
> > > > > > * Remove ethtool support ?
> > > > >
> > > > > That's the other part of KNI.
> > > > > It works only for e1000/ixgbe. That's a niche.
> > > >
> > > > Yes, it's something we need to remove, but again, we need an
> > > > alternative first.
> > > >
> > > > >
> > > > > > Still there is some interest, will keep it. But not able to
> > > > > > extend it to other drivers with current design.
> > > > >
> > > > > It should be removed one day.
> > > > > We must seriously think about a generic alternative.
> > > > > Either we add DPDK support in ethtool or we create a dpdk-ethtool.
> > > > > (or at least a library as the one in examples/).
> > > >
> > > > I don't view that as a great path forward. Sure, we can do our own
> > > > ethtool, but then people will look for ifconfig to work, and "ip" to
> > > > work, etc. I view having a kernel proxy module as the best path here
> > > > as it is tool agnostic on the userspace side, rather than trying to
> > > > make every tool for working with kernel netdevs also have support for
> > > > dpdk ports.
> > >
> > > Yes that's the ultimately best solution.
> > > But:
> > > - we need some cooperation of the kernel team
> > > - ethtool manages a device (what DPDK provides) whereas iproute and
> > > others manage a TCP/IP stack so is out of control of DPDK.
> >
> > That's not true.
> > iproute can control a lot of things like MAC address, promiscuous, MTU,
> > etc. that cannot be controlled with ethtool.
> > Just compare net_device_ops and ethtool_ops to see the difference.
>
> Yes you're right. iproute was not a good example :)
>
> > And the question is not only about tools, it is also about how Linux
> > kernel works with network devices.
> > And it uses net_device_ops, not ethtool_ops.
>
> What do you mean exactly? I feel you have something in mind.
>

My main point is that if we want to control DPDK ports from Linux, it
should be done with standard utilities.
Every standard utility like iproute just uses existing Linux kernel
interfaces and kernel in its turn uses net_device_ops to control the device.

For example, you want to set the MTU of the network device.
Regardless of the utility you use to do that (even if you write your own),
there are two options: ioctl or netlink.
And regardless of the method you choose, the Linux kernel will then call
"ndo_change_mtu".

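The ioctl option described above can be sketched in C as follows. This is a minimal illustration and not code from DPDK or KNI: the helper names fill_mtu_request and set_mtu_ioctl are hypothetical, while struct ifreq and SIOCSIFMTU are the standard Linux interfaces. Changing the MTU this way needs CAP_NET_ADMIN.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill an ifreq describing an MTU change on the named interface. */
static void fill_mtu_request(struct ifreq *ifr, const char *ifname, int mtu)
{
	assert(strlen(ifname) < IFNAMSIZ);
	memset(ifr, 0, sizeof(*ifr));
	strncpy(ifr->ifr_name, ifname, IFNAMSIZ - 1);
	ifr->ifr_mtu = mtu;
}

/* Ask the kernel to change the MTU through the ioctl path.
 * Returns 0 on success and -1 on failure. */
static int set_mtu_ioctl(const char *ifname, int mtu)
{
	struct ifreq ifr;
	int fd, ret;

	/* Any socket works as a handle for interface ioctls. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	fill_mtu_request(&ifr, ifname, mtu);
	ret = ioctl(fd, SIOCSIFMTU, &ifr);
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

A call such as set_mtu_ioctl("vEth0", 1400), where the interface name is only an example, does not touch the driver directly; as the message says, the kernel is what ends up invoking the device's ndo_change_mtu.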
[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Thomas Monjalon

2016-10-28 20:29, Igor Ryzhov:
> On Fri, Oct 28, 2016 Thomas Monjalon wrote:
> > 2016-10-28 15:51, Richardson, Bruce:
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > > > 2016-10-28 15:31, Ferruh Yigit:
> > > > > * Remove ethtool support ?
> > > >
> > > > That's the other part of KNI.
> > > > It works only for e1000/ixgbe. That's a niche.
> > >
> > > Yes, it's something we need to remove, but again, we need an
> > > alternative first.
> > >
> > > >
> > > > > Still there is some interest, will keep it. But not able to extend it
> > > > > to other drivers with current design.
> > > >
> > > > It should be removed one day.
> > > > We must seriously think about a generic alternative.
> > > > Either we add DPDK support in ethtool or we create a dpdk-ethtool.
> > > > (or at least a library as the one in examples/).
> > >
> > > I don't view that as a great path forward. Sure, we can do our own
> > > ethtool, but then people will look for ifconfig to work, and "ip" to work,
> > > etc. I view having a kernel proxy module as the best path here as it is
> > > tool agnostic on the userspace side, rather than trying to make every
> > > tool for working with kernel netdevs also have support for dpdk ports.
> >
> > Yes that's the ultimately best solution.
> > But:
> > - we need some cooperation of the kernel team
> > - ethtool manages a device (what DPDK provides) whereas iproute and others
> > manage a TCP/IP stack so is out of control of DPDK.
> 
> That's not true.
> iproute can control a lot of things like MAC address, promiscuous, MTU,
> etc. that cannot be controlled with ethtool.
> Just compare net_device_ops and ethtool_ops to see the difference.

Yes you're right. iproute was not a good example :)

> And the question is not only about tools, it is also about how Linux kernel
> works with network devices.
> And it uses net_device_ops, not ethtool_ops.

What do you mean exactly? I feel you have something in mind.

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Igor Ryzhov

On Fri, Oct 28, 2016 at 7:13 PM, Thomas Monjalon 
wrote:

> 2016-10-28 15:51, Richardson, Bruce:
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > > 2016-10-28 15:31, Ferruh Yigit:
> > > > * virtio-user + vhost-net
> > > > This can be valid alternative, removes the out of tree kernel module
> > > > need. But missing control path. Proof of concept work will be done.
> > >
> > > That's probably a smart alternative for packet injection.
> > > What do you mean exactly by "missing control path"?
> >
> > We'll have to see how it performs - which is the key gap for data path
> > that KNI fills. Until we get an alternative with (nearly) equivalent
> > performance, there will be demand for KNI to stick around.
> > The "control path" is the ethtool part, to get stats and do operations
> > on the NIC using command-line tools.
> >
> > >
> > > > * Remove ethtool support ?
> > >
> > > That's the other part of KNI.
> > > It works only for e1000/ixgbe. That's a niche.
> >
> > Yes, it's something we need to remove, but again, we need an alternative
> > first.
> >
> > >
> > > > Still there is some interest, will keep it. But not able to extend it
> > > > to other drivers with current design.
> > >
> > > It should be removed one day.
> > > We must seriously think about a generic alternative.
> > > Either we add DPDK support in ethtool or we create a dpdk-ethtool.
> > > (or at least a library as the one in examples/).
> >
> > I don't view that as a great path forward. Sure, we can do our own
> > ethtool, but then people will look for ifconfig to work, and "ip" to work,
> > etc. I view having a kernel proxy module as the best path here as it is
> > tool agnostic on the userspace side, rather than trying to make every tool
> > for working with kernel netdevs also have support for dpdk ports.
>
> Yes that's the ultimately best solution.
> But:
> - we need some cooperation of the kernel team
> - ethtool manages a device (what DPDK provides) whereas iproute and others
> manage a TCP/IP stack so is out of control of DPDK.
>

That's not true.
iproute can control a lot of things like MAC address, promiscuous, MTU,
etc. that cannot be controlled with ethtool.
Just compare net_device_ops and ethtool_ops to see the difference.

And the question is not only about tools, it is also about how Linux kernel
works with network devices.
And it uses net_device_ops, not ethtool_ops.


>
> > > Or we do nothing and wait to have more hardware like Mellanox
> > > supporting a kernel bifurcated driver approach.
> >
> > Given the lack of other NICs supporting that, I think it could be quite
> > a wait! Also, it doesn't work for virtio ports, for pcap ports, or any
> > other ports which don't have physical hardware backing them. No reason you
> > shouldn't be able to pull stats from all your dpdk ethdevs, not just the
> > ones with physical hardware. The same ethdev APIs work for them, so should
> > the same tools.
>
> Yes, very good point.
>
> > > > *KNI PMD
> > > > Patch is in the mail list, missing comments. If it gets some
> > > > interest/comments/acks it may go in to next release.
> > >
> > > I'm not against KNI PMD but it looks strange to add more support to an
> > > old dying approach.
> >
> > I think the main idea here is to clean up the API - at least for the
> > data path. There is no reason why we need special KNI RX/TX functions, when
> > ethdev RX/TX functions could do the job. However, at a higher level, the
> > more basic requirement is that whatever solution for the data-path to
> > kernel from dpdk is, it needs to appear as an ethdev, and not as a special
> > library with different APIs, as KNI is now.
>
> Yes I agree to unify (and reduce) the API.
> Why is this PMD not getting more comments?
> KNI users should be interested to review it.
> Please dear community, we need more reviews!
>

[dpdk-dev] mbuf changes

2016-10-28 Thread Adrien Mazarguil

On Fri, Oct 28, 2016 at 04:11:45PM +0200, Morten Brørup wrote:
> Comments at the end.
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pattan, Reshma
> > Sent: Friday, October 28, 2016 3:35 PM
> > To: Olivier Matz
> > Cc: dev at dpdk.org; Morten Brørup
> > Subject: Re: [dpdk-dev] mbuf changes
> > 
> > Hi Olivier,
> > 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > To: Richardson, Bruce ; Morten Brørup
> > > 
> > > Cc: Adrien Mazarguil ; Wiles, Keith
> > > ; dev at dpdk.org; Oleg Kuporosov
> > > 
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > >
> > >
> > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > >> Comments at the end.
> > > >>
> > > >> Med venlig hilsen / kind regards
> > > >> - Morten Brørup
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > >>> To: Morten Brørup
> > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > > >>> Oleg Kuporosov
> > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>
> > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > >  Comments inline.
> > > 
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > > Richardson
> > > > > Sent: Tuesday, October 25, 2016 1:14 PM
> > > > > To: Adrien Mazarguil
> > > > > Cc: Morten Brørup; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > > > > Oleg Kuporosov
> > > > > Subject: Re: [dpdk-dev] mbuf changes
> > > > >
> > > > > On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> > > > >> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > > > >>> Comments inline.
> > > > >>>
> > > > >>> Med venlig hilsen / kind regards
> > > > >>> - Morten Brørup
> > > >>>
> > > >>>
> > > >  -----Original Message-----
> > > >  From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> > > >  Sent: Tuesday, October 25, 2016 11:39 AM
> > > >  To: Bruce Richardson
> > > >  Cc: Wiles, Keith; Morten Brørup; dev at dpdk.org; Olivier Matz;
> > > >  Oleg Kuporosov
> > > >  Subject: Re: [dpdk-dev] mbuf changes
> > > 
> > >  On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> > > >>> wrote:
> > > > On Mon, Oct 24, 2016 at 04:11:33PM +, Wiles, Keith
> > > >>> wrote:
> > >  [...]
> > > > >>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > >   wrote:
> > >  [...]
> > > 
> > > > One other point I'll mention is that we need to have a
> > > > discussion on how/where to add in a timestamp value into
> > > >>> the
> > > > mbuf. Personally, I think it can be in a union with the
> > > > sequence
> > > > number value, but I also suspect that 32-bits of a
> > > >>> timestamp
> > > > is not going to be enough for
> > >  many.
> > > >
> > > > Thoughts?
> > > 
> > >  If we consider that timestamp representation should use
> > > > nanosecond
> > >  granularity, a 32-bit value may likely wrap around too
> > > >>> quickly
> > >  to be useful. We can also assume that applications
> > requesting
> > >  timestamps may care more about latency than throughput, Oleg
> > > > found
> > >  that using the second cache line for this purpose had a
> > > > noticeable impact [1].
> > > 
> > > >   [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> > > >>>
> > > >>> I agree with Oleg about the latency vs. throughput importance
> > > >>> for
> > > > such applications.
> > > >>>
> > > >>> If you need high resolution timestamps, consider them to be
> > > > generated by the NIC RX driver, possibly by the hardware itself
> > > > > (http://w3new.napatech.com/features/time-precision/hardware-time-stamp),
> > > > > so the timestamp belongs in the first cache line. And
> > > > I am proposing that it should have the highest possible
> > > > accuracy, which makes the value hardware dependent.
> > > >>>
> > > >>> Furthermore, I am arguing that we leave it up to the
> > > >>> application
> > > >>> to
> > > > keep track of the slowly moving bits (i.e. counting whole
> > > > seconds, hours and calendar date) out of band, so we don't use
> > > > precious
> > > >>> space
> > > > in the mbuf. The application doesn't need the NIC RX driver's
> > > > fast path to capture which date (or even which second) a packet
> > > > was received. Yes, it adds complexity to the application, but
> > we
> > > > can't set aside 64 bit

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Thomas Monjalon

2016-10-28 15:51, Richardson, Bruce:
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > 2016-10-28 15:31, Ferruh Yigit:
> > > * virtio-user + vhost-net
> > > This can be valid alternative, removes the out of tree kernel module
> > > need. But missing control path. Proof of concept work will be done.
> > 
> > That's probably a smart alternative for packet injection.
> > What do you mean exactly by "missing control path"?
> 
> We'll have to see how it performs - which is the key gap for data path that 
> KNI fills. Until we get an alternative with (nearly) equivalent performance, 
> there will be demand for KNI to stick around.
> The "control path" is the ethtool part, to get stats and do operations on the 
> NIC using command-line tools.
> 
> > 
> > > * Remove ethtool support ?
> > 
> > That's the other part of KNI.
> > It works only for e1000/ixgbe. That's a niche.
> 
> Yes, it's something we need to remove, but again, we need an alternative 
> first.
> 
> > 
> > > Still there is some interest, will keep it. But not able to extend it
> > > to other drivers with current design.
> > 
> > It should be removed one day.
> > We must seriously think about a generic alternative.
> > Either we add DPDK support in ethtool or we create a dpdk-ethtool.
> > (or at least a library as the one in examples/).
> 
> I don't view that as a great path forward. Sure, we can do our own ethtool, 
> but then people will look for ifconfig to work, and "ip" to work, etc. I view 
> having a kernel proxy module as the best path here as it is tool agnostic on 
> the userspace side, rather than trying to make every tool for working with 
> kernel netdevs also have support for dpdk ports.

Yes that's the ultimately best solution.
But:
- we need some cooperation of the kernel team
- ethtool manages a device (what DPDK provides) whereas iproute and others
manage a TCP/IP stack so is out of control of DPDK.

> > Or we do nothing and wait to have more hardware like Mellanox supporting a
> > kernel bifurcated driver approach.
> 
> Given the lack of other NICs supporting that, I think it could be quite a 
> wait! Also, it doesn't work for virtio ports, for pcap ports, or any other 
> ports which don't have physical hardware backing them. No reason you 
> shouldn't be able to pull stats from all your dpdk ethdevs, not just the ones 
> with physical hardware. The same ethdev APIs work for them, so should the 
> same tools.

Yes, very good point.

> > > *KNI PMD
> > > Patch is in the mail list, missing comments. If it gets some
> > > interest/comments/acks it may go in to next release.
> > 
> > I'm not against KNI PMD but it looks strange to add more support to an old
> > dying approach.
> 
> I think the main idea here is to clean up the API - at least for the data 
> path. There is no reason why we need special KNI RX/TX functions, when ethdev 
> RX/TX functions could do the job. However, at a higher level, the more basic 
> requirement is that whatever solution for the data-path to kernel from dpdk 
> is, it needs to appear as an ethdev, and not as a special library with 
> different APIs, as KNI is now.

Yes I agree to unify (and reduce) the API.
Why is this PMD not getting more comments?
KNI users should be interested to review it.
Please dear community, we need more reviews!
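
The requirement quoted above, that the kernel data path should appear as an ordinary port rather than as a library with its own RX/TX calls, can be illustrated with a small function-pointer table. This is a toy sketch only: struct port, struct port_ops and the kern_* backend are hypothetical names and do not correspond to the ethdev or KNI APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pkt;  /* opaque packet type, standing in for an mbuf */

/* Hypothetical burst interface shared by every kind of port. */
struct port_ops {
	uint16_t (*rx_burst)(void *priv, struct pkt **pkts, uint16_t n);
	uint16_t (*tx_burst)(void *priv, struct pkt **pkts, uint16_t n);
};

struct port {
	const struct port_ops *ops;
	void *priv;  /* backend state: NIC queue, kernel FIFO, ... */
};

/* Callers use the same two functions whatever the backend is. */
static uint16_t port_rx(struct port *p, struct pkt **pkts, uint16_t n)
{
	assert(p != NULL && p->ops != NULL);
	return p->ops->rx_burst(p->priv, pkts, n);
}

static uint16_t port_tx(struct port *p, struct pkt **pkts, uint16_t n)
{
	assert(p != NULL && p->ops != NULL);
	return p->ops->tx_burst(p->priv, pkts, n);
}

/* Toy kernel-facing backend: never receives, counts what it is given. */
static uint16_t kern_rx(void *priv, struct pkt **pkts, uint16_t n)
{
	(void)priv; (void)pkts; (void)n;
	return 0;
}

static uint16_t kern_tx(void *priv, struct pkt **pkts, uint16_t n)
{
	(void)pkts;
	*(uint64_t *)priv += n;
	return n;
}

static const struct port_ops kern_ops = { kern_rx, kern_tx };
```

An application written against port_rx and port_tx would not need to know whether a port is backed by hardware or by a kernel-facing queue, which is the property being asked for in the thread.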

[dpdk-dev] [PATCH v7 00/21] Introduce SoC device/driver framework for EAL

2016-10-28 Thread Shreyansh Jain

On Friday 28 October 2016 05:56 PM, Shreyansh Jain wrote:
> Introduction:
> =============
>
> This patch set is direct derivative of Jan's original series [1],[2].
>
>  - This version is based on master HEAD (ca41215)
>
>  - In this, I am merging the series [11] back. It was initially part
>    of this set but I had split considering that those changes in PCI
>    were good standalone as well. But, 1) not much feedback was
>    available and 2) this patchset is a use-case for those patches making
>    it easier to review. Just like what Jan had intended in original
>    series.
>
>  - SoC support is not enabled by default. It needs the 'enable-soc' toggle
>    on command line. This is primarily because this patchset is still
>    experimental and we would like to keep it isolated from non-SoC ops.
>    Though, it does impact the ABI.

Sending v7 as patch 11/21 of v6 wasn't received by mailing list leading 
to automated build failure. No other change has been done.

-
Shreyansh

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Igor Ryzhov

Thank you, Ferruh.

As we are staying on the existing implementation, I think we can do some
improvements:
1. Implement more commands for net_device_ops.
2. Implement ethtool support the same way as net_device_ops are implemented:
send commands to the application.
3. Add the ability to set a default MAC address for a KNI interface. Currently
it is random for all interfaces except those that work on igb or ixgbe.
4. Properly implement the link state control feature. Currently a KNI interface
stays in the UNKNOWN state even after changing the carrier flag to 1.

First two improvements are already done in KCP patches and can be easily
ported into the current code.
For the last two improvements I can send patches.

Best regards,
Igor

[dpdk-dev] [PATCH v7 21/21] eal/crypto: Support rte_soc_driver/device for cryptodev

2016-10-28 Thread Shreyansh Jain

- rte_cryptodev_driver/rte_cryptodev_dev embeds rte_soc_driver/device for
  linking SoC PMDs to crypto devices.
- Add probe and remove functions linked

Signed-off-by: Hemant Agrawal 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_cryptodev/rte_cryptodev.c   | 122 -
 lib/librte_cryptodev/rte_cryptodev.h   |   3 +
 lib/librte_cryptodev/rte_cryptodev_pmd.h   |  18 +++-
 lib/librte_cryptodev/rte_cryptodev_version.map |   2 +
 4 files changed, 140 insertions(+), 5 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 127e8d0..77ec9fe 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -422,7 +422,8 @@ rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv,

int retval;

-   cryptodrv = (struct rte_cryptodev_driver *)pci_drv;
+   cryptodrv = container_of(pci_drv, struct rte_cryptodev_driver,
+pci_drv);
if (cryptodrv == NULL)
return -ENODEV;

@@ -489,7 +490,8 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev)
if (cryptodev == NULL)
return -ENODEV;

-   cryptodrv = (const struct rte_cryptodev_driver *)pci_dev->driver;
+   cryptodrv = container_of(pci_dev->driver, struct rte_cryptodev_driver,
+pci_drv);
if (cryptodrv == NULL)
return -ENODEV;

@@ -513,6 +515,111 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev)
return 0;
 }

+
+int
+rte_cryptodev_soc_probe(struct rte_soc_driver *soc_drv,
+ struct rte_soc_device *soc_dev)
+{
+   struct rte_cryptodev_driver *cryptodrv;
+   struct rte_cryptodev *cryptodev;
+
+   char cryptodev_name[RTE_CRYPTODEV_NAME_MAX_LEN];
+
+   int retval;
+
+   cryptodrv = container_of(soc_drv, struct rte_cryptodev_driver,
+soc_drv);
+
+   rte_eal_soc_device_name(&soc_dev->addr, cryptodev_name,
+   sizeof(cryptodev_name));
+
+   cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name,
+  rte_socket_id());
+   if (cryptodev == NULL)
+   return -ENOMEM;
+
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   cryptodev->data->dev_private =
+   rte_zmalloc_socket(
+   "cryptodev private structure",
+   cryptodrv->dev_private_size,
+   RTE_CACHE_LINE_SIZE,
+   rte_socket_id());
+
+   if (cryptodev->data->dev_private == NULL)
+   rte_panic("Cannot allocate memzone for private "
+   "device data");
+   }
+
+   cryptodev->soc_dev = soc_dev;
+   cryptodev->driver = cryptodrv;
+
+   /* init user callbacks */
+   TAILQ_INIT(&(cryptodev->link_intr_cbs));
+
+   /* Invoke PMD device initialization function */
+   retval = (*cryptodrv->cryptodev_init)(cryptodrv, cryptodev);
+   if (retval == 0)
+   return 0;
+
+   CDEV_LOG_ERR("driver %s: cryptodev_init(%s) failed\n",
+   soc_drv->driver.name,
+   soc_dev->addr.name);
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(cryptodev->data->dev_private);
+
+   cryptodev->attached = RTE_CRYPTODEV_DETACHED;
+   cryptodev_globals.nb_devs--;
+
+   return -ENXIO;
+}
+
+int
+rte_cryptodev_soc_remove(struct rte_soc_device *soc_dev)
+{
+   const struct rte_cryptodev_driver *cryptodrv;
+   struct rte_cryptodev *cryptodev;
+   char cryptodev_name[RTE_CRYPTODEV_NAME_MAX_LEN];
+   int ret;
+
+   if (soc_dev == NULL)
+   return -EINVAL;
+
+   rte_eal_soc_device_name(&soc_dev->addr, cryptodev_name,
+   sizeof(cryptodev_name));
+
+   cryptodev = rte_cryptodev_pmd_get_named_dev(cryptodev_name);
+   if (cryptodev == NULL)
+   return -ENODEV;
+
+   cryptodrv = container_of(soc_dev->driver,
+   struct rte_cryptodev_driver, soc_drv);
+   if (cryptodrv == NULL)
+   return -ENODEV;
+
+   /* Invoke PMD device uninit function */
+   if (*cryptodrv->cryptodev_uninit) {
+   ret = (*cryptodrv->cryptodev_uninit)(cryptodrv, cryptodev);
+   if (ret)
+   return ret;
+   }
+
+   /* free crypto device */
+   rte_cryptodev_pmd_release_device(cryptodev);
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(cryptodev->data->dev_private);
+
+   cryptodev->pci_dev = NULL;
+   cryptodev->soc_dev = NULL;
+   cryptodev->driver = NULL;
+   cryptodev->data = NULL;
+
+   return 0;
+}
+
 uint16_t
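
The switch from a plain cast to container_of in this patch matters once the driver structure embeds more than one bus driver, because at most one member can sit at offset 0. A minimal sketch of the idea follows, using stand-in struct names rather than the real rte_* definitions; the macro is written out locally so the example is self-contained and is not claimed to match DPDK's own definition.

```c
#include <assert.h>
#include <stddef.h>

/* Local container_of for this sketch: recover the enclosing structure
 * from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for the bus-specific driver structures. */
struct pci_driver { int id; };
struct soc_driver { int id; };

/* Stand-in for a device-class driver that embeds both. */
struct class_driver {
	struct pci_driver pci_drv;
	struct soc_driver soc_drv;  /* not at offset 0, so a cast would be wrong */
	unsigned int dev_private_size;
};

/* What a SoC probe callback would do with the pointer it is handed. */
static struct class_driver *class_from_soc(struct soc_driver *soc)
{
	assert(soc != NULL);
	return container_of(soc, struct class_driver, soc_drv);
}
```

Casting a pointer to soc_drv directly to struct class_driver * would yield an address that is off by the size of the preceding members, whereas container_of subtracts the member offset and works for either embedded driver.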

[dpdk-dev] [PATCH v7 20/21] ether: introduce ethernet dev probe remove

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 lib/librte_ether/rte_ethdev.c | 148 +-
 lib/librte_ether/rte_ethdev.h |  31 +
 2 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 4c61246..972e916 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -325,6 +325,101 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
 }

 int
+rte_eth_dev_soc_probe(struct rte_soc_driver *soc_drv,
+ struct rte_soc_device *soc_dev)
+{
+   struct eth_driver*eth_drv;
+   struct rte_eth_dev *eth_dev;
+   char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+
+   int diag;
+
+   eth_drv = container_of(soc_drv, struct eth_driver, soc_drv);
+
+   rte_eal_soc_device_name(&soc_dev->addr, ethdev_name,
+   sizeof(ethdev_name));
+
+   eth_dev = rte_eth_dev_allocate(ethdev_name);
+   if (eth_dev == NULL)
+   return -ENOMEM;
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   eth_dev->data->dev_private = rte_zmalloc(
+ "ethdev private structure",
+ eth_drv->dev_private_size,
+ RTE_CACHE_LINE_SIZE);
+   if (eth_dev->data->dev_private == NULL)
+   rte_panic("Cannot allocate memzone for private port "
+ "data\n");
+   }
+   eth_dev->soc_dev = soc_dev;
+   eth_dev->driver = eth_drv;
+   eth_dev->data->rx_mbuf_alloc_failed = 0;
+
+   /* init user callbacks */
+   TAILQ_INIT(&(eth_dev->link_intr_cbs));
+
+   /*
+* Set the default MTU.
+*/
+   eth_dev->data->mtu = ETHER_MTU;
+
+   /* Invoke PMD device initialization function */
+   diag = (*eth_drv->eth_dev_init)(eth_dev);
+   if (diag == 0)
+   return 0;
+
+   RTE_PMD_DEBUG_TRACE("driver %s: eth_dev_init(%s) failed\n",
+   soc_drv->driver.name,
+   soc_dev->addr.name);
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(eth_dev->data->dev_private);
+   rte_eth_dev_release_port(eth_dev);
+   return diag;
+}
+
+int
+rte_eth_dev_soc_remove(struct rte_soc_device *soc_dev)
+{
+   const struct eth_driver *eth_drv;
+   struct rte_eth_dev *eth_dev;
+   char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+   int ret;
+
+   if (soc_dev == NULL)
+   return -EINVAL;
+
+   rte_eal_soc_device_name(&soc_dev->addr, ethdev_name,
+   sizeof(ethdev_name));
+
+   eth_dev = rte_eth_dev_allocated(ethdev_name);
+   if (eth_dev == NULL)
+   return -ENODEV;
+
+   eth_drv = container_of(soc_dev->driver, struct eth_driver, soc_drv);
+
+   /* Invoke PMD device uninit function */
+   if (*eth_drv->eth_dev_uninit) {
+   ret = (*eth_drv->eth_dev_uninit)(eth_dev);
+   if (ret)
+   return ret;
+   }
+
+   /* free ether device */
+   rte_eth_dev_release_port(eth_dev);
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(eth_dev->data->dev_private);
+
+   eth_dev->soc_dev = NULL;
+   eth_dev->driver = NULL;
+   eth_dev->data = NULL;
+
+   return 0;
+}
+
+
+int
 rte_eth_dev_is_valid_port(uint8_t port_id)
 {
if (port_id >= RTE_MAX_ETHPORTS ||
@@ -1557,6 +1652,7 @@ rte_eth_dev_info_get(uint8_t port_id, struct 
rte_eth_dev_info *dev_info)
RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
(*dev->dev_ops->dev_infos_get)(dev, dev_info);
dev_info->pci_dev = dev->pci_dev;
+   dev_info->soc_dev = dev->soc_dev;
dev_info->driver_name = dev->data->drv_name;
dev_info->nb_rx_queues = dev->data->nb_rx_queues;
dev_info->nb_tx_queues = dev->data->nb_tx_queues;
@@ -2535,8 +2631,15 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 static inline
 struct rte_intr_handle *eth_dev_get_intr_handle(struct rte_eth_dev *dev)
 {
-   if (dev->pci_dev)
+   if (dev->pci_dev) {
+   RTE_ASSERT(dev->soc_dev == NULL);
return &dev->pci_dev->intr_handle;
+   }
+
+   if (dev->soc_dev) {
+   RTE_ASSERT(dev->pci_dev == NULL);
+   return &dev->soc_dev->intr_handle;
+   }

RTE_ASSERT(0);
return NULL;
@@ -2573,6 +2676,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int 
op, void *data)
return 0;
 }

+static inline
+const char *eth_dev_get_driver_name(const struct rte_eth_dev *dev)
+{
+   if (dev->pci_dev) {
+   RTE_ASSERT(dev->soc_dev == NULL);
+   return dev->driver->pci_drv.driver.name;
+   }
+
+   if (dev->soc_dev) {
+

[dpdk-dev] [PATCH v7 19/21] ether: extract function eth_dev_get_intr_handle

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

We abstract access to the intr_handle here as we want to get
it either from the pci_dev or soc_dev.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
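The accessor described in the commit message can be sketched in isolation as follows. This is only an illustration under stated assumptions: the `demo_*` types are hypothetical stand-ins for `rte_eth_dev`, `rte_pci_device` and a SoC device, and the SoC branch reflects the stated intent ("either from the pci_dev or soc_dev") rather than the code in this particular patch, which handles only the PCI case.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the DPDK structures. */
struct demo_intr_handle { int fd; };
struct demo_pci_device { struct demo_intr_handle intr_handle; };
struct demo_soc_device { struct demo_intr_handle intr_handle; };

struct demo_eth_dev {
	struct demo_pci_device *pci_dev; /* set for PCI-backed ports */
	struct demo_soc_device *soc_dev; /* set for SoC-backed ports */
};

/* Return the interrupt handle of whichever bus device backs the port. */
static struct demo_intr_handle *
demo_get_intr_handle(struct demo_eth_dev *dev)
{
	if (dev->pci_dev != NULL)
		return &dev->pci_dev->intr_handle;
	if (dev->soc_dev != NULL)
		return &dev->soc_dev->intr_handle;
	return NULL; /* neither bus device is attached */
}
```

Callers then no longer dereference `pci_dev` directly, so adding another bus type touches only the accessor.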
 lib/librte_ether/rte_ethdev.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a1e3aaf..4c61246 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2532,6 +2532,16 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }

+static inline
+struct rte_intr_handle *eth_dev_get_intr_handle(struct rte_eth_dev *dev)
+{
+   if (dev->pci_dev)
+   return &dev->pci_dev->intr_handle;
+
+   RTE_ASSERT(0);
+   return NULL;
+}
+
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -2544,7 +2554,7 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int 
op, void *data)
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);

dev = &rte_eth_devices[port_id];
-   intr_handle = &dev->pci_dev->intr_handle;
+   intr_handle = eth_dev_get_intr_handle(dev);
if (!intr_handle->intr_vec) {
RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
return -EPERM;
@@ -2604,7 +2614,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t 
queue_id,
return -EINVAL;
}

-   intr_handle = &dev->pci_dev->intr_handle;
+   intr_handle = eth_dev_get_intr_handle(dev);
if (!intr_handle->intr_vec) {
RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
return -EPERM;
-- 
2.7.4

[dpdk-dev] [PATCH v7 18/21] ether: verify we copy info from a PCI device

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Now that different types of ethdev exist, check for the presence of a PCI
device before copying out the info.
The same will be done for SoC devices.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 lib/librte_ether/rte_ethdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 347c230..a1e3aaf 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3206,6 +3206,8 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev, struct 
rte_pci_device *pci_de
return;
}

+   RTE_VERIFY(eth_dev->pci_dev != NULL);
+
eth_dev->data->dev_flags = 0;
if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
-- 
2.7.4

[dpdk-dev] [PATCH v7 17/21] ether: utilize container_of for pci_drv

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

It is no longer necessary to place the rte_pci_driver at the beginning
of the eth_driver struct, as we use the container_of macro
to get the parent pointer.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
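To illustrate why the member position stops mattering, here is a minimal sketch of the container_of technique. The macro and the `demo_*` structs are hypothetical simplifications, not the DPDK definitions; the embedded driver is deliberately not the first member.

```c
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for rte_pci_driver and eth_driver. */
struct demo_pci_driver {
	const char *name;
};

struct demo_eth_driver {
	unsigned int dev_private_size;  /* placed first on purpose */
	struct demo_pci_driver pci_drv; /* no longer the first member */
};

/* Map the embedded PCI driver back to its parent Ethernet driver. */
static struct demo_eth_driver *
demo_to_eth_driver(struct demo_pci_driver *pci_drv)
{
	return demo_container_of(pci_drv, struct demo_eth_driver, pci_drv);
}
```

A plain cast such as `(struct demo_eth_driver *)pci_drv` would only be correct while `pci_drv` sits at offset zero; subtracting `offsetof` works for any layout.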
 lib/librte_ether/rte_ethdev.c | 4 ++--
 lib/librte_ether/rte_ethdev.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fde8112..347c230 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -241,7 +241,7 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,

int diag;

-   eth_drv = (struct eth_driver *)pci_drv;
+   eth_drv = container_of(pci_drv, struct eth_driver, pci_drv);

rte_eal_pci_device_name(&pci_dev->addr, ethdev_name,
sizeof(ethdev_name));
@@ -302,7 +302,7 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
if (eth_dev == NULL)
return -ENODEV;

-   eth_drv = (const struct eth_driver *)pci_dev->driver;
+   eth_drv = container_of(pci_dev->driver, struct eth_driver, pci_drv);

/* Invoke PMD device uninit function */
if (*eth_drv->eth_dev_uninit) {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..f893fe0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1850,7 +1850,7 @@ typedef int (*eth_dev_uninit_t)(struct rte_eth_dev 
*eth_dev);
  * Each Ethernet driver acts as a PCI driver and is represented by a generic
  * *eth_driver* structure that holds:
  *
- * - An *rte_pci_driver* structure (which must be the first field).
+ * - An *rte_pci_driver* structure.
  *
  * - The *eth_dev_init* function invoked for each matching PCI device.
  *
-- 
2.7.4

[dpdk-dev] [PATCH v7 16/21] eal/soc: additional features for SoC

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Additional features introduced:
 - Find kernel driver through sysfs bindings
 - Dummy implementation for mapping to kernel driver
 - DMA coherency value from sysfs
 - Numa node number from sysfs
 - Support for updating device during probe if already registered

Signed-off-by: Jan Viktorin 
[Shreyansh: merge multiple patches into single set]
Signed-off-by: Shreyansh Jain 
---
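The order of checks that this patch adds to the probe path can be summarised by the sketch below. It is an illustration only: the `DEMO_*` names are hypothetical, the NEED_MAPPING and FORCE_UNBIND values mirror the ones defined earlier in the series, and the value used for the non-coherent flag is an assumption because it is not visible here.

```c
#include <stdint.h>

#define DEMO_DRV_NEED_MAPPING 0x0001u
#define DEMO_DRV_FORCE_UNBIND 0x0004u
#define DEMO_DRV_ACCEPT_NONCC 0x0020u /* assumed value, for illustration only */

enum demo_probe_action {
	DEMO_SKIP,       /* device is not DMA coherent and driver refuses it */
	DEMO_MAP,        /* map device resources before probing */
	DEMO_UNBIND,     /* unbind the kernel driver before probing */
	DEMO_PROBE_ONLY  /* nothing to prepare, call probe directly */
};

/* Decide what has to happen before the driver's probe callback runs. */
static enum demo_probe_action
demo_probe_action(uint32_t drv_flags, int is_dma_coherent, int is_primary)
{
	if (!is_dma_coherent && !(drv_flags & DEMO_DRV_ACCEPT_NONCC))
		return DEMO_SKIP;
	if (drv_flags & DEMO_DRV_NEED_MAPPING)
		return DEMO_MAP;
	if ((drv_flags & DEMO_DRV_FORCE_UNBIND) && is_primary)
		return DEMO_UNBIND;
	return DEMO_PROBE_ONLY;
}
```

Note that mapping takes precedence over unbinding, and that unbinding is attempted only in the primary process, as in the hunk for rte_eal_soc_probe_one_driver().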
 lib/librte_eal/common/eal_common_soc.c  |  30 
 lib/librte_eal/common/eal_private.h |  23 ++
 lib/librte_eal/common/include/rte_soc.h |  28 +++
 lib/librte_eal/linuxapp/eal/eal_soc.c   | 129 
 4 files changed, 210 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_soc.c 
b/lib/librte_eal/common/eal_common_soc.c
index 44f5559..29c38e0 100644
--- a/lib/librte_eal/common/eal_common_soc.c
+++ b/lib/librte_eal/common/eal_common_soc.c
@@ -114,6 +114,26 @@ rte_eal_soc_probe_one_driver(struct rte_soc_driver *drv,
return ret;
}

+   if (!dev->is_dma_coherent) {
+   if (!(drv->drv_flags & RTE_SOC_DRV_ACCEPT_NONCC)) {
+   RTE_LOG(DEBUG, EAL,
+   "  device is not DMA coherent, skipping\n");
+   return 1;
+   }
+   }
+
+   if (drv->drv_flags & RTE_SOC_DRV_NEED_MAPPING) {
+   /* map resources */
+   ret = rte_eal_soc_map_device(dev);
+   if (ret)
+   return ret;
+   } else if (drv->drv_flags & RTE_SOC_DRV_FORCE_UNBIND
+   && rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   /* unbind */
+   if (soc_unbind_kernel_driver(dev) < 0)
+   return -1;
+   }
+
dev->driver = drv;
RTE_VERIFY(drv->probe != NULL);
return drv->probe(drv, dev);
@@ -166,6 +186,10 @@ rte_eal_soc_detach_dev(struct rte_soc_driver *drv,
if (drv->remove && (drv->remove(dev) < 0))
return -1;  /* negative value is an error */

+   if (drv->drv_flags & RTE_SOC_DRV_NEED_MAPPING)
+   /* unmap resources for devices */
+   rte_eal_soc_unmap_device(dev);
+
/* clear driver structure */
dev->driver = NULL;

@@ -241,6 +265,12 @@ rte_eal_soc_probe_one(const struct rte_soc_addr *addr)
if (addr == NULL)
return -1;

+   /* update current SoC device in global list, kernel bindings might have
+* changed since last time we looked at it.
+*/
+   if (soc_update_device(addr) < 0)
+   goto err_return;
+
TAILQ_FOREACH(dev, &soc_device_list, next) {
if (rte_eal_compare_soc_addr(&dev->addr, addr))
continue;
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index d810f9f..30c648d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -159,6 +159,29 @@ int pci_update_device(const struct rte_pci_addr *addr);
 int pci_unbind_kernel_driver(struct rte_pci_device *dev);

 /**
+ * Update a soc device object by asking the kernel for the latest information.
+ *
+ * This function is private to EAL.
+ *
+ * @param addr
+ *  The SoC address to look for
+ * @return
+ *   - 0 on success.
+ *   - negative on error.
+ */
+int soc_update_device(const struct rte_soc_addr *addr);
+
+/**
+ * Unbind kernel driver for this device
+ *
+ * This function is private to EAL.
+ *
+ * @return
+ *   0 on success, negative on error
+ */
+int soc_unbind_kernel_driver(struct rte_soc_device *dev);
+
+/**
  * Map the PCI resource of a PCI device in virtual memory
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_soc.h 
b/lib/librte_eal/common/include/rte_soc.h
index 8be3db7..d7f7ec8 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -46,9 +46,11 @@ extern "C" {

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -63,6 +65,14 @@ extern struct soc_device_list soc_device_list;
 TAILQ_HEAD(soc_driver_list, rte_soc_driver); /**< SoC drivers in D-linked Q. */
 TAILQ_HEAD(soc_device_list, rte_soc_device); /**< SoC devices in D-linked Q. */

+#define SOC_MAX_RESOURCE 6
+
+struct rte_soc_resource {
+   uint64_t phys_addr;
+   uint64_t len;
+   void *addr;
+};
+
 struct rte_soc_id {
union {
const char *compatible; /**< OF compatible specification */
@@ -84,8 +94,12 @@ struct rte_soc_device {
struct rte_device device;   /**< Inherit code device */
struct rte_soc_addr addr;   /**< SoC device Location */
struct rte_soc_id *id;  /**< SoC device ID list */
+   struct rte_soc_resource mem_resource[SOC_MAX_RESOURCE];
struct rte_intr_handle intr_handle; /**< Interrupt handle */
struct

[dpdk-dev] [PATCH v7 15/21] eal/soc: add default scan for Soc devices

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Default implementation which scans the sysfs platform devices hierarchy.
For each device, extract the uevent and convert it into an rte_soc_device.

The information populated can then be used in probe to match against
the drivers registered.

Signed-off-by: Jan Viktorin 
[Shreyansh: restructure commit to be an optional implementation]
Signed-off-by: Shreyansh Jain 

--
 v5:
 - Update rte_eal_soc_scan to rte_eal_soc_scan_platform_bus
 - Fix comments over scan and match functions
---
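As a rough illustration of the uevent lookup step, the sketch below scans a uevent buffer line by line for a key. It is not the dev_uevent_find() from this patch (which uses strstr() and checks the preceding character); `demo_uevent_find` is a hypothetical name, and the sample key in the usage note is only an example of the KEY=value layout.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the text following "key" on the first line of
 * "uevent" that starts with "key", or NULL when no line starts with it.
 */
static const char *
demo_uevent_find(const char *uevent, const char *key)
{
	const size_t keylen = strlen(key);
	const char *line = uevent;

	while (*line != '\0') {
		const char *nl;

		if (strncmp(line, key, keylen) == 0)
			return line + keylen;

		/* advance to the start of the next line, if any */
		nl = strchr(line, '\n');
		if (nl == NULL)
			break;
		line = nl + 1;
	}
	return NULL;
}
```

For a buffer such as "DRIVER=demo\nOF_COMPATIBLE_0=vendor,dev\n", looking up "OF_COMPATIBLE_0=" yields a pointer to "vendor,dev\n"; the caller still has to cut the value at the newline.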
 lib/librte_eal/common/include/rte_soc.h |  16 +-
 lib/librte_eal/linuxapp/eal/eal_soc.c   | 315 
 2 files changed, 329 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_soc.h 
b/lib/librte_eal/common/include/rte_soc.h
index 38f897d..8be3db7 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -64,7 +64,10 @@ TAILQ_HEAD(soc_driver_list, rte_soc_driver); /**< SoC 
drivers in D-linked Q. */
 TAILQ_HEAD(soc_device_list, rte_soc_device); /**< SoC devices in D-linked Q. */

 struct rte_soc_id {
-   const char *compatible; /**< OF compatible specification */
+   union {
+   const char *compatible; /**< OF compatible specification */
+   char *_compatible;
+   };
uint64_t priv_data; /**< SoC Driver specific data */
 };

@@ -200,7 +203,16 @@ rte_eal_parse_soc_spec(const char *spec, struct 
rte_soc_addr *addr)
 }

 /**
- * Default function for matching the Soc driver with device. Each driver can
+ * Helper function for scanning for new SoC devices on platform bus.
+ *
+ * @return
+ * 0 on success
+ * !0 on failure to scan
+ */
+int rte_eal_soc_scan_platform_bus(void);
+
+/**
+ * Helper function for matching the Soc driver with device. Each driver can
  * either use this function or define their own soc matching function.
  * This function relies on the compatible string extracted from sysfs. But,
  * a SoC might have different way of identifying its devices. Such SoC can
diff --git a/lib/librte_eal/linuxapp/eal/eal_soc.c 
b/lib/librte_eal/linuxapp/eal/eal_soc.c
index 3929a76..d8dfe97 100644
--- a/lib/librte_eal/linuxapp/eal/eal_soc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_soc.c
@@ -48,6 +48,321 @@
 #include 
 #include 

+/** Pathname of SoC devices directory. */
+#define SYSFS_SOC_DEVICES "/sys/bus/platform/devices"
+
+static const char *
+soc_get_sysfs_path(void)
+{
+   const char *path = NULL;
+
+   path = getenv("SYSFS_SOC_DEVICES");
+   if (path == NULL)
+   return SYSFS_SOC_DEVICES;
+
+   return path;
+}
+
+static char *
+dev_read_uevent(const char *dirname)
+{
+   char filename[PATH_MAX];
+   struct stat st;
+   char *buf;
+   ssize_t total = 0;
+   int fd;
+
+   snprintf(filename, sizeof(filename), "%s/uevent", dirname);
+   fd = open(filename, O_RDONLY);
+   if (fd < 0) {
+   RTE_LOG(WARNING, EAL, "Failed to open file %s\n", filename);
+   return strdup("");
+   }
+
+   if (fstat(fd, &st) < 0) {
+   RTE_LOG(ERR, EAL, "Failed to stat file %s\n", filename);
+   close(fd);
+   return NULL;
+   }
+
+   if (st.st_size == 0) {
+   close(fd);
+   return strdup("");
+   }
+
+   buf = malloc(st.st_size + 1);
+   if (buf == NULL) {
+   RTE_LOG(ERR, EAL, "Failed to alloc memory to read %s\n",
+   filename);
+   close(fd);
+   return NULL;
+   }
+
+   while (total < st.st_size) {
+   ssize_t rlen = read(fd, buf + total, st.st_size - total);
+   if (rlen < 0) {
+   if (errno == EINTR)
+   continue;
+
+   RTE_LOG(ERR, EAL, "Failed to read file %s\n", filename);
+
+   free(buf);
+   close(fd);
+   return NULL;
+   }
+   if (rlen == 0) /* EOF */
+   break;
+
+   total += rlen;
+   }
+
+   buf[total] = '\0';
+   close(fd);
+
+   return buf;
+}
+
+static const char *
+dev_uevent_find(const char *uevent, const char *key)
+{
+   const size_t keylen = strlen(key);
+   const size_t total = strlen(uevent);
+   const char *p = uevent;
+
+   /* check whether it is the first key */
+   if (!strncmp(uevent, key, keylen))
+   return uevent + keylen;
+
+   /* check 2nd key or further... */
+   do {
+   p = strstr(p, key);
+   if (p == NULL)
+   break;
+
+   if (p[-1] == '\n') /* check we are at a new line */
+   return p + keylen;
+
+   p += keylen; /* skip this one */
+   } while (p - uevent < (ptrdiff_t) total);
+
+   return NULL;
+}
+
+static char *
+strdup_until_nl(const

[dpdk-dev] [PATCH v7 14/21] eal/soc: add intr_handle

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 lib/librte_eal/common/include/rte_soc.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_soc.h 
b/lib/librte_eal/common/include/rte_soc.h
index 40490b9..38f897d 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -53,6 +53,7 @@ extern "C" {
 #include 
 #include 
 #include 
+#include 

 extern struct soc_driver_list soc_driver_list;
 /**< Global list of SoC Drivers */
@@ -80,6 +81,7 @@ struct rte_soc_device {
struct rte_device device;   /**< Inherit code device */
struct rte_soc_addr addr;   /**< SoC device Location */
struct rte_soc_id *id;  /**< SoC device ID list */
+   struct rte_intr_handle intr_handle; /**< Interrupt handle */
struct rte_soc_driver *driver;  /**< Associated driver */
 };

-- 
2.7.4

[dpdk-dev] [PATCH v7 13/21] eal/soc: add drv_flags

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

The flags are copied from the PCI ones. They should be refactored into a
general set of flags in the future.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 lib/librte_eal/common/include/rte_soc.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_soc.h 
b/lib/librte_eal/common/include/rte_soc.h
index fb5ea7b..40490b9 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -123,8 +123,18 @@ struct rte_soc_driver {
soc_scan_t *scan_fn;   /**< Callback for scanning SoC bus*/
soc_match_t *match_fn; /**< Callback to match dev<->drv */
const struct rte_soc_id *id_table; /**< ID table, NULL terminated */
+   uint32_t drv_flags;/**< Control handling of device */
 };

+/** Device needs to map its resources by EAL */
+#define RTE_SOC_DRV_NEED_MAPPING 0x0001
+/** Device needs to be unbound even if no module is provided */
+#define RTE_SOC_DRV_FORCE_UNBIND 0x0004
+/** Device driver supports link state interrupt */
+#define RTE_SOC_DRV_INTR_LSC0x0008
+/** Device driver supports detaching capability */
+#define RTE_SOC_DRV_DETACHABLE  0x0010
+
 /**
  * Utility function to write a SoC device name, this device name can later be
  * used to retrieve the corresponding rte_soc_addr using above functions.
-- 
2.7.4

[dpdk-dev] [PATCH v7 12/21] eal/soc: extend and utilize devargs

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

It is assumed that SoC Devices provided on command line are prefixed with
"soc:". This patch adds parse and attach support for such devices.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
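The prefix convention can be illustrated with a small sketch. `demo_soc_spec_name` is a hypothetical helper, not rte_eal_parse_soc_spec(); the real function fills an rte_soc_addr and, judging by the free(soc_addr.name) calls in this patch, allocates the name, whereas the sketch just returns a pointer into the input.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SOC_PREFIX "soc:"

/*
 * Return the device name that follows the "soc:" prefix, or NULL when
 * the string is not a SoC device specification.
 */
static const char *
demo_soc_spec_name(const char *spec)
{
	const size_t plen = strlen(DEMO_SOC_PREFIX);

	if (spec == NULL || strncmp(spec, DEMO_SOC_PREFIX, plen) != 0)
		return NULL;
	if (spec[plen] == '\0')
		return NULL; /* prefix only, no device name */
	return spec + plen;
}
```

Trying the SoC form first and falling back to PCI and then vdev parsing matches the order used in rte_eal_dev_attach() in this patch.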
 lib/librte_eal/common/eal_common_dev.c  | 27 +
 lib/librte_eal/common/eal_common_devargs.c  | 17 
 lib/librte_eal/common/eal_common_soc.c  | 61 -
 lib/librte_eal/common/include/rte_devargs.h |  8 
 lib/librte_eal/common/include/rte_soc.h | 24 
 5 files changed, 120 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 457d227..ebbcf47 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -107,17 +107,23 @@ rte_eal_dev_init(void)

 int rte_eal_dev_attach(const char *name, const char *devargs)
 {
-   struct rte_pci_addr addr;
+   struct rte_soc_addr soc_addr;
+   struct rte_pci_addr pci_addr;

if (name == NULL || devargs == NULL) {
RTE_LOG(ERR, EAL, "Invalid device or arguments provided\n");
return -EINVAL;
}

-   if (eal_parse_pci_DomBDF(name, &addr) == 0) {
-   if (rte_eal_pci_probe_one(&addr) < 0)
+   memset(&soc_addr, 0, sizeof(soc_addr));
+   if (rte_eal_parse_soc_spec(name, &soc_addr) == 0) {
+   if (rte_eal_soc_probe_one(&soc_addr) < 0) {
+   free(soc_addr.name);
+   goto err;
+   }
+   } else if (eal_parse_pci_DomBDF(name, &pci_addr) == 0) {
+   if (rte_eal_pci_probe_one(&pci_addr) < 0)
goto err;
-
} else {
if (rte_eal_vdev_init(name, devargs))
goto err;
@@ -132,15 +138,22 @@ err:

 int rte_eal_dev_detach(const char *name)
 {
-   struct rte_pci_addr addr;
+   struct rte_soc_addr soc_addr;
+   struct rte_pci_addr pci_addr;

if (name == NULL) {
RTE_LOG(ERR, EAL, "Invalid device provided.\n");
return -EINVAL;
}

-   if (eal_parse_pci_DomBDF(name, &addr) == 0) {
-   if (rte_eal_pci_detach(&addr) < 0)
+   memset(&soc_addr, 0, sizeof(soc_addr));
+   if (rte_eal_parse_soc_spec(name, &soc_addr) == 0) {
+   if (rte_eal_soc_detach(&soc_addr) < 0) {
+   free(soc_addr.name);
+   goto err;
+   }
+   } else if (eal_parse_pci_DomBDF(name, &pci_addr) == 0) {
+   if (rte_eal_pci_detach(&pci_addr) < 0)
goto err;
} else {
if (rte_eal_vdev_uninit(name))
diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index e403717..e1dae1a 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -41,6 +41,7 @@
 #include 

 #include 
+#include 
 #include 
 #include "eal_private.h"

@@ -105,6 +106,14 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
goto fail;

break;
+
+   case RTE_DEVTYPE_WHITELISTED_SOC:
+   case RTE_DEVTYPE_BLACKLISTED_SOC:
+   /* try to parse soc device with prefix "soc:" */
+   if (rte_eal_parse_soc_spec(buf, &devargs->soc.addr) != 0)
+   goto fail;
+   break;
+
case RTE_DEVTYPE_VIRTUAL:
/* save driver name */
ret = snprintf(devargs->virt.drv_name,
@@ -166,6 +175,14 @@ rte_eal_devargs_dump(FILE *f)
   devargs->pci.addr.devid,
   devargs->pci.addr.function,
   devargs->args);
+   else if (devargs->type == RTE_DEVTYPE_WHITELISTED_SOC)
+   fprintf(f, "  SoC whitelist %s %s\n",
+  devargs->soc.addr.name,
+  devargs->soc.addr.fdt_path);
+   else if (devargs->type == RTE_DEVTYPE_BLACKLISTED_SOC)
+   fprintf(f, "  SoC blacklist %s %s\n",
+  devargs->soc.addr.name,
+  devargs->soc.addr.fdt_path);
else if (devargs->type == RTE_DEVTYPE_VIRTUAL)
fprintf(f, "  VIRTUAL %s %s\n",
   devargs->virt.drv_name,
diff --git a/lib/librte_eal/common/eal_common_soc.c 
b/lib/librte_eal/common/eal_common_soc.c
index 256cef8..44f5559 100644
--- a/lib/librte_eal/common/eal_common_soc.c
+++ b/lib/librte_eal/common/eal_common_soc.c
@@ -37,6 +37,8 @@

 #include 
 #include 
+#include 
+#include 
 #include 

 #include "eal_private.h"
@@ -70,6 +72,21 @@ rte_eal_soc_match_compat(struct rte_soc_driver *drv,
return 1;
 }

+static struct rte_devargs *soc_devargs_lookup(struct rte_soc_device *dev)
+{
+

[dpdk-dev] [PATCH v7 11/21] eal/soc: implement probing of drivers

2016-10-28 Thread Shreyansh Jain

Each SoC PMD registers a set of callbacks for scanning its own bus/infra and
matching devices to drivers when probe is called.
This patch introduces the infra for calls to SoC scan on rte_eal_soc_init()
and match on rte_eal_soc_probe().

The patch also adds test cases for scan and probe.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
--
v4:
 - Update test_soc for descriptive test function names
 - Comments over test functions
 - devinit and devuninint --> probe/remove
 - RTE_VERIFY at some places
---
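The match-then-probe flow described above can be sketched as follows. Everything here is a hypothetical simplification (arrays instead of the global TAILQ lists, `demo_*` names, no scan step); the convention that a match callback returns 0 on a match follows the doc comments in this patch and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_driver;

struct demo_device {
	const char *name;
	const struct demo_driver *driver; /* set once a driver claims it */
};

struct demo_driver {
	const char *name;
	/* returns 0 when drv can handle dev, non-zero otherwise */
	int (*match_fn)(const struct demo_driver *drv,
			const struct demo_device *dev);
	/* returns 0 on success */
	int (*probe)(const struct demo_driver *drv, struct demo_device *dev);
};

/* Offer every device to every driver; return how many devices were bound. */
static size_t
demo_probe_all(const struct demo_driver *drvs, size_t n_drv,
	       struct demo_device *devs, size_t n_dev)
{
	size_t i, j, bound = 0;

	for (i = 0; i < n_dev; i++) {
		for (j = 0; j < n_drv; j++) {
			const struct demo_driver *drv = &drvs[j];

			assert(drv->match_fn != NULL && drv->probe != NULL);
			if (drv->match_fn(drv, &devs[i]) != 0)
				continue; /* not this driver's device */
			if (drv->probe(drv, &devs[i]) != 0)
				continue; /* probe failed, try the next driver */
			devs[i].driver = drv;
			bound++;
			break;
		}
	}
	return bound;
}

/* Example callbacks: match on identical names, probe always succeeds. */
static int
demo_match_by_name(const struct demo_driver *drv, const struct demo_device *dev)
{
	return strcmp(drv->name, dev->name) == 0 ? 0 : 1;
}

static int
demo_probe_ok(const struct demo_driver *drv, struct demo_device *dev)
{
	(void)drv;
	(void)dev;
	return 0;
}
```

Keeping the match policy in a per-driver callback lets a SoC with its own device identification scheme avoid the default compatible-string comparison.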
 app/test/test_soc.c | 205 ++-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   4 +
 lib/librte_eal/common/eal_common_soc.c  | 213 +++-
 lib/librte_eal/common/include/rte_soc.h |  75 -
 lib/librte_eal/linuxapp/eal/eal.c   |   5 +
 lib/librte_eal/linuxapp/eal/eal_soc.c   |  21 ++-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   4 +
 7 files changed, 519 insertions(+), 8 deletions(-)

diff --git a/app/test/test_soc.c b/app/test/test_soc.c
index ac03e64..b587d5e 100644
--- a/app/test/test_soc.c
+++ b/app/test/test_soc.c
@@ -87,14 +87,65 @@ static int test_compare_addr(void)
  */
 struct test_wrapper {
struct rte_soc_driver soc_drv;
+   struct rte_soc_device soc_dev;
 };

+static int empty_pmd0_probe(struct rte_soc_driver *drv,
+ struct rte_soc_device *dev);
+static int empty_pmd0_remove(struct rte_soc_device *dev);
+
+static void always_find_dev0_cb(void);
+static int match_dev0_by_name(struct rte_soc_driver *drv,
+ struct rte_soc_device *dev);
+
+static void always_find_dev1_cb(void);
+static int match_dev1_by_name(struct rte_soc_driver *drv,
+ struct rte_soc_device *dev);
+
+/**
+ * Dummy probe handler for PMD driver 'pmd0'.
+ *
+ * @param drv
+ * driver object
+ * @param dev
+ * device object
+ * @return
+ * 0 on success
+ */
+static int
+empty_pmd0_probe(struct rte_soc_driver *drv __rte_unused,
+  struct rte_soc_device *dev __rte_unused)
+{
+   return 0;
+}
+
+/**
+ * Remove handler for PMD driver 'pmd0'.
+ *
+ * @param dev
+ * device to remove
+ * @return
+ * 0 on success
+ */
+static int
+empty_pmd0_remove(struct rte_soc_device *dev)
+{
+   /* Release the memory associated with dev->addr.name */
+   free(dev->addr.name);
+
+   return 0;
+}
+
 struct test_wrapper empty_pmd0 = {
.soc_drv = {
.driver = {
.name = "empty_pmd0"
},
-   },
+   .probe = empty_pmd0_probe,
+   .remove = empty_pmd0_remove,
+   .scan_fn = always_find_dev0_cb,
+   .match_fn = match_dev0_by_name,
+   }
 };

 struct test_wrapper empty_pmd1 = {
@@ -102,9 +153,87 @@ struct test_wrapper empty_pmd1 = {
.driver = {
.name = "empty_pmd1"
},
+   .scan_fn = always_find_dev1_cb,
+   .match_fn = match_dev1_by_name,
},
 };

+/**
+ * Bus scan by PMD 'pmd0' for adding device 'dev0'
+ *
+ * @param void
+ * @return void
+ */
+static void
+always_find_dev0_cb(void)
+{
+   /* SoC's scan would scan devices on its bus and add to
+* soc_device_list
+*/
+   empty_pmd0.soc_dev.addr.name = strdup("empty_pmd0_dev");
+
+   TAILQ_INSERT_TAIL(&soc_device_list, &empty_pmd0.soc_dev, next);
+}
+
+/**
+ * Match device 'dev0' with driver PMD pmd0
+ *
+ * @param drv
+ * Driver with this matching needs to be done; unused here
+ * @param dev
+ * device to be matched against driver
+ * @return
+ * 0 on successful matched
+ * 1 if driver<=>device don't match
+ */
+static int
+match_dev0_by_name(struct rte_soc_driver *drv __rte_unused,
+  struct rte_soc_device *dev)
+{
+   if (!dev->addr.name || strcmp(dev->addr.name, "empty_pmd0_dev"))
+   return 0;
+
+   return 1;
+}
+
+/**
+ * Bus scan by PMD 'pmd1' for adding device 'dev1'
+ *
+ * @param void
+ * @return void
+ */
+static void
+always_find_dev1_cb(void)
+{
+   /* SoC's scan would scan devices on its bus and add to
+* soc_device_list
+*/
+   empty_pmd1.soc_dev.addr.name = strdup("empty_pmd1_dev");
+
+   TAILQ_INSERT_TAIL(&soc_device_list, &empty_pmd1.soc_dev, next);
+}
+
+/**
+ * Match device 'dev1' with driver PMD pmd1
+ *
+ * @param drv
+ * Driver with this matching needs to be done; unused here
+ * @param dev
+ * device to be matched against driver
+ * @return
+ * 0 on successful matched
+ * 1 if driver<=>device don't match
+ */
+static int
+match_dev1_by_name(struct rte_soc_driver *drv __rte_unused,
+  struct rte_soc_device *dev)
+{
+   if (!dev->addr.name || strcmp(dev->addr.name, "empty_pmd1_dev"))
+   return 0;
+
+   return 1;
+}
+
 static int

[dpdk-dev] [PATCH v7 10/21] eal/soc: init SoC infra from EAL

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 lib/librte_eal/bsdapp/eal/Makefile|  1 +
 lib/librte_eal/bsdapp/eal/eal.c   |  4 +++
 lib/librte_eal/bsdapp/eal/eal_soc.c   | 46 
 lib/librte_eal/common/eal_private.h   | 10 +++
 lib/librte_eal/linuxapp/eal/Makefile  |  1 +
 lib/librte_eal/linuxapp/eal/eal.c |  3 ++
 lib/librte_eal/linuxapp/eal/eal_soc.c | 56 +++
 7 files changed, 121 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_soc.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_soc.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index a15b762..42b3a2b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -56,6 +56,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_memory.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_hugepage_info.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_soc.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 9b93da3..2d62b9d 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -564,6 +565,9 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_init() < 0)
rte_panic("Cannot init PCI\n");

+   if (rte_eal_soc_init() < 0)
+   rte_panic("Cannot init SoC\n");
+
eal_check_mem_on_local_socket();

if (eal_plugins_init() < 0)
diff --git a/lib/librte_eal/bsdapp/eal/eal_soc.c 
b/lib/librte_eal/bsdapp/eal/eal_soc.c
new file mode 100644
index 000..cb297ff
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_soc.c
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 RehiveTech. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of RehiveTech nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/* Init the SoC EAL subsystem */
+int
+rte_eal_soc_init(void)
+{
+   return 0;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 0e8d6f7..d810f9f 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -122,6 +122,16 @@ int rte_eal_pci_init(void);
 struct rte_soc_driver;
 struct rte_soc_device;

+/**
+ * Init the SoC infra.
+ *
+ * This function is private to EAL.
+ *
+ * @return
+ *   0 on success, negative on error
+ */
+int rte_eal_soc_init(void);
+
 struct rte_pci_driver;
 struct rte_pci_device;

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index a520477..59e30fa 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -65,6 +65,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_soc.c

[dpdk-dev] [PATCH v7 09/21] eal: introduce command line enable SoC option

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Support --enable-soc. SoC support is disabled by default.

Signed-off-by: Jan Viktorin 
[Shreyansh: Change --no-soc to --enable-soc; disabled by default]
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
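For readers unfamiliar with how a flag-style long option reaches a config field, here is a minimal getopt_long() sketch. It is not the EAL parser: `demo_parse_enable_soc` and the option number are hypothetical, and only the option string "enable-soc" is taken from the patch.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option number, chosen above the range of short options. */
enum { DEMO_OPT_ENABLE_SOC_NUM = 256 };

/*
 * Return 1 if --enable-soc is present, 0 if it is absent (the default),
 * and -1 on an unrecognised option.
 */
static int
demo_parse_enable_soc(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "enable-soc", no_argument, NULL, DEMO_OPT_ENABLE_SOC_NUM },
		{ NULL, 0, NULL, 0 },
	};
	int enable_soc = 0; /* SoC support is disabled by default */
	int opt;

	optind = 1; /* start scanning from the first argument */
	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (opt == DEMO_OPT_ENABLE_SOC_NUM)
			enable_soc = 1;
		else
			return -1;
	}
	return enable_soc;
}
```

Making the option opt-in means existing command lines keep their behaviour and the SoC scan runs only when it is requested.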
 doc/guides/testpmd_app_ug/run_app.rst  | 4 
 lib/librte_eal/common/eal_common_options.c | 5 +
 lib/librte_eal/common/eal_internal_cfg.h   | 1 +
 lib/librte_eal/common/eal_options.h| 2 ++
 4 files changed, 12 insertions(+)

diff --git a/doc/guides/testpmd_app_ug/run_app.rst 
b/doc/guides/testpmd_app_ug/run_app.rst
index d7c5120..4dafe5f 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -156,6 +156,10 @@ See the DPDK Getting Started Guides for more information 
on these options.

 Use malloc instead of hugetlbfs.

+*   ``--enable-soc``
+
+Enable SoC framework support
+

 Testpmd Command-line Options
 
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 6ca8af1..2156ab3 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -75,6 +75,7 @@ const struct option
 eal_long_options[] = {
{OPT_BASE_VIRTADDR, 1, NULL, OPT_BASE_VIRTADDR_NUM},
{OPT_CREATE_UIO_DEV,0, NULL, OPT_CREATE_UIO_DEV_NUM   },
+   {OPT_ENABLE_SOC,0, NULL, OPT_ENABLE_SOC_NUM   },
{OPT_FILE_PREFIX,   1, NULL, OPT_FILE_PREFIX_NUM  },
{OPT_HELP,  0, NULL, OPT_HELP_NUM },
{OPT_HUGE_DIR,  1, NULL, OPT_HUGE_DIR_NUM },
@@ -843,6 +844,10 @@ eal_parse_common_option(int opt, const char *optarg,
break;

/* long options */
+   case OPT_ENABLE_SOC_NUM:
+   conf->enable_soc = 1;
+   break;
+
case OPT_HUGE_UNLINK_NUM:
conf->hugepage_unlink = 1;
break;
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index 5f1367e..2a6e3ea 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -67,6 +67,7 @@ struct internal_config {
unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen 
Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
+   volatile unsigned enable_soc; /**< true to enable SoC */
volatile unsigned no_hpet;/**< true to disable HPET */
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping

* instead of native TSC */
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index a881c62..6e679c3 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -49,6 +49,8 @@ enum {
OPT_BASE_VIRTADDR_NUM,
 #define OPT_CREATE_UIO_DEV"create-uio-dev"
OPT_CREATE_UIO_DEV_NUM,
+#define OPT_ENABLE_SOC"enable-soc"
+   OPT_ENABLE_SOC_NUM,
 #define OPT_FILE_PREFIX   "file-prefix"
OPT_FILE_PREFIX_NUM,
 #define OPT_HUGE_DIR  "huge-dir"
-- 
2.7.4
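
For readers who want to see the option handling in isolation, here is a
minimal sketch of how a flag-style long option such as --enable-soc can be
parsed with getopt_long(). The names demo_config, demo_long_options and
demo_parse_args, as well as the enum base value, are assumptions made for
this illustration; they are not the EAL definitions changed by the patch.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EAL option name and option number. */
#define OPT_ENABLE_SOC "enable-soc"
enum {
	DEMO_OPT_LONG_MIN_NUM = 256,	/* keep clear of short option chars */
	OPT_ENABLE_SOC_NUM,
};

/* Hypothetical, reduced configuration structure. */
struct demo_config {
	unsigned enable_soc;	/* non-zero when --enable-soc was given */
};

/* Long option table: name, has_arg (0 = none), flag, value returned. */
static const struct option demo_long_options[] = {
	{OPT_ENABLE_SOC, 0, NULL, OPT_ENABLE_SOC_NUM},
	{NULL, 0, NULL, 0},
};

/* Parse argv; SoC support stays disabled unless the option is present. */
static int
demo_parse_args(int argc, char **argv, struct demo_config *conf)
{
	int opt;

	conf->enable_soc = 0;
	optind = 1;	/* restart scanning so the function can be reused */

	while ((opt = getopt_long(argc, argv, "", demo_long_options,
				  NULL)) != -1) {
		switch (opt) {
		case OPT_ENABLE_SOC_NUM:
			conf->enable_soc = 1;
			break;
		default:
			return -1;	/* unknown option */
		}
	}

	return 0;
}
```

Calling demo_parse_args() with "--enable-soc" in argv sets enable_soc to 1;
without it the field stays 0, which mirrors the disabled-by-default
behaviour described in the commit message.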

[dpdk-dev] [PATCH v7 08/21] eal/soc: implement SoC device list and dump

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

SoC devices would be linked in a separate list (from PCI). This is used for
probe function.
A helper for dumping the device list is added.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  2 ++
 lib/librte_eal/common/eal_common_soc.c  | 34 +
 lib/librte_eal/common/include/rte_soc.h |  9 +++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
 4 files changed, 47 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index cf6fb8e..86e3cfd 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -171,11 +171,13 @@ DPDK_16.11 {
rte_eal_dev_attach;
rte_eal_dev_detach;
rte_eal_map_resource;
+   rte_eal_soc_dump;
rte_eal_soc_register;
rte_eal_soc_unregister;
rte_eal_unmap_resource;
rte_eal_vdrv_register;
rte_eal_vdrv_unregister;
+   soc_device_list;
soc_driver_list;

 } DPDK_16.07;
diff --git a/lib/librte_eal/common/eal_common_soc.c 
b/lib/librte_eal/common/eal_common_soc.c
index 56135ed..5dcddc5 100644
--- a/lib/librte_eal/common/eal_common_soc.c
+++ b/lib/librte_eal/common/eal_common_soc.c
@@ -31,6 +31,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include 
+#include 
 #include 

 #include 
@@ -40,6 +42,38 @@
 /* Global SoC driver list */
 struct soc_driver_list soc_driver_list =
TAILQ_HEAD_INITIALIZER(soc_driver_list);
+struct soc_device_list soc_device_list =
+   TAILQ_HEAD_INITIALIZER(soc_device_list);
+
+/* dump one device */
+static int
+soc_dump_one_device(FILE *f, struct rte_soc_device *dev)
+{
+   int i;
+
+   fprintf(f, "%s", dev->addr.name);
+   fprintf(f, " - fdt_path: %s\n",
+   dev->addr.fdt_path ? dev->addr.fdt_path : "(none)");
+
+   for (i = 0; dev->id && dev->id[i].compatible; ++i)
+   fprintf(f, "   %s\n", dev->id[i].compatible);
+
+   return 0;
+}
+
+/* dump devices on the bus to an output stream */
+void
+rte_eal_soc_dump(FILE *f)
+{
+   struct rte_soc_device *dev = NULL;
+
+   if (!f)
+   return;
+
+   TAILQ_FOREACH(dev, &soc_device_list, next) {
+   soc_dump_one_device(f, dev);
+   }
+}

 /* register a driver */
 void
diff --git a/lib/librte_eal/common/include/rte_soc.h 
b/lib/librte_eal/common/include/rte_soc.h
index 23b06a9..347e611 100644
--- a/lib/librte_eal/common/include/rte_soc.h
+++ b/lib/librte_eal/common/include/rte_soc.h
@@ -56,8 +56,12 @@ extern "C" {

 extern struct soc_driver_list soc_driver_list;
 /**< Global list of SoC Drivers */
+extern struct soc_device_list soc_device_list;
+/**< Global list of SoC Devices */

 TAILQ_HEAD(soc_driver_list, rte_soc_driver); /**< SoC drivers in D-linked Q. */
+TAILQ_HEAD(soc_device_list, rte_soc_device); /**< SoC devices in D-linked Q. */
+

 struct rte_soc_id {
const char *compatible; /**< OF compatible specification */
@@ -142,6 +146,11 @@ rte_eal_compare_soc_addr(const struct rte_soc_addr *a0,
 }

 /**
+ * Dump discovered SoC devices.
+ */
+void rte_eal_soc_dump(FILE *f);
+
+/**
  * Register a SoC driver.
  */
 void rte_eal_soc_register(struct rte_soc_driver *driver);
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index ab6b985..0155025 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -175,11 +175,13 @@ DPDK_16.11 {
rte_eal_dev_attach;
rte_eal_dev_detach;
rte_eal_map_resource;
+   rte_eal_soc_dump;
rte_eal_soc_register;
rte_eal_soc_unregister;
rte_eal_unmap_resource;
rte_eal_vdrv_register;
rte_eal_vdrv_unregister;
+   soc_device_list;
soc_driver_list;

 } DPDK_16.07;
-- 
2.7.4
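
The list-and-dump pattern added by this patch can be tried outside of DPDK
with plain <sys/queue.h>. The sketch below is only an illustration: the
demo_* names and the reduced device structure are assumptions, and the
return value of demo_soc_dump() is added here for convenience (the
function in the patch returns void).

```c
#include <stdio.h>
#include <sys/queue.h>

/* Hypothetical, simplified device; the real structure has more fields. */
struct demo_soc_device {
	TAILQ_ENTRY(demo_soc_device) next;	/* linkage within the list */
	const char *name;
	const char *fdt_path;			/* may be NULL */
};

TAILQ_HEAD(demo_soc_device_list, demo_soc_device);

/* Device list kept separate from any other bus, as in the patch. */
static struct demo_soc_device_list demo_device_list =
	TAILQ_HEAD_INITIALIZER(demo_device_list);

static void
demo_soc_add_device(struct demo_soc_device *dev)
{
	TAILQ_INSERT_TAIL(&demo_device_list, dev, next);
}

static void
demo_soc_remove_device(struct demo_soc_device *dev)
{
	TAILQ_REMOVE(&demo_device_list, dev, next);
}

/* Dump every device; returns the number written, or -1 if f is NULL. */
static int
demo_soc_dump(FILE *f)
{
	struct demo_soc_device *dev;
	int count = 0;

	if (f == NULL)
		return -1;

	TAILQ_FOREACH(dev, &demo_device_list, next) {
		fprintf(f, "%s - fdt_path: %s\n", dev->name,
			dev->fdt_path ? dev->fdt_path : "(none)");
		count++;
	}

	return count;
}
```

Because the list entry is embedded in the device itself, adding and
removing devices needs no allocation; the caller owns the storage.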

[dpdk-dev] [PATCH v7 07/21] eal/soc: add SoC PMD register/unregister logic

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Register a SoC driver through the helper RTE_PMD_REGISTER_SOC
(along the lines of RTE_PMD_REGISTER_PCI). soc_driver_list stores all
registered drivers.

A test case has been introduced to verify registration and
deregistration.

Signed-off-by: Jan Viktorin 
[Shreyansh: update PMD registration method]
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 app/test/test_soc.c | 111 
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   3 +
 lib/librte_eal/common/eal_common_soc.c  |  56 
 lib/librte_eal/common/include/rte_soc.h |  26 ++
 lib/librte_eal/linuxapp/eal/Makefile|   1 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   3 +
 6 files changed, 200 insertions(+)
 create mode 100644 lib/librte_eal/common/eal_common_soc.c

diff --git a/app/test/test_soc.c b/app/test/test_soc.c
index 916a863..ac03e64 100644
--- a/app/test/test_soc.c
+++ b/app/test/test_soc.c
@@ -75,6 +75,108 @@ static int test_compare_addr(void)
free(a2.name);
free(a1.name);
free(a0.name);
+
+   return 0;
+}
+
+/**
+ * Empty PMD driver based on the SoC infra.
+ *
+ * The rte_soc_device is usually wrapped in some higher-level struct
+ * (eth_driver). We simulate such a wrapper with an anonymous struct here.
+ */
+struct test_wrapper {
+   struct rte_soc_driver soc_drv;
+};
+
+struct test_wrapper empty_pmd0 = {
+   .soc_drv = {
+   .driver = {
+   .name = "empty_pmd0"
+   },
+   },
+};
+
+struct test_wrapper empty_pmd1 = {
+   .soc_drv = {
+   .driver = {
+   .name = "empty_pmd1"
+   },
+   },
+};
+
+static int
+count_registered_socdrvs(void)
+{
+   int i;
+   struct rte_soc_driver *drv;
+
+   i = 0;
+   TAILQ_FOREACH(drv, &soc_driver_list, next)
+   i += 1;
+
+   return i;
+}
+
+static int
+test_register_unregister(void)
+{
+   struct rte_soc_driver *drv;
+   int count;
+
+   rte_eal_soc_register(&empty_pmd0.soc_drv);
+
+   TEST_ASSERT(!TAILQ_EMPTY(&soc_driver_list),
+   "No PMD is present but the empty_pmd0 should be there");
+   drv = TAILQ_FIRST(&soc_driver_list);
+   TEST_ASSERT(!strcmp(drv->driver.name, "empty_pmd0"),
+   "The registered PMD is not empty_pmd0 but '%s'",
+   drv->driver.name);
+
+   rte_eal_soc_register(&empty_pmd1.soc_drv);
+
+   count = count_registered_socdrvs();
+   TEST_ASSERT_EQUAL(count, 2, "Expected 2 PMDs but detected %d", count);
+
+   rte_eal_soc_unregister(&empty_pmd0.soc_drv);
+   count = count_registered_socdrvs();
+   TEST_ASSERT_EQUAL(count, 1, "Expected 1 PMDs but detected %d", count);
+
+   rte_eal_soc_unregister(&empty_pmd1.soc_drv);
+
+   printf("%s has been successful\n", __func__);
+   return 0;
+}
+
+/* save real devices and drivers until the tests finishes */
+struct soc_driver_list real_soc_driver_list =
+   TAILQ_HEAD_INITIALIZER(real_soc_driver_list);
+
+static int test_soc_setup(void)
+{
+   struct rte_soc_driver *drv;
+
+   /* no real drivers for the test */
+   while (!TAILQ_EMPTY(&soc_driver_list)) {
+   drv = TAILQ_FIRST(&soc_driver_list);
+   rte_eal_soc_unregister(drv);
+   TAILQ_INSERT_TAIL(&real_soc_driver_list, drv, next);
+   }
+
+   return 0;
+}
+
+static int test_soc_cleanup(void)
+{
+   struct rte_soc_driver *drv;
+
+   /* bring back real drivers after the test */
+   while (!TAILQ_EMPTY(&real_soc_driver_list)) {
+   drv = TAILQ_FIRST(&real_soc_driver_list);
+   TAILQ_REMOVE(&real_soc_driver_list, drv, next);
+   rte_eal_soc_register(drv);
+   }
+
return 0;
 }

@@ -84,6 +186,15 @@ test_soc(void)
if (test_compare_addr())
return -1;

+   if (test_soc_setup())
+   return -1;
+
+   if (test_register_unregister())
+   return -1;
+
+   if (test_soc_cleanup())
+   return -1;
+
return 0;
 }

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 11d9f59..cf6fb8e 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -171,8 +171,11 @@ DPDK_16.11 {
rte_eal_dev_attach;
rte_eal_dev_detach;
rte_eal_map_resource;
+   rte_eal_soc_register;
+   rte_eal_soc_unregister;
rte_eal_unmap_resource;
rte_eal_vdrv_register;
rte_eal_vdrv_unregister;
+   soc_driver_list;

 } DPDK_16.07;
diff --git a/lib/librte_eal/common/eal_common_soc.c 
b/lib/librte_eal/common/eal_common_soc.c
new file mode 100644
index 000..56135ed
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_soc.c
@@ -0,0 +1,56 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 RehiveTech. All rights

[dpdk-dev] [PATCH v7 06/21] eal/soc: introduce very essential SoC infra definitions

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Define initial structures and functions for the SoC infrastructure.
This patch supports only a minimal set of functions for now.
More features will be added in the following commits.

Includes rte_device/rte_driver inheritance of
rte_soc_device/rte_soc_driver.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 
---
 app/test/Makefile   |   1 +
 app/test/test_soc.c |  90 +
 lib/librte_eal/common/Makefile  |   2 +-
 lib/librte_eal/common/eal_private.h |   4 +
 lib/librte_eal/common/include/rte_soc.h | 138 
 5 files changed, 234 insertions(+), 1 deletion(-)
 create mode 100644 app/test/test_soc.c
 create mode 100644 lib/librte_eal/common/include/rte_soc.h

diff --git a/app/test/Makefile b/app/test/Makefile
index 5be023a..30295af 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,6 +77,7 @@ APP = test
 #
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) := commands.c
 SRCS-y += test.c
+SRCS-y += test_soc.c
 SRCS-y += resource.c
 SRCS-y += test_resource.c
 test_resource.res: test_resource.c
diff --git a/app/test/test_soc.c b/app/test/test_soc.c
new file mode 100644
index 000..916a863
--- /dev/null
+++ b/app/test/test_soc.c
@@ -0,0 +1,90 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 RehiveTech. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of RehiveTech nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+static char *safe_strdup(const char *s)
+{
+   char *c = strdup(s);
+
+   if (c == NULL)
+   rte_panic("failed to strdup '%s'\n", s);
+
+   return c;
+}
+
+static int test_compare_addr(void)
+{
+   struct rte_soc_addr a0;
+   struct rte_soc_addr a1;
+   struct rte_soc_addr a2;
+
+   a0.name = safe_strdup("ethernet0");
+   a0.fdt_path = NULL;
+
+   a1.name = safe_strdup("ethernet0");
+   a1.fdt_path = NULL;
+
+   a2.name = safe_strdup("ethernet1");
+   a2.fdt_path = NULL;
+
+   TEST_ASSERT(!rte_eal_compare_soc_addr(, ),
+   "Failed to compare two soc addresses that equal");
+   TEST_ASSERT(rte_eal_compare_soc_addr(, ),
+   "Failed to compare two soc addresses that differs");
+
+   free(a2.name);
+   free(a1.name);
+   free(a0.name);
+   return 0;
+}
+
+static int
+test_soc(void)
+{
+   if (test_compare_addr())
+   return -1;
+
+   return 0;
+}
+
+REGISTER_TEST_COMMAND(soc_autotest, test_soc);
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index dfd64aa..b414008 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk

 INC := rte_branch_prediction.h rte_common.h
 INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h
-INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h
+INC += rte_log.h rte_memory.h rte_memzone.h rte_soc.h rte_pci.h
 INC += rte_per_lcore.h rte_random.h
 INC += rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_version.h
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index c8c2131..0e8d6f7 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -36,6

[dpdk-dev] [PATCH v7 05/21] eal: define container macro

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/common/include/rte_common.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_common.h 
b/lib/librte_eal/common/include/rte_common.h
index db5ac91..8152bd9 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -331,6 +331,24 @@ rte_bsf32(uint32_t v)
 #define offsetof(TYPE, MEMBER)  __builtin_offsetof (TYPE, MEMBER)
 #endif

+/**
+ * Return pointer to the wrapping struct instance.
+ * Example:
+ *
+ *  struct wrapper {
+ *  ...
+ *  struct child c;
+ *  ...
+ *  };
+ *
+ *  struct child *x = obtain(...);
+ *  struct wrapper *w = container_of(x, struct wrapper, c);
+ */
+#ifndef container_of
+#define container_of(p, type, member) \
+   ((type *) (((char *) (p)) - offsetof(type, member)))
+#endif
+
 #define _RTE_STR(x) #x
 /** Take a macro value and get a string version of it */
 #define RTE_STR(x) _RTE_STR(x)
-- 
2.7.4
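
As a small illustration of what the macro is for, the sketch below
recovers a wrapping structure from a pointer to one of its embedded
members. The macro body is copied from the patch; struct child,
struct wrapper and wrapper_from_child() are hypothetical names used only
for this example.

```c
#include <stddef.h>	/* offsetof */

/* Same definition as in the patch; an existing definition takes priority. */
#ifndef container_of
#define container_of(p, type, member) \
	((type *) (((char *) (p)) - offsetof(type, member)))
#endif

/* Hypothetical embedded type. */
struct child {
	int value;
};

/* Hypothetical wrapper; the child is embedded by value, not by pointer. */
struct wrapper {
	int id;
	struct child c;
};

/* Step back from the member address to the start of the wrapper. */
static struct wrapper *
wrapper_from_child(struct child *x)
{
	return container_of(x, struct wrapper, c);
}
```

This only works when the pointer really refers to a member embedded in an
instance of the wrapper type; the macro performs no checking of its own.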

[dpdk-dev] [PATCH v7 04/21] eal/linux: generalize PCI kernel driver extraction to EAL

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Generalize the PCI-specific pci_get_kernel_driver_by_path. The function
is general enough, so it is moved to eal.c, renamed with the rte_eal
prefix, and made available privately to other parts of EAL.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/bsdapp/eal/eal.c   |  7 +++
 lib/librte_eal/common/eal_private.h   | 14 ++
 lib/librte_eal/linuxapp/eal/eal.c | 29 +
 lib/librte_eal/linuxapp/eal/eal_pci.c | 31 +--
 4 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5271fc2..9b93da3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -640,3 +640,10 @@ rte_eal_unbind_kernel_driver(const char *devpath 
__rte_unused,
 {
return -ENOTSUP;
 }
+
+int
+rte_eal_get_kernel_driver_by_path(const char *filename __rte_unused,
+ char *dri_name __rte_unused)
+{
+   return -ENOTSUP;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index b0c208a..c8c2131 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -269,6 +269,20 @@ int rte_eal_check_module(const char *module_name);
 int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid);

 /**
+ * Extract the kernel driver name from the absolute path to the driver.
+ *
+ * @param filename  path to the driver ("/driver")
+ * @path  dri_name  target buffer where to place the driver name
+ *  (should be at least PATH_MAX long)
+ *
+ * @return
+ *  -1   on failure
+ *   0   when successful
+ *   1   when there is no such driver
+ */
+int rte_eal_get_kernel_driver_by_path(const char *filename, char *dri_name);
+
+/**
  * Get cpu core_id.
  *
  * This function is private to the EAL.
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 5f6676d..00af21c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -969,3 +969,32 @@ error:
fclose(f);
return -1;
 }
+
+int
+rte_eal_get_kernel_driver_by_path(const char *filename, char *dri_name)
+{
+   int count;
+   char path[PATH_MAX];
+   char *name;
+
+   if (!filename || !dri_name)
+   return -1;
+
+   count = readlink(filename, path, PATH_MAX);
+   if (count >= PATH_MAX)
+   return -1;
+
+   /* For device does not have a driver */
+   if (count < 0)
+   return 1;
+
+   path[count] = '\0';
+
+   name = strrchr(path, '/');
+   if (name) {
+   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
+   return 0;
+   }
+
+   return -1;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a03553f..e1cf9e8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -78,35 +78,6 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev)
return rte_eal_unbind_kernel_driver(devpath, devid);
 }

-static int
-pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
-{
-   int count;
-   char path[PATH_MAX];
-   char *name;
-
-   if (!filename || !dri_name)
-   return -1;
-
-   count = readlink(filename, path, PATH_MAX);
-   if (count >= PATH_MAX)
-   return -1;
-
-   /* For device does not have a driver */
-   if (count < 0)
-   return 1;
-
-   path[count] = '\0';
-
-   name = strrchr(path, '/');
-   if (name) {
-   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
-   return 0;
-   }
-
-   return -1;
-}
-
 /* Map pci device */
 int
 rte_eal_pci_map_device(struct rte_pci_device *dev)
@@ -354,7 +325,7 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,

/* parse driver */
snprintf(filename, sizeof(filename), "%s/driver", dirname);
-   ret = pci_get_kernel_driver_by_path(filename, driver);
+   ret = rte_eal_get_kernel_driver_by_path(filename, driver);
if (ret < 0) {
RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
free(dev);
-- 
2.7.4

[dpdk-dev] [PATCH v7 03/21] eal/linux: generalize PCI kernel unbinding driver to EAL

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Generalize the PCI-specific pci_unbind_kernel_driver. It is now divided
into two parts: first, determining the path and string identification
of the device to be unbound; second, the actual unbind operation, which is
generic.

The BSD implementation is updated to return -ENOTSUP.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
--
Changes since v2:
 - update BSD support for unbind kernel driver
---
 lib/librte_eal/bsdapp/eal/eal.c   |  7 +++
 lib/librte_eal/bsdapp/eal/eal_pci.c   |  4 ++--
 lib/librte_eal/common/eal_private.h   | 13 +
 lib/librte_eal/linuxapp/eal/eal.c | 26 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c | 33 +
 5 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 35e3117..5271fc2 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -633,3 +633,10 @@ rte_eal_process_type(void)
 {
return rte_config.process_type;
 }
+
+int
+rte_eal_unbind_kernel_driver(const char *devpath __rte_unused,
+const char *devid __rte_unused)
+{
+   return -ENOTSUP;
+}
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 7ed0115..703f034 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -89,11 +89,11 @@

 /* unbind kernel driver for this device */
 int
-pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)
+pci_unbind_kernel_driver(struct rte_pci_device *dev)
 {
RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
"for BSD\n");
-   return -ENOTSUP;
+   return rte_eal_unbind_kernel_driver(dev);
 }

 /* Map pci device */
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 9e7d8f6..b0c208a 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -256,6 +256,19 @@ int rte_eal_alarm_init(void);
 int rte_eal_check_module(const char *module_name);

 /**
+ * Unbind kernel driver bound to the device specified by the given devpath,
+ * and its string identification.
+ *
+ * @param devpath  path to the device directory ("/sys/.../devices/")
+ * @param devid    identification of the device ()
+ *
+ * @return
+ *  -1  unbind has failed
+ *   0  module has been unbound
+ */
+int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid);
+
+/**
  * Get cpu core_id.
  *
  * This function is private to the EAL.
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 2075282..5f6676d 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -943,3 +943,29 @@ rte_eal_check_module(const char *module_name)
/* Module has been found */
return 1;
 }
+
+int
+rte_eal_unbind_kernel_driver(const char *devpath, const char *devid)
+{
+   char filename[PATH_MAX];
+   FILE *f;
+
+   snprintf(filename, sizeof(filename),
+"%s/driver/unbind", devpath);
+
+   f = fopen(filename, "w");
+   if (f == NULL) /* device was not bound */
+   return 0;
+
+   if (fwrite(devid, strlen(devid), 1, f) == 0) {
+   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
+   filename);
+   goto error;
+   }
+
+   fclose(f);
+   return 0;
+error:
+   fclose(f);
+   return -1;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 876ba38..a03553f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -59,38 +59,23 @@ int
 pci_unbind_kernel_driver(struct rte_pci_device *dev)
 {
int n;
-   FILE *f;
-   char filename[PATH_MAX];
-   char buf[BUFSIZ];
+   char devpath[PATH_MAX];
+   char devid[BUFSIZ];
struct rte_pci_addr *loc = &dev->addr;

-   /* open /sys/bus/pci/devices/:BB:CC.D/driver */
-   snprintf(filename, sizeof(filename),
-   "%s/" PCI_PRI_FMT "/driver/unbind", pci_get_sysfs_path(),
+   /* devpath /sys/bus/pci/devices/:BB:CC.D */
+   snprintf(devpath, sizeof(devpath),
+   "%s/" PCI_PRI_FMT, pci_get_sysfs_path(),
loc->domain, loc->bus, loc->devid, loc->function);

-   f = fopen(filename, "w");
-   if (f == NULL) /* device was not bound */
-   return 0;
-
-   n = snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n",
+   n = snprintf(devid, sizeof(devid), PCI_PRI_FMT "\n",
 loc->domain, loc->bus, loc->devid, loc->function);
-   if ((n < 0) || (n >= (int)sizeof(buf))) {
+   if ((n < 0) || (n >= (int)sizeof(devid))) {
RTE_LOG(ERR, EAL, "%s(): snprintf failed\n", __func__);
-   goto error;
-

[dpdk-dev] [PATCH v7 02/21] eal: generalize PCI map/unmap resource to EAL

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

The functions pci_map_resource and pci_unmap_resource are generic, so the
pci_* prefix can be dropped. The functions are moved to
eal_common_dev.c so they can be reused by other infrastructure.

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  2 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  2 ++
 lib/librte_eal/common/eal_common_dev.c  | 39 +
 lib/librte_eal/common/eal_common_pci.c  | 39 -
 lib/librte_eal/common/eal_common_pci_uio.c  | 16 +-
 lib/librte_eal/common/include/rte_dev.h | 32 
 lib/librte_eal/common/include/rte_pci.h | 32 
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c   |  2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c  |  5 ++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
 10 files changed, 89 insertions(+), 82 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 8b3ed88..7ed0115 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -228,7 +228,7 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, 
int res_idx,

/* if matching map is found, then use it */
offset = res_idx * pagesz;
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   mapaddr = rte_eal_map_resource(NULL, fd, (off_t)offset,
(size_t)dev->mem_resource[res_idx].len, 0);
close(fd);
if (mapaddr == MAP_FAILED)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 2f81f7c..11d9f59 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -170,6 +170,8 @@ DPDK_16.11 {
rte_delay_us_callback_register;
rte_eal_dev_attach;
rte_eal_dev_detach;
+   rte_eal_map_resource;
+   rte_eal_unmap_resource;
rte_eal_vdrv_register;
rte_eal_vdrv_unregister;

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 4f3b493..457d227 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -151,3 +152,41 @@ err:
RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n", name);
return -EINVAL;
 }
+
+/* map a particular resource from a file */
+void *
+rte_eal_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+int additional_flags)
+{
+   void *mapaddr;
+
+   /* Map the Memory resource of device */
+   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
+   MAP_SHARED | additional_flags, fd, offset);
+   if (mapaddr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s"
+   " (%p)\n", __func__, fd, requested_addr,
+   (unsigned long)size, (unsigned long)offset,
+   strerror(errno), mapaddr);
+   } else
+   RTE_LOG(DEBUG, EAL, "  Device memory mapped at %p\n", mapaddr);
+
+   return mapaddr;
+}
+
+/* unmap a particular resource */
+void
+rte_eal_unmap_resource(void *requested_addr, size_t size)
+{
+   if (requested_addr == NULL)
+   return;
+
+   /* Unmap the Memory resource of device */
+   if (munmap(requested_addr, size)) {
+   RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
+   __func__, requested_addr, (unsigned long)size,
+   strerror(errno));
+   } else
+   RTE_LOG(DEBUG, EAL, "  Device memory unmapped at %p\n",
+   requested_addr);
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 638cd86..464acc1 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -67,7 +67,6 @@
 #include 
 #include 
 #include 
-#include 

 #include 
 #include 
@@ -114,44 +113,6 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
return NULL;
 }

-/* map a particular resource from a file */
-void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
-int additional_flags)
-{
-   void *mapaddr;
-
-   /* Map the PCI memory resource of device */
-   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
-   MAP_SHARED | additional_flags, fd, offset);
-   if (mapaddr == MAP_FAILED) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
-   __func__, fd, requested_addr,
-   (unsigned

[dpdk-dev] [PATCH v7 01/21] eal: generalize PCI kernel driver enum to EAL

2016-10-28 Thread Shreyansh Jain

From: Jan Viktorin 

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 

--
Changes since v0:
 - fix compilation error due to missing include
---
 lib/librte_eal/common/include/rte_dev.h | 12 
 lib/librte_eal/common/include/rte_pci.h |  9 -
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index 8840380..6975b9f 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -109,6 +109,18 @@ struct rte_mem_resource {
void *addr; /**< Virtual address, NULL when not mapped. */
 };

+/**
+ * Kernel driver passthrough type
+ */
+enum rte_kernel_driver {
+   RTE_KDRV_UNKNOWN = 0,
+   RTE_KDRV_IGB_UIO,
+   RTE_KDRV_VFIO,
+   RTE_KDRV_UIO_GENERIC,
+   RTE_KDRV_NIC_UIO,
+   RTE_KDRV_NONE,
+};
+
 /** Double linked list of device drivers. */
 TAILQ_HEAD(rte_driver_list, rte_driver);
 /** Double linked list of devices. */
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 9ce8847..2c7046f 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -135,15 +135,6 @@ struct rte_pci_addr {

 struct rte_devargs;

-enum rte_kernel_driver {
-   RTE_KDRV_UNKNOWN = 0,
-   RTE_KDRV_IGB_UIO,
-   RTE_KDRV_VFIO,
-   RTE_KDRV_UIO_GENERIC,
-   RTE_KDRV_NIC_UIO,
-   RTE_KDRV_NONE,
-};
-
 /**
  * A structure describing a PCI device.
  */
-- 
2.7.4

[dpdk-dev] [PATCH v7 00/21] Introduce SoC device/driver framework for EAL

2016-10-28 Thread Shreyansh Jain

Introduction:
=============

This patch set is a direct derivative of Jan's original series [1],[2].

 - This version is based on master HEAD (ca41215)

 - In this version, I am merging the series [11] back in. It was initially
   part of this set, but I had split it out because those PCI changes
   were good standalone as well. However, 1) not much feedback was
   available and 2) this patchset is a use case for those patches, which
   makes them easier to review, just as Jan had intended in the original
   series.

 - SoC support is not enabled by default. It needs the 'enable-soc' toggle
   on the command line. This is primarily because this patchset is still
   experimental and we would like to keep it isolated from non-SoC ops.
   It does, however, impact the ABI.

Aim:
====

As of now EAL is primarily focused on PCI initialization/probing.

 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  `- rte_eal_pci_probe(): Driver<=>Device initialization

This patchset introduces a SoC framework that enables SoC drivers and
devices to be plugged into EAL, very similar to how PCI drivers/devices are
handled today.

This is a stripped down version of PCI framework which allows the SoC PMDs
to implement their own routines for detecting devices and linking devices to
drivers.

1) Changes to EAL
 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- rte_eal_soc_init(): Calls PMDs->scan_fn
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  |- rte_eal_pci_probe(): Driver<=>Device initialization, PMD->devinit()
  `- rte_eal_soc_probe(): Calls PMDs->match_fn and PMDs->devinit();

2) New device/driver structures:
  - rte_soc_driver (inheriting rte_driver)
  - rte_soc_device (inheriting rte_device)
  - rte_eth_dev and eth_driver embed rte_soc_device and rte_soc_driver,
    respectively.

3) The SoC PMDs need to:
 - define rte_soc_driver with necessary scan and match callbacks
 - Register themselves using DRIVER_REGISTER_SOC()
 - Implement respective bus scanning in the scan callbacks to add necessary
   devices to SoC device list
 - Implement necessary eth_dev_init/uninit for ethernet instances

4) Design considerations that are same as PCI:
 - SoC initialization is being done through rte_eal_init(), just after PCI
   initialization is done.
 - As in case of PCI, probe is done after rte_eal_pci_probe() to link the
   devices detected with the drivers registered.
 - Device attach/detach functions are available and have been designed on
   the lines of PCI framework.
 - PMDs register using DRIVER_REGISTER_SOC, very similar to
   DRIVER_REGISTER_PCI for PCI devices.
 - Linked lists of SoC drivers and devices exist independently of the other
   driver/device lists but, by inheriting rte_driver/rte_device, they are
   also part of a global list.

5) Design considerations that are different from PCI:
 - Each driver implements its own scan and match function. PCI uses the BDF
   format to read the device from sysfs, but this _may_not_ be the case for a
   SoC ethernet device.
   = This is an important change from initial proposal by Jan in [2].
   Unlike his attempt to use /sys/bus/platform, this patch relies on the
   PMD to detect the devices. This is because SoC may require specific or
   additional info for device detection. Further, SoC may have embedded
   devices/MACs which require initialization which cannot be covered
   through sysfs parsing.
   `-> Point (6) below is a side note to above.
   = PCI based PMDs rely on EAL's capability to detect devices. This
   proposal puts the onus on the PMD to detect devices, add them to
   soc_device_list and wait for probe. Matching of device<=>driver is again
   the PMD's callback.

6) Adding default scan and match helpers for PMDs
 - The design warrants that the PMDs implement their own scan of devices
   on bus, and match routines for probe implementation.
   This patch introduces helpers which can be used by PMDs for scan of
   the platform bus and matching devices against the compatible string
   extracted from the scan.
 - Intention is to make it easier to integrate known SoC which expose
   platform bus compliant information (compat, sys/bus/platform...).
 - PMDs which have deviations from this standard model can implement and
   hook their bus scanning and probe match callbacks while registering
   driver.
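
The scan/match/probe flow described in points 1) to 6) can be sketched in
plain C as below. This is only an illustrative model under stated
assumptions: every demo_* name is hypothetical and none of it is taken from
the patches; the real rte_soc_driver/rte_soc_device definitions and the
DRIVER_REGISTER_SOC() macro are not reproduced here. It assumes a driver
supplies a scan callback that appends devices to a shared list and a match
callback that compares compatible strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_soc_driver;

/* Hypothetical device record; not the rte_soc_device from the patches. */
struct demo_soc_device {
	const char *name;
	const char *compatible;               /* string extracted by the scan */
	const struct demo_soc_driver *driver; /* NULL until probe binds one */
};

/* Hypothetical driver record with PMD-supplied scan and match callbacks. */
struct demo_soc_driver {
	const char *compatible;
	int (*scan_fn)(struct demo_soc_device *list, int count, int max);
	int (*match_fn)(const struct demo_soc_driver *drv,
			const struct demo_soc_device *dev);
	int (*devinit)(struct demo_soc_device *dev);
};

/* Default match helper: compare the compatible strings. */
int demo_default_match(const struct demo_soc_driver *drv,
		       const struct demo_soc_device *dev)
{
	return strcmp(drv->compatible, dev->compatible) == 0;
}

/* PMD-specific scan: append detected devices, return the new count. */
int demo_scan(struct demo_soc_device *list, int count, int max)
{
	if (count < max) {
		list[count].name = "demo-eth0";
		list[count].compatible = "vendor,demo-eth";
		list[count].driver = NULL;
		count++;
	}
	return count;
}

/* Stand-in for the PMD's device initialization. */
int demo_devinit(struct demo_soc_device *dev)
{
	return dev->name != NULL ? 0 : -1;
}

/* Init step: every registered driver scans its own bus. */
int demo_soc_init(const struct demo_soc_driver *const *drv, int ndrv,
		  struct demo_soc_device *list, int max)
{
	int count = 0;

	assert(max > 0);
	for (int i = 0; i < ndrv; i++)
		count = drv[i]->scan_fn(list, count, max);
	return count;
}

/* Probe step: bind each unbound device to the first matching driver. */
int demo_soc_probe(const struct demo_soc_driver *const *drv, int ndrv,
		   struct demo_soc_device *list, int count)
{
	int bound = 0;

	for (int d = 0; d < count; d++) {
		for (int i = 0; i < ndrv && list[d].driver == NULL; i++) {
			if (drv[i]->match_fn(drv[i], &list[d]) &&
			    drv[i]->devinit(&list[d]) == 0) {
				list[d].driver = drv[i];
				bound++;
			}
		}
	}
	return bound;
}
```

A PMD that deviates from the platform-bus model would plug in its own
match callback in place of demo_default_match(); the init and probe loops
stay the same.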

Patchset Overview:
==================
 - Patches 0001~0004 are from [11] - moving some PCI specific functions
   and definitions to non-PCI area.
 - Patches 0005~0008 introduce the base infrastructure and test case
 - Patch 0009 is for command line support for no-soc, on lines of no-pci
 - Patch 0010 enables EAL to handle SoC type devices
 - Patch 0011 adds support for scan and probe callbacks and updates the test
   framework with relevant test case.
 - Patch 0012~0014 enable device argument, driver specific flags and
   interrupt handling related basic infra. Subsequent patches build up on
   them.
 - Patch 0015~0016 add

[dpdk-dev] Solarflare PMD submission question

2016-10-28 Thread Andrew Rybchenko

On 10/28/2016 03:33 PM, Thomas Monjalon wrote:
> 2016-10-28 13:50, Andrew Rybchenko:
>> The only thing which comes to my mind is to split libefx import on subsystem
>> basis (few files per subsystem). It is artificial and added files will
>> be abandoned
>> until the patch which adds them into build. It could be something like:
>>1. External interfaces definition
>>2. Internal interfaces definition
>>3. Registers definition (hardware interface)
>>4. Management CPU interface definition (it is one file, but still big
>> 650K)
>>5. Management CPU interface implementation
>> and so on for NIC global controls, interrupts, event queue, transmit,
>> receive,
>>filtering etc.
> Yes it is artificial.
> The most valuable would be a transversal logical split, kind of feature
> per feature, in order to explain how the device works.

I'm not the main author of libefx and personally would consider it very
useful. On the other hand, I understand that it is a huge amount of work to
do.

> Such commit is also the opportunity to explain acronyms and so on.

Good. We'll go this way and I'll do my best to make it useful for
understanding the overall structure of the code and how the device works.

>>> It would be also really appreciated to provide a design documentation
>>> in doc/guides/nics. Are the datasheets open? A link in the doc would help.
>> We have a documentation which grows together with supported features,
>> but it is rather for users. Important design decisions (not so many) are
>> documented nearby corresponding code. Unfortunately there is no open
>> datasheets. Management CPU interface definition has comments.
> Without neither a datasheet, nor a comprehensive code introduction, it is
> almost impossible to dive in your code. So it misses the point about bringing
> it to an Open Source project.
> Please do the the effort to bring some knowledge to the community.

I've raised this internally and will see what extra documentation we can
provide to the community. But this may take some time, and I hope it is OK to
post patches in the interim. I use the management CPU interface (MCDI)
definition mentioned above when I add features. It is shared by all drivers:
[1], [2], [3].

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/sfc/mcdi_pcol.h
[2] https://svnweb.freebsd.org/base/head/sys/dev/sfxge/common/efx_regs_mcdi.h?view=markup
[3] https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/io/sfxge/common/efx_regs_mcdi.h

Andrew.

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Thomas Monjalon

2016-10-28 15:31, Ferruh Yigit:
> * virtio-user + vhost-net
> This can be valid alternative, removes the out of tree kernel module
> need. But missing control path. Proof of concept work will be done.

That's probably a smart alternative for packet injection.
What do you mean exactly by "missing control path"?

> * Remove ethtool support ?

That's the other part of KNI.
It works only for e1000/ixgbe. That's a niche.

> Still there is some interest, will keep it. But not able to extend it to
> other drivers with current design.

It should be removed one day.
We must seriously think about a generic alternative.
Either we add DPDK support in ethtool or we create a dpdk-ethtool.
(or at least a library as the one in examples/).
Or we do nothing and wait to have more hardware like Mellanox supporting
a kernel bifurcated driver approach.

> *KNI PMD
> Patch is in the mail list, missing comments. If it gets some
> interest/comments/acks it may go in to next release.

I'm not against KNI PMD but it looks strange to add more support
to an old dying approach.

[dpdk-dev] mbuf changes

2016-10-28 Thread Richardson, Bruce

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Friday, October 28, 2016 5:50 PM
> To: Morten Brørup
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] mbuf changes
> 
> On Fri, Oct 28, 2016 at 04:11:45PM +0200, Morten Brørup wrote:
> > Comments at the end.
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pattan, Reshma
> > > Sent: Friday, October 28, 2016 3:35 PM
> > > To: Olivier Matz
> > > Cc: dev at dpdk.org; Morten Brørup
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > > Hi Olivier,
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > > To: Richardson, Bruce ; Morten Brørup
> > > > 
> > > > Cc: Adrien Mazarguil ; Wiles, Keith
> > > > ; dev at dpdk.org; Oleg Kuporosov
> > > > 
> > > > Subject: Re: [dpdk-dev] mbuf changes
> > > >
> > > >
> > > >
> > > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > > >> Comments at the end.
> > > > >>
> > > > >> Med venlig hilsen / kind regards
> > > > >> - Morten Brørup
> > > > >>
> > > > >>> -Original Message-
> > > > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > > >>> To: Morten Brørup
> > > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier
> > > > >>> Matz; Oleg Kuporosov
> > > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > > >>>
> > > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > > >  Comments inline.
> > > > 
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > > Richardson
> > > > > Sent: Tuesday, October 25, 2016 1:14 PM
> > > > > To: Adrien Mazarguil
> > > > > Cc: Morten Brørup; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > > > > Oleg Kuporosov
> > > > > Subject: Re: [dpdk-dev] mbuf changes
> > > > >
> > > > > On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> > > wrote:
> > > > >> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup
> wrote:
> > > > >>> Comments inline.
> > > > >>>
> > > > >>> Med venlig hilsen / kind regards
> > > > >>> - Morten Brørup
> > > > >>>
> > > > >>>
> > > >  -Original Message-
> > > >  From: Adrien Mazarguil
> > > >  [mailto:adrien.mazarguil at 6wind.com]
> > > >  Sent: Tuesday, October 25, 2016 11:39 AM
> > > >  To: Bruce Richardson
> > > >  Cc: Wiles, Keith; Morten Brørup; dev at dpdk.org; Olivier
> > > >  Matz; Oleg Kuporosov
> > > >  Subject: Re: [dpdk-dev] mbuf changes
> > > > 
> > > >  On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce
> > > >  Richardson
> > > > >>> wrote:
> > > > > On Mon, Oct 24, 2016 at 04:11:33PM +, Wiles, Keith
> > > > >>> wrote:
> > > >  [...]
> > > > >>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > >   wrote:
> > > >  [...]
> > > > 
> > > > > One other point I'll mention is that we need to have a
> > > > > discussion on how/where to add in a timestamp value into
> > > > >>> the
> > > > > mbuf. Personally, I think it can be in a union with the
> > > > > sequence
> > > > > number value, but I also suspect that 32-bits of a
> > > > >>> timestamp
> > > > > is not going to be enough for
> > > >  many.
> > > > >
> > > > > Thoughts?
> > > > 
> > > >  If we consider that timestamp representation should use
> > > > > nanosecond
> > > >  granularity, a 32-bit value may likely wrap around too
> > > > >>> quickly
> > > >  to be useful. We can also assume that applications
> > > requesting
> > > >  timestamps may care more about latency than throughput,
> > > >  Oleg
> > > > > found
> > > >  that using the second cache line for this purpose had a
> > > > > noticeable impact [1].
> > > > 
> > > >   [1] http://dpdk.org/ml/archives/dev/2016-
> > > October/049237.html
> > > > >>>
> > > > >>> I agree with Oleg about the latency vs. throughput
> > > > >>> importance for
> > > > > such applications.
> > > > >>>
> > > > >>> If you need high resolution timestamps, consider them to
> > > > >>> be
> > > > > generated by the NIC RX driver, possibly by the hardware
> > > > > itself
> > > > > (http://w3new.napatech.com/features/time-precision/hardware-
> > > time
> > > > > - stamp), so the timestamp belongs in the first cache line.
> > > > > And I am proposing that it should have the highest possible
> > > > > accuracy, which makes the value hardware dependent.
> > > > >>>
> > > >

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Jerin Jacob

On Fri, Oct 28, 2016 at 10:15:47AM +, Ananyev, Konstantin wrote:
> Hi Tomasz,
> 
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation by
> > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and without)
> > > code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> > >
> > 
> > I had sent txprep engine in v2 
> > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the 
> > suggestions. If you like it I can resent
> > it in place of csumonly modification.
> 
> I still not sure it is worth to have another version of csum...
> Can we introduce a new global variable in testpmd and a new command:
> testpmd> csum tx_prep

Just my 2 cents: since "tx_prep" is a generic API, if a PMD uses it to fix up
some other limitation (not csum), then it is difficult for the application
to know with which PMD combination it needs to be used.
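
For reference, the no-op behaviour quoted above (tx_pkt_prep == NULL) can be
modelled with a minimal sketch like the one below. It is an assumption-laden
illustration, not the code from the patch: all demo_* names and the packet
fields are hypothetical, and the "fix-up" is a placeholder rather than a real
pseudo-header checksum. It shows why the caller cannot tell from the return
value alone whether any preparation was actually performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet: only the fields needed for the illustration. */
struct demo_pkt {
	uint16_t l4_csum;
	int needs_pseudo_hdr; /* 1 until some prepare step has fixed it up */
};

struct demo_dev;
typedef uint16_t (*demo_tx_prep_t)(struct demo_dev *dev,
				   struct demo_pkt **pkts, uint16_t n);

/* Hypothetical device: the callback is NULL if the PMD has no prepare step. */
struct demo_dev {
	demo_tx_prep_t tx_pkt_prep;
};

/* Generic entry point: acts as a no-op when the PMD provides no callback. */
uint16_t demo_tx_prep(struct demo_dev *dev, struct demo_pkt **pkts, uint16_t n)
{
	assert(dev != NULL);
	if (dev->tx_pkt_prep == NULL)
		return n; /* every packet is reported as ready, untouched */
	return dev->tx_pkt_prep(dev, pkts, n);
}

/* Example PMD callback: placeholder fix-up on each packet of the burst. */
uint16_t demo_pmd_prep(struct demo_dev *dev, struct demo_pkt **pkts, uint16_t n)
{
	(void)dev;
	for (uint16_t i = 0; i < n; i++) {
		pkts[i]->l4_csum = 0xffff; /* placeholder, not a real checksum */
		pkts[i]->needs_pseudo_hdr = 0;
	}
	return n;
}
```

Both a device without the callback and a device with it return n for a good
burst, so an application that dropped its own preparation code gets packets
that still need fixing on the first kind of device.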

> or so? 
> Looking at current testpmd patch, I suppose the changes will be minimal.
> What do you think?
> Konstantin 
> 
> > 
> > Tomasz
> > 
> > > >
> > > > > > >  struct rte_eth_dev {
> > > > > > >   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive
> > > function. */
> > > > > > >   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
> > > > > > > function. */
> > > > > > > + eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit
> > > > > > > +prepare function. */
> > > > > > >   struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > > >   const struct eth_driver *driver;/**< Driver for this device */
> > > > > > >   const struct eth_dev_ops *dev_ops; /**< Functions exported by
> > > > > > > PMD */
> > > > > >
> > > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > > I guess we want to have several implementations?
> > > > >
> > > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > > >
> > > > > >
> > > > > > Shouldn't we have a const struct control_dev_ops and a struct
> > > datapath_dev_ops?
> > > > >
> > > > > That's probably a good idea, but I suppose it is out of scope for that
> > > patch.
> > > >
> > > > No it's not out of scope.
> > > > It answers to the question "why is it added in this structure and not
> > > dev_ops".
> > > > We won't do this change when nothing else is changed in the struct.
> > >
> > > Not sure I understood you here:
> > > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced
> > > as part of that patch?
> > > But that's a lot of  changes all over rte_ethdev.[h,c].
> > > It definitely worse a separate patch (might be some discussion) for me.
> > > Konstantin
> > >
> > >
>

[dpdk-dev] mbuf changes

2016-10-28 Thread Morten Brørup

Comments at the end.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pattan, Reshma
> Sent: Friday, October 28, 2016 3:35 PM
> To: Olivier Matz
> Cc: dev at dpdk.org; Morten Brørup
> Subject: Re: [dpdk-dev] mbuf changes
> 
> Hi Olivier,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > Sent: Tuesday, October 25, 2016 1:49 PM
> > To: Richardson, Bruce ; Morten Brørup
> > 
> > Cc: Adrien Mazarguil ; Wiles, Keith
> > ; dev at dpdk.org; Oleg Kuporosov
> > 
> > Subject: Re: [dpdk-dev] mbuf changes
> >
> >
> >
> > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > >> Comments at the end.
> > >>
> > >> Med venlig hilsen / kind regards
> > >> - Morten Brørup
> > >>
> > >>> -Original Message-
> > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > >>> To: Morten Brørup
> > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > >>> Oleg Kuporosov
> > >>> Subject: Re: [dpdk-dev] mbuf changes
> > >>>
> > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> >  Comments inline.
> > 
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > Richardson
> > > Sent: Tuesday, October 25, 2016 1:14 PM
> > > To: Adrien Mazarguil
> > > Cc: Morten Brørup; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > > Oleg Kuporosov
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > > On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> wrote:
> > >> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > >>> Comments inline.
> > >>>
> > >>> Med venlig hilsen / kind regards
> > >>> - Morten Brørup
> > >>>
> > >>>
> >  -Original Message-
> >  From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> >  Sent: Tuesday, October 25, 2016 11:39 AM
> >  To: Bruce Richardson
> >  Cc: Wiles, Keith; Morten Brørup; dev at dpdk.org; Olivier Matz;
> >  Oleg Kuporosov
> >  Subject: Re: [dpdk-dev] mbuf changes
> > 
> >  On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> > >>> wrote:
> > > On Mon, Oct 24, 2016 at 04:11:33PM +, Wiles, Keith
> > >>> wrote:
> >  [...]
> > >>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> >   wrote:
> >  [...]
> > 
> > > One other point I'll mention is that we need to have a
> > > discussion on how/where to add in a timestamp value into
> > >>> the
> > > mbuf. Personally, I think it can be in a union with the
> > > sequence
> > > number value, but I also suspect that 32-bits of a
> > >>> timestamp
> > > is not going to be enough for
> >  many.
> > >
> > > Thoughts?
> > 
> >  If we consider that timestamp representation should use
> > > nanosecond
> >  granularity, a 32-bit value may likely wrap around too
> > >>> quickly
> >  to be useful. We can also assume that applications
> requesting
> >  timestamps may care more about latency than throughput, Oleg
> > > found
> >  that using the second cache line for this purpose had a
> > > noticeable impact [1].
> > 
> >   [1] http://dpdk.org/ml/archives/dev/2016-
> October/049237.html
> > >>>
> > >>> I agree with Oleg about the latency vs. throughput importance
> > >>> for
> > > such applications.
> > >>>
> > >>> If you need high resolution timestamps, consider them to be
> > > generated by the NIC RX driver, possibly by the hardware itself
> > > (http://w3new.napatech.com/features/time-precision/hardware-
> time
> > > - stamp), so the timestamp belongs in the first cache line. And
> > > I am proposing that it should have the highest possible
> > > accuracy, which makes the value hardware dependent.
> > >>>
> > >>> Furthermore, I am arguing that we leave it up to the
> > >>> application
> > >>> to
> > > keep track of the slowly moving bits (i.e. counting whole
> > > seconds, hours and calendar date) out of band, so we don't use
> > > precious
> > >>> space
> > > in the mbuf. The application doesn't need the NIC RX driver's
> > > fast path to capture which date (or even which second) a packet
> > > was received. Yes, it adds complexity to the application, but
> we
> > > can't set aside 64 bit for a generic timestamp. Or as a weird
> tradeoff:
> > > Put the fast moving 32 bit in the first cache line and the slow
> > > moving 32 bit in the second cache line, as a placeholder for
> the
> > >>> application to fill out if needed.
> > > Yes, it means that the application needs to check the time

[dpdk-dev] Solarflare PMD submission question

2016-10-28 Thread Andrew Rybchenko

On 10/28/2016 03:33 PM, Thomas Monjalon wrote:
> 2016-10-28 13:50, Andrew Rybchenko:
>> First of all I'd like to double check that it is clear that we discuss
>> libefx
>> (base driver in terms of DPDK) import here. The PMD itself is already split
>> in 20+ patches.
> I don't know libefx. In DPDK, a base driver is often a subdirectory
> inside the PMD. Will it be the case?

Yes. Just to be absolutely sure: do the discussed requirements to split
also apply to the base driver import?

Andrew.

[dpdk-dev] [PATCH] app/test: fix wrong pointer values in crypto perftest

2016-10-28 Thread Thomas Monjalon

2016-10-28 13:42, Trahe, Fiona:
> From: Kusztal, ArkadiuszX
> > This commit fixes problem with device hanging because of wrong pointer
> > values in snow3g performance test
> 
> Can you add to resolved issues section of release notes please.

I'm not sure we should comment about unit test fixes in the release notes.

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Richardson, Bruce



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, October 28, 2016 4:13 PM
> To: Yigit, Ferruh 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] KNI discussion in userspace event
> 
> 2016-10-28 15:31, Ferruh Yigit:
> > * virtio-user + vhost-net
> > This can be valid alternative, removes the out of tree kernel module
> > need. But missing control path. Proof of concept work will be done.
> 
> That's probably a smart alternative for packet injection.
> What do you mean exactly by "missing control path"?

We'll have to see how it performs - which is the key gap for data path that KNI 
fills. Until we get an alternative with (nearly) equivalent performance, there 
will be demand for KNI to stick around.
The "control path" is the ethtool part, to get stats and do operations on the 
NIC using command-line tools.

> 
> > * Remove ethtool support ?
> 
> That's the other part of KNI.
> It works only for e1000/ixgbe. That's a niche.

Yes, it's something we need to remove, but again, we need an alternative first.

> 
> > Still there is some interest, will keep it. But not able to extend it
> > to other drivers with current design.
> 
> It should be removed one day.
> We must seriously think about a generic alternative.
> Either we add DPDK support in ethtool or we create a dpdk-ethtool.
> (or at least a library as the one in examples/).

I don't view that as a great path forward. Sure, we can do our own ethtool, but 
then people will look for ifconfig to work, and "ip" to work, etc. I view 
having a kernel proxy module as the best path here as it is tool agnostic on 
the userspace side, rather than trying to make every tool for working with 
kernel netdevs also have support for dpdk ports.

> Or we do nothing and wait to have more hardware like Mellanox supporting a
> kernel bifurcated driver approach.

Given the lack of other NICs supporting that, I think it could be quite a wait! 
Also, it doesn't work for virtio ports, for pcap ports, or any other ports 
which don't have physical hardware backing them. No reason you shouldn't be 
able to pull stats from all your dpdk ethdevs, not just the ones with physical 
hardware. The same ethdev APIs work for them, so should the same tools.

> 
> > *KNI PMD
> > Patch is in the mail list, missing comments. If it gets some
> > interest/comments/acks it may go in to next release.
> 
> I'm not against KNI PMD but it looks strange to add more support to an old
> dying approach.

I think the main idea here is to clean up the API - at least for the data path. 
There is no reason why we need special KNI RX/TX functions, when ethdev RX/TX 
functions could do the job. However, at a higher level, the more basic 
requirement is that whatever solution for the data-path to kernel from dpdk is, 
it needs to appear as an ethdev, and not as a special library with different 
APIs, as KNI is now.

/Bruce

[dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics library

2016-10-28 Thread Remy Horton

On 28/10/2016 09:12, Stephen Hemminger wrote:
> On Fri, 28 Oct 2016 09:04:30 +0800
> Remy Horton  wrote:
>
>> +
>> +struct rte_stats_bitrate_s {
>> +uint64_t last_ibytes;
>> +uint64_t last_obytes;
>> +uint64_t peak_ibits;
>> +uint64_t peak_obits;
>> +uint64_t ewma_ibits;
>> +uint64_t ewma_obits;
>> +};
>> +
>
> Reader/write access of 64 bit values is not safe on 32 bit platforms.
> I think you need to add a generation counter (see Linux kernel syncp)
> to handle 32 bit architecture. If done correctly, it would be a nop
> on 64 bit platforms.

I don't see a problem, since this is private persistent data that is only
read/written from rte_stats_bitrate_calc(). Once the values are calculated,
it pushes them into the metrics library using rte_metrics_update_metrics().
The idea is that downstream consumers get the values using
rte_metrics_get_values() rather than reading rte_stats_bitrate_s directly.

Having said that, what you mention quite likely affects the metrics
library itself.. :)

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Thomas Monjalon

2016-10-28 12:59, Ananyev, Konstantin:
> > 2016-10-28 11:34, Ananyev, Konstantin:
> > > > > 2016-10-27 16:24, Ananyev, Konstantin:
> > > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > > > --- a/config/common_base
> > > > > > > > > > +++ b/config/common_base
> > > > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > > > >
> > > > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > > > >
> > > > > > > > Not sure why?
> > > > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act 
> > > > > > > > as noop.
> > > > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > > > >
> > > > > > > If it is not implemented, the application must do the preparation 
> > > > > > > by itself.
> > > > > > > From patch 6:
> > > > > > > "
> > > > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > > > application and used Tx preparation API for packet preparation and
> > > > > > > verification.
> > > > > > > "
> > > > > > > So how does it behave with other drivers?
> > > > > >
> > > > > > Hmm so it seems that we broke testpmd csumonly mode for non-intel 
> > > > > > drivers..
> > > > > > My bad, missed that part completely.
> > > > > > Yes, then I suppose for now we'll need to support both (with and 
> > > > > > without) code paths for testpmd.
> > > > > > Probably a new fwd mode or just extra parameter for the existing 
> > > > > > one?
> > > > > > Any other suggestions?
> > > > >
> > > > > Please think how we can use it in every applications.
> > > > > It is not ready.
> > > > > Either we introduce the API without enabling it, or we implement it
> > > > > in every drivers.
> > > >
> > > > I understand your position here, but just like to point that:
> > > > 1) It is a new functionality optional to use.
> > > >  The app is free not to use that functionality and still do the 
> > > > preparation itself
> > > >  (as it has to do it now).
> > > > All existing apps would keep working as expected without using that 
> > > > function.
> > > > Though if the app developer knows that for all HW models he plans 
> > > > to run on
> > > > tx_prep is implemented - he is free to use it.
> > > > 2) It would be difficult for Tomasz (and other Intel guys) to 
> > > > implement tx_prep()
> > > >  for all non-Intel HW that DPDK supports right now.
> > > >  We just don't have all the actual HW in stock and probably 
> > > > adequate knowledge of it.
> > > > So we depend here on the good will of other PMD 
> > > > mainaners/developers to implement
> > > > tx_prep() for these devices.
> > > > From other side, if it will be disabled by default, then, I think,
> > > > PMD developers just wouldn't be motivated to implement it.
> > > > So it will be left untested and unused forever.
> > >
> > > Actually as another thought:
> > > Can we have it enabled by default, but mark it as experimental or so?
> > > If memory serves me right, we've done that for cryptodev in the past, no?
> > 
> > Cryptodev was a whole new library.
> > We won't play the game "find which function is experimental or not".
> > 
> > We should not enable a function until it is fully implemented.
> > 
> > If the user really understands that it will work only with few drivers
> > then he can change the build configuration himself.
> > Enabling in the default configuration is a message to say that it works
> > everywhere without any risk.
> > It's so simple that I don't even understand why I must argue for.
> > 
> > And by the way, it is late for 16.11.
> 
> Ok, I understand your concern about enabling it by default and testpmd 
> breakage,
> but what else you believe is not ready?

That's already a lot!
I commented also about function naming.
All these things are trivial to fix.
But it is late. After RC1, we should stop integrating new features.

> > I suggest to integrate it in the beginning of 17.02 cycle, with the hope
> > that you can convince other developers to implement it in other drivers,
> > so we could finally enable it in the default config.
> 
> Ok, any insights then, how we can convince people to do that?

You just have to explain clearly what this new feature is bringing
and what will be the future possibilities.

> BTW,  it means then that tx_prep() should become part of mandatory API
> to be implemented by each PMD doing TX offloads, right?

Right.
The question is: what does "mandatory" mean?
Should we block some patches for non-compliant drivers?
Should we remove offloads capability from non-compliant drivers?

> > Oh, and I don't trust that nobody were thinking that it would break testpmd
> > for non-Intel drivers.
> 
> Well, believe it or not, but yes, I missed that one.
> I think I already admitted that it was my fault, and apologized for that.

And it's my fault not having seen that

[dpdk-dev] [PATCH v7] app/testpmd: fix DCB configuration

2016-10-28 Thread Bernard Iremonger

Data Centre Bridging (DCB) configuration fails when SRIOV is
enabled if nb_rxq and nb_txq are not set to 1.

When dcb_mode is DCB_VT_ENABLED and max_vfs is greater than
zero, set nb_rxq and nb_txq to 1.

The failure occurs during configuration of the ixgbe PMD when
it is started, in the ixgbe_check_mq_mode function, if nb_rxq
and nb_txq are not set to 1.

Fixes: 2a977b891f99 ("app/testpmd: fix DCB configuration")

Signed-off-by: Bernard Iremonger 

Changes in v7:
restore nb_rxq and nb_txq setting when max_vfs is 0.
---
 app/test-pmd/testpmd.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6185be6..96f5011 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2002,8 +2002,13 @@ init_port_dcb_config(portid_t pid,
 * and has the same number of rxq and txq in dcb mode
 */
if (dcb_mode == DCB_VT_ENABLED) {
-   nb_rxq = rte_port->dev_info.max_rx_queues;
-   nb_txq = rte_port->dev_info.max_tx_queues;
+   if (rte_port->dev_info.max_vfs > 0) {
+   nb_rxq = 1;
+   nb_txq = 1;
+   } else {
+   nb_rxq = rte_port->dev_info.max_rx_queues;
+   nb_txq = rte_port->dev_info.max_tx_queues;
+   }
} else {
/*if vt is disabled, use all pf queues */
if (rte_port->dev_info.vmdq_pool_base == 0) {
-- 
2.4.3

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Ferruh Yigit

Hi,

There was an "Interworking with the Linux Kernel" discussion at the DPDK
userspace event; this mail summarizes the outcome and asks for more
comments from the community.


Briefly, KNI will mostly stay as it is as a solution for interworking with
the Linux kernel.
The out-of-tree kernel module concern is still there, but there is no clear
alternative to switch to. The community still cares about the performance
and the control path of KNI. Only KNI VHOST may go away. KNI PMD depends on
community interest. There was no modification request for the KNI library
or sample app.


Discussed alternatives were:
* Tun/Tap
This won't be as fast as KNI and performance is an issue.

* virtio-user + vhost-net
This can be a valid alternative and removes the need for an out-of-tree
kernel module, but it is missing a control path. Proof-of-concept work will
be done.

* Bifurcated driver
Not able to filter all traffic, so not a fully functional alternative.

* Upstreaming kernel module:
Stephen suggested upstreaming a generic shim layer and using it.


Future of the KNI:
* Remove ethtool support?
There is still some interest, so it will be kept. But it cannot be extended
to other drivers with the current design.

* Remove KNI VHOST?
There was no interest in this feature. I will send a deprecation
notice to remove this, and we can discuss more there.

* What to do with the out-of-tree kernel module?
It is still a problem for OSVs and unfortunately it is staying.

* Switch completely to an alternative approach?
There won't be any action for a switch. The virtio-user + vhost-net
alternative will be investigated.

* KNI PMD
The patch is on the mailing list, but is missing comments. If it gets some
interest/comments/acks it may go into the next release.

* Any improvement on library or sample app?
Nothing listed.


Thanks,
ferruh

[dpdk-dev] mbuf changes

2016-10-28 Thread Pattan, Reshma



> -Original Message-
> From: Morten Brørup [mailto:mb at smartsharesystems.com]
> Sent: Friday, October 28, 2016 3:12 PM
> To: Pattan, Reshma ; Olivier Matz
> 
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] mbuf changes
> 
> Comments at the end.
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pattan, Reshma
> > Sent: Friday, October 28, 2016 3:35 PM
> > To: Olivier Matz
> > Cc: dev at dpdk.org; Morten Brørup
> > Subject: Re: [dpdk-dev] mbuf changes
> >
> > Hi Olivier,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > To: Richardson, Bruce ; Morten Brørup
> > > 
> > > Cc: Adrien Mazarguil ; Wiles, Keith
> > > ; dev at dpdk.org; Oleg Kuporosov
> > > 
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > >
> > >
> > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > >> Comments at the end.
> > > >>
> > > >> Med venlig hilsen / kind regards
> > > >> - Morten Brørup
> > > >>
> > > >>> -Original Message-
> > > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > >>> To: Morten Brørup
> > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > > >>> Oleg Kuporosov
> > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>
> > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > >  Comments inline.
> > > 
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > Richardson
> > > > Sent: Tuesday, October 25, 2016 1:14 PM
> > > > To: Adrien Mazarguil
> > > > Cc: Morten Brørup; Wiles, Keith; dev at dpdk.org; Olivier Matz;
> > > > Oleg Kuporosov
> > > > Subject: Re: [dpdk-dev] mbuf changes
> > > >
> > > > On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> > wrote:
> > > >> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > > >>> Comments inline.
> > > >>>
> > > >>> Med venlig hilsen / kind regards
> > > >>> - Morten Brørup
> > > >>>
> > > >>>
> > >  -Original Message-
> > >  From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> > >  Sent: Tuesday, October 25, 2016 11:39 AM
> > >  To: Bruce Richardson
> > >  Cc: Wiles, Keith; Morten Brørup; dev at dpdk.org; Olivier
> > >  Matz; Oleg Kuporosov
> > >  Subject: Re: [dpdk-dev] mbuf changes
> > > 
> > >  On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> > > >>> wrote:
> > > > On Mon, Oct 24, 2016 at 04:11:33PM +, Wiles, Keith
> > > >>> wrote:
> > >  [...]
> > > >>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > >   wrote:
> > >  [...]
> > > 
> > > > One other point I'll mention is that we need to have a
> > > > discussion on how/where to add in a timestamp value into
> > > >>> the
> > > > mbuf. Personally, I think it can be in a union with the
> > > > sequence
> > > > number value, but I also suspect that 32-bits of a
> > > >>> timestamp
> > > > is not going to be enough for
> > >  many.
> > > >
> > > > Thoughts?
> > > 
> > >  If we consider that timestamp representation should use
> > > > nanosecond
> > >  granularity, a 32-bit value may likely wrap around too
> > > >>> quickly
> > >  to be useful. We can also assume that applications
> > requesting
> > >  timestamps may care more about latency than throughput,
> > >  Oleg
> > > > found
> > >  that using the second cache line for this purpose had a
> > > > noticeable impact [1].
> > > 
> > >   [1] http://dpdk.org/ml/archives/dev/2016-
> > October/049237.html
> > > >>>
> > > >>> I agree with Oleg about the latency vs. throughput
> > > >>> importance for
> > > > such applications.
> > > >>>
> > > >>> If you need high resolution timestamps, consider them to be
> > > > generated by the NIC RX driver, possibly by the hardware
> > > > itself
> > > > (http://w3new.napatech.com/features/time-precision/hardware-
> > time
> > > > - stamp), so the timestamp belongs in the first cache line.
> > > > And I am proposing that it should have the highest possible
> > > > accuracy, which makes the value hardware dependent.
> > > >>>
> > > >>> Furthermore, I am arguing that we leave it up to the
> > > >>> application
> > > >>> to
> > > > keep track of the slowly moving bits (i.e. counting whole
> > > > seconds, hours and calendar date) out of band, so we don't use
> > > > precious
> > > >>> space
> > > > in the mbuf. The application doesn't need the NIC RX

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Bruce Richardson

On Fri, Oct 28, 2016 at 02:48:57PM +0100, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Tuesday, October 25, 2016 6:49 PM
> 
> > 
> > Hi Community,
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro 
> > folks.
> > Let me know, if anyone else interested in contributing to the definition of 
> > eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work 
> > on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
> 
> 
> Hi All,
> 
> I've been looking at the eventdev API from a use-case point of view, and I'm 
> unclear on a how the API caters for two uses. I have simplified these as much 
> as possible, think of them as a theoretical unit-test for the API :)
> 
> 
> Fragmentation:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs fragmentation into two packets
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> In particular, I'm referring to how can the scheduler know that the 3rd 
> packet is the one being fragmented, and how to keep packet order valid. 
> 
> 
> Dropping packets:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs to be dropped
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> Again, in particular how does the scheduler know that the 3rd packet is being 
> dropped.
> 
> 
> Regards, -Harry

Hi,

These questions apply particularly to the reordered type, which has a lot
more complications than the other types in terms of sending packets back into
the scheduler. However, atomic types will still suffer from problems
with things the way they are. Again, if we assume a burst of 8 packets,
then to forward those packets we need to re-enqueue them to the
scheduler, and then also send 8 releases to the scheduler to
release the atomic locks for those packets.
This means that for each packet we have to send two messages to a
scheduler core, which is really inefficient.

This number of messages is critical for any software implementation, as
the cost of moving items core-to-core is going to be a big bottleneck
(perhaps the biggest bottleneck) in the system. It's for this reason we
need to use burst APIs - as with rte_rings.

We have solved this in our implementation by adding an event operation
type. The four operations we implemented are listed below
(using packet as a synonym for event here, since these would mostly
apply to packets flowing through a system):

* NEW - just a regular enqueue of a packet, without any previous context
* FORWARD - enqueue a packet, and mark the flow processing for the
  equivalent packet that was dequeued as completed, i.e.
  release any atomic locks, or reorder this packet with
  respect to any other outstanding packets from the event queue.
* DROP - this is roughly equivalent to the existing "release" API call,
  except that having it as an enqueue type allows us to
  release multiple items in a single call, and also to mix
  releases with new packets and forwarded packets
* PARTIAL - this indicates that the packet being enqueued should be
  treated according to the context of the current packet, but
  that that context should not be released/completed by the
  enqueue of this packet. This only really applies for
  reordered events, and is needed to do fragmentation and/or
  multicast of packets with reordering.

Therefore, I think we need to use some of the bits just freed up in the
event structure to include an enqueue operation type. Without it, I just
can't see how the API can ever support burst operation on packets.
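
A minimal sketch of how an operation type carried in each event could
cover the fragmentation and drop cases from Harry's mail in a single burst
enqueue. Every name here (sketch_event_op, sketch_event,
sketch_build_burst) is hypothetical and is not part of the proposed
eventdev API; the event is reduced to a packet id plus the op field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enqueue operation types, mirroring the four listed above. */
enum sketch_event_op {
	SKETCH_OP_NEW,     /* regular enqueue, no previous context */
	SKETCH_OP_FORWARD, /* enqueue and complete the dequeued context */
	SKETCH_OP_DROP,    /* complete the dequeued context, enqueue nothing */
	SKETCH_OP_PARTIAL, /* enqueue within the context, keep it open */
};

/* Hypothetical event: only the fields needed for this sketch. */
struct sketch_event {
	uint32_t pkt_id; /* stands in for the mbuf pointer */
	uint8_t op;      /* one of enum sketch_event_op */
};

/*
 * Build the enqueue burst for a dequeued burst in[0..n-1].
 * Packet drop_idx is dropped (DROP). Packet frag_idx is split into two
 * fragments (PARTIAL, then FORWARD). Every other packet is forwarded.
 * Pass an index >= n to disable either case; if both indexes are equal,
 * the drop takes precedence.
 * out must have room for n + 1 events. Returns the number written.
 */
size_t
sketch_build_burst(const struct sketch_event *in, size_t n,
		   size_t frag_idx, size_t drop_idx,
		   struct sketch_event *out)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		if (i == drop_idx) {
			out[k] = in[i];
			out[k++].op = SKETCH_OP_DROP;
		} else if (i == frag_idx) {
			/* first fragment: the reorder context stays open */
			out[k] = in[i];
			out[k++].op = SKETCH_OP_PARTIAL;
			/* last fragment: the context is completed */
			out[k] = in[i];
			out[k++].op = SKETCH_OP_FORWARD;
		} else {
			out[k] = in[i];
			out[k++].op = SKETCH_OP_FORWARD;
		}
	}
	return k;
}
```

With this shape, a dequeued burst of 8 packets turns into one enqueue
burst, instead of an enqueue followed by 8 separate release messages.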

Regards,
/Bruce

[dpdk-dev] Solarflare PMD submission question

2016-10-28 Thread Thomas Monjalon

2016-10-28 16:05, Andrew Rybchenko:
> On 10/28/2016 03:33 PM, Thomas Monjalon wrote:
> > 2016-10-28 13:50, Andrew Rybchenko:
> >> First of all I'd like to double check that it is clear that we discuss
> >> libefx
> >> (base driver in terms of DPDK) import here. The PMD itself is already split
> >> in 20+ patches.
> > I don't know libefx. In DPDK, a base driver is often a subdirectory
> > inside the PMD. Will it be the case?
> 
> Yes. Just to be absolutely sure: are the discussed requirements to split
> applicable to base driver import?

Yes, I'm talking about the base driver. But they are not requirements.
Take them as advice.
The first priority is to welcome your new driver.
The second priority is to make sure that it is open and readable enough.

[dpdk-dev] [PATCH] net/qede: fix gcc compiler option checks

2016-10-28 Thread Stephen Hemminger

On Thu, 27 Oct 2016 23:37:57 -0700
Rasesh Mody  wrote:

> From: Rasesh Mody 
> 
> Using GCC_VERSION to check gcc version and decide whether to include
> that compiler option.
> 
> Fixes: ec94dbc57362 ("qede: add base driver")
> Fixes: ecc7a5a27ffe ("net/qede/base: fix 32-bit build")
> 
> Signed-off-by: Rasesh Mody 
> ---
>  drivers/net/qede/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/qede/Makefile b/drivers/net/qede/Makefile
> index 39751e4..29b443d 100644
> --- a/drivers/net/qede/Makefile
> +++ b/drivers/net/qede/Makefile
> @@ -46,11 +46,11 @@ endif
>  endif
>  
>  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> -ifeq ($(shell gcc -Wno-unused-but-set-variable -Werror -E - < /dev/null > 
> /dev/null 2>&1; echo $$?),0)
> +ifeq ($(shell test $(GCC_VERSION) -ge 44 && echo 1), 1)
>  CFLAGS_BASE_DRIVER += -Wno-unused-but-set-variable
>  endif
>  CFLAGS_BASE_DRIVER += -Wno-missing-declarations
> -ifeq ($(shell gcc -Wno-maybe-uninitialized -Werror -E - < /dev/null > 
> /dev/null 2>&1; echo $$?),0)
> +ifeq ($(shell test $(GCC_VERSION) -ge 46 && echo 1), 1)
>  CFLAGS_BASE_DRIVER += -Wno-maybe-uninitialized
>  endif
>  CFLAGS_BASE_DRIVER += -Wno-strict-prototypes

Does this mean that less compiler checking is done or more?
It seems lots of drivers make the excuse:
 "the base driver comes from another group and is known buggy but can't be 
fixed"
That doesn't reflect well on the quality of the DPDK.

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Jerin Jacob

On Fri, Oct 28, 2016 at 09:36:46AM +0100, Bruce Richardson wrote:
> On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > > On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > > > > > -Original Message-
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > Thanks. One other suggestion is that it might be useful to provide
> > > support for having typed queues explicitly in the API. Right now, when
> > > you create an queue, the queue_conf structure takes as parameters how
> > > many atomic flows that are needed for the queue, or how many reorder
> > > slots need to be reserved for it. This implicitly hints at the type of
> > > traffic which will be sent to the queue, but I'm wondering if it's
> > > better to make it explicit. There are certain optimisations that can be
> > > looked at if we know that a queue only handles packets of a particular
> > > type. [Not having to handle reordering when pulling events from a core
> > > can be a big win for software!].
> > 
> > If it helps in SW implementation, then I think we can add this in queue
> > configuration. 
> > 
> > > 
> > > How about adding: "allowed_event_types" as a field to
> > > rte_event_queue_conf, with possible values:
> > > * atomic
> > > * ordered
> > > * parallel
> > > * mixed - allowing all 3 types. I think allowing 2 of three types might
> > > make things too complicated.
> > > 
> > > An open question would then be how to behave when the queue type and
> > > requested event type conflict. We can either throw an error, or just
> > > ignore the event type and always treat enqueued events as being of the
> > > queue type. I prefer the latter, because it's faster not having to
> > > error-check, and it pushes the responsibility on the app to know what
> > > it's doing.
> > 
> > How about making default as "mixed" and let application configures what
> > is not required?. That way application responsibility is clear.
> > something similar to ETH_TXQ_FLAGS_NOMULTSEGS, ETH_TXQ_FLAGS_NOREFCOUNT
> > with default.
> > 
> I suppose it could work, but why bother doing that? If an app knows it's
> only going to use one traffic type, why not let it just state what it
> will do rather than try to specify what it won't do. If mixed is needed,

My thought was more in line with the ethdev spec: ref-count is the default,
and if the application needs an exception it sets ETH_TXQ_FLAGS_NOREFCOUNT.
But it is OK if you need it the other way.

> then it's easy enough to specify - and we can make it the zero/default
> value too.

OK. Then we will make mixed the zero/default value and add
"allowed_event_types" to the event queue config.
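
A minimal sketch of what was agreed, assuming hypothetical names
throughout (sketch_queue_type, sketch_queue_conf and
sketch_effective_type are illustrations, not the eventdev API): mixed is
the zero value, so a zero-initialised config allows all types, and on a
typed queue the event's requested type is simply ignored rather than
error-checked, which is the behaviour Bruce said he preferred.

```c
#include <stdint.h>

/* Hypothetical values; MIXED is zero so a zeroed config allows all types. */
enum sketch_queue_type {
	SKETCH_QUEUE_MIXED = 0,
	SKETCH_QUEUE_ATOMIC,
	SKETCH_QUEUE_ORDERED,
	SKETCH_QUEUE_PARALLEL,
};

/* Hypothetical stand-in for the queue configuration structure. */
struct sketch_queue_conf {
	uint32_t nb_atomic_flows;    /* atomic flows needed for the queue */
	uint32_t nb_reorder_slots;   /* reorder slots reserved for the queue */
	uint8_t allowed_event_types; /* enum sketch_queue_type */
};

/*
 * Scheduling type actually applied to an enqueued event.
 * Mixed queue: the event's own requested type is used.
 * Typed queue: the requested type is ignored and the queue type wins,
 * with no error check on the fast path.
 */
static inline uint8_t
sketch_effective_type(const struct sketch_queue_conf *conf, uint8_t requested)
{
	if (conf->allowed_event_types == SKETCH_QUEUE_MIXED)
		return requested;
	return conf->allowed_event_types;
}
```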

/Jerin

> 
> Our software implementation for now, only supports one type per queue -
> which we suspect should meet a lot of use-cases. We'll have to see about
> adding in mixed types in future.
> 
> /Bruce

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Thomas Monjalon

2016-10-28 11:34, Ananyev, Konstantin:
> > > 2016-10-27 16:24, Ananyev, Konstantin:
> > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > --- a/config/common_base
> > > > > > > > +++ b/config/common_base
> > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > >
> > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > >
> > > > > > Not sure why?
> > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as 
> > > > > > noop.
> > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > >
> > > > > If it is not implemented, the application must do the preparation by 
> > > > > itself.
> > > > > From patch 6:
> > > > > "
> > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > application and used Tx preparation API for packet preparation and
> > > > > verification.
> > > > > "
> > > > > So how does it behave with other drivers?
> > > >
> > > > Hmm so it seems that we broke testpmd csumonly mode for non-intel 
> > > > drivers..
> > > > My bad, missed that part completely.
> > > > Yes, then I suppose for now we'll need to support both (with and 
> > > > without) code paths for testpmd.
> > > > Probably a new fwd mode or just extra parameter for the existing one?
> > > > Any other suggestions?
> > >
> > > Please think how we can use it in every applications.
> > > It is not ready.
> > > Either we introduce the API without enabling it, or we implement it
> > > in every drivers.
> > 
> > I understand your position here, but just like to point that:
> > 1) It is a new functionality optional to use.
> >  The app is free not to use that functionality and still do the 
> > preparation itself
> >  (as it has to do it now).
> > All existing apps would keep working as expected without using that 
> > function.
> > Though if the app developer knows that for all HW models he plans to 
> > run on
> > tx_prep is implemented - he is free to use it.
> > 2) It would be difficult for Tomasz (and other Intel guys) to implement 
> > tx_prep()
> >  for all non-Intel HW that DPDK supports right now.
> >  We just don't have all the actual HW in stock and probably adequate 
> > knowledge of it.
> > So we depend here on the good will of other PMD mainaners/developers to 
> > implement
> > tx_prep() for these devices.
> > From other side, if it will be disabled by default, then, I think,
> > PMD developers just wouldn't be motivated to implement it.
> > So it will be left untested and unused forever.
> 
> Actually as another thought:
> Can we have it enabled by default, but mark it as experimental or so?
> If memory serves me right, we've done that for cryptodev in the past, no?

Cryptodev was a whole new library.
We won't play the game "find which function is experimental or not".

We should not enable a function until it is fully implemented.

If the user really understands that it will work only with a few drivers
then he can change the build configuration himself.
Enabling it in the default configuration is a message to say that it works
everywhere without any risk.
It's so simple that I don't even understand why I must argue for it.

And by the way, it is late for 16.11.
I suggest integrating it at the beginning of the 17.02 cycle, with the hope
that you can convince other developers to implement it in other drivers,
so we could finally enable it in the default config.

Oh, and I don't believe that nobody thought it would break testpmd
for non-Intel drivers.
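
A minimal sketch of the behaviour under discussion, using mock types
rather than the real ethdev structures (sketch_pkt, sketch_dev,
sketch_driver_prep and sketch_tx_prep are all hypothetical). It shows why
the no-op path is a problem once an application stops preparing packets
itself: on a driver without the callback the call succeeds, but the
packet is left unprepared.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a packet and a device; not DPDK types. */
struct sketch_pkt {
	int needs_pseudo_hdr_cksum; /* set by the app when offloading */
	int prepared;               /* pseudo-header checksum filled in */
};

typedef uint16_t (*sketch_tx_prep_t)(struct sketch_pkt **pkts, uint16_t nb);

struct sketch_dev {
	sketch_tx_prep_t tx_pkt_prep; /* NULL if the driver lacks it */
};

/* A driver-side preparation callback, as a PMD might provide. */
uint16_t
sketch_driver_prep(struct sketch_pkt **pkts, uint16_t nb)
{
	uint16_t i;

	for (i = 0; i < nb; i++)
		if (pkts[i]->needs_pseudo_hdr_cksum)
			pkts[i]->prepared = 1;
	return nb;
}

/* No-op when the driver has no callback: packets pass through untouched. */
uint16_t
sketch_tx_prep(const struct sketch_dev *dev, struct sketch_pkt **pkts,
	       uint16_t nb)
{
	if (dev->tx_pkt_prep == NULL)
		return nb;
	return dev->tx_pkt_prep(pkts, nb);
}
```

In both cases the return value reports the whole burst as ready, so an
application cannot tell from the result alone whether any preparation
was done.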

[dpdk-dev] [PATCH] net/mlx5: fix handling of small mbuf sizes

2016-10-28 Thread Adrien Mazarguil

On Mon, Oct 24, 2016 at 11:10:59AM +0300, Raslan Darawsheh wrote:
> When mbufs are smaller than MRU, multi-segment support must be enabled to
> default set when not in promiscuous or allmulticast modes.
> 
> Fixes: 9964b965ad69 ("net/mlx5: re-add Rx scatter support")
> 
> Signed-off-by: Raslan Darawsheh 
> ---
>  drivers/net/mlx5/mlx5_rxq.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 4dc5cc3..62253ed 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -946,6 +946,12 @@ rxq_ctrl_setup(struct rte_eth_dev *dev, struct rxq_ctrl 
> *rxq_ctrl,
>   (void)conf; /* Thresholds configuration (ignored). */
>   /* Enable scattered packets support for this queue if necessary. */
>   assert(mb_len >= RTE_PKTMBUF_HEADROOM);
> + /* If smaller than MRU, multi-segment support must be enabled. */
> + if (mb_len < (priv->mtu > dev->data->dev_conf.rxmode.max_rx_pkt_len ?
> +  dev->data->dev_conf.rxmode.max_rx_pkt_len :
> +  priv->mtu
> +  ))

Let's move poor "))" to the end of the previous line.

> + dev->data->dev_conf.rxmode.jumbo_frame = 1;
>   if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
>   (dev->data->dev_conf.rxmode.max_rx_pkt_len >
>(mb_len - RTE_PKTMBUF_HEADROOM))) {
> -- 
> 1.9.1

Besides the above comment:

Acked-by: Adrien Mazarguil 
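
For reference, the condition added by the patch can be read as a small
predicate. This is only a sketch with a hypothetical helper name
(sketch_force_multi_seg); the parameters stand in for mb_len, priv->mtu
and rxmode.max_rx_pkt_len from the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a single mbuf of mb_len bytes is smaller than the
 * lesser of the MTU and the configured maximum Rx packet length, i.e.
 * when the quoted patch would set rxmode.jumbo_frame to 1.
 */
static inline bool
sketch_force_multi_seg(uint32_t mb_len, uint32_t mtu, uint32_t max_rx_pkt_len)
{
	uint32_t limit = mtu > max_rx_pkt_len ? max_rx_pkt_len : mtu;

	return mb_len < limit;
}
```

Computing the limit on its own line also avoids the dangling "))" noted
above.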

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] Tcpdump

2016-10-28 Thread Pattan, Reshma

Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Dror Birkman
> Sent: Thursday, October 27, 2016 1:25 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Tcpdump
> 
> Hi,
> 
> I have a DPDK application that binds to an interface and processes packets.
> For debugging purposes I want to run tcpdump on this interface.

DPDK provides the dpdk-pdump tool for capturing packets to a pcap file, which
can be passed to a tcpdump-like tool to view the packets.
Please refer to the DPDK documentation below. If you have further
questions on the usage, please send mail to users at dpdk.org.
It does affect performance, so it is recommended to use it only for
debugging purposes.

http://dpdk.org/doc/guides/prog_guide/pdump_lib.html
http://dpdk.org/doc/guides/sample_app_ug/pdump.html

Thanks,
Reshma

> 
> IYO, what is my best option with hurting the performance of the application 
> too
> much?


> 
> TIA,
> Dror

[dpdk-dev] [PATCH] net/mlx5: fix default set for multicast traffic

2016-10-28 Thread Adrien Mazarguil

On Mon, Oct 24, 2016 at 10:59:14AM +0300, Raslan Darawsheh wrote:
> Remove non-IPv6 multicast traffic with destination MAC 33:33:* from the
> default set when not in promiscuous or allmulticast modes.
> 
> Fixes: 0497ddaac511 ("mlx5: add special flows for broadcast and IPv6 
> multicast")
> 
> Signed-off-by: Raslan Darawsheh 
> ---
>  drivers/net/mlx5/mlx5_rxmode.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
> index 173e6e8..4ffe703 100644
> --- a/drivers/net/mlx5/mlx5_rxmode.c
> +++ b/drivers/net/mlx5/mlx5_rxmode.c
> @@ -104,7 +104,6 @@ static const struct special_flow_init special_flow_init[] 
> = {
>   .hash_types =
>   1 << HASH_RXQ_UDPV6 |
>   1 << HASH_RXQ_IPV6 |
> - 1 << HASH_RXQ_ETH |
>   0,
>   .per_vlan = 1,
>   },
> -- 
> 1.9.1

(NACK)

While technically correct, it looks like this patch sometimes breaks IPv6
multicast traffic as well; let's drop it until we figure out the reason.
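
For readers unfamiliar with the table, a minimal sketch of what the
one-line removal does to the bitmask. The enum values and names are
hypothetical and cover only the three bits visible in the diff context;
the real entry may list more hash types.

```c
#include <stdbool.h>

/* Hypothetical hash Rx queue types; values are illustrative only. */
enum sketch_hash_rxq {
	SKETCH_HASH_RXQ_UDPV6,
	SKETCH_HASH_RXQ_IPV6,
	SKETCH_HASH_RXQ_ETH,
};

/* The special flow's hash_types mask before the quoted patch. */
static const unsigned int sketch_before =
	1u << SKETCH_HASH_RXQ_UDPV6 |
	1u << SKETCH_HASH_RXQ_IPV6 |
	1u << SKETCH_HASH_RXQ_ETH;

/* The same mask after the patch: the ETH bit is no longer set. */
static const unsigned int sketch_after =
	1u << SKETCH_HASH_RXQ_UDPV6 |
	1u << SKETCH_HASH_RXQ_IPV6;

/* True if the special flow is installed for the given hash type. */
static inline bool
sketch_flow_enabled(unsigned int hash_types, enum sketch_hash_rxq type)
{
	return (hash_types & (1u << type)) != 0;
}
```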

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] Solarflare PMD submission question

2016-10-28 Thread Andrew Rybchenko

Thomas,

On 10/27/2016 01:37 PM, Thomas Monjalon wrote:
> First of all, welcome to DPDK!

Thanks!

> 2016-10-27 09:34, Andrew Rybchenko:
>> we would like to include Solarflare libefx-based PMD in the DPDK 17.02
>> and start the upstreaming process.
>> The driver supports Solarflare SFN7xxx and SFN8xxx families of 10/40
>> Gbps adapters.
>> The driver has base driver. It is just fresh version of the same code
>> which is used in the FreeBSD [1], illumos [2] and some other Solarflare
>> drivers.
> Unfortunately it is common to have some big base drivers in DPDK.
> Note that some PMD rely on their kernel counterpart for the control path.
> It is a way to avoid code duplication.

The Linux kernel sfc driver has control path functionality, but technically
it is different code.

> As far as I understand, it is easier to share queues with DPDK from kernel
> when the device supports an IOMMU.
>
>> The question is how to submit the base driver which is pretty big. Mail
>> size of the patch which imports it is about 2 Mb.
> First answer is a question:
> Have you thought about cooperating with the kernel driver for your PMD?

Yes, we considered it, but decided that we need a pure userspace driver since
the approach has its advantages: no specific dependencies on the kernel,
the same PMD for Linux and FreeBSD, etc.

> If you really cannot use this approach, then we have to maintain this
> whole base driver in DPDK.
> It will be easier to read, understand and reference if it is a bit split.
> Could you try to send it as 10 to 20 patches explaining the role of each
> part and giving some design details?

First of all I'd like to double check that it is clear that we are discussing
the libefx (base driver in terms of DPDK) import here. The PMD itself is
already split into 20+ patches.
The only thing which comes to my mind is to split the libefx import on a
subsystem basis (a few files per subsystem). It is artificial, and the added
files will be unused until the patch which adds them into the build. It could
be something like:
  1. External interfaces definition
  2. Internal interfaces definition
  3. Registers definition (hardware interface)
  4. Management CPU interface definition (it is one file, but still big: 650K)
  5. Management CPU interface implementation
and so on for NIC global controls, interrupts, event queue, transmit,
receive, filtering etc.

> It would be also really appreciated to provide a design documentation
> in doc/guides/nics. Are the datasheets open? A link in the doc would help.

We have documentation which grows together with the supported features,
but it is rather for users. Important design decisions (not so many) are
documented near the corresponding code. Unfortunately there are no open
datasheets. The management CPU interface definition has comments.

> Please be prepare to work on several iterations of the patch series.

We have already passed a number of iterations internally, so it will not
frighten us.

> PS: the mailing list put emails exceeding 300KB into a moderation queue.

Nice to know that it is not completely rejected, since even if we split as
described above, we still have one candidate which will end up in the
moderation queue.

Thanks,

Andrew.

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Van Haaren, Harry

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> Sent: Tuesday, October 25, 2016 6:49 PM

> 
> Hi Community,
> 
> So far, I have received constructive feedback from Intel, NXP and Linaro 
> folks.
> Let me know, if anyone else interested in contributing to the definition of 
> eventdev?
> 
> If there are no major issues in proposed spec, then Cavium would like work on
> implementing and up-streaming the common code(lib/librte_eventdev/) and
> an associated HW driver.(Requested minor changes of v2 will be addressed
> in next version).


Hi All,

I've been looking at the eventdev API from a use-case point of view, and I'm 
unclear on a how the API caters for two uses. I have simplified these as much 
as possible, think of them as a theoretical unit-test for the API :)


Fragmentation:
1. Dequeue 8 packets
2. Process 2 packets
3. Processing 3rd, this packet needs fragmentation into two packets
4. Process remaining 5 packets as normal

What function calls does the application make to achieve this?
In particular, I'm referring to how can the scheduler know that the 3rd packet 
is the one being fragmented, and how to keep packet order valid. 


Dropping packets:
1. Dequeue 8 packets
2. Process 2 packets
3. Processing 3rd, this packet needs to be dropped
4. Process remaining 5 packets as normal

What function calls does the application make to achieve this?
Again, in particular how does the scheduler know that the 3rd packet is being 
dropped.


Regards, -Harry

[dpdk-dev] [PATCH] app/test: fix wrong pointer values in crypto perftest

2016-10-28 Thread Trahe, Fiona

Hi Arek, 

> -Original Message-
> From: Kusztal, ArkadiuszX
> Sent: Friday, October 28, 2016 12:37 PM
> To: dev at dpdk.org
> Cc: Trahe, Fiona ; De Lara Guarch, Pablo
> ; Griffin, John  intel.com>;
> Jain, Deepak K ; Kusztal, ArkadiuszX
> 
> Subject: [PATCH] app/test: fix wrong pointer values in crypto perftest
> 
> This commit fixes problem with device hanging because of wrong pointer
> values in snow3g performance test
> 
> Fixes: 97fe6461c7cb ("app/test: add SNOW 3G performance test")
> 
> Signed-off-by: Arek Kusztal 
> ---

Can you add this to the resolved issues section of the release notes, please?

[dpdk-dev] mbuf changes

2016-10-28 Thread Pattan, Reshma

Hi Olivier,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, October 25, 2016 1:49 PM
> To: Richardson, Bruce ; Morten Brørup
> 
> Cc: Adrien Mazarguil ; Wiles, Keith
> ; dev at dpdk.org; Oleg Kuporosov
> 
> Subject: Re: [dpdk-dev] mbuf changes
> 
> 
> 
> On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> >> Comments at the end.
> >>
> >> Med venlig hilsen / kind regards
> >> - Morten Brørup
> >>
> >>> -Original Message-
> >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> >>> Sent: Tuesday, October 25, 2016 2:20 PM
> >>> To: Morten Brørup
> >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier Matz; Oleg
> >>> Kuporosov
> >>> Subject: Re: [dpdk-dev] mbuf changes
> >>>
> >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
>  Comments inline.
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > Richardson
> > Sent: Tuesday, October 25, 2016 1:14 PM
> > To: Adrien Mazarguil
> > Cc: Morten Brørup; Wiles, Keith; dev at dpdk.org; Olivier Matz; Oleg
> > Kuporosov
> > Subject: Re: [dpdk-dev] mbuf changes
> >
> > On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> >> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> >>> Comments inline.
> >>>
> >>> Med venlig hilsen / kind regards
> >>> - Morten Brørup
> >>>
> >>>
>  -Original Message-
>  From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
>  Sent: Tuesday, October 25, 2016 11:39 AM
>  To: Bruce Richardson
>  Cc: Wiles, Keith; Morten Brørup; dev at dpdk.org; Olivier Matz;
>  Oleg Kuporosov
>  Subject: Re: [dpdk-dev] mbuf changes
> 
>  On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> >>> wrote:
> > On Mon, Oct 24, 2016 at 04:11:33PM +, Wiles, Keith
> >>> wrote:
>  [...]
> >>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
>   wrote:
>  [...]
> 
> > One other point I'll mention is that we need to have a
> > discussion on how/where to add in a timestamp value into
> >>> the
> > mbuf. Personally, I think it can be in a union with the
> > sequence
> > number value, but I also suspect that 32-bits of a
> >>> timestamp
> > is not going to be enough for
>  many.
> >
> > Thoughts?
> 
>  If we consider that timestamp representation should use
> > nanosecond
>  granularity, a 32-bit value may likely wrap around too
> >>> quickly
>  to be useful. We can also assume that applications requesting
>  timestamps may care more about latency than throughput, Oleg
> > found
>  that using the second cache line for this purpose had a
> > noticeable impact [1].
> 
>   [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> >>>
> >>> I agree with Oleg about the latency vs. throughput importance
> >>> for
> > such applications.
> >>>
> >>> If you need high resolution timestamps, consider them to be
> > generated by the NIC RX driver, possibly by the hardware itself
> > (http://w3new.napatech.com/features/time-precision/hardware-time-
> > stamp), so the timestamp belongs in the first cache line. And I am
> > proposing that it should have the highest possible accuracy, which
> > makes the value hardware dependent.
> >>>
> >>> Furthermore, I am arguing that we leave it up to the
> >>> application
> >>> to
> > keep track of the slowly moving bits (i.e. counting whole seconds,
> > hours and calendar date) out of band, so we don't use precious
> >>> space
> > in the mbuf. The application doesn't need the NIC RX driver's fast
> > path to capture which date (or even which second) a packet was
> > received. Yes, it adds complexity to the application, but we can't
> > set aside 64 bit for a generic timestamp. Or as a weird tradeoff:
> > Put the fast moving 32 bit in the first cache line and the slow
> > moving 32 bit in the second cache line, as a placeholder for the
> >>> application to fill out if needed.
> > Yes, it means that the application needs to check the time and
> > update its variable holding the slow moving time once every second
> > or so; but that should be doable without significant effort.
> >>
> >> That's a good point, however without a 64 bit value, elapsed time
> >> between two arbitrary mbufs cannot be measured reliably due to
> >>> not
> >> enough context, one way or another the low resolution value is
> >> also
> > needed.
> >>
> >> Obviously latency-sensitive applications are unlikely to perform
> >> lengthy
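
A minimal sketch of the out-of-band bookkeeping discussed in this thread, assuming nanosecond granularity: a 32-bit counter wraps every 2^32 ns (about 4.29 s), so the application would have to extend each 32-bit sample to 64 bits itself. The names ts_tracker and ts_extend are hypothetical and not part of DPDK, and the approach only holds if samples arrive in order and less than one wrap period apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-port state kept by the application, not in the mbuf. */
struct ts_tracker {
	uint64_t last_full; /* last reconstructed 64-bit timestamp, in ns */
};

/*
 * Extend a 32-bit timestamp sample to 64 bits.
 * The unsigned subtraction is modulo 2^32, so a single wrap of the
 * 32-bit counter between two samples is absorbed correctly.
 */
static inline uint64_t
ts_extend(struct ts_tracker *t, uint32_t low)
{
	uint32_t prev_low = (uint32_t)t->last_full;
	uint32_t delta = low - prev_low;

	t->last_full += delta;
	return t->last_full;
}
```

If two packets are more than one wrap period apart, the delta is ambiguous, which is the objection raised above about measuring elapsed time between two arbitrary mbufs.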

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Van Haaren, Harry

> From: Vincent Jardin [mailto:vincent.jardin at 6wind.com]
> Sent: Wednesday, October 26, 2016 7:37 PM
> On 26 October 2016 2:11:26 PM, "Van Haaren, Harry"
>  wrote:
> 
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> >>
> >> So far, I have received constructive feedback from Intel, NXP and Linaro 
> >> folks.
> >> Let me know, if anyone else interested in contributing to the definition of
> >> eventdev?
> >>
> >> If there are no major issues in proposed spec, then Cavium would like work 
> >> on
> >> implementing and up-streaming the common code(lib/librte_eventdev/) and
> >> an associated HW driver.(Requested minor changes of v2 will be addressed
> >> in next version).
> >
> > Hi All,
> >
> > I will propose a minor change to the rte_event struct, allowing some bits
> > to be implementation specific. Currently the rte_event struct has no space
> > to allow an implementation store any metadata about the event. For software
> > performance it would be really helpful if there are some bits available for
> > the implementation to keep some flags about each event.
> >
> > I suggest to rework the struct as below which opens 6 bits that were
> > otherwise wasted, and define them as implementation specific. By
> > implementation specific it is understood that the implementation can
> > overwrite any information stored in those bits, and the application must
> > not expect the data to remain after the event is scheduled.
> >
> > OLD:
> > struct rte_event {
> > uint32_t flow_id:24;
> > uint32_t queue_id:8;
> > uint8_t  sched_type; /* Note only 2 bits of 8 are required */
> >
> > NEW:
> > struct rte_event {
> > uint32_t flow_id:24;
> > uint32_t sched_type:2; /* reduced size : but 2 bits is enough for the
> > enqueue types Ordered,Atomic,Parallel.*/
> > uint32_t implementation:6; /* available for implementation specific
> > metadata */
> > uint8_t queue_id; /* still 8 bits as before */
> 
> Bitfields are efficient on Octeon. What about the other CPUs you have in
> mind? x86 is not as efficient.

Given the rte_event struct is 16 bytes and there's no free space to use, I see
no alternative to using bitfields in this case. I welcome suggestions for a
better way to lay out the structure that avoids the bitfield.
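
One possible alternative, sketched under assumptions: keep the first 32 bits as a plain uint32_t and pack the three fields with explicit shifts and masks, so the layout does not depend on how a compiler orders bitfields. The EV_* macros and ev_* helpers are hypothetical names, and the bit positions are one arbitrary choice, not the proposed eventdev layout.

```c
#include <assert.h>
#include <stdint.h>

#define EV_FLOW_BITS   24u
#define EV_SCHED_BITS  2u
#define EV_IMPL_BITS   6u

#define EV_FLOW_MASK   ((1u << EV_FLOW_BITS) - 1u)
#define EV_SCHED_MASK  ((1u << EV_SCHED_BITS) - 1u)
#define EV_IMPL_MASK   ((1u << EV_IMPL_BITS) - 1u)

#define EV_SCHED_SHIFT EV_FLOW_BITS
#define EV_IMPL_SHIFT  (EV_FLOW_BITS + EV_SCHED_BITS)

/* Pack flow_id (24 bits), sched_type (2 bits) and impl (6 bits). */
static inline uint32_t
ev_pack(uint32_t flow_id, uint32_t sched_type, uint32_t impl)
{
	return (flow_id & EV_FLOW_MASK) |
	       ((sched_type & EV_SCHED_MASK) << EV_SCHED_SHIFT) |
	       ((impl & EV_IMPL_MASK) << EV_IMPL_SHIFT);
}

static inline uint32_t
ev_flow_id(uint32_t w)
{
	return w & EV_FLOW_MASK;
}

static inline uint32_t
ev_sched_type(uint32_t w)
{
	return (w >> EV_SCHED_SHIFT) & EV_SCHED_MASK;
}

static inline uint32_t
ev_impl(uint32_t w)
{
	return (w >> EV_IMPL_SHIFT) & EV_IMPL_MASK;
}
```

Whether this is faster than bitfields on a given CPU would have to be measured; the sketch only shows that the same 32 bits can be laid out without them.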

Regards, -Harry

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Ananyev, Konstantin


> 
> 2016-10-28 11:34, Ananyev, Konstantin:
> > > > 2016-10-27 16:24, Ananyev, Konstantin:
> > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > > --- a/config/common_base
> > > > > > > > > +++ b/config/common_base
> > > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > > >
> > > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > > >
> > > > > > > Not sure why?
> > > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as 
> > > > > > > noop.
> > > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > > >
> > > > > > If it is not implemented, the application must do the preparation 
> > > > > > by itself.
> > > > > > From patch 6:
> > > > > > "
> > > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > > application and used Tx preparation API for packet preparation and
> > > > > > verification.
> > > > > > "
> > > > > > So how does it behave with other drivers?
> > > > >
> > > > > Hmm so it seems that we broke testpmd csumonly mode for non-intel 
> > > > > drivers..
> > > > > My bad, missed that part completely.
> > > > > Yes, then I suppose for now we'll need to support both (with and 
> > > > > without) code paths for testpmd.
> > > > > Probably a new fwd mode or just extra parameter for the existing one?
> > > > > Any other suggestions?
> > > >
> > > > Please think how we can use it in every applications.
> > > > It is not ready.
> > > > Either we introduce the API without enabling it, or we implement it
> > > > in every drivers.
> > >
> > > I understand your position here, but just like to point that:
> > > 1) It is a new functionality optional to use.
> > >  The app is free not to use that functionality and still do the 
> > > preparation itself
> > >  (as it has to do it now).
> > > All existing apps would keep working as expected without using that 
> > > function.
> > > Though if the app developer knows that for all HW models he plans to 
> > > run on
> > > tx_prep is implemented - he is free to use it.
> > > 2) It would be difficult for Tomasz (and other Intel guys) to 
> > > implement tx_prep()
> > >  for all non-Intel HW that DPDK supports right now.
> > >  We just don't have all the actual HW in stock and probably adequate 
> > > knowledge of it.
> > > So we depend here on the good will of other PMD maintainers/developers 
> > > to implement
> > > tx_prep() for these devices.
> > > From other side, if it will be disabled by default, then, I think,
> > > PMD developers just wouldn't be motivated to implement it.
> > > So it will be left untested and unused forever.
> >
> > Actually as another thought:
> > Can we have it enabled by default, but mark it as experimental or so?
> > If memory serves me right, we've done that for cryptodev in the past, no?
> 
> Cryptodev was a whole new library.
> We won't play the game "find which function is experimental or not".
> 
> We should not enable a function until it is fully implemented.
> 
> If the user really understands that it will work only with few drivers
> then he can change the build configuration himself.
> Enabling in the default configuration is a message to say that it works
> everywhere without any risk.
> It's so simple that I don't even understand why I must argue for.
> 
> And by the way, it is late for 16.11.

Ok, I understand your concern about enabling it by default and testpmd breakage,
but what else do you believe is not ready?

> I suggest to integrate it in the beginning of 17.02 cycle, with the hope
> that you can convince other developers to implement it in other drivers,
> so we could finally enable it in the default config.

Ok, any insights then on how we can convince people to do that?
BTW, it then means that tx_prep() should become part of the mandatory API
to be implemented by each PMD doing TX offloads, right?

> 
> Oh, and I don't trust that nobody were thinking that it would break testpmd
> for non-Intel drivers.

Well, believe it or not, but yes, I missed that one.
I think I already admitted that it was my fault, and apologized for that.
But sure, it is your choice to trust me here or not.
Konstantin

[dpdk-dev] [PATCH] app/test: fix wrong pointer values in crypto perftest

2016-10-28 Thread Arek Kusztal

This commit fixes problem with device hanging because of
wrong pointer values in snow3g performance test

Fixes: 97fe6461c7cb ("app/test: add SNOW 3G performance test")

Signed-off-by: Arek Kusztal 
---
 app/test/test_cryptodev_perf.c | 101 -
 1 file changed, 90 insertions(+), 11 deletions(-)

diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
index 53dd8f5..59a6891 100644
--- a/app/test/test_cryptodev_perf.c
+++ b/app/test/test_cryptodev_perf.c
@@ -2565,6 +2565,8 @@ test_perf_create_aes_sha_session(uint8_t dev_id, enum 
chain_mode chain,
}
 }

+#define SNOW3G_CIPHER_IV_LENGTH 16
+
 static struct rte_cryptodev_sym_session *
 test_perf_create_snow3g_session(uint8_t dev_id, enum chain_mode chain,
enum rte_crypto_cipher_algorithm cipher_algo, unsigned 
cipher_key_len,
@@ -2587,6 +2589,7 @@ test_perf_create_snow3g_session(uint8_t dev_id, enum 
chain_mode chain,
auth_xform.auth.op = RTE_CRYPTO_AUTH_OP_GENERATE;
auth_xform.auth.algo = auth_algo;

+   auth_xform.auth.add_auth_data_length = SNOW3G_CIPHER_IV_LENGTH;
auth_xform.auth.key.data = snow3g_hash_key;
auth_xform.auth.key.length =  get_auth_key_max_length(auth_algo);
auth_xform.auth.digest_length = get_auth_digest_length(auth_algo);
@@ -2686,8 +2689,6 @@ test_perf_create_openssl_session(uint8_t dev_id, enum 
chain_mode chain,
 #define TRIPLE_DES_BLOCK_SIZE 8
 #define TRIPLE_DES_CIPHER_IV_LENGTH 8

-#define SNOW3G_CIPHER_IV_LENGTH 16
-
 static struct rte_mbuf *
 test_perf_create_pktmbuf(struct rte_mempool *mpool, unsigned buf_sz)
 {
@@ -2812,6 +2813,69 @@ test_perf_set_crypto_op_snow3g(struct rte_crypto_op *op, 
struct rte_mbuf *m,
 }

 static inline struct rte_crypto_op *
+test_perf_set_crypto_op_snow3g_cipher(struct rte_crypto_op *op,
+   struct rte_mbuf *m,
+   struct rte_cryptodev_sym_session *sess,
+   unsigned data_len)
+{
+   if (rte_crypto_op_attach_sym_session(op, sess) != 0) {
+   rte_crypto_op_free(op);
+   return NULL;
+   }
+
+   /* Cipher Parameters */
+   op->sym->cipher.iv.data = rte_pktmbuf_mtod(m, uint8_t *);
+   op->sym->cipher.iv.length = SNOW3G_CIPHER_IV_LENGTH;
+   rte_memcpy(op->sym->cipher.iv.data, snow3g_iv, SNOW3G_CIPHER_IV_LENGTH);
+   op->sym->cipher.iv.phys_addr = rte_pktmbuf_mtophys(m);
+
+   op->sym->cipher.data.offset = SNOW3G_CIPHER_IV_LENGTH;
+   op->sym->cipher.data.length = data_len << 3;
+
+   op->sym->m_src = m;
+
+   return op;
+}
+
+
+static inline struct rte_crypto_op *
+test_perf_set_crypto_op_snow3g_hash(struct rte_crypto_op *op,
+   struct rte_mbuf *m,
+   struct rte_cryptodev_sym_session *sess,
+   unsigned data_len,
+   unsigned digest_len)
+{
+   if (rte_crypto_op_attach_sym_session(op, sess) != 0) {
+   rte_crypto_op_free(op);
+   return NULL;
+   }
+
+   /* Authentication Parameters */
+
+   op->sym->auth.digest.data =
+   (uint8_t *)rte_pktmbuf_mtod_offset(m, uint8_t *,
+   data_len);
+   op->sym->auth.digest.phys_addr =
+   rte_pktmbuf_mtophys_offset(m, data_len +
+   SNOW3G_CIPHER_IV_LENGTH);
+   op->sym->auth.digest.length = digest_len;
+   op->sym->auth.aad.data = rte_pktmbuf_mtod(m, uint8_t *);
+   op->sym->auth.aad.length = SNOW3G_CIPHER_IV_LENGTH;
+   rte_memcpy(op->sym->auth.aad.data, snow3g_iv,
+   SNOW3G_CIPHER_IV_LENGTH);
+   op->sym->auth.aad.phys_addr = rte_pktmbuf_mtophys(m);
+
+   /* Data lengths/offsets Parameters */
+   op->sym->auth.data.offset = SNOW3G_CIPHER_IV_LENGTH;
+   op->sym->auth.data.length = data_len << 3;
+
+   op->sym->m_src = m;
+
+   return op;
+}
+
+
+static inline struct rte_crypto_op *
 test_perf_set_crypto_op_3des(struct rte_crypto_op *op, struct rte_mbuf *m,
struct rte_cryptodev_sym_session *sess, unsigned int data_len,
unsigned int digest_len)
@@ -3017,9 +3081,14 @@ test_perf_snow3g(uint8_t dev_id, uint16_t queue_id,

/* Generate a burst of crypto operations */
for (i = 0; i < (pparams->burst_size * NUM_MBUF_SETS); i++) {
+   /*
+* Buffer size + iv/aad len is allocated, for perf tests they
+* are equal + digest len.
+*/
mbufs[i] = test_perf_create_pktmbuf(
ts_params->mbuf_mp,
-   pparams->buf_size);
+   pparams->buf_size + SNOW3G_CIPHER_IV_LENGTH +
+   digest_length);

if (mbufs[i] == NULL) {
printf("\nFailed to get mbuf - freeing the rest.\n");
@@ -3049,12 +3118,22 @@ test_perf_snow3g(uint8_t dev_id, uint16_t
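
A rough sketch of the buffer layout the hunks above appear to assume: the IV/AAD sits at the start of the mbuf data, the payload follows it, and the digest comes last, with data lengths expressed in bits. The struct and function names are hypothetical. Placing the digest at IV length + data length follows the physical-address computation in the patch and is an assumption about the intended layout.

```c
#include <assert.h>
#include <stdint.h>

#define SK_SNOW3G_IV_LEN 16u /* same value as SNOW3G_CIPHER_IV_LENGTH */

/* Hypothetical description of one test buffer. */
struct sk_layout {
	uint32_t total_bytes;         /* size to allocate */
	uint32_t data_offset_bytes;   /* payload starts after the IV/AAD */
	uint32_t data_len_bits;       /* payload length in bits */
	uint32_t digest_offset_bytes; /* digest follows the payload */
};

static inline struct sk_layout
sk_snow3g_layout(uint32_t data_len, uint32_t digest_len)
{
	struct sk_layout l;

	l.total_bytes = SK_SNOW3G_IV_LEN + data_len + digest_len;
	l.data_offset_bytes = SK_SNOW3G_IV_LEN;
	l.data_len_bits = data_len << 3;
	l.digest_offset_bytes = SK_SNOW3G_IV_LEN + data_len;
	return l;
}
```

For a 64-byte payload and a 4-byte digest this asks for 84 bytes, which is why allocating only the payload size left the IV and digest pointers outside the buffer.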

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Thomas Monjalon

2016-10-28 10:15, Ananyev, Konstantin:
> > From: Ananyev, Konstantin
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > --- a/config/common_base
> > > > > > > +++ b/config/common_base
> > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > >
> > > > > > We cannot enable it until it is implemented in every drivers.
> > > > >
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation by
> > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and without)
> > > code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> > >
> > 
> > I had sent txprep engine in v2 
> > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm open to 
> > suggestions. If you like it I can resend
> > it in place of csumonly modification.
> 
> I am still not sure it is worth having another version of csum...
> Can we introduce a new global variable in testpmd and a new command:
> testpmd> csum tx_prep
> or so? 
> Looking at current testpmd patch, I suppose the changes will be minimal.
> What do you think?

No please no!
The problem is not in testpmd.
The problem is in every application.
Should we prepare the checksums or let tx_prep do it?
The result will depend on the driver used.
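
To make the concern concrete, here is a minimal sketch with mock types (none of these names are DPDK API). It assumes the behaviour described in the thread: when the driver provides no prepare callback, the wrapper is a no-op that reports every packet as accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock packet: only records whether checksum preparation was done. */
struct mock_pkt {
	int csum_ready;
};

typedef uint16_t (*mock_tx_prep_t)(struct mock_pkt **pkts, uint16_t nb);

/* Mock device: the prepare callback is optional. */
struct mock_dev {
	mock_tx_prep_t tx_pkt_prep;
};

/* A driver-side implementation that really prepares the packets. */
static uint16_t
mock_prep_impl(struct mock_pkt **pkts, uint16_t nb)
{
	uint16_t i;

	for (i = 0; i < nb; i++)
		pkts[i]->csum_ready = 1;
	return nb;
}

/* Wrapper: acts as a no-op when the driver has no callback. */
static uint16_t
mock_tx_prep(const struct mock_dev *dev, struct mock_pkt **pkts, uint16_t nb)
{
	if (dev->tx_pkt_prep == NULL)
		return nb;
	return dev->tx_pkt_prep(pkts, nb);
}
```

Both devices return the same count, so the application cannot tell from the return value whether it still has to prepare the checksums itself.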

[dpdk-dev] [PATCH v3 2/2] net/i40e: fix VF bonded device link down

2016-10-28 Thread Qiming Yang

If a VF device is used as a slave of a bond device, it is polled
periodically through an alarm, which involves an interrupt. The
VF then sends an I40E_VIRTCHNL_OP_GET_LINK_STAT message to the
PF to query the status, and the response is handled by an interrupt
callback, so an interrupt is involved again. That is why the bond
device cannot be brought up.

This patch removes the I40E_VIRTCHNL_OP_GET_LINK_STAT
message. The link status in the VF driver is updated when the PF
driver notifies it, and the VF stores this link status locally. The
VF driver just returns the local status when asked for it.
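
The push model described above can be sketched as follows. This is an illustration with hypothetical names (sk_vf, sk_vf_on_link_event, sk_vf_link_get), not the code of the patch: the PF notification updates a cached copy, and a link query only reads that copy without sending any message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cached link state held by the VF driver. */
struct sk_link {
	int up;
	uint32_t speed_mbps;
};

struct sk_vf {
	struct sk_link link;
};

/* Called when a link-change event arrives from the PF. */
static inline void
sk_vf_on_link_event(struct sk_vf *vf, int up, uint32_t speed_mbps)
{
	vf->link.up = up;
	vf->link.speed_mbps = speed_mbps;
}

/* Link query: returns the cached state, no request/response round trip. */
static inline struct sk_link
sk_vf_link_get(const struct sk_vf *vf)
{
	return vf->link;
}
```

Because the query never waits for a response, it no longer depends on the interrupt callback running while the caller is itself in interrupt context.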

Fixes: 4861cde46116 ("i40e: new poll mode driver")

Signed-off-by: Qiming Yang 
---
Change in v3:
* resolved the conflict with other changes, rework based
on version: 16.11-rc2
---
---
 drivers/net/i40e/i40e_ethdev.c| 22 ++-
 drivers/net/i40e/i40e_ethdev.h|  4 +-
 drivers/net/i40e/i40e_ethdev_vf.c | 81 +--
 drivers/net/i40e/i40e_pf.c| 33 
 drivers/net/i40e/i40e_pf.h|  3 +-
 5 files changed, 67 insertions(+), 76 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 078c581..99183b1 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5441,6 +5441,24 @@ i40e_dev_handle_vfr_event(struct rte_eth_dev *dev)
 }

 static void
+i40e_notify_all_vfs_link_status(struct rte_eth_dev *dev)
+{
+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+   struct i40e_virtchnl_pf_event event;
+   int i;
+
+   event.event = I40E_VIRTCHNL_EVENT_LINK_CHANGE;
+   event.event_data.link_event.link_status =
+   dev->data->dev_link.link_status;
+   event.event_data.link_event.link_speed =
+   dev->data->dev_link.link_speed;
+
+   for (i = 0; i < pf->vf_num; i++)
+   i40e_pf_host_send_msg_to_vf(&pf->vfs[i], I40E_VIRTCHNL_OP_EVENT,
+   I40E_SUCCESS, (uint8_t *)&event, sizeof(event));
+}
+
+static void
 i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
 {
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -5478,9 +5496,11 @@ i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
break;
case i40e_aqc_opc_get_link_status:
ret = i40e_dev_link_update(dev, 0);
-   if (!ret)
+   if (!ret) {
+   i40e_notify_all_vfs_link_status(dev);
_rte_eth_dev_callback_process(dev,
RTE_ETH_EVENT_INTR_LSC, NULL);
+   }
break;
default:
PMD_DRV_LOG(ERR, "Request %u is not supported yet",
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 24b8580..298cef4 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -609,7 +609,9 @@ int i40e_hash_filter_inset_select(struct i40e_hw *hw,
 struct rte_eth_input_set_conf *conf);
 int i40e_fdir_filter_inset_select(struct i40e_pf *pf,
 struct rte_eth_input_set_conf *conf);
-
+int i40e_pf_host_send_msg_to_vf(struct i40e_pf_vf *vf, uint32_t opcode,
+   uint32_t retval, uint8_t *msg,
+   uint16_t msglen);
 void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
struct rte_eth_rxq_info *qinfo);
 void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 4b835cb..aa306d6 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -126,8 +126,6 @@ static void i40evf_dev_promiscuous_enable(struct 
rte_eth_dev *dev);
 static void i40evf_dev_promiscuous_disable(struct rte_eth_dev *dev);
 static void i40evf_dev_allmulticast_enable(struct rte_eth_dev *dev);
 static void i40evf_dev_allmulticast_disable(struct rte_eth_dev *dev);
-static int i40evf_get_link_status(struct rte_eth_dev *dev,
- struct rte_eth_link *link);
 static int i40evf_init_vlan(struct rte_eth_dev *dev);
 static int i40evf_dev_rx_queue_start(struct rte_eth_dev *dev,
 uint16_t rx_queue_id);
@@ -1084,31 +1082,6 @@ i40evf_del_vlan(struct rte_eth_dev *dev, uint16_t vlanid)
return err;
 }

-static int
-i40evf_get_link_status(struct rte_eth_dev *dev, struct rte_eth_link *link)
-{
-   struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
-   int err;
-   struct vf_cmd_info args;
-   struct rte_eth_link *new_link;
-
-   args.ops = (enum i40e_virtchnl_ops)I40E_VIRTCHNL_OP_GET_LINK_STAT;
-   args.in_args = NULL;
-   args.in_args_size = 0;
-   args.out_buffer = vf->aq_resp;
-   args.out_size = I40E_AQ_BUF_SZ;
-   err =

[dpdk-dev] [PATCH v3 1/2] net/i40e: fix link status change interrupt

2016-10-28 Thread Qiming Yang

Previously, link status interrupt in i40e is achieved by checking
LINK_STAT_CHANGE_MASK in PFINT_ICR0 register which is provided only
for diagnostic use. Instead, drivers need to get the link status
change notification by using LSE (Link Status Event).

This patch enables LSE and calls LSC callback when the event is
received. This patch also removes the processing on
LINK_STAT_CHANGE_MASK.
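
A small sketch of the mask arithmetic used when enabling LSE, assuming (as the complement in the patch suggests) that a set bit in the PHY interrupt mask suppresses the corresponding event. The bit positions and the sk_* names below are hypothetical and chosen for illustration only; the real values are defined in the i40e base driver headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event bits, for illustration only. */
#define SK_EVENT_LINK_UPDOWN      (1u << 1)
#define SK_EVENT_MEDIA_NA         (1u << 2)
#define SK_EVENT_MODULE_QUAL_FAIL (1u << 8)

/* Build a mask that lets only the wanted events through. */
static inline uint16_t
sk_phy_int_mask(uint16_t wanted_events)
{
	return (uint16_t)~wanted_events;
}

/* An event is reported when its bit is clear in the mask. */
static inline int
sk_event_reported(uint16_t mask, uint16_t event)
{
	return (mask & event) == 0;
}
```

With this negative logic, passing the complement of the three wanted events masks everything else.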

Fixes: 4861cde46116 ("i40e: new poll mode driver")

Signed-off-by: Qiming Yang 
---
Change in v3:
* resolved the conflict with other changes, rework based
on version: 16.11-rc2
---
---
 drivers/net/i40e/i40e_ethdev.c | 96 +-
 1 file changed, 19 insertions(+), 77 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bb81b15..078c581 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -108,7 +108,6 @@
I40E_PFINT_ICR0_ENA_GRST_MASK | \
I40E_PFINT_ICR0_ENA_PCI_EXCEPTION_MASK | \
I40E_PFINT_ICR0_ENA_STORM_DETECT_MASK | \
-   I40E_PFINT_ICR0_ENA_LINK_STAT_CHANGE_MASK | \
I40E_PFINT_ICR0_ENA_HMC_ERR_MASK | \
I40E_PFINT_ICR0_ENA_PE_CRITERR_MASK | \
I40E_PFINT_ICR0_ENA_VFLR_MASK | \
@@ -1777,6 +1776,16 @@ i40e_dev_start(struct rte_eth_dev *dev)
if (dev->data->dev_conf.intr_conf.lsc != 0)
PMD_INIT_LOG(INFO, "lsc won't enable because of"
 " no intr multiplex\n");
+   } else if (dev->data->dev_conf.intr_conf.lsc != 0) {
+   ret = i40e_aq_set_phy_int_mask(hw,
+  ~(I40E_AQ_EVENT_LINK_UPDOWN |
+  I40E_AQ_EVENT_MODULE_QUAL_FAIL |
+  I40E_AQ_EVENT_MEDIA_NA), NULL);
+   if (ret != I40E_SUCCESS)
+   PMD_DRV_LOG(WARNING, "Fail to set phy mask");
+
+   /* Call get_link_info aq command to enable LSE */
+   i40e_dev_link_update(dev, 0);
}

/* enable uio intr after callback register */
@@ -1995,6 +2004,7 @@ i40e_dev_link_update(struct rte_eth_dev *dev,
struct rte_eth_link link, old;
int status;
unsigned rep_cnt = MAX_REPEAT_TIME;
+   bool enable_lse = dev->data->dev_conf.intr_conf.lsc ? true : false;

memset(&link, 0, sizeof(link));
memset(&old, 0, sizeof(old));
@@ -2003,7 +2013,8 @@ i40e_dev_link_update(struct rte_eth_dev *dev,

do {
/* Get link status information from hardware */
-   status = i40e_aq_get_link_info(hw, false, &link_status, NULL);
+   status = i40e_aq_get_link_info(hw, enable_lse,
+   &link_status, NULL);
if (status != I40E_SUCCESS) {
link.link_speed = ETH_SPEED_NUM_100M;
link.link_duplex = ETH_LINK_FULL_DUPLEX;
@@ -5465,6 +5476,12 @@ i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
info.msg_buf,
info.msg_len);
break;
+   case i40e_aqc_opc_get_link_status:
+   ret = i40e_dev_link_update(dev, 0);
+   if (!ret)
+   _rte_eth_dev_callback_process(dev,
+   RTE_ETH_EVENT_INTR_LSC, NULL);
+   break;
default:
PMD_DRV_LOG(ERR, "Request %u is not supported yet",
opcode);
@@ -5474,57 +5491,6 @@ i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
rte_free(info.msg_buf);
 }

-/*
- * Interrupt handler is registered as the alarm callback for handling LSC
- * interrupt in a definite of time, in order to wait the NIC into a stable
- * state. Currently it waits 1 sec in i40e for the link up interrupt, and
- * no need for link down interrupt.
- */
-static void
-i40e_dev_interrupt_delayed_handler(void *param)
-{
-   struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
-   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   uint32_t icr0;
-
-   /* read interrupt causes again */
-   icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);
-
-#ifdef RTE_LIBRTE_I40E_DEBUG_DRIVER
-   if (icr0 & I40E_PFINT_ICR0_ECC_ERR_MASK)
-   PMD_DRV_LOG(ERR, "ICR0: unrecoverable ECC error\n");
-   if (icr0 & I40E_PFINT_ICR0_MAL_DETECT_MASK)
-   PMD_DRV_LOG(ERR, "ICR0: malicious programming detected\n");
-   if (icr0 & I40E_PFINT_ICR0_GRST_MASK)
-   PMD_DRV_LOG(INFO, "ICR0: global reset requested\n");
-   if (icr0 & I40E_PFINT_ICR0_PCI_EXCEPTION_MASK)
-   PMD_DRV_LOG(INFO, "ICR0: PCI exception\n activated\n");
-   if (icr0 & I40E_PFINT_ICR0_STORM_DETECT_MASK)
-

[dpdk-dev] [PATCH v2 1/1] mempool: Add sanity check when secondary link in less mempools than primary

2016-10-28 Thread Jean Tourrilhes

If the mempool ops the caller wants to use are not registered, the
library will segfault in an obscure way when trying to use that
mempool. It's better to catch it early and warn the user.

If the primary and secondary processes were built using different build
systems, the list of constructors included by the linker in each
binary might be different. Mempools are registered via constructors, so
the linker magic will directly impact which tailqs are registered with
the primary and the secondary.
DPDK currently assumes that the secondary has a superset of the
mempools registered at the primary, and that they are in the same order
(same index in primary and secondary). In some build scenarios, the
secondary might not initialise any mempools at all.

This would also catch cases where there is a bug in the mempool
registration, or some memory corruption, but this has not been
observed.
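
The check described above amounts to validating an index against the number of registered entries before dereferencing it. A minimal sketch with hypothetical types (sk_ops_table and sk_ops_lookup are not DPDK names): unlike the patch, which panics, this version returns NULL and also rejects negative indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_OPS 4

struct sk_ops {
	const char *name;
};

/* Hypothetical per-process table filled by constructors. */
struct sk_ops_table {
	uint32_t num_ops;
	struct sk_ops ops[SK_MAX_OPS];
};

/* Return the ops for idx, or NULL if this process never registered it. */
static inline const struct sk_ops *
sk_ops_lookup(const struct sk_ops_table *t, int32_t idx)
{
	if (idx < 0 || (uint32_t)idx >= t->num_ops)
		return NULL;
	return &t->ops[idx];
}
```

As the code comment in the patch notes, a bounds check like this cannot detect the case where both processes registered the same number of ops in a different order.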

Signed-off-by: Jean Tourrilhes 
---
 lib/librte_mempool/rte_mempool.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2e28e2e..82260cc 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -1275,6 +1275,25 @@ rte_mempool_lookup(const char *name)
return NULL;
}

+   /* Sanity check : secondary may have initialised less mempools
+* than primary due to linker and constructor magic. Or maybe
+* there is a mempool corruption or bug. In any case, we can't
+* go on, we will segfault in an obscure way.
+* This does not detect the case where the constructor order
+* is different between primary and secondary and where the
+* index points to the wrong ops. This would require more
+* extensive changes, and is much less likely.
+* Jean II */
+   if(mp->ops_index >= (int32_t) rte_mempool_ops_table.num_ops) {
+   unsigned i;
+   /* Dump list of mempool ops for further investigation. */
+   for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
+   RTE_LOG(ERR, EAL, "Registered mempool[%d] is %s\n", i, 
rte_mempool_ops_table.ops[i].name);
+   }
+   /* Do not dump mempool list itself, it will segfault. */
+   rte_panic("Cannot find ops for mempool, ops_index %d, num_ops 
%d - maybe due to build process or linker configuration\n", mp->ops_index, 
rte_mempool_ops_table.num_ops);
+   }
+
return mp;
 }

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Ananyev, Konstantin



> 
> Hi Thomas,
> 
> >
> > 2016-10-27 16:24, Ananyev, Konstantin:
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > Hi Tomasz,
> > > > > >
> > > > > > This is a major new function in the API and I still have some 
> > > > > > comments.
> > > > > >
> > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > --- a/config/common_base
> > > > > > > +++ b/config/common_base
> > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > >
> > > > > > We cannot enable it until it is implemented in every drivers.
> > > > >
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation by 
> > > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel 
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and without) 
> > > code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> >
> > Please think how we can use it in every applications.
> > It is not ready.
> > Either we introduce the API without enabling it, or we implement it
> > in every drivers.
> 
> I understand your position here, but just like to point that:
> 1) It is a new functionality optional to use.
>  The app is free not to use that functionality and still do the 
> preparation itself
>  (as it has to do it now).
> All existing apps would keep working as expected without using that 
> function.
> Though if the app developer knows that for all HW models he plans to run 
> on
> tx_prep is implemented - he is free to use it.
> 2) It would be difficult for Tomasz (and other Intel guys) to implement 
> tx_prep()
>  for all non-Intel HW that DPDK supports right now.
>  We just don't have all the actual HW in stock and probably adequate 
> knowledge of it.
> So we depend here on the good will of other PMD maintainers/developers to 
> implement
> tx_prep() for these devices.
> From other side, if it will be disabled by default, then, I think,
> PMD developers just wouldn't be motivated to implement it.
> So it will be left untested and unused forever.

Actually as another thought:
Can we have it enabled by default, but mark it as experimental or so?
If memory serves me right, we've done that for cryptodev in the past, no?
Konstantin

> 
> >
> > > > > > >  struct rte_eth_dev {
> > > > > > >   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive 
> > > > > > > function. */
> > > > > > >   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit 
> > > > > > > function. */
> > > > > > > + eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare 
> > > > > > > function. */
> > > > > > >   struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > > >   const struct eth_driver *driver;/**< Driver for this device */
> > > > > > >   const struct eth_dev_ops *dev_ops; /**< Functions exported by 
> > > > > > > PMD */
> > > > > >
> > > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > > I guess we want to have several implementations?
> > > > >
> > > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > > >
> > > > > >
> > > > > > Shouldn't we have a const struct control_dev_ops and a struct 
> > > > > > datapath_dev_ops?
> > > > >
> > > > > That's probably a good idea, but I suppose it is out of scope for 
> > > > > that patch.
> > > >
> > > > No it's not out of scope.
> > > > It answers to the question "why is it added in this structure and not 
> > > > dev_ops".
> > > > We won't do this change when nothing else is changed in the struct.
> > >
> > > Not sure I understood you here:
> > > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced 
> > > as part of that patch?
> > > But that's a lot of  changes all over rte_ethdev.[h,c].
> > > It is definitely worth a separate patch (might be some discussion) for me.
> >
> > Yes it could be a separate patch in the same patchset.
> 
> Honestly, I think it is a good idea, but it is too late and too risky to do 
> such change right now.
> We are on RC2 right now, just few days before RC3...
> Can't that wait till 17.02?
> From my understanding - it is pure code restructuring, without any 
> functionality affected.
> Konstantin
>

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Ananyev, Konstantin

Hi Thomas,

> 
> 2016-10-27 16:24, Ananyev, Konstantin:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > Hi Tomasz,
> > > > >
> > > > > This is a major new function in the API and I still have some 
> > > > > comments.
> > > > >
> > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > >
> > > > > We cannot enable it until it is implemented in every drivers.
> > > >
> > > > Not sure why?
> > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > Right now it is not mandatory for the PMD to implement it.
> > >
> > > If it is not implemented, the application must do the preparation by 
> > > itself.
> > > From patch 6:
> > > "
> > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > application and used Tx preparation API for packet preparation and
> > > verification.
> > > "
> > > So how does it behave with other drivers?
> >
> > Hmm so it seems that we broke testpmd csumonly mode for non-intel drivers..
> > My bad, missed that part completely.
> > Yes, then I suppose for now we'll need to support both (with and without) 
> > code paths for testpmd.
> > Probably a new fwd mode or just extra parameter for the existing one?
> > Any other suggestions?
> 
> Please think how we can use it in every applications.
> It is not ready.
> Either we introduce the API without enabling it, or we implement it
> in every drivers.

I understand your position here, but I'd just like to point out that:
1) It is new functionality that is optional to use.
 The app is free not to use it and still do the preparation itself
 (as it has to do now).
All existing apps would keep working as expected without using that
function.
Though if the app developer knows that tx_prep is implemented for all HW
models he plans to run on, he is free to use it.
2) It would be difficult for Tomasz (and other Intel guys) to implement
tx_prep()
 for all non-Intel HW that DPDK supports right now.
 We just don't have all the actual HW in stock, and probably lack adequate
knowledge of it.
So we depend here on the good will of other PMD maintainers/developers to
implement
tx_prep() for these devices.
On the other hand, if it is disabled by default then, I think,
PMD developers just wouldn't be motivated to implement it.
So it will be left untested and unused forever.
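
The optional semantics described above can be sketched in plain C. This is
only an illustration, not the ethdev code: struct mock_dev, mock_tx_prep()
and struct pkt are hypothetical stand-ins, and the one behaviour taken from
the thread is that a NULL tx_pkt_prep makes the call a no-op that accepts
every packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; only what the sketch needs. */
struct pkt {
	int prepared; /* set to 1 once a driver callback has touched it */
};

/* Prepare callback type: returns how many packets were accepted. */
typedef uint16_t (*tx_prep_fn)(struct pkt **pkts, uint16_t nb_pkts);

/* Hypothetical stand-in for the per-device structure. */
struct mock_dev {
	tx_prep_fn tx_pkt_prep; /* NULL when the PMD does not implement it */
};

/*
 * Wrapper in the spirit of the proposed rte_eth_tx_prep(): with no driver
 * callback it does nothing and reports every packet as ready.
 */
static uint16_t
mock_tx_prep(const struct mock_dev *dev, struct pkt **pkts, uint16_t nb_pkts)
{
	if (dev->tx_pkt_prep == NULL)
		return nb_pkts;
	return dev->tx_pkt_prep(pkts, nb_pkts);
}

/* Example driver callback: marks each packet as prepared. */
static uint16_t
example_prep(struct pkt **pkts, uint16_t nb_pkts)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++)
		pkts[i]->prepared = 1;
	return nb_pkts;
}
```

With a NULL callback the packets come back untouched, which is why an
application that relies on the call alone still has to do its own
preparation on such a driver.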

> 
> > > > > >  struct rte_eth_dev {
> > > > > > eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive 
> > > > > > function. */
> > > > > > eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit 
> > > > > > function. */
> > > > > > +   eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare 
> > > > > > function. */
> > > > > > struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > > const struct eth_driver *driver;/**< Driver for this device */
> > > > > > const struct eth_dev_ops *dev_ops; /**< Functions exported by 
> > > > > > PMD */
> > > > >
> > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > I guess we want to have several implementations?
> > > >
> > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > >
> > > > >
> > > > > Shouldn't we have a const struct control_dev_ops and a struct 
> > > > > datapath_dev_ops?
> > > >
> > > > That's probably a good idea, but I suppose it is out of scope for that 
> > > > patch.
> > >
> > > No it's not out of scope.
> > > It answers to the question "why is it added in this structure and not 
> > > dev_ops".
> > > We won't do this change when nothing else is changed in the struct.
> >
> > Not sure I understood you here:
> > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced 
> > as part of that patch?
> > But that's a lot of  changes all over rte_ethdev.[h,c].
> > It definitely worse a separate patch (might be some discussion) for me.
> 
> Yes it could be a separate patch in the same patchset.

Honestly, I think it is a good idea, but it is too late and too risky to make
such a change right now.
We are on RC2 right now, just a few days before RC3...
Can't that wait till 17.02?
From my understanding, it is pure code restructuring, without any
functionality affected.
Konstantin

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Richardson, Bruce



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, October 28, 2016 11:29 AM
> To: Thomas Monjalon ; Kulasek, TomaszX
> 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> 
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Friday, October 28, 2016 11:22 AM
> > To: Ananyev, Konstantin ; Kulasek,
> > TomaszX 
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> >
> > 2016-10-28 10:15, Ananyev, Konstantin:
> > > > From: Ananyev, Konstantin
> > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > > --- a/config/common_base
> > > > > > > > > +++ b/config/common_base
> > > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > > >
> > > > > > > > We cannot enable it until it is implemented in every
> drivers.
> > > > > > >
> > > > > > > Not sure why?
> > > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act
> as noop.
> > > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > > >
> > > > > > If it is not implemented, the application must do the
> > > > > > preparation by
> > > > > itself.
> > > > > > From patch 6:
> > > > > > "
> > > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > > application and used Tx preparation API for packet preparation
> > > > > > and verification.
> > > > > > "
> > > > > > So how does it behave with other drivers?
> > > > >
> > > > > Hmm so it seems that we broke testpmd csumonly mode for
> > > > > non-intel drivers..
> > > > > My bad, missed that part completely.
> > > > > Yes, then I suppose for now we'll need to support both (with and
> > > > > without) code paths for testpmd.
> > > > > Probably a new fwd mode or just extra parameter for the existing
> one?
> > > > > Any other suggestions?
> > > > >
> > > >
> > > > I had sent txprep engine in v2
> > > > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on
> > > > the suggestions. If you like it I can
> > resent
> > > > it in place of csumonly modification.
> > >
> > > I still not sure it is worth to have another version of csum...
> > > Can we introduce a new global variable in testpmd and a new command:
> > > testpmd> csum tx_prep
> > > or so?
> > > Looking at current testpmd patch, I suppose the changes will be
> minimal.
> > > What do you think?
> >
> > No please no!
> > The problem is not in testpmd.
> > The problem is in every applications.
> > Should we prepare the checksums or let tx_prep do it?
> 
> Not sure, I understood you...
> Right now we don't' change other apps.
> They would work as before.
> If people would like to start to use tx_prep in their apps - they are free
> to do that.
> If they like to keep doing that manually - that's fine too.
> From other side we need an ability to test (and demonstrate) that new
> functionality.
> So we do need changes in testpmd.
> Konstantin
> 

Just my 2c on this:
* given this is new functionality, and no apps are currently using it, I'm not 
sure I see the harm in having the function available by default. We just need 
to be clear about the limits of the function and the fact that apps need to do 
work themselves if the driver doesn't provide the function.
* having it enabled will then allow any apps that want to use it to do so.
* however, for our sample apps, and by default in testpmd, we *shouldn't* use 
this functionality, in the absence of any fallback, so that is where I would 
look to have the enable/disable switch, not in the library.
* going forward, I think a SW fallback inside the ethdev API itself would be a 
good addition to make this fully generic.

Hope this helps, [and also that I haven't missed some subtlety in the 
discussion!]

/Bruce
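
A software fallback of the kind suggested in the last point could look
roughly like the sketch below. All names here (struct mock_dev, sw_tx_prep(),
tx_prep_with_fallback()) are hypothetical, and the "preparation" is reduced
to setting a marker; it only shows the dispatch, not any real checksum work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet: records who prepared it (0 none, 1 driver, 2 SW). */
struct pkt {
	int prepared_by;
};

typedef uint16_t (*tx_prep_fn)(struct pkt **pkts, uint16_t nb_pkts);

/* Hypothetical device: the callback is NULL if the PMD has none. */
struct mock_dev {
	tx_prep_fn tx_pkt_prep;
};

/* Generic software path, used when the driver offers nothing. */
static uint16_t
sw_tx_prep(struct pkt **pkts, uint16_t nb_pkts)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++)
		pkts[i]->prepared_by = 2;
	return nb_pkts;
}

/* Example driver-specific path. */
static uint16_t
drv_tx_prep(struct pkt **pkts, uint16_t nb_pkts)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++)
		pkts[i]->prepared_by = 1;
	return nb_pkts;
}

/* Use the driver callback when present, otherwise the software path. */
static uint16_t
tx_prep_with_fallback(const struct mock_dev *dev, struct pkt **pkts,
		uint16_t nb_pkts)
{
	tx_prep_fn fn = dev->tx_pkt_prep != NULL ? dev->tx_pkt_prep : sw_tx_prep;

	return fn(pkts, nb_pkts);
}
```

With something like this inside the API, an application would get prepared
packets on every driver, and the enable/disable switch in the sample apps
would no longer be needed.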

[dpdk-dev] [PATCH] net/bonding: not handle vlan slow packet

2016-10-28 Thread linhaifeng

If Rx VLAN offload is enabled, VLAN-tagged slow packets should not be
handled either.

Signed-off-by: Haifeng Lin  
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 09ce7bf..7765017 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -169,10 +169,11 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf 
**bufs,
/* Remove packet from array if it is slow packet or slave is not
 * in collecting state or bondign interface is not in promiscus
 * mode and packet address does not match. */
-   if (unlikely(hdr->ether_type == ether_type_slow_be ||
+   if (unlikely(!bufs[j]->vlan_tci &&
+(hdr->ether_type == ether_type_slow_be ||
!collecting || (!promisc &&
!is_multicast_ether_addr(&hdr->d_addr) &&
-   !is_same_ether_addr(&bond_mac, &hdr->d_addr)))) {
+   !is_same_ether_addr(&bond_mac, &hdr->d_addr))))) {

if (hdr->ether_type == ether_type_slow_be) {
bond_mode_8023ad_handle_slow_pkt(internals, slaves[i],
--
1.8.3.1
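
The condition changed by this patch can be read as a pure predicate. The
sketch below is an illustration only: struct rx_pkt_info and should_filter()
are hypothetical names, and the boolean fields stand in for the checks that
bond_ethdev_rx_burst_8023ad() makes on the mbuf and the Ethernet header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the Rx path knows about one packet. */
struct rx_pkt_info {
	uint16_t vlan_tci;    /* taken here as non-zero for VLAN-tagged packets */
	bool is_slow;         /* ether_type is the slow-protocols type */
	bool is_multicast;    /* destination is a multicast address */
	bool dst_is_bond_mac; /* destination equals the bonded device MAC */
};

/*
 * Returns true when the packet is removed from the received array.
 * Mirrors the patched condition: nothing is filtered when vlan_tci is set.
 */
static bool
should_filter(const struct rx_pkt_info *p, bool collecting, bool promisc)
{
	if (p->vlan_tci != 0)
		return false;
	return p->is_slow || !collecting ||
		(!promisc && !p->is_multicast && !p->dst_is_bond_mac);
}
```

Written this way, it is visible that a tagged packet bypasses the collecting
and address checks as well, not only the slow-packet check.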

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Ananyev, Konstantin



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, October 28, 2016 11:22 AM
> To: Ananyev, Konstantin ; Kulasek, TomaszX 
> 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> 2016-10-28 10:15, Ananyev, Konstantin:
> > > From: Ananyev, Konstantin
> > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > --- a/config/common_base
> > > > > > > > +++ b/config/common_base
> > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > >
> > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > >
> > > > > > Not sure why?
> > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as 
> > > > > > noop.
> > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > >
> > > > > If it is not implemented, the application must do the preparation by
> > > > itself.
> > > > > From patch 6:
> > > > > "
> > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > application and used Tx preparation API for packet preparation and
> > > > > verification.
> > > > > "
> > > > > So how does it behave with other drivers?
> > > >
> > > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > > drivers..
> > > > My bad, missed that part completely.
> > > > Yes, then I suppose for now we'll need to support both (with and 
> > > > without)
> > > > code paths for testpmd.
> > > > Probably a new fwd mode or just extra parameter for the existing one?
> > > > Any other suggestions?
> > > >
> > >
> > > I had sent txprep engine in v2 
> > > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the 
> > > suggestions. If you like it I can
> resent
> > > it in place of csumonly modification.
> >
> > I still not sure it is worth to have another version of csum...
> > Can we introduce a new global variable in testpmd and a new command:
> > testpmd> csum tx_prep
> > or so?
> > Looking at current testpmd patch, I suppose the changes will be minimal.
> > What do you think?
> 
> No please no!
> The problem is not in testpmd.
> The problem is in every applications.
> Should we prepare the checksums or let tx_prep do it?

Not sure I understood you...
Right now we don't change other apps.
They would work as before.
If people would like to start using tx_prep in their apps,
they are free to do that.
If they prefer to keep doing that manually, that's fine too.
On the other hand, we need the ability to test (and demonstrate) that new
functionality.
So we do need changes in testpmd.
Konstantin



> The result will depend of the driver used.

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Kulasek, TomaszX

Hi Konstantin,

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Friday, October 28, 2016 12:16
> To: Kulasek, TomaszX ; Thomas Monjalon
> 
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> >
> > Hi
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, October 27, 2016 18:24
> > > To: Thomas Monjalon 
> > > Cc: Kulasek, TomaszX ; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > Sent: Thursday, October 27, 2016 5:02 PM
> > > > To: Ananyev, Konstantin 
> > > > Cc: Kulasek, TomaszX ; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> > > >
> > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > >
> > > > > >
> > > > > > Hi Tomasz,
> > > > > >
> > > > > > This is a major new function in the API and I still have some
> > > comments.
> > > > > >
> > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > --- a/config/common_base
> > > > > > > +++ b/config/common_base
> > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > >
> > > > > > We cannot enable it until it is implemented in every drivers.
> > > > >
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as
> noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation
> > > > by
> > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and
> > > without) code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> > >
> >
> > I had sent txprep engine in v2
> > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the
> suggestions. If you like it I can resent it in place of csumonly
> modification.
> 
> I still not sure it is worth to have another version of csum...
> Can we introduce a new global variable in testpmd and a new command:
> testpmd> csum tx_prep
> or so?
> Looking at current testpmd patch, I suppose the changes will be minimal.
> What do you think?
> Konstantin
> 

This is not a problem.

Tomasz

[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Ananyev, Konstantin

Hi Tomasz,

> 
> Hi
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Thursday, October 27, 2016 18:24
> > To: Thomas Monjalon 
> > Cc: Kulasek, TomaszX ; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> >
> >
> >
> > > -Original Message-
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Sent: Thursday, October 27, 2016 5:02 PM
> > > To: Ananyev, Konstantin 
> > > Cc: Kulasek, TomaszX ; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> > >
> > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > >
> > > > >
> > > > > Hi Tomasz,
> > > > >
> > > > > This is a major new function in the API and I still have some
> > comments.
> > > > >
> > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > >
> > > > > We cannot enable it until it is implemented in every drivers.
> > > >
> > > > Not sure why?
> > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > Right now it is not mandatory for the PMD to implement it.
> > >
> > > If it is not implemented, the application must do the preparation by
> > itself.
> > > From patch 6:
> > > "
> > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > application and used Tx preparation API for packet preparation and
> > > verification.
> > > "
> > > So how does it behave with other drivers?
> >
> > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > drivers..
> > My bad, missed that part completely.
> > Yes, then I suppose for now we'll need to support both (with and without)
> > code paths for testpmd.
> > Probably a new fwd mode or just extra parameter for the existing one?
> > Any other suggestions?
> >
> 
> I had sent txprep engine in v2 (http://dpdk.org/dev/patchwork/patch/15775/), 
> but I'm opened on the suggestions. If you like it I can resent
> it in place of csumonly modification.

I am still not sure it is worth having another version of csum...
Can we introduce a new global variable in testpmd and a new command:
testpmd> csum tx_prep
or so?
Looking at the current testpmd patch, I suppose the changes will be minimal.
What do you think?
Konstantin 

> 
> Tomasz
> 
> > >
> > > > > >  struct rte_eth_dev {
> > > > > > eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive
> > function. */
> > > > > > eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
> > > > > > function. */
> > > > > > +   eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit
> > > > > > +prepare function. */
> > > > > > struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > > const struct eth_driver *driver;/**< Driver for this device */
> > > > > > const struct eth_dev_ops *dev_ops; /**< Functions exported by
> > > > > > PMD */
> > > > >
> > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > I guess we want to have several implementations?
> > > >
> > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > >
> > > > >
> > > > > Shouldn't we have a const struct control_dev_ops and a struct
> > datapath_dev_ops?
> > > >
> > > > That's probably a good idea, but I suppose it is out of scope for that
> > patch.
> > >
> > > No it's not out of scope.
> > > It answers to the question "why is it added in this structure and not
> > dev_ops".
> > > We won't do this change when nothing else is changed in the struct.
> >
> > Not sure I understood you here:
> > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced
> > as part of that patch?
> > But that's a lot of  changes all over rte_ethdev.[h,c].
> > It definitely worse a separate patch (might be some discussion) for me.
> > Konstantin
> >
> >

[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-28 Thread Maxime Coquelin



On 10/28/2016 09:32 AM, Pierre Pfister (ppfister) wrote:
>
>> On 27 Oct 2016, at 12:19, Wang, Zhihong wrote:
>>
>>
>>
>>> -Original Message-
>>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>>> Sent: Thursday, October 27, 2016 5:55 PM
>>> To: Wang, Zhihong ; Yuanhan Liu
>>> ; stephen at networkplumber.org; Pierre
>>> Pfister (ppfister) 
>>> Cc: Xie, Huawei ; dev at dpdk.org;
>>> vkaplans at redhat.com; mst at redhat.com
>>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
>>> to the TX path
>>>
>>>
>>>
>>> On 10/27/2016 11:10 AM, Maxime Coquelin wrote:
 Hi Zhihong,

 On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
> Hi Maxime,
>
> Seems indirect desc feature is causing serious performance
> degradation on Haswell platform, about 20% drop for both
> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
> both iofwd and macfwd.
 I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge
 platform, and didn't faced such a drop.
 Have you tried to pass indirect_desc=off to qemu cmdline to see if you
 recover the performance?

 Yuanhan, which platform did you use when you tested it with zero copy?

>
> I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
>
> Could you please verify if this is true in your test?
 I'll try -rc1/-rc2 on my platform, and let you know.
>>> As a first test, I tried again Txonly from the guest to the host (Rxonly),
>>> where Tx indirect descriptors are used, on my E5-2665 @2.40GHz:
>>> v16.11-rc1: 10.81Mpps
>>> v16.11-rc2: 10.91Mpps
>>>
>>> -rc2 is even slightly better in my case.
>>> Could you please run the same test on your platform?
>>
>> I mean to use rc2 as both host and guest, and compare the
>> perf between indirect=0 and indirect=1.
>>
>> I use PVP traffic, tried both testpmd and OvS as the forwarding
>> engine in host, and testpmd in guest.
>>
>> Thanks
>> Zhihong
>
> From my experience, and as Michael pointed out, the best mode for small 
> packets is obviously
> ANY_LAYOUT so it uses a single descriptor per packet.
Of course, having a single descriptor is in theory the best way.
But in the current Virtio PMD implementation, with no offload supported, we
never access the virtio header at transmit time; it is allocated and
zeroed at startup.

For the ANY_LAYOUT case, the virtio header is prepended to the packet and
needs to be zeroed at packet transmit time. The performance impact is
quite significant, as shown by the measurements I made one month ago (txonly):
  - 2 descs per packet: 11.6Mpps
  - 1 desc per packet: 9.6Mpps

As Michael suggested, I tried to replace the memset with direct
field assignments, but that only recovers a small part of the drop.
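
For reference, the two ways of clearing the header look roughly like this.
It is a sketch, not the PMD code: struct vnet_hdr is a local copy of the
basic virtio-net header layout (without num_buffers), and the two helper
names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the basic virtio-net header fields. */
struct vnet_hdr {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Variant 1: clear the whole header with memset. */
static void
hdr_clear_memset(struct vnet_hdr *h)
{
	memset(h, 0, sizeof(*h));
}

/* Variant 2: clear the header field by field. */
static void
hdr_clear_fields(struct vnet_hdr *h)
{
	h->flags = 0;
	h->gso_type = 0;
	h->hdr_len = 0;
	h->gso_size = 0;
	h->csum_start = 0;
	h->csum_offset = 0;
}
```

Both variants leave the same bytes in memory; any difference is only in the
instructions the compiler emits per packet, so neither removes the write to
the packet headroom itself.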

What I suggested is to introduce a new feature, so that we can skip the
virtio header when no offload is negotiated.

Maybe you have other ideas?

> So, disabling indirect descriptors may give you better pps for 64 bytes 
> packets, but that doesn't mean you should not implement, or enable, it in 
> your driver. That just means that the guest is not taking the right decision, 
> and uses indirect while it should actually use any_layout.
+1, it really depends on the use-case.
>
> Given virtio/vhost design (most decision comes from the guest), the host 
> should be liberal in what it accepts, and not try to influence guest 
> implementation by carefully picking the features it supports. Otherwise 
> guests will never get a chance to make the right decisions either.
Agreed, what we need is to be able to disable Virtio PMD features
without having to rebuild the PMD.
It will certainly require an API change to add this option.

Thanks,
Maxime

>
> - Pierre
>
>>
>>>
>>> And could you provide me more info on your fwd bench?
>>> Do you use dpdk-pktgen on host, or you do fwd on howt with a real NIC
>>> also?
>>>
>>> Thanks,
>>> Maxime
 Thanks,
 Maxime

>
>
> Thanks
> Zhihong
>
>> -Original Message-
>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>> Sent: Monday, October 17, 2016 10:15 PM
>> To: Yuanhan Liu 
>> Cc: Wang, Zhihong ; Xie, Huawei
>> ; dev at dpdk.org; vkaplans at redhat.com;
>> mst at redhat.com; stephen at networkplumber.org
>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
>> support
>> to the TX path
>>
>>
>>
>> On 10/17/2016 03:21 PM, Yuanhan Liu wrote:
>>> On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote:
> On my side, I just setup 2 Windows 2016 VMs, and confirm the issue.
> I'll continue the investigation early next week.

 The root cause is identified.
 When INDIRECT_DESC feature is negotiated, Windows guest uses
>>> indirect
 for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD &
 virtio-net kernel driver) use indirect only for Tx.
 I'll

[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-28 Thread Maxime Coquelin



On 10/28/2016 02:49 AM, Wang, Zhihong wrote:
>
>> > -Original Message-
>> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
>> > Sent: Thursday, October 27, 2016 6:46 PM
>> > To: Maxime Coquelin 
>> > Cc: Wang, Zhihong ;
>> > stephen at networkplumber.org; Pierre Pfister (ppfister)
>> > ; Xie, Huawei ; dev at 
>> > dpdk.org;
>> > vkaplans at redhat.com; mst at redhat.com
>> > Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
>> > to the TX path
>> >
>> > On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote:
>>> > >
>>> > >
>>> > > On 10/27/2016 12:33 PM, Yuanhan Liu wrote:
 > > >On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
> > > >>Hi Zhihong,
> > > >>
> > > >>On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
>> > > >>>Hi Maxime,
>> > > >>>
>> > > >>>Seems indirect desc feature is causing serious performance
>> > > >>>degradation on Haswell platform, about 20% drop for both
>> > > >>>mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
>> > > >>>both iofwd and macfwd.
> > > >>I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy
>> > Bridge
> > > >>platform, and didn't faced such a drop.
 > > >
 > > >I was actually wondering that may be the cause. I tested it with
 > > >my IvyBridge server as well, I saw no drop.
 > > >
 > > >Maybe you should find a similar platform (Haswell) and have a try?
>>> > > Yes, that's why I asked Zhihong whether he could test Txonly in guest to
>>> > > see if issue is reproducible like this.
>> >
>> > I have no Haswell box, otherwise I could do a quick test for you. IIRC,
>> > he tried to disable the indirect_desc feature, then the performance
>> > recovered. So, it's likely the indirect_desc is the culprit here.
>> >
>>> > > I will be easier for me to find an Haswell machine if it has not to be
>>> > > connected back to back to and HW/SW packet generator.
> In fact simple loopback test will also do, without pktgen.
>
> Start testpmd in both host and guest, and do "start" in one
> and "start tx_first 32" in another.
>
> Perf drop is about 24% in my test.
>

Thanks, I had never tried this test.
I managed to find a Haswell platform (Intel(R) Xeon(R) CPU E5-2699 v3
@ 2.30GHz), and can reproduce the problem with the loop test you
mention. I see a performance drop of about 10% (8.94Mpps/8.08Mpps).
Out of curiosity, what are the numbers you get with your setup?

As I had never tried this test, I ran it again on my Sandy Bridge setup, and
I also see a performance regression, this time of 4%.

If I understand the test correctly, only 32 packets are allocated,
corresponding to a single burst, which is less than the queue size.
So it makes sense that the performance is lower with this test case.

Thanks,
Maxime

[dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics library

2016-10-28 Thread Morten Brørup

Comments below.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Remy Horton
> Sent: Friday, October 28, 2016 3:05 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics
> library
> 
> This patch adds a library that calculates peak and average data-rate
> statistics. For ethernet devices. These statistics are reported using
> the metrics library.
> 
> Signed-off-by: Remy Horton 


> diff --git a/lib/librte_bitratestats/rte_bitrate.c
> b/lib/librte_bitratestats/rte_bitrate.c
> new file mode 100644
> index 000..fcdf401
> --- /dev/null
> +++ b/lib/librte_bitratestats/rte_bitrate.c
> @@ -0,0 +1,126 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or
> without
> + *   modification, are permitted provided that the following
> conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above
> copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above
> copyright
> + *   notice, this list of conditions and the following disclaimer
> in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products
> derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
> USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +
> +struct rte_stats_bitrate_s {
> + uint64_t last_ibytes;
> + uint64_t last_obytes;
> + uint64_t peak_ibits;
> + uint64_t peak_obits;
> + uint64_t ewma_ibits;
> + uint64_t ewma_obits;
> +};
> +
> +struct rte_stats_bitrates_s {
> + struct rte_stats_bitrate_s port_stats[RTE_MAX_ETHPORTS];
> + uint16_t id_stats_set;
> +};
> +
> +
> +struct rte_stats_bitrates_s *rte_stats_bitrate_create(void) {
> + return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates_s), 0);
> }
> +
> +
> +int
> +rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data)
> +{
> + const char *names[] = {
> + "mean_bits_in", "mean_bits_out",
> + "peak_bits_in", "peak_bits_out",
> + };
> + int return_value;
> +
> + bitrate_data = rte_stats_bitrate_create();
> + if (bitrate_data == NULL)
> + rte_exit(EXIT_FAILURE, "Could not allocate bitrate
> data.\n");
> + return_value = rte_metrics_reg_metrics(&names[0], 4);
> + if (return_value >= 0)
> + bitrate_data->id_stats_set = return_value;
> + return return_value;
> +}
> +
> +
> +int
> +rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
> + uint8_t port_id)
> +{
> + struct rte_stats_bitrate_s *port_data;
> + struct rte_eth_stats eth_stats;
> + int ret_code;
> + uint64_t cnt_bits;
> + int64_t delta;
> + const int64_t alpha_percent = 20;
> + uint64_t values[4];
> +
> + ret_code = rte_eth_stats_get(port_id, &eth_stats);
> + if (ret_code != 0)
> + return ret_code;
> +
> + port_data = &bitrate_data->port_stats[port_id];
> +
> + /* Incoming */
> + cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
> + port_data->last_ibytes = eth_stats.ibytes;
> + if (cnt_bits > port_data->peak_ibits)
> + port_data->peak_ibits = cnt_bits;
> + delta = cnt_bits;
> + delta -= port_data->ewma_ibits;
> + delta = (delta * alpha_percent) / 100;
> + port_data->ewma_ibits += delta;
> +
> + /* Outgoing */
> + cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
> + port_data->last_obytes = eth_stats.obytes;
> + if (cnt_bits > port_data->peak_obits)
> + port_data->peak_obits = cnt_bits;
> + delta = cnt_bits;
> + delta -= port_data->ewma_obits;
> + delta = (delta * alpha_percent) / 100;
> + port_data->ewma_obits += delta;
> +
>
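
The moving-average update in the quoted hunk can be isolated as a small
function. This is a sketch with a made-up name (ewma_update()); it keeps the
patch's integer arithmetic and takes alpha as a percentage, as the patch does
with alpha_percent = 20.

```c
#include <stdint.h>

/*
 * One step of an exponentially weighted moving average in integer
 * arithmetic: new = old + alpha * (sample - old), alpha given in percent.
 */
static uint64_t
ewma_update(uint64_t ewma, uint64_t sample, int64_t alpha_percent)
{
	int64_t delta = (int64_t)sample - (int64_t)ewma;

	/* Integer division truncates toward zero. */
	delta = (delta * alpha_percent) / 100;
	return (uint64_t)((int64_t)ewma + delta);
}
```

Because of the truncating division, a difference smaller than 5 (at 20%)
contributes nothing, so the average stops moving slightly short of a steady
sample value.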

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Bruce Richardson

On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Thanks. One other suggestion is that it might be useful to provide
> > support for having typed queues explicitly in the API. Right now, when
> > you create an queue, the queue_conf structure takes as parameters how
> > many atomic flows that are needed for the queue, or how many reorder
> > slots need to be reserved for it. This implicitly hints at the type of
> > traffic which will be sent to the queue, but I'm wondering if it's
> > better to make it explicit. There are certain optimisations that can be
> > looked at if we know that a queue only handles packets of a particular
> > type. [Not having to handle reordering when pulling events from a core
> > can be a big win for software!].
> 
> If it helps in SW implementation, then I think we can add this in queue
> configuration. 
> 
> > 
> > How about adding: "allowed_event_types" as a field to
> > rte_event_queue_conf, with possible values:
> > * atomic
> > * ordered
> > * parallel
> > * mixed - allowing all 3 types. I think allowing 2 of three types might
> > make things too complicated.
> > 
> > An open question would then be how to behave when the queue type and
> > requested event type conflict. We can either throw an error, or just
> > ignore the event type and always treat enqueued events as being of the
> > queue type. I prefer the latter, because it's faster not having to
> > error-check, and it pushes the responsibility on the app to know what
> > it's doing.
> 
> How about making default as "mixed" and let application configures what
> is not required?. That way application responsibility is clear.
> something similar to ETH_TXQ_FLAGS_NOMULTSEGS, ETH_TXQ_FLAGS_NOREFCOUNT
> with default.
> 
I suppose it could work, but why bother doing that? If an app knows it's
only going to use one traffic type, why not let it just state what it
will do rather than try to specify what it won't do. If mixed is needed,
then it's easy enough to specify - and we can make it the zero/default
value too.

For now, our software implementation only supports one type per queue,
which we suspect should meet a lot of use-cases. We'll have to see about
adding in mixed types in future.
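
To make the proposal concrete, here is a rough sketch of what I have in mind.
None of these names exist in the RFC: the enums, struct queue_conf and
effective_sched_type() are hypothetical, with "mixed" as the zero/default
value and a typed queue simply overriding whatever type the event asks for.

```c
#include <stdint.h>

/* Hypothetical values for the proposed "allowed_event_types" field. */
enum queue_event_types {
	QUEUE_TYPE_MIXED = 0, /* zero/default: all three types allowed */
	QUEUE_TYPE_ATOMIC,
	QUEUE_TYPE_ORDERED,
	QUEUE_TYPE_PARALLEL
};

/* Hypothetical scheduling type requested by an individual event. */
enum event_sched_type {
	SCHED_ATOMIC = 1,
	SCHED_ORDERED,
	SCHED_PARALLEL
};

/* Hypothetical queue configuration carrying the new field. */
struct queue_conf {
	enum queue_event_types allowed_event_types;
};

/*
 * On a typed queue the event's own type is ignored (no error check);
 * on a mixed queue the event's requested type is honoured.
 */
static enum event_sched_type
effective_sched_type(const struct queue_conf *q,
		enum event_sched_type requested)
{
	switch (q->allowed_event_types) {
	case QUEUE_TYPE_ATOMIC:
		return SCHED_ATOMIC;
	case QUEUE_TYPE_ORDERED:
		return SCHED_ORDERED;
	case QUEUE_TYPE_PARALLEL:
		return SCHED_PARALLEL;
	case QUEUE_TYPE_MIXED:
	default:
		return requested;
	}
}
```

A zero-initialised configuration then behaves as a mixed queue, and an app
that only uses one traffic type states it once at queue setup.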

/Bruce

[dpdk-dev] [PATCH] net/qede: fix advertising link speed capability

2016-10-28 Thread Thomas Monjalon

2016-10-27 23:42, Rasesh Mody:
> From: Harish Patil 
> 
> Fix to advertise device's link speed capability based on current
> link speed rather than returning driver supported speeds.
[...]
> - dev_info->speed_capa = ETH_LINK_SPEED_25G | ETH_LINK_SPEED_40G |
> -ETH_LINK_SPEED_100G;
> + memset(&link, 0, sizeof(struct qed_link_output));
> + qdev->ops->common->get_link(edev, &link);
> + dev_info->speed_capa = rte_eth_speed_bitflag(link.speed, 0);
>  }

No, that's wrong.
You must advertise a capability!
So what does the device support?
Does every qede device support ETH_LINK_SPEED_100G?
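
The distinction being made can be sketched as follows. This is a toy model with made-up flag values and a made-up struct, not the qede driver or the ethdev API; which speeds a given qede device really supports is exactly the open question above.

```c
#include <stdint.h>

/* Hypothetical flag values; DPDK defines its own ETH_LINK_SPEED_* bits. */
#define SPEED_25G  (1u << 0)
#define SPEED_40G  (1u << 1)
#define SPEED_100G (1u << 2)

/* Toy port model, for illustration only. */
struct port {
	uint32_t supported; /* every speed this device can run at */
	uint32_t current;   /* the single speed negotiated right now */
};

/* A capability: what the device can do, independent of link state. */
static uint32_t speed_capa(const struct port *p)
{
	return p->supported;
}

/* What reporting the current link speed would give instead. */
static uint32_t speed_now(const struct port *p)
{
	return p->current;
}
```

A 25G/40G-capable port currently linked at 25G would report both bits from speed_capa() but only one from speed_now(), so an application could not learn from the latter that 40G is possible.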

[dpdk-dev] KNI discussion in userspace event

2016-10-28 Thread Stephen Hemminger

On Fri, 28 Oct 2016 15:31:50 +0100
Ferruh Yigit  wrote:

> Discussed alternatives were:
> * Tun/Tap
> This won't be as fast as KNI and performance is an issue.

That is a myth. Both require the same number of copies.
TUN/TAP copies in a syscall and KNI copies in a kthread.
Actually, the KNI method is worse because it has a kernel thread
always running, chewing a CPU; i.e. it is pure poll mode.

[dpdk-dev] [PATCH] Revert "bonding: use existing enslaved device queues"

2016-10-28 Thread Ilya Maximets

On 25.10.2016 09:26, Ilya Maximets wrote:
> On 24.10.2016 17:54, Jan Blunck wrote:
>> On Wed, Oct 19, 2016 at 5:47 AM, Ilya Maximets  
>> wrote:
>>> On 18.10.2016 18:19, Jan Blunck wrote:
 On Tue, Oct 18, 2016 at 2:49 PM, Ilya Maximets  
 wrote:
> On 18.10.2016 15:28, Jan Blunck wrote:
>> If the application already configured queues the PMD should not
>> silently claim ownership and reset them.
>>
>> What exactly is the problem when changing MTU? This works fine from
>> what I can tell.
>
> Following scenario leads to APP PANIC:
>
> 1. mempool_1 = rte_mempool_create()
> 2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
> 3. rte_eth_dev_start(bond0);
> 4. mempool_2 = rte_mempool_create();
> 5. rte_eth_dev_stop(bond0);
> 6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
> 7. rte_eth_dev_start(bond0);
> * RX queues still use 'mempool_1' because reconfiguration doesn't 
> affect them. *
> 8. rte_mempool_free(mempool_1);
> 9. On any rx operation we'll get PANIC because of using freed 
> 'mempool_1':
>  PANIC in rte_mempool_get_ops():
>  assert "(ops_index >= 0) && (ops_index < 
> RTE_MEMPOOL_MAX_OPS_IDX)" failed
>
> You may just start OVS 2.6 with DPDK bonding device and attempt to change 
> MTU via 'mtu_request'.
> Bug is easily reproducible.
>

 I see. I'm not 100% that this is expected to work without leaking the
 driver's queues though. The driver is allowed to do allocations in
 its rx_queue_setup() function that are being freed via
 rx_queue_release() later. But rx_queue_release() is only called if you
 reconfigure the
 device with 0 queues.
> 
> It's not true. Drivers usually call 'rx_queue_release()' inside the
> 'rx_queue_setup()' function when reallocating an already allocated
> queue (see the ixgbe driver, for example). Also, all HW queues are
> usually destroyed inside 'eth_dev_stop()' and reallocated in
> 'eth_dev_start()' or '{rx,tx}_queue_setup()'.
> So, there are no leaks at all.
> 
 From what I understand there is no other way to
 reconfigure a device to use another mempool.

 But ... even that wouldn't work with the bonding driver right now: the
 bonding master only configures the slaves during startup. I can put
 that on my todo list.
> 
> No, bonding master reconfigures new slaves in 'rte_eth_bond_slave_add()'
> if needed.
> 
 Coming back to your original problem: changing the MTU for the bond
 does work through rte_eth_dev_set_mtu() for slaves supporting that. In
 any other case you could (re-)configure rxmode.max_rx_pkt_len (and
 jumbo_frame / enable_scatter accordingly). This does work without a
 call to rte_eth_rx_queue_setup().
>>>
>>> Thanks for the suggestion, but using rte_eth_dev_set_mtu() without
>>> reconfiguration would require mempools with huge mbufs (9KB) for all
>>> ports from the start. This is unacceptable because it leads to
>>> significant performance regressions due to fast cache exhaustion.
>>> It would also require a lot of work to rewrite the OVS reconfiguration
>>> code this way.
>>> Anyway, it isn't only an MTU problem. The number of rx/tx descriptors
>>> also can't be changed at runtime.
>>>
>>>
>>> I don't fully understand the use case for this 'reusing' code.
>>> Could you please describe a situation where this behaviour is necessary?
>>
>> The device that is added to the bond was used before and therefore
>> already has allocated queues. Therefore we reuse the existing queues
>> of the devices instead of borrowing the queues of the bond device. If
>> the slave is removed from the bond again there is no need to allocate
>> the queues again.
>>
>> Hope that clarifies the usecase
> 
> So, as I see it, this is just an optimization that leads to differently
> configured queues on the same device and a possible application crash if
> the old configuration isn't valid any more.
> 
> With the revert applied, in your use case the queues of a device added
> to the bond will be automatically reconfigured according to the
> configuration of the bond device. If the slave is removed from the bond,
> all its queues will remain as configured by the bond. You can
> reconfigure them if needed. I guess that in your case the configuration
> of the slave devices actually matches the configuration of the bond
> device. In that case a slave will remain in the same state after removal
> from the bond as it was before being added.

So, Jan, Declan, what do you think about this?

Best regards, Ilya Maximets.
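
The difference between the two behaviours discussed in this thread can be modelled in a few lines. This is a toy sketch with invented types and function names, not the bonding PMD: a queue simply remembers which pool it was set up with.

```c
#include <stddef.h>

/* Toy model, not the bonding PMD: a queue just remembers its pool. */
struct pool { int id; };
struct rx_queue { const struct pool *mp; int configured; };

/* "Reuse" behaviour: an already configured queue is left untouched,
 * so a later setup call with a new pool is silently ignored. */
static void setup_reuse(struct rx_queue *q, const struct pool *mp)
{
	if (q->configured)
		return;
	q->mp = mp;
	q->configured = 1;
}

/* Behaviour with the revert applied: setup always takes the new config. */
static void setup_reconfigure(struct rx_queue *q, const struct pool *mp)
{
	q->mp = mp;
	q->configured = 1;
}
```

In the reuse model, a second setup with mempool_2 leaves the queue pointing at mempool_1, which is the stale pointer that the quoted scenario then frees in step 8.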

[dpdk-dev] [RFC PATCH v2 3/3] app/test-pmd: add support for bitrate statistics

2016-10-28 Thread Remy Horton

Signed-off-by: Remy Horton 
---
 app/test-pmd/testpmd.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index e2403c3..940dc3b 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -78,6 +78,8 @@
 #ifdef RTE_LIBRTE_PDUMP
 #include <rte_pdump.h>
 #endif
+#include 
+#include 

 #include "testpmd.h"

@@ -322,6 +324,9 @@ uint16_t nb_rx_queue_stats_mappings = 0;

 unsigned max_socket = 0;

+/* Bitrate statistics */
+struct rte_stats_bitrates_s *bitrate_data;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
@@ -921,12 +926,26 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
struct fwd_stream **fsm;
streamid_t nb_fs;
streamid_t sm_id;
-
+   uint64_t tics_per_1sec;
+   uint64_t tics_datum;
+   uint64_t tics_current;
+   uint8_t idx_port, cnt_ports;
+
+   cnt_ports = rte_eth_dev_count();
+   tics_datum = rte_rdtsc();
+   tics_per_1sec = rte_get_timer_hz();
fsm = &fwd_streams[fc->stream_idx];
nb_fs = fc->stream_nb;
do {
for (sm_id = 0; sm_id < nb_fs; sm_id++)
(*pkt_fwd)(fsm[sm_id]);
+   tics_current = rte_rdtsc();
+   if (tics_current - tics_datum >= tics_per_1sec) {
+   /* Periodic bitrate calculation */
+   for (idx_port = 0; idx_port < cnt_ports; idx_port++)
+   rte_stats_bitrate_calc(bitrate_data, idx_port);
+   tics_datum = tics_current;
+   }
} while (! fc->stopped);
 }

@@ -2119,6 +2138,15 @@ main(int argc, char** argv)
FOREACH_PORT(port_id, ports)
rte_eth_promiscuous_enable(port_id);

+   /* Setup bitrate stats */
+   bitrate_data = rte_stats_bitrate_create();
+   if (bitrate_data == NULL)
+   rte_exit(EXIT_FAILURE, "Could not allocate bitrate data.\n");
+   rte_stats_bitrate_reg(bitrate_data);
+   int id_const = rte_metrics_reg_metric("constant");
+   rte_metrics_update_metric(55, id_const, 0xdeadbeef);
+
+
 #ifdef RTE_LIBRTE_CMDLINE
if (interactive == 1) {
if (auto_start) {
-- 
2.5.5
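
For readers following the forwarding-loop change above, the once-per-second pattern can be sketched on its own. Everything below is illustrative: the struct, the helper names and the way the rate is derived from a byte counter are assumptions, since the body of rte_stats_bitrate_calc() is not shown in this message.

```c
#include <stdint.h>

/* Per-port state; all names here are illustrative. */
struct bitrate_state {
	uint64_t last_bytes; /* byte counter at the previous sample */
	uint64_t last_bps;   /* most recent bits-per-second value */
	uint64_t peak_bps;   /* highest value seen so far */
};

/* Called once per elapsed second with the port's running byte counter. */
static void bitrate_sample(struct bitrate_state *s, uint64_t total_bytes)
{
	uint64_t bps = (total_bytes - s->last_bytes) * 8;

	s->last_bytes = total_bytes;
	s->last_bps = bps;
	if (bps > s->peak_bps)
		s->peak_bps = bps;
}

/* Returns 1 and advances *datum once at least hz ticks have passed. */
static int second_elapsed(uint64_t now, uint64_t *datum, uint64_t hz)
{
	if (now - *datum < hz)
		return 0;
	*datum = now;
	return 1;
}
```

The loop in the patch corresponds to calling second_elapsed() after each forwarding pass and sampling every port when it returns 1.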

[dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics library

2016-10-28 Thread Remy Horton

This patch adds a library that calculates peak and average data-rate
statistics for Ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton 
---
 config/common_base |   5 +
 doc/api/doxy-api-index.md  |   1 +
 doc/api/doxy-api.conf  |   1 +
 lib/Makefile   |   1 +
 lib/librte_bitratestats/Makefile   |  53 +
 lib/librte_bitratestats/rte_bitrate.c  | 126 +
 lib/librte_bitratestats/rte_bitrate.h  |  80 +
 .../rte_bitratestats_version.map   |   9 ++
 lib/librte_metrics/rte_metrics.c   |  22 ++--
 lib/librte_metrics/rte_metrics.h   |   4 +-
 mk/rte.app.mk  |   1 +
 11 files changed, 291 insertions(+), 12 deletions(-)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/config/common_base b/config/common_base
index c23a632..e778c00 100644
--- a/config/common_base
+++ b/config/common_base
@@ -597,3 +597,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the device metrics library
 #
 CONFIG_RTE_LIBRTE_METRICS=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index ca50fa6..91e8ea6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -148,4 +148,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat] (@ref rte_compat.h),
   [keepalive]  (@ref rte_keepalive.h),
   [Device Metrics] (@ref rte_metrics.h),
+  [Bitrate Statistics] (@ref rte_bitrate.h),
   [version](@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fe830eb..8765ddd 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -58,6 +58,7 @@ INPUT   = doc/api/doxy-api-index.md \
   lib/librte_ring \
   lib/librte_sched \
   lib/librte_metrics \
+  lib/librte_bitratestats \
   lib/librte_table \
   lib/librte_timer \
   lib/librte_vhost
diff --git a/lib/Makefile b/lib/Makefile
index 5d85dcf..e211bc0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats

 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 000..b725d4e
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB =

[dpdk-dev] [RFC PATCH v2 1/3] lib: add information metrics library

2016-10-28 Thread Remy Horton

This patch adds a new information metric library that allows other
modules to register named metrics and update their values. It is
intended to be independent of ethdev, rather than mixing ethdev
and non-ethdev information in xstats.

Signed-off-by: Remy Horton 
---
 config/common_base |   5 +
 doc/api/doxy-api-index.md  |   1 +
 doc/api/doxy-api.conf  |   1 +
 lib/Makefile   |   1 +
 lib/librte_metrics/Makefile|  51 ++
 lib/librte_metrics/rte_metrics.c   | 265 +
 lib/librte_metrics/rte_metrics.h   | 200 ++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk  |   2 +
 9 files changed, 539 insertions(+)
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/config/common_base b/config/common_base
index f5d2eff..c23a632 100644
--- a/config/common_base
+++ b/config/common_base
@@ -592,3 +592,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..ca50fa6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -147,4 +147,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common] (@ref rte_common.h),
   [ABI compat] (@ref rte_compat.h),
   [keepalive]  (@ref rte_keepalive.h),
+  [Device Metrics] (@ref rte_metrics.h),
   [version](@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..fe830eb 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -57,6 +57,7 @@ INPUT   = doc/api/doxy-api-index.md \
   lib/librte_reorder \
   lib/librte_ring \
   lib/librte_sched \
+  lib/librte_metrics \
   lib/librte_table \
   lib/librte_timer \
   lib/librte_vhost
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..5d85dcf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics

 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y

[dpdk-dev] [PATCH v2 0/3] expanded statistic reporting

2016-10-28 Thread Remy Horton

This patchset extends statistics reporting to include peak and
average data-rate metrics. It comes in two parts: a statistics
reporting library, and a bitrate calculation library that uses
it. This structure is intended to separate statistic reporting
from ethdev and allow more flexible metric registration.
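
As a rough illustration of the "register a named metric, then update it" idea, here is a self-contained sketch. It is not the proposed rte_metrics API: the function names, the fixed-size table and the absence of a per-port dimension are all simplifications assumed for the example.

```c
#include <stdint.h>
#include <string.h>

#define MAX_METRICS 16
#define NAME_LEN    32

/* One named value; a real library would likely keep one per port. */
struct metric {
	char name[NAME_LEN];
	uint64_t value;
};

static struct metric metrics[MAX_METRICS];
static int metric_count;

/* Register a named metric; returns its id, or -1 if the table is full. */
static int metric_register(const char *name)
{
	if (metric_count >= MAX_METRICS)
		return -1;
	strncpy(metrics[metric_count].name, name, NAME_LEN - 1);
	metrics[metric_count].name[NAME_LEN - 1] = '\0';
	return metric_count++;
}

/* Update a metric by id; returns 0 on success, -1 for an unknown id. */
static int metric_update(int id, uint64_t value)
{
	if (id < 0 || id >= metric_count)
		return -1;
	metrics[id].value = value;
	return 0;
}
```

A bitrate library layered on top would register its peak and average names once and then call the update function from its periodic calculation.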

--

v2 changes:
* Uses a new metrics library rather than being part of ethdev

Remy Horton (3):
  lib: add information metrics library
  lib: add bitrate statistics library
  app/test-pmd: add support for bitrate statistics

 app/test-pmd/testpmd.c |  30 ++-
 config/common_base |  10 +
 doc/api/doxy-api-index.md  |   2 +
 doc/api/doxy-api.conf  |   2 +
 lib/Makefile   |   2 +
 lib/librte_bitratestats/Makefile   |  53 
 lib/librte_bitratestats/rte_bitrate.c  | 126 ++
 lib/librte_bitratestats/rte_bitrate.h  |  80 ++
 .../rte_bitratestats_version.map   |   9 +
 lib/librte_metrics/Makefile|  51 
 lib/librte_metrics/rte_metrics.c   | 267 +
 lib/librte_metrics/rte_metrics.h   | 200 +++
 lib/librte_metrics/rte_metrics_version.map |  13 +
 mk/rte.app.mk  |   3 +
 14 files changed, 847 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

-- 
2.5.5

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Jerin Jacob

On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> Thanks. One other suggestion is that it might be useful to provide
> support for having typed queues explicitly in the API. Right now, when
> you create a queue, the queue_conf structure takes as parameters how
> many atomic flows are needed for the queue, or how many reorder
> slots need to be reserved for it. This implicitly hints at the type of
> traffic which will be sent to the queue, but I'm wondering if it's
> better to make it explicit. There are certain optimisations that can be
> looked at if we know that a queue only handles packets of a particular
> type. [Not having to handle reordering when pulling events from a core
> can be a big win for software!].

If it helps in SW implementation, then I think we can add this in queue
configuration. 

> 
> How about adding: "allowed_event_types" as a field to
> rte_event_queue_conf, with possible values:
> * atomic
> * ordered
> * parallel
> * mixed - allowing all 3 types. I think allowing 2 of three types might
> make things too complicated.
> 
> An open question would then be how to behave when the queue type and
> requested event type conflict. We can either throw an error, or just
> ignore the event type and always treat enqueued events as being of the
> queue type. I prefer the latter, because it's faster not having to
> error-check, and it pushes the responsibility on the app to know what
> it's doing.

How about making "mixed" the default and letting the application configure
what is not required? That way the application's responsibility is clear.
Something similar to ETH_TXQ_FLAGS_NOMULTSEGS and ETH_TXQ_FLAGS_NOREFCOUNT,
with a default.
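
A minimal sketch of this opt-out style, assuming hypothetical flag names modelled on the ETH_TXQ_FLAGS_NO* convention (none of these identifiers exist in the RFC):

```c
#include <stdint.h>

/* Hypothetical opt-out flags, in the spirit of ETH_TXQ_FLAGS_NO*. */
#define QUEUE_FLAG_NOATOMIC   (1u << 0)
#define QUEUE_FLAG_NOORDERED  (1u << 1)
#define QUEUE_FLAG_NOPARALLEL (1u << 2)

/* Enum values deliberately match the bit positions of the flags above. */
enum ev_type { EV_ATOMIC = 0, EV_ORDERED = 1, EV_PARALLEL = 2 };

/* With flags == 0 every type is allowed, i.e. the queue is "mixed". */
static int type_allowed(uint32_t flags, enum ev_type t)
{
	return (flags & (1u << t)) == 0;
}
```

An atomic-only queue would then be configured with QUEUE_FLAG_NOORDERED | QUEUE_FLAG_NOPARALLEL, while an untouched config stays mixed.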

/Jerin


> 
> /Bruce

[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-28 Thread Pierre Pfister (ppfister)


> On 27 Oct 2016, at 12:19, Wang, Zhihong wrote:
> 
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>> Sent: Thursday, October 27, 2016 5:55 PM
>> To: Wang, Zhihong ; Yuanhan Liu
>> ; stephen at networkplumber.org; Pierre
>> Pfister (ppfister) 
>> Cc: Xie, Huawei ; dev at dpdk.org;
>> vkaplans at redhat.com; mst at redhat.com
>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
>> to the TX path
>> 
>> 
>> 
>> On 10/27/2016 11:10 AM, Maxime Coquelin wrote:
>>> Hi Zhihong,
>>> 
>>> On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
>>>> Hi Maxime,
>>>> 
>>>> Seems indirect desc feature is causing serious performance
>>>> degradation on Haswell platform, about 20% drop for both
>>>> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
>>>> both iofwd and macfwd.
>>> I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy Bridge
>>> platform, and didn't faced such a drop.
>>> Have you tried to pass indirect_desc=off to qemu cmdline to see if you
>>> recover the performance?
>>> 
>>> Yuanhan, which platform did you use when you tested it with zero copy?
>>> 
 
>>>> I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
>>>> 
>>>> Could you please verify if this is true in your test?
>>> I'll try -rc1/-rc2 on my platform, and let you know.
>> As a first test, I tried again Txonly from the guest to the host (Rxonly),
>> where Tx indirect descriptors are used, on my E5-2665 @2.40GHz:
>> v16.11-rc1: 10.81Mpps
>> v16.11-rc2: 10.91Mpps
>> 
>> -rc2 is even slightly better in my case.
>> Could you please run the same test on your platform?
> 
> I mean to use rc2 as both host and guest, and compare the
> perf between indirect=0 and indirect=1.
> 
> I use PVP traffic, tried both testpmd and OvS as the forwarding
> engine in host, and testpmd in guest.
> 
> Thanks
> Zhihong

[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-28 Thread Xu, Qian Q

On my BDW-EP platform (similar to HSW), I can also see the performance drop.
So what's the next step now?
Intel CPU generations:
SNB --> IVB --> HSW --> BDW-EP

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
Sent: Thursday, October 27, 2016 6:53 PM
To: Yuanhan Liu 
Cc: mst at redhat.com; dev at dpdk.org; vkaplans at redhat.com
Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to 
the TX path



On 10/27/2016 12:33 PM, Yuanhan Liu wrote:
> On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
>> Hi Zhihong,
>>
>> On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
>>> Hi Maxime,
>>>
>>> Seems indirect desc feature is causing serious performance 
>>> degradation on Haswell platform, about 20% drop for both mrg=on and 
>>> mrg=off (--txqflags=0xf00, non-vector version), both iofwd and 
>>> macfwd.
>> I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy 
>> Bridge platform, and didn't faced such a drop.
>
> I was actually wondering that may be the cause. I tested it with my 
> IvyBridge server as well, I saw no drop.
Sorry, mine is a SandyBridge, not IvyBridge.

[dpdk-dev] Unable to change source MAC address of packet

2016-10-28 Thread Lu, Wenzhuo

Hi Padam,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Padam Jeet Singh
> Sent: Thursday, October 27, 2016 10:45 PM
> To: Wiles, Keith
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Unable to change source MAC address of packet
> 
> 
> > On 27-Oct-2016, at 7:37 pm, Wiles, Keith  wrote:
> >
> >
> >> On Oct 27, 2016, at 6:33 AM, Padam Jeet Singh 
> wrote:
> >>
> >> Hi,
> >>
> >> I am crafting a packet in which the source MAC address set in the
> >> Ethernet header is different from the transmit port's default MAC
> >> address. A packet capture of the packets coming out of this port,
> >> however, shows the port's default MAC address as the source.
> >>
> >> Altering the destination MAC address works fine and shows up correctly
> >> in packet capture.
> >>
> >> The underlying network interface is an i210, and some logs added to the
> >> eth_igb_xmit_pkts function show that the packets I have crafted do reach
> >> the driver with the source MAC address set by the application code.
> >>
> >> How can I disable this automatic source MAC address setting?
> >
> > The packets sent with rte_eth_tx_burst() are not forced to a given MAC
> > address. If you are using something on top of DPDK like Pktgen or OVS,
> > then it may try to force a source MAC address.
> 
> No, not using pktgen or OVS. Plain simple code to take packets from a KNI,
> change the source MAC address on all received packets, and then tx_burst
> them to a port.
> 
> > Maybe the hardware does it, but we need to know the NIC being used and
> > then someone may be able to answer. I do not know of any Intel NICs that
> > do that.
> 
> An Intel i210 NIC (gigabit Ethernet) is being used. I have gone through the
> i210 documentation and can't see anything specific to setting the MAC
> address in hardware on the TX side. On the RX side there are validations
> like MAC filtering, but nothing on TX.
> 
> >
> > Is this what you are doing.
> 
> I agree that rte_eth_tx_burst does not overwrite the source MAC, as I was
> able to trace all the way to the IGB driver that the source MAC makes it
> there intact. There are no offload flags enabled in the mbuf. Yet the
> packets reaching the other side come out with the source MAC address of
> the port.
> 
> Is there any standard DPDK app which crafts packets with a different source
> MAC than the port's physical MAC? (I checked: the l2fwd example loads the
> port MAC before transmitting and then uses the same in the TX function.)
I don't have an i210 on hand, so I checked the i210 datasheet. I can't find
any description of the HW changing the MAC address in the frame. I believe the
HW should send the frame as provided by SW, apart from offloads such as
checksum.
May I suggest using testpmd to forward a packet which has a different src MAC
than the port's?
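
For reference, the operation under discussion amounts to overwriting bytes 6 to 11 of the frame. The sketch below works on a raw byte buffer with a locally defined header struct; it deliberately avoids DPDK types, so the struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Ethernet II header layout: destination MAC, source MAC, EtherType. */
struct eth_hdr {
	uint8_t  dst[MAC_LEN];
	uint8_t  src[MAC_LEN];
	uint16_t ether_type;
};

/* Overwrite the source MAC at the start of a raw frame buffer.
 * The caller must pass a buffer holding at least a full header. */
static void set_src_mac(uint8_t *frame, const uint8_t mac[MAC_LEN])
{
	memcpy(frame + offsetof(struct eth_hdr, src), mac, MAC_LEN);
}
```

If a capture on the wire still shows the port's own address after such a write, the change is happening below the application, which is what a testpmd run would help confirm.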

> 
> >
> >>
> >> Thanks,
> >> Padam
> >
> > Regards,
> > Keith
> >

[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-28 Thread Wang, Zhihong



> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Thursday, October 27, 2016 6:46 PM
> To: Maxime Coquelin 
> Cc: Wang, Zhihong ;
> stephen at networkplumber.org; Pierre Pfister (ppfister)
> ; Xie, Huawei ; dev at dpdk.org;
> vkaplans at redhat.com; mst at redhat.com
> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
> to the TX path
> 
> On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote:
> >
> >
> > On 10/27/2016 12:33 PM, Yuanhan Liu wrote:
> > >On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
> > >>Hi Zhihong,
> > >>
> > >>On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
> > >>>Hi Maxime,
> > >>>
> > >>>Seems indirect desc feature is causing serious performance
> > >>>degradation on Haswell platform, about 20% drop for both
> > >>>mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
> > >>>both iofwd and macfwd.
> > >>I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy
> Bridge
> > >>platform, and didn't faced such a drop.
> > >
> > >I was actually wondering that may be the cause. I tested it with
> > >my IvyBridge server as well, I saw no drop.
> > >
> > >Maybe you should find a similar platform (Haswell) and have a try?
> > Yes, that's why I asked Zhihong whether he could test Txonly in guest to
> > see if issue is reproducible like this.
> 
> I have no Haswell box, otherwise I could do a quick test for you. IIRC,
> he tried to disable the indirect_desc feature, then the performance
> recovered. So, it's likely the indirect_desc is the culprit here.
> 
> > It will be easier for me to find a Haswell machine if it does not have
> > to be connected back to back to a HW/SW packet generator.

In fact a simple loopback test will also do, without pktgen.

Start testpmd in both host and guest, and run "start" in one
and "start tx_first 32" in the other.

The perf drop is about 24% in my test.

> 
> Makes sense.
> 
>   --yliu
> >
> > Thanks,
> > Maxime
> >
> > >
> > >   --yliu
> > >
> > >>Have you tried to pass indirect_desc=off to qemu cmdline to see if you
> > >>recover the performance?
> > >>
> > >>Yuanhan, which platform did you use when you tested it with zero copy?
> > >>
> > >>>
> > >>>I'm using RC2, and the CPU is Xeon E5-2699 v3 @ 2.30GHz.
> > >>>
> > >>>Could you please verify if this is true in your test?
> > >>I'll try -rc1/-rc2 on my platform, and let you know.
> > >>
> > >>Thanks,
> > >>Maxime
> > >>
> > >>>
> > >>>
> > >>>Thanks
> > >>>Zhihong
> > >>>
> > -Original Message-
> > From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> > Sent: Monday, October 17, 2016 10:15 PM
> > To: Yuanhan Liu 
> > Cc: Wang, Zhihong ; Xie, Huawei
> > ; dev at dpdk.org; vkaplans at redhat.com;
> > mst at redhat.com; stephen at networkplumber.org
> > Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
> support
> > to the TX path
> > 
> > 
> > 
> > On 10/17/2016 03:21 PM, Yuanhan Liu wrote:
> > >On Mon, Oct 17, 2016 at 01:23:23PM +0200, Maxime Coquelin wrote:
> > >>>On my side, I just setup 2 Windows 2016 VMs, and confirm the
> issue.
> > >>>I'll continue the investigation early next week.
> > >>
> > >>The root cause is identified.
> > >>When INDIRECT_DESC feature is negotiated, Windows guest uses
> indirect
> > >>for both Tx and Rx descriptors, whereas Linux guests (Virtio PMD &
> > >>virtio-net kernel driver) use indirect only for Tx.
> > >>I'll implement indirect support for the Rx path in vhost lib, but the
> > >>change will be too big for -rc release.
> > >>I propose in the mean time to disable INDIRECT_DESC feature in
> vhost
> > >>lib, we can still enable it locally for testing.
> > >>
> > >>Yuanhan, is it ok for you?
> > >
> > >That's okay.
> > I'll send a patch to disable it then.
> > 
> > >
> > >>
> > >>>Has anyone already tested Windows guest with vhost-net, which
> also
> > has
> > >>>indirect descs support?
> > >>
> > >>I tested and confirm it works with vhost-net.
> > >
> > >I'm a bit confused then. IIRC, vhost-net also doesn't support indirect
> > >for Rx path, right?
> > 
> > No, it does support it actually.
> > I thought it didn't support it either; I misread the kernel
> > implementation of vhost-net and virtio-net. Actually, virtio-net makes
> > use of indirect in the Rx path when mergeable buffers are disabled.
> > 
> > The confusion certainly comes from me, sorry about that.
> > 
> > Maxime

[dpdk-dev] [PATCH] net/qede: fix advertising link speed capability

2016-10-28 Thread Rasesh Mody

From: Harish Patil 

Fix to advertise device's link speed capability based on current
link speed rather than returning driver supported speeds.

Fixes: 95e67b479506 ("net/qede: add 100G link speed capability")

Signed-off-by: Harish Patil 
---
 drivers/net/qede/qede_ethdev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index b91b478..4c4c669 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -646,6 +646,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 {
struct qede_dev *qdev = eth_dev->data->dev_private;
struct ecore_dev *edev = &qdev->edev;
+   struct qed_link_output link;

PMD_INIT_FUNC_TRACE(edev);

@@ -678,8 +679,9 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 DEV_TX_OFFLOAD_UDP_CKSUM |
 DEV_TX_OFFLOAD_TCP_CKSUM);

-   dev_info->speed_capa = ETH_LINK_SPEED_25G | ETH_LINK_SPEED_40G |
-  ETH_LINK_SPEED_100G;
+   memset(&link, 0, sizeof(struct qed_link_output));
+   qdev->ops->common->get_link(edev, &link);
+   dev_info->speed_capa = rte_eth_speed_bitflag(link.speed, 0);
 }

 /* return 0 means link status changed, -1 means not changed */
-- 
1.8.3.1

[dpdk-dev] [PATCH] net/qede: fix gcc compiler option checks

2016-10-28 Thread Rasesh Mody

From: Rasesh Mody 

Using GCC_VERSION to check gcc version and decide whether to include
that compiler option.

Fixes: ec94dbc57362 ("qede: add base driver")
Fixes: ecc7a5a27ffe ("net/qede/base: fix 32-bit build")

Signed-off-by: Rasesh Mody 
---
 drivers/net/qede/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qede/Makefile b/drivers/net/qede/Makefile
index 39751e4..29b443d 100644
--- a/drivers/net/qede/Makefile
+++ b/drivers/net/qede/Makefile
@@ -46,11 +46,11 @@ endif
 endif

 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
-ifeq ($(shell gcc -Wno-unused-but-set-variable -Werror -E - < /dev/null > /dev/null 2>&1; echo $$?),0)
+ifeq ($(shell test $(GCC_VERSION) -ge 44 && echo 1), 1)
 CFLAGS_BASE_DRIVER += -Wno-unused-but-set-variable
 endif
 CFLAGS_BASE_DRIVER += -Wno-missing-declarations
-ifeq ($(shell gcc -Wno-maybe-uninitialized -Werror -E - < /dev/null > /dev/null 2>&1; echo $$?),0)
+ifeq ($(shell test $(GCC_VERSION) -ge 46 && echo 1), 1)
 CFLAGS_BASE_DRIVER += -Wno-maybe-uninitialized
 endif
 CFLAGS_BASE_DRIVER += -Wno-strict-prototypes
-- 
1.8.3.1
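
The numeric comparison the new Makefile lines rely on can be restated in C for clarity. This is only a sketch: it assumes GCC_VERSION is a major/minor code such as 44 for gcc 4.4, and the helper names are made up. The thresholds are copied from the patch as posted, not checked against the GCC release notes.

```c
/* Two-digit style version code, e.g. 4.6 -> 46 (assumed encoding). */
static int gcc_version_code(int major, int minor)
{
	return major * 10 + minor;
}

/* Threshold taken from the patch: GCC_VERSION >= 44. */
static int use_no_unused_but_set_variable(int code)
{
	return code >= 44;
}

/* Threshold taken from the patch: GCC_VERSION >= 46. */
static int use_no_maybe_uninitialized(int code)
{
	return code >= 46;
}
```

Compared with probing the compiler by running it, this approach depends on the version thresholds being right for each option.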
