[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 10:06:52PM +0300, Vlad Zolotarov wrote:
> >>How would iommu
> >>virtualization change anything?
> >Kernel can use an iommu to limit device access to memory of
> >the controlling application.
> 
> Ok, this is obvious but what it has to do with enabling using MSI/MSI-X
> interrupts support in uio_pci_generic? kernel may continue to limit the
> above access with this support as well.

It could maybe. So if you write a patch to allow MSI by at the same time
creating an isolated IOMMU group and blocking DMA from device in
question anywhere, that sounds reasonable.

-- 
MST


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 22:10, Vlad Zolotarov wrote:
>
>
> On 09/30/15 22:06, Vlad Zolotarov wrote:
>>
>>
>> On 09/30/15 21:55, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 09:15:56PM +0300, Vlad Zolotarov wrote:

>>>> On 09/30/15 18:26, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>>>>>> How not virtualizing iommu forces "all or nothing" approach?
>>>>> Looks like you can't limit an assigned device to only access part of
>>>>> guest memory that belongs to a given process.  Either let it access all
>>>>> of guest memory ("all") or don't assign the device ("nothing").
>>>> Ok. A question then: can u limit the assigned device to only access part of
>>>> the guest memory even if iommu was virtualized?
>>> That's exactly what an iommu does - limit the device io access to 
>>> memory.
>>
>> If it does - it will continue to do so with or without the patch and 
>> if it doesn't (for any reason) it won't do it even without the patch.
>> So, again, the above (rhetorical) question stands. ;)
>>
>> I think Avi has already explained quite in detail why security is 
>> absolutely a non issue in regard to this patch or in regard to UIO in 
>> general. Security has to be enforced by some other means like iommu.
>>
>>>
>>>> How would iommu
>>>> virtualization change anything?
>>> Kernel can use an iommu to limit device access to memory of
>>> the controlling application.
>>
>> Ok, this is obvious but what it has to do with enabling using 
>> MSI/MSI-X interrupts support in uio_pci_generic? kernel may continue 
>> to limit the above access with this support as well.
>>
>>>
>>>> And why do we care about an assigned device
>>>> to be able to access all Guest memory?
>>> Because we want to be reasonably sure a kernel memory corruption
>>> is not a result of a bug in a userspace application.
>>
>> Corrupting Guest's memory due to any SW misbehavior (including bugs) 
>> is a non-issue by design - this is what HV and Guest machines were 
>> invented for. So, like Avi also said, instead of trying to enforce 
>> nobody cares about 
>
> Let me rephrase: by pretending enforcing some security promise that u 
> don't actually fulfill... ;)

...the promise nobody really cares about...

>
>> we'd rather make the developers life easier instead (by applying the 
>> not-yet-completed patch I'm working on).
>>>
>>
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 22:06, Vlad Zolotarov wrote:
>
>
> On 09/30/15 21:55, Michael S. Tsirkin wrote:
>> On Wed, Sep 30, 2015 at 09:15:56PM +0300, Vlad Zolotarov wrote:
>>>
>>> On 09/30/15 18:26, Michael S. Tsirkin wrote:
>>>> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>>>>> How not virtualizing iommu forces "all or nothing" approach?
>>>> Looks like you can't limit an assigned device to only access part of
>>>> guest memory that belongs to a given process.  Either let it access all
>>>> of guest memory ("all") or don't assign the device ("nothing").
>>> Ok. A question then: can u limit the assigned device to only access 
>>> part of
>>> the guest memory even if iommu was virtualized?
>> That's exactly what an iommu does - limit the device io access to 
>> memory.
>
> If it does - it will continue to do so with or without the patch and 
> if it doesn't (for any reason) it won't do it even without the patch.
> So, again, the above (rhetorical) question stands. ;)
>
> I think Avi has already explained quite in detail why security is 
> absolutely a non issue in regard to this patch or in regard to UIO in 
> general. Security has to be enforced by some other  means like iommu.
>
>>
>>> How would iommu
>>> virtualization change anything?
>> Kernel can use an iommu to limit device access to memory of
>> the controlling application.
>
> Ok, this is obvious but what it has to do with enabling using 
> MSI/MSI-X interrupts support in uio_pci_generic? kernel may continue 
> to limit the above access with this support as well.
>
>>
>>> And why do we care about an assigned device
>>> to be able to access all Guest memory?
>> Because we want to be reasonably sure a kernel memory corruption
>> is not a result of a bug in a userspace application.
>
> Corrupting Guest's memory due to any SW misbehavior (including bugs) 
> is a non-issue by design - this is what HV and Guest machines were 
> invented for. So, like Avi also said, instead of trying to enforce 
> nobody cares about 

Let me rephrase: by pretending enforcing some security promise that u 
don't actually fulfill... ;)

> we'd rather make the developers life easier instead (by applying the 
> not-yet-completed patch I'm working on).
>>
>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 21:55, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 09:15:56PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 18:26, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>>>> How not virtualizing iommu forces "all or nothing" approach?
>>> Looks like you can't limit an assigned device to only access part of
>>> guest memory that belongs to a given process.  Either let it access all
>>> of guest memory ("all") or don't assign the device ("nothing").
>> Ok. A question then: can u limit the assigned device to only access part of
>> the guest memory even if iommu was virtualized?
> That's exactly what an iommu does - limit the device io access to memory.

If it does - it will continue to do so with or without the patch and if 
it doesn't (for any reason) it won't do it even without the patch.
So, again, the above (rhetorical) question stands. ;)

I think Avi has already explained quite in detail why security is 
absolutely a non issue in regard to this patch or in regard to UIO in 
general. Security has to be enforced by some other  means like iommu.

>
>> How would iommu
>> virtualization change anything?
> Kernel can use an iommu to limit device access to memory of
> the controlling application.

Ok, this is obvious but what it has to do with enabling using MSI/MSI-X 
interrupts support in uio_pci_generic? kernel may continue to limit the 
above access with this support as well.

>
>> And why do we care about an assigned device
>> to be able to access all Guest memory?
> Because we want to be reasonably sure a kernel memory corruption
> is not a result of a bug in a userspace application.

Corrupting Guest's memory due to any SW misbehavior (including bugs) is 
a non-issue by design - this is what HV and Guest machines were invented 
for. So, like Avi also said, instead of trying to enforce nobody cares 
about we'd rather make the developers life easier instead (by applying 
the not-yet-completed patch I'm working on).
>



[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-30 Thread shesha Sreenivasamurthy (shesha)
My bad that I said its not working, apologies.

Isn't it correct to say that single-process applications do not benefit from 
having backing files? In that case, can we make this configurable by passing a 
command-line argument that either unlinks or keeps the backing files, 
defaulting to keeping them? Single-process applications that do not need 
these files around can pass an additional parameter to unlink them.

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: "Ananyev, Konstantin" <konstantin.anan...@intel.com>
Date: Wednesday, September 30, 2015 at 2:53 PM
To: Cisco Employee <shesha at cisco.com>, "dev at dpdk.org" <dev at dpdk.org>
Cc: "Michael S. Tsirkin" <mst at redhat.com>
Subject: RE: [dpdk-dev] Unlinking hugepage backing file after initialiation



-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of shesha Sreenivasamurthy 
(shesha)
Sent: Wednesday, September 30, 2015 10:44 PM
To: dev at dpdk.org
Cc: Michael S. Tsirkin
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation
What I heard is the following: A multi-process DPDK application, working either 
in master-worker or master-slave fashion, can
potentially benefit by keeping the backing files in hugetlbfs. However, it 
does not work today as the pages are cleaned and added
back when the application restarts.

Who says it is not working?
I admit that DPDK MP model is probably a bit constrained, but it does work.
It is probably good to read some docs:
http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html
and/or look at the code that does MP support inside DPDK.
I think that might make things clearer.
Konstantin

On the other hand, for a single process application there is actually no 
benefit keeping the pages
around.
Therefore, I was wondering if we can make this configurable by passing a 
command line argument that will either unlink or keep the
backing files.
--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }
From: "Michael S. Tsirkin" <mst at redhat.com>
Date: Tuesday, September 29, 2015 at 2:35 PM
To: Cisco Employee <shesha at cisco.com>
Cc: "Xie, Huawei" <huawei.xie at intel.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation
On Tue, Sep 29, 2015 at 05:50:00PM +, shesha Sreenivasamurthy (shesha) 
wrote:
Sure. Then, is there any real reason why the backing files should not be
unlinked ?
AFAIK qemu unlinks them already.
--
MST




[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 09:15:56PM +0300, Vlad Zolotarov wrote:
> 
> 
> On 09/30/15 18:26, Michael S. Tsirkin wrote:
> >On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
> >>How not virtualizing iommu forces "all or nothing" approach?
> >Looks like you can't limit an assigned device to only access part of
> >guest memory that belongs to a given process.  Either let it access all
> >of guest memory ("all") or don't assign the device ("nothing").
> 
> Ok. A question then: can u limit the assigned device to only access part of
> the guest memory even if iommu was virtualized?

That's exactly what an iommu does - limit the device io access to memory.

> How would iommu
> virtualization change anything?

Kernel can use an iommu to limit device access to memory of
the controlling application.

> And why do we care about an assigned device
> to be able to access all Guest memory?

Because we want to be reasonably sure a kernel memory corruption
is not a result of a bug in a userspace application.

-- 
MST


[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-30 Thread Ananyev, Konstantin


> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of shesha 
> Sreenivasamurthy (shesha)
> Sent: Wednesday, September 30, 2015 10:44 PM
> To: dev at dpdk.org
> Cc: Michael S. Tsirkin
> Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation
> 
> What I heard is the following: A multi-process DPDK application, working 
> either in master-worker or master-slave fashion, can
> potentially benefit by keeping the backing files in hugetlbfs. However, it 
> does not work today as the pages are cleaned and added
> back when the application restarts.

Who says it is not working?
I admit that DPDK MP model is probably a bit constrained, but it does work.
It is probably good to read some docs:
http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html
and/or look at the code that does MP support inside DPDK.
I think that might make things clearer.
Konstantin 

> On the other hand, for a single process application there is actually no 
> benefit keeping the pages
> around.
> 
> Therefore, I was wondering if we can make this configurable by passing a 
> command line argument that will either unlink or keep the
> backing files.
> 
> --
> - Thanks
> char * (*shesha) (uint64_t cache, uint8_t F00D)
> { return 0xC0DE; }
> 
> From: "Michael S. Tsirkin" <mst at redhat.com>
> Date: Tuesday, September 29, 2015 at 2:35 PM
> To: Cisco Employee <shesha at cisco.com>
> Cc: "Xie, Huawei" <huawei.xie at intel.com>, "dev at dpdk.org" <dev at dpdk.org>
> Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation
> 
> On Tue, Sep 29, 2015 at 05:50:00PM +, shesha Sreenivasamurthy (shesha) 
> wrote:
> Sure. Then, is there any real reason why the backing files should not be
> unlinked ?
> 
> AFAIK qemu unlinks them already.
> 
> --
> MST



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 10:43:04AM -0700, Stephen Hemminger wrote:
> On Wed, 30 Sep 2015 20:39:43 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Sep 30, 2015 at 10:28:07AM -0700, Stephen Hemminger wrote:
> > > On Wed, 30 Sep 2015 13:37:22 +0300
> > > Vlad Zolotarov  wrote:
> > > 
> > > > 
> > > > 
> > > > On 09/30/15 00:49, Michael S. Tsirkin wrote:
> > > > > On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
> > > > >> On Tue, 29 Sep 2015 23:54:54 +0300
> > > > >> "Michael S. Tsirkin"  wrote:
> > > > >>
> > > > >>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
> > > >  The security breach motivation u brought in "[RFC PATCH] uio:
> > > >  uio_pci_generic: Add support for MSI interrupts" thread seems a 
> > > >  bit weak
> > > >  since one u let the userland access to the bar it may do any funny 
> > > >  thing
> > > >  using the DMA engine of the device. This kind of stuff should be 
> > > >  prevented
> > > >  using the iommu and if it's enabled then any funny tricks using 
> > > >  MSI/MSI-X
> > > >  configuration will be prevented too.
> > > > 
> > > >  I'm about to send the patch to main Linux mailing list. Let's 
> > > >  continue this
> > > >  discussion there.
> > > > 
> > > > >>> Basically UIO shouldn't be used with devices capable of DMA.
> > > > >>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).
> > > > 
> > > > If there is an IOMMU in the picture there shouldn't be any problem to 
> > > > use UIO with DMA capable devices.
> > > > 
> > > > >>> I don't think this can change.
> > > > >> Given there is no PV IOMMU and even if there was it would be too 
> > > > >> slow for DPDK
> > > > >> use, I can't accept that.
> > > > > QEMU does allow emulating an iommu.
> > > > 
> > > > Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an 
> > > > option there. And again, it's a general issue not DPDK specific.
> > > > Today one has to develop some proprietary modules (like igb_uio) to 
> > > > workaround the issue and this is lame. IMHO uio_pci_generic should
> > > > be fixed to be able to properly work within any virtualized environment 
> > > > and not only with KVM.
> > > > 
> > > 
> > > Also VMware (bigger problem) has no IOMMU emulation.
> > > Other environments as well (Windriver, GCE) have no IOMMU.
> > 
> > Because the use-case of userspace drivers is not important enough?
> > Without an IOMMU, there's no way to have secure userspace drivers.
> 
> Look at Cloudius, there is no necessity of security in guest.

It's an interesting concept, isn't it?

So why not do what Cloudius does, and run this task code in ring 0 then,
allocating all memory in the kernel range?

You are increasing interrupt latency by a huge factor by channeling
interrupts through a scheduler.  Let user install an
interrupt handler function, and be done with it.

-- 
MST


[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-30 Thread shesha Sreenivasamurthy (shesha)
What I heard is the following: A multi-process DPDK application, working either 
in master-worker or master-slave fashion, can potentially benefit by keeping 
the backing files in hugetlbfs. However, it does not work today as the pages 
are cleaned and added back when the application restarts. On the other hand, 
for a single process application there is actually no benefit keeping the pages 
around.

Therefore, I was wondering if we can make this configurable by passing a 
command line argument that will either unlink or keep the backing files.
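
For reference, the proposal relies on standard POSIX behaviour: once a file has been mapped, unlinking it removes only the name, and the mapping stays usable until it is unmapped. The sketch below illustrates this with an ordinary temporary file instead of a hugetlbfs mount. `map_backing_file`, its `unlink_after_map` flag and the `rtemap_demo_` file prefix are hypothetical names chosen for illustration; this is not DPDK's EAL code and not an existing EAL option.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a backing file under `dir`, map `len` bytes of it, and optionally
 * unlink the file once the mapping exists. The generated file name is
 * written to `path_out`. Returns the mapping, or NULL on failure.
 * In DPDK `dir` would be a hugetlbfs mount; any writable directory is
 * enough to show the unlink semantics (assumption for this sketch).
 */
void *map_backing_file(const char *dir, size_t len, int unlink_after_map,
		       char *path_out, size_t path_len)
{
	void *va;
	int fd, n;

	assert(dir != NULL && path_out != NULL);

	n = snprintf(path_out, path_len, "%s/rtemap_demo_XXXXXX", dir);
	if (n < 0 || (size_t)n >= path_len)
		return NULL;

	fd = mkstemp(path_out);		/* creates and opens a unique file */
	if (fd < 0)
		return NULL;

	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		unlink(path_out);
		return NULL;
	}

	va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);			/* the mapping keeps the file alive */
	if (va == MAP_FAILED) {
		unlink(path_out);
		return NULL;
	}

	if (unlink_after_map)
		unlink(path_out);	/* removes the name, not the mapping */

	return va;
}
```

After an unlink the file can no longer be opened by name, so a process that has to attach to the same memory later would not find it. That is the reason keeping the files would stay the default in the proposal.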

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: "Michael S. Tsirkin" <m...@redhat.com>
Date: Tuesday, September 29, 2015 at 2:35 PM
To: Cisco Employee <shesha at cisco.com>
Cc: "Xie, Huawei" <huawei.xie at intel.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation

On Tue, Sep 29, 2015 at 05:50:00PM +, shesha Sreenivasamurthy (shesha) 
wrote:
Sure. Then, is there any real reason why the backing files should not be
unlinked ?

AFAIK qemu unlinks them already.

--
MST



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 18:26, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
>> How not virtualizing iommu forces "all or nothing" approach?
> Looks like you can't limit an assigned device to only access part of
> guest memory that belongs to a given process.  Either let it access all
> of guest memory ("all") or don't assign the device ("nothing").

Ok. A question then: can u limit the assigned device to only access part 
of the guest memory even if iommu was virtualized? How would iommu 
virtualization change anything? And why do we care about an assigned 
device to be able to access all Guest memory?

>



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Gleb Natapov
On Wed, Sep 30, 2015 at 08:39:43PM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 10:28:07AM -0700, Stephen Hemminger wrote:
> > On Wed, 30 Sep 2015 13:37:22 +0300
> > Vlad Zolotarov  wrote:
> > 
> > > 
> > > 
> > > On 09/30/15 00:49, Michael S. Tsirkin wrote:
> > > > On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
> > > >> On Tue, 29 Sep 2015 23:54:54 +0300
> > > >> "Michael S. Tsirkin"  wrote:
> > > >>
> > > >>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
> > > >>>> The security breach motivation u brought in "[RFC PATCH] uio:
> > > >>>> uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak
> > > >>>> since one u let the userland access to the bar it may do any funny thing
> > > >>>> using the DMA engine of the device. This kind of stuff should be prevented
> > > >>>> using the iommu and if it's enabled then any funny tricks using MSI/MSI-X
> > > >>>> configuration will be prevented too.
> > > >>>>
> > > >>>> I'm about to send the patch to main Linux mailing list. Let's continue this
> > > >>>> discussion there.
> > > 
> > > >>> Basically UIO shouldn't be used with devices capable of DMA.
> > > >>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).
> > > 
> > > If there is an IOMMU in the picture there shouldn't be any problem to 
> > > use UIO with DMA capable devices.
> > > 
> > > >>> I don't think this can change.
> > > >> Given there is no PV IOMMU and even if there was it would be too slow 
> > > >> for DPDK
> > > >> use, I can't accept that.
> > > > QEMU does allow emulating an iommu.
> > > 
> > > Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an 
> > > option there. And again, it's a general issue not DPDK specific.
> > > Today one has to develop some proprietary modules (like igb_uio) to 
> > > workaround the issue and this is lame. IMHO uio_pci_generic should
> > > be fixed to be able to properly work within any virtualized environment 
> > > and not only with KVM.
> > > 
> > 
> > Also VMware (bigger problem) has no IOMMU emulation.
> > Other environments as well (Windriver, GCE) have no IOMMU.
> 
> Because the use-case of userspace drivers is not important enough?
Because "secure" userspace drivers is not important enough.

> Without an IOMMU, there's no way to have secure userspace drivers.
> 
People use VMs as application containers, not as machines that need
to be secured for a multiuser scenario.

--
Gleb.


[dpdk-dev] Is there any example application to used DPDK packet distributor library?

2015-09-30 Thread 최익성
 Dear Bruce Richardson and DPDK experts.

Thank you very much for your precious answer.

I found it. It seems very short and simple.

Thank you very much.

I have another question.

I don't know how the following steps work from new_tag to match variables.

/* in dpdk library. ~/dpdk-?.?.?/lib/librte_distributor/rte_distributor.c */
/* process a set of packets to distribute them to workers */
rte_distributor_process(struct rte_distributor *d, struct rte_mbuf **mbufs, 
unsigned num_mbufs)
{
...
 new_tag = next_mb->hash.usr;  /* flow ID hash.usr is set by NIC */

 for (i = 0; i < d->num_workers; i++)
  match |= (!(d->in_flight_tags[i] ^ new_tag) << i);

 /* Only turned-on bits are considered as match */
 match &= d->in_flight_bitmask;

 unsigned worker = __builtin_ctzl(match);
...
}

I will appreciate if you let me know the steps.

Thank you very much.

Sincerely Yours,

Ick-Sung Choi.
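
The steps from new_tag to match in the excerpt above can be read as three bit operations: compare the new tag with the tag each worker has in flight, set bit i of match when worker i has the same tag, then mask out workers that have nothing in flight and take the lowest set bit as the worker index. The following standalone sketch reproduces that logic outside the library. `find_worker_for_tag`, `MAX_WORKERS` and the flat parameter list are hypothetical names for illustration, not part of the rte_distributor API.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORKERS 64	/* assumed limit: one bit per worker in a 64-bit mask */

/*
 * Return the index of the worker that already has `new_tag` in flight,
 * or -1 if no active worker is handling that tag.
 */
int find_worker_for_tag(const uint32_t *in_flight_tags, unsigned num_workers,
			uint64_t in_flight_bitmask, uint32_t new_tag)
{
	uint64_t match = 0;
	unsigned i;

	assert(num_workers <= MAX_WORKERS);

	/*
	 * tag ^ new_tag is 0 only when the tags are equal, so !(...) is 1
	 * for a match and 0 otherwise. Shifting by i puts that result in
	 * bit i of match.
	 */
	for (i = 0; i < num_workers; i++)
		match |= (uint64_t)!(in_flight_tags[i] ^ new_tag) << i;

	/* Ignore stale tags of workers that have no packet in flight. */
	match &= in_flight_bitmask;

	if (match == 0)
		return -1;

	/* Count trailing zeros: index of the lowest matching worker. */
	return __builtin_ctzll(match);
}
```

With tags {7, 9, 0, 9} and all four workers active (bitmask 0xF), a packet with tag 9 sets bits 1 and 3 of match, and the count of trailing zeros selects worker 1.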


-----Original Message-----
From: "Bruce Richardson" <bruce.richard...@intel.com>
To: "최익성" <pnk003 at naver.com>
Cc: dev at dpdk.org
Sent: 2015-09-30 (Wed) 19:56:28
Subject: Re: [dpdk-dev] Is there any example application to used DPDK packet distributor library?

On Wed, Sep 30, 2015 at 02:45:20PM +0900, 최익성 wrote:
> Dear DPDK experts.
>
> I am Ick-Sung Choi living in South Korea.
>
> I have a question about DPDK packet distributor library.
>
> Is there any example application to used DPDK packet distributor library?
>
> I am trying to experiment simple function using DPDK packet distributor library.
>
> If I can study an example application of DPDK packet distributor library, it would be very helpful for my experiment.
>
> I will appreciate if I can be given any example applications, advice, and information.
>
> Thank you very much.
>
> Sincerely Yours,
>
> Ick-Sung Choi.
>
Hi,

there is a "distributor" example app in the examples directory.

/Bruce



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 10:28:07AM -0700, Stephen Hemminger wrote:
> On Wed, 30 Sep 2015 13:37:22 +0300
> Vlad Zolotarov  wrote:
> 
> > 
> > 
> > On 09/30/15 00:49, Michael S. Tsirkin wrote:
> > > On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
> > >> On Tue, 29 Sep 2015 23:54:54 +0300
> > >> "Michael S. Tsirkin"  wrote:
> > >>
> > >>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
> > >>>> The security breach motivation u brought in "[RFC PATCH] uio:
> > >>>> uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak
> > >>>> since one u let the userland access to the bar it may do any funny thing
> > >>>> using the DMA engine of the device. This kind of stuff should be prevented
> > >>>> using the iommu and if it's enabled then any funny tricks using MSI/MSI-X
> > >>>> configuration will be prevented too.
> > >>>>
> > >>>> I'm about to send the patch to main Linux mailing list. Let's continue this
> > >>>> discussion there.
> > 
> > >>> Basically UIO shouldn't be used with devices capable of DMA.
> > >>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).
> > 
> > If there is an IOMMU in the picture there shouldn't be any problem to 
> > use UIO with DMA capable devices.
> > 
> > >>> I don't think this can change.
> > >> Given there is no PV IOMMU and even if there was it would be too slow 
> > >> for DPDK
> > >> use, I can't accept that.
> > > QEMU does allow emulating an iommu.
> > 
> > Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an 
> > option there. And again, it's a general issue not DPDK specific.
> > Today one has to develop some proprietary modules (like igb_uio) to 
> > workaround the issue and this is lame. IMHO uio_pci_generic should
> > be fixed to be able to properly work within any virtualized environment 
> > and not only with KVM.
> > 
> 
> Also VMware (bigger problem) has no IOMMU emulation.
> Other environments as well (Windriver, GCE) have no IOMMU.

Because the use-case of userspace drivers is not important enough?
Without an IOMMU, there's no way to have secure userspace drivers.
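
Whether VFIO is an option for a given device can usually be checked from sysfs: on Linux, a PCI device that the kernel has placed behind an IOMMU normally has an iommu_group link in its sysfs directory. The sketch below tests for that link. `pci_dev_has_iommu_group` and its `sysfs_root` parameter are hypothetical names for illustration, and the check is a heuristic under that assumption, not what vfio-pci or uio_pci_generic do internally.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if <sysfs_root>/bus/pci/devices/<bdf>/iommu_group exists,
 * 0 otherwise. `bdf` is a PCI address such as "0000:00:03.0";
 * `sysfs_root` is normally "/sys".
 */
int pci_dev_has_iommu_group(const char *sysfs_root, const char *bdf)
{
	char path[512];
	struct stat st;
	int n;

	assert(sysfs_root != NULL && bdf != NULL);

	n = snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/iommu_group",
		     sysfs_root, bdf);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;

	/* stat() follows the symlink; success means the group exists. */
	return stat(path, &st) == 0;
}
```

In a guest without an emulated IOMMU, such as the EC2 case described in the quoted text, this check would be expected to return 0 for the VF, which is the situation where only UIO-style drivers remain available.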

-- 
MST


[dpdk-dev] [PATCH] i40e: fix wrong alignment for the number of HW descriptors

2015-09-30 Thread Konstantin Ananyev
According to XL710 datasheet:
RX QLEN restrictions: When the PXE_MODE flag in the GLLAN_RCTL_0
register is cleared, the QLEN must be whole number of 32
descriptors.
TX QLEN restrictions: When the PXE_MODE flag in the GLLAN_RCTL_0
register is cleared, the QLEN must be whole number of 32
descriptors.

So make sure that for both RX and TX queues number of HW descriptors is
a multiple of 32.

Signed-off-by: Konstantin Ananyev 
---
 drivers/net/i40e/i40e_rxtx.c | 26 +-
 drivers/net/i40e/i40e_rxtx.h |  6 ++
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index fd656d5..260e580 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -57,9 +57,6 @@
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"

-#define I40E_MIN_RING_DESC 64
-#define I40E_MAX_RING_DESC 4096
-#define I40E_ALIGN 128
 #define DEFAULT_TX_RS_THRESH   32
 #define DEFAULT_TX_FREE_THRESH 32
 #define I40E_MAX_PKT_TYPE  256
@@ -68,6 +65,9 @@

 #define I40E_DMA_MEM_ALIGN 4096

+/* Base address of the HW descriptor ring should be 128B aligned. */
+#define I40E_RING_BASE_ALIGN   128
+
 #define I40E_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
ETH_TXQ_FLAGS_NOOFFLOADS)

@@ -2126,9 +2126,9 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
"index exceeds the maximum");
return I40E_ERR_PARAM;
}
-   if (((nb_desc * sizeof(union i40e_rx_desc)) % I40E_ALIGN) != 0 ||
-   (nb_desc > I40E_MAX_RING_DESC) ||
-   (nb_desc < I40E_MIN_RING_DESC)) {
+   if (nb_desc % I40E_ALIGN_RING_DESC != 0 ||
+   (nb_desc > I40E_MAX_RING_DESC) ||
+   (nb_desc < I40E_MIN_RING_DESC)) {
PMD_DRV_LOG(ERR, "Number (%u) of receive descriptors is "
"invalid", nb_desc);
return I40E_ERR_PARAM;
@@ -2338,9 +2338,9 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
return I40E_ERR_PARAM;
}

-   if (((nb_desc * sizeof(struct i40e_tx_desc)) % I40E_ALIGN) != 0 ||
-   (nb_desc > I40E_MAX_RING_DESC) ||
-   (nb_desc < I40E_MIN_RING_DESC)) {
+   if (nb_desc % I40E_ALIGN_RING_DESC != 0 ||
+   (nb_desc > I40E_MAX_RING_DESC) ||
+   (nb_desc < I40E_MIN_RING_DESC)) {
PMD_DRV_LOG(ERR, "Number (%u) of transmit descriptors is "
"invalid", nb_desc);
return I40E_ERR_PARAM;
@@ -2537,10 +2537,10 @@ i40e_ring_dma_zone_reserve(struct rte_eth_dev *dev,

 #ifdef RTE_LIBRTE_XEN_DOM0
return rte_memzone_reserve_bounded(z_name, ring_size,
-   socket_id, 0, I40E_ALIGN, RTE_PGSIZE_2M);
+   socket_id, 0, I40E_RING_BASE_ALIGN, RTE_PGSIZE_2M);
 #else
return rte_memzone_reserve_aligned(z_name, ring_size,
-   socket_id, 0, I40E_ALIGN);
+   socket_id, 0, I40E_RING_BASE_ALIGN);
 #endif
 }

@@ -2554,10 +2554,10 @@ i40e_memzone_reserve(const char *name, uint32_t len, int socket_id)
return mz;
 #ifdef RTE_LIBRTE_XEN_DOM0
mz = rte_memzone_reserve_bounded(name, len,
-   socket_id, 0, I40E_ALIGN, RTE_PGSIZE_2M);
+   socket_id, 0, I40E_RING_BASE_ALIGN, RTE_PGSIZE_2M);
 #else
mz = rte_memzone_reserve_aligned(name, len,
-   socket_id, 0, I40E_ALIGN);
+   socket_id, 0, I40E_RING_BASE_ALIGN);
 #endif
return mz;
 }
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 4385142..3d9884d 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -51,6 +51,12 @@
 #define I40E_RXBUF_SZ_1024 1024
 #define I40E_RXBUF_SZ_2048 2048

+/* In non-PXE mode QLEN must be whole number of 32 descriptors. */
+#define I40E_ALIGN_RING_DESC  32
+
+#define I40E_MIN_RING_DESC  64
+#define I40E_MAX_RING_DESC  4096
+
 enum i40e_header_split_mode {
i40e_header_split_none = 0,
i40e_header_split_enabled = 1,
-- 
1.8.5.3
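
The effect of the change to the queue-setup checks can be sketched in isolation. The macro values are the ones used by the patch; the two helper functions, the `OLD_I40E_ALIGN` name and the 16-byte descriptor size in the usage note are assumptions for illustration, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define I40E_ALIGN_RING_DESC 32
#define I40E_MIN_RING_DESC   64
#define I40E_MAX_RING_DESC   4096
#define OLD_I40E_ALIGN       128	/* value of the removed I40E_ALIGN */

/* Old check: the ring size in bytes had to be a multiple of 128. */
int old_nb_desc_ok(uint16_t nb_desc, size_t desc_size)
{
	assert(desc_size > 0);
	return (nb_desc * desc_size) % OLD_I40E_ALIGN == 0 &&
	       nb_desc <= I40E_MAX_RING_DESC &&
	       nb_desc >= I40E_MIN_RING_DESC;
}

/* New check: the descriptor count itself must be a multiple of 32. */
int new_nb_desc_ok(uint16_t nb_desc)
{
	return nb_desc % I40E_ALIGN_RING_DESC == 0 &&
	       nb_desc <= I40E_MAX_RING_DESC &&
	       nb_desc >= I40E_MIN_RING_DESC;
}
```

Assuming a 16-byte descriptor, old_nb_desc_ok(72, 16) accepts 72 descriptors because 1152 bytes is a multiple of 128, while new_nb_desc_ok(72) rejects the same count because 72 is not a multiple of 32.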



[dpdk-dev] [PATCH v3 8/8] mk: Add rule for installing runtime files

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK libraries, modules,
binary files, nic bind files and documentation,
when invoking "make install-fhs" (filesystem hierarchy standard)
runtime files will be by default installed in:
$(DESTDIR)/$(BIN_DIR) where BIN_DIR=/usr/bin (binary files)
$(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind (nic bind
files)
$(DESTDIR)/$(DOC_DIR) where DOC_DIR=/usr/share/doc/dpdk (documentation)
$(DESTDIR)/$(LIB_DIR)  (libraries)
if the architecture is 64 bits then LIB_DIR=/usr/lib64
else LIB_DIR=/usr/lib
$(DESTDIR)/$(KERNEL_DIR) (modules)
if RTE_EXEC_ENV=linuxapp then
KERNEL_DIR=/lib/modules/$(uname -r)/build
else KERNEL_DIR=/boot/modules
All directory variables mentioned above can be overridden.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 6 ++
 mk/rte.sdkroot.mk| 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index c508758..c34bc3a 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -195,6 +195,12 @@ install-sdk: install-headers
cp -f $(BUILD_DIR)/build/.config $(DESTDIR)/$(DATA_DIR)/config; \
echo installing: $(BUILD_DIR)/build/.config
 #
+# install runtime files
+#
+.PHONY: install-fhs
+install-fhs: install-lib install-bin install-sbin install-doc install-mod
+
+#
 # uninstall: remove all built sdk
 #
 UNINSTALL_TARGETS := $(addsuffix _uninstall,\
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index 245ed21..296eba2 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -98,9 +98,9 @@ testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

 .PHONY: install install-headers install-bin install-lib install-mod \
-install-doc install-sbin install-sdk uninstall
+install-doc install-sbin install-sdk install-fhs uninstall
 install install-headers install-bin install-lib install-mod install-doc \
-install-sbin install-sdk uninstall:
+install-sbin install-sdk install-fhs uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0



[dpdk-dev] [PATCH v3 7/8] mk: Add rule for installing sdk files

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK makefiles, scripts,
examples, tools, config files and headers,
when invoking "make install-sdk" makefiles, scripts,
examples, tools, config files will be installed in:
$(DESTDIR)/$(SDK_DIR)
and headers will be installed in:
$(DESTDIR)/$(INCLUDE_DIR)
where SDK_DIR=/usr/share/dpdk and INCLUDE_DIR=/usr/include/dpdk
by default; you can override the SDK_DIR and INCLUDE_DIR vars.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 20 
 mk/rte.sdkroot.mk|  4 ++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 4eecf31..c508758 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -47,6 +47,7 @@ INCLUDE_DIR ?= /usr/include/dpdk
 BIN_DIR ?= /usr/bin
 DOC_DIR ?= /usr/share/doc/dpdk
 SBIN_DIR ?= /usr/sbin/dpdk_nic_bind
+DATA_DIR ?= /usr/share/dpdk
 HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
 BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
 LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
@@ -175,6 +176,25 @@ install-sbin:
echo installing: $$NB_FILE; \
done
 #
+# install sdk files in /usr/share/dpdk by default
+# DATA_DIR can be overridden.
+#
+.PHONY: install-sdk
+install-sdk: install-headers
+   @echo == Installing sdk files
+   @[ -d $(DESTDIR)/$(DATA_DIR) ] || mkdir -p $(DESTDIR)/$(DATA_DIR); \
+   cp -rf $(BUILD_DIR)/mk $(DESTDIR)/$(DATA_DIR); \
+   echo installing: $(BUILD_DIR)/mk; \
+   cp -rf $(BUILD_DIR)/scripts $(DESTDIR)/$(DATA_DIR); \
+   echo installing: $(BUILD_DIR)/scripts; \
+   cp -rf $(BUILD_DIR)/examples $(DESTDIR)/$(DATA_DIR); \
+   echo installing: $(BUILD_DIR)/examples; \
+   cp -rf $(BUILD_DIR)/tools $(DESTDIR)/$(DATA_DIR); \
+   echo installing: $(BUILD_DIR)/tools
+   @[ -d $(DESTDIR)/$(DATA_DIR)/config ] || mkdir -p $(DESTDIR)/$(DATA_DIR)/config; \
+   cp -f $(BUILD_DIR)/build/.config $(DESTDIR)/$(DATA_DIR)/config; \
+   echo installing: $(BUILD_DIR)/build/.config
+#
 # uninstall: remove all built sdk
 #
 UNINSTALL_TARGETS := $(addsuffix _uninstall,\
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index 862af9e..245ed21 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -98,9 +98,9 @@ testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

 .PHONY: install install-headers install-bin install-lib install-mod \
-install-doc install-sbin uninstall
+install-doc install-sbin install-sdk uninstall
 install install-headers install-bin install-lib install-mod install-doc \
-install-sbin uninstall:
+install-sbin install-sdk uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0



[dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK nic bind files.
When invoking "make install-sbin", nic bind files will
be installed in: $(DESTDIR)/$(SBIN_DIR)
where SBIN_DIR=/usr/sbin/dpdk_nic_bind by default.
The SBIN_DIR variable can be overridden.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
and dpdk spec file.

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 14 ++
 mk/rte.sdkroot.mk|  4 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 5a2fd40..4eecf31 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -46,11 +46,13 @@ else
 INCLUDE_DIR ?= /usr/include/dpdk
 BIN_DIR ?= /usr/bin
 DOC_DIR ?= /usr/share/doc/dpdk
+SBIN_DIR ?= /usr/sbin/dpdk_nic_bind
 HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
 BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
 LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
 MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
 DOCS := $(wildcard $(BUILD_DIR)/doc/*)
+NIC_BIND_FILES := $(wildcard $(BUILD_DIR)/tools/*nic_bind.py)
 include $(BUILD_DIR)/build/.config
 RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
 RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%)
@@ -161,6 +163,18 @@ install-doc:
echo installing: $$DOC; \
done
 #
+# install nic bind files in /usr/sbin/dpdk_nic_bind
+# by default SBIN_DIR can be overridden.
+#
+.PHONY: install-sbin
+install-sbin:
+   @echo == Installing nic bind files
+   @[ -d $(DESTDIR)/$(SBIN_DIR) ] || mkdir -p $(DESTDIR)/$(SBIN_DIR)
+   @for NB_FILE in ${NIC_BIND_FILES}; do \
+   cp -rf $$NB_FILE ${DESTDIR}/${SBIN_DIR}; \
+   echo installing: $$NB_FILE; \
+   done
+#
 # uninstall: remove all built sdk
 #
 UNINSTALL_TARGETS := $(addsuffix _uninstall,\
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index 43f937e..862af9e 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -98,9 +98,9 @@ testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

 .PHONY: install install-headers install-bin install-lib install-mod \
-install-doc uninstall
+install-doc install-sbin uninstall
 install install-headers install-bin install-lib install-mod install-doc \
-uninstall:
+install-sbin uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0



[dpdk-dev] [PATCH v3 5/8] mk: Add rule for installing documentation

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK documentation.
When invoking "make install-doc", documentation files will
be installed in: $(DESTDIR)/$(DOC_DIR)
where DOC_DIR=/usr/share/doc/dpdk by default.
The DOC_DIR variable can be overridden.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 14 ++
 mk/rte.sdkroot.mk|  6 --
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index dff1e4d..5a2fd40 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -45,10 +45,12 @@ T=*
 else
 INCLUDE_DIR ?= /usr/include/dpdk
 BIN_DIR ?= /usr/bin
+DOC_DIR ?= /usr/share/doc/dpdk
 HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
 BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
 LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
 MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
+DOCS := $(wildcard $(BUILD_DIR)/doc/*)
 include $(BUILD_DIR)/build/.config
 RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
 RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%)
@@ -147,6 +149,18 @@ install-mod:
echo installing: $$MOD; \
done
 #
+# install documentation in /usr/share/doc/dpdk
+# by default, DOC_DIR can be overridden.
+#
+.PHONY: install-doc
+install-doc:
+   @echo == Installing documentation
+   @[ -d $(DESTDIR)/$(DOC_DIR) ] || mkdir -p $(DESTDIR)/$(DOC_DIR)
+   @for DOC in ${DOCS}; do \
+   cp -rf $$DOC ${DESTDIR}/${DOC_DIR}; \
+   echo installing: $$DOC; \
+   done
+#
 # uninstall: remove all built sdk
 #
 UNINSTALL_TARGETS := $(addsuffix _uninstall,\
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index e652218..43f937e 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -97,8 +97,10 @@ test fast_test ring_test mempool_test perf_test coverage:
 testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

-.PHONY: install install-headers install-bin install-lib install-mod uninstall
-install install-headers install-bin install-lib install-mod uninstall:
+.PHONY: install install-headers install-bin install-lib install-mod \
+install-doc uninstall
+install install-headers install-bin install-lib install-mod install-doc \
+uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0



[dpdk-dev] [PATCH v3 4/8] mk: Add rule for installing modules

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK modules.
When invoking "make install-mod", modules will be
installed in: $(DESTDIR)/$(KERNEL_DIR)
If RTE_EXEC_ENV=linuxapp then
KERNEL_DIR=/lib/modules/$(uname -r)/build,
otherwise KERNEL_DIR=/boot/modules
by default. The KERNEL_DIR variable can be overridden.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 21 +
 mk/rte.sdkroot.mk|  4 ++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 25b8122..dff1e4d 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -48,13 +48,20 @@ BIN_DIR ?= /usr/bin
 HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
 BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
 LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
+MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
 include $(BUILD_DIR)/build/.config
 RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
+RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%)
 ifeq ($(RTE_ARCH),x86_64)
 LIB_DIR ?= /usr/lib64
 else
 LIB_DIR ?= /usr/lib
 endif
+ifeq ($(RTE_EXEC_ENV),linuxapp)
+KERNEL_DIR ?= /lib/modules/$(shell uname -r)/build
+else
+KERNEL_DIR ?= /boot/modules
+endif
 endif
 endif

@@ -126,6 +133,20 @@ install-lib:
echo installing: $$LIB; \
done
 #
+# if RTE_EXEC_ENV=linuxapp modules install in:
+# /lib/modules/$(uname -r)/build
+# else /boot/modules/ by default
+# KERNEL_DIR can be overridden.
+#
+.PHONY: install-mod
+install-mod:
+   @echo == Installing modules
+   @[ -d $(DESTDIR)/$(KERNEL_DIR) ] || mkdir -p $(DESTDIR)/$(KERNEL_DIR)
+   @for MOD in ${MODULES}; do \
+   cp -rf $$MOD ${DESTDIR}/${KERNEL_DIR}; \
+   echo installing: $$MOD; \
+   done
+#
 # uninstall: remove all built sdk
 #
 UNINSTALL_TARGETS := $(addsuffix _uninstall,\
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index 7a72c9b..e652218 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -97,8 +97,8 @@ test fast_test ring_test mempool_test perf_test coverage:
 testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

-.PHONY: install install-headers install-bin install-lib uninstall
-install install-headers install-bin install-lib uninstall:
+.PHONY: install install-headers install-bin install-lib install-mod uninstall
+install install-headers install-bin install-lib install-mod uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0
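
A note on the KERNEL_DIR default described in this patch: the sketch
below mirrors the ifeq on RTE_EXEC_ENV in plain shell. It is only an
illustration. pick_kernel_dir and the release string 4.2.0-example are
hypothetical names, not part of the patch. One thing it makes visible
is that the linuxapp default depends on "uname -r" of the machine
running make, which may not be the kernel you are staging for.

```shell
#!/bin/sh
# pick_kernel_dir is a hypothetical helper, not part of DPDK.
# $1 = execution environment, $2 = kernel release (defaults to uname -r)
pick_kernel_dir() {
    exec_env=$1
    release=${2:-$(uname -r)}
    if [ "${KERNEL_DIR+set}" = set ]; then
        # A value supplied by the caller wins, like make's ?= operator.
        printf '%s\n' "$KERNEL_DIR"
    elif [ "$exec_env" = linuxapp ]; then
        printf '/lib/modules/%s/build\n' "$release"
    else
        printf '/boot/modules\n'
    fi
}

unset KERNEL_DIR
pick_kernel_dir linuxapp 4.2.0-example   # /lib/modules/4.2.0-example/build
pick_kernel_dir bsdapp                   # /boot/modules
```

When building packages for a different kernel, passing KERNEL_DIR
explicitly on the make command line avoids relying on the build host.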



[dpdk-dev] [PATCH v3 3/8] mk: Add rule for installing libraries

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK libraries.
When invoking "make install-lib", libraries will
be installed in: $(DESTDIR)/$(LIB_DIR)
If the architecture is x86_64 then LIB_DIR=/usr/lib64,
otherwise LIB_DIR=/usr/lib by default. The LIB_DIR
variable can be overridden.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 21 +
 mk/rte.sdkroot.mk|  4 ++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 7d7c2c9..25b8122 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -47,6 +47,14 @@ INCLUDE_DIR ?= /usr/include/dpdk
 BIN_DIR ?= /usr/bin
 HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
 BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
+LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
+include $(BUILD_DIR)/build/.config
+RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
+ifeq ($(RTE_ARCH),x86_64)
+LIB_DIR ?= /usr/lib64
+else
+LIB_DIR ?= /usr/lib
+endif
 endif
 endif

@@ -105,6 +113,19 @@ install-bin:
echo installing: $$BIN_FILE; \
done
 #
+# if architecture is 64 bits install in /usr/lib64
+# else /usr/lib by default
+# LIB_DIR can be overridden.
+#
+.PHONY: install-lib
+install-lib:
+   @echo == Installing libraries
+   @[ -d $(DESTDIR)/$(LIB_DIR) ] || mkdir -p $(DESTDIR)/$(LIB_DIR)
+   @for LIB in ${LIBS}; do \
+   cp -rf $$LIB ${DESTDIR}/${LIB_DIR}; \
+   echo installing: $$LIB; \
+   done
+#
 # uninstall: remove all built sdk
 #
 UNINSTALL_TARGETS := $(addsuffix _uninstall,\
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index 24eaa60..7a72c9b 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -97,8 +97,8 @@ test fast_test ring_test mempool_test perf_test coverage:
 testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

-.PHONY: install install-headers install-bin uninstall
-install install-headers install-bin uninstall:
+.PHONY: install install-headers install-bin install-lib uninstall
+install install-headers install-bin install-lib uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0
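
For readers less familiar with make substitution references: the patch
above reads CONFIG_RTE_ARCH from the build .config, strips the
surrounding double quotes with $(CONFIG_RTE_ARCH:"%"=%), and then
picks LIB_DIR. A rough shell equivalent follows, as a sketch only.
strip_quotes and pick_lib_dir are hypothetical helpers introduced for
illustration.

```shell
#!/bin/sh
# strip_quotes mimics $(VAR:"%"=%): drop one leading and one trailing quote.
strip_quotes() {
    v=${1#\"}
    v=${v%\"}
    printf '%s\n' "$v"
}

# pick_lib_dir mirrors the ifeq on RTE_ARCH (hypothetical helper).
# $1 = raw CONFIG_RTE_ARCH value as it appears in .config, e.g. "x86_64"
pick_lib_dir() {
    if [ "${LIB_DIR+set}" = set ]; then
        # A value supplied by the caller wins, like make's ?= operator.
        printf '%s\n' "$LIB_DIR"
    elif [ "$(strip_quotes "$1")" = x86_64 ]; then
        printf '/usr/lib64\n'
    else
        printf '/usr/lib\n'
    fi
}

unset LIB_DIR
pick_lib_dir '"x86_64"'   # /usr/lib64
pick_lib_dir '"i686"'     # /usr/lib
```

Note that the test is on x86_64 specifically, so other 64-bit
architectures would get /usr/lib unless LIB_DIR is set by hand.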



[dpdk-dev] [PATCH v3 2/8] mk: Add rule for installing app files

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK app files.
When invoking "make install-bin", app files will
be installed in: $(DESTDIR)/$(BIN_DIR)
where BIN_DIR=/usr/bin by default. The BIN_DIR
variable can be overridden.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 14 ++
 mk/rte.sdkroot.mk|  4 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 0d5cbcf..7d7c2c9 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -44,7 +44,9 @@ ifeq (,$(wildcard $(BUILD_DIR)/build/.config))
 T=*
 else
 INCLUDE_DIR ?= /usr/include/dpdk
+BIN_DIR ?= /usr/bin
 HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
+BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
 endif
 endif

@@ -91,6 +93,18 @@ install-headers:
echo installing: $$HEADER; \
done
 #
+# install app files in /usr/bin by default
+# BIN_DIR can be overridden.
+#
+.PHONY: install-bin
+install-bin:
+   @echo == Installing app files
+   @[ -d $(DESTDIR)/$(BIN_DIR) ] || mkdir -p $(DESTDIR)/$(BIN_DIR)
+   @for BIN_FILE in ${BINARY_FILES}; do \
+   cp -rf $$BIN_FILE ${DESTDIR}/${BIN_DIR}; \
+   echo installing: $$BIN_FILE; \
+   done
+#
 # uninstall: remove all built sdk
 #
 UNINSTALL_TARGETS := $(addsuffix _uninstall,\
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index 8477a2b..24eaa60 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -97,8 +97,8 @@ test fast_test ring_test mempool_test perf_test coverage:
 testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

-.PHONY: install install-headers uninstall
-install install-headers uninstall:
+.PHONY: install install-headers install-bin uninstall
+install install-headers install-bin uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0
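
The install-bin recipe in this patch follows a pattern that the later
patches in the series repeat: create the destination if it is missing,
then copy each file and echo its name. The $(patsubst %.map,,...) call
drops the linker map files from the list. Below is a stand-alone shell
sketch of that pattern against a throwaway tree. The file names are
placeholders and the temporary directory layout is an assumption made
for the example, not the real build output.

```shell
#!/bin/sh
set -eu

# Fake build output: one binary and its .map file (placeholder names).
work=$(mktemp -d)
mkdir -p "$work/build/app"
: > "$work/build/app/testpmd"
: > "$work/build/app/testpmd.map"

DESTDIR="$work/stage"
BIN_DIR=${BIN_DIR:-/usr/bin}

# Same shape as the recipe: make the directory only if it is missing.
[ -d "$DESTDIR/$BIN_DIR" ] || mkdir -p "$DESTDIR/$BIN_DIR"

for bin_file in "$work"/build/app/*; do
    case $bin_file in
        *.map) continue ;;   # same effect as $(patsubst %.map,,...)
    esac
    cp -rf "$bin_file" "$DESTDIR/$BIN_DIR"
    echo "installing: ${bin_file##*/}"
done
# The staged tree is left under $work for inspection; remove it when done.
```

Staging into a scratch DESTDIR like this is also a cheap way to check
what a rule would install before touching /usr.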



[dpdk-dev] [PATCH v3 1/8] mk: Add rule for installing headers

2015-09-30 Thread Mario Carrillo
Add hierarchy-file support to the DPDK headers.
When invoking "make install-headers", headers will
be installed in: $(DESTDIR)/$(INCLUDE_DIR)
where INCLUDE_DIR=/usr/include/dpdk by default.
The INCLUDE_DIR variable can be overridden.
This hierarchy is based on:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

Signed-off-by: Mario Carrillo 
---
 mk/rte.sdkinstall.mk | 19 ++-
 mk/rte.sdkroot.mk|  4 ++--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 86c98a5..0d5cbcf 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -40,7 +40,12 @@ endif
 # target name or a name containing jokers "*". Example:
 # x86_64-native-*-gcc
 ifndef T
+ifeq (,$(wildcard $(BUILD_DIR)/build/.config))
 T=*
+else
+INCLUDE_DIR ?= /usr/include/dpdk
+HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
+endif
 endif

 #
@@ -72,7 +77,19 @@ install: $(INSTALL_TARGETS)
echo "Using local configuration"; \
fi
$(Q)$(MAKE) all O=$(BUILD_DIR)/$*
-
+#
+# install headers in /usr/include/dpdk by default
+# INCLUDE_DIR can be overridden.
+#
+.PHONY: install-headers
+install-headers:
+   @echo == Installing headers;
+   @[ -d $(DESTDIR)/$(INCLUDE_DIR) ] || mkdir -p $(DESTDIR)/$(INCLUDE_DIR)
+   @for HSLINK in ${HSLINKS}; do \
+   HEADER=$$(readlink -f $$HSLINK); \
+   cp -rf $$HEADER ${DESTDIR}/${INCLUDE_DIR}; \
+   echo installing: $$HEADER; \
+   done
 #
 # uninstall: remove all built sdk
 #
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index e8423b0..8477a2b 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -97,8 +97,8 @@ test fast_test ring_test mempool_test perf_test coverage:
 testall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktestall.mk $@

-.PHONY: install uninstall
-install uninstall:
+.PHONY: install install-headers uninstall
+install install-headers uninstall:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkinstall.mk $@

 .PHONY: doc help
-- 
2.1.0
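
The install-headers recipe resolves each entry of the build include
directory with "readlink -f" before copying, presumably because those
entries are symlinks into the source tree and copying the links
themselves would leave dangling links in the staged tree. The sketch
below reproduces that step in isolation. The directory layout and
header name are placeholders, and "readlink -f" is assumed to be
available (it is in GNU coreutils).

```shell
#!/bin/sh
set -eu

# Fake tree: a real header plus a symlink to it in build/include.
work=$(mktemp -d)
mkdir -p "$work/lib/librte_eal" "$work/build/include"
echo '/* placeholder header */' > "$work/lib/librte_eal/rte_eal.h"
ln -s "$work/lib/librte_eal/rte_eal.h" "$work/build/include/rte_eal.h"

DESTDIR="$work/stage"
INCLUDE_DIR=${INCLUDE_DIR:-/usr/include/dpdk}

[ -d "$DESTDIR/$INCLUDE_DIR" ] || mkdir -p "$DESTDIR/$INCLUDE_DIR"

for hslink in "$work"/build/include/*; do
    # Resolve the link so the real file is copied, not the symlink.
    header=$(readlink -f "$hslink")
    cp -rf "$header" "$DESTDIR/$INCLUDE_DIR"
    echo "installing: ${header##*/}"
done
# The staged tree is left under $work for inspection; remove it when done.
```

One consequence of copying the resolved path is that the installed
file takes the name of the link target, which matters only if a link
and its target are named differently.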



[dpdk-dev] [PATCH v3 0/8] Add installation rules for dpdk files.

2015-09-30 Thread Mario Carrillo
The DPDK package lacks a mechanism to install libraries, headers,
applications, kernel modules and sdk files into a file system tree.

This patch set allows files to be installed according to the
following proposal:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html

With these rules it is possible to perform the following steps:
make config T=TARGET
make
make INSTALL-TARGET

v3:

Modify the makefile target to specify the files 
that will be installed using a rule:

make install-bin (install app files) (default path BIN_DIR=/usr/bin).
make install-headers (install headers) (default path INCLUDE_DIR=/usr/include/dpdk).
make install-lib (install libraries) (default path LIB_DIR=/usr/lib64 if the
architecture is x86_64, else LIB_DIR=/usr/lib).
make install-sbin (install nic bind files) (default path SBIN_DIR=/usr/sbin/dpdk_nic_bind).
make install-doc (install documentation) (default path DOC_DIR=/usr/share/doc/dpdk).
make install-mod (install modules) (default path KERNEL_DIR=/lib/modules/$(uname -r)/build
if RTE_EXEC_ENV=linuxapp, else KERNEL_DIR=/boot/modules).
make install-sdk (install headers, makefiles, scripts, examples, tools and
config files) (default path DATA_DIR=/usr/share/dpdk).
make install-fhs (install libraries, modules, app files,
nic bind files and documentation).

Also you can use the DESTDIR variable.
All directory variables mentioned above can be overridden
(BIN_DIR, LIB_DIR, INCLUDE_DIR, SBIN_DIR, DOC_DIR, KERNEL_DIR and DATA_DIR).
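
To make the interaction between DESTDIR and the directory variables
concrete: every rule in this series installs into
$(DESTDIR)/$(X_DIR), where X_DIR falls back to its default unless the
caller overrides it. A small shell sketch of that path construction
follows. resolve_dir is a hypothetical helper and /tmp/stage is an
arbitrary staging root chosen for the example.

```shell
#!/bin/sh
# resolve_dir is a hypothetical helper, not part of the patch set.
# $1 = value supplied by the caller (may be empty), $2 = default
resolve_dir() {
    dir=${1:-$2}
    # Joined with a literal "/" exactly as the recipes do.
    printf '%s/%s\n' "$DESTDIR" "$dir"
}

DESTDIR=/tmp/stage
resolve_dir "" /usr/bin              # /tmp/stage//usr/bin
resolve_dir /opt/dpdk/bin /usr/bin   # /tmp/stage//opt/dpdk/bin
```

The doubled slash comes from the directory defaults being absolute
paths. It is harmless in the middle of a path, since consecutive
slashes are treated as one there.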


v2:

Modify the makefile target to specify the files 
that will be installed using a rule:

make install-bin (install app files).
make install-headers (install headers).
make install-lib (install libraries).
make install-sbin (install nic bind files).
make install-doc (install documentation).
make install-mod (install modules).
make install-sdk (install headers, makefiles, scripts,
examples, tools and config files). 
make install-fhs (install  libraries, modules, app files, 
nic bind files and documentation).

Also you can use the DESTDIR variable.


v1:

By adding a parameter H=1 (hierarchy-file) to the makefile system, it is
possible to perform the following steps:

make config T=TARGET
make
make install H=1

and files will be installed on the proper directory. Also you can use
the DESTDIR variable.

Mario Carrillo (8):
  mk: Add rule for installing headers
  mk: Add rule for installing app files
  mk: Add rule for installing libraries
  mk: Add rule for installing modules
  mk: Add rule for installing documentation
  mk: Add rule for installing nic bind files
  mk: Add rule for installing sdk files
  mk: Add rule for installing runtime files

 mk/rte.sdkinstall.mk | 127 +++
 mk/rte.sdkroot.mk|   6 ++-
 2 files changed, 131 insertions(+), 2 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH] ip_pipeline: add flow id parameter to flow classification

2015-09-30 Thread Dumitrescu, Cristian


> -Original Message-
> From: Singh, Jasvinder
> Sent: Wednesday, September 30, 2015 6:46 PM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian
> Subject: [PATCH] ip_pipeline: add flow id parameter to flow classification
> 
> This patch adds flow id field to the flow
> classification table entries and adds table action
> handlers to read flow id from table entry and
> write it into the packet meta-data. The flow_id
> (32-bit) parameter is also added to CLI commands
> flow add, flow delete, etc.
> 
> Signed-off-by: Jasvinder Singh 
> ---
>  .../pipeline/pipeline_flow_classification.c| 206 ++--
> -
>  .../pipeline/pipeline_flow_classification.h|   4 +-
>  .../pipeline/pipeline_flow_classification_be.c | 114 +++-
>  .../pipeline/pipeline_flow_classification_be.h |   2 +
>  4 files changed, 295 insertions(+), 31 deletions(-)
> 

Acked-by: Cristian Dumitrescu 



[dpdk-dev] [PATCH] ip_pipeline: add flow actions pipeline

2015-09-30 Thread Dumitrescu, Cristian

> -Original Message-
> From: Singh, Jasvinder
> Sent: Wednesday, September 30, 2015 6:06 PM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian
> Subject: [PATCH] ip_pipeline: add flow actions pipeline
> 
> 
> Signed-off-by: Jasvinder Singh 
> ---
>  examples/ip_pipeline/Makefile  |2 +
>  examples/ip_pipeline/init.c|2 +
>  .../ip_pipeline/pipeline/pipeline_actions_common.h |   83 +
>  .../ip_pipeline/pipeline/pipeline_flow_actions.c   | 1808
> 
>  .../ip_pipeline/pipeline/pipeline_flow_actions.h   |   78 +
>  .../pipeline/pipeline_flow_actions_be.c|  973 +++
>  .../pipeline/pipeline_flow_actions_be.h|  168 ++
>  7 files changed, 3114 insertions(+)
>  create mode 100644 examples/ip_pipeline/pipeline/pipeline_flow_actions.c
>  create mode 100644 examples/ip_pipeline/pipeline/pipeline_flow_actions.h
>  create mode 100644
> examples/ip_pipeline/pipeline/pipeline_flow_actions_be.c
>  create mode 100644
> examples/ip_pipeline/pipeline/pipeline_flow_actions_be.h

Acked-by: Cristian Dumitrescu 



[dpdk-dev] [PATCH] ip_pipeline: add flow id parameter to flow classification

2015-09-30 Thread Jasvinder Singh
This patch adds flow id field to the flow
classification table entries and adds table action
handlers to read flow id from table entry and
write it into the packet meta-data. The flow_id
(32-bit) parameter is also added to CLI commands
flow add, flow delete, etc.

Signed-off-by: Jasvinder Singh 
---
 .../pipeline/pipeline_flow_classification.c| 206 ++---
 .../pipeline/pipeline_flow_classification.h|   4 +-
 .../pipeline/pipeline_flow_classification_be.c | 114 +++-
 .../pipeline/pipeline_flow_classification_be.h |   2 +
 4 files changed, 295 insertions(+), 31 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
index 4b82180..04b6915 100644
--- a/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
+++ b/examples/ip_pipeline/pipeline/pipeline_flow_classification.c
@@ -152,6 +152,7 @@ app_pipeline_fc_key_convert(struct pipeline_fc_key *key_in,
 struct app_pipeline_fc_flow {
struct pipeline_fc_key key;
uint32_t port_id;
+   uint32_t flow_id;
uint32_t signature;
void *entry_ptr;

@@ -280,7 +281,8 @@ int
 app_pipeline_fc_add(struct app_params *app,
uint32_t pipeline_id,
struct pipeline_fc_key *key,
-   uint32_t port_id)
+   uint32_t port_id,
+   uint32_t flow_id)
 {
struct app_pipeline_fc *p;
struct app_pipeline_fc_flow *flow;
@@ -325,6 +327,7 @@ app_pipeline_fc_add(struct app_params *app,
req->subtype = PIPELINE_FC_MSG_REQ_FLOW_ADD;
app_pipeline_fc_key_convert(key, req->key, &signature);
req->port_id = port_id;
+   req->flow_id = flow_id;

/* Send request and wait for response */
rsp = app_msg_send_recv(app, pipeline_id, req, MSG_TIMEOUT_DEFAULT);
@@ -348,6 +351,7 @@ app_pipeline_fc_add(struct app_params *app,
memset(&flow->key, 0, sizeof(flow->key));
memcpy(&flow->key, key, sizeof(flow->key));
flow->port_id = port_id;
+   flow->flow_id = flow_id;
flow->signature = signature;
flow->entry_ptr = rsp->entry_ptr;

@@ -370,6 +374,7 @@ app_pipeline_fc_add_bulk(struct app_params *app,
uint32_t pipeline_id,
struct pipeline_fc_key *key,
uint32_t *port_id,
+   uint32_t *flow_id,
uint32_t n_keys)
 {
struct app_pipeline_fc *p;
@@ -389,6 +394,7 @@ app_pipeline_fc_add_bulk(struct app_params *app,
if ((app == NULL) ||
(key == NULL) ||
(port_id == NULL) ||
+   (flow_id == NULL) ||
(n_keys == 0))
return -1;

@@ -496,6 +502,7 @@ app_pipeline_fc_add_bulk(struct app_params *app,
flow_req[i].key,
&signature[i]);
flow_req[i].port_id = port_id[i];
+   flow_req[i].flow_id = flow_id[i];
}

req->type = PIPELINE_MSG_REQ_CUSTOM;
@@ -535,6 +542,7 @@ app_pipeline_fc_add_bulk(struct app_params *app,
for (i = 0; i < rsp->n_keys; i++) {
memcpy(&flow[i]->key, &key[i], sizeof(flow[i]->key));
flow[i]->port_id = port_id[i];
+   flow[i]->flow_id = flow_id[i];
flow[i]->signature = signature[i];
flow[i]->entry_ptr = flow_rsp[i].entry_ptr;

@@ -731,13 +739,15 @@ print_fc_qinq_flow(struct app_pipeline_fc_flow *flow)
 {
printf("(SVLAN = %" PRIu32 ", "
"CVLAN = %" PRIu32 ") => "
-   "Port = %" PRIu32 " "
+   "Port = %" PRIu32 ", "
+   "Flow ID = %" PRIu32 ", "
"(signature = 0x%08" PRIx32 ", "
"entry_ptr = %p)\n",

flow->key.key.qinq.svlan,
flow->key.key.qinq.cvlan,
flow->port_id,
+   flow->flow_id,
flow->signature,
flow->entry_ptr);
 }
@@ -750,7 +760,8 @@ print_fc_ipv4_5tuple_flow(struct app_pipeline_fc_flow *flow)
   "SP = %" PRIu32 ", "
   "DP = %" PRIu32 ", "
   "Proto = %" PRIu32 ") => "
-  "Port = %" PRIu32 " "
+  "Port = %" PRIu32 ", "
+  "Flow ID = %" PRIu32 " "
   "(signature = 0x%08" PRIx32 ", "
   "entry_ptr = %p)\n",

@@ -770,6 +781,7 @@ print_fc_ipv4_5tuple_flow(struct app_pipeline_fc_flow *flow)
   flow->key.key.ipv4_5tuple.proto,

   flow->port_id,
+  flow->flow_id,
   flow->signature,
   flow->entry_ptr);
 }
@@ -787,7 +799,8 @@ print_fc_ipv6_5tuple_flow(struct app_pipeline_fc_flow *flow) {
"SP = %" PRIu32 ", "
"DP = %" PRIu32 " "
"Proto = %" PRIu32 " "
-   "=> Port = %" PRIu32 " "
+   "=> Port = %" PRIu32 ", "
+   "Flow ID = %" PRIu32 " "
"(signature = 0x%08" PRIx32 ", "
 

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Avi Kivity
On 09/30/2015 06:21 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 05:53:54PM +0300, Avi Kivity wrote:
>> On 09/30/2015 05:39 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 04:05:40PM +0300, Avi Kivity wrote:
 On 09/30/2015 03:27 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
>> On 09/30/15 15:03, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
 On 09/30/15 14:41, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>> The whole idea is to bypass kernel. Especially for networking...
> ... on dumb hardware that doesn't support doing that securely.
 On a very capable HW that supports whatever security requirements 
 needed
 (e.g. 82599 Intel's SR-IOV VF devices).
>>> Network card type is irrelevant as long as you do not have an IOMMU,
>>> otherwise you would just use e.g. VFIO.
>> Sorry, but I don't follow your logic here - Amazon EC2 environment is an
>> example where there *is* iommu but it's not virtualized
>> and thus VFIO is
>> useless and there is an option to use directly assigned SR-IOV networking
>> device there where using the kernel drivers impose a performance impact
>> compared to user space UIO-based user space kernel bypass mode of usage. 
>> How
>> is it irrelevant? Could u, pls, clarify your point?
>>
> So it's not even dumb hardware, it's another piece of software
> that forces an "all or nothing" approach where either
> device has access to all VM memory, or none.
> And this, unfortunately, leaves you with no secure way to
> allow userspace drivers.
 Some setups don't need security (they are single-user, single application).
 But do need a lot of performance (like 5X-10X performance).  An example is
 OpenVSwitch, security doesn't help it at all and if you force it to use the
 kernel drivers you cripple it.
>>> We'd have to see there are actual users that need this.  So far, dpdk
>>> seems like the only one,
>> dpdk is a whole class of users.  It's not a specific application.
>>
>>>   and it wants to use UIO for slow path stuff
>>> like polling link status.  Why this needs kernel bypass support, I don't
>>> know.  I asked, and got no answer.
>> First, it's more than link status.  dpdk also has an interrupt mode, which
>> applications can fall back to when when the load is light in order to save
>> power (and in order not to get support calls about 100% cpu when idle).
> Aha, looks like it appeared in June. Interesting, thanks for the info.
>
>> Even for link status, you don't want to poll for that, because accessing
>> device registers is expensive.  An interrupt is the best approach for rare
>> events like link changed.
> Yea, but you probably can get by with a timer for that, even if it's ugly.

Maybe you can, but (a) why increase link status change detection latency,
and (b) link status change detection is not the only user of the feature,
since June.

 Also, I'm root.  I can do anything I like, including loading a patched
 uio_pci_generic.  You're not providing _any_ security, you're simply making
 life harder for users.
>>> Maybe that's true on your system. But I guess you know that's not true
>>> for everyone, not in 2015.
>> Why is it not true?  if I'm root, I can do anything I like to my
>> system, and everyone is root in 2015.  I can access the BARs directly
>> and program DMA, how am I more secure by uio not allowing me to setup
>> msix?
> That's not the point.  The point always was that using uio for these
> devices (capable of DMA, in particular of msix) isn't possible in a
> secure way.

uio is used today for DMA-capable devices.  Some users are perfectly 
willing to give up security for functionality (that's all users who have 
root access to their machines, not just uio users).  You aren't adding 
any security by disallowing uio, you're just removing functionality.

As it happens, you're removing the functionality from the users who have 
no other option.  They can't use vfio because it doesn't work on 
virtualized setups.

(note even on a setup that does support vfio, high performance users 
will want to avoid it).

>   And yes, if same device happens to also do interrupts, UIO
> does not reject it as it probably should, and we can't change this
> without breaking some working setups.  But this doesn't mean we should
> add more setups like this that we'll then be forced to maintain.

uio_pci_generic is maybe the driver with the lowest maintenance burden 
in the entire kernel.  One driver supporting all pci devices, if you 
don't need msi/msix.  And with the patch, it will be one driver 
supporting all pci devices.

I don't really understand the tradeoff.  By rejecting the patch you're 
denying users the ability to use their devices, except through the much 
slower 

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 03:50:09PM +0300, Vlad Zolotarov wrote:
> How not virtualizing iommu forces "all or nothing" approach?

Looks like you can't limit an assigned device to only access part of
guest memory that belongs to a given process.  Either let it access all
of guest memory ("all") or don't assign the device ("nothing").

-- 
MST


[dpdk-dev] [PATCH v2 1/1] ip_pipeline: added dynamic pipeline reconfiguration

2015-09-30 Thread Maciej Gajdzica
Until now, a pipeline was bound to the thread selected in the initial
configuration. This patch allows a pipeline to be bound to other threads
at runtime using CLI commands.

Signed-off-by: Maciej Gajdzica 
---
 examples/ip_pipeline/Makefile  |1 +
 examples/ip_pipeline/app.h |5 +
 examples/ip_pipeline/config_parse.c|2 +-
 examples/ip_pipeline/init.c|   61 
 examples/ip_pipeline/pipeline.h|6 +
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |3 +
 examples/ip_pipeline/thread.c  |  134 +++-
 examples/ip_pipeline/thread.h  |  101 ++
 examples/ip_pipeline/thread_fe.c   |  323 
 9 files changed, 634 insertions(+), 2 deletions(-)
 create mode 100644 examples/ip_pipeline/thread.h
 create mode 100644 examples/ip_pipeline/thread_fe.c

diff --git a/examples/ip_pipeline/Makefile b/examples/ip_pipeline/Makefile
index f3ff1ec..c8e80b5 100644
--- a/examples/ip_pipeline/Makefile
+++ b/examples/ip_pipeline/Makefile
@@ -54,6 +54,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_parse_tm.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_check.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += init.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread.c
+SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread_fe.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += cpu_core_map.c

 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_be.c
diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 521e3a0..19ddd31 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -220,9 +220,11 @@ struct app_pipeline_data {
void *be;
void *fe;
uint64_t timer_period;
+   uint32_t enabled;
 };

 struct app_thread_pipeline_data {
+   uint32_t pipeline_id;
void *be;
pipeline_be_op_run f_run;
pipeline_be_op_timer f_timer;
@@ -242,6 +244,9 @@ struct app_thread_data {
uint32_t n_custom;

uint64_t deadline;
+
+   struct rte_ring *msgq_in;
+   struct rte_ring *msgq_out;
 };

 struct app_eal_params {
diff --git a/examples/ip_pipeline/config_parse.c b/examples/ip_pipeline/config_parse.c
index c9b78f9..d2aaadf 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -362,7 +362,7 @@ parser_read_uint32(uint32_t *value, const char *p)
return 0;
 }

-static int
+int
 parse_pipeline_core(uint32_t *socket,
uint32_t *core,
uint32_t *ht,
diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index 3f9c68d..1126288 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -50,6 +50,7 @@
 #include "pipeline_firewall.h"
 #include "pipeline_flow_classification.h"
 #include "pipeline_routing.h"
+#include "thread.h"

 #define APP_NAME_SIZE  32

@@ -1225,6 +1226,48 @@ app_init_pipelines(struct app_params *app)
}
 }

+static inline struct rte_ring *
+app_thread_msgq_in_get(struct app_params *app,
+   uint32_t socket_id, uint32_t core_id, uint32_t ht_id)
+{
+   char msgq_name[32];
+   ssize_t param_idx;
+
+   snprintf(msgq_name, sizeof(msgq_name),
+   "MSGQ-REQ-CORE-s%" PRIu32 "c%" PRIu32 "%s",
+   socket_id,
+   core_id,
+   (ht_id) ? "h" : "");
+   param_idx = APP_PARAM_FIND(app->msgq_params, msgq_name);
+
+   if (param_idx < 0)
+   return NULL;
+
+   return app->msgq[param_idx];
+}
+
+static inline struct rte_ring *
+app_thread_msgq_out_get(struct app_params *app,
+   uint32_t socket_id, uint32_t core_id, uint32_t ht_id)
+{
+   char msgq_name[32];
+   ssize_t param_idx;
+
+   snprintf(msgq_name, sizeof(msgq_name),
+   "MSGQ-RSP-CORE-s%" PRIu32 "c%" PRIu32 "%s",
+   socket_id,
+   core_id,
+   (ht_id) ? "h" : "");
+   param_idx = APP_PARAM_FIND(app->msgq_params, msgq_name);
+
+
+   if (param_idx < 0)
+   return NULL;
+
+   return app->msgq[param_idx];
+
+}
+
 static void
 app_init_threads(struct app_params *app)
 {
@@ -1253,6 +1296,20 @@ app_init_threads(struct app_params *app)

t = &app->thread_data[lcore_id];

+   t->msgq_in = app_thread_msgq_in_get(app,
+   params->socket_id,
+   params->core_id,
+   params->hyper_th_id);
+   if (t->msgq_in == NULL)
+   rte_panic("Init error: Cannot find MSGQ_IN for thread %" PRId32, lcore_id);
+
+   t->msgq_out = app_thread_msgq_out_get(app,
+   params->socket_id,
+   params->core_id,
+   params->hyper_th_id);
+   if (t->msgq_out == NULL)
+   rte_panic("Init error: Cannot find MSGQ_OUT for thread 
%" 

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 05:53:54PM +0300, Avi Kivity wrote:
> On 09/30/2015 05:39 PM, Michael S. Tsirkin wrote:
> >On Wed, Sep 30, 2015 at 04:05:40PM +0300, Avi Kivity wrote:
> >>
> >>On 09/30/2015 03:27 PM, Michael S. Tsirkin wrote:
> >>>On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
> >>>>On 09/30/15 15:03, Michael S. Tsirkin wrote:
> >>>>>On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
> >>>>>>On 09/30/15 14:41, Michael S. Tsirkin wrote:
> >>>>>>>On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
> >>>>>>>>The whole idea is to bypass kernel. Especially for networking...
> >>>>>>>... on dumb hardware that doesn't support doing that securely.
> >>>>>>On a very capable HW that supports whatever security requirements needed
> >>>>>>(e.g. 82599 Intel's SR-IOV VF devices).
> >>>>>Network card type is irrelevant as long as you do not have an IOMMU,
> >>>>>otherwise you would just use e.g. VFIO.
> >>>>Sorry, but I don't follow your logic here - Amazon EC2 environment is an
> >>>>example where there *is* iommu but it's not virtualized and thus VFIO is
> >>>>useless and there is an option to use directly assigned SR-IOV networking
> >>>>device there where using the kernel drivers impose a performance impact
> >>>>compared to user space UIO-based user space kernel bypass mode of usage.
> >>>>How is it irrelevant? Could u, pls, clarify your point?
> 
> >>>So it's not even dumb hardware, it's another piece of software
> >>>that forces an "all or nothing" approach where either
> >>>device has access to all VM memory, or none.
> >>>And this, unfortunately, leaves you with no secure way to
> >>>allow userspace drivers.
> >>Some setups don't need security (they are single-user, single application).
> >>But do need a lot of performance (like 5X-10X performance).  An example is
> >>OpenVSwitch, security doesn't help it at all and if you force it to use the
> >>kernel drivers you cripple it.
> >We'd have to see there are actual users that need this.  So far, dpdk
> >seems like the only one,
> 
> dpdk is a whole class of users.  It's not a specific application.
> 
> >  and it wants to use UIO for slow path stuff
> >like polling link status.  Why this needs kernel bypass support, I don't
> >know.  I asked, and got no answer.
> 
> First, it's more than link status.  dpdk also has an interrupt mode, which
> applications can fall back to when the load is light in order to save
> power (and in order not to get support calls about 100% cpu when idle).

Aha, looks like it appeared in June. Interesting, thanks for the info.

> Even for link status, you don't want to poll for that, because accessing
> device registers is expensive.  An interrupt is the best approach for rare
> events like link changed.

Yea, but you probably can get by with a timer for that, even if it's ugly.

> >>Also, I'm root.  I can do anything I like, including loading a patched
> >>pci_uio_generic.  You're not providing _any_ security, you're simply making
> >>life harder for users.
> >Maybe that's true on your system. But I guess you know that's not true
> >for everyone, not in 2015.
> 
> Why is it not true?  if I'm root, I can do anything I like to my
> system, and everyone is root in 2015.  I can access the BARs directly
> and program DMA, how am I more secure by uio not allowing me to setup
> msix?

That's not the point.  The point always was that using uio for these
devices (capable of DMA, in particular of msix) isn't possible in a
secure way. And yes, if same device happens to also do interrupts, UIO
does not reject it as it probably should, and we can't change this
without breaking some working setups.  But this doesn't mean we should
add more setups like this that we'll then be forced to maintain.


> Non-root users are already secured by their inability to load the module,
> and by the device permissions.
> 
> >
> >>>So it makes even less sense to add insecure work-arounds in the kernel.
> >>>It seems quite likely that by the time the new kernel reaches
> >>>production X years from now, EC2 will have a virtual iommu.
> >>I can adopt a new kernel tomorrow.  I have no influence on EC2.
> >>
> >>
> >Xen grant tables sound like they could be the right interface
> >for EC2.  google search for "grant tables iommu" immediately gives me:
> >http://lists.xenproject.org/archives/html/xen-devel/2014-04/msg00963.html
> >Maybe latest Xen is already doing the right thing, and it's just the
> >question of making VFIO use that.
> >
> 
> grant tables only work for virtual devices, not physical devices.

Why not? That's what the patches above seem to do.

-- 
MST


[dpdk-dev] [PATCH 1/1] ip_pipeline: added dynamic pipeline reconfiguration

2015-09-30 Thread Maciej Gajdzica
Up till now pipeline was bound to thread selected in the initial config.
This patch allows binding pipeline to other threads at runtime using CLI
commands.

Signed-off-by: Maciej Gajdzica 
---
 examples/ip_pipeline/Makefile  |1 +
 examples/ip_pipeline/app.h |5 +
 examples/ip_pipeline/config_parse.c|2 +-
 examples/ip_pipeline/init.c|   61 
 examples/ip_pipeline/pipeline.h|6 +
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |3 +
 examples/ip_pipeline/thread.c  |  135 +++-
 examples/ip_pipeline/thread.h  |  101 ++
 examples/ip_pipeline/thread_fe.c   |  328 
 9 files changed, 640 insertions(+), 2 deletions(-)
 create mode 100644 examples/ip_pipeline/thread.h
 create mode 100644 examples/ip_pipeline/thread_fe.c

diff --git a/examples/ip_pipeline/Makefile b/examples/ip_pipeline/Makefile
index f3ff1ec..c8e80b5 100644
--- a/examples/ip_pipeline/Makefile
+++ b/examples/ip_pipeline/Makefile
@@ -54,6 +54,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_parse_tm.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_check.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += init.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread.c
+SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread_fe.c
 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += cpu_core_map.c

 SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_be.c
diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 521e3a0..19ddd31 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -220,9 +220,11 @@ struct app_pipeline_data {
void *be;
void *fe;
uint64_t timer_period;
+   uint32_t enabled;
 };

 struct app_thread_pipeline_data {
+   uint32_t pipeline_id;
void *be;
pipeline_be_op_run f_run;
pipeline_be_op_timer f_timer;
@@ -242,6 +244,9 @@ struct app_thread_data {
uint32_t n_custom;

uint64_t deadline;
+
+   struct rte_ring *msgq_in;
+   struct rte_ring *msgq_out;
 };

 struct app_eal_params {
diff --git a/examples/ip_pipeline/config_parse.c b/examples/ip_pipeline/config_parse.c
index c9b78f9..d2aaadf 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -362,7 +362,7 @@ parser_read_uint32(uint32_t *value, const char *p)
return 0;
 }

-static int
+int
 parse_pipeline_core(uint32_t *socket,
uint32_t *core,
uint32_t *ht,
diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index 3f9c68d..1126288 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -50,6 +50,7 @@
 #include "pipeline_firewall.h"
 #include "pipeline_flow_classification.h"
 #include "pipeline_routing.h"
+#include "thread.h"

 #define APP_NAME_SIZE  32

@@ -1225,6 +1226,48 @@ app_init_pipelines(struct app_params *app)
}
 }

+static inline struct rte_ring *
+app_thread_msgq_in_get(struct app_params *app,
+   uint32_t socket_id, uint32_t core_id, uint32_t ht_id)
+{
+   char msgq_name[32];
+   ssize_t param_idx;
+
+   snprintf(msgq_name, sizeof(msgq_name),
+   "MSGQ-REQ-CORE-s%" PRIu32 "c%" PRIu32 "%s",
+   socket_id,
+   core_id,
+   (ht_id) ? "h" : "");
+   param_idx = APP_PARAM_FIND(app->msgq_params, msgq_name);
+
+   if (param_idx < 0)
+   return NULL;
+
+   return app->msgq[param_idx];
+}
+
+static inline struct rte_ring *
+app_thread_msgq_out_get(struct app_params *app,
+   uint32_t socket_id, uint32_t core_id, uint32_t ht_id)
+{
+   char msgq_name[32];
+   ssize_t param_idx;
+
+   snprintf(msgq_name, sizeof(msgq_name),
+   "MSGQ-RSP-CORE-s%" PRIu32 "c%" PRIu32 "%s",
+   socket_id,
+   core_id,
+   (ht_id) ? "h" : "");
+   param_idx = APP_PARAM_FIND(app->msgq_params, msgq_name);
+
+
+   if (param_idx < 0)
+   return NULL;
+
+   return app->msgq[param_idx];
+
+}
+
 static void
 app_init_threads(struct app_params *app)
 {
@@ -1253,6 +1296,20 @@ app_init_threads(struct app_params *app)

t = &app->thread_data[lcore_id];

+   t->msgq_in = app_thread_msgq_in_get(app,
+   params->socket_id,
+   params->core_id,
+   params->hyper_th_id);
+   if (t->msgq_in == NULL)
+   rte_panic("Init error: Cannot find MSGQ_IN for thread %" PRId32, lcore_id);
+
+   t->msgq_out = app_thread_msgq_out_get(app,
+   params->socket_id,
+   params->core_id,
+   params->hyper_th_id);
+   if (t->msgq_out == NULL)
+   rte_panic("Init error: Cannot find MSGQ_OUT for thread 
%" 

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Avi Kivity
On 09/30/2015 05:39 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 04:05:40PM +0300, Avi Kivity wrote:
>>
>> On 09/30/2015 03:27 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
>>>> On 09/30/15 15:03, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
>>>>>> On 09/30/15 14:41, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>>>>>>> The whole idea is to bypass kernel. Especially for networking...
>>>>>>> ... on dumb hardware that doesn't support doing that securely.
>>>>>> On a very capable HW that supports whatever security requirements needed
>>>>>> (e.g. 82599 Intel's SR-IOV VF devices).
>>>>> Network card type is irrelevant as long as you do not have an IOMMU,
>>>>> otherwise you would just use e.g. VFIO.
>>>> Sorry, but I don't follow your logic here - Amazon EC2 environment is an
>>>> example where there *is* iommu but it's not virtualized and thus VFIO is
>>>> useless and there is an option to use directly assigned SR-IOV networking
>>>> device there where using the kernel drivers impose a performance impact
>>>> compared to user space UIO-based user space kernel bypass mode of usage.
>>>> How is it irrelevant? Could u, pls, clarify your point?

>>> So it's not even dumb hardware, it's another piece of software
>>> that forces an "all or nothing" approach where either
>>> device has access to all VM memory, or none.
>>> And this, unfortunately, leaves you with no secure way to
>>> allow userspace drivers.
>> Some setups don't need security (they are single-user, single application).
>> But do need a lot of performance (like 5X-10X performance).  An example is
>> OpenVSwitch, security doesn't help it at all and if you force it to use the
>> kernel drivers you cripple it.
> We'd have to see there are actual users that need this.  So far, dpdk
> seems like the only one,

dpdk is a whole class of users.  It's not a specific application.

>   and it wants to use UIO for slow path stuff
> like polling link status.  Why this needs kernel bypass support, I don't
> know.  I asked, and got no answer.

First, it's more than link status.  dpdk also has an interrupt mode, 
which applications can fall back to when the load is light in order 
to save power (and in order not to get support calls about 100% cpu when 
idle).

Even for link status, you don't want to poll for that, because accessing 
device registers is expensive.  An interrupt is the best approach for 
rare events like link changed.

>
>> Also, I'm root.  I can do anything I like, including loading a patched
>> pci_uio_generic.  You're not providing _any_ security, you're simply making
>> life harder for users.
> Maybe that's true on your system. But I guess you know that's not true
> for everyone, not in 2015.

Why is it not true?  if I'm root, I can do anything I like to my system, 
and everyone is root in 2015.  I can access the BARs directly and 
program DMA, how am I more secure by uio not allowing me to setup msix?

Non-root users are already secured by their inability to load the 
module, and by the device permissions.

>
>>> So it makes even less sense to add insecure work-arounds in the kernel.
>>> It seems quite likely that by the time the new kernel reaches
>>> production X years from now, EC2 will have a virtual iommu.
>> I can adopt a new kernel tomorrow.  I have no influence on EC2.
>>
>>
> Xen grant tables sound like they could be the right interface
> for EC2.  google search for "grant tables iommu" immediately gives me:
> http://lists.xenproject.org/archives/html/xen-devel/2014-04/msg00963.html
> Maybe latest Xen is already doing the right thing, and it's just the
> question of making VFIO use that.
>

grant tables only work for virtual devices, not physical devices.




[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 04:05:40PM +0300, Avi Kivity wrote:
> 
> 
> On 09/30/2015 03:27 PM, Michael S. Tsirkin wrote:
> >On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
> >>
> >>On 09/30/15 15:03, Michael S. Tsirkin wrote:
> >>>On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
> >>>>On 09/30/15 14:41, Michael S. Tsirkin wrote:
> >>>>>On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
> >>>>>>The whole idea is to bypass kernel. Especially for networking...
> >>>>>... on dumb hardware that doesn't support doing that securely.
> >>>>On a very capable HW that supports whatever security requirements needed
> >>>>(e.g. 82599 Intel's SR-IOV VF devices).
> >>>Network card type is irrelevant as long as you do not have an IOMMU,
> >>>otherwise you would just use e.g. VFIO.
> >>Sorry, but I don't follow your logic here - Amazon EC2 environment is an
> >>example where there *is* iommu but it's not virtualized
> >>and thus VFIO is
> >>useless and there is an option to use directly assigned SR-IOV networking
> >>device there where using the kernel drivers impose a performance impact
> >>compared to user space UIO-based user space kernel bypass mode of usage. How
> >>is it irrelevant? Could u, pls, clarify your point?
> >>
> >So it's not even dumb hardware, it's another piece of software
> >that forces an "all or nothing" approach where either
> >device has access to all VM memory, or none.
> >And this, unfortunately, leaves you with no secure way to
> >allow userspace drivers.
> 
> Some setups don't need security (they are single-user, single application).
> But do need a lot of performance (like 5X-10X performance).  An example is
> OpenVSwitch, security doesn't help it at all and if you force it to use the
> kernel drivers you cripple it.

We'd have to see there are actual users that need this.  So far, dpdk
seems like the only one, and it wants to use UIO for slow path stuff
like polling link status.  Why this needs kernel bypass support, I don't
know.  I asked, and got no answer.

> 
> Also, I'm root.  I can do anything I like, including loading a patched
> pci_uio_generic.  You're not providing _any_ security, you're simply making
> life harder for users.

Maybe that's true on your system. But I guess you know that's not true
for everyone, not in 2015.

> >So it makes even less sense to add insecure work-arounds in the kernel.
> >It seems quite likely that by the time the new kernel reaches
> >production X years from now, EC2 will have a virtual iommu.
> 
> I can adopt a new kernel tomorrow.  I have no influence on EC2.
> 
>

Xen grant tables sound like they could be the right interface
for EC2.  google search for "grant tables iommu" immediately gives me:
http://lists.xenproject.org/archives/html/xen-devel/2014-04/msg00963.html
Maybe latest Xen is already doing the right thing, and it's just the
question of making VFIO use that.

-- 
MST


[dpdk-dev] [PATCH 2/2] virtio: change io privilege level as early as possible

2015-09-30 Thread Thomas Monjalon
2015-09-30 10:52, Neil Horman:
> On Wed, Sep 30, 2015 at 10:28:53AM +0200, David Marchand wrote:
> > On Tue, Sep 29, 2015 at 9:25 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> > 
> > > On Tue, 10 Mar 2015 09:14:28 -0400
> > > Neil Horman  wrote:
> > > > I don't see how this works for all cases.  The constructor is called
> > > once when
> > > > the library is first loaded.  What if you have multiple independent
> > > (i.e. not
> > > > forked children) processes that are using the dpdk in parallel?  Only 
> > > > the
> > > > process that triggered the library load will have io permissions set
> > > > appropriately.  I think what you need is to have every application that
> > > expects
> > > > to call through the transmit path or poll the receive path call iopl,
> > > which I
> > > > think speaks to having this requirement documented, so each application
> > > can call
> > > > iopl prior to calling fork/daemonize/etc.
> > > >
> > >
> > > I am still seeing this problem with DPDK 2.0 and 2.1.
> > > It seems to me that doing the iopl init in eal_init is the only safe way.
> > > Other workaround is to have application calling iopl_init before eal_init
> > > but that kind of violates the current method of all things being
> > > initialized by eal_init
> > 
> > Putting it in the virtio pmd constructor is my preferred solution and we
> > don't need to pollute the eal for virtio (specific to x86, btw).
> 
> Preferred solution or not, you can't just call iopl from the constructor,
> because not all process will get appropriate permissions.  It needs to be 
> called
> by every process.  What Stephen is saying is that your solution has use cases
> for which it doesn't work, and that needs to be solved.

I think it may be solved by calling iopl in the constructor.
We just need an extra call in rte_virtio_pmd_init() to detect iopl failures.
We can also simply move rte_eal_intr_init() after rte_eal_dev_init().
Please read my previous post on this topic:

http://thread.gmane.org/gmane.comp.networking.dpdk.devel/14761/focus=22341

About the multiprocess case, I don't see the problem as the RX/TX and interrupt
threads are forked in the rte_eal_init() context which should call iopl even in
secondary processes.
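
Editorial sketch (not part of the original message): the "call it in the
constructor, then make an extra call at init time to detect failures" pattern
described above can be illustrated roughly as follows. Everything here is an
assumption for illustration: acquire_io_privilege() is a hypothetical stub
standing in for iopl(3), fake_iopl_should_fail is a test knob, pmd_init() is
not the real rte_virtio_pmd_init(), and __attribute__((constructor)) is a
GCC/Clang extension.

```c
#include <assert.h>
#include <errno.h>

/* Test knob for this sketch only; not part of any real API. */
static int fake_iopl_should_fail;

/* Hypothetical stand-in for iopl(3). A real PMD would call iopl() from
 * <sys/io.h>, which needs x86 Linux and sufficient privileges. */
static int acquire_io_privilege(void)
{
	return fake_iopl_should_fail ? -EPERM : 0;
}

/* 0 once the privilege is held; negative while it is not. */
static int io_privilege_status = -1;

/* Runs when the object is loaded into a process, before main(). */
__attribute__((constructor))
static void pmd_constructor(void)
{
	io_privilege_status = acquire_io_privilege();
}

/* Hypothetical init hook: makes the extra call so that a failure in the
 * constructor is retried and, if it persists, reported to the caller. */
static int pmd_init(void)
{
	if (io_privilege_status != 0)
		io_privilege_status = acquire_io_privilege();
	assert(io_privilege_status <= 0); /* 0 or a negative errno value */
	return io_privilege_status;
}
```

In this sketch the constructor cannot report an error to anyone, so the init
hook is the first place a caller can see that the privilege is missing.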


[dpdk-dev] [PATCH v2 1/1] ip_pipeline: added dynamic pipeline reconfiguration

2015-09-30 Thread Dumitrescu, Cristian


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maciej Gajdzica
> Sent: Wednesday, September 30, 2015 5:26 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 1/1] ip_pipeline: added dynamic pipeline
> reconfiguration
> 
> Up till now pipeline was bound to thread selected in the initial config.
> This patch allows binding pipeline to other threads at runtime using CLI
> commands.
> 
> Signed-off-by: Maciej Gajdzica 
> ---

Acked-by: Cristian Dumitrescu 
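
Editorial sketch (not part of the original message): the acked patch looks up
each thread's request and response rings by a name of the form
MSGQ-REQ-CORE-s<socket>c<core>[h] or MSGQ-RSP-CORE-s<socket>c<core>[h]. A
minimal standalone illustration of that naming and of a lookup by name is
given here. The helpers sketch_msgq_name() and sketch_name_find() are
hypothetical; the patch itself uses snprintf() directly and the application's
APP_PARAM_FIND macro, whose definition is not shown in this thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build "MSGQ-<dir>-CORE-s<socket>c<core>[h]"; dir is "REQ" or "RSP".
 * Returns 0 on success, -1 if the name did not fit in buf. */
static int
sketch_msgq_name(char *buf, size_t len, const char *dir,
	uint32_t socket_id, uint32_t core_id, uint32_t ht_id)
{
	int n;

	assert(buf != NULL && dir != NULL);
	n = snprintf(buf, len, "MSGQ-%s-CORE-s%" PRIu32 "c%" PRIu32 "%s",
		dir, socket_id, core_id, (ht_id) ? "h" : "");
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical stand-in for the lookup by name: index of name, or -1. */
static int
sketch_name_find(const char *const names[], int n_names, const char *name)
{
	int i;

	for (i = 0; i < n_names; i++)
		if (strcmp(names[i], name) == 0)
			return i;
	return -1;
}
```

A negative lookup result corresponds to the case where the patch returns NULL
and app_init_threads() panics with "Cannot find MSGQ_IN".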



[dpdk-dev] [PATCH 02/20] librte_ether: add fields from rte_pci_driver to rte_eth_dev_data

2015-09-30 Thread Iremonger, Bernard
Hi Neil


> > > > > +++ b/lib/librte_ether/rte_ethdev.h
> > > > > @@ -1635,8 +1635,23 @@ struct rte_eth_dev_data {
> > > > >   all_multicast : 1, /**< RX all multicast mode ON(1) /
> OFF(0). */
> > > > >   dev_started : 1,   /**< Device state: STARTED(1) /
> STOPPED(0). */
> > > > >   lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
> > > > > + uint32_t dev_flags; /**< Flags controlling handling of device.
> */
> > > > > + enum rte_kernel_driver kdrv;/**< Kernel driver
> passthrough */
> > > > Why add this here? The enumerated driver types are all variants
> > > > on PCI bus types.  Not sure why the ethernet interface needs to
> > > > know this info
> > > >
> > > > > + int numa_node;
> > > > Ditto, this seems like information that is only relevant if the
> > > > device is on a physical bus (i.e. virtual devices are likely to not
> > > > have a numa node)
> > > >
> > > Actually, I disagree. For some virtual devices they will have a numa
> > > node. For ring or other virtual PMDs the numa node will be the node
> > > on which the ring / mempool etc. memory is allocated on, and can be of
> relevance.
> > >
> > > /Bruce
> > >
> >
> > I think it's fairly clear that some devices (including virtual ones)
> > have some relevant relation to a numa_node (There are even some that
> > have no numa_node, for which a -1 value makes some sense).  That said,
> > there are just as many that don't have a relevant numa_node.
> >
> > 1) There are some drivers for which numa_node make no sense
> > (regardless of
> > value):
> >  * af_packet - The numa node is at best determined at run time by the
> > interface the socket is bound to
> >
> >  * pcap - same as af_packet
> >
> >  * bonding - multiple interfaces mean multiple numa_nodes, any value
> > set here is just as likely to be wrong as right
> >
> >  * mpipe - no real large memory area to associate with a numa node
> >
> >  * virtio - uses iopl for communication, and cannot know its numa_node
> >
> >  * vmxnet3 - same concept as virtio
> >
> >  * xenvirt - same as vmxnet3
> >
> > I think it's better that you store numa locality information in a pmd's
> > private bus data, and export it to applications via a device method.
> > that provides the flexibility to tell the application that there is no
> > numa locality for a device (by not implementing the method), without
> > having to expose an unset data field to the application.
> >
> > Neil
> >
> 
> Sure, that could work.
> However, is it really worthwhile asking drivers to implement a new ethdev
> API function, rather than just having them set the numa node field correctly
> in the init function?
> 
> /Bruce

The four fields below have been added to struct rte_eth_dev_data:

uint32_t dev_flags; /**< Flags controlling handling of device. */
enum rte_kernel_driver kdrv;/**< Kernel driver passthrough */
int numa_node;
const char *drv_name;

The data for these fields is available in struct rte_pci_device.
In order to remove the pci_device from the vdev PMDs, this data needs to be 
available in the eth_dev.
A new function, rte_eth_copy_dev_info(), has been added to the eth_dev for use by 
the pdevs to copy this data from the pci_device to the ethdev.
In the vdevs the pci_device has been removed and the new fields are set up 
directly in the rte_driver.init function.

The numa_node is already initialised in the following vdev PMDs:

af_packet - initialized from socket_id
bonding - initialized from socket_id
mpipe - initialized from instance
null - initialized from socket_id
pcap - initialized from socket_id
ring - initialized from socket_id
xenvirt - initialized from socket_id

Regards,

Bernard.
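
Editorial sketch (not part of the original message): the split described
above, where physical devices copy the four fields from the PCI device and
virtual devices set them directly at init, might look roughly like this. All
types and functions are simplified, hypothetical stand-ins prefixed with
sketch_; the real rte_eth_copy_dev_info() signature is not shown in this
thread, and mapping a PCI driver flags word onto dev_flags is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for enum rte_kernel_driver. */
enum sketch_kernel_driver {
	SKETCH_KDRV_UNKNOWN = 0,
	SKETCH_KDRV_UIO_GENERIC,
	SKETCH_KDRV_VFIO
};

/* Simplified stand-in for the parts of struct rte_pci_device used here. */
struct sketch_pci_device {
	enum sketch_kernel_driver kdrv;
	int numa_node;
	const char *drv_name;
	uint32_t drv_flags;
};

/* The four fields added to struct rte_eth_dev_data in the patch. */
struct sketch_eth_dev_data {
	uint32_t dev_flags;
	enum sketch_kernel_driver kdrv;
	int numa_node;
	const char *drv_name;
};

/* pdev path: copy the bus-level information into the ethdev data. */
static void
sketch_copy_dev_info(struct sketch_eth_dev_data *data,
	const struct sketch_pci_device *pci)
{
	assert(data != NULL && pci != NULL);
	data->dev_flags = pci->drv_flags;
	data->kdrv = pci->kdrv;
	data->numa_node = pci->numa_node;
	data->drv_name = pci->drv_name;
}

/* vdev path: no PCI device exists, so the init function fills the fields
 * itself, taking the NUMA node from the socket the vdev was created on. */
static void
sketch_vdev_init(struct sketch_eth_dev_data *data, int socket_id)
{
	assert(data != NULL);
	data->dev_flags = 0;
	data->kdrv = SKETCH_KDRV_UNKNOWN;
	data->numa_node = socket_id;
	data->drv_name = "sketch_vdev";
}
```

Either way, code that only holds the ethdev data can read numa_node without
reaching for a pci_device, which is the point of the change.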








[dpdk-dev] [PATCH] ip_pipeline: fixed bug in app_link_config

2015-09-30 Thread Jasvinder Singh
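This patch fixes a bug in app_link_config: the "already assigned this IP
address" log message now prints link->name, the link whose IP matched,
instead of p->name.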

Signed-off-by: Jasvinder Singh 
---
 examples/ip_pipeline/pipeline/pipeline_common_fe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_common_fe.c b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
index fcda0ce..bb4cbd2 100644
--- a/examples/ip_pipeline/pipeline/pipeline_common_fe.c
+++ b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
@@ -358,7 +358,7 @@ app_link_config(struct app_params *app,
if (link->ip == ip) {
APP_LOG(app, HIGH,
"%s is already assigned this IP address",
-   p->name);
+   link->name);
return -1;
}
}
-- 
2.1.0



[dpdk-dev] [PATCH] ip_pipeline: add more functions to routing-pipeline

2015-09-30 Thread Jasvinder Singh
This patch adds the following features to the
routing-pipeline to enable it for various NFV
use-cases:

1. Fast-path ARP table enable/disable
2. Double-tagged VLAN (Q-in-Q) packet encapsulation
   for the next-hop
3. MPLS encapsulation for the next-hop
4. Add colour (Traffic-class for QoS) to the MPLS tag
5. Classification action to select the input queue
   of the hierarchical scheduler (QoS)

The features proposed above can be enabled
(or disabled) through the parameters specified
in the configuration file, as below:

[PIPELINE0]
type = ROUTING
core = 1
pktq_in = RXQ0.0 RXQ1.0 RXQ2.0 RXQ3.0
pktq_out = TXQ0.0 TXQ1.0 TXQ2.0 TXQ3.0
n_routes = 4096
n_arp_entries = 1024
ip_hdr_offset = 142
arp_key_offset = 64
l2 = qinq
qinq_sched = no

The LPM table entries might include additional
fields depending upon the packet encapsulation
(Q-in-Q, MPLS) for the next-hop. Therefore, the
CLI commands for adding or deleting such entries
in the LPM table have been implemented. The key
functions such as QinQ and MPLS encapsulation,
classification action to select the input queue
of the hierarchical scheduler (QoS) and adding
colour (Traffic-class for QoS) to the MPLS
tag have been implemented as action handlers.

Signed-off-by: Jasvinder Singh 
---
 examples/ip_pipeline/pipeline/pipeline_routing.c   |  795 ++-
 examples/ip_pipeline/pipeline/pipeline_routing.h   |8 +-
 .../ip_pipeline/pipeline/pipeline_routing_be.c | 1393 ++--
 .../ip_pipeline/pipeline/pipeline_routing_be.h |   62 +-
 4 files changed, 2074 insertions(+), 184 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_routing.c b/examples/ip_pipeline/pipeline/pipeline_routing.c
index beec982..8ba1e25 100644
--- a/examples/ip_pipeline/pipeline/pipeline_routing.c
+++ b/examples/ip_pipeline/pipeline/pipeline_routing.c
@@ -43,7 +43,7 @@

 struct app_pipeline_routing_route {
struct pipeline_routing_route_key key;
-   struct app_pipeline_routing_route_params params;
+   struct pipeline_routing_route_data data;
void *entry_ptr;

TAILQ_ENTRY(app_pipeline_routing_route) node;
@@ -196,12 +196,44 @@ print_route(const struct app_pipeline_routing_route *route)
key->ip & 0xFF,

key->depth,
-   route->params.port_id,
+   route->data.port_id);
+
+   if (route->data.flags & PIPELINE_ROUTING_ROUTE_ARP)
+   printf(
+   ", Next Hop IP = %" PRIu32 ".%" PRIu32
+   ".%" PRIu32 ".%" PRIu32,
+   (route->data.ethernet.ip >> 24) & 0xFF,
+   (route->data.ethernet.ip >> 16) & 0xFF,
+   (route->data.ethernet.ip >> 8) & 0xFF,
+   route->data.ethernet.ip & 0xFF);
+   else
+   printf(
+   ", Next Hop HWaddress = %02" PRIx32
+   ":%02" PRIx32 ":%02" PRIx32
+   ":%02" PRIx32 ":%02" PRIx32
+   ":%02" PRIx32,
+   route->data.ethernet.macaddr.addr_bytes[0],
+   route->data.ethernet.macaddr.addr_bytes[1],
+   route->data.ethernet.macaddr.addr_bytes[2],
+   route->data.ethernet.macaddr.addr_bytes[3],
+   route->data.ethernet.macaddr.addr_bytes[4],
+   route->data.ethernet.macaddr.addr_bytes[5]);
+
+   if (route->data.flags & PIPELINE_ROUTING_ROUTE_QINQ)
+   printf(", QinQ SVLAN = %" PRIu32 " CVLAN = %" PRIu32,
+   route->data.l2.qinq.svlan,
+   route->data.l2.qinq.cvlan);
+
+   if (route->data.flags & PIPELINE_ROUTING_ROUTE_MPLS) {
+   uint32_t i;
+
+   printf(", MPLS labels");
+   for (i = 0; i < route->data.l2.mpls.n_labels; i++)
+   printf(" %" PRIu32,
+   route->data.l2.mpls.labels[i]);
+   }

-   (route->params.ip >> 24) & 0xFF,
-   (route->params.ip >> 16) & 0xFF,
-   (route->params.ip >> 8) & 0xFF,
-   route->params.ip & 0xFF);
+   printf(")\n");
}
 }

@@ -212,6 +244,8 @@ print_arp_entry(const struct app_pipeline_routing_arp_entry *entry)
".%" PRIu32 ".%" PRIu32 ") => "
"HWaddress = %02" PRIx32 ":%02" PRIx32 ":%02" PRIx32
":%02" PRIx32 ":%02" PRIx32 ":%02" PRIx32 "\n",
+
+
entry->key.key.ipv4.port_id,
(entry->key.key.ipv4.ip >> 24) & 0xFF,
(entry->key.key.ipv4.ip >> 16) & 0xFF,
@@ -253,7 +287,7 @@ int
 

[dpdk-dev] [PATCH 1/1] ip_pipeline: added dynamic pipeline reconfiguration

2015-09-30 Thread Gajdzica, MaciejX T
Forgot to delete debug printfs, v2 incoming - Self NACK

> -Original Message-
> From: Gajdzica, MaciejX T
> Sent: Wednesday, September 30, 2015 6:11 PM
> To: dev at dpdk.org
> Cc: Gajdzica, MaciejX T
> Subject: [PATCH 1/1] ip_pipeline: added dynamic pipeline reconfiguration
> 
> Up till now pipeline was bound to thread selected in the initial config.
> This patch allows binding pipeline to other threads at runtime using CLI
> commands.
> 
> Signed-off-by: Maciej Gajdzica 
> ---
>  examples/ip_pipeline/Makefile  |1 +
>  examples/ip_pipeline/app.h |5 +
>  examples/ip_pipeline/config_parse.c|2 +-
>  examples/ip_pipeline/init.c|   61 
>  examples/ip_pipeline/pipeline.h|6 +
>  examples/ip_pipeline/pipeline/pipeline_common_fe.h |3 +
>  examples/ip_pipeline/thread.c  |  135 +++-
>  examples/ip_pipeline/thread.h  |  101 ++
>  examples/ip_pipeline/thread_fe.c   |  328 
> 
>  9 files changed, 640 insertions(+), 2 deletions(-)  create mode 100644
> examples/ip_pipeline/thread.h  create mode 100644
> examples/ip_pipeline/thread_fe.c
> 
> diff --git a/examples/ip_pipeline/Makefile b/examples/ip_pipeline/Makefile
> index f3ff1ec..c8e80b5 100644
> --- a/examples/ip_pipeline/Makefile
> +++ b/examples/ip_pipeline/Makefile
> @@ -54,6 +54,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) +=
> config_parse_tm.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += config_check.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += init.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread_fe.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += cpu_core_map.c
> 
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_be.c diff --git
> a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h index
> 521e3a0..19ddd31 100644
> --- a/examples/ip_pipeline/app.h
> +++ b/examples/ip_pipeline/app.h
> @@ -220,9 +220,11 @@ struct app_pipeline_data {
>   void *be;
>   void *fe;
>   uint64_t timer_period;
> + uint32_t enabled;
>  };
> 
>  struct app_thread_pipeline_data {
> + uint32_t pipeline_id;
>   void *be;
>   pipeline_be_op_run f_run;
>   pipeline_be_op_timer f_timer;
> @@ -242,6 +244,9 @@ struct app_thread_data {
>   uint32_t n_custom;
> 
>   uint64_t deadline;
> +
> + struct rte_ring *msgq_in;
> + struct rte_ring *msgq_out;
>  };
> 
>  struct app_eal_params {
> diff --git a/examples/ip_pipeline/config_parse.c b/examples/ip_pipeline/config_parse.c
> index c9b78f9..d2aaadf 100644
> --- a/examples/ip_pipeline/config_parse.c
> +++ b/examples/ip_pipeline/config_parse.c
> @@ -362,7 +362,7 @@ parser_read_uint32(uint32_t *value, const char *p)
>   return 0;
>  }
> 
> -static int
> +int
>  parse_pipeline_core(uint32_t *socket,
>   uint32_t *core,
>   uint32_t *ht,
> diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
> index 3f9c68d..1126288 100644
> --- a/examples/ip_pipeline/init.c
> +++ b/examples/ip_pipeline/init.c
> @@ -50,6 +50,7 @@
>  #include "pipeline_firewall.h"
>  #include "pipeline_flow_classification.h"
>  #include "pipeline_routing.h"
> +#include "thread.h"
> 
>  #define APP_NAME_SIZE 32
> 
> @@ -1225,6 +1226,48 @@ app_init_pipelines(struct app_params *app)
>   }
>  }
> 
> +static inline struct rte_ring *
> +app_thread_msgq_in_get(struct app_params *app,
> + uint32_t socket_id, uint32_t core_id, uint32_t ht_id)
> +{
> + char msgq_name[32];
> + ssize_t param_idx;
> +
> + snprintf(msgq_name, sizeof(msgq_name),
> + "MSGQ-REQ-CORE-s%" PRIu32 "c%" PRIu32 "%s",
> + socket_id,
> + core_id,
> + (ht_id) ? "h" : "");
> + param_idx = APP_PARAM_FIND(app->msgq_params, msgq_name);
> +
> + if (param_idx < 0)
> + return NULL;
> +
> + return app->msgq[param_idx];
> +}
> +
> +static inline struct rte_ring *
> +app_thread_msgq_out_get(struct app_params *app,
> + uint32_t socket_id, uint32_t core_id, uint32_t ht_id)
> +{
> + char msgq_name[32];
> + ssize_t param_idx;
> +
> + snprintf(msgq_name, sizeof(msgq_name),
> + "MSGQ-RSP-CORE-s%" PRIu32 "c%" PRIu32 "%s",
> + socket_id,
> + core_id,
> + (ht_id) ? "h" : "");
> + param_idx = APP_PARAM_FIND(app->msgq_params, msgq_name);
> +
> +
> + if (param_idx < 0)
> + return NULL;
> +
> + return app->msgq[param_idx];
> +
> +}
> +
>  static void
>  app_init_threads(struct app_params *app)
>  {
> @@ -1253,6 +1296,20 @@ app_init_threads(struct app_params *app)
> 
>   t = &app->thread_data[lcore_id];
> 
> + t->msgq_in = app_thread_msgq_in_get(app,
> + params->socket_id,
> + params->core_id,
> + params->hyper_th_id);
> + 

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Avi Kivity


On 09/30/2015 03:27 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 15:03, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
>>>> On 09/30/15 14:41, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>>>>> The whole idea is to bypass kernel. Especially for networking...
>>>>> ... on dumb hardware that doesn't support doing that securely.
>>>> On a very capable HW that supports whatever security requirements needed
>>>> (e.g. 82599 Intel's SR-IOV VF devices).
>>> Network card type is irrelevant as long as you do not have an IOMMU,
>>> otherwise you would just use e.g. VFIO.
>> Sorry, but I don't follow your logic here - the Amazon EC2 environment is an
>> example where there *is* an iommu but it's not virtualized, and thus VFIO is
>> useless. There is an option to use a directly assigned SR-IOV networking
>> device there, where using the kernel drivers imposes a performance impact
>> compared to the UIO-based user space kernel bypass mode of usage. How
>> is it irrelevant? Could u, pls, clarify your point?
>>
> So it's not even dumb hardware, it's another piece of software
> that forces an "all or nothing" approach where either
> device has access to all VM memory, or none.
> And this, unfortunately, leaves you with no secure way to
> allow userspace drivers.

Some setups don't need security (they are single-user, single-application) 
but do need a lot of performance (like 5X-10X). An example is 
Open vSwitch: security doesn't help it at all, and if you force it to use 
the kernel drivers you cripple it.

Also, I'm root.  I can do anything I like, including loading a patched 
uio_pci_generic.  You're not providing _any_ security, you're simply 
making life harder for users.

> So it makes even less sense to add insecure work-arounds in the kernel.
> It seems quite likely that by the time the new kernel reaches
> production X years from now, EC2 will have a virtual iommu.

I can adopt a new kernel tomorrow.  I have no influence on EC2.





[dpdk-dev] [PATCH v5 9/9] doc: dynamic rss configuration for bonding

2015-09-30 Thread Tomasz Kulasek
Documentation update about implementation details and requirements for
Dynamic RSS Configuration for Bonding.

Signed-off-by: Tomasz Kulasek 
---
 .../prog_guide/link_bonding_poll_mode_drv_lib.rst  |   34 ++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
index 03baf90..46f0296 100644
--- a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
+++ b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
@@ -173,7 +173,28 @@ After a slave device is added to a bonded device slave is stopped using
 ``rte_eth_dev_stop`` and then reconfigured using ``rte_eth_dev_configure``
 the RX and TX queues are also reconfigured using ``rte_eth_tx_queue_setup`` /
 ``rte_eth_rx_queue_setup`` with the parameters use to configure the bonding
-device.
+device. If RSS is enabled for bonding device, this mode is also enabled on new
+slave and configured as well.
+
+Setting up multi-queue mode for bonding device to RSS, makes it fully
+RSS-capable, so all slaves are synchronized with its configuration. This mode is
+intended to provide RSS configuration on slaves transparent for client
+application implementation.
+
+Bonding device stores its own version of RSS settings i.e. RETA, RSS hash
+function and RSS key, used to set up its slaves. That let to define the meaning
+of RSS configuration of bonding device as desired configuration of whole bonding
+(as one unit), without pointing any of slave inside. It is required to ensure
+consistency and made it more errorproof.
+
+RSS hash function set for bonding device, is a maximal set of RSS hash functions
+supported by all bonded slaves. RETA size is a GCD of all its RETA's sizes, so
+it can be easily used as a pattern providing expected behavior, even if slave
+RETAs' sizes are different. If RSS Key is not set for bonded device, it's not
+changed on the slaves and default key for device is used.
+
+All settings are managed through the bonding port API and always are propagated
+in one direction (from bonding to slaves).

 Link Status Change Interrupts / Polling
 
@@ -207,6 +228,15 @@ these parameters.
 A bonding device must have a minimum of one slave before the bonding device
 itself can be started.

+To use a bonding device dynamic RSS configuration feature effectively, it is
+also required, that all slaves should be RSS-capable and support, at least one
+common hash function available for each of them. Changing RSS key is only
+possible, when all slave devices support the same key size.
+
+To prevent inconsistency on how slaves process packets, once a device is added
+to a bonding device, RSS configuration should be managed through the bonding
+device API, and not directly on the slave.
+
 Like all other PMD, all functions exported by a PMD are lock-free functions
 that are assumed not to be invoked in parallel on different logical cores to
 work on the same target object.
-- 
1.7.9.5
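
The two derivation rules in the documentation above (the bonding hash
function set is what all slaves have in common, and the bonding RETA size is
the GCD of the slave RETA sizes) can be illustrated with a small standalone
sketch. It does not use the DPDK API: struct slave_rss_caps, gcd_u16(),
bond_rss_caps() and reta_expand() are hypothetical names made up for this
illustration, and filling a larger slave table by repeating the bonding
pattern is an assumption about how "used as a pattern" might be realized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one slave's RSS capabilities. */
struct slave_rss_caps {
	uint64_t hash_functions; /* bit mask of supported RSS hash functions */
	uint16_t reta_size;      /* number of RETA entries */
};

/* Greatest common divisor (Euclid's algorithm). */
static uint16_t
gcd_u16(uint16_t a, uint16_t b)
{
	while (b != 0) {
		uint16_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Combine the capabilities of n slaves into one bonding-wide setting. */
static struct slave_rss_caps
bond_rss_caps(const struct slave_rss_caps *slaves, size_t n)
{
	struct slave_rss_caps bond;
	size_t i;

	assert(slaves != NULL && n > 0);
	bond = slaves[0];
	for (i = 1; i < n; i++) {
		/* Keep only hash functions every slave supports. */
		bond.hash_functions &= slaves[i].hash_functions;
		/* For power-of-two sizes the GCD is simply the smaller size. */
		bond.reta_size = gcd_u16(bond.reta_size, slaves[i].reta_size);
	}
	return bond;
}

/*
 * Assumption: a slave whose table is a multiple of the pattern size is
 * filled by repeating the bonding pattern.
 */
static void
reta_expand(const uint16_t *pattern, uint16_t pattern_size,
	    uint16_t *slave_reta, uint16_t slave_size)
{
	uint16_t i;

	assert(pattern_size > 0 && slave_size % pattern_size == 0);
	for (i = 0; i < slave_size; i++)
		slave_reta[i] = pattern[i % pattern_size];
}
```

Because the bonding RETA size divides every slave RETA size, the repeated
pattern always fits a slave table exactly, which is one way to read the
"expected behavior even if slave RETAs' sizes are different" remark.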



[dpdk-dev] [PATCH v5 8/9] doc: fixed spellings and typos

2015-09-30 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 .../prog_guide/link_bonding_poll_mode_drv_lib.rst  |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
index 96e554f..03baf90 100644
--- a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
+++ b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
@@ -188,7 +188,7 @@ conditions are not met. If a user wishes to monitor individual slaves then they
 must register callbacks with that slave directly.

 The link bonding library also supports devices which do not implement link
-status change interrupts, this is achieve by polling the devices link status at
+status change interrupts, this is achieved by polling the devices link status at
 a defined period which is set using the ``rte_eth_bond_link_monitoring_set``
 API, the default polling interval is 10ms. When a device is added as a slave to
 a bonding device it is determined using the ``RTE_PCI_DRV_INTR_LSC`` flag
@@ -286,7 +286,7 @@ and UDP protocols for load balancing.
 Using Link Bonding Devices
 --------------------------

-The librte_pmd_bond library support two modes of device creation, the libraries
+The librte_pmd_bond library supports two modes of device creation, the libraries
 export full C API or using the EAL command line to statically configure link
 bonding devices at application startup. Using the EAL option it is possible to
 use link bonding functionality transparently without specific knowledge of the
@@ -299,7 +299,7 @@ Using the Poll Mode Driver from an Application

 Using the librte_pmd_bond libraries API it is possible to dynamically create
 and manage link bonding device from within any application. Link bonding
-device are created using the ``rte_eth_bond_create`` API which requires a
+devices are created using the ``rte_eth_bond_create`` API which requires a
 unique device name, the link bonding mode to initial the device in and finally
 the socket Id which to allocate the devices resources onto. After successful
 creation of a bonding device it must be configured using the generic Ethernet
@@ -362,7 +362,7 @@ The different options are:
 mode=2

 *   slave: Defines the PMD device which will be added as slave to the bonded
-device. This option can be selected multiple time, for each device to be
+device. This option can be selected multiple times, for each device to be
 added as a slave. Physical devices should be specified using their PCI
 address, in the format domain:bus:devid.function

-- 
1.7.9.5



[dpdk-dev] [PATCH v5 7/9] bonding: per queue stats

2015-09-30 Thread Tomasz Kulasek
This patch fills the bonding port's stats, including the per-queue counters,
with the sum of the corresponding values taken from the bonded slaves when
stats are requested for the bonding port.

v5 changes:
 - removed queue_stats_mapping_set from eth_dev_ops of bonding device

Signed-off-by: Tomasz Kulasek 
---
 drivers/net/bonding/rte_eth_bond_pmd.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 2880f5c..eecb381 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1801,7 +1801,7 @@ bond_ethdev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
struct bond_dev_private *internals = dev->data->dev_private;
struct rte_eth_stats slave_stats;
-   int i;
+   int i, j;

for (i = 0; i < internals->slave_count; i++) {
rte_eth_stats_get(internals->slaves[i].port_id, &slave_stats);
@@ -1820,6 +1820,15 @@ bond_ethdev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
stats->rx_pause_xon += slave_stats.rx_pause_xon;
stats->tx_pause_xoff += slave_stats.tx_pause_xoff;
stats->rx_pause_xoff += slave_stats.rx_pause_xoff;
+
+   for (j = 0; j < RTE_ETHDEV_QUEUE_STAT_CNTRS; j++) {
+   stats->q_ipackets[j] += slave_stats.q_ipackets[j];
+   stats->q_opackets[j] += slave_stats.q_opackets[j];
+   stats->q_ibytes[j] += slave_stats.q_ibytes[j];
+   stats->q_obytes[j] += slave_stats.q_obytes[j];
+   stats->q_errors[j] += slave_stats.q_errors[j];
+   }
+
}
 }

-- 
1.7.9.5
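
The aggregation in the patch above can be tried outside DPDK with a reduced
model. In this sketch struct port_stats, QUEUE_STAT_CNTRS and
bond_stats_sum() are hypothetical stand-ins (for struct rte_eth_stats,
RTE_ETHDEV_QUEUE_STAT_CNTRS and the loop in bond_ethdev_stats_get()), and
only a few counters are modelled. Zeroing the result first is an assumption
of the sketch, not something shown in the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical; stands in for RTE_ETHDEV_QUEUE_STAT_CNTRS. */
#define QUEUE_STAT_CNTRS 4

/* Hypothetical, reduced stand-in for struct rte_eth_stats. */
struct port_stats {
	uint64_t ipackets;
	uint64_t opackets;
	uint64_t q_ipackets[QUEUE_STAT_CNTRS];
	uint64_t q_opackets[QUEUE_STAT_CNTRS];
};

/* Sum port-level and per-queue counters of all slaves into *bond. */
static void
bond_stats_sum(struct port_stats *bond, const struct port_stats *slaves,
	       size_t n_slaves)
{
	size_t i, j;

	assert(bond != NULL && (slaves != NULL || n_slaves == 0));

	/* Assumption of this sketch: start from zeroed totals. */
	memset(bond, 0, sizeof(*bond));

	for (i = 0; i < n_slaves; i++) {
		bond->ipackets += slaves[i].ipackets;
		bond->opackets += slaves[i].opackets;

		/* Queue j of the bond is the sum of queue j of every slave. */
		for (j = 0; j < QUEUE_STAT_CNTRS; j++) {
			bond->q_ipackets[j] += slaves[i].q_ipackets[j];
			bond->q_opackets[j] += slaves[i].q_opackets[j];
		}
	}
}
```

Summing by queue index only gives meaningful totals when all slaves use the
same queue layout, which is the case when they are configured through the
bonding device.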



[dpdk-dev] [PATCH v5 6/9] test: dynamic rss configuration

2015-09-30 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 app/test/Makefile|8 +
 app/test/test_link_bonding_rssconf.c |  679 ++
 2 files changed, 687 insertions(+)
 create mode 100644 app/test/test_link_bonding_rssconf.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 294618f..c122f28 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -138,6 +138,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += test_acl.c
 ifeq ($(CONFIG_RTE_LIBRTE_PMD_RING),y)
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += test_link_bonding.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += test_link_bonding_mode4.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += test_link_bonding_rssconf.c
 endif

 SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_pmd_ring.c
@@ -168,6 +169,13 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 LDLIBS += -lrte_pmd_ring
 endif
 endif
+ifneq ($(CONFIG_RTE_LIBRTE_PMD_NULL),y)
+$(error Link bonding rssconf tests require CONFIG_RTE_LIBRTE_PMD_NULL=y)
+else
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
+LDLIBS += -lrte_pmd_null
+endif
+endif
 endif

 include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test/test_link_bonding_rssconf.c b/app/test/test_link_bonding_rssconf.c
new file mode 100644
index 0000000..e6714b4
--- /dev/null
+++ b/app/test/test_link_bonding_rssconf.c
@@ -0,0 +1,679 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define SLAVE_COUNT (4)
+
+#define RXTX_RING_SIZE 1024
+#define RXTX_QUEUE_COUNT   4
+
+#define BONDED_DEV_NAME ("rssconf_bond_dev")
+
+#define SLAVE_DEV_NAME_FMT  ("rssconf_slave%d")
+#define SLAVE_RXTX_QUEUE_FMT  ("rssconf_slave%d_q%d")
+
+#define NUM_MBUFS 8191
+#define MBUF_SIZE (1600 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
+#define MBUF_CACHE_SIZE 250
+#define BURST_SIZE 32
+
+#define INVALID_SOCKET_ID   (-1)
+#define INVALID_PORT_ID (0xFF)
+#define INVALID_BONDING_MODE (-1)
+
+struct slave_conf {
+   uint8_t port_id;
+   struct rte_eth_dev_info dev_info;
+
+   struct rte_eth_rss_conf rss_conf;
+   uint8_t rss_key[40];
+   struct rte_eth_rss_reta_entry64 reta_conf[512 / RTE_RETA_GROUP_SIZE];
+
+   uint8_t is_slave;
+   struct rte_ring *rxtx_queue[RXTX_QUEUE_COUNT];
+};
+
+struct link_bonding_rssconf_unittest_params {
+   uint8_t bond_port_id;
+   struct rte_eth_dev_info bond_dev_info;
+   struct rte_eth_rss_reta_entry64 bond_reta_conf[512 / RTE_RETA_GROUP_SIZE];
+   struct slave_conf slave_ports[SLAVE_COUNT];
+
+   struct rte_mempool *mbuf_pool;
+};
+
+static struct link_bonding_rssconf_unittest_params test_params  = {
+   .bond_port_id = INVALID_PORT_ID,
+   .slave_ports = {
+   [0 ... SLAVE_COUNT - 1] = { .port_id = INVALID_PORT_ID, .is_slave = 0}
+   },
+   .mbuf_pool = NULL,
+};
+
+/**
+ * Default port configuration with RSS turned off
+ */
+static struct rte_eth_conf default_pmd_conf = {
+   .rxmode = {
+   .mq_mode = ETH_MQ_RX_NONE,
+   .max_rx_pkt_len = ETHER_MAX_LEN,
+   .split_hdr_size = 0,
+  

[dpdk-dev] [PATCH v5 5/9] null: export eth_dev_null_create

2015-09-30 Thread Tomasz Kulasek

Signed-off-by: Tomasz Kulasek 
---
 drivers/net/null/Makefile |2 +-
 drivers/net/null/rte_eth_null.c   |2 +-
 drivers/net/null/rte_eth_null.h   |   40 +
 drivers/net/null/rte_pmd_null_version.map |7 +
 4 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/null/rte_eth_null.h

diff --git a/drivers/net/null/Makefile b/drivers/net/null/Makefile
index 96ba01c..2202389 100644
--- a/drivers/net/null/Makefile
+++ b/drivers/net/null/Makefile
@@ -51,7 +51,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += rte_eth_null.c
 #
 # Export include files
 #
-SYMLINK-y-include +=
+SYMLINK-y-include += rte_eth_null.h

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += lib/librte_mbuf
diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index b01f647..56e4278 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -491,7 +491,7 @@ static const struct eth_dev_ops ops = {
.rss_hash_conf_get = eth_rss_hash_conf_get
 };

-static int
+int
 eth_dev_null_create(const char *name,
const unsigned numa_node,
unsigned packet_size,
diff --git a/drivers/net/null/rte_eth_null.h b/drivers/net/null/rte_eth_null.h
new file mode 100644
index 0000000..abada8c
--- /dev/null
+++ b/drivers/net/null/rte_eth_null.h
@@ -0,0 +1,40 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_ETH_NULL_H_
+#define RTE_ETH_NULL_H_
+
+int eth_dev_null_create(const char *name, const unsigned numa_node,
+   unsigned packet_size, unsigned packet_copy);
+
+#endif /* RTE_ETH_NULL_H_ */
diff --git a/drivers/net/null/rte_pmd_null_version.map b/drivers/net/null/rte_pmd_null_version.map
index ef35398..84b1d0f 100644
--- a/drivers/net/null/rte_pmd_null_version.map
+++ b/drivers/net/null/rte_pmd_null_version.map
@@ -2,3 +2,10 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.2 {
+   global:
+
+   eth_dev_null_create;
+
+} DPDK_2.0;
-- 
1.7.9.5



[dpdk-dev] [PATCH v5 4/9] null: virtual dynamic rss configuration

2015-09-30 Thread Tomasz Kulasek
This implementation makes it possible to set and read the RSS configuration
of a null device. It is used in the unit tests for dynamic RSS configuration
for bonding to validate that the right values are propagated to the slaves.

v5 changes:
 - replaced memcpy with rte_memcpy

Signed-off-by: Tomasz Kulasek 
---
 drivers/net/null/rte_eth_null.c |  116 +++
 1 file changed, 116 insertions(+)

diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index bf81b1b..b01f647 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -37,6 +37,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #define ETH_NULL_PACKET_SIZE_ARG   "size"
 #define ETH_NULL_PACKET_COPY_ARG   "copy"
@@ -73,6 +75,17 @@ struct pmd_internals {

struct null_queue rx_null_queues[RTE_MAX_QUEUES_PER_PORT];
struct null_queue tx_null_queues[RTE_MAX_QUEUES_PER_PORT];
+
+   /** Bit mask of RSS offloads, the bit offset also means flow type */
+   uint64_t flow_type_rss_offloads;
+
+   rte_spinlock_t rss_lock;
+
+   uint16_t reta_size;
+   struct rte_eth_rss_reta_entry64 reta_conf[ETH_RSS_RETA_SIZE_128 /
+   RTE_RETA_GROUP_SIZE];
+
+   uint8_t rss_key[40];/**< 40-byte hash key. */
 };


@@ -293,6 +306,8 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->max_tx_queues = RTE_DIM(internals->tx_null_queues);
dev_info->min_rx_bufsize = 0;
dev_info->pci_dev = NULL;
+   dev_info->reta_size = internals->reta_size;
+   dev_info->flow_type_rss_offloads = internals->flow_type_rss_offloads;
 }

 static void
@@ -373,6 +388,91 @@ static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
int wait_to_complete __rte_unused) { return 0; }

+static int
+eth_rss_reta_update(struct rte_eth_dev *dev,
+   struct rte_eth_rss_reta_entry64 *reta_conf, uint16_t reta_size)
+{
+   int i, j;
+   struct pmd_internals *internal = dev->data->dev_private;
+
+   if (reta_size != internal->reta_size)
+   return -EINVAL;
+
+   rte_spinlock_lock(&internal->rss_lock);
+
+   /* Copy RETA table */
+   for (i = 0; i < (internal->reta_size / RTE_RETA_GROUP_SIZE); i++) {
+   internal->reta_conf[i].mask = reta_conf[i].mask;
+   for (j = 0; j < RTE_RETA_GROUP_SIZE; j++)
+   if ((reta_conf[i].mask >> j) & 0x01)
+   internal->reta_conf[i].reta[j] = reta_conf[i].reta[j];
+   }
+
+   rte_spinlock_unlock(&internal->rss_lock);
+
+   return 0;
+}
+
+static int
+eth_rss_reta_query(struct rte_eth_dev *dev,
+   struct rte_eth_rss_reta_entry64 *reta_conf, uint16_t reta_size)
+{
+   int i, j;
+   struct pmd_internals *internal = dev->data->dev_private;
+
+   if (reta_size != internal->reta_size)
+   return -EINVAL;
+
+   rte_spinlock_lock(&internal->rss_lock);
+
+   /* Copy RETA table */
+   for (i = 0; i < (internal->reta_size / RTE_RETA_GROUP_SIZE); i++) {
+   for (j = 0; j < RTE_RETA_GROUP_SIZE; j++)
+   if ((reta_conf[i].mask >> j) & 0x01)
+   reta_conf[i].reta[j] = internal->reta_conf[i].reta[j];
+   }
+
+   rte_spinlock_unlock(&internal->rss_lock);
+
+   return 0;
+}
+
+static int
+eth_rss_hash_update(struct rte_eth_dev *dev, struct rte_eth_rss_conf *rss_conf)
+{
+   struct pmd_internals *internal = dev->data->dev_private;
+
+   rte_spinlock_lock(&internal->rss_lock);
+
+   if ((rss_conf->rss_hf & internal->flow_type_rss_offloads) != 0)
+   dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf =
+   rss_conf->rss_hf & internal->flow_type_rss_offloads;
+
+   if (rss_conf->rss_key)
+   rte_memcpy(internal->rss_key, rss_conf->rss_key, 40);
+
+   rte_spinlock_unlock(&internal->rss_lock);
+
+   return 0;
+}
+
+static int
+eth_rss_hash_conf_get(struct rte_eth_dev *dev,
+   struct rte_eth_rss_conf *rss_conf)
+{
+   struct pmd_internals *internal = dev->data->dev_private;
+
+   rte_spinlock_lock(&internal->rss_lock);
+
+   rss_conf->rss_hf = dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf;
+   if (rss_conf->rss_key)
+   rte_memcpy(rss_conf->rss_key, internal->rss_key, 40);
+
+   rte_spinlock_unlock(&internal->rss_lock);
+
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
@@ -385,6 +485,10 @@ static const struct eth_dev_ops ops = {
.link_update = eth_link_update,
.stats_get = eth_stats_get,
.stats_reset = eth_stats_reset,
+   .reta_update = eth_rss_reta_update,
+   .reta_query = eth_rss_reta_query,
+   .rss_hash_update = eth_rss_hash_update,
+   .rss_hash_conf_get = eth_rss_hash_conf_get
 };

 static int
@@ -400,6 +504,13 @@ eth_dev_null_create(const char *name,
struct pmd_internals 

[dpdk-dev] [PATCH v5 3/9] null: extend number of virtual queues

2015-09-30 Thread Tomasz Kulasek
This patch makes it possible to configure more than one queue on a null
device.

v5 changes:
 - fixed queues number configuration (using internals->nb_*_queues instead
   of dev->data->nb_*_queues)

Signed-off-by: Tomasz Kulasek 
---
 drivers/net/null/rte_eth_null.c |   28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index c748101..bf81b1b 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -71,8 +71,8 @@ struct pmd_internals {
unsigned nb_rx_queues;
unsigned nb_tx_queues;

-   struct null_queue rx_null_queues[1];
-   struct null_queue tx_null_queues[1];
+   struct null_queue rx_null_queues[RTE_MAX_QUEUES_PER_PORT];
+   struct null_queue tx_null_queues[RTE_MAX_QUEUES_PER_PORT];
 };


@@ -178,7 +178,15 @@ eth_null_copy_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 }

 static int
-eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+eth_dev_configure(struct rte_eth_dev *dev) {
+   struct pmd_internals *internals;
+
+   internals = dev->data->dev_private;
+   internals->nb_rx_queues = dev->data->nb_rx_queues;
+   internals->nb_tx_queues = dev->data->nb_tx_queues;
+
+   return 0;
+}

 static int
 eth_dev_start(struct rte_eth_dev *dev)
@@ -213,10 +221,11 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
if ((dev == NULL) || (mb_pool == NULL))
return -EINVAL;

-   if (rx_queue_id != 0)
+   internals = dev->data->dev_private;
+
+   if (rx_queue_id >= internals->nb_rx_queues)
return -ENODEV;

-   internals = dev->data->dev_private;
packet_size = internals->packet_size;

internals->rx_null_queues[rx_queue_id].mb_pool = mb_pool;
@@ -246,10 +255,11 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
if (dev == NULL)
return -EINVAL;

-   if (tx_queue_id != 0)
+   internals = dev->data->dev_private;
+
+   if (tx_queue_id >= internals->nb_tx_queues)
return -ENODEV;

-   internals = dev->data->dev_private;
packet_size = internals->packet_size;

dev->data->tx_queues[tx_queue_id] =
@@ -279,8 +289,8 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->driver_name = drivername;
dev_info->max_mac_addrs = 1;
dev_info->max_rx_pktlen = (uint32_t)-1;
-   dev_info->max_rx_queues = (uint16_t)internals->nb_rx_queues;
-   dev_info->max_tx_queues = (uint16_t)internals->nb_tx_queues;
+   dev_info->max_rx_queues = RTE_DIM(internals->rx_null_queues);
+   dev_info->max_tx_queues = RTE_DIM(internals->tx_null_queues);
dev_info->min_rx_bufsize = 0;
dev_info->pci_dev = NULL;
 }
-- 
1.7.9.5



[dpdk-dev] [PATCH v5 2/9] null: fix segfault when null_pmd added to bonding

2015-09-30 Thread Tomasz Kulasek
This patch initializes the eth_dev->link_intr_cbs queue, which is used when a
null PMD is added to a bonding device.

v5 changes:
 - removed unnecessary malloc for eth_driver (rte_null_pmd)

Signed-off-by: Tomasz Kulasek 
---
 drivers/net/null/rte_eth_null.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index e244595..c748101 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -432,6 +432,7 @@ eth_dev_null_create(const char *name,
internals->numa_node = numa_node;

pci_dev->numa_node = numa_node;
+   pci_dev->driver = &rte_null_pmd.pci_drv;

data->dev_private = internals;
data->port_id = eth_dev->data->port_id;
@@ -445,6 +446,7 @@ eth_dev_null_create(const char *name,
eth_dev->dev_ops = &ops;
eth_dev->pci_dev = pci_dev;
eth_dev->driver = &rte_null_pmd;
+   TAILQ_INIT(&eth_dev->link_intr_cbs);

/* finally assign rx and tx ops */
if (packet_copy) {
-- 
1.7.9.5
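
Why a missing TAILQ_INIT can end in a segfault is easy to reproduce with the
plain <sys/queue.h> macros. In the classic BSD implementation, a tail insert
writes through the head's tqh_last pointer, and in a zero-filled head that
pointer is NULL. The sketch below is a standalone model, not DPDK code:
struct fake_dev, struct lsc_callback and the fake_dev_*() helpers are
hypothetical names that only mimic a device with a link_intr_cbs list.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical callback record, loosely modelled on a link-status callback. */
struct lsc_callback {
	TAILQ_ENTRY(lsc_callback) next;
	int event_id;
};

TAILQ_HEAD(lsc_cb_list, lsc_callback);

/* Hypothetical device holding a callback list, like eth_dev->link_intr_cbs. */
struct fake_dev {
	struct lsc_cb_list link_intr_cbs;
};

static void
fake_dev_init(struct fake_dev *dev)
{
	assert(dev != NULL);
	/*
	 * Points tqh_last at tqh_first. Without this call tqh_last is not a
	 * valid pointer and the first tail insert may crash.
	 */
	TAILQ_INIT(&dev->link_intr_cbs);
}

static void
fake_dev_callback_register(struct fake_dev *dev, struct lsc_callback *cb)
{
	/* Writes through dev->link_intr_cbs.tqh_last. */
	TAILQ_INSERT_TAIL(&dev->link_intr_cbs, cb, next);
}

static size_t
fake_dev_callback_count(struct fake_dev *dev)
{
	struct lsc_callback *cb;
	size_t n = 0;

	TAILQ_FOREACH(cb, &dev->link_intr_cbs, next)
		n++;
	return n;
}
```

Calling fake_dev_callback_register() on a device that skipped
fake_dev_init() is the analogue of registering a link status callback on a
null PMD port before this fix.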



[dpdk-dev] [PATCH v5 1/9] bonding: rss dynamic configuration

2015-09-30 Thread Tomasz Kulasek
The bonding device implements independent management of RSS settings. It
stores its own copies of the settings, i.e. RETA, RSS hash function and RSS
key. This is required to ensure consistency.

1) The RSS hash function set for the bonding device is the maximal set of RSS
hash functions supported by all bonded devices. That means that, to have RSS
support for bonding, all slaves should be RSS-capable.

2) The RSS key is propagated to the slaves "as is".

3) The RETA for bonding is an internal table managed by the bonding API, and is
used as a pattern to set up the slaves. Its size is the GCD of all slave RETA
sizes, so it can easily be used as a pattern providing the expected behavior,
even if the slaves' RETA sizes differ.

Signed-off-by: Tomasz Kulasek 
---
 drivers/net/bonding/rte_eth_bond_api.c |   28 
 drivers/net/bonding/rte_eth_bond_pmd.c |  205 ++--
 drivers/net/bonding/rte_eth_bond_private.h |   12 ++
 3 files changed, 231 insertions(+), 14 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 0681d1a..92073df 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -273,6 +273,9 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
internals->rx_offload_capa = 0;
internals->tx_offload_capa = 0;

+   /* Initially allow to choose any offload type */
+   internals->flow_type_rss_offloads = ETH_RSS_PROTO_MASK;
+
memset(internals->active_slaves, 0, sizeof(internals->active_slaves));
memset(internals->slaves, 0, sizeof(internals->slaves));

@@ -369,6 +372,11 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, uint8_t slave_port_id)

rte_eth_dev_info_get(slave_port_id, &dev_info);

+   /* We need to store slaves reta_size to be able to synchronize RETA for all
+* slave devices even if its sizes are different.
+*/
+   internals->slaves[internals->slave_count].reta_size = dev_info.reta_size;
+
if (internals->slave_count < 1) {
/* if MAC is not user defined then use MAC of first slave add to
 * bonded device */
@@ -382,9 +390,16 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, uint8_t slave_port_id)
/* Make primary slave */
internals->primary_port = slave_port_id;

+   /* Inherit queues settings from first slave */
+   internals->nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
+   internals->nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
+
+   internals->reta_size = dev_info.reta_size;
+
/* Take the first dev's offload capabilities */
internals->rx_offload_capa = dev_info.rx_offload_capa;
internals->tx_offload_capa = dev_info.tx_offload_capa;
+   internals->flow_type_rss_offloads = dev_info.flow_type_rss_offloads;

} else {
/* Check slave link properties are supported if props are set,
@@ -403,8 +418,19 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, uint8_t slave_port_id)
}
internals->rx_offload_capa &= dev_info.rx_offload_capa;
internals->tx_offload_capa &= dev_info.tx_offload_capa;
+   internals->flow_type_rss_offloads &= dev_info.flow_type_rss_offloads;
+
+   /* RETA size is GCD of all slaves RETA sizes, so, if all sizes will be
+* the power of 2, the lower one is GCD
+*/
+   if (internals->reta_size > dev_info.reta_size)
+   internals->reta_size = dev_info.reta_size;
+
}

+   bonded_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf &=
+   internals->flow_type_rss_offloads;
+
internals->slave_count++;

/* Update all slave devices MACs*/
@@ -531,6 +557,8 @@ __eth_bond_slave_remove_lock_free(uint8_t bonded_port_id, 
uint8_t slave_port_id)
if (internals->slave_count == 0) {
internals->rx_offload_capa = 0;
internals->tx_offload_capa = 0;
+   internals->flow_type_rss_offloads = ETH_RSS_PROTO_MASK;
+   internals->reta_size = 0;
}
return 0;
 }
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 5cc6372..2880f5c 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1310,6 +1310,23 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
if (slave_eth_dev->driver->pci_drv.drv_flags & RTE_PCI_DRV_INTR_LSC)
slave_eth_dev->data->dev_conf.intr_conf.lsc = 1;

+   /* If RSS is enabled for bonding, try to enable it for slaves  */
+   if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS) {
+   if 
(bonded_eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key_len
+   != 0) {
+   

[dpdk-dev] [PATCH v5 0/9] Dynamic RSS Configuration for Bonding

2015-09-30 Thread Tomasz Kulasek
OVERVIEW

1) Setting .rxmode.mq_mode for bonding device to ETH_MQ_RX_RSS makes bonding
device fully RSS-capable, so all slaves are synchronized with its configuration.
This mode is intended to provide RSS configuration as known from "dynamic RSS
configuration for one port" and to make slaves transparent to the client
application.

2) If .rxmode.mq_mode for bonding device isn't ETH_MQ_RX_RSS, slaves are not
synchronized. That provides an ability to configure them manually. This mode may
be useful when application wants to manage RSS in an unusual way and the
consistency of RSS configuration for slaves isn't required.

RSS mode cannot be turned on or off for slaves once the bonding device is
started. Other RSS configuration is propagated to the slaves when it is set
through the bonding device API.
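
As a rough illustration of how the bonded device's RSS capabilities follow
from its slaves (hash types are ANDed together, and the RETA size is the
smallest one because all sizes are assumed to be powers of two), here is a
minimal standalone sketch. The struct and function names are hypothetical
stand-ins, not the DPDK types or the code from this patchset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bonding fields involved; not a DPDK type. */
struct rss_caps {
	uint64_t flow_type_rss_offloads; /* bitmask of supported RSS hash types */
	uint16_t reta_size;              /* redirection table size, power of two */
};

/* Fold one more slave's capabilities into the bonded device's view. */
static void
bond_merge_slave_caps(struct rss_caps *bond, const struct rss_caps *slave,
		unsigned int slave_count)
{
	assert(bond != NULL && slave != NULL);

	if (slave_count == 0) {
		/* First slave: inherit its settings unchanged. */
		*bond = *slave;
		return;
	}

	/* Only hash types supported by every slave remain usable. */
	bond->flow_type_rss_offloads &= slave->flow_type_rss_offloads;

	/* For powers of two, the GCD of two sizes is the smaller one. */
	if (bond->reta_size > slave->reta_size)
		bond->reta_size = slave->reta_size;
}
```

Calling it once per added slave, with the number of slaves already attached,
gives the capability set that the bonded port could advertise under these
assumptions.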

v5 changes:
 - updated to DPDK 2.2
 - removed copyright change from null device source
 - removed queue_stats_mapping_set from eth_dev_ops of bonding device
 - null pmd cleanups (removed unnecessary malloc, replaced memcpy with
   rte_memcpy)
 - fixed queues number configuration in null pmd

v4 changes:
 - fixed copy-paste error,
 - removed example application as too complex and introducing a new
   dependency,
 - adapted null pmd to be used as testing device for dynamic RSS configuration,
 - adapted test units to use null pmd instead of ring pmd,
 - ring pmd is not used and changed in this patchset

v3 changes:
 - checkpatch cleanups

v2 changes:
 - added support for keys other than 40 bytes long,
 - now, if RSS key is not set for bonding, it is not set also for slaves,
 - fix - full initial RSS configuration before any slave is added was not
   possible due to the initially zeroed flow_type_rss_offloads for bonding,
 - fix - changed error to warning when slave is synchronizing due to the
   bonding's initial configuration (to allow using slave drivers that do not
   support dynamic RSS configuration in bonding),
 - some code cleanups,
 - updated documentation,

Tomasz Kulasek (9):
  bonding: rss dynamic configuration
  null: fix segfault when null_pmd added to bonding
  null: extend number of virtual queues
  null: virtual dynamic rss configuration
  null: export eth_dev_null_create
  test: dynamic rss configuration
  bonding: per queue stats
  doc: fixed spellings and typos
  doc: dynamic rss configuration for bonding

 app/test/Makefile  |8 +
 app/test/test_link_bonding_rssconf.c   |  679 
 .../prog_guide/link_bonding_poll_mode_drv_lib.rst  |   42 +-
 drivers/net/bonding/rte_eth_bond_api.c |   28 +
 drivers/net/bonding/rte_eth_bond_pmd.c |  216 ++-
 drivers/net/bonding/rte_eth_bond_private.h |   12 +
 drivers/net/null/Makefile  |2 +-
 drivers/net/null/rte_eth_null.c|  148 -
 drivers/net/null/rte_eth_null.h|   40 ++
 drivers/net/null/rte_pmd_null_version.map  |7 +
 10 files changed, 1150 insertions(+), 32 deletions(-)
 create mode 100644 app/test/test_link_bonding_rssconf.c
 create mode 100644 drivers/net/null/rte_eth_null.h

-- 
1.7.9.5



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 15:27, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 15:03, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
 On 09/30/15 14:41, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>   The whole idea is to bypass kernel. Especially for networking...
> ... on dumb hardware that doesn't support doing that securely.
 On a very capable HW that supports whatever security requirements needed
 (e.g. 82599 Intel's SR-IOV VF devices).
>>> Network card type is irrelevant as long as you do not have an IOMMU,
>>> otherwise you would just use e.g. VFIO.
>> Sorry, but I don't follow your logic here - Amazon EC2 environment is an
>> example where there *is* iommu but it's not virtualized and thus VFIO is
>> useless and there is an option to use directly assigned SR-IOV networking
>> device there where using the kernel drivers imposes a performance impact
>> compared to UIO-based user space kernel bypass mode of usage. How
>> is it irrelevant? Could u, pls, clarify your point?
>>
> So it's not even dumb hardware, it's another piece of software
> that forces an "all or nothing" approach where either
> device has access to all VM memory, or none.
> And this, unfortunately, leaves you with no secure way to
> allow userspace drivers.
UIO is not secure even today so what are we arguing about? ;)
Adding MSI/MSI-X support won't change this state, so, pls., discard the 
security argument unless u think that UIO is a completely secure piece of 
software today. In the latter case, could u, pls., clarify what would 
prevent the userspace program from configuring a DMA controller via 
registers and doing whatever it wants?


How does not virtualizing the iommu force an "all or nothing" approach? What 
is insecure in relying on HV to control the iommu and not letting the VF 
have any access to it?
As far as I see it - there isn't any security problem here at all. The 
only problem I see here is that dumb current uio_pci_generic 
implementation forces people to go and invent the workarounds instead of 
having a proper MSI/MSI-X support implemented. And as I've mentioned 
above it has nothing to do with security because there is no such thing 
as security (on the UIO driver level) when we talk about UIO - it has to 
be ensured by some other entity like HV.

>
> So it makes even less sense to add insecure work-arounds in the kernel.
> It seems quite likely that by the time the new kernel reaches
> production X years from now, EC2 will have a virtual iommu.

I'd bet that new kernel would reach production long before Amazon does 
that... ;)

>
>
> Colour me unimpressed.
>



[dpdk-dev] [PATCH v2] Move rte_mbuf macros to common header file

2015-09-30 Thread Aaron Conole
Ravi Kerur  writes:

> Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
> are defined in each PMD driver file. Move those macros into common
> lib/librte_mbuf/rte_mbuf.h file. PMD drivers include rte_mbuf.h
> file directly/indirectly hence no additional header file inclusion
> is necessary.
I think this should also mention that they are no longer macros.

<>
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -843,6 +843,16 @@ struct rte_mbuf {
>   uint16_t timesync;
>  } __rte_cache_aligned;
>  
> +static inline uint64_t RTE_MBUF_DATA_DMA_ADDR(struct rte_mbuf* mb)
> +{
> + return ((uint64_t)((mb)->buf_physaddr + (mb)->data_off));
> +}
> +
> +static inline uint64_t RTE_MBUF_DATA_DMA_ADDR_DEFAULT(struct rte_mbuf *mb)
> +{
> + return ((uint64_t)((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM));
> +}
> +
I think these names should be made lower case as well.
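
A minimal sketch of what lower-case inline helpers might look like. To keep
it standalone it uses a stand-in struct and a stand-in headroom constant
rather than the real rte_mbuf.h definitions, and the function names are only
a suggestion, not names taken from the patch:

```c
#include <stdint.h>

/* Stand-in for RTE_PKTMBUF_HEADROOM; the value is assumed for illustration. */
#define PKTMBUF_HEADROOM_SKETCH 128

/* Minimal stand-in for the two rte_mbuf fields the helpers read. */
struct mbuf_sketch {
	uint64_t buf_physaddr; /* physical address of the buffer start */
	uint16_t data_off;     /* offset of packet data within the buffer */
};

/* DMA address of the current data start (hypothetical lower-case name). */
static inline uint64_t
mbuf_data_dma_addr(const struct mbuf_sketch *mb)
{
	return mb->buf_physaddr + mb->data_off;
}

/* DMA address assuming the default headroom (hypothetical lower-case name). */
static inline uint64_t
mbuf_data_dma_addr_default(const struct mbuf_sketch *mb)
{
	return mb->buf_physaddr + PKTMBUF_HEADROOM_SKETCH;
}
```

Taking a const pointer also lets callers that only hold a read-only mbuf use
the helpers, which the macro form allowed implicitly.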


[dpdk-dev] [PATCH] ip_pipeline: fixed bug in app_link_config

2015-09-30 Thread Dumitrescu, Cristian


> -Original Message-
> From: Singh, Jasvinder
> Sent: Wednesday, September 30, 2015 4:26 PM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian
> Subject: [PATCH] ip_pipeline: fixed bug in app_link_config
> 
> This patch fixes bug in app_link_config.
> 
> Signed-off-by: Jasvinder Singh 
> ---

Acked-by: Cristian Dumitrescu 



[dpdk-dev] [PATCH] ip_pipeline: add more functions to routing-pipeline

2015-09-30 Thread Dumitrescu, Cristian


> -Original Message-
> From: Singh, Jasvinder
> Sent: Wednesday, September 30, 2015 4:21 PM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian
> Subject: [PATCH] ip_pipeline: add more functions to routing-pipeline
> 
> This patch adds the following features to the
> routing-pipeline to enable it for various NFV
> use-cases:
> 
> 1. Fast-path ARP table enable/disable
> 2. Double-tagged VLAN (Q-in-Q) packet encapsulation
>    for the next-hop
> 3. MPLS encapsulation for the next-hop
> 4. Add colour (Traffic-class for QoS) to the MPLS tag
> 5. Classification action to select the input queue
>    of the hierarchical scheduler (QoS)
> 

Acked-by: Cristian Dumitrescu 



[dpdk-dev] [PATCH v1 5/5] config: add build files for performance-thread

2015-09-30 Thread ibetts
From: Ian Betts 

This commit adds the build controls for the
performance-thread sample

Signed-off-by: Ian Betts 
---
 config/common_linuxapp   |  6 
 config/defconfig_i686-native-linuxapp-gcc|  6 
 config/defconfig_i686-native-linuxapp-icc|  6 
 config/defconfig_x86_x32-native-linuxapp-gcc |  6 
 examples/Makefile|  1 +
 examples/performance-thread/Makefile | 44 
 6 files changed, 69 insertions(+)
 create mode 100644 examples/performance-thread/Makefile

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..e1da564 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -466,3 +466,8 @@ CONFIG_RTE_APP_TEST=y
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the performance thread sample application
+#
+CONFIG_RTE_PERFORMANCE_THREAD=y
diff --git a/config/defconfig_i686-native-linuxapp-gcc 
b/config/defconfig_i686-native-linuxapp-gcc
index a90de9b..67caa06 100644
--- a/config/defconfig_i686-native-linuxapp-gcc
+++ b/config/defconfig_i686-native-linuxapp-gcc
@@ -49,3 +49,8 @@ CONFIG_RTE_LIBRTE_KNI=n
 # Vectorized PMD is not supported on 32-bit
 #
 CONFIG_RTE_IXGBE_INC_VECTOR=n
+
+#
+# Performance thread sample application is not supported on 32-bit
+#
+CONFIG_RTE_PERFORMANCE_THREAD=n
diff --git a/config/defconfig_i686-native-linuxapp-icc 
b/config/defconfig_i686-native-linuxapp-icc
index c021321..2610b3f 100644
--- a/config/defconfig_i686-native-linuxapp-icc
+++ b/config/defconfig_i686-native-linuxapp-icc
@@ -49,3 +49,8 @@ CONFIG_RTE_LIBRTE_KNI=n
 # Vectorized PMD is not supported on 32-bit
 #
 CONFIG_RTE_IXGBE_INC_VECTOR=n
+
+#
+# Performance thread sample application is not supported on 32-bit
+#
+CONFIG_RTE_PERFORMANCE_THREAD=n
diff --git a/config/defconfig_x86_x32-native-linuxapp-gcc 
b/config/defconfig_x86_x32-native-linuxapp-gcc
index fb0afc4..6e3f42a 100644
--- a/config/defconfig_x86_x32-native-linuxapp-gcc
+++ b/config/defconfig_x86_x32-native-linuxapp-gcc
@@ -44,3 +44,8 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
 # KNI is not supported on 32-bit
 #
 CONFIG_RTE_LIBRTE_KNI=n
+
+#
+# Performance thread sample application is not supported on 32-bit
+#
+CONFIG_RTE_PERFORMANCE_THREAD=n
diff --git a/examples/Makefile b/examples/Makefile
index b4eddbd..9dffac4 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -74,5 +74,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
 DIRS-y += vmdq
 DIRS-y += vmdq_dcb
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += vm_power_manager
+DIRS-$(CONFIG_RTE_PERFORMANCE_THREAD) += performance-thread

 include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/performance-thread/Makefile 
b/examples/performance-thread/Makefile
new file mode 100644
index 000..a9c75f7
--- /dev/null
+++ b/examples/performance-thread/Makefile
@@ -0,0 +1,44 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+DIRS-$(CONFIG_RTE_PERFORMANCE_THREAD) += l3fwd-thread
+DIRS-$(CONFIG_RTE_PERFORMANCE_THREAD) += pthread_shim
+
+include 

[dpdk-dev] [PATCH v1 4/5] examples: add pthread-shim in performance-thread sample app

2015-09-30 Thread ibetts
From: Ian Betts 

This commit adds a simple pthread_shim example for the
cooperative scheduler included with this patchset.

The shim demonstrates a way in which legacy code written for
pthreads could be adapted to lightweight threads.

Signed-off-by: Ian Betts 
---
 examples/performance-thread/pthread_shim/Makefile  |  61 ++
 examples/performance-thread/pthread_shim/main.c| 287 +
 .../performance-thread/pthread_shim/pthread_shim.c | 717 +
 .../performance-thread/pthread_shim/pthread_shim.h | 113 
 4 files changed, 1178 insertions(+)
 create mode 100644 examples/performance-thread/pthread_shim/Makefile
 create mode 100644 examples/performance-thread/pthread_shim/main.c
 create mode 100644 examples/performance-thread/pthread_shim/pthread_shim.c
 create mode 100644 examples/performance-thread/pthread_shim/pthread_shim.h

diff --git a/examples/performance-thread/pthread_shim/Makefile 
b/examples/performance-thread/pthread_shim/Makefile
new file mode 100644
index 000..953dc42
--- /dev/null
+++ b/examples/performance-thread/pthread_shim/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = lthread_pthread_shim
+
+# all source are stored in SRCS-y
+SRCS-y := main.c  pthread_shim.c
+INCLUDES := -I$(RTE_SDK)/$(RTE_TARGET)/include -I$(SRCDIR)
+include $(RTE_SDK)/examples/performance-thread/common/common.mk
+
+CFLAGS=-g -O3 $(USER_FLAGS) $(INCLUDES)
+CFLAGS += $(WERROR_FLAGS)
+
+LDFLAGS += -lpthread
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/performance-thread/pthread_shim/main.c 
b/examples/performance-thread/pthread_shim/main.c
new file mode 100644
index 000..afc12c8
--- /dev/null
+++ b/examples/performance-thread/pthread_shim/main.c
@@ -0,0 +1,284 @@
+
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE 

[dpdk-dev] [PATCH v1 3/5] examples: add l3fwd-thread in performance-thread sample app

2015-09-30 Thread ibetts
From: Ian Betts 

This commit adds an l3fwd derivative that allows multiple
EAL threads to be run on a single physical core or multiple
lightweight threads to be run in an EAL thread.

Its purpose is to facilitate characterization of performance
with different threading models.

It depends on a simple cooperative scheduler included in
this patchset.

Signed-off-by: Ian Betts 
---
 examples/performance-thread/l3fwd-thread/Makefile |   57 +
 examples/performance-thread/l3fwd-thread/main.c   | 3355 +
 2 files changed, 3412 insertions(+)
 create mode 100644 examples/performance-thread/l3fwd-thread/Makefile
 create mode 100644 examples/performance-thread/l3fwd-thread/main.c

diff --git a/examples/performance-thread/l3fwd-thread/Makefile 
b/examples/performance-thread/l3fwd-thread/Makefile
new file mode 100644
index 000..d8fe5e6
--- /dev/null
+++ b/examples/performance-thread/l3fwd-thread/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = l3fwd-thread
+
+# all source are stored in SRCS-y
+SRCS-y := main.c
+
+include $(RTE_SDK)/examples/performance-thread/common/common.mk
+
+CFLAGS += -O3 -g $(USER_FLAGS) $(INCLUDES) $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+#ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+#endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/performance-thread/l3fwd-thread/main.c 
b/examples/performance-thread/l3fwd-thread/main.c
new file mode 100644
index 000..2708ec6
--- /dev/null
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -0,0 +1,3355 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * 

[dpdk-dev] [PATCH v1 2/5] examples: add cooperative scheduler subsystem for performance-thread app

2015-09-30 Thread ibetts
From: Ian Betts 

This commit adds a cooperative scheduler subsystem in the
performance-thread sample application.

It is used by the l3fwd-thread application to enable multiple
lightweight threads to be run in an EAL thread.

Signed-off-by: Ian Betts 
---
 .../performance-thread/common/arch/x86/atomic.h|  60 ++
 examples/performance-thread/common/arch/x86/ctx.c  |  66 ++
 examples/performance-thread/common/arch/x86/ctx.h  |  57 ++
 examples/performance-thread/common/common.mk   |  40 +
 examples/performance-thread/common/lthread.c   | 528 +
 examples/performance-thread/common/lthread.h   |  99 +++
 examples/performance-thread/common/lthread_api.h   | 822 +
 examples/performance-thread/common/lthread_cond.c  | 228 ++
 examples/performance-thread/common/lthread_cond.h  |  77 ++
 examples/performance-thread/common/lthread_diag.c  | 315 
 examples/performance-thread/common/lthread_diag.h  | 129 
 .../performance-thread/common/lthread_diag_api.h   | 295 
 examples/performance-thread/common/lthread_int.h   | 212 ++
 examples/performance-thread/common/lthread_mutex.c | 244 ++
 examples/performance-thread/common/lthread_mutex.h |  52 ++
 .../performance-thread/common/lthread_objcache.h   | 160 
 examples/performance-thread/common/lthread_pool.h  | 338 +
 examples/performance-thread/common/lthread_queue.h | 303 
 examples/performance-thread/common/lthread_sched.c | 644 
 examples/performance-thread/common/lthread_sched.h | 152 
 examples/performance-thread/common/lthread_timer.h |  47 ++
 examples/performance-thread/common/lthread_tls.c   | 242 ++
 examples/performance-thread/common/lthread_tls.h   |  64 ++
 23 files changed, 5174 insertions(+)
 create mode 100644 examples/performance-thread/common/arch/x86/atomic.h
 create mode 100644 examples/performance-thread/common/arch/x86/ctx.c
 create mode 100644 examples/performance-thread/common/arch/x86/ctx.h
 create mode 100644 examples/performance-thread/common/common.mk
 create mode 100644 examples/performance-thread/common/lthread.c
 create mode 100644 examples/performance-thread/common/lthread.h
 create mode 100644 examples/performance-thread/common/lthread_api.h
 create mode 100644 examples/performance-thread/common/lthread_cond.c
 create mode 100644 examples/performance-thread/common/lthread_cond.h
 create mode 100644 examples/performance-thread/common/lthread_diag.c
 create mode 100644 examples/performance-thread/common/lthread_diag.h
 create mode 100644 examples/performance-thread/common/lthread_diag_api.h
 create mode 100644 examples/performance-thread/common/lthread_int.h
 create mode 100644 examples/performance-thread/common/lthread_mutex.c
 create mode 100644 examples/performance-thread/common/lthread_mutex.h
 create mode 100644 examples/performance-thread/common/lthread_objcache.h
 create mode 100644 examples/performance-thread/common/lthread_pool.h
 create mode 100644 examples/performance-thread/common/lthread_queue.h
 create mode 100644 examples/performance-thread/common/lthread_sched.c
 create mode 100644 examples/performance-thread/common/lthread_sched.h
 create mode 100644 examples/performance-thread/common/lthread_timer.h
 create mode 100644 examples/performance-thread/common/lthread_tls.c
 create mode 100644 examples/performance-thread/common/lthread_tls.h

diff --git a/examples/performance-thread/common/arch/x86/atomic.h 
b/examples/performance-thread/common/arch/x86/atomic.h
new file mode 100644
index 000..b1fa703
--- /dev/null
+++ b/examples/performance-thread/common/arch/x86/atomic.h
@@ -0,0 +1,59 @@
+/*
+ *-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ 

[dpdk-dev] [PATCH v1 1/5] doc: add performance-thread sample application guide

2015-09-30 Thread ibetts
From: Ian Betts 

This commit adds documentation for the performance-thread
sample application.

Signed-off-by: Ian Betts 
---
 doc/guides/rel_notes/release_2_2.rst|6 +
 doc/guides/sample_app_ug/index.rst  |1 +
 doc/guides/sample_app_ug/performance_thread.rst | 1221 +++
 3 files changed, 1228 insertions(+)
 create mode 100644 doc/guides/sample_app_ug/performance_thread.rst

diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index 5687676..e9772d3 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -52,6 +52,12 @@ Libraries
 Examples
 

+* **examples: Introducing a performance thread example**
+
+  This is an l3fwd derivative intended to enable characterization of performance
+  with different threading models, including multiple EAL threads per physical
+  core, and multiple lightweight threads running in an EAL thread.
+  The example includes a simple cooperative scheduler.

 Other
 ~
diff --git a/doc/guides/sample_app_ug/index.rst 
b/doc/guides/sample_app_ug/index.rst
index 9beedd9..70d4a5c 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -73,6 +73,7 @@ Sample Applications User Guide
 vm_power_management
 tep_termination
 proc_info
+performance_thread

 **Figures**

diff --git a/doc/guides/sample_app_ug/performance_thread.rst 
b/doc/guides/sample_app_ug/performance_thread.rst
new file mode 100644
index 000..497d729
--- /dev/null
+++ b/doc/guides/sample_app_ug/performance_thread.rst
@@ -0,0 +1,1220 @@
+..  BSD LICENSE
+Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Re-distributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+Performance Thread Sample Application
+=====================================
+
+The performance thread sample application is a derivative of the standard L3
+forwarding application that demonstrates different threading models.
+
+Overview
+--------
+For a general description of the L3 forwarding application's capabilities
+please refer to the documentation of the standard application in
+:doc:`l3_forward`.
+
+The performance thread sample application differs from the standard L3 forward
+example in that it divides the TX and RX processing between different threads,
+and makes it possible to assign individual threads to different cores.
+
+Three threading models are considered:
+
+#.  When there is one EAL thread per physical core
+#.  When there are multiple EAL threads per physical core
+#.  When there are multiple lightweight threads per EAL thread
+
+Since DPDK release 2.0 it is possible to launch applications using the
+``--lcores`` EAL parameter, specifying CPU sets for a physical core. With the
+performance thread sample application it is now also possible to assign
+individual RX and TX functions to different cores.
+
+As an alternative to dividing the L3 forwarding work between different EAL
+threads the performance thread sample introduces the possibility to run the
+application threads as lightweight threads (L-threads) within one or
+more EAL threads.
+
+In order to facilitate this threading model the example includes a primitive
+cooperative scheduler (L-thread) subsystem. More details of the L-thread
+subsystem can be found in :ref:`lthread_subsystem`.
+
+**Note:** Whilst theoretically possible it is not anticipated that multiple
+L-thread schedulers would 

[dpdk-dev] [PATCH v1 0/5] examples: add performance thread example

2015-09-30 Thread ibetts
From: Ian Betts 

Performance thread example application

This example comprises a layer 3 forwarding derivative intended to
facilitate characterization of performance with different 
threading models, specifically:

1. EAL threads running on different physical cores
2. EAL threads running on the same physical core
3. Lightweight threads running in an EAL thread

Purpose and justification

Since DPDK 2.0 it has been possible to assign multiple EAL threads to 
a physical core (case 2 above).
Currently no example application has focused on demonstrating the 
performance constraints of differing threading models.

Whilst purpose built applications that fully comprehend the DPDK 
single threaded programming model will always yield superior 
performance, the desire to preserve ROI in legacy code written for 
multithreaded operating environments makes lightweight threads (case 
3 above) worthy of consideration.

As well as aiding with legacy code reuse, it is anticipated that 
lightweight threads will make it possible to scale a multithreaded 
application with fine granularity, allowing an application to more 
easily take advantage of headroom on EAL cores, or conversely occupy 
more cores, as dictated by system load.

To explore performance with lightweight threads a simple cooperative 
scheduler subsystem is being included in this example application.
If the expected benefits and use cases prove to be of value, it is 
anticipated that this lightweight thread subsystem would become a 
library in some future DPDK release.
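
To illustrate what "cooperative" means here, the following is a minimal
sketch and not the lthread code from this series. It assumes POSIX
ucontext is available and uses it as a stand-in for the context switch
that the patch implements itself; all names (task_yield, run_scheduler,
NB_TASKS and so on) are made up for the illustration. Two tasks share one
OS thread and hand control back to a round-robin scheduler by yielding
explicitly.

```c
#include <assert.h>
#include <ucontext.h>

#define NB_TASKS   2
#define NB_STEPS   3
#define STACK_SIZE (64 * 1024)
#define TRACE_MAX  16

static ucontext_t sched_ctx;              /* context of the scheduler loop */
static ucontext_t task_ctx[NB_TASKS];     /* one saved context per task */
static char task_stack[NB_TASKS][STACK_SIZE];
static int task_done[NB_TASKS];
static int current;                       /* index of the running task */
static int trace[TRACE_MAX];              /* order in which tasks ran */
static int trace_len;

/* Hand the CPU back to the scheduler; the task resumes here later. */
static void task_yield(void)
{
	swapcontext(&task_ctx[current], &sched_ctx);
}

/* Body shared by all tasks: record a step, then yield voluntarily. */
static void task_main(void)
{
	int id = current;

	for (int step = 0; step < NB_STEPS; step++) {
		assert(trace_len < TRACE_MAX);
		trace[trace_len++] = id;
		task_yield();
	}
	task_done[id] = 1;
	/* Returning resumes uc_link, i.e. the scheduler context. */
}

/* Run all tasks round-robin to completion; returns the trace length. */
static int run_scheduler(void)
{
	int remaining = NB_TASKS;

	trace_len = 0;
	for (int i = 0; i < NB_TASKS; i++) {
		task_done[i] = 0;
		getcontext(&task_ctx[i]);
		task_ctx[i].uc_stack.ss_sp = task_stack[i];
		task_ctx[i].uc_stack.ss_size = STACK_SIZE;
		task_ctx[i].uc_link = &sched_ctx;
		makecontext(&task_ctx[i], task_main, 0);
	}

	while (remaining > 0) {
		for (int i = 0; i < NB_TASKS; i++) {
			if (task_done[i])
				continue;
			current = i;
			/* Switch into the task until it yields or returns. */
			swapcontext(&sched_ctx, &task_ctx[i]);
			if (task_done[i])
				remaining--;
		}
	}
	return trace_len;
}
```

Calling run_scheduler() from main() interleaves the two tasks strictly in
the order they yield. Nothing is preempted, which is the property a
cooperative scheduler trades against fairness.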

A simple pthread shim in the form of a hello world example is also
included.


Ian Betts (5):
  doc: add performance-thread sample application guide
  examples: add cooperative scheduler subsytem for performance-thread
app
  examples: add l3fwd-thread in performance-thread sample app
  examples: add pthread-shim in performance-thread sample app
  config: add build files for performance-thread-app

 config/common_linuxapp |6 +
 config/defconfig_i686-native-linuxapp-gcc  |6 +
 config/defconfig_i686-native-linuxapp-icc  |6 +
 config/defconfig_x86_x32-native-linuxapp-gcc   |6 +
 doc/guides/rel_notes/release_2_2.rst   |6 +
 doc/guides/sample_app_ug/index.rst |1 +
 doc/guides/sample_app_ug/performance_thread.rst| 1221 +++
 examples/Makefile  |1 +
 .../performance-thread/common/arch/x86/atomic.h|   60 +
 examples/performance-thread/common/arch/x86/ctx.c  |   66 +
 examples/performance-thread/common/arch/x86/ctx.h  |   57 +
 examples/performance-thread/common/common.mk   |   40 +
 examples/performance-thread/common/lthread.c   |  528 +++
 examples/performance-thread/common/lthread.h   |   99 +
 examples/performance-thread/common/lthread_api.h   |  822 +
 examples/performance-thread/common/lthread_cond.c  |  228 ++
 examples/performance-thread/common/lthread_cond.h  |   77 +
 examples/performance-thread/common/lthread_diag.c  |  315 ++
 examples/performance-thread/common/lthread_diag.h  |  129 +
 .../performance-thread/common/lthread_diag_api.h   |  295 ++
 examples/performance-thread/common/lthread_int.h   |  212 ++
 examples/performance-thread/common/lthread_mutex.c |  244 ++
 examples/performance-thread/common/lthread_mutex.h |   52 +
 .../performance-thread/common/lthread_objcache.h   |  160 +
 examples/performance-thread/common/lthread_pool.h  |  338 ++
 examples/performance-thread/common/lthread_queue.h |  303 ++
 examples/performance-thread/common/lthread_sched.c |  644 
 examples/performance-thread/common/lthread_sched.h |  152 +
 examples/performance-thread/common/lthread_timer.h |   47 +
 examples/performance-thread/common/lthread_tls.c   |  242 ++
 examples/performance-thread/common/lthread_tls.h   |   64 +
 examples/performance-thread/l3fwd-thread/Makefile  |   57 +
 examples/performance-thread/l3fwd-thread/main.c| 3355 
 examples/performance-thread/pthread_shim/Makefile  |   61 +
 examples/performance-thread/pthread_shim/main.c|  287 ++
 .../performance-thread/pthread_shim/pthread_shim.c |  717 +
 .../performance-thread/pthread_shim/pthread_shim.h |  113 +
 37 files changed, 11017 insertions(+)
 create mode 100644 doc/guides/sample_app_ug/performance_thread.rst
 create mode 100644 examples/performance-thread/common/arch/x86/atomic.h
 create mode 100644 examples/performance-thread/common/arch/x86/ctx.c
 create mode 100644 examples/performance-thread/common/arch/x86/ctx.h
 create mode 100644 examples/performance-thread/common/common.mk
 create mode 100644 examples/performance-thread/common/lthread.c
 create mode 100644 examples/performance-thread/common/lthread.h
 create mode 100644 examples/performance-thread/common/lthread_api.h
 create mode 100644 examples/performance-thread/common/lthread_cond.c
 create mode 100644 

[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-09-30 Thread Stephen Hemminger
This driver allows using a PCI device with Message Signalled Interrupts
(MSI-X) from userspace. The API is similar to the igb_uio driver used by the DPDK.
Via ioctl it provides a mechanism to map MSI-X interrupts into event
file descriptors, similar to VFIO.
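
As a rough userspace illustration of the eventfd side of this (a sketch
only, assuming Linux and glibc): the helper names below are hypothetical,
and the ioctl that hands the eventfd to the driver is deliberately left
out because its definition in uio_msi.h is not shown here. A plain
write() stands in for the eventfd_signal() call made by the interrupt
handler.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Block until the eventfd is signalled. Returns the number of signals
 * accumulated since the last read, or -1 on error.
 */
static int64_t wait_for_irq(int efd)
{
	uint64_t count;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}

/*
 * Stand-in for the kernel side. In the driver the interrupt handler
 * calls eventfd_signal(trigger, 1); here write() adds 1 to the counter.
 */
static int fake_irq(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Create an eventfd, fire n simulated interrupts, then read them back. */
static int64_t demo_irq_roundtrip(int n)
{
	int efd;
	int64_t seen;

	assert(n > 0);
	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	/*
	 * With the real driver, efd would be passed to the kernel at this
	 * point through the driver's ioctl (not reproduced in this sketch).
	 */
	for (int i = 0; i < n; i++) {
		if (fake_irq(efd) < 0) {
			close(efd);
			return -1;
		}
	}

	seen = wait_for_irq(efd);
	close(efd);
	return seen;
}
```

Because the eventfd is not opened in semaphore mode, a single read()
returns the whole accumulated count and resets it, so several interrupts
that arrive between reads are reported together.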

VFIO is a better choice if IOMMU is available, but often userspace drivers
have to work in environments where IOMMU support (real or emulated) is
not available.  No UIO driver that supports DMA is secure against
rogue userspace applications programming DMA hardware to access
private memory; this driver is no less secure than existing code.

Signed-off-by: Stephen Hemminger 
---
 drivers/uio/Kconfig  |   9 ++
 drivers/uio/Makefile |   1 +
 drivers/uio/uio_msi.c| 378 +++
 include/uapi/linux/Kbuild|   1 +
 include/uapi/linux/uio_msi.h |  22 +++
 5 files changed, 411 insertions(+)
 create mode 100644 drivers/uio/uio_msi.c
 create mode 100644 include/uapi/linux/uio_msi.h

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 52c98ce..04adfa0 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -93,6 +93,15 @@ config UIO_PCI_GENERIC
  primarily, for virtualization scenarios.
  If you compile this as a module, it will be called uio_pci_generic.

+config UIO_PCI_MSI
+   tristate "Generic driver supporting MSI-x on PCI Express cards"
+   depends on PCI
+   help
+ Generic driver that provides Message Signalled IRQ events
+ similar to VFIO. If IOMMU is available please use VFIO
+ instead since it provides more security.
+ If you compile this as a module, it will be called uio_msi.
+
 config UIO_NETX
tristate "Hilscher NetX Card driver"
depends on PCI
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index 8560dad..62fc44b 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_UIO_NETX)  += uio_netx.o
 obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
 obj-$(CONFIG_UIO_MF624) += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)+= uio_fsl_elbc_gpcm.o
+obj-$(CONFIG_UIO_PCI_MSI)  += uio_msi.o
diff --git a/drivers/uio/uio_msi.c b/drivers/uio/uio_msi.c
new file mode 100644
index 000..802b5c4
--- /dev/null
+++ b/drivers/uio/uio_msi.c
@@ -0,0 +1,378 @@
+/*-
+ *
+ * Copyright (c) 2015 by Brocade Communications Systems, Inc.
+ * Author: Stephen Hemminger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 only.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION "0.1.1"
+#define MAX_MSIX_VECTORS   64
+
+/* MSI-X vector information */
+struct uio_msi_pci_dev {
+   struct uio_info info;   /* UIO driver info */
+   struct pci_dev *pdev;   /* PCI device */
+   struct mutexmutex;  /* open/release/ioctl mutex */
+   int ref_cnt;/* references to device */
+   unsigned intmax_vectors;/* MSI-X slots available */
+   struct msix_entry *msix;/* MSI-X vector table */
+   struct uio_msi_irq_ctx {
+   struct eventfd_ctx *trigger; /* vector to eventfd */
+   char *name; /* name in /proc/interrupts */
+   } *ctx;
+};
+
+static irqreturn_t uio_intx_irqhandler(int irq, void *arg)
+{
+   struct uio_msi_pci_dev *udev = arg;
+
+   if (pci_check_and_mask_intx(udev->pdev)) {
+   eventfd_signal(udev->ctx->trigger, 1);
+   return IRQ_HANDLED;
+   }
+
+   return IRQ_NONE;
+}
+
+static irqreturn_t uio_msi_irqhandler(int irq, void *arg)
+{
+   struct eventfd_ctx *trigger = arg;
+
+   eventfd_signal(trigger, 1);
+   return IRQ_HANDLED;
+}
+
+/* set the mapping between vector # and existing eventfd. */
+static int set_irq_eventfd(struct uio_msi_pci_dev *udev, u32 vec, int fd)
+{
+   struct eventfd_ctx *trigger;
+   int irq, err;
+
+   if (vec >= udev->max_vectors) {
+   dev_notice(&udev->pdev->dev, "vec %u >= num_vec %u\n",
+  vec, udev->max_vectors);
+   return -ERANGE;
+   }
+
+   irq = udev->msix[vec].vector;
+   trigger = udev->ctx[vec].trigger;
+   if (trigger) {
+   /* Clean up existing irq mapping */
+   free_irq(irq, trigger);
+   eventfd_ctx_put(trigger);
+   udev->ctx[vec].trigger = NULL;
+   }
+
+   /* Passing -1 is used to disable interrupt */
+   if (fd < 0)
+   return 0;
+
+   trigger = eventfd_ctx_fdget(fd);
+   if (IS_ERR(trigger)) {
+   err = PTR_ERR(trigger);
+   dev_notice(&udev->pdev->dev,
+  "eventfd ctx get failed: %d\n", err);
+   return err;
+   }
+
+   if (udev->msix)
+   err = request_irq(irq, uio_msi_irqhandler, 0,
+

[dpdk-dev] [PATCH 1/2] uio: add support for ioctls

2015-09-30 Thread Stephen Hemminger
Allow a UIO device driver to provide an ioctl interface.
This allows additional APIs for UIO.
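
The dispatch this adds can be modelled in plain userspace C. The sketch
below is an illustration only: struct uio_info_sketch, SKETCH_CMD_ECHO and
sketch_ioctl are hypothetical stand-ins, not kernel definitions. Only the
order of the checks and the error codes follow the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's struct uio_info, reduced to the new hook. */
struct uio_info_sketch {
	int (*ioctl)(struct uio_info_sketch *info, unsigned int cmd,
		     unsigned long arg);
};

/* Same checks as uio_ioctl(): no info gives -EIO, no hook gives -ENOTTY. */
static long dispatch_ioctl(struct uio_info_sketch *info, unsigned int cmd,
			   unsigned long arg)
{
	if (!info)
		return -EIO;

	if (!info->ioctl)
		return -ENOTTY;

	return info->ioctl(info, cmd, arg);
}

/* Hypothetical command number used only by this sketch. */
#define SKETCH_CMD_ECHO 1u

/* Hypothetical driver hook: echo arg for the one command it knows. */
static int sketch_ioctl(struct uio_info_sketch *info, unsigned int cmd,
			unsigned long arg)
{
	assert(info != NULL);

	if (cmd != SKETCH_CMD_ECHO)
		return -EINVAL;
	return (int)arg;
}
```

A driver that leaves the hook unset keeps today's behaviour from the
caller's point of view, since -ENOTTY is the conventional answer for an
unsupported ioctl.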

Signed-off-by: Stephen Hemminger 
---
 drivers/uio/uio.c  | 15 +++
 include/linux/uio_driver.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 8196581..5ab32ab 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -576,6 +576,20 @@ static ssize_t uio_write(struct file *filep, const char __user *buf,
return retval ? retval : sizeof(s32);
 }

+static long uio_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
+{
+   struct uio_listener *listener = filep->private_data;
+   struct uio_device *idev = listener->dev;
+
+   if (!idev->info)
+   return -EIO;
+
+   if (!idev->info->ioctl)
+   return -ENOTTY;
+
+   return idev->info->ioctl(idev->info, cmd, arg);
+}
+
 static int uio_find_mem_index(struct vm_area_struct *vma)
 {
struct uio_device *idev = vma->vm_private_data;
@@ -712,6 +726,7 @@ static const struct file_operations uio_fops = {
.write  = uio_write,
.mmap   = uio_mmap,
.poll   = uio_poll,
+   .unlocked_ioctl = uio_ioctl,
.fasync = uio_fasync,
.llseek = noop_llseek,
 };
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 32c0e83..10d7833 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -89,6 +89,7 @@ struct uio_device {
  * @mmap:  mmap operation for this uio device
  * @open:  open operation for this uio device
  * @release:   release operation for this uio device
+ * @ioctl: ioctl handler
  * @irqcontrol:disable/enable irqs when 0/1 is written to /dev/uioX
  */
 struct uio_info {
@@ -105,6 +106,8 @@ struct uio_info {
int (*open)(struct uio_info *info, struct inode *inode);
int (*release)(struct uio_info *info, struct inode *inode);
int (*irqcontrol)(struct uio_info *info, s32 irq_on);
+   int (*ioctl)(struct uio_info *info, unsigned int cmd,
+unsigned long arg);
 };

 extern int __must_check
-- 
2.1.4



[dpdk-dev] [PATCH 0/2] uio_msi: device driver

2015-09-30 Thread Stephen Hemminger
This is a new UIO device driver to allow supporting MSI-X and MSI devices
in userspace.  It has been used in environments like VMware and older versions
of QEMU/KVM where no IOMMU support is available.

Stephen Hemminger (2):
  uio: add support for ioctls
  uio: new driver to support PCI MSI-X

 drivers/uio/Kconfig  |   9 ++
 drivers/uio/Makefile |   1 +
 drivers/uio/uio.c|  15 ++
 drivers/uio/uio_msi.c| 378 +++
 include/linux/uio_driver.h   |   3 +
 include/uapi/linux/Kbuild|   1 +
 include/uapi/linux/uio_msi.h |  22 +++
 7 files changed, 429 insertions(+)
 create mode 100644 drivers/uio/uio_msi.c
 create mode 100644 include/uapi/linux/uio_msi.h

-- 
2.1.4



[dpdk-dev] [PATCH 3/3] fm10k: add VMDQ support in multi-queue configure

2015-09-30 Thread Shaopeng He
Add separate functions to configure VMDQ and RSS.
Update dglort map and logic ports accordingly.
Reset the MAC/VLAN filter after the VMDQ configuration changes.

Signed-off-by: Shaopeng He 
---
 drivers/net/fm10k/fm10k_ethdev.c | 164 +--
 1 file changed, 141 insertions(+), 23 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index cf48cd5..4d6dd57 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -337,8 +337,43 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
return 0;
 }

+/* fls = find last set bit = 32 minus the number of leading zeros */
+#ifndef fls
+#define fls(x) (((x) == 0) ? 0 : (32 - __builtin_clz((x))))
+#endif
+
 static void
-fm10k_dev_mq_rx_configure(struct rte_eth_dev *dev)
+fm10k_dev_vmdq_rx_configure(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct rte_eth_vmdq_rx_conf *vmdq_conf;
+   uint32_t i;
+
+   vmdq_conf = &dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf;
+
+   for (i = 0; i < vmdq_conf->nb_pool_maps; i++) {
+   if (!vmdq_conf->pool_map[i].pools)
+   continue;
+   fm10k_mbx_lock(hw);
+   fm10k_update_vlan(hw, vmdq_conf->pool_map[i].vlan_id, 0, true);
+   fm10k_mbx_unlock(hw);
+   }
+}
+
+static void
+fm10k_dev_pf_main_vsi_reset(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   /* Add default mac address */
+   ether_addr_copy((const struct ether_addr *)hw->mac.addr,
+   &dev->data->mac_addrs[0]);
+   fm10k_MAC_filter_set(dev, hw->mac.addr, true,
+   MAIN_VSI_POOL_NUMBER);
+}
+
+static void
+fm10k_dev_rss_configure(struct rte_eth_dev *dev)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
@@ -409,6 +444,76 @@ fm10k_dev_mq_rx_configure(struct rte_eth_dev *dev)
FM10K_WRITE_REG(hw, FM10K_MRQC(0), mrqc);
 }

+static void
+fm10k_dev_logic_port_update(struct rte_eth_dev *dev,
+   uint16_t nb_lport_old, uint16_t nb_lport_new)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t i;
+
+   fm10k_mbx_lock(hw);
+   /* Disable previous logic ports */
+   if (nb_lport_old)
+   hw->mac.ops.update_lport_state(hw, hw->mac.dglort_map,
+   nb_lport_old, false);
+   /* Enable new logic ports */
+   hw->mac.ops.update_lport_state(hw, hw->mac.dglort_map,
+   nb_lport_new, true);
+   fm10k_mbx_unlock(hw);
+
+   for (i = 0; i < nb_lport_new; i++) {
+   /* Set unicast mode by default. App can change
+* to other mode in other API func.
+*/
+   fm10k_mbx_lock(hw);
+   hw->mac.ops.update_xcast_mode(hw, hw->mac.dglort_map + i,
+   FM10K_XCAST_MODE_NONE);
+   fm10k_mbx_unlock(hw);
+   }
+}
+
+static void
+fm10k_dev_mq_rx_configure(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct rte_eth_vmdq_rx_conf *vmdq_conf;
+   struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
+   struct fm10k_macvlan_filter_info *macvlan;
+   uint16_t nb_queue_pools = 0; /* pool number in configuration */
+   uint16_t nb_lport_new, nb_lport_old;
+
+   macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);
+   vmdq_conf = &dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf;
+
+   fm10k_dev_rss_configure(dev);
+
+   /* only PF supports VMDQ */
+   if (hw->mac.type != fm10k_mac_pf)
+   return;
+
+   if (dev_conf->rxmode.mq_mode & ETH_MQ_RX_VMDQ_FLAG)
+   nb_queue_pools = vmdq_conf->nb_queue_pools;
+
+   /* no pool number change, no need to update logic port and VLAN/MAC */
+   if (macvlan->nb_queue_pools == nb_queue_pools)
+   return;
+
+   nb_lport_old = macvlan->nb_queue_pools ? macvlan->nb_queue_pools : 1;
+   nb_lport_new = nb_queue_pools ? nb_queue_pools : 1;
+   fm10k_dev_logic_port_update(dev, nb_lport_old, nb_lport_new);
+
+   /* reset MAC/VLAN as it's based on VMDQ or PF default VSI */
+   memset(dev->data->mac_addrs, 0,
+   ETHER_ADDR_LEN * FM10K_MAX_MACADDR_NUM);
+   memset(macvlan, 0, sizeof(*macvlan));
+   macvlan->nb_queue_pools = nb_queue_pools;
+
+   if (nb_queue_pools)
+   fm10k_dev_vmdq_rx_configure(dev);
+   else
+   fm10k_dev_pf_main_vsi_reset(dev);
+}
+
 static int
 fm10k_dev_tx_init(struct rte_eth_dev *dev)
 {
@@ -517,7 +622,7 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
FM10K_WRITE_FLUSH(hw);
}

-   /* Configure RSS if applicable */
+   /* Configure VMDQ/RSS if applicable */

[dpdk-dev] [PATCH 2/3] fm10k: add VMDQ support in MAC/VLAN filter

2015-09-30 Thread Shaopeng He
This patch makes the following changes to the fm10k MAC/VLAN filter:
- Add separate functions to add MAC address for VMDQ and
  main VSI.
- Disable modification to VLAN filter in VMDQ mode.
- In device close phase, delete logic ports to remove all
  the MAC/VLAN filters.

Signed-off-by: Shaopeng He 
---
 drivers/net/fm10k/fm10k.h|   3 +
 drivers/net/fm10k/fm10k_ethdev.c | 150 +--
 2 files changed, 99 insertions(+), 54 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..439e95f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -126,6 +126,9 @@
 struct fm10k_macvlan_filter_info {
uint16_t vlan_num;   /* Total VLAN number */
uint16_t mac_num;/* Total mac number */
+   uint16_t nb_queue_pools; /* Active queue pools number */
+   /* VMDQ ID for each MAC address */
+   uint8_t  mac_vmdq_id[FM10K_MAX_MACADDR_NUM];
uint32_t vfta[FM10K_VFTA_SIZE];/* VLAN bitmap */
 };

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 082937d..cf48cd5 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -45,6 +45,8 @@
 #define FM10K_MBXLOCK_DELAY_US 20
 #define UINT64_LOWER_32BITS_MASK 0xULL

+#define MAIN_VSI_POOL_NUMBER 0
+
 /* Max try times to acquire switch status */
 #define MAX_QUERY_SWITCH_STATE_TIMES 10
 /* Wait interval to get switch status */
@@ -61,10 +63,8 @@ static void fm10k_dev_allmulticast_disable(struct rte_eth_dev *dev);
 static inline int fm10k_glort_valid(struct fm10k_hw *hw);
 static int
 fm10k_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
-static void
-fm10k_MAC_filter_set(struct rte_eth_dev *dev, const u8 *mac, bool add);
-static void
-fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
+static void fm10k_MAC_filter_set(struct rte_eth_dev *dev,
+   const u8 *mac, bool add, uint32_t pool);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);

@@ -883,10 +883,17 @@ static void
 fm10k_dev_close(struct rte_eth_dev *dev)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint16_t nb_lport;
+   struct fm10k_macvlan_filter_info *macvlan;

PMD_INIT_FUNC_TRACE();

-   fm10k_MACVLAN_remove_all(dev);
+   macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);
+   nb_lport = macvlan->nb_queue_pools ? macvlan->nb_queue_pools : 1;
+   fm10k_mbx_lock(hw);
+   hw->mac.ops.update_lport_state(hw, hw->mac.dglort_map,
+   nb_lport, false);
+   fm10k_mbx_unlock(hw);

/* Stop mailbox service first */
fm10k_close_mbx_service(hw);
@@ -1023,6 +1030,11 @@ fm10k_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);

+   if (macvlan->nb_queue_pools > 0) { /* VMDQ mode */
+   PMD_INIT_LOG(ERR, "Cannot change VLAN filter in VMDQ mode");
+   return (-EINVAL);
+   }
+
if (vlan_id > ETH_VLAN_ID_MAX) {
PMD_INIT_LOG(ERR, "Invalid vlan_id: must be < 4096");
return (-EINVAL);
@@ -1100,38 +1112,80 @@ fm10k_vlan_offload_set(__rte_unused struct rte_eth_dev *dev, int mask)
}
 }

-/* Add/Remove a MAC address, and update filters */
-static void
-fm10k_MAC_filter_set(struct rte_eth_dev *dev, const u8 *mac, bool add)
+/* Add/Remove a MAC address, and update filters to main VSI */
+static void fm10k_MAC_filter_set_main_vsi(struct rte_eth_dev *dev,
+   const u8 *mac, bool add, uint32_t pool)
 {
-   uint32_t i, j, k;
-   struct fm10k_hw *hw;
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct fm10k_macvlan_filter_info *macvlan;
+   uint32_t i, j, k;

-   hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);

-   i = 0;
-   for (j = 0; j < FM10K_VFTA_SIZE; j++) {
-   if (macvlan->vfta[j]) {
-   for (k = 0; k < FM10K_UINT32_BIT_SIZE; k++) {
-   if (macvlan->vfta[j] & (1 << k)) {
-   if (i + 1 > macvlan->vlan_num) {
-   PMD_INIT_LOG(ERR, "vlan number "
-   "not match");
-   return;
-   }
-   fm10k_mbx_lock(hw);
-   fm10k_update_uc_addr(hw,
-   hw->mac.dglort_map, mac,
-   j * FM10K_UINT32_BIT_SIZE + k,
-   

[dpdk-dev] [PATCH 0/3] fm10k: add VMDQ support

2015-09-30 Thread Shaopeng He
This patch series adds VMDQ support to fm10k.
It includes functions to configure VMDQ mode and to
add a MAC address for each VMDQ queue pool.
It also includes logic to sanity-check the
multi-queue settings.

1. implement rx_descriptor_done function in fm10k
2. make sure default VID available in dev_init in fm10k
3. fix a memory leak for non-ip packet in l3fwd-power
4. add rx interrupt support in fm10k PF and VF

Shaopeng He (3):
  fm10k: add multi-queue checking
  fm10k: add VMDQ support in MAC/VLAN filter
  fm10k: add VMDQ support in multi-queue configure

 drivers/net/fm10k/fm10k.h|   3 +
 drivers/net/fm10k/fm10k_ethdev.c | 358 ++-
 2 files changed, 284 insertions(+), 77 deletions(-)

-- 
1.9.3



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
> 
> 
> On 09/30/15 15:03, Michael S. Tsirkin wrote:
> >On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
> >>
> >>On 09/30/15 14:41, Michael S. Tsirkin wrote:
> >>>On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
> The whole idea is to bypass kernel. Especially for networking...
> >>>... on dumb hardware that doesn't support doing that securely.
> >>On a very capable HW that supports whatever security requirements needed
> >>(e.g. 82599 Intel's SR-IOV VF devices).
> >Network card type is irrelevant as long as you do not have an IOMMU,
> >otherwise you would just use e.g. VFIO.
> 
> Sorry, but I don't follow your logic here - Amazon EC2 environment is an
> example where there *is* iommu but it's not virtualized and thus VFIO is
> useless and there is an option to use directly assigned SR-IOV networking
> device there where using the kernel drivers imposes a performance impact
> compared to user space UIO-based user space kernel bypass mode of usage. How
> is it irrelevant? Could u, pls, clarify your point?
> 

So it's not even dumb hardware, it's another piece of software
that forces an "all or nothing" approach where either
device has access to all VM memory, or none.
And this, unfortunately, leaves you with no secure way to
allow userspace drivers.

So it makes even less sense to add insecure work-arounds in the kernel.
It seems quite likely that by the time the new kernel reaches
production X years from now, EC2 will have a virtual iommu.


> >
> >>>Colour me unimpressed.
> >>>


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 15:03, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 14:41, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>>> The whole idea is to bypass kernel. Especially for networking...
>>> ... on dumb hardware that doesn't support doing that securely.
>> On a very capable HW that supports whatever security requirements needed
>> (e.g. 82599 Intel's SR-IOV VF devices).
> Network card type is irrelevant as long as you do not have an IOMMU,
> otherwise you would just use e.g. VFIO.

Sorry, but I don't follow your logic here - Amazon EC2 environment is an 
example where there *is* iommu but it's not virtualized and thus VFIO is 
useless and there is an option to use directly assigned SR-IOV 
networking device there where using the kernel drivers imposes a 
performance impact compared to user space UIO-based user space kernel 
bypass mode of usage. How is it irrelevant? Could u, pls, clarify your 
point?

>
>>> Colour me unimpressed.
>>>



[dpdk-dev] [PATCH v3] Move rte_mbuf macros to common header file

2015-09-30 Thread Stephen Hemminger
On Wed, 30 Sep 2015 14:55:03 -0700
Ravi Kerur  wrote:

> +static inline uint64_t rte_mbuf_data_dma_addr(struct rte_mbuf *mb)
> +{
> + return ((uint64_t)((mb)->buf_physaddr + (mb)->data_off));
> +}
> +
> +static inline uint64_t rte_mbuf_data_dma_addr_default(struct rte_mbuf *mb)
> +{
> + return ((uint64_t)((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM));
> +}
> +

Some nits:
  * extra () on return is an unnecessary BSDism
  * cast to (uint64_t) is probably not needed since C does that anyway.
  * functions should take "const struct rte_mbuf *" since they don't modify it.
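
Applied to the two helpers, the nits above might look like the sketch
below. It is self-contained, so struct mbuf_sketch and SKETCH_HEADROOM are
stand-ins for struct rte_mbuf and RTE_PKTMBUF_HEADROOM rather than the
real definitions, and it assumes buf_physaddr is already a 64-bit type,
which is what makes the cast redundant.

```c
#include <stdint.h>

/* Stand-in for RTE_PKTMBUF_HEADROOM; the value is only illustrative. */
#define SKETCH_HEADROOM 128

/* Stand-in for the two struct rte_mbuf fields used; not the real layout. */
struct mbuf_sketch {
	uint64_t buf_physaddr;  /* physical address of the buffer start */
	uint16_t data_off;      /* offset of the packet data in the buffer */
};

/* DMA address of the current data start; the mbuf is not modified. */
static inline uint64_t
mbuf_data_dma_addr(const struct mbuf_sketch *mb)
{
	return mb->buf_physaddr + mb->data_off;
}

/* DMA address of the data start assuming the default headroom. */
static inline uint64_t
mbuf_data_dma_addr_default(const struct mbuf_sketch *mb)
{
	return mb->buf_physaddr + SKETCH_HEADROOM;
}
```

With a const parameter the helpers can also be called from code that only
holds a const pointer to the mbuf, which the macro versions allowed
implicitly.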



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
> 
> 
> On 09/30/15 14:41, Michael S. Tsirkin wrote:
> >On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
> >>The whole idea is to bypass kernel. Especially for networking...
> >... on dumb hardware that doesn't support doing that securely.
> 
> On a very capable HW that supports whatever security requirements needed
> (e.g. 82599 Intel's SR-IOV VF devices).

Network card type is irrelevant as long as you do not have an IOMMU,
otherwise you would just use e.g. VFIO.

> >Colour me unimpressed.
> >


[dpdk-dev] [PATCH v2] Move rte_mbuf macros to common header file

2015-09-30 Thread Ravi Kerur
On Wed, Sep 30, 2015 at 12:41 PM, Aaron Conole  wrote:

> Ravi Kerur  writes:
>
> > Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
> > are defined in each PMD driver file. Move those macros into common
> > lib/librte_mbuf/rte_mbuf.h file. PMD drivers include rte_mbuf.h
> > file directly/indirectly hence no additional header file inclusion
> > is necessary.
> I think this should also mention that they are no longer macros, as
> well.
>
> <>
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -843,6 +843,16 @@ struct rte_mbuf {
> >   uint16_t timesync;
> >  } __rte_cache_aligned;
> >
> > +static inline uint64_t RTE_MBUF_DATA_DMA_ADDR(struct rte_mbuf* mb)
> > +{
> > + return ((uint64_t)((mb)->buf_physaddr + (mb)->data_off));
> > +}
> > +
> > +static inline uint64_t RTE_MBUF_DATA_DMA_ADDR_DEFAULT(struct rte_mbuf *mb)
> > +{
> > + return ((uint64_t)((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM));
> > +}
> > +
> I think these names should be made lower case as well.
>

Thanks, I was waiting for one explicit input on this. I have sent v3 patch.


[dpdk-dev] [PATCH v3] Move rte_mbuf macros to common header file

2015-09-30 Thread Ravi Kerur
Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
are defined in each PMD driver file. Convert macros to inline
functions and move them to common lib/librte_mbuf/rte_mbuf.h file.
PMD drivers include rte_mbuf.h file directly/indirectly hence no
additional header file inclusion is necessary.

v3:
> Changed converted macro->inline names to lower-case
> Fix checkpatch.pl errors and warnings
(camelcase warning is not fixed)
> Compiled following targets
> x86_64-native-linuxapp-clang
> x86_64-native-linuxapp-gcc
> i686-native-linuxapp-gcc
> x86_64-native-bsdapp-gcc
> x86_64-native-bsdapp-clang

> Tested x86_64 ubuntu 14.04
> make test
v2:
> Changed both macros to inline functions in all PMD
> Changed macro to rte_pktmbuf_mtod in xenvirt module

> Compiled following targets
> x86_64-native-linuxapp-clang
> x86_64-native-linuxapp-gcc
> i686-native-linuxapp-gcc

> Tested x86_64 ubuntu 14.04
> testpmd and make test
v1:
> Move macros into common rte_mbuf header file.

> Compiled for:
> x86_64-native-linuxapp-clang
> x86_64-native-linuxapp-gcc
> i686-native-linuxapp-gcc
> x86_64-native-bsdapp-gcc
> x86_64-native-bsdapp-clang

Tested on:
> x86_64 Ubuntu 14.04, testpmd and 'make test'
> FreeBSD 10.1, testpmd

Signed-off-by: Ravi Kerur 
---
 drivers/net/bnx2x/bnx2x.c  |  2 +-
 drivers/net/bnx2x/bnx2x.h  |  3 ---
 drivers/net/cxgbe/sge.c|  3 ---
 drivers/net/e1000/em_rxtx.c| 15 +--
 drivers/net/e1000/igb_rxtx.c   | 14 --
 drivers/net/i40e/i40e_rxtx.c   | 20 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c | 14 +++---
 drivers/net/ixgbe/ixgbe_rxtx.h |  6 --
 drivers/net/virtio/virtio_rxtx.c   |  2 +-
 drivers/net/virtio/virtqueue.h |  3 ---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 11 +++
 drivers/net/xenvirt/virtqueue.h|  5 +
 lib/librte_mbuf/rte_mbuf.h | 10 ++
 13 files changed, 39 insertions(+), 69 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
index fed7a06..a3f118c 100644
--- a/drivers/net/bnx2x/bnx2x.c
+++ b/drivers/net/bnx2x/bnx2x.c
@@ -2147,7 +2147,7 @@ int bnx2x_tx_encap(struct bnx2x_tx_queue *txq, struct rte_mbuf **m_head, int m_p
tx_start_bd = &txq->tx_ring[TX_BD(bd_prod, txq)].start_bd;

tx_start_bd->addr =
-   rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR(m0));
+   rte_cpu_to_le_64(rte_mbuf_data_dma_addr(m0));
tx_start_bd->nbytes = rte_cpu_to_le_16(m0->data_len);
tx_start_bd->bd_flags.as_bitfield = ETH_TX_BD_FLAGS_START_BD;
tx_start_bd->general_data =
diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
index 867b92a..28bd83f 100644
--- a/drivers/net/bnx2x/bnx2x.h
+++ b/drivers/net/bnx2x/bnx2x.h
@@ -141,9 +141,6 @@ struct bnx2x_device_type {
char *bnx2x_name;
 };

-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   ((uint64_t)((mb)->buf_physaddr + (mb)->data_off))
-
 #define BNX2X_PAGE_SHIFT   12
 #define BNX2X_PAGE_SIZE(1 << BNX2X_PAGE_SHIFT)
 #define BNX2X_PAGE_MASK(~(BNX2X_PAGE_SIZE - 1))
diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 6eb1244..8f4c025 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -1267,9 +1267,6 @@ static struct rte_mbuf *t4_pktgl_to_mbuf(const struct 
pkt_gl *gl)
return t4_pktgl_to_mbuf_usembufs(gl);
 }

-#define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
-   ((dma_addr_t) ((mb)->buf_physaddr + (mb)->data_off))
-
 /**
  * t4_ethrx_handler - process an ingress ethernet packet
  * @q: the response queue that received the packet
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 3b8776d..39c9744 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -88,12 +88,6 @@ rte_rxmbuf_alloc(struct rte_mempool *mp)
return (m);
 }

-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   (uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
-
-#define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
-   (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM)
-
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -585,7 +579,7 @@ eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 * Set up Transmit Data Descriptor.
 */
slen = m_seg->data_len;
-   buf_dma_addr = RTE_MBUF_DATA_DMA_ADDR(m_seg);
+   buf_dma_addr = rte_mbuf_data_dma_addr(m_seg);


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 14:41, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>> The whole idea is to bypass kernel. Especially for networking...
> ... on dumb hardware that doesn't support doing that securely.

On a very capable HW that supports whatever security requirements needed 
(e.g. 82599 Intel's SR-IOV VF devices).

> Colour me unimpressed.
>



[dpdk-dev] Is there any example application to used DPDK packet distributor library?

2015-09-30 Thread 최익성
Dear DPDK experts.

I am Ick-Sung Choi living in South Korea.

I have a question about the DPDK packet distributor library.

Is there an example application that uses the DPDK packet distributor library?

I am trying to experiment with a simple function using the DPDK packet 
distributor library.

If I could study an example application of the DPDK packet distributor library, 
it would be very helpful for my experiment.

I would appreciate any example applications, advice, or 
information.

Thank you very much.

Sincerely Yours,

Ick-Sung Choi.



[dpdk-dev] [PATCH v2 3/3] ixgbe: add check for supported flow director behaviors

2015-09-30 Thread Andrey Chilikin
Handle only supported flow director behaviors

Signed-off-by: Andrey Chilikin 
---
 drivers/net/ixgbe/ixgbe_fdir.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_fdir.c b/drivers/net/ixgbe/ixgbe_fdir.c
index 5c8b833..cf0e8be 100644
--- a/drivers/net/ixgbe/ixgbe_fdir.c
+++ b/drivers/net/ixgbe/ixgbe_fdir.c
@@ -959,7 +959,8 @@ ixgbe_add_del_fdir_filter(struct rte_eth_dev *dev,
" signature mode.");
return -EINVAL;
}
-   } else if (fdir_filter->action.rx_queue < IXGBE_MAX_RX_QUEUE_NUM)
+   } else if (fdir_filter->action.behavior == RTE_ETH_FDIR_ACCEPT &&
+   fdir_filter->action.rx_queue < IXGBE_MAX_RX_QUEUE_NUM)
queue = (uint8_t)fdir_filter->action.rx_queue;
else
return -EINVAL;
-- 
1.7.4.1



[dpdk-dev] [PATCH v2 2/3] i40e: add RTE_ETH_FDIR_PASSTHRU processing for flow director behavior

2015-09-30 Thread Andrey Chilikin
Add support for RTE_ETH_FDIR_PASSTHRU flow director behavior so output queue is 
assigned by other filters

Signed-off-by: Andrey Chilikin 
---
 drivers/net/i40e/i40e_fdir.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index c9ce98f..45a372c 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -1103,8 +1103,16 @@ i40e_fdir_filter_programming(struct i40e_pf *pf,

if (fdir_action->behavior == RTE_ETH_FDIR_REJECT)
dest = I40E_FILTER_PROGRAM_DESC_DEST_DROP_PACKET;
-   else
-   dest = I40E_FILTER_PROGRAM_DESC_DEST_DIRECT_PACKET_QINDEX;
+   else if (fdir_action->behavior == RTE_ETH_FDIR_ACCEPT)
+   dest = I40E_FILTER_PROGRAM_DESC_DEST_DIRECT_PACKET_QINDEX;
+   else if (fdir_action->behavior == RTE_ETH_FDIR_PASSTHRU)
+   dest = I40E_FILTER_PROGRAM_DESC_DEST_DIRECT_PACKET_OTHER;
+   else {
+   PMD_DRV_LOG(ERR, "Failed to program FDIR filter:"
+   " unsupported fdir behavior.");
+   return -EINVAL;
+   }
+
fdirdp->dtype_cmd_cntindex |= rte_cpu_to_le_32((dest <<
I40E_TXD_FLTR_QW1_DEST_SHIFT) &
I40E_TXD_FLTR_QW1_DEST_MASK);
-- 
1.7.4.1



[dpdk-dev] [PATCH v2 1/3] librte_ether: add RTE_ETH_FDIR_PASSTHRU for flow director behavior

2015-09-30 Thread Andrey Chilikin
Add new flow director behavior RTE_ETH_FDIR_PASSTHRU to assign a queue by other 
filters

Signed-off-by: Andrey Chilikin 
---
 lib/librte_ether/rte_eth_ctrl.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 26b7b33..a0a6aab 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -417,6 +417,7 @@ struct rte_eth_fdir_input {
 enum rte_eth_fdir_behavior {
RTE_ETH_FDIR_ACCEPT = 0,
RTE_ETH_FDIR_REJECT,
+   RTE_ETH_FDIR_PASSTHRU,
 };

 /**
-- 
1.7.4.1



[dpdk-dev] [PATCH v2 0/3] Support for flow director behavior "passthru" on Intel FVL NIC

2015-09-30 Thread Andrey Chilikin
This patch set adds a new flow director behavior, "passthru", on Intel X(L)710 
NICs. When this mode is selected, the flow director directs the packet to the 
LAN while the queue is defined by other filters. This can be used to extract 
flexible payload to the RX descriptor with the flow director filter while the 
target queue is defined by other filters, for example by the hash filter (RSS).

v2: rename RTE_ETH_FDIR_OTHER to RTE_ETH_FDIR_PASSTHRU

Andrey Chilikin (3):
  librte_ether: add RTE_ETH_FDIR_PASSTHRU for flow director behavior
  i40e: add RTE_ETH_FDIR_PASSTHRU processing for flow director behavior
  ixgbe: add check for supported flow director behaviors

 drivers/net/i40e/i40e_fdir.c|   12 ++--
 drivers/net/ixgbe/ixgbe_fdir.c  |3 ++-
 lib/librte_ether/rte_eth_ctrl.h |1 +
 3 files changed, 13 insertions(+), 3 deletions(-)

-- 
1.7.4.1



[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
> The whole idea is to bypass kernel. Especially for networking...

... on dumb hardware that doesn't support doing that securely.
Colour me unimpressed.

-- 
MST


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Stephen Hemminger
On Wed, 30 Sep 2015 23:09:33 +0300
Vlad Zolotarov  wrote:

> 
> 
> On 09/30/15 22:39, Michael S. Tsirkin wrote:
> > On Wed, Sep 30, 2015 at 10:06:52PM +0300, Vlad Zolotarov wrote:
>  How would iommu
>  virtualization change anything?
> >>> Kernel can use an iommu to limit device access to memory of
> >>> the controlling application.
> >> Ok, this is obvious but what it has to do with enabling using MSI/MSI-X
> >> interrupts support in uio_pci_generic? kernel may continue to limit the
> >> above access with this support as well.
> > It could maybe. So if you write a patch to allow MSI by at the same time
> > creating an isolated IOMMU group and blocking DMA from device in
> > question anywhere, that sounds reasonable.
> 
> No, I'm only planning to add MSI and MSI-X interrupts support for 
> uio_pci_generic device.
> The rest mentioned above should naturally be a matter of a different 
> patch and writing it is orthogonal to the patch I'm working on as has 
> been extensively discussed in this thread.
> 
> >
> 

I have a generic MSI and MSI-X driver (posted earlier on this list).
About to post to upstream kernel.


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 13:58, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 01:37:22PM +0300, Vlad Zolotarov wrote:
>>
>> On 09/30/15 00:49, Michael S. Tsirkin wrote:
>>> On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
 On Tue, 29 Sep 2015 23:54:54 +0300
 "Michael S. Tsirkin"  wrote:

> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
>> The security breach motivation u brought in "[RFC PATCH] uio:
>> uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak
>> since one u let the userland access to the bar it may do any funny thing
>> using the DMA engine of the device. This kind of stuff should be 
>> prevented
>> using the iommu and if it's enabled then any funny tricks using MSI/MSI-X
>> configuration will be prevented too.
>>
>> I'm about to send the patch to main Linux mailing list. Let's continue 
>> this
>> discussion there.
> Basically UIO shouldn't be used with devices capable of DMA.
> Use VFIO for that (yes, this implies an emulated or PV IOMMU).
>> If there is an IOMMU in the picture there shouldn't be any problem to use
>> UIO with DMA capable devices.
> UIO doesn't enforce the IOMMU though. That's why it's not a good fit.

Having said all that - does UIO denies to work with the devices with DMA 
capability today? Either i have missed that logic or it's not there.
So all what u are so worried about may already be done today. That's why 
I don't understand why adding a support for MSI/MSI-X interrupts
would change anything here. U are right that UIO *today* has a security 
hole however it should be addressed separately and the same solution
that will cover the security breach in the current code will cover 
the "MSI/MSI-X security vulnerability" because they are actually exactly 
the same
issue.

>
> I don't think this can change.
 Given there is no PV IOMMU and even if there was it would be too slow for 
 DPDK
 use, I can't accept that.
>>> QEMU does allow emulating an iommu.
>> Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an option
>> there.
> Not only that, a bunch of boxes have their IOMMU disabled.
> So for such systems, you can't have userspace poking at
> device registers. You need a kernel driver to validate
> userspace requests before passing them on to devices.

I think u are describing a HV functionality here. ;) And yes, u are 
absolutely right, HV has to control the non-privileged userland.
For HV/non-virtualized boxes a possible solution could be to allow UIO 
only for some privileged group of processes.

>
>> And again, it's a general issue not DPDK specific.
>> Today one has to develop some proprietary modules (like igb_uio) to
>> workaround the issue and this is lame.
> Of course it is lame. So don't bypass the kernel then, use the upstream 
> drivers.

This would impose a heavy performance penalty. The whole idea is to 
bypass kernel. Especially for networking...

>
>> IMHO uio_pci_generic should
>> be fixed to be able to properly work within any virtualized environment and
>> not only with KVM.
> The motivation for UIO is pretty clear:
>
>  For many types of devices, creating a Linux kernel driver is
>  overkill.  All that is really needed is some way to handle an
>  interrupt and provide access to the memory space of the
>  device.  The logic of controlling the device does not
>  necessarily have to be within the kernel, as the device does
>  not need to take advantage of any of other resources that the
>  kernel provides.  One such common class of devices that are
>  like this are for industrial I/O cards.
>
> Devices doing DMA do need to take advantage of memory protection
> that the kernel provides.
Well, yeah - but who said I have to be forbidden to work with the device 
if MSI-X interrupts is my only option?

Kernel may provide a protection in the way that it would check the 
process permissions and deny the UIO access to non-privileged processes.
I'm not sure it's the case today and if it's not the case then, as 
mentioned above, this would rather be fixed ASAP exactly due to reasons 
u bring up
here. And once it's done there shouldn't be any limitation to allow MSI 
or MSI-X interrupts along with INT#X.

>
>>>   DPDK uses static mappings, so I
>>> doubt it's speed matters at all.
>>>
>>> Anyway, DPDK is doing polling all the time. I don't see why does it
>>> insist on using interrupts to detect link up events. Just poll for that
>>> too.
>>>



[dpdk-dev] [PATCH 02/20] librte_ether: add fields from rte_pci_driver to rte_eth_dev_data

2015-09-30 Thread Bruce Richardson
On Wed, Sep 30, 2015 at 09:18:53AM -0400, Neil Horman wrote:
> > +}
> > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > index fa06554..9cd262b 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1635,8 +1635,23 @@ struct rte_eth_dev_data {
> > all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
> > dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). 
> > */
> > lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
> > +   uint32_t dev_flags; /**< Flags controlling handling of device. */
> > +   enum rte_kernel_driver kdrv;/**< Kernel driver passthrough */
> > +   int numa_node;
> > +   const char *drv_name;
> >  };
> >  
> Unrelated to my other questions on this code: Is rte_eth_dev_data ever
> allocated by any applications?  If so, this will have to go through the ABI
> process.  I don't think it is, but I wanted to ask to be sure
> 
> Neil
> 

No - applications do not allocate this structure directly, it's internal only, 
so
we should be safe here from an ABI perspective.

/Bruce


[dpdk-dev] [PATCH 02/20] librte_ether: add fields from rte_pci_driver to rte_eth_dev_data

2015-09-30 Thread Bruce Richardson
On Wed, Sep 30, 2015 at 09:14:48AM -0400, Neil Horman wrote:
> On Wed, Sep 30, 2015 at 10:56:04AM +0100, Bruce Richardson wrote:
> > On Tue, Sep 29, 2015 at 03:08:12PM -0400, Neil Horman wrote:
> > > On Mon, Sep 28, 2015 at 02:03:20PM +0100, Bernard Iremonger wrote:
> > > > add dev_flags to rte_eth_dev_data, add macros for dev_flags.
> > > > add kdrv to rte_eth_dev_data.
> > > > add numa_node to rte_eth_dev_data.
> > > > add drv_name to rte_eth_dev_data.
> > > > use dev_type to distinguish between vdev's and pdev's.
> > > > remove pci_dev branches.
> > > > 
> > > > Signed-off-by: Bernard Iremonger 
> > > > ---
> > > >  lib/librte_ether/rte_ethdev.c | 53 
> > > > ---
> > > >  lib/librte_ether/rte_ethdev.h | 15 
> > > >  2 files changed, 45 insertions(+), 23 deletions(-)
> > > > 
> > 
> > > > +++ b/lib/librte_ether/rte_ethdev.h
> > > > @@ -1635,8 +1635,23 @@ struct rte_eth_dev_data {
> > > > all_multicast : 1, /**< RX all multicast mode ON(1) / 
> > > > OFF(0). */
> > > > dev_started : 1,   /**< Device state: STARTED(1) / 
> > > > STOPPED(0). */
> > > > lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
> > > > +   uint32_t dev_flags; /**< Flags controlling handling of device. 
> > > > */
> > > > +   enum rte_kernel_driver kdrv;/**< Kernel driver passthrough 
> > > > */
> > > Why add this here? The enumerated driver types are all variants on PCI 
> > > bus
> > > types.  Not sure why the ethernet interface needs to know this info
> > > 
> > > > +   int numa_node;
> > > Ditto, this seems like information that is only relevant if the device is 
> > > on a
> > > physical bus (i.e. virtual devices are likely to not have a numa node)
> > >
> > Actually, I disagree. For some virtual devices they will have a numa node. 
> > For
> > ring or other virtual PMDs the numa node will be the node on which the ring 
> > /
> > mempool etc. memory is allocated on, and can be of relevance.
> > 
> > /Bruce
> > 
> 
> I think its fairly clear that some devices (including virtual ones) have some
> relevant relation to a numa_node (There are even some that have no numa_node,
> for which a -1 value makes some sense).  That said, there are just as many 
> that
> don't have a relevant numa_node.
> 
> 1) There are some drivers for which numa_node make no sense (regardless of
> value):
>  * af_packet - The numa node is at best determined at run time by the 
> interface
> the socket is bound to
> 
>  * pcap - same as af_packet
> 
>  * bonding - multiple interfaces mean multiple numa_nodes, any value set here 
> is
> just as likely to be wrong as right
> 
>  * mpipe - no real large memory area to associate with a numa node
> 
>  * virtio - uses iopl for communication, and cannot know its numa_node
> 
>  * vmxnet3 - same concept as virtio
> 
>  * xenvirt - same as vmxnet3
> 
> I think its better that you store numa locality information in a pmd's private
> bus data, and export it to applications via a device method.  that provides 
> the
> flexibility to tell the application that there is no numa locality for a 
> device
> (by not implementing the method), without having to expose an unset data field
> to the application.
> 
> Neil
> 

Sure, that could work.
However, is it really worthwhile asking drivers to implement a new ethdev API
function, rather than just having them set the numa node field correctly in the
init function?

/Bruce
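For readers comparing the two options discussed here, the "device method" alternative can be sketched roughly as below. Every name is hypothetical and nothing here is an existing ethdev API; the point is only that an unimplemented (NULL) callback lets a driver say "no NUMA locality" without exposing an unset data field.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUMA_UNKNOWN (-1)

struct sketch_dev;

/* Hypothetical per-driver ops table with an optional NUMA query. */
struct sketch_dev_ops {
	/* Left NULL by drivers for which a NUMA node makes no sense. */
	int (*numa_node_get)(const struct sketch_dev *dev);
};

struct sketch_dev {
	const struct sketch_dev_ops *ops;
	void *priv; /* driver-private data */
};

/* Generic wrapper: reports "unknown" when the driver has no method. */
static int
sketch_dev_numa_node(const struct sketch_dev *dev)
{
	assert(dev != NULL && dev->ops != NULL);
	if (dev->ops->numa_node_get == NULL)
		return SKETCH_NUMA_UNKNOWN;
	return dev->ops->numa_node_get(dev);
}

/* Example driver that keeps its node in private data. */
struct sketch_ring_priv {
	int node;
};

static int
sketch_ring_numa_node_get(const struct sketch_dev *dev)
{
	const struct sketch_ring_priv *p = dev->priv;
	return p->node;
}
```

The field-based alternative needs no wrapper at all, which is the trade-off raised above: less code per driver, at the cost of a value that may be meaningless for some devices.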


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2015 at 01:37:22PM +0300, Vlad Zolotarov wrote:
> 
> 
> On 09/30/15 00:49, Michael S. Tsirkin wrote:
> >On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
> >>On Tue, 29 Sep 2015 23:54:54 +0300
> >>"Michael S. Tsirkin"  wrote:
> >>
> >>>On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
> The security breach motivation u brought in "[RFC PATCH] uio:
> uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak
> since one u let the userland access to the bar it may do any funny thing
> using the DMA engine of the device. This kind of stuff should be prevented
> using the iommu and if it's enabled then any funny tricks using MSI/MSI-X
> configuration will be prevented too.
> 
> I'm about to send the patch to main Linux mailing list. Let's continue 
> this
> discussion there.
> >>>Basically UIO shouldn't be used with devices capable of DMA.
> >>>Use VFIO for that (yes, this implies an emulated or PV IOMMU).
> 
> If there is an IOMMU in the picture there shouldn't be any problem to use
> UIO with DMA capable devices.

UIO doesn't enforce the IOMMU though. That's why it's not a good fit.

> >>>I don't think this can change.
> >>Given there is no PV IOMMU and even if there was it would be too slow for 
> >>DPDK
> >>use, I can't accept that.
> >QEMU does allow emulating an iommu.
> 
> Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an option
> there.

Not only that, a bunch of boxes have their IOMMU disabled.
So for such systems, you can't have userspace poking at
device registers. You need a kernel driver to validate
userspace requests before passing them on to devices.

> And again, it's a general issue not DPDK specific.
> Today one has to develop some proprietary modules (like igb_uio) to
> workaround the issue and this is lame.

Of course it is lame. So don't bypass the kernel then, use the upstream drivers.

> IMHO uio_pci_generic should
> be fixed to be able to properly work within any virtualized environment and
> not only with KVM.

The motivation for UIO is pretty clear:

For many types of devices, creating a Linux kernel driver is
overkill.  All that is really needed is some way to handle an
interrupt and provide access to the memory space of the
device.  The logic of controlling the device does not
necessarily have to be within the kernel, as the device does
not need to take advantage of any of other resources that the
kernel provides.  One such common class of devices that are
like this are for industrial I/O cards.

Devices doing DMA do need to take advantage of memory protection
that the kernel provides.

> 
> >  DPDK uses static mappings, so I
> >doubt it's speed matters at all.
> >
> >Anyway, DPDK is doing polling all the time. I don't see why does it
> >insist on using interrupts to detect link up events. Just poll for that
> >too.
> >


[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-09-30 Thread Vlad Zolotarov


On 09/30/15 00:49, Michael S. Tsirkin wrote:
> On Tue, Sep 29, 2015 at 02:46:16PM -0700, Stephen Hemminger wrote:
>> On Tue, 29 Sep 2015 23:54:54 +0300
>> "Michael S. Tsirkin"  wrote:
>>
>>> On Tue, Sep 29, 2015 at 07:41:09PM +0300, Vlad Zolotarov wrote:
 The security breach motivation u brought in "[RFC PATCH] uio:
 uio_pci_generic: Add support for MSI interrupts" thread seems a bit weak
 since one u let the userland access to the bar it may do any funny thing
 using the DMA engine of the device. This kind of stuff should be prevented
 using the iommu and if it's enabled then any funny tricks using MSI/MSI-X
 configuration will be prevented too.

 I'm about to send the patch to main Linux mailing list. Let's continue this
 discussion there.

>>> Basically UIO shouldn't be used with devices capable of DMA.
>>> Use VFIO for that (yes, this implies an emulated or PV IOMMU).

If there is an IOMMU in the picture there shouldn't be any problem to 
use UIO with DMA capable devices.

>>> I don't think this can change.
>> Given there is no PV IOMMU and even if there was it would be too slow for 
>> DPDK
>> use, I can't accept that.
> QEMU does allow emulating an iommu.

Amazon's EC2 xen HV doesn't. At least today. Therefore VFIO is not an 
option there. And again, it's a general issue not DPDK specific.
Today one has to develop some proprietary modules (like igb_uio) to 
workaround the issue and this is lame. IMHO uio_pci_generic should
be fixed to be able to properly work within any virtualized environment 
and not only with KVM.



>   DPDK uses static mappings, so I
> doubt it's speed matters at all.
>
> Anyway, DPDK is doing polling all the time. I don't see why does it
> insist on using interrupts to detect link up events. Just poll for that
> too.
>



[dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function

2015-09-30 Thread Chen, Jing D
Hi, Bruce,

> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, September 29, 2015 10:23 PM
> To: Ananyev, Konstantin
> Cc: Chen, Jing D; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> 
> On Tue, Sep 29, 2015 at 01:14:26PM +, Ananyev, Konstantin wrote:
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chen Jing
> > > D(Mark)
> > > Sent: Tuesday, September 29, 2015 2:04 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> > >
> > > From: "Chen Jing D(Mark)" 
> > >
> > > Add func fm10k_recv_raw_pkts_vec to parse raw packets, in which
> > > includes possible chained packets.
> > > Add func fm10k_recv_pkts_vec to receive single mbuf packet.
> > >
> > > Signed-off-by: Chen Jing D(Mark) 
> > > ---
> > >  drivers/net/fm10k/fm10k.h  |1 +
> > >  drivers/net/fm10k/fm10k_rxtx_vec.c |  213
> > > 
> > >  2 files changed, 214 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > > index d924cae..285254f 100644
> > > --- a/drivers/net/fm10k/fm10k.h
> > > +++ b/drivers/net/fm10k/fm10k.h
> > > @@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct
> rte_mbuf **tx_pkts,
> > >   uint16_t nb_pkts);
> > >
> > >  int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
> > > +uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
> > >  #endif
> > > diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > index 581a309..63b34b5 100644
> > > --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > > @@ -281,3 +281,216 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> > >   /* Update the tail pointer on the NIC */
> > >   FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);  }
> > > +
> > > +/*
> > > + * vPMD receive routine, now only accept (nb_pkts ==
> > > +RTE_IXGBE_VPMD_RX_BURST)
> > > + * in one loop
> > > + *
> > > + * Notice:
> > > + * - nb_pkts < RTE_IXGBE_VPMD_RX_BURST, just return no packet
> 
> Why this limitation? I believe this limitation has already been removed for
> ixgbe, so the same solution should be applicable here
> 
> /Bruce

Thanks! I'll change it accordingly.  


[dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function

2015-09-30 Thread Chen, Jing D


> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, September 29, 2015 9:14 PM
> To: Chen, Jing D; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 06/14] fm10k: add Vector RX function
> 
> 
> > +   /* A. load 4 packet in one loop
> > +* [A*. mask out 4 unused dirty field in desc]
> > +* B. copy 4 mbuf point from swring to rx_pkts
> > +* C. calc the number of DD bits among the 4 packets
> > +* [C*. extract the end-of-packet bit, if requested]
> > +* D. fill info. from desc to mbuf
> > +*/
> > +   for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
> > +   pos += RTE_FM10K_DESCS_PER_LOOP,
> > +   rxdp += RTE_FM10K_DESCS_PER_LOOP) {
> > +   __m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
> > +   __m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> > +   __m128i zero, staterr, sterr_tmp1, sterr_tmp2;
> > +   __m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg.
> */
> > +
> > +   if (split_packet) {
> > +   rte_prefetch0(&rx_pkts[pos]->cacheline1);
> > +   rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
> > +   rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
> > +   rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
> > +   }
> 
> 
> Same thing as with i40e vPMD:
> You are prefetching junk addresses here.
> Check out Zoltan's patch:
> http://dpdk.org/dev/patchwork/patch/7190/
> and related conversation:
> http://dpdk.org/ml/archives/dev/2015-September/023715.html
> I think there is the same issue here.
> Konstantin
> 

Thanks for the comments, Konstantin!  I'll check the material you referred to.



[dpdk-dev] [PATCH 4/4] test: Add perf test for ring pmd

2015-09-30 Thread Bruce Richardson
Add a performance test for the ring pmd, comparing the performance of the pmd
against the basic rte_ring APIs.

Signed-off-by: Bruce Richardson 
---
 app/test/Makefile |   1 +
 app/test/test_pmd_ring_perf.c | 188 ++
 2 files changed, 189 insertions(+)
 create mode 100644 app/test/test_pmd_ring_perf.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 294618f..14fa502 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -141,6 +141,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += 
test_link_bonding_mode4.c
 endif

 SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_pmd_ring.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_pmd_ring_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c

 CFLAGS += -O3
diff --git a/app/test/test_pmd_ring_perf.c b/app/test/test_pmd_ring_perf.c
new file mode 100644
index 000..3077dba
--- /dev/null
+++ b/app/test/test_pmd_ring_perf.c
@@ -0,0 +1,188 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define RING_NAME "RING_PERF"
+#define RING_SIZE 4096
+#define MAX_BURST 32
+
+/*
+ * the sizes to enqueue and dequeue in testing
+ * (marked volatile so they won't be seen as compile-time constants)
+ */
+static const volatile unsigned bulk_sizes[] = { 1, 8, 32 };
+
+/* The ring structure used for tests */
+static struct rte_ring *r;
+static uint8_t ring_ethdev_port;
+
+/* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles 
*/
+static void
+test_empty_dequeue(void)
+{
+   const unsigned iter_shift = 26;
+   const unsigned iterations = 1 << iter_shift;
+   unsigned i = 0;
+   void *burst[MAX_BURST];
+
+   const uint64_t sc_start = rte_rdtsc();
+   for (i = 0; i < iterations; i++)
+   rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+   const uint64_t sc_end = rte_rdtsc();
+
+   const uint64_t eth_start = rte_rdtsc();
+   for (i = 0; i < iterations; i++)
+   rte_eth_rx_burst(ring_ethdev_port, 0, (void *)burst,
+   bulk_sizes[0]);
+   const uint64_t eth_end = rte_rdtsc();
+
+   printf("Ring empty dequeue  : %.1F\n",
+   (double)(sc_end - sc_start) / iterations);
+   printf("Ethdev empty dequeue: %.1F\n",
+   (double)(eth_end - eth_start) / iterations);
+}
+
+/*
+ * Test function that determines how long an enqueue + dequeue of a single item
+ * takes on a single lcore. Result is for comparison with the bulk enq+deq.
+ */
+static void
+test_single_enqueue_dequeue(void)
+{
+   const unsigned iter_shift = 24;
+   const unsigned iterations = 1 << iter_shift;
+   unsigned i = 0;
+   void *burst = NULL;
+   struct rte_mbuf *mburst[1] = { NULL };
+
+   const uint64_t sc_start = rte_rdtsc_precise();
+   rte_compiler_barrier();
+   for (i = 0; i < iterations; i++) {
+   rte_ring_enqueue_bulk(r, &burst, 1);
+   rte_ring_dequeue_bulk(r, &burst, 1);
+   }
+   const uint64_t sc_end = rte_rdtsc_precise();
+   rte_compiler_barrier();
+
+   const uint64_t eth_start = rte_rdtsc_precise();
+   rte_compiler_barrier();
+   for (i = 0; i < iterations; i++) {
+   

[dpdk-dev] [PATCH 3/4] ring: add rte_eth_from_ring function

2015-09-30 Thread Bruce Richardson
Add a one-parameter function to take an existing rte_ring and wrap it as
an ethdev, returning the port id of the new ethdev instance.

Signed-off-by: Bruce Richardson 
---
 drivers/net/ring/rte_eth_ring.c   |  8 
 drivers/net/ring/rte_eth_ring.h   | 14 ++
 drivers/net/ring/rte_eth_ring_version.map |  5 +
 3 files changed, 27 insertions(+)

diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index bfd6f4e..d851f9e 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -392,6 +393,13 @@ error:
return -1;
 }

+int
+rte_eth_from_ring(struct rte_ring *r)
+{
+   return rte_eth_from_rings(r->name, &r, 1, &r, 1,
+   r->memzone ? r->memzone->socket_id : SOCKET_ID_ANY);
+}
+
 enum dev_action{
DEV_CREATE,
DEV_ATTACH
diff --git a/drivers/net/ring/rte_eth_ring.h b/drivers/net/ring/rte_eth_ring.h
index 5a69bff..4ff83ec 100644
--- a/drivers/net/ring/rte_eth_ring.h
+++ b/drivers/net/ring/rte_eth_ring.h
@@ -65,6 +65,20 @@ int rte_eth_from_rings(const char *name,
const unsigned nb_tx_queues,
const unsigned numa_node);

+/**
+ * Create a new ethdev port from a ring
+ *
+ * This function is a shortcut call for rte_eth_from_rings for the
+ * case where one wants to take a single rte_ring and use it as though
+ * it were an ethdev
+ *
+ * @param ring
+ *the ring to be used as an ethdev
+ * @return
+ *the port number of the newly created ethdev, or -1 on error
+ */
+int rte_eth_from_ring(struct rte_ring *r);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/ring/rte_eth_ring_version.map 
b/drivers/net/ring/rte_eth_ring_version.map
index 0875e25..8e113a4 100644
--- a/drivers/net/ring/rte_eth_ring_version.map
+++ b/drivers/net/ring/rte_eth_ring_version.map
@@ -5,3 +5,8 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.2 {
+   global:
+   rte_eth_from_ring;
+} DPDK_2.0;
-- 
2.4.3



[dpdk-dev] [PATCH 2/4] rte_ring: store memzone pointer inside ring

2015-09-30 Thread Bruce Richardson
Add a new field to the rte_ring structure to store the memzone pointer which
contains the ring. For rings created using rte_ring_create(), the field will
be set automatically.

This new field will allow users of the ring to query the numa node a ring is
allocated on, or to get the physical address of the ring, if so needed.

The rte_ring structure also maintains ABI compatibility: the structure
members after the new one are cache line aligned, which leaves padding
that the new field fits into.
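
The ABI argument above can be checked with a small layout comparison. This
is only a sketch under stated assumptions: the two structs are hypothetical,
trimmed-down stand-ins for rte_ring before and after the change, and
CACHE_LINE and NAMESIZE are assumed values, not taken from the DPDK headers.

```c
#include <stddef.h>   /* offsetof */

#define CACHE_LINE 64  /* assumed cache line size */
#define NAMESIZE   32  /* assumed stand-in for RTE_RING_NAMESIZE */

/* Hypothetical producer state; the real struct has more fields. */
struct prod_state {
	unsigned head;
	unsigned tail;
};

/* Layout before the patch: prod is pushed out to a cache line boundary. */
struct ring_before {
	char name[NAMESIZE];
	int flags;
	struct prod_state prod __attribute__((aligned(CACHE_LINE)));
};

/* Layout after the patch: the new pointer lands in the former padding. */
struct ring_after {
	char name[NAMESIZE];
	int flags;
	const void *memzone;
	struct prod_state prod __attribute__((aligned(CACHE_LINE)));
};

/* Returns 1 if the aligned member and the total size are unchanged. */
static int
layout_is_compatible(void)
{
	return offsetof(struct ring_before, prod) ==
			offsetof(struct ring_after, prod) &&
		sizeof(struct ring_before) == sizeof(struct ring_after);
}
```

With these assumed sizes, name and flags take 36 bytes, so the pointer fits
well before the 64-byte boundary where prod starts in both layouts.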

Signed-off-by: Bruce Richardson 
---
 lib/librte_ring/rte_ring.c | 1 +
 lib/librte_ring/rte_ring.h | 4 
 2 files changed, 5 insertions(+)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index c9e59d4..4e78e14 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -196,6 +196,7 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
rte_ring_init(r, name, count, flags);

te->data = (void *) r;
+   r->memzone = mz;

TAILQ_INSERT_TAIL(ring_list, te, next);
} else {
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index af6..df45f3f 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -134,6 +134,8 @@ struct rte_ring_debug_stats {
 *   if RTE_RING_PAUSE_REP not defined. */
 #endif

+struct rte_memzone; /* forward declaration, so as not to require memzone.h */
+
 /**
  * An RTE ring structure.
  *
@@ -147,6 +149,8 @@ struct rte_ring_debug_stats {
 struct rte_ring {
char name[RTE_RING_NAMESIZE];/**< Name of the ring. */
int flags;   /**< Flags supplied at creation. */
+   const struct rte_memzone *memzone;
+   /**< Memzone, if any, containing the rte_ring */

/** Ring producer status. */
struct prod {
-- 
2.4.3



[dpdk-dev] [PATCH 1/4] ring: enhance rte_eth_from_rings

2015-09-30 Thread Bruce Richardson
The ring ethdev creation function creates an ethdev, but does not
actually set it up for use. Even if it's just a single ring, the user
still needs to create a mempool, call rte_eth_dev_configure, then call
rx and tx setup functions before the ethdev can be used.

This patch changes things so that the ethdev is fully set up after the
call to create the ethdev. The above-mentioned functions can still be
called - as will be the case, for instance, if the NIC is created via
commandline parameters - but they no longer are essential.

The function now also sets rte_errno appropriately on error, so the
caller can get a better indication of why a call may have failed.
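
The error-reporting pattern described above can be sketched in isolation as
follows. This is an illustration under stated assumptions, not the patch
itself: plain errno, calloc and free stand in for rte_errno,
rte_zmalloc_socket and rte_free, and mini_dev_data, mini_dev_create and
MAX_QUEUES are hypothetical names. The sketch also checks data for NULL
before touching data->rx_queues, because the error label can be reached
before data has been allocated.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_QUEUES 16  /* hypothetical stand-in for RTE_PMD_RING_MAX_RX_RINGS */

/* Hypothetical, trimmed-down device data. */
struct mini_dev_data {
	void **rx_queues;
	unsigned nb_rx_queues;
};

/* Returns a ready-to-use object, or NULL with errno set to the reason. */
static struct mini_dev_data *
mini_dev_create(void *const rx_queues[], unsigned nb_rx_queues)
{
	struct mini_dev_data *data = NULL;
	unsigned i;

	/* parameter checking: each failure records why it failed */
	if (rx_queues == NULL && nb_rx_queues > 0) {
		errno = EINVAL;
		goto error;
	}
	if (nb_rx_queues > MAX_QUEUES) {
		errno = EINVAL;
		goto error;
	}

	data = calloc(1, sizeof(*data));
	if (data == NULL) {
		errno = ENOMEM;
		goto error;
	}

	/* allocate at least one slot so a zero-queue request still succeeds */
	data->rx_queues = calloc(nb_rx_queues ? nb_rx_queues : 1,
			sizeof(void *));
	if (data->rx_queues == NULL) {
		errno = ENOMEM;
		goto error;
	}

	for (i = 0; i < nb_rx_queues; i++)
		data->rx_queues[i] = rx_queues[i];
	data->nb_rx_queues = nb_rx_queues;
	return data;

error:
	/* data may still be NULL here, so guard the member access */
	if (data != NULL)
		free(data->rx_queues);
	free(data);
	return NULL;
}
```

A caller can then tell an invalid argument (EINVAL) apart from an allocation
failure (ENOMEM) instead of only seeing a generic failure value.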

Signed-off-by: Bruce Richardson 
---
 drivers/net/ring/rte_eth_ring.c | 47 +++--
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index 0ba36d5..bfd6f4e 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #define ETH_RING_NUMA_NODE_ACTION_ARG  "nodeaction"
 #define ETH_RING_ACTION_CREATE "CREATE"
@@ -276,10 +277,18 @@ rte_eth_from_rings(const char *name, struct rte_ring 
*const rx_queues[],
unsigned i;

/* do some parameter checking */
-   if (rx_queues == NULL && nb_rx_queues > 0)
+   if (rx_queues == NULL && nb_rx_queues > 0) {
+   rte_errno = EINVAL;
goto error;
-   if (tx_queues == NULL && nb_tx_queues > 0)
+   }
+   if (tx_queues == NULL && nb_tx_queues > 0) {
+   rte_errno = EINVAL;
+   goto error;
+   }
+   if (nb_rx_queues > RTE_PMD_RING_MAX_RX_RINGS) {
+   rte_errno = EINVAL;
goto error;
+   }

RTE_LOG(INFO, PMD, "Creating rings-backed ethdev on numa socket %u\n",
numa_node);
@@ -288,21 +297,43 @@ rte_eth_from_rings(const char *name, struct rte_ring 
*const rx_queues[],
 * and internal (private) data
 */
data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
-   if (data == NULL)
+   if (data == NULL) {
+   rte_errno = ENOMEM;
goto error;
+   }
+
+   data->rx_queues = rte_zmalloc_socket(name, sizeof(void *) * 
nb_rx_queues,
+   0, numa_node);
+   if (data->rx_queues == NULL) {
+   rte_errno = ENOMEM;
+   goto error;
+   }
+
+   data->tx_queues = rte_zmalloc_socket(name, sizeof(void *) * 
nb_tx_queues,
+   0, numa_node);
+   if (data->tx_queues == NULL) {
+   rte_errno = ENOMEM;
+   goto error;
+   }

pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
-   if (pci_dev == NULL)
+   if (pci_dev == NULL) {
+   rte_errno = ENOMEM;
goto error;
+   }

internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
-   if (internals == NULL)
+   if (internals == NULL) {
+   rte_errno = ENOMEM;
goto error;
+   }

/* reserve an ethdev entry */
eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
-   if (eth_dev == NULL)
+   if (eth_dev == NULL) {
+   rte_errno = ENOSPC;
goto error;
+   }


/* now put it all together
@@ -318,9 +349,11 @@ rte_eth_from_rings(const char *name, struct rte_ring 
*const rx_queues[],
internals->nb_tx_queues = nb_tx_queues;
for (i = 0; i < nb_rx_queues; i++) {
internals->rx_ring_queues[i].rng = rx_queues[i];
+   data->rx_queues[i] = &internals->rx_ring_queues[i];
}
for (i = 0; i < nb_tx_queues; i++) {
internals->tx_ring_queues[i].rng = tx_queues[i];
+   data->tx_queues[i] = &internals->tx_ring_queues[i];
}

rte_ring_pmd.pci_drv.name = ring_ethdev_driver_name;
@@ -350,6 +383,8 @@ rte_eth_from_rings(const char *name, struct rte_ring *const 
rx_queues[],
return data->port_id;

 error:
+   rte_free(data->rx_queues);
+   rte_free(data->tx_queues);
rte_free(data);
rte_free(pci_dev);
rte_free(internals);
-- 
2.4.3



[dpdk-dev] [PATCH 0/4] eth_ring: perf test and usability improvements

2015-09-30 Thread Bruce Richardson
This patchset makes it easier to create ring pmd instances from code, by
providing a simple ring->ethdev wrapper function and also ensuring that
any created rings are ready for use immediately, without having to call
configure and rx/tx queue setup.

This set also contains a set of unit tests to compare the performance of
basic ring operations against the same operations via the ring ethdev.
This shows how the perf penalty can be significant for small bursts, but
is much less so for larger bursts of 32 packets.

Bruce Richardson (4):
  ring: enhance rte_eth_from_rings
  rte_ring: store memzone pointer inside ring
  ring: add rte_eth_from_ring function
  test: Add perf test for ring pmd

 app/test/Makefile |   1 +
 app/test/test_pmd_ring_perf.c | 188 ++
 drivers/net/ring/rte_eth_ring.c   |  55 -
 drivers/net/ring/rte_eth_ring.h   |  14 +++
 drivers/net/ring/rte_eth_ring_version.map |   5 +
 lib/librte_ring/rte_ring.c|   1 +
 lib/librte_ring/rte_ring.h|   4 +
 7 files changed, 262 insertions(+), 6 deletions(-)
 create mode 100644 app/test/test_pmd_ring_perf.c

-- 
2.4.3



[dpdk-dev] Is there any example application to used DPDK packet distributor library?

2015-09-30 Thread Bruce Richardson
On Wed, Sep 30, 2015 at 08:41:04PM +0900, ??? wrote:
>  Dear Bruce Richardson and DPDK experts.
>  
> Thank you very much for your precious answer.
>  
> I found it. It seems very short and simple.
>  
> Thank you very much.
>  
> I have another question.
>  
> I don't know how the following steps work from new_tag to match variables.
>  
> /* in dpdk library. ~/dpdk-?.?.?/lib/librte_distributor/rte_distributor.c */
> /* process a set of packets to distribute them to workers */
> rte_distributor_process(struct rte_distributor *d, struct rte_mbuf **mbufs, 
> unsigned num_mbufs)
> {
> ...
>  new_tag = next_mb->hash.usr;  /* flow ID hash.usr is set by NIC */
>  
>  for (i = 0; i < d->num_workers; i++)
>   match |= (!(d->in_flight_tags[i] ^ new_tag) << i);
>  
>  /* Only turned-on bits are considered as match */
>  match &= d->in_flight_bitmask;
>  
>  unsigned worker = __builtin_ctzl(match);
> ...
> }
>  
> I will appreciate if you let me know the steps.

We build up a bitmask with one bit per worker, where bit i is set if new_tag
matches the in-flight tag for worker i. We then find the matching worker, if
any, using a count-trailing-zeros (ctz) operation.
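
As a rough standalone sketch of those steps: the struct below is a
hypothetical, trimmed-down stand-in for struct rte_distributor, and
find_matching_worker is an illustrative helper name, not a library function.
It assumes at most 64 workers and a GCC/clang compiler for the ctz builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields used by the matching step. */
struct mini_distributor {
	unsigned num_workers;
	uint32_t in_flight_tags[64];  /* tag each worker is processing */
	uint64_t in_flight_bitmask;   /* bit i set: worker i has a packet in flight */
};

/* Return the lowest worker index already handling new_tag, or -1 if none. */
static int
find_matching_worker(const struct mini_distributor *d, uint32_t new_tag)
{
	uint64_t match = 0;
	unsigned i;

	assert(d->num_workers <= 64);

	/* the XOR is zero only for equal tags, so !(...) yields 1 on a match */
	for (i = 0; i < d->num_workers; i++)
		match |= (uint64_t)!(d->in_flight_tags[i] ^ new_tag) << i;

	/* ignore stale tags of workers with nothing in flight */
	match &= d->in_flight_bitmask;

	if (match == 0)
		return -1;

	/* index of the lowest set bit is the matching worker */
	return __builtin_ctzll(match);
}
```

The zero check matters because the result of the ctz builtin is undefined
for an input of 0.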

/Bruce

>  
> Thank you very much.
>  
> Sincerely Yours,
>  
> Ick-Sung Choi.
>  
>  
> -Original Message-
> From: "Bruce Richardson"bruce.richardson at intel.com 
> To: "???"pnk003 at naver.com; 
> Cc: dev at dpdk.org; 
> Sent: 2015-09-30 (?) 19:56:28
> Subject: Re: [dpdk-dev] Is there any example application to used DPDK packet 
> distributor library?
>  
> On Wed, Sep 30, 2015 at 02:45:20PM +0900, ??? wrote:
>  Dear DPDK experts.
>   
>  I am Ick-Sung Choi living in South Korea.
>   
>  I have a question about DPDK packet distributor library.
>   
>  Is there any example application to used DPDK packet distributor library?
>   
>  I am trying to experiment simple function using DPDK packet distributor 
> library.
>   
>  If I can study an example application of DPDK packet distributor 
> library, it would be very helpful for my experiment.
>   
>  I will appreciate if I can be given any example applications, advice, 
> and information.
>   
>  Thank you very much.
>   
>  Sincerely Yours,
>   
>  Ick-Sung Choi.
>   
> Hi,
> 
> there is a "distributor" example app in the examples directory.
> 
> /Bruce
> 


[dpdk-dev] [PATCH v2] Move rte_mbuf macros to common header file

2015-09-30 Thread Ravi Kerur
Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
are defined in each PMD driver file. Move those macros into common
lib/librte_mbuf/rte_mbuf.h file. PMD drivers include rte_mbuf.h
file directly/indirectly hence no additional header file inclusion
is necessary.

v2:
> Changed both macros to inline functions in all PMD
> Changed macro to rte_pktmbuf_mtod in xenvirt module

v1:
> Move macros into common rte_mbuf header file.
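
To illustrate the macro-to-inline-function change listed under v2, here is a
minimal sketch. It assumes a hypothetical struct mini_mbuf with only the two
fields the helpers read, and MINI_HEADROOM as a stand-in for
RTE_PKTMBUF_HEADROOM; the function names are illustrative and are not the
ones used in the patch.

```c
#include <stdint.h>

#define MINI_HEADROOM 128  /* assumed stand-in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical, trimmed-down mbuf. */
struct mini_mbuf {
	uint64_t buf_physaddr;  /* physical address of the buffer start */
	uint16_t data_off;      /* offset of the packet data in the buffer */
};

/* Old style: accepts any pointer whose target has matching member names. */
#define MINI_MBUF_DATA_DMA_ADDR(mb) \
	((uint64_t)((mb)->buf_physaddr + (mb)->data_off))

/* New style: the compiler checks that the argument really is an mbuf. */
static inline uint64_t
mini_mbuf_data_dma_addr(const struct mini_mbuf *mb)
{
	return mb->buf_physaddr + mb->data_off;
}

/* Same, but using the default headroom instead of the current data_off. */
static inline uint64_t
mini_mbuf_data_dma_addr_default(const struct mini_mbuf *mb)
{
	return mb->buf_physaddr + MINI_HEADROOM;
}
```

Both forms compute the same address; the inline versions give the argument a
declared type, so passing the wrong kind of pointer is diagnosed at compile
time.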

Compiled for:
> x86_64-native-linuxapp-clang
> x86_64-native-linuxapp-gcc
> i686-native-linuxapp-gcc
> x86_64-native-bsdapp-gcc
> x86_64-native-bsdapp-clang

Tested on:
> x86_64 Ubuntu 14.04, testpmd and 'make test'
> FreeBSD 10.1, testpmd

Signed-off-by: Ravi Kerur 
---
 drivers/net/bnx2x/bnx2x.h  |  3 ---
 drivers/net/cxgbe/sge.c|  3 ---
 drivers/net/e1000/em_rxtx.c|  6 --
 drivers/net/e1000/igb_rxtx.c   |  6 --
 drivers/net/i40e/i40e_rxtx.c   |  6 --
 drivers/net/ixgbe/ixgbe_rxtx.h |  6 --
 drivers/net/virtio/virtqueue.h |  3 ---
 drivers/net/vmxnet3/vmxnet3_rxtx.c |  6 --
 drivers/net/xenvirt/virtqueue.h|  5 +
 lib/librte_mbuf/rte_mbuf.h | 10 ++
 10 files changed, 11 insertions(+), 43 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
index 867b92a..28bd83f 100644
--- a/drivers/net/bnx2x/bnx2x.h
+++ b/drivers/net/bnx2x/bnx2x.h
@@ -141,9 +141,6 @@ struct bnx2x_device_type {
char *bnx2x_name;
 };

-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   ((uint64_t)((mb)->buf_physaddr + (mb)->data_off))
-
 #define BNX2X_PAGE_SHIFT   12
 #define BNX2X_PAGE_SIZE(1 << BNX2X_PAGE_SHIFT)
 #define BNX2X_PAGE_MASK(~(BNX2X_PAGE_SIZE - 1))
diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 6eb1244..8f4c025 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -1267,9 +1267,6 @@ static struct rte_mbuf *t4_pktgl_to_mbuf(const struct 
pkt_gl *gl)
return t4_pktgl_to_mbuf_usembufs(gl);
 }

-#define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
-   ((dma_addr_t) ((mb)->buf_physaddr + (mb)->data_off))
-
 /**
  * t4_ethrx_handler - process an ingress ethernet packet
  * @q: the response queue that received the packet
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 3b8776d..c7d97c1 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -88,12 +88,6 @@ rte_rxmbuf_alloc(struct rte_mempool *mp)
return (m);
 }

-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   (uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
-
-#define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
-   (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM)
-
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 19905fd..a217cea 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -88,12 +88,6 @@ rte_rxmbuf_alloc(struct rte_mempool *mp)
return (m);
 }

-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   (uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
-
-#define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
-   (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM)
-
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index fd656d5..5ba6d27 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -78,12 +78,6 @@
PKT_TX_L4_MASK | \
PKT_TX_OUTER_IP_CKSUM)

-#define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
-   (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM)
-
-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   ((uint64_t)((mb)->buf_physaddr + (mb)->data_off))
-
 static const struct rte_memzone *
 i40e_ring_dma_zone_reserve(struct rte_eth_dev *dev,
   const char *ring_name,
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index b9eca67..dbb9f00 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -40,12 +40,6 @@

 #define RTE_IXGBE_DESCS_PER_LOOP4

-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   (uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
-
-#define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
-   (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM)
-
 #ifdef RTE_IXGBE_INC_VECTOR
 #define RTE_IXGBE_RXQ_REARM_THRESH  32
 #define RTE_IXGBE_MAX_RX_BURST  RTE_IXGBE_RXQ_REARM_THRESH
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..9ea9b96 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -68,9 +68,6 @@ struct rte_mbuf;

 #define VIRTQUEUE_MAX_NAME_SZ 32

-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-   (uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
-
 #define VTNET_SQ_RQ_QUEUE_IDX 0

[dpdk-dev] [PATCH v2] Change rte_eal_vdev_init to update port_id

2015-09-30 Thread Ravi Kerur
Hi Tetsuya,


On Mon, Sep 28, 2015 at 8:32 PM, Tetsuya Mukawa  wrote:

> On 2015/09/24 6:22, Ravi Kerur wrote:
> > Hi David, Tetsuya,
> >
> > I have sent V3 (changes isolated to rte_ether component) for formal
> review.
> > Please look into it and let me know your inputs.
>
> Hi Ravi,
>
> I've checked the patch.
> I guess your patch is good.
>
> >
> > @David: I looked at "rte_eth_dev_get_port_by_name()", this function is
> > similar to "rte_eth_dev_get_name_by_port" and I have used same logic. Let
> > me know if this not correct I can fix both.
>
> Do you comment about rte_eth_dev_get_port_by_name and
> rte_eth_dev_get_port_by_addr?
> If so, I guess we don't need to merge.
>
>
I just mentioned that new functions are using same logic as existing
function.

Thanks,
Ravi


> > Thanks,
> > Ravi
> >
> >
> > On Tue, Sep 15, 2015 at 4:28 AM, Ravi Kerur  wrote:
> >
> >> Hi David,
> >>
> >>
> >> On Thu, Sep 3, 2015 at 7:04 AM, David Marchand <
> david.marchand at 6wind.com>
> >> wrote:
> >>
> >>> Hello Ravi, Tetsuya,
> >>>
> >>> On Tue, Aug 25, 2015 at 7:59 PM, Ravi Kerur  wrote:
> >>>
>  Let us know how you want us to fix this? To fix rte_eal_vdev_init and
>  rte_eal_pci_probe_one to return allocated port_id we had 2 approaches
>  mentioned in earlier discussion. In addition to those we have another
>  approach with changes isolated only to rte_ether component. I am
> attaching
>  diffs (preliminary) with this email. Please let us know your inputs
> since
>  it involves EAL component.
> 
> >>> - This patch looks like a good ethdev cleanup (even if it really lacks
> >>> some context / commit log).
> >>>
> >>> I wonder just why you only take the first part of the name in
> >>> rte_eth_dev_get_port_by_name().
> >>> Would not this match, let's say, both toto and toto0 vdevs ?
> >>> Is this intended ?
> >>>
> >>> It was not intended, i will look into it.
> >>> - In the end, with this patch, do we still need to update eal ?
> >>> Looking at the code, I am not sure anymore.
> >>>
> >> Approach 3 (preliminary diffs sent as an attachment) doesn't involve EAL
> >> but the other two solutions do. So please let us know which one you
> prefer.
> >> I will send updated patch.
> >>
> >> Thanks,
> >> Ravi
> >>
> >>
> >>>
> >>>
> >>> --
> >>> David Marchand
> >>>
> >>
>
>


[dpdk-dev] [PATCH v1] Move rte_mbuf macros to common header file

2015-09-30 Thread Ravi Kerur
Thanks Konstantin. I will send out v2 shortly.

On Tue, Sep 29, 2015 at 2:55 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
> Hi Ravi,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
> > Sent: Saturday, September 26, 2015 3:47 AM
> > To: Stephen Hemminger; Olivier Matz
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] Move rte_mbuf macros to common header
> file
> >
> > On Thu, Sep 24, 2015 at 4:25 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Thu, 24 Sep 2015 15:50:41 -0700
> > > Ravi Kerur  wrote:
> > >
> > > > Macros RTE_MBUF_DATA_DMA_ADDR and RTE_MBUF_DATA_DMA_ADDR_DEFAULT
> > > > are defined in each PMD driver file. Move those macros into common
> > > > lib/librte_mbuf/rte_mbuf.h file. All PMD drivers include rte_mbuf.h
> > > > file directly/indirectly hence no additional header file inclusion
> > > > is necessary.
> > > >
> > > > Compiled for:
> > > > > x86_64-native-linuxapp-clang
> > > > > x86_64-native-linuxapp-gcc
> > > > > i686-native-linuxapp-gcc
> > > > > x86_64-native-bsdapp-gcc
> > > > > x86_64-native-bsdapp-clang
> > > >
> > > > Tested on:
> > > > > x86_64 Ubuntu 14.04, testpmd and 'make test'
> > > > > FreeBSD 10.1, testpmd
> > > >
> > > > Signed-off-by: Ravi Kerur 
> > >
> > > I like the idea, should have been done long ago.
> > >
> > > My only gripe is that you should do this as inline functions
> > > rather than macros. Inline functions are type safe, macros are not.
> > >
> >
> > Agreed. However, I see another variation of the macro, users are
> primarily
> > from "app" directory and lone user from drivers/net/xenvirt/virtqueue.h
> >
> > #define RTE_MBUF_DATA_DMA_ADDR(mb) \
> > rte_pktmbuf_mtod(mb, uint64_t)
>
>
> As I can see, it is used only in one place inside xenvirt:
>
> drivers/net/xenvirt/virtqueue.h:start_dp[idx].addr  =
> RTE_MBUF_DATA_DMA_ADDR(cookie);
>
> So we probably can remove that macro definition here and use
> rte_pktmbuf_mtod(mb, uint64_t) directly.
>
> Konstantin
>
> >
> > #define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
> >
> > #define rte_pktmbuf_mtod_offset(m, t, o)\
> > ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
> >
> > Let me know should I still go ahead and do inline variation for drivers
> or
> > use above macro?
>


[dpdk-dev] Is there any example application to used DPDK packet distributor library?

2015-09-30 Thread Bruce Richardson
On Wed, Sep 30, 2015 at 02:45:20PM +0900, ??? wrote:
> Dear DPDK experts.
>  
> I am Ick-Sung Choi living in South Korea.
>  
> I have a question about DPDK packet distributor library.
>  
> Is there any example application to used DPDK packet distributor library?
>  
> I am trying to experiment simple function using DPDK packet distributor 
> library.
>  
> If I can study an example application of DPDK packet distributor library, it 
> would be very helpful for my experiment.
>  
> I will appreciate if I can be given any example applications, advice, and 
> information.
>  
> Thank you very much.
>  
> Sincerely Yours,
>  
> Ick-Sung Choi.
>  
Hi,

there is a "distributor" example app in the examples directory.

/Bruce


[dpdk-dev] [PATCH 02/20] librte_ether: add fields from rte_pci_driver to rte_eth_dev_data

2015-09-30 Thread Bruce Richardson
On Tue, Sep 29, 2015 at 03:08:12PM -0400, Neil Horman wrote:
> On Mon, Sep 28, 2015 at 02:03:20PM +0100, Bernard Iremonger wrote:
> > add dev_flags to rte_eth_dev_data, add macros for dev_flags.
> > add kdrv to rte_eth_dev_data.
> > add numa_node to rte_eth_dev_data.
> > add drv_name to rte_eth_dev_data.
> > use dev_type to distinguish between vdev's and pdev's.
> > remove pci_dev branches.
> > 
> > Signed-off-by: Bernard Iremonger 
> > ---
> >  lib/librte_ether/rte_ethdev.c | 53 
> > ---
> >  lib/librte_ether/rte_ethdev.h | 15 
> >  2 files changed, 45 insertions(+), 23 deletions(-)
> > 

> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1635,8 +1635,23 @@ struct rte_eth_dev_data {
> > all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
> > dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). 
> > */
> > lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
> > +   uint32_t dev_flags; /**< Flags controlling handling of device. */
> > +   enum rte_kernel_driver kdrv;/**< Kernel driver passthrough */
> Why add this here? The enumerated driver types are all variants on PCI bus
> types.  Not sure why the ethernet interface needs to know this info
> 
> > +   int numa_node;
> Ditto, this seems like information that is only relevant if the device is on a
> physical bus (i.e. virtual devices are likely to not have a numa node)
>
Actually, I disagree. Some virtual devices do have a numa node: for ring and
other virtual PMDs it is the node on which the ring / mempool etc. memory is
allocated, and it can be of relevance.

/Bruce


[dpdk-dev] [PATCH 2/2] virtio: change io privilege level as early as possible

2015-09-30 Thread Neil Horman
On Wed, Sep 30, 2015 at 10:28:53AM +0200, David Marchand wrote:
> On Tue, Sep 29, 2015 at 9:25 PM, Stephen Hemminger <
> stephen at networkplumber.org> wrote:
> 
> > On Tue, 10 Mar 2015 09:14:28 -0400
> > Neil Horman  wrote:
> >
> > >
> > > I don't see how this works for all cases.  The constructor is called
> > once when
> > > the library is first loaded.  What if you have multiple independent
> > (i.e. not
> > > forked children) processes that are using the dpdk in parallel?  Only the
> > > process that triggered the library load will have io permissions set
> > > appropriately.  I think what you need is to have every application that
> > expects
> > > to call through the transmit path or poll the receive path call iopl,
> > which I
> > > think speaks to having this requirement documented, so each application
> > can call
> > > iopl prior to calling fork/daemonize/etc.
> > >
> >
> > I am still seeing this problem with DPDK 2.0 and 2.1.
> > It seems to me that doing the iopl init in eal_init is the only safe way.
> > Other workaround is to have application calling iopl_init before eal_init
> > but that kind of violates the current method of all things being
> > initialized
> > by eal_init
> >
> 
> Putting it in the virtio pmd constructor is my preferred solution and we
> don't need to pollute the eal for virtio (specific to x86, btw).
> 

Preferred solution or not, you can't just call iopl from the constructor,
because not all processes will get appropriate permissions.  It needs to be called
by every process.  What Stephen is saying is that your solution has use cases
for which it doesn't work, and that needs to be solved.
Neil

> 
> -- 
> David Marchand


[dpdk-dev] [PATCH v5 4/4] bond mode 4: tests for external state machine

2015-09-30 Thread Eric Kinzie
From: Eric Kinzie 

  This adds test cases for exercising the external state machine API to
  the mode 4 autotest.

Signed-off-by: Eric Kinzie 
---
 app/test/test_link_bonding_mode4.c |  210 ++--
 1 file changed, 201 insertions(+), 9 deletions(-)

diff --git a/app/test/test_link_bonding_mode4.c 
b/app/test/test_link_bonding_mode4.c
index d785393..6a459cd 100644
--- a/app/test/test_link_bonding_mode4.c
+++ b/app/test/test_link_bonding_mode4.c
@@ -151,6 +151,8 @@ static struct rte_eth_conf default_pmd_conf = {
.lpbk_mode = 0,
 };

+static uint8_t lacpdu_rx_count[RTE_MAX_ETHPORTS] = {0, };
+
 #define FOR_EACH(_i, _item, _array, _size) \
for (_i = 0, _item = &_array[0]; _i < _size && (_item = &_array[_i]); 
_i++)

@@ -320,8 +322,26 @@ remove_slave(struct slave_conf *slave)
return 0;
 }

+static void
+lacp_recv_cb(uint8_t slave_id, struct rte_mbuf *lacp_pkt)
+{
+   struct ether_hdr *hdr;
+   struct slow_protocol_frame *slow_hdr;
+
+   RTE_VERIFY(lacp_pkt != NULL);
+
+   hdr = rte_pktmbuf_mtod(lacp_pkt, struct ether_hdr *);
+   RTE_VERIFY(hdr->ether_type == rte_cpu_to_be_16(ETHER_TYPE_SLOW));
+
+   slow_hdr = rte_pktmbuf_mtod(lacp_pkt, struct slow_protocol_frame *);
+   RTE_VERIFY(slow_hdr->slow_protocol.subtype == SLOW_SUBTYPE_LACP);
+
+   lacpdu_rx_count[slave_id]++;
+   rte_pktmbuf_free(lacp_pkt);
+}
+
 static int
-initialize_bonded_device_with_slaves(uint8_t slave_count, uint8_t start)
+initialize_bonded_device_with_slaves(uint8_t slave_count, uint8_t external_sm)
 {
uint8_t i;

@@ -337,9 +357,17 @@ initialize_bonded_device_with_slaves(uint8_t slave_count, 
uint8_t start)
rte_eth_bond_8023ad_setup(test_params.bonded_port_id, NULL);
rte_eth_promiscuous_disable(test_params.bonded_port_id);

-   if (start)
-   
TEST_ASSERT_SUCCESS(rte_eth_dev_start(test_params.bonded_port_id),
-   "Failed to start bonded device");
+   if (external_sm) {
+   struct rte_eth_bond_8023ad_conf conf;
+
+   rte_eth_bond_8023ad_conf_get(test_params.bonded_port_id, &conf);
+   conf.slowrx_cb = lacp_recv_cb;
+   rte_eth_bond_8023ad_setup(test_params.bonded_port_id, &conf);
+
+   }
+
+   TEST_ASSERT_SUCCESS(rte_eth_dev_start(test_params.bonded_port_id),
+   "Failed to start bonded device");

return TEST_SUCCESS;
 }
@@ -642,7 +670,7 @@ test_mode4_lacp(void)
 {
int retval;

-   retval = initialize_bonded_device_with_slaves(TEST_LACP_SLAVE_COUT, 1);
+   retval = initialize_bonded_device_with_slaves(TEST_LACP_SLAVE_COUT, 0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

/* Test LACP handshake function */
@@ -740,7 +768,7 @@ test_mode4_rx(void)
struct ether_addr dst_mac;
struct ether_addr bonded_mac;

-   retval = initialize_bonded_device_with_slaves(TEST_PROMISC_SLAVE_COUNT, 
1);
+   retval = initialize_bonded_device_with_slaves(TEST_PROMISC_SLAVE_COUNT, 
0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

retval = bond_handshake();
@@ -917,7 +945,7 @@ test_mode4_tx_burst(void)
struct ether_addr dst_mac = { { 0x00, 0xFF, 0x00, 0xFF, 0x00, 0x00 } };
struct ether_addr bonded_mac;

-   retval = initialize_bonded_device_with_slaves(TEST_TX_SLAVE_COUNT, 1);
+   retval = initialize_bonded_device_with_slaves(TEST_TX_SLAVE_COUNT, 0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

retval = bond_handshake();
@@ -1101,7 +1129,7 @@ test_mode4_marker(void)
uint8_t i, j;
const uint16_t ethtype_slow_be = rte_be_to_cpu_16(ETHER_TYPE_SLOW);

-   retval = initialize_bonded_device_with_slaves(TEST_MARKER_SLAVE_COUT, 
1);
+   retval = initialize_bonded_device_with_slaves(TEST_MARKER_SLAVE_COUT, 
0);
TEST_ASSERT_SUCCESS(retval, "Failed to initialize bonded device");

/* Test LACP handshake function */
@@ -1186,7 +1214,7 @@ test_mode4_expired(void)

struct rte_eth_bond_8023ad_conf conf;

-   retval = initialize_bonded_device_with_slaves(TEST_EXPIRED_SLAVE_COUNT, 
1);
+   retval = initialize_bonded_device_with_slaves(TEST_EXPIRED_SLAVE_COUNT, 
0);
/* Set custom timeouts to make test last shorter. */
rte_eth_bond_8023ad_conf_get(test_params.bonded_port_id, &conf);
conf.fast_periodic_ms = 100;
@@ -1268,6 +1296,156 @@ test_mode4_expired(void)
 }

 static int
+test_mode4_ext_ctrl(void)
+{
+   /*
+* configure bonded interface without the external sm enabled
+*   . try to transmit lacpdu (should fail)
+*   . try to set collecting and distributing flags (should fail)
+* reconfigure w/external sm
+*   . transmit one lacpdu on each slave using new api
+*   . make sure each slave receives one lacpdu using the callback api
+*   . 
