[dpdk-dev] [PATCH] virtio: split virtio rx/tx queue

2016-05-04 Thread Yuanhan Liu
On Thu, May 05, 2016 at 03:29:44AM +, Xie, Huawei wrote:
> On 5/5/2016 11:03 AM, Yuanhan Liu wrote:
> > On Thu, May 05, 2016 at 01:54:25AM +, Xie, Huawei wrote:
> >> On 5/5/2016 7:59 AM, Yuanhan Liu wrote:
> >>> On Wed, May 04, 2016 at 08:50:27AM +0800, Huawei Xie wrote:
>  -int virtio_dev_queue_setup(struct rte_eth_dev *dev,
>  -int queue_type,
>  -uint16_t queue_idx,
>  +static int
>  +virtio_dev_cq_queue_setup(struct rte_eth_dev *dev,
> >>> While it's good to split Rx/Tx specific stuff, but why are you trying to
> >>> remove a common queue_setup function that does common setups, such as 
> >>> vring
> >>> memory allocation.
> >>>
> >>> This results to much duplicated code: following diff summary also shows
> >>> it clearly:
> >> The motivation to do this is we need separate RX/TX queue setup.
> > We actually have done that. If you look at current rx/tx/ctrl_queue_setup()
> > code, we invoked the common function; we also did some queue specific
> > settings. It has not been done in a very clean way though: there are quite
> > many "if .. else .." as you stated. And that's what you are going to 
> > resolve,
> > but IMO, you went far: you made __same__ code 3 copies, one for rx, tx and
> > ctrl queue, respectively.
> >
> >> The switch/case in the common queue setup looks bad.
> > Assuming you are talking about the "if .. else .." ...
> >
> > While I agree with you on that, introducing so many duplicated code is 
> > worse.
> >
> >> I am aware of the common operations, and i had planned to extract them,
> >> maybe i could do this in this patchset.
> > If you meant to do in another patch on top of this patch, then it looks
> > like the wrong way to go: breaking something first and then fixing it
> > later does not sound a good practice to me.
> 
> To your later comment, we could split first, then do the queue setup rework.

Well, if you insist, I'm Okay. But please don't do it in the way this
patch does, that introduces quite many duplicated codes.

--yliu


[dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned

2016-05-04 Thread Jerin Jacob
On Wed, May 04, 2016 at 01:53:39PM +, Richardson, Bruce wrote:
> 
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Wednesday, May 4, 2016 2:43 PM
> > To: Richardson, Bruce 
> > Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> > aligned
> > 
> > On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:

Snip

> > >
> > > Hi Jerin,
> > 
> > Hi Bruce,
> > 
> > >
> > > have you seen a performance degradation due to ethdev elements sharing
> > > a cache
> > 
> > No. Not because of sharing the cache line.
> > 
> > > line? I ask because, surprisingly for me, I actually see a performance
> > > regression
> > 
> > I see performance degradation in PMD in my setup where independent changes
> > are causing the performance issue in PMD(~<100k). That's the reason I
> > thought making aligned cache line stuff where ever it makes sense so that
> > independent change shouldn't impact the PMD performance and this patch was
> > an initiative for the same.
> > 
> > > when I apply the above patch. It's not a big change - perf reduction
> > > of <1% - but still noticable across multiple runs using testpmd. I'm
> > > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > > E5-2699 v3 @ 2.30GHz]
> > 
> > This particular patch does not have any performance degradation in my
> > setup.
> > CPU: ThunderX
> 
> Ok, so I take it that this patch is performance neutral on your setup, then?
> If that's the case, can we hold off on merging it on the basis that it's not 
> needed and does cause a slight regression.

OK

Can you share the base dpdk.org upstream change set where this patch
creates the slight regression? I will test it the same on IA and
ThunderX platform.

In my testpmd build, rte_eth_devices(0x00751ef8) share the cache
line with inactive "notify_ops" that the reason for its not creating any 
benefit.
I guess the case would have been different if its active write location.

 COMMON 0x00751ef00x8
/home/jerin/dpdk-thunderx/build/lib/librte_vhost.a(virtio-net.o)
0x00751ef0notify_ops
 COMMON 0x00751ef80x80900
/home/jerin/dpdk-thunderx/build/lib/libethdev.a(rte_ethdev.o)
0x00751ef8rte_eth_devices

Thanks,
Jerin

> 
> Thanks,
> /Bruce


[dpdk-dev] [PATCH v3 2/2] virtio: fix memory leak of virtqueue memzones

2016-05-04 Thread Yuanhan Liu
ping...

On Thu, Apr 28, 2016 at 10:33:08PM -0700, Yuanhan Liu wrote:
> On Fri, Apr 29, 2016 at 12:48:46AM +, Jianfeng Tan wrote:
> > @@ -447,6 +453,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
> >  
> > hw->vtpci_ops->setup_queue(hw, vq);
> >  
> > +   vq->started = 1;
> 
> Judging that this is in the "_queue_setup" stage, and we have another
> stage called "_dev_start", naming it to "started" seems confusing then.
> 
> So, how about naming it to something like "configured"? Besides that,
> this patch set looks good to me. If you agree the name change, or have
> better idea, I could fix it while applying it.
> 
>   --yliu


[dpdk-dev] [PATCH v3 1/2] virtio: cleanup virtio_dev_queue_setup()

2016-05-04 Thread Yuanhan Liu
On Fri, Apr 29, 2016 at 12:48:45AM +, Jianfeng Tan wrote:
> + if (queue_type < VTNET_RQ || queue_type > VTNET_CQ) {
> + PMD_INIT_LOG(ERR, "invalid queue type: %d", queue_type);
> + return -EINVAL;
>   }

I'm thinking this check is not necessary. We can make sure it's a valid
queue type.

--yliu


[dpdk-dev] [PATCH] virtio: split virtio rx/tx queue

2016-05-04 Thread Yuanhan Liu
On Thu, May 05, 2016 at 01:54:25AM +, Xie, Huawei wrote:
> On 5/5/2016 7:59 AM, Yuanhan Liu wrote:
> > On Wed, May 04, 2016 at 08:50:27AM +0800, Huawei Xie wrote:
> >> -int virtio_dev_queue_setup(struct rte_eth_dev *dev,
> >> -  int queue_type,
> >> -  uint16_t queue_idx,
> >> +static int
> >> +virtio_dev_cq_queue_setup(struct rte_eth_dev *dev,
> > While it's good to split Rx/Tx specific stuff, but why are you trying to
> > remove a common queue_setup function that does common setups, such as vring
> > memory allocation.
> >
> > This results to much duplicated code: following diff summary also shows
> > it clearly:
> 
> The motivation to do this is we need separate RX/TX queue setup.

We actually have done that. If you look at current rx/tx/ctrl_queue_setup()
code, we invoked the common function; we also did some queue specific
settings. It has not been done in a very clean way though: there are quite
many "if .. else .." as you stated. And that's what you are going to resolve,
but IMO, you went far: you made __same__ code 3 copies, one for rx, tx and
ctrl queue, respectively.

> The switch/case in the common queue setup looks bad.

Assuming you are talking about the "if .. else .." ...

While I agree with you on that, introducing so many duplicated code is worse.

> I am aware of the common operations, and i had planned to extract them,
> maybe i could do this in this patchset.

If you meant to do in another patch on top of this patch, then it looks
like the wrong way to go: breaking something first and then fixing it
later does not sound a good practice to me.

> >
> > 7 files changed, 655 insertions(+), 422 deletions(-)
> >
> > which makes it harder for maintaining.
> >
> >> -}
> >> +  rxvq = (struct virtnet_rx *)RTE_PTR_ADD(vq,
> >> +  sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra));
> >> +  rxvq->vq = vq;
> >> +  vq->sw_ring = sw_ring;
> > sw_ring is needed for rx queue only, why not moving it to rx queue struct?
> 
> Actually this is not about sw_ring.
> I had planned to use sw_ring for both RX/TX and remove the vq_desc_extra.
> Two issues
> 1. RX uses both sw_ring and vq_desc_extra
> 2. ndescs in vq_desc_extra isn't really needed, we could simply
> calculate this when we walk through the desc chain, and in most cases,
> it is 1 or 2.
> 
> As it is not related to this rework, will do this in a separate patch.

Yes, it's not related to this patch, and this patch does rx/tx split
only. So, thinking that sw_ring is for rx only, you should move there.

It will not against with your plan; you can make corresponding change
there. But for this patch, let's do the split only.

BTW, I still would suggest you to build the patch on top of the cleanup
and memory leak fix patches from Jianfeng. Your patch won't apply on
top of current dpdk-next-virtio, and one way or another, you need do
a rebase.

Last, if I were you, I would split this patch in two: one to move
the queue specific settings to it's queue setup function, another
to split rx/tx fields. That would make it easier for review.

--yliu


[dpdk-dev] [PATCH] app/test: shorten execution time

2016-05-04 Thread Thomas Monjalon
The autotests are too long to be run often.
This patch reduces the needed time of some tests in fast_test.
The others will be analyzed when they will be able to run in a
VM with a reasonnable amount of memory.

The current status in a small VM is below:

> make fast_test
/root/dpdk/build/app/test -c f -n 4

Test name  Test result  Test
Total

Start group_1: Success   [00m 00s]
Timer autotest:Success   [00m 02s]
Debug autotest:Success   [00m 00s]
Errno autotest:Success   [00m 00s]
Meter autotest:Success   [00m 00s]
Common autotest:   Success   [00m 01s]
Dump log history:  Success   [00m 00s]
Dump rings:Success   [00m 00s]
Dump mempools: Success   [00m 00s] [00m 05s]
Start group_2: Success   [00m 00s]
Memory autotest:   Success   [00m 00s]
Read/write lock autotest:  Success   [00m 00s]
Logs autotest: Success   [00m 00s]
CPU flags autotest:Success   [00m 00s]
Version autotest:  Success   [00m 00s]
EAL filesystem autotest:   Success   [00m 00s]
EAL flags autotest:Success   [00m 05s]
Hash autotest: Success   [00m 00s] [00m 11s]
Start group_3: Fail [No prompt]  [00m 00s]
LPM autotest:  Fail [No prompt]  [00m 00s]
IVSHMEM autotest:  Fail [No prompt]  [00m 00s]
Memcpy autotest:   Fail [No prompt]  [00m 00s]
Memzone autotest:  Fail [No prompt]  [00m 00s]
String autotest:   Fail [No prompt]  [00m 00s]
Alarm autotest:Fail [No prompt]  [00m 00s] [00m 11s]
Start group_4: Success   [00m 00s]
PCI autotest:  Success   [00m 00s]
Malloc autotest:   Success   [00m 00s]
Multi-process autotest:Success   [00m 00s]
Mbuf autotest: Success   [00m 02s]
Per-lcore autotest:Success   [00m 00s]
Ring autotest: Success   [00m 00s] [00m 16s]
Start group_5: Success   [00m 00s]
Spinlock autotest: Success   [00m 00s]
Byte order autotest:   Success   [00m 00s]
TAILQ autotest:Success   [00m 00s]
Command-line autotest: Success   [00m 00s]
Interrupts autotest:   Success   [00m 00s] [00m 18s]
Start group_6: Fail [No prompt]  [00m 00s]
Function reentrancy autotest:  Fail [No prompt]  [00m 00s]
Mempool autotest:  Fail [No prompt]  [00m 00s]
Atomics autotest:  Fail [No prompt]  [00m 00s]
Prefetch autotest: Fail [No prompt]  [00m 00s]
Red autotest:  Fail [No prompt]  [00m 00s] [00m 18s]
Start group_7: Success   [00m 00s]
PMD ring autotest: Success   [00m 00s]
Access list control autotest:  Success   [00m 01s]
Sched autotest:Success   [00m 00s] [00m 20s]
Start kni: Fail [No prompt]  [00m 00s]
KNI autotest:  Fail [No prompt]  [00m 00s] [00m 20s]
Start mempool_perf:Success   [00m 00s]
Cycles autotest:   Success   [00m 01s] [00m 22s]
Start power:   Fail [No prompt]  [00m 00s]
Power autotest:Fail [No prompt]  [00m 00s] [00m 22s]
Start power_acpi_cpufreq:  Fail [No prompt]  [00m 00s]
Power ACPI cpufreq autotest:   Fail [No prompt]  [00m 00s] [00m 22s]
Start power_kvm_vm:Fail [No prompt]  [00m 00s]
Power KVM VM  autotest:Fail [No prompt]  [00m 00s] [00m 23s]
Start timer_perf:  Fail [No prompt]  [00m 00s]
Timer performance autotest:Fail [No prompt]  [00m 00s] [00m 23s]

Total run time: 00m 23s
Number of failed tests: 16


[dpdk-dev] [PATCH v2] eal: make hugetlb initialization more robust

2016-05-04 Thread Tan, Jianfeng
Hi Sergio,


On 5/4/2016 7:07 PM, Sergio Gonzalez Monroy wrote:
> On 08/03/2016 01:42, Jianfeng Tan wrote:
>> This patch adds an option, --huge-trybest, to use a recover mechanism to
>> the case that there are not so many hugepages (declared in sysfs), which
>> can be used. It relys on a mem access to fault-in hugepages, and if 
>> fails
>> with SIGBUS, recover to previously saved stack environment with
>> siglongjmp().
>>
>> Test example:
>>a. cgcreate -g hugetlb:/test-subgroup
>>b. cgset -r hugetlb.1GB.limit_in_bytes=2147483648 test-subgroup
>>c. cgexec -g hugetlb:test-subgroup \
>>   ./examples/helloworld/build/helloworld -c 0x2 -n 4 --huge-trybest
>
> I think you should mention in the commit message that this option also 
> covers the case
> of hugetlbfs mount with quota.

Yes, I should do that.

>
>>   +static sigjmp_buf jmpenv;
>> +
>> +static void sigbus_handler(int signo __rte_unused)
>> +{
>> +siglongjmp(jmpenv, 1);
>> +}
>> +
>> +/* Put setjmp into a wrap method to avoid compiling error. Any 
>> non-volatile,
>> + * non-static local variable in the stack frame calling setjmp might be
>> + * clobbered by a call to longjmp.
>> + */
>> +static int wrap_setjmp(void)
>> +{
>> +return setjmp(jmpenv);
>> +}
>
> Use sigsetjmp instead of setjmp and restore the signal masks.

The difference lies in whether signal mask will be saved for further 
restore. And you are right we should keep either sigsetjmp(xxx, 
1)/siglongjmp(xxx, 1) or setjmp()/longjmp. Nice catch! I'll go with the 
former.

>
>>   /*
>>* Mmap all hugepages of hugepage table: it first open a file in
>>* hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the
>> @@ -396,7 +413,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>>   if (fd < 0) {
>>   RTE_LOG(ERR, EAL, "%s(): open failed: %s\n", __func__,
>>   strerror(errno));
>> -return -1;
>> +return i;
>
> When using --try-best, we could get an error and still work as expected.
> It can be confusing for users to see an error when it is expected 
> behavior.
>
> Any thoughts?

Shall we remove those RTE_LOG complaints, because the failure here does 
not mean we cannot satisfy the requirements of applications?

...
> There is another call to map_all_hugepages that you are not updating 
> the check of the return value.

You are right, the second map_all_hugepages's return value check should 
be changed to compare with the total hugepages owned by the hpi. I'll 
fix this.

Thanks,
Jianfeng

>
> Sergio
>



[dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned

2016-05-04 Thread Jerin Jacob
On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> > Elements of struct rte_eth_dev used in the fast path.
> > Make struct rte_eth_dev cache aligned to avoid the cases where
> > rte_eth_dev elements share the same cache line with other structures.
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> > v2:
> > Remove __rte_cache_aligned from rte_eth_devices and keep
> > it only at struct rte_eth_dev definition as suggested by Bruce
> > http://dpdk.org/dev/patchwork/patch/12328/
> > ---
> >  lib/librte_ether/rte_ethdev.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > index 2757510..48f14d5 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
> > struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> > uint8_t attached; /**< Flag indicating the port is attached */
> > enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> > -};
> > +} __rte_cache_aligned;
> >  
> >  struct rte_eth_dev_sriov {
> > uint8_t active;   /**< SRIOV is active with 16, 32 or 64 
> > pools */
> > -- 
> 
> Hi Jerin,

Hi Bruce,

> 
> have you seen a performance degradation due to ethdev elements sharing a cache

No. Not because of sharing the cache line.

> line? I ask because, surprisingly for me, I actually see a performance 
> regression

I see performance degradation in PMD in my setup where independent
changes are causing the performance issue in PMD(~<100k). That's the reason
I thought making aligned cache line stuff where ever it makes sense so that
independent change shouldn't impact the PMD performance and this patch
was an initiative for the same.

> when I apply the above patch. It's not a big change - perf reduction of <1% - 
> but
> still noticable across multiple runs using testpmd. I'm using two 1x40G NICs
> using i40e driver, and I see ~100kpps less traffic per port after applying the
> patch. [CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz]

This particular patch does not have any performance degradation in my
setup.
CPU: ThunderX

> 
> Testpmd cmd and output shown below.
> 
> Regards,
> /Bruce
> 
> $ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c C -n 4 -- --rxd=512 
> --txd=512 --numa
> EAL: Detected 36 lcore(s)
> EAL: Probing VFIO support...
> PMD: nfp_net_pmd_init(): librte_pmd_nfp_net version 0.1
> 
> EAL: PCI device :01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL: PCI device :01:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL: PCI device :05:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device :05:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device :08:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device :08:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device :81:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1584 rte_i40e_pmd
> PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
> EAL: PCI device :88:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1584 rte_i40e_pmd
> PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
> Configuring Port 0 (socket 1)
> Port 0: 68:05:CA:27:D4:4E
> Configuring Port 1 (socket 1)
> Port 1: 68:05:CA:27:D2:0A
> Checking link statuses...
> Port 0 Link Up - speed 4 Mbps - full-duplex
> Port 1 Link Up - speed 4 Mbps - full-duplex
> Done
> No commandline core given, start packet forwarding
>   io packet forwarding - CRC stripping disabled - packets/burst=32
>   nb forwarding cores=1 - nb forwarding ports=2
>   RX queues=1 - RX desc=512 - RX free threshold=32
>   RX threshold registers: pthresh=8 hthresh=8 wthresh=0
>   TX queues=1 - TX desc=512 - TX free threshold=32
>   TX threshold registers: pthresh=32 hthresh=0 wthresh=0
>   TX RS bit threshold=32 - TXQ flags=0xf01
> Press enter to exit
> 
> Telling cores to stop...
> Waiting for lcores to finish...
> 
>   -- Forward statistics for port 0  --
>   RX-packets: 1940564672 RX-dropped: 1456035742RX-total: 3396600414
>   TX-packets: 1940564736 TX-dropped: 0 TX-total: 1940564736
>   
> 
>   -- Forward statistics for port 1  --
>   RX-packets: 1940564671 RX-dropped: 1456036082RX-total: 3396600753
>   TX-packets: 1940564736 TX-dropped: 0 TX-total: 1940564736
>   
> 
>   +++ Accumulated forward statistics for all ports+++
>   RX-packets: 3881129343 

[dpdk-dev] [PATCH v2 12/17] pci: add a helper for device name

2016-05-04 Thread Thomas Monjalon
2016-05-04 16:25, Bruce Richardson:
> On Wed, Apr 20, 2016 at 01:44:12PM +0200, David Marchand wrote:
> > eal is a better place than crypto / ethdev for naming resources.
> > Add a helper in eal and make use of it in crypto / ethdev.
> > 
> > Signed-off-by: David Marchand 
> > ---
> >  lib/librte_cryptodev/rte_cryptodev.c| 27 ---
> >  lib/librte_eal/common/include/rte_pci.h | 25 +
> >  lib/librte_ether/rte_ethdev.c   | 24 
> >  3 files changed, 33 insertions(+), 43 deletions(-)
> > 
> FYI: I see a compile error after applying this patch. It's unrelated directly 
> to
> this patch as root cause seems to be rte_debug.h missing an include for
> rte_branch_prediction.h

It's probably due to this recent commit:

http://dpdk.org/browse/dpdk/diff/lib/librte_eal/common/include/rte_debug.h?id=50705e


[dpdk-dev] [PATCH] eal: add missing include to debug header

2016-05-04 Thread Bruce Richardson
The header file rte_debug.h makes use of the "unlikely" macro which
means it should include the rte_branch_prediction.h header file.

Fixes: 50705e8e3cdd ("eal: add assert macro for debug")

Signed-off-by: Bruce Richardson 
---
 lib/librte_eal/common/include/rte_debug.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/include/rte_debug.h 
b/lib/librte_eal/common/include/rte_debug.h
index 9260eda..cab6fb4 100644
--- a/lib/librte_eal/common/include/rte_debug.h
+++ b/lib/librte_eal/common/include/rte_debug.h
@@ -44,6 +44,7 @@
  */

 #include "rte_log.h"
+#include "rte_branch_prediction.h"

 #ifdef __cplusplus
 extern "C" {
-- 
2.5.5



[dpdk-dev] [PATCH v2 12/17] pci: add a helper for device name

2016-05-04 Thread Bruce Richardson
On Wed, May 04, 2016 at 06:26:18PM +0200, Thomas Monjalon wrote:
> 2016-05-04 16:25, Bruce Richardson:
> > On Wed, Apr 20, 2016 at 01:44:12PM +0200, David Marchand wrote:
> > > eal is a better place than crypto / ethdev for naming resources.
> > > Add a helper in eal and make use of it in crypto / ethdev.
> > > 
> > > Signed-off-by: David Marchand 
> > > ---
> > >  lib/librte_cryptodev/rte_cryptodev.c| 27 ---
> > >  lib/librte_eal/common/include/rte_pci.h | 25 +
> > >  lib/librte_ether/rte_ethdev.c   | 24 
> > >  3 files changed, 33 insertions(+), 43 deletions(-)
> > > 
> > FYI: I see a compile error after applying this patch. It's unrelated 
> > directly to
> > this patch as root cause seems to be rte_debug.h missing an include for
> > rte_branch_prediction.h
> 
> It's probably due to this recent commit:
>   
> http://dpdk.org/browse/dpdk/diff/lib/librte_eal/common/include/rte_debug.h?id=50705e

Yes, patch is in the works here...


[dpdk-dev] [PATCH] virtio: split virtio rx/tx queue

2016-05-04 Thread Yuanhan Liu
On Wed, May 04, 2016 at 08:50:27AM +0800, Huawei Xie wrote:
> -int virtio_dev_queue_setup(struct rte_eth_dev *dev,
> - int queue_type,
> - uint16_t queue_idx,
> +static int
> +virtio_dev_cq_queue_setup(struct rte_eth_dev *dev,

While it's good to split Rx/Tx specific stuff, but why are you trying to
remove a common queue_setup function that does common setups, such as vring
memory allocation.

This results to much duplicated code: following diff summary also shows
it clearly:

7 files changed, 655 insertions(+), 422 deletions(-)

which makes it harder for maintaining.

> -}
> + rxvq = (struct virtnet_rx *)RTE_PTR_ADD(vq,
> + sizeof(*vq) + vq_size * sizeof(struct vq_desc_extra));
> + rxvq->vq = vq;
> + vq->sw_ring = sw_ring;

sw_ring is needed for rx queue only, why not moving it to rx queue struct?

>  static void
> -virtio_update_packet_stats(struct virtqueue *vq, struct rte_mbuf *mbuf)
> +virtio_update_rxq_stats(struct virtnet_rx *rxvq, struct rte_mbuf *mbuf)
>  {
>   uint32_t s = mbuf->pkt_len;
>   struct ether_addr *ea;
>  
>   if (s == 64) {
> - vq->size_bins[1]++;
> + rxvq->stats.size_bins[1]++;
>   } else if (s > 64 && s < 1024) {
>   uint32_t bin;
>  
>   /* count zeros, and offset into correct bin */
>   bin = (sizeof(s) * 8) - __builtin_clz(s) - 5;
> - vq->size_bins[bin]++;
> + rxvq->stats.size_bins[bin]++;
>   } else {
>   if (s < 64)
> - vq->size_bins[0]++;
> + rxvq->stats.size_bins[0]++;
>   else if (s < 1519)
> - vq->size_bins[6]++;
> + rxvq->stats.size_bins[6]++;
>   else if (s >= 1519)
> - vq->size_bins[7]++;
> + rxvq->stats.size_bins[7]++;
>   }
>  
>   ea = rte_pktmbuf_mtod(mbuf, struct ether_addr *);
>   if (is_multicast_ether_addr(ea)) {
>   if (is_broadcast_ether_addr(ea))
> - vq->broadcast++;
> + rxvq->stats.broadcast++;
>   else
> - vq->multicast++;
> + rxvq->stats.multicast++;
> + }
> +}
> +
> +static void
> +virtio_update_txq_stats(struct virtnet_tx *txvq, struct rte_mbuf *mbuf)

Why not taking "struct virtnet_stats *stats" as the arg, so that we
don't have to implment two exactly same functions.


> diff --git a/drivers/net/virtio/virtio_rxtx.h 
> b/drivers/net/virtio/virtio_rxtx.h
> index a76c3e5..ced55a3 100644
> --- a/drivers/net/virtio/virtio_rxtx.h
> +++ b/drivers/net/virtio/virtio_rxtx.h
> @@ -34,7 +34,59 @@
>  #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
>  
>  #ifdef RTE_MACHINE_CPUFLAG_SSSE3
> -int virtio_rxq_vec_setup(struct virtqueue *rxq);
> +
> +struct virtnet_stats {

Another remind again: you should put following codes before the
"#ifdef".

--yliu


[dpdk-dev] virtio still blindly take over virtio device managed by kernel

2016-05-04 Thread Xie, Huawei


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vincent Li
> Sent: Thursday, May 05, 2016 12:21 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] virtio still blindly take over virtio device managed
> by kernel
> 
> Hi,
> 
> I am running the dpdk git repo which already had commit ac5e1d838dc
> (virtio: skip error when probing kernel managed device), but in my
> test, it seems still taking control of the kernel managed virtio
> device and segmentation fault, here is the example:
> 
> # ./tools/dpdk_nic_bind.py --status
> 
> Network devices using DPDK-compatible driver
> 
> :00:07.0 'Virtio network device' drv=igb_uio unused=
> :00:08.0 'Virtio network device' drv=igb_uio unused=
> 
> Network devices using kernel driver
> ===
> :00:03.0 'Virtio network device' if= drv=virtio-pci unused=igb_uio
> 
>  #./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4  -- -i
> EAL: Detected 4 lcore(s)
> EAL: Probing VFIO support...
> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
> unreliable clock cycles !
> EAL: PCI device :00:03.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1000 rte_virtio_pmd
> PMD: vtpci_init(): trying with legacy virtio pci.
> Segmentation fault (core dumped)
> 
> if I blacklist :00:03.0 from testpmd, testpmd works:
> 
> # ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4  -b :00:03.0 -
> - -i
> EAL: Detected 4 lcore(s)
> EAL: Probing VFIO support...
> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
> unreliable clock cycles !
> EAL: PCI device :00:03.0 on NUMA socket -1
> EAL: PCI device :00:07.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1000 rte_virtio_pmd
> PMD: virtio_read_caps(): no modern virtio pci device found.
> PMD: vtpci_init(): trying with legacy virtio pci.
> EAL: PCI device :00:08.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1000 rte_virtio_pmd
> PMD: virtio_read_caps(): no modern virtio pci device found.
> PMD: vtpci_init(): trying with legacy virtio pci.
> Interactive-mode selected
> Configuring Port 0 (socket 0)
> rte_eth_dev_config_restore: port 0: MAC address array not supported
> Port 0: 52:54:00:EA:6E:3E
> Configuring Port 1 (socket 0)
> rte_eth_dev_config_restore: port 1: MAC address array not supported
> Port 1: 52:54:00:24:06:DB
> Checking link statuses...
> Port 0 Link Up - speed 1 Mbps - full-duplex
> Port 1 Link Up - speed 1 Mbps - full-duplex
> Done
> testpmd>

Hi Vincent:
Thanks for the info. Will check tomorrow to see what breaks this.


[dpdk-dev] [PATCH v2 12/17] pci: add a helper for device name

2016-05-04 Thread Bruce Richardson
On Wed, Apr 20, 2016 at 01:44:12PM +0200, David Marchand wrote:
> eal is a better place than crypto / ethdev for naming resources.
> Add a helper in eal and make use of it in crypto / ethdev.
> 
> Signed-off-by: David Marchand 
> ---
>  lib/librte_cryptodev/rte_cryptodev.c| 27 ---
>  lib/librte_eal/common/include/rte_pci.h | 25 +
>  lib/librte_ether/rte_ethdev.c   | 24 
>  3 files changed, 33 insertions(+), 43 deletions(-)
> 
FYI: I see a compile error after applying this patch. It's unrelated directly to
this patch as root cause seems to be rte_debug.h missing an include for
rte_branch_prediction.h

/Bruce


[dpdk-dev] [PATCH v2] virtio: fix modify drv_flags for specific device

2016-05-04 Thread Yuanhan Liu
On Tue, May 03, 2016 at 10:05:01AM +0200, David Marchand wrote:
> Hello Tan,
> 
> On Thu, Apr 28, 2016 at 8:08 PM, Jianfeng Tan  
> wrote:
> > Issue: virtio's drv_flags are decided by devices types (modern vs legacy),
> > and which kernel driver is used, and the negotiated features (especially
> > VIRTIO_NET_STATUS) with backend, which makes it possible to multiple
> > virtio devices have different versions of drv_flags, but this variable
> > is currently shared by each virtio device.
> 
> The wording is a bit strange, maybe the sentence is a bit too long.

Agreed.

Besides that, it just described the fact that we are sharing one
flag for all virtio devices, but it didn't state what's wrong with
it, and what's the per-device flag for. From this point of view,
I don't think you are actually solving an "issue", as I don't see
one from your description.

> But the rest looks good to me.
> 
> Acked-by: David Marchand 

Thanks for the review.

--yliu


[dpdk-dev] [PATCH] app/test: fix log check when default level is high

2016-05-04 Thread Thomas Monjalon
The log unit test was checking display of low priority messages.
It was not working if RTE_LOG_LEVEL is not RTE_LOG_DEBUG.
It is even easier to see since the default level is INFO (9b9d7ca).

Now the test use ERR and CRIT levels which should be always enabled
while not trigerring syslog output on the console.

Signed-off-by: Thomas Monjalon 
---
 app/test/autotest_test_funcs.py | 20 
 app/test/test_logs.c| 20 +---
 2 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/app/test/autotest_test_funcs.py b/app/test/autotest_test_funcs.py
index 0f012f6..f04909d 100644
--- a/app/test/autotest_test_funcs.py
+++ b/app/test/autotest_test_funcs.py
@@ -145,18 +145,14 @@ def logs_autotest(child, test_name):
child.sendline(test_name)

log_list = [
-   "TESTAPP1: this is a debug level message",
-   "TESTAPP1: this is a info level message",
-   "TESTAPP1: this is a warning level message",
-   "TESTAPP2: this is a info level message",
-   "TESTAPP2: this is a warning level message",
-   "TESTAPP1: this is a debug level message",
-   "TESTAPP1: this is a debug level message",
-   "TESTAPP1: this is a info level message",
-   "TESTAPP1: this is a warning level message",
-   "TESTAPP2: this is a info level message",
-   "TESTAPP2: this is a warning level message",
-   "TESTAPP1: this is a debug level message",
+   "TESTAPP1: error message",
+   "TESTAPP1: critical message",
+   "TESTAPP2: critical message",
+   "TESTAPP1: error message",
+   "TESTAPP1: error message",
+   "TESTAPP1: critical message",
+   "TESTAPP2: critical message",
+   "TESTAPP1: error message",
]

for log_msg in log_list:
diff --git a/app/test/test_logs.c b/app/test/test_logs.c
index 18a3b6a..861cdff 100644
--- a/app/test/test_logs.c
+++ b/app/test/test_logs.c
@@ -66,24 +66,22 @@ test_logs(void)
rte_set_log_type(RTE_LOGTYPE_TESTAPP2, 1);

/* log in debug level */
-   rte_set_log_level(RTE_LOG_DEBUG);
-   RTE_LOG(DEBUG, TESTAPP1, "this is a debug level message\n");
-   RTE_LOG(INFO, TESTAPP1, "this is a info level message\n");
-   RTE_LOG(WARNING, TESTAPP1, "this is a warning level message\n");
+   rte_set_log_level(RTE_LOG_ERR);
+   RTE_LOG(ERR, TESTAPP1, "error message\n");
+   RTE_LOG(CRIT, TESTAPP1, "critical message\n");

/* log in info level */
-   rte_set_log_level(RTE_LOG_INFO);
-   RTE_LOG(DEBUG, TESTAPP2, "debug level message (not displayed)\n");
-   RTE_LOG(INFO, TESTAPP2, "this is a info level message\n");
-   RTE_LOG(WARNING, TESTAPP2, "this is a warning level message\n");
+   rte_set_log_level(RTE_LOG_CRIT);
+   RTE_LOG(ERR, TESTAPP2, "error message (not displayed)\n");
+   RTE_LOG(CRIT, TESTAPP2, "critical message\n");

/* disable one log type */
rte_set_log_type(RTE_LOGTYPE_TESTAPP2, 0);

/* log in debug level */
-   rte_set_log_level(RTE_LOG_DEBUG);
-   RTE_LOG(DEBUG, TESTAPP1, "this is a debug level message\n");
-   RTE_LOG(DEBUG, TESTAPP2, "debug level message (not displayed)\n");
+   rte_set_log_level(RTE_LOG_ERR);
+   RTE_LOG(ERR, TESTAPP1, "error message\n");
+   RTE_LOG(ERR, TESTAPP2, "error message (not displayed)\n");

rte_log_dump_history(stdout);

-- 
2.7.0



[dpdk-dev] Old oversubscription related checkin

2016-05-04 Thread Sridhar.V.Iyer
Hi Cristian,

 I stumbled into an old from 2013 
(http://dpdk.org/browse/dpdk/patch/lib/librte_sched/rte_sched.c?id=835c5409a7bac3055b82bebee65d8ada7f20d332
 
)

I couldn?t find any context for the patch from the mail archives. Could you 
please let me know why oversubscription was removed from all the other traffic 
classes and just given to tc3?
Were there huge performance penalties? Would there be any issues if I add this 
patch back again locally (enable/disabled via config).

Regards,

Sridhar V Iyer









  


[dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned

2016-05-04 Thread Richardson, Bruce


> -Original Message-
> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Wednesday, May 4, 2016 2:43 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> aligned
> 
> On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> > On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> > > Elements of struct rte_eth_dev used in the fast path.
> > > Make struct rte_eth_dev cache aligned to avoid the cases where
> > > rte_eth_dev elements share the same cache line with other structures.
> > >
> > > Signed-off-by: Jerin Jacob 
> > > ---
> > > v2:
> > > Remove __rte_cache_aligned from rte_eth_devices and keep it only at
> > > struct rte_eth_dev definition as suggested by Bruce
> > > http://dpdk.org/dev/patchwork/patch/12328/
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 2757510..48f14d5 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
> > >   struct rte_eth_rxtx_callback
> *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> > >   uint8_t attached; /**< Flag indicating the port is attached */
> > >   enum rte_eth_dev_type dev_type; /**< Flag indicating the device
> > > type */ -};
> > > +} __rte_cache_aligned;
> > >
> > >  struct rte_eth_dev_sriov {
> > >   uint8_t active;   /**< SRIOV is active with 16, 32 or 64
> pools */
> > > --
> >
> > Hi Jerin,
> 
> Hi Bruce,
> 
> >
> > have you seen a performance degradation due to ethdev elements sharing
> > a cache
> 
> No. Not because of sharing the cache line.
> 
> > line? I ask because, surprisingly for me, I actually see a performance
> > regression
> 
> I see performance degradation in PMD in my setup where independent changes
> are causing the performance issue in PMD(~<100k). That's the reason I
> thought making aligned cache line stuff where ever it makes sense so that
> independent change shouldn't impact the PMD performance and this patch was
> an initiative for the same.
> 
> > when I apply the above patch. It's not a big change - perf reduction
> > of <1% - but still noticable across multiple runs using testpmd. I'm
> > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > E5-2699 v3 @ 2.30GHz]
> 
> This particular patch does not have any performance degradation in my
> setup.
> CPU: ThunderX

Ok, so I take it that this patch is performance neutral on your setup, then?
If that's the case, can we hold off on merging it on the basis that it's not 
needed and does cause a slight regression.

Thanks,
/Bruce


[dpdk-dev] [PATCH] i40e: fix vlan stripping from inner header

2016-05-04 Thread Jingjing Wu
Previously, for tunnel packets, such as VXLAN/NVGRE, the vlan
tags of the inner header will be stripped without putting vlan
info to descriptor, what is not expected behaviour.
This patch fixes it by changing hardware configuration to leave
the inner packet alone.

Fixes: 4861cde46116 ("i40e: new poll mode driver")
Fixes: a778a1fa2e4e ("i40e: set up and initialize flow director")
Signed-off-by: Jingjing Wu 
---
 doc/guides/rel_notes/release_16_07.rst | 7 +++
 drivers/net/i40e/i40e_fdir.c   | 2 +-
 drivers/net/i40e/i40e_rxtx.c   | 7 ++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst 
b/doc/guides/rel_notes/release_16_07.rst
index 83c841b..f6d543c 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -54,6 +54,13 @@ EAL
 Drivers
 ~~~

+* **i40e: Fixed vlan stripping from inner header.**
+
+  Previously, for tunnel packets, such as VXLAN/NVGRE, the vlan
+  tags of the inner header will be stripped without putting vlan
+  info to descriptor.
+  Now this issue is fixed by disabling vlan stripping from inner header.
+

 Libraries
 ~
diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index ff57e8a..cd4c324 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -163,7 +163,7 @@ i40e_fdir_rx_queue_init(struct i40e_rx_queue *rxq)
rx_ctx.lrxqthresh = 2;
rx_ctx.crcstrip = 0;
rx_ctx.l2tsel = 1;
-   rx_ctx.showiv = 1;
+   rx_ctx.showiv = 0;
rx_ctx.prefena = 1;

err = i40e_clear_lan_rx_queue_context(hw, rxq->reg_idx);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 4d35d83..a69fde1 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2904,7 +2904,12 @@ i40e_rx_queue_init(struct i40e_rx_queue *rxq)
rx_ctx.lrxqthresh = 2;
rx_ctx.crcstrip = (rxq->crc_len == 0) ? 1 : 0;
rx_ctx.l2tsel = 1;
-   rx_ctx.showiv = 1;
+   /* showiv indicates if inner VLAN is stripped inside of tunnel
+* packet. When set it to 1, vlan information is stripped from
+* the inner header, but the hardware does not put it in the
+* descriptor. So set it zero by default.
+*/
+   rx_ctx.showiv = 0;
rx_ctx.prefena = 1;

err = i40e_clear_lan_rx_queue_context(hw, pf_q);
-- 
2.4.0



[dpdk-dev] [PATCH v2] eal: make hugetlb initialization more robust

2016-05-04 Thread Sergio Gonzalez Monroy
On 08/03/2016 01:42, Jianfeng Tan wrote:
> This patch adds an option, --huge-trybest, to use a recover mechanism to
> the case that there are not so many hugepages (declared in sysfs), which
> can be used. It relys on a mem access to fault-in hugepages, and if fails
> with SIGBUS, recover to previously saved stack environment with
> siglongjmp().
>
> Test example:
>a. cgcreate -g hugetlb:/test-subgroup
>b. cgset -r hugetlb.1GB.limit_in_bytes=2147483648 test-subgroup
>c. cgexec -g hugetlb:test-subgroup \
> ./examples/helloworld/build/helloworld -c 0x2 -n 4 --huge-trybest

I think you should mention in the commit message that this option also 
covers the case
of hugetlbfs mount with quota.

>
> +static sigjmp_buf jmpenv;
> +
> +static void sigbus_handler(int signo __rte_unused)
> +{
> + siglongjmp(jmpenv, 1);
> +}
> +
> +/* Put setjmp into a wrap method to avoid compiling error. Any non-volatile,
> + * non-static local variable in the stack frame calling setjmp might be
> + * clobbered by a call to longjmp.
> + */
> +static int wrap_setjmp(void)
> +{
> + return setjmp(jmpenv);
> +}

Use sigsetjmp instead of setjmp and restore the signal masks.

>   /*
>* Mmap all hugepages of hugepage table: it first open a file in
>* hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the
> @@ -396,7 +413,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>   if (fd < 0) {
>   RTE_LOG(ERR, EAL, "%s(): open failed: %s\n", __func__,
>   strerror(errno));
> - return -1;
> + return i;

When using --try-best, we could get an error and still work as expected.
It can be confusing for users to see an error when it is expected behavior.

Any thoughts?

>   }
>
>   /* map the segment, and populate page tables,
> @@ -407,7 +424,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>   RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__,
>   strerror(errno));
>   close(fd);
> - return -1;
> + return i;
>   }
>

Same comment as above

>   /* set shared flock on the file. */
>   if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
>   RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
>   __func__, strerror(errno));
>   close(fd);
> - return -1;
> + return i;

Same comment as above

> @@ -1137,10 +1206,24 @@ rte_eal_hugepage_init(void)
>   continue;
>
>   /* map all hugepages available */
> - if (map_all_hugepages(_hp[hp_offset], hpi, 1) < 0){
> - RTE_LOG(DEBUG, EAL, "Failed to mmap %u MB hugepages\n",
> - (unsigned)(hpi->hugepage_sz / 
> 0x10));
> - goto fail;
> + pages_old = hpi->num_pages[0];
> + pages_new = map_all_hugepages(_hp[hp_offset], hpi, 1);
> + if (pages_new < pages_old) {
> + RTE_LOG(DEBUG, EAL,
> + "%d not %d hugepages of size %u MB allocated\n",
> + pages_new, pages_old,
> + (unsigned)(hpi->hugepage_sz / 0x10));
> + if (internal_config.huge_trybest) {
> + int pages = pages_old - pages_new;
> +
> + internal_config.memory -=
> + hpi->hugepage_sz * pages;
> + nr_hugepages -= pages;
> + hpi->num_pages[0] = pages_new;
> + if (pages_new == 0)
> + continue;
> + } else
> + goto fail;
>   }

There is another call to map_all_hugepages that you are not updating the 
check of the return value.

Sergio



[dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned

2016-05-04 Thread Bruce Richardson
On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> Elements of struct rte_eth_dev used in the fast path.
> Make struct rte_eth_dev cache aligned to avoid the cases where
> rte_eth_dev elements share the same cache line with other structures.
> 
> Signed-off-by: Jerin Jacob 
> ---
> v2:
> Remove __rte_cache_aligned from rte_eth_devices and keep
> it only at struct rte_eth_dev definition as suggested by Bruce
> http://dpdk.org/dev/patchwork/patch/12328/
> ---
>  lib/librte_ether/rte_ethdev.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 2757510..48f14d5 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
>   struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>   uint8_t attached; /**< Flag indicating the port is attached */
>   enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> -};
> +} __rte_cache_aligned;
>  
>  struct rte_eth_dev_sriov {
>   uint8_t active;   /**< SRIOV is active with 16, 32 or 64 
> pools */
> -- 

Hi Jerin,

have you seen a performance degradation due to ethdev elements sharing a cache
line? I ask because, surprisingly for me, I actually see a performance 
regression
when I apply the above patch. It's not a big change - perf reduction of <1% - 
but
still noticable across multiple runs using testpmd. I'm using two 1x40G NICs
using i40e driver, and I see ~100kpps less traffic per port after applying the
patch. [CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz]

Testpmd cmd and output shown below.

Regards,
/Bruce

$ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c C -n 4 -- --rxd=512 
--txd=512 --numa
EAL: Detected 36 lcore(s)
EAL: Probing VFIO support...
PMD: nfp_net_pmd_init(): librte_pmd_nfp_net version 0.1

EAL: PCI device :01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL: PCI device :01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL: PCI device :05:00.0 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device :05:00.1 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device :08:00.0 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device :08:00.1 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device :81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1584 rte_i40e_pmd
PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
EAL: PCI device :88:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1584 rte_i40e_pmd
PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
Configuring Port 0 (socket 1)
Port 0: 68:05:CA:27:D4:4E
Configuring Port 1 (socket 1)
Port 1: 68:05:CA:27:D2:0A
Checking link statuses...
Port 0 Link Up - speed 4 Mbps - full-duplex
Port 1 Link Up - speed 4 Mbps - full-duplex
Done
No commandline core given, start packet forwarding
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=512 - RX free threshold=32
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=32
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=32 - TXQ flags=0xf01
Press enter to exit

Telling cores to stop...
Waiting for lcores to finish...

  -- Forward statistics for port 0  --
  RX-packets: 1940564672 RX-dropped: 1456035742RX-total: 3396600414
  TX-packets: 1940564736 TX-dropped: 0 TX-total: 1940564736
  

  -- Forward statistics for port 1  --
  RX-packets: 1940564671 RX-dropped: 1456036082RX-total: 3396600753
  TX-packets: 1940564736 TX-dropped: 0 TX-total: 1940564736
  

  +++ Accumulated forward statistics for all ports+++
  RX-packets: 3881129343 RX-dropped: 2912071824RX-total: 6793201167
  TX-packets: 3881129472 TX-dropped: 0 TX-total: 3881129472
  

Done.

Shutting down port 0...
Stopping ports...
Done
Closing ports...
Done

Shutting down port 1...
Stopping ports...
Done
Closing ports...
Done

Bye...




[dpdk-dev] [PATCH v2] eal: make hugetlb initialization more robust

2016-05-04 Thread Sergio Gonzalez Monroy
On 08/03/2016 01:42, Jianfeng Tan wrote:
> This patch adds an option, --huge-trybest, to use a recover mechanism to
> the case that there are not so many hugepages (declared in sysfs), which
> can be used. It relys on a mem access to fault-in hugepages, and if fails
> with SIGBUS, recover to previously saved stack environment with
> siglongjmp().
>
> Test example:
>a. cgcreate -g hugetlb:/test-subgroup
>b. cgset -r hugetlb.1GB.limit_in_bytes=2147483648 test-subgroup
>c. cgexec -g hugetlb:test-subgroup \
> ./examples/helloworld/build/helloworld -c 0x2 -n 4 --huge-trybest

I think you should mention in the commit message that this option also 
covers the case
of hugetlbfs mount with quota.

>   
> +static sigjmp_buf jmpenv;
> +
> +static void sigbus_handler(int signo __rte_unused)
> +{
> + siglongjmp(jmpenv, 1);
> +}
> +
> +/* Put setjmp into a wrap method to avoid compiling error. Any non-volatile,
> + * non-static local variable in the stack frame calling setjmp might be
> + * clobbered by a call to longjmp.
> + */
> +static int wrap_setjmp(void)
> +{
> + return setjmp(jmpenv);
> +}

Use sigsetjmp instead of setjmp and restore the signal masks.

>   /*
>* Mmap all hugepages of hugepage table: it first open a file in
>* hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the
> @@ -396,7 +413,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>   if (fd < 0) {
>   RTE_LOG(ERR, EAL, "%s(): open failed: %s\n", __func__,
>   strerror(errno));
> - return -1;
> + return i;

When using --try-best, we could get an error and still work as expected.
It can be confusing for users to see an error when it is expected behavior.

Any thoughts?

>   }
>   
>   /* map the segment, and populate page tables,
> @@ -407,7 +424,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>   RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__,
>   strerror(errno));
>   close(fd);
> - return -1;
> + return i;
>   }
>   

Same comment as above

>   /* set shared flock on the file. */
>   if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
>   RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
>   __func__, strerror(errno));
>   close(fd);
> - return -1;
> + return i;

Same comment as above

> @@ -1137,10 +1206,24 @@ rte_eal_hugepage_init(void)
>   continue;
>   
>   /* map all hugepages available */
> - if (map_all_hugepages(_hp[hp_offset], hpi, 1) < 0){
> - RTE_LOG(DEBUG, EAL, "Failed to mmap %u MB hugepages\n",
> - (unsigned)(hpi->hugepage_sz / 
> 0x10));
> - goto fail;
> + pages_old = hpi->num_pages[0];
> + pages_new = map_all_hugepages(_hp[hp_offset], hpi, 1);
> + if (pages_new < pages_old) {
> + RTE_LOG(DEBUG, EAL,
> + "%d not %d hugepages of size %u MB allocated\n",
> + pages_new, pages_old,
> + (unsigned)(hpi->hugepage_sz / 0x10));
> + if (internal_config.huge_trybest) {
> + int pages = pages_old - pages_new;
> +
> + internal_config.memory -=
> + hpi->hugepage_sz * pages;
> + nr_hugepages -= pages;
> + hpi->num_pages[0] = pages_new;
> + if (pages_new == 0)
> + continue;
> + } else
> + goto fail;
>   }

There is another call to map_all_hugepages that you are not updating the 
check of the return value.

Sergio



[dpdk-dev] [PATCH V2] i40evf: add ops for rx queue and tx queue

2016-05-04 Thread Bruce Richardson
On Thu, Apr 28, 2016 at 04:57:51AM +, Wu, Jingjing wrote:
> 
> 
> > -Original Message-
> > From: Xing, Beilei
> > Sent: Thursday, April 28, 2016 11:18 AM
> > To: Wu, Jingjing
> > Cc: dev at dpdk.org; Xing, Beilei
> > Subject: [PATCH V2] i40evf: add ops for rx queue and tx queue
> > 
> > Add 3 vf ops: rx_queue_count, rxq_info_get and txq_info_get. They can
> > reuse corresponding pf APIs.
> > 
> > Signed-off-by: Beilei Xing 
> 
> Acked-by: Jingjing Wu 
> 
Applied to dpdk-next-net/rel_16_07

/Bruce


[dpdk-dev] [PATCH] enic: fix misalignment of Rx mbuf data

2016-05-04 Thread Bruce Richardson
On Tue, Apr 26, 2016 at 07:51:56PM -0700, John Daley wrote:
> Data DMA used m->data_off of uninitialized mbufs instead of
> RTE_PKTMBUF_HEADROOM, potentially causing Rx data to be
> placed at the wrong alignment in the mbuf.
> 
> Fixes: 947d860c821f ("enic: improve Rx performance")
> Signed-off-by: John Daley 
> ---
>  drivers/net/enic/enic_main.c | 5 +++--
>  drivers/net/enic/enic_rx.c   | 6 --
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
Applied to dpdk-next-net/rel_16_07

/Bruce


[dpdk-dev] [PATCH v3 02/13] ixgbe: move pci device ids to driver

2016-05-04 Thread David Marchand
On Fri, Apr 29, 2016 at 3:34 AM, Wu, Jingjing  wrote:
> Hi, David
>
> For the changes on igb, ixgbe, I saw you create a new header file called 
> **__pci_dev_ids.h to replace the rte_pci_dev_ids.h for each driver.
> But for the changes on i40e, you didn't do that way.
> If you look into the base code, you will find for each Intel NIC, the device 
> ids are defined there, such as ixgbe_type.h; i40e_devid.h; E1000_hw.h.
>
> I'd prefer the way you did in i40e driver. It's clearer and with minor change.

This is what I wanted to do, but this requires a rework in kni and the pci test.

-- 
David Marchand


[dpdk-dev] [RFC PATCH 0/4]: Implement module information export

2016-05-04 Thread David Marchand
On Tue, May 3, 2016 at 1:57 PM, Neil Horman  wrote:
> On Tue, Apr 26, 2016 at 01:39:47PM -0400, Neil Horman wrote:
>> Hey-
>>   So a few days ago we were reviewing Davids patch series to introduce 
>> the
>> abiilty to dump hardware support from pmd DSO's in a human readable format.
>> That effort encountered some problems, most notably the fact that stripping a
>> DSO removed the required information that the proposed tool keyed off, as 
>> well
>> as the need to dead reckon offsets between symbols that may not be constant
>> (dependent on architecture).
>>
>>   I was going to start looking into the possibility of creating a modinfo
>> section in a simmilar fashion to what kernel modules do in linux or BSD.  I
>> decided to propose this solution instead though, because the kernel style
>> solution requires a significant amount of infrastructure that I think we can
>> maybe avoid maintaining, if we accept some minor caviats
>>
>> To do this We emit a set of well known marker symbols for each DSO that an
>> external application can search for (in this case I called them
>> this_pmd_driver, where n is a counter macro).  These marker symbols are
>> n is a counter macro).  These marker symbols are exported by PMDs for
>> external access.  External tools can then access these symbols via the
>> dlopen/dlsym api (or via elfutils libraries)
>>
>> The symbols above alias the rte_driver struct for each PMD, and the external
>> application can then interrogate the registered driver information.
>>
>> I also add a pointer to the pci id table struct for each PMD so that we can
>> export hardware support.
>>
>> This approach has a few pros and cons:
>>
>> pros:
>> 1) Its simple, and doesn't require extra infrastructure to implement.  E.g. 
>> we
>> don't need a new tool to extract driver information and emit the C code to 
>> build
>> the binary data for the special section, nor do we need a custom linker 
>> script
>> to link said special section in place
>>
>> 2) Its stable.  Because the marker symbols are explicitly exported, this
>> approach is resilient against stripping.
>>
>> cons:
>> 1) It creates an artifact in that PMD_REGISTER_DRIVER has to be used in one
>> compilation unit per DSO.  As an example em and igb effectively merge two
>> drivers into one DSO, and the uses of PMD_REGISTER_DRIVER occur in two 
>> separate
>> C files for the same single linked DSO.  Because of the use of the 
>> __COUNTER__
>> macro we get multiple definitions of the same marker symbols.
>>
>> I would make the argument that the downside of the above artifact isn't that 
>> big
>> a deal.  Traditionally in other projects a unit like a module (or DSO in our
>> case) only ever codifies a single driver (e.g. module_init() in the linux 
>> kernel
>> is only ever used once per built module).  If we have code like igb/em that
>> shares some core code, we should build the shared code to object files and 
>> link
>> them twice, once to an em.so pmd and again to an igb.so pmd.
>>
>> But regardless, I thought I would propose this to see what you all thought of
>> it.
>>
>> FWIW, heres sample output of the pmdinfo tool from this series probing the
>> librte_pmd_ena.so module:
>>
>> [nhorman at hmsreliant dpdk]$ ./build/app/pmdinfo
>> ~/git/dpdk/build/lib/librte_pmd_ena.so
>> PMD 0 Information:
>> Driver Name: ena_driver
>> Driver Type: PCI
>> |PCI Table|
>> | VENDOR ID | DEVICE ID | SUBVENDOR ID | SUBDEVICE ID |
>> |-|
>> |   1d0f|   ec20|  |  |
>> |   1d0f|   ec21|  |  |
>> |-|
>>
>>
>>
>>
> Ping, thoughts here?

- This implementation does not support binaries, so it is not suitable
for people who don't want dso, this is partially why I used bfd rather
than just dlopen.
- The name of the tool "pmdinfo" is likely to cause problems, I would
say we need to prefix t with dpdk.
- How does it behave if we strip the dsos ?
- Using __COUNTER__ seeems a bit tricky to me, can't this cause misalignments ?
- The tool output format is not script friendly from my pov.


-- 
David Marchand


[dpdk-dev] [PATCH] app/test: fix +/-1 error in allocation

2016-05-04 Thread David Marchand
On Tue, May 3, 2016 at 9:15 PM, Jan Viktorin  wrote:
> A bug has been detected by valgrind:
>
> ==14406== Invalid write of size 1
> ==14406==by 0x86ECC76: sprintf (in /usr/lib/libc-2.23.so)
> ==14406==by 0x430B0A: commands_init (in 
> /home/jviki/Projects/dpdk/dpdk-soc/build/app/test)
> ==14406==by 0x42F215: main (in 
> /home/jviki/Projects/dpdk/dpdk-soc/build/app/test)
> ==14406==  Address 0x9d72ff2 is 0 bytes after a block of size 1,346 alloc'd
> ==14406==at 0x78C1BD0: malloc (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==14406==by 0x430AE4: commands_init (in 
> /home/jviki/Projects/dpdk/dpdk-soc/build/app/test)
> ==14406==by 0x42F215: main (in 
> /home/jviki/Projects/dpdk/dpdk-soc/build/app/test)
> ==14406==
>
> The commands buffer is exactly 1346 B long so there is an access just after
> the buffer. The sprintf always writes '\0' at the end of the string. The '#'
> separator adds 1 B more to each string. This is correct until the last string
> is sprinted there. The last one is 1 B longer then expected, i.e.:
>
>  strlen(t->command) + strlen("#") + ONE_FOR_ZERO
>
> Fixes: 727909c59231 ("app/test: introduce dynamic commands list")
>
> Signed-off-by: Jan Viktorin 

Good catch.
Acked-by: David Marchand 

-- 
David Marchand


[dpdk-dev] virtio still blindly take over virtio device managed by kernel

2016-05-04 Thread Vincent Li
Hi,

I am running the dpdk git repo which already had commit ac5e1d838dc
(virtio: skip error when probing kernel managed device), but in my
test, it seems still taking control of the kernel managed virtio
device and segmentation fault, here is the example:

# ./tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver

:00:07.0 'Virtio network device' drv=igb_uio unused=
:00:08.0 'Virtio network device' drv=igb_uio unused=

Network devices using kernel driver
===
:00:03.0 'Virtio network device' if= drv=virtio-pci unused=igb_uio

 #./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4  -- -i
EAL: Detected 4 lcore(s)
EAL: Probing VFIO support...
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
unreliable clock cycles !
EAL: PCI device :00:03.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
PMD: vtpci_init(): trying with legacy virtio pci.
Segmentation fault (core dumped)

if I blacklist :00:03.0 from testpmd, testpmd works:

# ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4  -b :00:03.0 -- -i
EAL: Detected 4 lcore(s)
EAL: Probing VFIO support...
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
unreliable clock cycles !
EAL: PCI device :00:03.0 on NUMA socket -1
EAL: PCI device :00:07.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
PMD: virtio_read_caps(): no modern virtio pci device found.
PMD: vtpci_init(): trying with legacy virtio pci.
EAL: PCI device :00:08.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
PMD: virtio_read_caps(): no modern virtio pci device found.
PMD: vtpci_init(): trying with legacy virtio pci.
Interactive-mode selected
Configuring Port 0 (socket 0)
rte_eth_dev_config_restore: port 0: MAC address array not supported
Port 0: 52:54:00:EA:6E:3E
Configuring Port 1 (socket 0)
rte_eth_dev_config_restore: port 1: MAC address array not supported
Port 1: 52:54:00:24:06:DB
Checking link statuses...
Port 0 Link Up - speed 1 Mbps - full-duplex
Port 1 Link Up - speed 1 Mbps - full-duplex
Done
testpmd>


[dpdk-dev] [PATCH v2] examples: remove useless checking

2016-05-04 Thread Ferruh Yigit
On 5/3/2016 10:16 PM, Mauricio Vasquez B wrote:
> The rte_eth_dev_count() function will never return a value greater
> than RTE_MAX_ETHPORTS, so that checking is useless.
> 
> Signed-off-by: Mauricio Vasquez B  studenti.polito.it>
> ---
> v2:
>  Add missed case in examples/kni/main.c

Acked-by: Ferruh Yigit 



[dpdk-dev] [PATCH] virtio: split virtio rx/tx queue

2016-05-04 Thread Huawei Xie
Currently virtio RX/TX paths use common vq structure.
The initial idea is to split virtio RX and TX queues completely as they
have different memory requirement and we could arrange data friendly for
optimization for different paths in future.

With this patch, we keep a common vq structure, as we have too
many common vq operations. Split fields into virtnet_rx
and virtnet_tx respectively.

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.c  | 333 +---
 drivers/net/virtio/virtio_pci.c |   4 +-
 drivers/net/virtio/virtio_pci.h |   3 +-
 drivers/net/virtio/virtio_rxtx.c| 531 +++-
 drivers/net/virtio/virtio_rxtx.h|  54 +++-
 drivers/net/virtio/virtio_rxtx_simple.c |  85 ++---
 drivers/net/virtio/virtqueue.h  |  67 ++--
 7 files changed, 655 insertions(+), 422 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 63a368a..4d4e59e 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -115,40 +115,62 @@ struct rte_virtio_xstats_name_off {
 };

 /* [rt]x_qX_ is prepended to the name string here */
-static const struct rte_virtio_xstats_name_off rte_virtio_q_stat_strings[] = {
-   {"good_packets",   offsetof(struct virtqueue, packets)},
-   {"good_bytes", offsetof(struct virtqueue, bytes)},
-   {"errors", offsetof(struct virtqueue, errors)},
-   {"multicast_packets",  offsetof(struct virtqueue, multicast)},
-   {"broadcast_packets",  offsetof(struct virtqueue, broadcast)},
-   {"undersize_packets",  offsetof(struct virtqueue, size_bins[0])},
-   {"size_64_packets",offsetof(struct virtqueue, size_bins[1])},
-   {"size_65_127_packets",offsetof(struct virtqueue, size_bins[2])},
-   {"size_128_255_packets",   offsetof(struct virtqueue, size_bins[3])},
-   {"size_256_511_packets",   offsetof(struct virtqueue, size_bins[4])},
-   {"size_512_1023_packets",  offsetof(struct virtqueue, size_bins[5])},
-   {"size_1024_1517_packets", offsetof(struct virtqueue, size_bins[6])},
-   {"size_1518_max_packets",  offsetof(struct virtqueue, size_bins[7])},
+static const struct rte_virtio_xstats_name_off rte_virtio_rxq_stat_strings[] = 
{
+   {"good_packets",   offsetof(struct virtnet_rx, stats.packets)},
+   {"good_bytes", offsetof(struct virtnet_rx, stats.bytes)},
+   {"errors", offsetof(struct virtnet_rx, stats.errors)},
+   {"multicast_packets",  offsetof(struct virtnet_rx, 
stats.multicast)},
+   {"broadcast_packets",  offsetof(struct virtnet_rx, 
stats.broadcast)},
+   {"undersize_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[0])},
+   {"size_64_packets",offsetof(struct virtnet_rx, 
stats.size_bins[1])},
+   {"size_65_127_packets",offsetof(struct virtnet_rx, 
stats.size_bins[2])},
+   {"size_128_255_packets",   offsetof(struct virtnet_rx, 
stats.size_bins[3])},
+   {"size_256_511_packets",   offsetof(struct virtnet_rx, 
stats.size_bins[4])},
+   {"size_512_1023_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[5])},
+   {"size_1024_1517_packets", offsetof(struct virtnet_rx, 
stats.size_bins[6])},
+   {"size_1518_max_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[7])},
 };

-#define VIRTIO_NB_Q_XSTATS (sizeof(rte_virtio_q_stat_strings) / \
-   sizeof(rte_virtio_q_stat_strings[0]))
+/* [rt]x_qX_ is prepended to the name string here */
+static const struct rte_virtio_xstats_name_off rte_virtio_txq_stat_strings[] = 
{
+   {"good_packets",   offsetof(struct virtnet_tx, stats.packets)},
+   {"good_bytes", offsetof(struct virtnet_tx, stats.bytes)},
+   {"errors", offsetof(struct virtnet_tx, stats.errors)},
+   {"multicast_packets",  offsetof(struct virtnet_tx, 
stats.multicast)},
+   {"broadcast_packets",  offsetof(struct virtnet_tx, 
stats.broadcast)},
+   {"undersize_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[0])},
+   {"size_64_packets",offsetof(struct virtnet_tx, 
stats.size_bins[1])},
+   {"size_65_127_packets",offsetof(struct virtnet_tx, 
stats.size_bins[2])},
+   {"size_128_255_packets",   offsetof(struct virtnet_tx, 
stats.size_bins[3])},
+   {"size_256_511_packets",   offsetof(struct virtnet_tx, 
stats.size_bins[4])},
+   {"size_512_1023_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[5])},
+   {"size_1024_1517_packets", offsetof(struct virtnet_tx, 
stats.size_bins[6])},
+   {"size_1518_max_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[7])},
+};
+
+#define VIRTIO_NB_RXQ_XSTATS (sizeof(rte_virtio_rxq_stat_strings) / \
+   sizeof(rte_virtio_rxq_stat_strings[0]))
+#define VIRTIO_NB_TXQ_XSTATS 

[dpdk-dev] [RFC PATCH 0/4]: Implement module information export

2016-05-04 Thread Neil Horman
On Wed, May 04, 2016 at 10:24:18AM +0200, David Marchand wrote:
> On Tue, May 3, 2016 at 1:57 PM, Neil Horman  wrote:
> > On Tue, Apr 26, 2016 at 01:39:47PM -0400, Neil Horman wrote:
> >> Hey-
> >>   So a few days ago we were reviewing Davids patch series to introduce 
> >> the
> >> abiilty to dump hardware support from pmd DSO's in a human readable format.
> >> That effort encountered some problems, most notably the fact that 
> >> stripping a
> >> DSO removed the required information that the proposed tool keyed off, as 
> >> well
> >> as the need to dead reckon offsets between symbols that may not be constant
> >> (dependent on architecture).
> >>
> >>   I was going to start looking into the possibility of creating a 
> >> modinfo
> >> section in a simmilar fashion to what kernel modules do in linux or BSD.  I
> >> decided to propose this solution instead though, because the kernel style
> >> solution requires a significant amount of infrastructure that I think we 
> >> can
> >> maybe avoid maintaining, if we accept some minor caviats
> >>
> >> To do this We emit a set of well known marker symbols for each DSO that an
> >> external application can search for (in this case I called them
> >> this_pmd_driver, where n is a counter macro).  These marker symbols are
> >> n is a counter macro).  These marker symbols are exported by PMDs for
> >> external access.  External tools can then access these symbols via the
> >> dlopen/dlsym api (or via elfutils libraries)
> >>
> >> The symbols above alias the rte_driver struct for each PMD, and the 
> >> external
> >> application can then interrogate the registered driver information.
> >>
> >> I also add a pointer to the pci id table struct for each PMD so that we can
> >> export hardware support.
> >>
> >> This approach has a few pros and cons:
> >>
> >> pros:
> >> 1) Its simple, and doesn't require extra infrastructure to implement.  
> >> E.g. we
> >> don't need a new tool to extract driver information and emit the C code to 
> >> build
> >> the binary data for the special section, nor do we need a custom linker 
> >> script
> >> to link said special section in place
> >>
> >> 2) Its stable.  Because the marker symbols are explicitly exported, this
> >> approach is resilient against stripping.
> >>
> >> cons:
> >> 1) It creates an artifact in that PMD_REGISTER_DRIVER has to be used in one
> >> compilation unit per DSO.  As an example em and igb effectively merge two
> >> drivers into one DSO, and the uses of PMD_REGISTER_DRIVER occur in two 
> >> separate
> >> C files for the same single linked DSO.  Because of the use of the 
> >> __COUNTER__
> >> macro we get multiple definitions of the same marker symbols.
> >>
> >> I would make the argument that the downside of the above artifact isn't 
> >> that big
> >> a deal.  Traditionally in other projects a unit like a module (or DSO in 
> >> our
> >> case) only ever codifies a single driver (e.g. module_init() in the linux 
> >> kernel
> >> is only ever used once per built module).  If we have code like igb/em that
> >> shares some core code, we should build the shared code to object files and 
> >> link
> >> them twice, once to an em.so pmd and again to an igb.so pmd.
> >>
> >> But regardless, I thought I would propose this to see what you all thought 
> >> of
> >> it.
> >>
> >> FWIW, heres sample output of the pmdinfo tool from this series probing the
> >> librte_pmd_ena.so module:
> >>
> >> [nhorman at hmsreliant dpdk]$ ./build/app/pmdinfo
> >> ~/git/dpdk/build/lib/librte_pmd_ena.so
> >> PMD 0 Information:
> >> Driver Name: ena_driver
> >> Driver Type: PCI
> >> |PCI Table|
> >> | VENDOR ID | DEVICE ID | SUBVENDOR ID | SUBDEVICE ID |
> >> |-|
> >> |   1d0f|   ec20|  |  |
> >> |   1d0f|   ec21|  |  |
> >> |-|
> >>
> >>
> >>
> >>
> > Ping, thoughts here?
> 
> - This implementation does not support binaries, so it is not suitable
> for people who don't want dso, this is partially why I used bfd rather
> than just dlopen.
If you're statically linking an application, you should know what hardware you
support already.  Its going to be very hard, if not impossible to develop a
robust solution that works with static binaries (the prior solutions don't work
consistently with static binaries either).  I really think the static solution
needs to just be built into the application (i.e. the application needs to add a
command line option to dump out the pci id's that are registered).

> - The name of the tool "pmdinfo" is likely to cause problems, I would
> say we need to prefix t with dpdk.
Why?

> - How does it behave if we strip the dsos ?
I described this above, its invariant to stripping, because the symbols for each
pmd are explicitly exported, so strip doesn't touch the symbols that pmdinfo
keys off of.


[dpdk-dev] [PATCH v3 3/4] bnx2x: Enhance stats get

2016-05-04 Thread Rasesh Mody
> From: Van Haaren, Harry [mailto:harry.van.haaren at intel.com]
> Sent: Wednesday, April 06, 2016 7:33 AM
> 
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rasesh Mody
> > Subject: [dpdk-dev] [PATCH v3 3/4] bnx2x: Enhance stats get
> 
> Hi Rasesh,
> 
> > +   snprintf(xstats[num].name, sizeof(xstats[num].name),
> "brb_drops");
> 
> I don't understand what a "brb" drop is.
> 
> 
> > +   snprintf(xstats[num].name, sizeof(xstats[num].name), "tx_pfc");
> 
> Similarly here, and with some other of the xstats strings, it doesn't become
> clear to me what exactly the value represents.
> 
> "mac_filter_discard" is descriptive and readable, but the next stat has
> "mf_tag_discard" - these small inconsistencies make it much harder
> (impossible?) to scrap the xstats strings and retrieve useful metadata.
> 
> I'll suggest leaving the xstats implementation part of this patch until the 
> next
> release, and we can align on the names of the stats.
> 
> -Harry

We have re-worked the patches and submitted v4. It incorporates changes to 
rename some of the stats.

Thanks!
Rasesh


[dpdk-dev] [PATCH v2] examples: remove useless checking

2016-05-04 Thread Mauricio Vasquez B
The rte_eth_dev_count() function will never return a value greater
than RTE_MAX_ETHPORTS, so that checking is useless.

Signed-off-by: Mauricio Vasquez B 
---
v2:
 Add missed case in examples/kni/main.c
 app/proc_info/main.c | 4 
 app/test/test_pmd_perf.c | 3 ---
 doc/guides/sample_app_ug/l2_forward_job_stats.rst| 3 ---
 doc/guides/sample_app_ug/l2_forward_real_virtual.rst | 3 ---
 doc/guides/sample_app_ug/link_status_intr.rst| 3 ---
 examples/dpdk_qat/main.c | 2 --
 examples/ip_fragmentation/main.c | 4 +---
 examples/ip_reassembly/main.c| 4 +---
 examples/ipsec-secgw/ipsec-secgw.c   | 4 
 examples/kni/main.c  | 2 --
 examples/l2fwd-crypto/main.c | 3 ---
 examples/l2fwd-ivshmem/host/host.c   | 3 ---
 examples/l2fwd-jobstats/main.c   | 3 ---
 examples/l2fwd-keepalive/main.c  | 3 ---
 examples/l2fwd/main.c| 3 ---
 examples/l3fwd-acl/main.c| 2 --
 examples/l3fwd-power/main.c  | 3 ---
 examples/l3fwd-vf/main.c | 2 --
 examples/l3fwd/main.c| 2 --
 examples/link_status_interrupt/main.c| 3 ---
 examples/multi_process/l2fwd_fork/main.c | 3 ---
 examples/performance-thread/l3fwd-thread/main.c  | 2 --
 examples/tep_termination/main.c  | 2 --
 examples/vhost/main.c| 2 --
 examples/vhost_xen/main.c| 2 --
 examples/vmdq/main.c | 2 --
 examples/vmdq_dcb/main.c | 2 --
 27 files changed, 2 insertions(+), 72 deletions(-)

diff --git a/app/proc_info/main.c b/app/proc_info/main.c
index 341176d..5f83092 100644
--- a/app/proc_info/main.c
+++ b/app/proc_info/main.c
@@ -327,10 +327,6 @@ main(int argc, char **argv)
if (nb_ports == 0)
rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-
-   if (nb_ports > RTE_MAX_ETHPORTS)
-   nb_ports = RTE_MAX_ETHPORTS;
-
/* If no port mask was specified*/
if (enabled_port_mask == 0)
enabled_port_mask = 0x;
diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 59803f7..3d56cd2 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -709,9 +709,6 @@ test_pmd_perf(void)
return -1;
}

-   if (nb_ports > RTE_MAX_ETHPORTS)
-   nb_ports = RTE_MAX_ETHPORTS;
-
nb_lcores = rte_lcore_count();

memset(lcore_conf, 0, sizeof(lcore_conf));
diff --git a/doc/guides/sample_app_ug/l2_forward_job_stats.rst 
b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
index 03d9977..2444e36 100644
--- a/doc/guides/sample_app_ug/l2_forward_job_stats.rst
+++ b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
@@ -238,9 +238,6 @@ in the *DPDK Programmer's Guide* and the *DPDK API 
Reference*.
 if (nb_ports == 0)
 rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-if (nb_ports > RTE_MAX_ETHPORTS)
-nb_ports = RTE_MAX_ETHPORTS;
-
 /* reset l2fwd_dst_ports */

 for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst 
b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index e77d67c..b51b2dc 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -242,9 +242,6 @@ in the *DPDK Programmer's Guide* - Rel 1.4 EAR and the 
*DPDK API Reference*.
 if (nb_ports == 0)
 rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-if (nb_ports > RTE_MAX_ETHPORTS)
-nb_ports = RTE_MAX_ETHPORTS;
-
 /* reset l2fwd_dst_ports */

 for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
diff --git a/doc/guides/sample_app_ug/link_status_intr.rst 
b/doc/guides/sample_app_ug/link_status_intr.rst
index a4dbb54..779df97 100644
--- a/doc/guides/sample_app_ug/link_status_intr.rst
+++ b/doc/guides/sample_app_ug/link_status_intr.rst
@@ -145,9 +145,6 @@ To fully understand this code, it is recommended to study 
the chapters that rela
 if (nb_ports == 0)
 rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-if (nb_ports > RTE_MAX_ETHPORTS)
-nb_ports = RTE_MAX_ETHPORTS;
-
 /*
  * Each logical core is assigned a dedicated TX queue on each port.
  */
diff --git a/examples/dpdk_qat/main.c b/examples/dpdk_qat/main.c
index dc68989..3c6112d 100644
--- a/examples/dpdk_qat/main.c
+++ b/examples/dpdk_qat/main.c
@@ -661,8 +661,6 @@ main(int argc, char **argv)
return -1;

nb_ports = rte_eth_dev_count();
-   if (nb_ports > RTE_MAX_ETHPORTS)
-