[dpdk-dev] [PATCH] ring: fix sc dequeue performance issue

2016-07-24 Thread Jerin Jacob
Use of rte_smp_wmb() instead of rte_smp_rmb() in the sc dequeue
function creates the additional overhead of waiting for all the
STOREs to the local buffer from ring buffer memory to complete.
The sc dequeue function demands only a LOAD-STORE barrier, where
LOADs from ring buffer memory need to be completed before the
tail pointer update. Change to rte_smp_rmb() to enable the
required LOAD-STORE barrier.

Fixes: ecc7d10e448e ("ring: guarantee dequeue ordering before tail update")

Signed-off-by: Jerin Jacob 
---
 lib/librte_ring/rte_ring.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f928324..0e22e69 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -756,7 +756,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	/* copy in table */
 	DEQUEUE_PTRS();
-	rte_smp_wmb();
+	rte_smp_rmb();
 
 	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
-- 
2.5.5
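
The ordering requirement described in the commit message can be sketched
outside of DPDK. The following is a minimal illustration, not the rte_ring
code: it assumes C11 atomics as a portable stand-in for the DPDK barriers,
and the names toy_ring, toy_sp_enqueue and toy_sc_dequeue are hypothetical.
C11 has no fence that orders only LOADs before later STOREs, so the sketch
uses a release store on the tail, which orders both earlier LOADs and
earlier STOREs before the tail update and therefore covers the LOAD-STORE
requirement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u                /* hypothetical size, power of two */
#define RING_MASK (RING_SIZE - 1u)

static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

/* Toy single-producer/single-consumer ring (illustration only). */
struct toy_ring {
	void *slots[RING_SIZE];
	_Atomic uint32_t prod_tail; /* written by the producer only */
	_Atomic uint32_t cons_tail; /* written by the consumer only */
};

/* Enqueue one pointer; returns 0 on success, -1 if the ring is full. */
static int toy_sp_enqueue(struct toy_ring *r, void *obj)
{
	uint32_t prod = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store to cons_tail. */
	uint32_t cons = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return -1;

	r->slots[prod & RING_MASK] = obj;
	/* Release: the slot STORE is visible before the new prod_tail. */
	atomic_store_explicit(&r->prod_tail, prod + 1, memory_order_release);
	return 0;
}

/* Dequeue one pointer; returns 0 on success, -1 if the ring is empty. */
static int toy_sc_dequeue(struct toy_ring *r, void **obj)
{
	uint32_t cons = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);
	/* Acquire pairs with the producer's release store to prod_tail. */
	uint32_t prod = atomic_load_explicit(&r->prod_tail, memory_order_acquire);

	if (prod == cons)
		return -1;

	/* LOAD from ring memory into the caller's buffer. */
	*obj = r->slots[cons & RING_MASK];

	/*
	 * The LOAD above must complete before the tail update becomes
	 * visible; otherwise the producer could reuse the slot while it
	 * is still being read. The release store provides that ordering.
	 */
	atomic_store_explicit(&r->cons_tail, cons + 1, memory_order_release);
	return 0;
}
```

A zero-initialized struct toy_ring is an empty ring. Objects come out in
the order they were enqueued, and a slot is only handed back to the
producer after the consumer has finished reading it.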



[dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count

2016-07-24 Thread Wiles, Keith


> On Jul 21, 2016, at 1:34 PM, Neil Horman  wrote:
> 
>> On Thu, Jul 21, 2016 at 03:22:45PM +, Wiles, Keith wrote:
>> 
>>> On Jul 21, 2016, at 10:06 AM, Neil Horman  wrote:
>>> 
>>> On Thu, Jul 21, 2016 at 02:09:19PM +, Wiles, Keith wrote:
 
> On Jul 21, 2016, at 8:54 AM, Neil Horman  wrote:
> 
> On Wed, Jul 20, 2016 at 10:32:28PM +, Wiles, Keith wrote:
>> 
>>> On Jul 20, 2016, at 3:16 PM, Neil Horman wrote:
>>> 
>>> On Wed, Jul 20, 2016 at 07:47:32PM +, Wiles, Keith wrote:
 
> On Jul 20, 2016, at 12:48 PM, Neil Horman wrote:
> 
> On Wed, Jul 20, 2016 at 07:40:49PM +0200, Thomas Monjalon wrote:
>> 2016-07-20 13:09, Neil Horman:
>>> From: Neil Horman 
>>> 
>>> John McNamara and I were discussing enhancing the validate_abi script to
>>> build the dpdk tree faster with multiple jobs.  There's no reason not to do
>>> it, so this implements that requirement.  It uses a MAKE_JOBS variable that
>>> can be set by the user to limit the job count.  By default the job count is
>>> set to the number of online cpus.
>> 
>> Please could you use the variable name DPDK_MAKE_JOBS?
>> This name is already used in scripts/test-build.sh.
> Sure
> 
>>> +if [ -z "$MAKE_JOBS" ]
>>> +then
>>> +# This counts the number of cpus on the system
>>> +MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
>>> +fi
>> 
>> Is lscpu common enough?
> I'm not sure how to answer that.  lscpu is part of the util-linux package,
> which is part of any base install.  There's a variant for BSD, but I'm not
> sure how common it is there.
> Neil
> 
>> Another acceptable default would be just "-j" without any number.
>> It would make the number of jobs unlimited.
 
 I think the best is to just use -j, as it tries to use the correct number
 of jobs based on the number of cores, right?
>>> -j with no argument (or -j 0) is sort of, maybe, what you want.  With either
>>> of those options, make will just issue jobs as fast as it processes
>>> dependencies.  Depending on how parallel the build is, that can lead to tons
>>> of waiting processes (i.e. more than your number of online cpus), which can
>>> actually hurt your build time.
>> 
>> I read the manual and looked at the code, which supports your statement.
>> (I think I had read some statement on Stack Overflow, and that is the last
>> time I believe anything on the internet :-) I have not seen a lot of
>> difference in compile times with -j on my system. Mostly I suspect it is
>> down to the number of paths in the dependencies, and the cores and memory
>> on the system.
>> 
>> I have 72 lcores on 2 sockets, 18 cores per socket. Xeon 2.3 GHz cores.
>> 
>> $ export RTE_TARGET=x86_64-native-linuxapp-gcc 
>> 
>> $ time make install T=${RTE_TARGET}
>> real 0m59.445s  user 0m27.344s  sys 0m7.040s
>> 
>> $ time make install T=${RTE_TARGET} -j
>> real 0m26.584s  user 0m14.380s  sys 0m5.120s
>> 
>> # Remove the x86_64-native-linuxapp-gcc
>> 
>> $ time make install T=${RTE_TARGET} -j 72
>> real 0m23.454s  user 0m10.832s  sys 0m4.664s
>> 
>> $ time make install T=${RTE_TARGET} -j 8
>> real 0m23.812s  user 0m10.672s  sys 0m4.276s
>> 
>> cd x86_64-native-linuxapp-gcc
>> $ make clean
>> $ time make
>> real 0m28.539s  user 0m9.820s  sys 0m3.620s
>> 
>> # Do a make clean between each build.
>> 
>> $ time make -j
>> real 0m7.217s  user 0m6.532s  sys 0m2.332s
>> 
>> $ time make -j 8
>> real 0m8.256s  user 0m6.472s  sys 0m2.456s
>> 
>> $ time make -j 72
>> real 0m6.866s  user 0m6.184s  sys 0m2.216s
>> 
>> Just the real time numbers in the following table.
>> 
>> processes   real time   depdirs
>> no -j       59.4s       Yes
>> -j 8        23.8s       Yes
>> -j 72       23.5s       Yes
>> -j          26.5s       Yes
>>
>> no -j       28.5s       No
>> -j 8         8.2s       No
>> -j 72        6.8s       No
>> -j           7.2s       No
>> 
>> Looks like the depdirs build time on my system:
>> $ make clean -j
>> $ rm .depdirs
>> $ time make -j
>> real 0m23.734s  user 0m11.228s  sys 0m4.844s
>> 
>> About 16 seconds, which is not a lot of savings. Now the difference from 
>> no -j to -j is a lot, but the difference between -j and -j  
>> is not a huge saving. This leads me back to 

[dpdk-dev] Updating http://dpdk.org/doc/nics

2016-07-24 Thread Ajit Khaparde
I don't see the Broadcom NICs listed in the list of supported NICs.
Can you add an entry for the Broadcom NICs supported by the bnxt PMD driver?

Thanks
Ajit


[dpdk-dev] [PATCH] ring: fix sc dequeue performance issue

2016-07-24 Thread Ananyev, Konstantin


> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Sunday, July 24, 2016 6:08 PM
> To: dev at dpdk.org
> Cc: thomas.monjalon at 6wind.com; Ananyev, Konstantin  intel.com>; Jerin Jacob
> 
> Subject: [dpdk-dev] [PATCH] ring: fix sc dequeue performance issue
> 
> Use of rte_smp_wmb() instead of rte_smp_rmb() in the sc dequeue function
> creates the additional overhead of waiting for all the STOREs to the local
> buffer from ring buffer memory to complete. The sc dequeue function demands
> only a LOAD-STORE barrier, where LOADs from ring buffer memory need to be
> completed before the tail pointer update. Change to rte_smp_rmb() to enable
> the required LOAD-STORE barrier.
> 
> Fixes: ecc7d10e448e ("ring: guarantee dequeue ordering before tail update")
> 
> Signed-off-by: Jerin Jacob 
> ---
>  lib/librte_ring/rte_ring.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index f928324..0e22e69 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -756,7 +756,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
> 
>   /* copy in table */
>   DEQUEUE_PTRS();
> - rte_smp_wmb();
> + rte_smp_rmb();
> 
>   __RING_STAT_ADD(r, deq_success, n);
>   r->cons.tail = cons_next;
> --

Acked-by: Konstantin Ananyev 

> 2.5.5