Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-09-01 Thread Robert Hoo
On Fri, 2017-08-25 at 14:24 +, Waskiewicz Jr, Peter wrote:
> On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:
> >>
> >> Tested with Intel XL710 NIC with Cisco 3172 switch.
> >>
> >> It would be even slightly better if the irqbalance service is turned
> >> off outside.
> > 
> > Yes, if you don't turn-off (kill) irqbalance it will move around the
> > IRQs behind your back...
> 
> Or you can use the --banirq option to irqbalance to ignore your device's 
> interrupts as targets for balancing.

Oh, I wasn't aware of this parameter. I will be glad to try it later.
Meanwhile, in my tests above, the irqbalance service affected the result
only very little.
> 
> Cheers,
> -PJ




Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-09-01 Thread Robert Hoo
On Sun, 2017-08-27 at 11:25 +0300, Tariq Toukan wrote:
> 
> On 25/08/2017 12:26 PM, Robert Hoo wrote:
> > (Sorry for yesterday's wrong sending, I finally fixed my MTA and git
> > send-email settings.)
> > 
> > It's hard to benchmark 40G+ network bandwidth using ordinary
> > tools like iperf or netperf (see reference 1).
> > Pktgen, the packet generator in kernel space, is a natural candidate.
> > I then tried the pktgen multiqueue sample scripts, but still
> > could not reach line rate.
> 
> Try samples 03 and 04.

Thanks, Tariq, for the review. Sorry for the late reply; I work on this part time.

Yes, I just tried samples 03 and 04. They can approximately reach 40G
line rate, though still slightly less than my script :) (see my reply
to Jesper).
> 
> > I then derived this NUMA-aware IRQ affinity sample script from the
> > multi-queue sample, and successfully benchmarked a 40G link. I think this can
> > also be useful as a 100G reference, though I haven't got a device to test yet.
> > 
> > This script simply does:
> > Detect which NUMA node $DEV belongs to.
> > Bind each thread (a processor from that NUMA node) to each $DEV queue's
> > IRQ affinity, in a 1:1 mapping.
> > The number of '-t' threads given determines how many queues will be
> > utilized.
> 
> I agree this is an essential capability.
> This was the main reason I added support for the -f argument.
> Using it, I could choose cores of local NUMA, especially for single 
> thread, or when cores of the NUMA are sequential.

Indeed, this argument is very helpful.
Sorry I didn't take it into consideration in v1. I should handle the
case where the user specifies '-f'; I can improve this in v2.
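For example (assuming -f behaves as in parameters.sh, i.e. it selects the
first CPU/thread to start from; the CPU number below is made up for
illustration), the existing samples can already be pinned to the local
node like:

  ./pktgen_sample03_burst_single_flow.sh -i ens801 -s 1500 -m 3c:fd:fe:9d:6f:f0 -t 4 -f 28 -d 192.168.0.107

In v2 my script could take the same value as the starting point within
the node's CPU list.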
> 
> > 
> > Tested with Intel XL710 NIC with Cisco 3172 switch.
> > 
> > It would be even slightly better if the irqbalance service is turned
> > off outside.
> > 
> > References:
> > https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> > http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
> > 
> > Signed-off-by: Robert Hoo 
> > ---
> 
> Regards,
> Tariq Toukan




Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-09-01 Thread Robert Hoo
On Fri, 2017-08-25 at 11:19 +0200, Jesper Dangaard Brouer wrote:
> (please don't use BCC on the netdev list, replies might miss the list in cc)
> 
> Comments inlined below:
> 
> On Fri, 25 Aug 2017 10:24:30 +0800 Robert Hoo  wrote:
> 
> > From: Robert Ho 
> > 
> > It's hard to benchmark 40G+ network bandwidth using ordinary
> > tools like iperf or netperf. I then tried the pktgen multiqueue sample
> > scripts, but still could not reach line rate.
> 
> The pktgen_sample02_multiqueue.sh does not use burst or skb_cloning.
> Thus, the performance will suffer.
> 
> See the samples that use the burst feature:
>   pktgen_sample03_burst_single_flow.sh
>   pktgen_sample05_flow_per_thread.sh
> 
> With the pktgen "burst" feature, I can easily generate 40G.  Generating
> 100G is also possible, but often you will hit some HW limits before the
> pktgen limit.  I experienced hitting both (1) PCIe Gen3 x8 limit, and (2)
> memory bandwidth limit.

Thanks, Jesper, for the review. Sorry for the late reply; I work on this part time.

I just tried 'pktgen_sample03_burst_single_flow.sh' and
'pktgen_sample05_flow_per_thread.sh'.
cmd:
./pktgen_sample05_flow_per_thread.sh -i ens801 -s 1500 -m 3c:fd:fe:9d:6f:f0 -t 2 -v -x -d 192.168.0.107
./pktgen_sample03_burst_single_flow.sh -i ens801 -s 1500 -m 3c:fd:fe:9d:6f:f0 -t 2 -v -x -d 192.168.0.107

Indeed, they can achieve nearly 40G (though still slightly less than my
script): pktgen_sample03 and pktgen_sample05 reach approximately
38xxx Mb/sec ~ 39xxx Mb/sec, while my script reaches 40xxx Mb/sec ~
41xxx Mb/sec (with threads >= 2).

So a general question: is it still necessary to continue my
sample06_numa_awared_queue_irq_affinity work, given that sample03
and sample05 already approximately achieve 40G line rate?

> 
> 
> > I then derived this NUMA-aware IRQ affinity sample script from the
> > multi-queue sample, and successfully benchmarked a 40G link. I think this can
> > also be useful as a 100G reference, though I haven't got a device to test.
> 
> Okay, so your issue was really related to NUMA IRQ affinity.  I do feel
> that IRQ tuning lives outside the realm of the pktgen scripts, but
> looking closer at your script, it doesn't look like you change the
> IRQ settings, which is good.

Sorry, I don't quite understand the above. I do change the IRQ affinities;
see "echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list".
Would you prefer that I not change them? I can restore them to their
original values at the end of the script.
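A minimal sketch of what that save/restore could look like in v2
(untested; 'saved_affinity' is just a name I'm making up here):

  # remember the current affinity of each queue IRQ before changing it
  declare -A saved_affinity
  for irq in "${irq_array[@]}"; do
      saved_affinity[$irq]=$(cat /proc/irq/$irq/smp_affinity_list)
  done
  ...
  # at the very end of the script, put everything back
  for irq in "${!saved_affinity[@]}"; do
      echo ${saved_affinity[$irq]} > /proc/irq/$irq/smp_affinity_list
  done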
> 
> You introduce some helper functions that make it possible to extract
> NUMA information in the shell script code, really cool.  I would like
> to see these functions being integrated into the functions.sh file.

Yes, that is doable, if you as maintainer think so.
> 
>  
> > This script simply does:
> > Detect which NUMA node $DEV belongs to.
> > Bind each thread (a processor from that NUMA node) to each $DEV queue's
> > IRQ affinity, in a 1:1 mapping.
> > The number of '-t' threads given determines how many queues will be
> > utilized.
> > 
> > Tested with Intel XL710 NIC with Cisco 3172 switch.
> > 
> > It would be even slightly better if the irqbalance service is turned
> > off outside.
> 
> Yes, if you don't turn-off (kill) irqbalance it will move around the
> IRQs behind your back...

Yes; though in my experiments it turned out to affect the result very little.
> 
>  
> > References:
> > https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> > http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
> > 
> > Signed-off-by: Robert Hoo 
> > ---
> >  ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
> >  1 file changed, 132 insertions(+)
> >  create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> > 
> > diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> > new file mode 100755
> > index 000..f0ee25c
> > --- /dev/null
> > +++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> > @@ -0,0 +1,132 @@
> > +#!/bin/bash
> > +#
> > +# Multiqueue: Using pktgen threads for sending on multiple CPUs
> > +#  * adding devices to kernel threads which are in the same NUMA node
> > +#  * bound devices queue's irq affinity to the threads, 1:1 mapping
> > +#  * notice the naming scheme for keeping device names unique
> > +#  * naming scheme: dev@thread_number
> > +#  * flow variation via random UDP source port
> > +#
> > +basedir=`dirname $0`
> > +source ${basedir}/functions.sh
> > +root_check_run_with_sudo "$@"
> > +#
> > +# Required param: -i dev in $DEV
> > +source ${basedir}/parameters.sh
> > +
> > +get_iface_node()
> > +{
> > +   echo `cat /sys/class/net/$1/device/numa_node`
> 
> Here you could use the following shell trick to avoid using "cat":
> 
>  echo $(< /sys/class/net/$1/device/numa_node)
> It 

Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-27 Thread Tariq Toukan



On 25/08/2017 12:26 PM, Robert Hoo wrote:

(Sorry for yesterday's wrong sending, I finally fixed my MTA and git
send-email settings.)

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf or netperf (see reference 1).
Pktgen, the packet generator in kernel space, is a natural candidate.
I then tried the pktgen multiqueue sample scripts, but still
could not reach line rate.


Try samples 03 and 04.


I then derived this NUMA-aware IRQ affinity sample script from the
multi-queue sample, and successfully benchmarked a 40G link. I think this can
also be useful as a 100G reference, though I haven't got a device to test yet.

This script simply does:
Detect which NUMA node $DEV belongs to.
Bind each thread (a processor from that NUMA node) to each $DEV queue's
IRQ affinity, in a 1:1 mapping.
The number of '-t' threads given determines how many queues will be
utilized.


I agree this is an essential capability.
This was the main reason I added support for the -f argument.
Using it, I could choose cores of local NUMA, especially for single 
thread, or when cores of the NUMA are sequential.




Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

References:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo 
---


Regards,
Tariq Toukan


Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-25 Thread Waskiewicz Jr, Peter
On 8/25/17 10:59 AM, Jesper Dangaard Brouer wrote:
> On Fri, 25 Aug 2017 14:24:28 +
> "Waskiewicz Jr, Peter"  wrote:
> 
>> On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:

 Tested with Intel XL710 NIC with Cisco 3172 switch.

 It would be even slightly better if the irqbalance service is turned
 off outside.
>>>
>>> Yes, if you don't turn-off (kill) irqbalance it will move around the
>>> IRQs behind your back...
>>
>> Or you can use the --banirq option to irqbalance to ignore your device's
>> interrupts as targets for balancing.
> 
> It might be worth mentioning that --banirq=X is specified for each IRQ
> that you want to exclude, and --banirq is simply specified multiple
> times on the command line.
> 
> Is it possible to tell a running irqbalance that I want to exclude an
> extra IRQ? (just before I do my manual adjustment).

It isn't possible today, since we don't have a way to attach a 
foreground/oneshot irqbalance run to a currently-running daemon.  That's 
an interesting feature enhancement...I can add it to our list as a 
feature request so I don't forget about it.  That way I can also get 
Neil's thoughts on this.

-PJ



Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-25 Thread Jesper Dangaard Brouer
On Fri, 25 Aug 2017 14:24:28 +
"Waskiewicz Jr, Peter"  wrote:

> On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:
> >>
> >> Tested with Intel XL710 NIC with Cisco 3172 switch.
> >>
> >> It would be even slightly better if the irqbalance service is turned
> >> off outside.  
> > 
> > Yes, if you don't turn-off (kill) irqbalance it will move around the
> > IRQs behind your back...  
> 
> Or you can use the --banirq option to irqbalance to ignore your device's 
> interrupts as targets for balancing.

It might be worth mentioning that --banirq=X is specified for each IRQ
that you want to exclude, and --banirq is simply specified multiple
times on the command line.
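E.g. something along these lines (the IRQ numbers are placeholders for
whatever vectors your NIC queues were assigned):

  irqbalance --banirq=120 --banirq=121 --banirq=122 --banirq=123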

Is it possible to tell a running irqbalance that I want to exclude an
extra IRQ? (just before I do my manual adjustment).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-25 Thread Waskiewicz Jr, Peter
On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:
>>
>> Tested with Intel XL710 NIC with Cisco 3172 switch.
>>
>> It would be even slightly better if the irqbalance service is turned
>> off outside.
> 
> Yes, if you don't turn-off (kill) irqbalance it will move around the
> IRQs behind your back...

Or you can use the --banirq option to irqbalance to ignore your device's 
interrupts as targets for balancing.

Cheers,
-PJ


Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-25 Thread Jesper Dangaard Brouer
On Fri, 25 Aug 2017 17:26:36 +0800
Robert Hoo  wrote:

> (Sorry for yesterday's wrong sending, I finally fixed my MTA and git
> send-email settings.)

Please see my reply in:
  http://lkml.kernel.org/r/20170825111921.06171...@redhat.com

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-25 Thread Robert Hoo
(Sorry for yesterday's wrong sending, I finally fixed my MTA and git
send-email settings.)

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf or netperf (see reference 1).
Pktgen, the packet generator in kernel space, is a natural candidate.
I then tried the pktgen multiqueue sample scripts, but still
could not reach line rate.
I then derived this NUMA-aware IRQ affinity sample script from the
multi-queue sample, and successfully benchmarked a 40G link. I think this can
also be useful as a 100G reference, though I haven't got a device to test yet.

This script simply does:
Detect which NUMA node $DEV belongs to.
Bind each thread (a processor from that NUMA node) to each $DEV queue's
IRQ affinity, in a 1:1 mapping.
The number of '-t' threads given determines how many queues will be
utilized.

Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

References:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo 
---
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
 1 file changed, 132 insertions(+)
 create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
new file mode 100755
index 000..f0ee25c
--- /dev/null
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Multiqueue: Using pktgen threads for sending on multiple CPUs
+#  * adding devices to kernel threads which are in the same NUMA node
+#  * bound devices queue's irq affinity to the threads, 1:1 mapping
+#  * notice the naming scheme for keeping device names unique
+#  * naming scheme: dev@thread_number
+#  * flow variation via random UDP source port
+#
+basedir=`dirname $0`
+source ${basedir}/functions.sh
+root_check_run_with_sudo "$@"
+#
+# Required param: -i dev in $DEV
+source ${basedir}/parameters.sh
+
+get_iface_node()
+{
+   echo `cat /sys/class/net/$1/device/numa_node`
+}
+
+get_iface_irqs()
+{
+   local IFACE=$1
+   local queues="${IFACE}-.*TxRx"
+
+   irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
+   [ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
+   [ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
+   do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
+   done)
+   [ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"
+
+   echo $irqs
+}
+
+get_node_cpus()
+{
+   local node=$1
+   local node_cpu_list
+   local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
+   /sys/devices/system/node/node$node/cpulist`
+
+   for cpu_range in $node_cpu_range_list
+   do
+   node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
+   done
+
+   echo $node_cpu_list
+}
+
+
+# Base Config
+DELAY="0"        # Zero means max speed
+COUNT="2000"   # Zero means indefinitely
+[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
+
+# Flow variation random source port between min and max
+UDP_MIN=9
+UDP_MAX=109
+
+node=`get_iface_node $DEV`
+irq_array=(`get_iface_irqs $DEV`)
+cpu_array=(`get_node_cpus $node`)
+
+[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]}  ] && \
+   err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})"
+
+# (example of setting default params in your script)
+if [ -z "$DEST_IP" ]; then
+[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
+fi
+[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+
+# General cleanup everything since last run
+pg_ctrl "reset"
+
+# Threads are specified with parameter -t value in $THREADS
+for ((i = 0; i < $THREADS; i++)); do
+# The device name is extended with @name, using thread number to
+# make them unique, but any name will do.
+# Set the queue's irq affinity to this $thread (processor)
+thread=${cpu_array[$i]}
+dev=${DEV}@${thread}
+echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
+echo "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`"
+
+# Add remove all other devices and add_device $dev to thread
+pg_thread $thread "rem_device_all"
+pg_thread $thread "add_device" $dev
+
+# select queue and bind the queue and $dev in 1:1 relationship
+queue_num=$i
+echo "queue number is $queue_num"
+pg_set $dev "queue_map_min $queue_num"
+pg_set $dev "queue_map_max $queue_num"
+
+# Notice config queue to map to cpu (mirrors smp_processor_id())
+# It is beneficial to map IRQ 

Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-25 Thread Jesper Dangaard Brouer

(please don't use BCC on the netdev list, replies might miss the list in cc)

Comments inlined below:

On Fri, 25 Aug 2017 10:24:30 +0800 Robert Hoo  wrote:

> From: Robert Ho 
> 
> It's hard to benchmark 40G+ network bandwidth using ordinary
> tools like iperf or netperf. I then tried the pktgen multiqueue sample
> scripts, but still could not reach line rate.

The pktgen_sample02_multiqueue.sh does not use burst or skb_cloning.
Thus, the performance will suffer.

See the samples that use the burst feature:
  pktgen_sample03_burst_single_flow.sh
  pktgen_sample05_flow_per_thread.sh
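
The knobs that matter in those scripts boil down to something like this
(the values below are only illustrative, tune per NIC):

  pg_set $dev "clone_skb 100000"  # reuse the same skb instead of allocating per packet
  pg_set $dev "burst 32"          # hand packets to the driver in bursts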

With the pktgen "burst" feature, I can easily generate 40G.  Generating
100G is also possible, but often you will hit some HW limits before the
pktgen limit.  I experienced hitting both (1) PCIe Gen3 x8 limit, and (2)
memory bandwidth limit.
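
(Back-of-the-envelope for the PCIe case: Gen3 is 8 GT/s per lane with
128b/130b encoding, so an x8 slot tops out around 8 * 8 * 128/130 = ~63
Gbit/s raw, and noticeably less after TLP/descriptor overhead. Plenty
for 40G, clearly short of 100G.)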


> I then derived this NUMA-aware IRQ affinity sample script from the
> multi-queue sample, and successfully benchmarked a 40G link. I think this can
> also be useful as a 100G reference, though I haven't got a device to test.

Okay, so your issue was really related to NUMA IRQ affinity.  I do feel
that IRQ tuning lives outside the realm of the pktgen scripts, but
looking closer at your script, it doesn't look like you change the
IRQ settings, which is good.

You introduce some helper functions that make it possible to extract
NUMA information in the shell script code, really cool.  I would like
to see these functions being integrated into the functions.sh file.

 
> This script simply does:
> Detect which NUMA node $DEV belongs to.
> Bind each thread (a processor from that NUMA node) to each $DEV queue's
> IRQ affinity, in a 1:1 mapping.
> The number of '-t' threads given determines how many queues will be
> utilized.
> 
> Tested with Intel XL710 NIC with Cisco 3172 switch.
> 
> It would be even slightly better if the irqbalance service is turned
> off outside.

Yes, if you don't turn-off (kill) irqbalance it will move around the
IRQs behind your back...

 
> References:
> https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
> 
> Signed-off-by: Robert Hoo 
> ---
>  ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
>  1 file changed, 132 insertions(+)
>  create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> 
> diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> new file mode 100755
> index 000..f0ee25c
> --- /dev/null
> +++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> @@ -0,0 +1,132 @@
> +#!/bin/bash
> +#
> +# Multiqueue: Using pktgen threads for sending on multiple CPUs
> +#  * adding devices to kernel threads which are in the same NUMA node
> +#  * bound devices queue's irq affinity to the threads, 1:1 mapping
> +#  * notice the naming scheme for keeping device names unique
> +#  * naming scheme: dev@thread_number
> +#  * flow variation via random UDP source port
> +#
> +basedir=`dirname $0`
> +source ${basedir}/functions.sh
> +root_check_run_with_sudo "$@"
> +#
> +# Required param: -i dev in $DEV
> +source ${basedir}/parameters.sh
> +
> +get_iface_node()
> +{
> + echo `cat /sys/class/net/$1/device/numa_node`

Here you could use the following shell trick to avoid using "cat":

 echo $(< /sys/class/net/$1/device/numa_node)

> +}
> +
> +get_iface_irqs()
> +{
> + local IFACE=$1
> + local queues="${IFACE}-.*TxRx"
> +
> + irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
> + [ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
> + [ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
> + do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
> + done)

Nice that you handle all these different methods.  I personally look
in /proc/irq/*/$IFACE*/../smp_affinity_list , like (copy-paste):

echo " --- Align IRQs ---"
# I've named my NICs ixgbe1 + ixgbe2
for F in /proc/irq/*/ixgbe*-TxRx-*/../smp_affinity_list; do
   # Extract irqname e.g. "ixgbe2-TxRx-2"
   irqname=$(basename $(dirname $(dirname $F))) ;
   # Substring pattern removal
   hwq_nr=${irqname#*-*-}
   echo $hwq_nr > $F
   #grep . -H $F;
done
grep -H . /proc/irq/*/ixgbe*/../smp_affinity_list

Maybe I should switch to use:
   /sys/class/net/$IFACE/device/msi_irqs/*
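
which would look roughly like this (untested sketch):

  for irq in $(ls /sys/class/net/$IFACE/device/msi_irqs/); do
      grep -H . /proc/irq/$irq/smp_affinity_list
  done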
 

> + [ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"

In the error case you should let the script die.  There is a helper
function for this called "err" (where first arg is the exitcode, which
is useful to detect the reason your script failed).
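i.e. something like:

  [ -z "$irqs" ] && err 3 "Could not find interrupts for $IFACE"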


> + echo $irqs
> +}

> +get_node_cpus()
> +{
> + local node=$1
> + local node_cpu_list
> + local node_cpu_range_list=`cut 

[PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-24 Thread Robert Hoo
From: Robert Ho 

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf or netperf. I then tried the pktgen multiqueue sample
scripts, but still could not reach line rate.
I then derived this NUMA-aware IRQ affinity sample script from the
multi-queue sample, and successfully benchmarked a 40G link. I think this can
also be useful as a 100G reference, though I haven't got a device to test.

This script simply does:
Detect which NUMA node $DEV belongs to.
Bind each thread (a processor from that NUMA node) to each $DEV queue's
IRQ affinity, in a 1:1 mapping.
The number of '-t' threads given determines how many queues will be
utilized.

Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

References:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo 
---
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
 1 file changed, 132 insertions(+)
 create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
new file mode 100755
index 000..f0ee25c
--- /dev/null
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Multiqueue: Using pktgen threads for sending on multiple CPUs
+#  * adding devices to kernel threads which are in the same NUMA node
+#  * bound devices queue's irq affinity to the threads, 1:1 mapping
+#  * notice the naming scheme for keeping device names unique
+#  * naming scheme: dev@thread_number
+#  * flow variation via random UDP source port
+#
+basedir=`dirname $0`
+source ${basedir}/functions.sh
+root_check_run_with_sudo "$@"
+#
+# Required param: -i dev in $DEV
+source ${basedir}/parameters.sh
+
+get_iface_node()
+{
+   echo `cat /sys/class/net/$1/device/numa_node`
+}
+
+get_iface_irqs()
+{
+   local IFACE=$1
+   local queues="${IFACE}-.*TxRx"
+
+   irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
+   [ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
+   [ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
+   do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
+   done)
+   [ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"
+
+   echo $irqs
+}
+
+get_node_cpus()
+{
+   local node=$1
+   local node_cpu_list
+   local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
+   /sys/devices/system/node/node$node/cpulist`
+
+   for cpu_range in $node_cpu_range_list
+   do
+   node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
+   done
+
+   echo $node_cpu_list
+}
+
+
+# Base Config
+DELAY="0"        # Zero means max speed
+COUNT="2000"   # Zero means indefinitely
+[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
+
+# Flow variation random source port between min and max
+UDP_MIN=9
+UDP_MAX=109
+
+node=`get_iface_node $DEV`
+irq_array=(`get_iface_irqs $DEV`)
+cpu_array=(`get_node_cpus $node`)
+
+[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]}  ] && \
+   err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})"
+
+# (example of setting default params in your script)
+if [ -z "$DEST_IP" ]; then
+[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
+fi
+[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+
+# General cleanup everything since last run
+pg_ctrl "reset"
+
+# Threads are specified with parameter -t value in $THREADS
+for ((i = 0; i < $THREADS; i++)); do
+# The device name is extended with @name, using thread number to
+# make them unique, but any name will do.
+# Set the queue's irq affinity to this $thread (processor)
+thread=${cpu_array[$i]}
+dev=${DEV}@${thread}
+echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
+echo "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`"
+
+# Add remove all other devices and add_device $dev to thread
+pg_thread $thread "rem_device_all"
+pg_thread $thread "add_device" $dev
+
+# select queue and bind the queue and $dev in 1:1 relationship
+queue_num=$i
+echo "queue number is $queue_num"
+pg_set $dev "queue_map_min $queue_num"
+pg_set $dev "queue_map_max $queue_num"
+
+# Notice config queue to map to cpu (mirrors smp_processor_id())
+# It is beneficial to map IRQ /proc/irq/*/smp_affinity 1:1 to CPU number
+pg_set $dev "flag QUEUE_MAP_CPU"
+
+# Base config of dev
+pg_set $dev "count $COUNT"
+pg_set $dev