Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On Fri, 2017-08-25 at 14:24 +, Waskiewicz Jr, Peter wrote:
> On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:
> >> Tested with Intel XL710 NIC with Cisco 3172 switch.
> >>
> >> It would be even slightly better if the irqbalance service is turned
> >> off outside.
> >
> > Yes, if you don't turn-off (kill) irqbalance it will move around the
> > IRQs behind your back...
>
> Or you can use the --banirq option to irqbalance to ignore your device's
> interrupts as targets for balancing.

Oh, I wasn't aware of this parameter. I will be glad to give it a try
later. Meanwhile, in my test above, the irqbalance service affected the
result very little.

> Cheers,
> -PJ
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On Sun, 2017-08-27 at 11:25 +0300, Tariq Toukan wrote:
> On 25/08/2017 12:26 PM, Robert Hoo wrote:
> > (Sorry for yesterday's wrong sending, I finally fixed my MTA and git
> > send-email settings.)
> >
> > It's hard to benchmark 40G+ network bandwidth using ordinary
> > tools like iperf, netperf (see reference 1).
> > Pktgen, the packet generator from kernel space, shall be a candidate.
> > I then tried with the pktgen multiqueue sample scripts, but still
> > cannot reach line rate.
>
> Try samples 03 and 04.

Thanks Tariq for the review. Sorry for the late reply; I do this part time.
Yes, I just tried samples 03 and 04. They can approximately reach 40G line
rate, though still slightly less than my script :) (see my reply to Jesper).

> > I then derived this NUMA-aware irq affinity sample script from the
> > multi-queue sample one, and successfully benchmarked a 40G link. I think
> > this can also be useful for 100G reference, though I haven't got a
> > device to test yet.
> >
> > This script simply does:
> > Detect $DEV's NUMA node belonging.
> > Bind each thread (processor from that NUMA node) with each $DEV queue's
> > irq affinity, 1:1 mapping.
> > How many '-t' threads input determines how many queues will be
> > utilized.
>
> I agree this is an essential capability.
> This was the main reason I added support for the -f argument.
> Using it, I could choose cores of the local NUMA node, especially for a
> single thread, or when the cores of the NUMA node are sequential.

Indeed this argument is very helpful. Sorry I hadn't taken it into
consideration in v1. I should consider what happens if the user designates
'-f'. I can improve this in v2.

> > Tested with Intel XL710 NIC with Cisco 3172 switch.
> >
> > It would be even slightly better if the irqbalance service is turned
> > off outside.
> > References:
> > https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> > http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
> >
> > Signed-off-by: Robert Hoo
> > ---

> Regards,
> Tariq Toukan
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On Fri, 2017-08-25 at 11:19 +0200, Jesper Dangaard Brouer wrote:
> (please don't use BCC on the netdev list, replies might miss the list in cc)
>
> Comments inlined below:
>
> On Fri, 25 Aug 2017 10:24:30 +0800, Robert Hoo wrote:
>
> > From: Robert Ho
> >
> > It's hard to benchmark 40G+ network bandwidth using ordinary
> > tools like iperf, netperf. I then tried with the pktgen multiqueue
> > sample scripts, but still cannot reach line rate.
>
> The pktgen_sample02_multiqueue.sh does not use burst or skb_cloning.
> Thus, the performance will suffer.
>
> See the samples that use the burst feature:
>  pktgen_sample03_burst_single_flow.sh
>  pktgen_sample05_flow_per_thread.sh
>
> With the pktgen "burst" feature, I can easily generate 40G. Generating
> 100G is also possible, but often you will hit some HW limits before the
> pktgen limit. I experienced hitting both (1) the PCIe Gen3 x8 limit, and
> (2) the memory bandwidth limit.

Thanks Jesper for the review. Sorry for the late reply, I do this part time.

I just tried 'pktgen_sample03_burst_single_flow.sh' and
'pktgen_sample05_flow_per_thread.sh':

./pktgen_sample05_flow_per_thread.sh -i ens801 -s 1500 -m 3c:fd:fe:9d:6f:f0 -t 2 -v -x -d 192.168.0.107
./pktgen_sample03_burst_single_flow.sh -i ens801 -s 1500 -m 3c:fd:fe:9d:6f:f0 -t 2 -v -x -d 192.168.0.107

Indeed, they can achieve nearly 40G, though still slightly less than my
script: pktgen_sample03 and pktgen_sample05 approximately achieve
38xxx Mb/sec ~ 39xxx Mb/sec, while my script achieves 40xxx Mb/sec ~
41xxx Mb/sec (threads >= 2).

So a general question: is it still necessary to continue my
sample06_numa_awared_queue_irq_affinity work, as sample03 and sample05
already approximately achieve 40G line rate?

> > I then derived this NUMA-aware irq affinity sample script from the
> > multi-queue sample one, and successfully benchmarked a 40G link. I think
> > this can also be useful for 100G reference, though I haven't got a
> > device to test.
> Okay, so your issue was really related to NUMA irq affinity. I do feel
> that IRQ tuning lives outside the realm of the pktgen scripts, but
> looking closer at your script, it doesn't look like you change the
> IRQ setting, which is good.

Sorry, I don't quite understand the above. I did change the irq affinities;
see "echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list".
Would you prefer that I not change them? I can restore them to their
original values at the end of the script.

> You introduce some helper functions that make it possible to extract
> NUMA information in the shell script code, really cool. I would like
> to see these functions being integrated into the functions.sh file.

Yes, it is doable, if you maintainers think so.

> > This script simply does:
> > Detect $DEV's NUMA node belonging.
> > Bind each thread (processor from that NUMA node) with each $DEV queue's
> > irq affinity, 1:1 mapping.
> > How many '-t' threads input determines how many queues will be
> > utilized.
> >
> > Tested with Intel XL710 NIC with Cisco 3172 switch.
> >
> > It would be even slightly better if the irqbalance service is turned
> > off outside.
>
> Yes, if you don't turn-off (kill) irqbalance it will move around the
> IRQs behind your back...

Yes; though the experiment results show it affects the outcome very little.
> > References:
> > https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> > http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
> >
> > Signed-off-by: Robert Hoo
> > ---
> >  ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
> >  1 file changed, 132 insertions(+)
> >  create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> >
> > diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> > new file mode 100755
> > index 000..f0ee25c
> > --- /dev/null
> > +++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> > @@ -0,0 +1,132 @@
> > +#!/bin/bash
> > +#
> > +# Multiqueue: Using pktgen threads for sending on multiple CPUs
> > +# * adding devices to kernel threads which are in the same NUMA node
> > +# * bound devices queue's irq affinity to the threads, 1:1 mapping
> > +# * notice the naming scheme for keeping device names unique
> > +# * naming scheme: dev@thread_number
> > +# * flow variation via random UDP source port
> > +#
> > +basedir=`dirname $0`
> > +source ${basedir}/functions.sh
> > +root_check_run_with_sudo "$@"
> > +#
> > +# Required param: -i dev in $DEV
> > +source ${basedir}/parameters.sh
> > +
> > +get_iface_node()
> > +{
> > +	echo `cat /sys/class/net/$1/device/numa_node`
>
> Here you could use the following shell trick to avoid using "cat":
>
>  echo $(< /sys/class/net/$1/device/numa_node)
>
> It
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On 25/08/2017 12:26 PM, Robert Hoo wrote:
> (Sorry for yesterday's wrong sending, I finally fixed my MTA and git
> send-email settings.)
>
> It's hard to benchmark 40G+ network bandwidth using ordinary
> tools like iperf, netperf (see reference 1).
> Pktgen, the packet generator from kernel space, shall be a candidate.
> I then tried with the pktgen multiqueue sample scripts, but still
> cannot reach line rate.

Try samples 03 and 04.

> I then derived this NUMA-aware irq affinity sample script from the
> multi-queue sample one, and successfully benchmarked a 40G link. I think
> this can also be useful for 100G reference, though I haven't got a
> device to test yet.
>
> This script simply does:
> Detect $DEV's NUMA node belonging.
> Bind each thread (processor from that NUMA node) with each $DEV queue's
> irq affinity, 1:1 mapping.
> How many '-t' threads input determines how many queues will be
> utilized.

I agree this is an essential capability.
This was the main reason I added support for the -f argument.
Using it, I could choose cores of the local NUMA node, especially for a
single thread, or when the cores of the NUMA node are sequential.

> Tested with Intel XL710 NIC with Cisco 3172 switch.
>
> It would be even slightly better if the irqbalance service is turned
> off outside.
>
> References:
> https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
>
> Signed-off-by: Robert Hoo
> ---

Regards,
Tariq Toukan
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On 8/25/17 10:59 AM, Jesper Dangaard Brouer wrote:
> On Fri, 25 Aug 2017 14:24:28 +, "Waskiewicz Jr, Peter" wrote:
>
>> On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:
>>>> Tested with Intel XL710 NIC with Cisco 3172 switch.
>>>>
>>>> It would be even slightly better if the irqbalance service is turned
>>>> off outside.
>>>
>>> Yes, if you don't turn-off (kill) irqbalance it will move around the
>>> IRQs behind your back...
>>
>> Or you can use the --banirq option to irqbalance to ignore your device's
>> interrupts as targets for balancing.
>
> It might be worth mentioning that --banirq=X is specified for each IRQ
> that you want to exclude, and --banirq is simply specified multiple
> times on the command line.
>
> Is it possible to tell a running irqbalance that I want to exclude an
> extra IRQ? (just before I do my manual adjustment).

It isn't possible today, since we don't have a way to attach a
foreground/oneshot irqbalance run to a currently-running daemon. That's
an interesting feature enhancement... I can add it to our list as a
feature request so I don't forget about it. That way I can also get
Neil's thoughts on this.

-PJ
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On Fri, 25 Aug 2017 14:24:28 +, "Waskiewicz Jr, Peter" wrote:

> On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:
> >> Tested with Intel XL710 NIC with Cisco 3172 switch.
> >>
> >> It would be even slightly better if the irqbalance service is turned
> >> off outside.
> >
> > Yes, if you don't turn-off (kill) irqbalance it will move around the
> > IRQs behind your back...
>
> Or you can use the --banirq option to irqbalance to ignore your device's
> interrupts as targets for balancing.

It might be worth mentioning that --banirq=X is specified for each IRQ
that you want to exclude, and --banirq is simply specified multiple
times on the command line.

Is it possible to tell a running irqbalance that I want to exclude an
extra IRQ? (just before I do my manual adjustment).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
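The "--banirq specified multiple times" point can be sketched as a small
shell snippet. The IRQ numbers below are hypothetical; on a real system
they would come from /proc/interrupts for the interface in question:

```shell
#!/bin/bash
# Sketch: exclude a NIC's IRQs from irqbalance by passing --banirq once
# per IRQ number.  The IRQ list here is made up; on a real system it
# would be extracted with something like:
#   irqs=$(grep "$IFACE" /proc/interrupts | cut -f1 -d:)
irqs="120 121 122 123"

args=""
for irq in $irqs; do
	args="$args --banirq=$irq"
done

# Shown with echo rather than actually (re)starting the daemon:
echo "irqbalance$args"
```

This prints the full invocation, one --banirq flag per excluded IRQ,
which can then be used when starting the irqbalance service.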
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On 8/25/17 5:19 AM, Jesper Dangaard Brouer wrote:
>> Tested with Intel XL710 NIC with Cisco 3172 switch.
>>
>> It would be even slightly better if the irqbalance service is turned
>> off outside.
>
> Yes, if you don't turn-off (kill) irqbalance it will move around the
> IRQs behind your back...

Or you can use the --banirq option to irqbalance to ignore your device's
interrupts as targets for balancing.

Cheers,
-PJ
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On Fri, 25 Aug 2017 17:26:36 +0800, Robert Hoo wrote:

> (Sorry for yesterday's wrong sending, I finally fixed my MTA and git
> send-email settings.)

Please see my reply in:
 http://lkml.kernel.org/r/20170825111921.06171...@redhat.com

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
[PATCH] pktgen: add a new sample script for 40G and above link testing
(Sorry for yesterday's wrong sending, I finally fixed my MTA and git
send-email settings.)

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf, netperf (see reference 1).
Pktgen, the packet generator from kernel space, shall be a candidate.
I then tried with the pktgen multiqueue sample scripts, but still
cannot reach line rate.

I then derived this NUMA-aware irq affinity sample script from the
multi-queue sample one, and successfully benchmarked a 40G link. I think
this can also be useful for 100G reference, though I haven't got a device
to test yet.

This script simply does:
Detect $DEV's NUMA node belonging.
Bind each thread (processor from that NUMA node) with each $DEV queue's
irq affinity, 1:1 mapping.
How many '-t' threads input determines how many queues will be utilized.

Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

References:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo
---
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
 1 file changed, 132 insertions(+)
 create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
new file mode 100755
index 000..f0ee25c
--- /dev/null
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Multiqueue: Using pktgen threads for sending on multiple CPUs
+# * adding devices to kernel threads which are in the same NUMA node
+# * bound devices queue's irq affinity to the threads, 1:1 mapping
+# * notice the naming scheme for keeping device names unique
+# * naming scheme: dev@thread_number
+# * flow variation via random UDP source port
+#
+basedir=`dirname $0`
+source ${basedir}/functions.sh
+root_check_run_with_sudo "$@"
+#
+# Required param: -i dev in $DEV
+source ${basedir}/parameters.sh
+
+get_iface_node()
+{
+	echo `cat /sys/class/net/$1/device/numa_node`
+}
+
+get_iface_irqs()
+{
+	local IFACE=$1
+	local queues="${IFACE}-.*TxRx"
+
+	irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
+		do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
+	done)
+	[ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"
+
+	echo $irqs
+}
+
+get_node_cpus()
+{
+	local node=$1
+	local node_cpu_list
+	local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
+		/sys/devices/system/node/node$node/cpulist`
+
+	for cpu_range in $node_cpu_range_list
+	do
+		node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
+	done
+
+	echo $node_cpu_list
+}
+
+
+# Base Config
+DELAY="0"	# Zero means max speed
+COUNT="2000"	# Zero means indefinitely
+[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
+
+# Flow variation random source port between min and max
+UDP_MIN=9
+UDP_MAX=109
+
+node=`get_iface_node $DEV`
+irq_array=(`get_iface_irqs $DEV`)
+cpu_array=(`get_node_cpus $node`)
+
+[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]} ] && \
+	err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})"
+
+# (example of setting default params in your script)
+if [ -z "$DEST_IP" ]; then
+    [ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
+fi
+[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+
+# General cleanup everything since last run
+pg_ctrl "reset"
+
+# Threads are specified with parameter -t value in $THREADS
+for ((i = 0; i < $THREADS; i++)); do
+    # The device name is extended with @name, using thread number to
+    # make them unique, but any name will do.
+    # Set the queue's irq affinity to this $thread (processor)
+    thread=${cpu_array[$i]}
+    dev=${DEV}@${thread}
+    echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
+    echo "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`"

+    # Add remove all other devices and add_device $dev to thread
+    pg_thread $thread "rem_device_all"
+    pg_thread $thread "add_device" $dev
+
+    # select queue and bind the queue and $dev in 1:1 relationship
+    queue_num=$i
+    echo "queue number is $queue_num"
+    pg_set $dev "queue_map_min $queue_num"
+    pg_set $dev "queue_map_max $queue_num"
+
+    # Notice config queue to map to cpu (mirrors smp_processor_id())
+    # It is beneficial to map IRQ
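The range expansion inside get_node_cpus() can be exercised on its own.
The sketch below (the expand_cpulist name and the input string are made
up for illustration) feeds a literal cpulist instead of reading
/sys/devices/system/node/node$node/cpulist:

```shell
#!/bin/bash
# Sketch: expand a kernel cpulist string such as "0-3,8-11" into a flat
# list of CPU ids, using the same seq + ${range//-/ } trick as the
# script's get_node_cpus() function.
expand_cpulist()
{
	local cpulist=$1
	local node_cpu_list
	# Turn the comma-separated ranges into space-separated words
	local range_list=${cpulist//,/ }

	for cpu_range in $range_list; do
		# "0-3" becomes "seq -s ' ' 0 3", i.e. "0 1 2 3"
		node_cpu_list="$node_cpu_list $(seq -s " " ${cpu_range//-/ })"
	done
	echo $node_cpu_list
}

expand_cpulist "0-3,8-11"
```

Note this inherits the script's assumption that every comma-separated
entry is a range; a lone CPU id such as "5" would be mis-expanded by seq
(seq 5 prints 1 through 5).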
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
(please don't use BCC on the netdev list, replies might miss the list in cc)

Comments inlined below:

On Fri, 25 Aug 2017 10:24:30 +0800, Robert Hoo wrote:

> From: Robert Ho
>
> It's hard to benchmark 40G+ network bandwidth using ordinary
> tools like iperf, netperf. I then tried with the pktgen multiqueue
> sample scripts, but still cannot reach line rate.

The pktgen_sample02_multiqueue.sh does not use burst or skb_cloning.
Thus, the performance will suffer.

See the samples that use the burst feature:
 pktgen_sample03_burst_single_flow.sh
 pktgen_sample05_flow_per_thread.sh

With the pktgen "burst" feature, I can easily generate 40G. Generating
100G is also possible, but often you will hit some HW limits before the
pktgen limit. I experienced hitting both (1) the PCIe Gen3 x8 limit, and
(2) the memory bandwidth limit.

> I then derived this NUMA-aware irq affinity sample script from the
> multi-queue sample one, and successfully benchmarked a 40G link. I think
> this can also be useful for 100G reference, though I haven't got a
> device to test.

Okay, so your issue was really related to NUMA irq affinity. I do feel
that IRQ tuning lives outside the realm of the pktgen scripts, but
looking closer at your script, it doesn't look like you change the
IRQ setting, which is good.

You introduce some helper functions that make it possible to extract
NUMA information in the shell script code, really cool. I would like
to see these functions being integrated into the functions.sh file.

> This script simply does:
> Detect $DEV's NUMA node belonging.
> Bind each thread (processor from that NUMA node) with each $DEV queue's
> irq affinity, 1:1 mapping.
> How many '-t' threads input determines how many queues will be
> utilized.
>
> Tested with Intel XL710 NIC with Cisco 3172 switch.
>
> It would be even slightly better if the irqbalance service is turned
> off outside.

Yes, if you don't turn-off (kill) irqbalance it will move around the
IRQs behind your back...
> References:
> https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
>
> Signed-off-by: Robert Hoo
> ---
>  ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
>  1 file changed, 132 insertions(+)
>  create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
>
> diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> new file mode 100755
> index 000..f0ee25c
> --- /dev/null
> +++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> @@ -0,0 +1,132 @@
> +#!/bin/bash
> +#
> +# Multiqueue: Using pktgen threads for sending on multiple CPUs
> +# * adding devices to kernel threads which are in the same NUMA node
> +# * bound devices queue's irq affinity to the threads, 1:1 mapping
> +# * notice the naming scheme for keeping device names unique
> +# * naming scheme: dev@thread_number
> +# * flow variation via random UDP source port
> +#
> +basedir=`dirname $0`
> +source ${basedir}/functions.sh
> +root_check_run_with_sudo "$@"
> +#
> +# Required param: -i dev in $DEV
> +source ${basedir}/parameters.sh
> +
> +get_iface_node()
> +{
> +	echo `cat /sys/class/net/$1/device/numa_node`

Here you could use the following shell trick to avoid using "cat":

 echo $(< /sys/class/net/$1/device/numa_node)

> +}
> +
> +get_iface_irqs()
> +{
> +	local IFACE=$1
> +	local queues="${IFACE}-.*TxRx"
> +
> +	irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
> +	[ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
> +	[ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
> +		do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
> +	done)

Nice that you handle all these different methods.

I personally look in /proc/irq/*/$IFACE*/../smp_affinity_list , like
(copy-paste):

 echo " --- Align IRQs ---"
 # I've named my NICs ixgbe1 + ixgbe2
 for F in /proc/irq/*/ixgbe*-TxRx-*/../smp_affinity_list; do
	# Extract irqname e.g. "ixgbe2-TxRx-2"
	irqname=$(basename $(dirname $(dirname $F))) ;
	# Substring pattern removal
	hwq_nr=${irqname#*-*-}
	echo $hwq_nr > $F
	#grep . -H $F;
 done
 grep -H . /proc/irq/*/ixgbe*/../smp_affinity_list

Maybe I should switch to use: /sys/class/net/$IFACE/device/msi_irqs/*

> +	[ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"

In the error case you should let the script die. There is a helper
function for this called "err" (where first arg is the exitcode, which
is useful to detect the reason your script failed).

> +	echo $irqs
> +}
> +get_node_cpus()
> +{
> +	local node=$1
> +	local node_cpu_list
> +	local node_cpu_range_list=`cut
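Jesper's "let the script die" point can be sketched with a minimal
stand-in for the err() helper (the real one lives in
samples/pktgen/functions.sh; this version only approximates it):

```shell
#!/bin/bash
# Minimal stand-in for the err() helper from samples/pktgen/functions.sh:
# the first argument is the exit code, the rest is the message.
err()
{
	local exitcode=$1
	shift
	echo "ERROR: $@" >&2
	exit $exitcode
}

# get_iface_irqs() could then fail hard instead of echoing and continuing:
#   [ -z "$irqs" ] && err 3 "Could not find interrupts for $IFACE"
# Demonstrated in a subshell so this sketch itself keeps running:
( irqs=""; [ -z "$irqs" ] && err 3 "Could not find interrupts for eth0" )
echo "subshell exited with code $?"
```

Using a distinct exit code per failure mode lets a caller tell why the
script aborted, instead of silently proceeding with an empty IRQ list.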
[PATCH] pktgen: add a new sample script for 40G and above link testing
From: Robert Ho

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf, netperf. I then tried with the pktgen multiqueue sample
scripts, but still cannot reach line rate.

I then derived this NUMA-aware irq affinity sample script from the
multi-queue sample one, and successfully benchmarked a 40G link. I think
this can also be useful for 100G reference, though I haven't got a device
to test.

This script simply does:
Detect $DEV's NUMA node belonging.
Bind each thread (processor from that NUMA node) with each $DEV queue's
irq affinity, 1:1 mapping.
How many '-t' threads input determines how many queues will be utilized.

Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

References:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo
---
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
 1 file changed, 132 insertions(+)
 create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
new file mode 100755
index 000..f0ee25c
--- /dev/null
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Multiqueue: Using pktgen threads for sending on multiple CPUs
+# * adding devices to kernel threads which are in the same NUMA node
+# * bound devices queue's irq affinity to the threads, 1:1 mapping
+# * notice the naming scheme for keeping device names unique
+# * naming scheme: dev@thread_number
+# * flow variation via random UDP source port
+#
+basedir=`dirname $0`
+source ${basedir}/functions.sh
+root_check_run_with_sudo "$@"
+#
+# Required param: -i dev in $DEV
+source ${basedir}/parameters.sh
+
+get_iface_node()
+{
+	echo `cat /sys/class/net/$1/device/numa_node`
+}
+
+get_iface_irqs()
+{
+	local IFACE=$1
+	local queues="${IFACE}-.*TxRx"
+
+	irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
+	[ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
+		do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
+	done)
+	[ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"
+
+	echo $irqs
+}
+
+get_node_cpus()
+{
+	local node=$1
+	local node_cpu_list
+	local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
+		/sys/devices/system/node/node$node/cpulist`
+
+	for cpu_range in $node_cpu_range_list
+	do
+		node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
+	done
+
+	echo $node_cpu_list
+}
+
+
+# Base Config
+DELAY="0"	# Zero means max speed
+COUNT="2000"	# Zero means indefinitely
+[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
+
+# Flow variation random source port between min and max
+UDP_MIN=9
+UDP_MAX=109
+
+node=`get_iface_node $DEV`
+irq_array=(`get_iface_irqs $DEV`)
+cpu_array=(`get_node_cpus $node`)
+
+[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]} ] && \
+	err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})"
+
+# (example of setting default params in your script)
+if [ -z "$DEST_IP" ]; then
+    [ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
+fi
+[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+
+# General cleanup everything since last run
+pg_ctrl "reset"
+
+# Threads are specified with parameter -t value in $THREADS
+for ((i = 0; i < $THREADS; i++)); do
+    # The device name is extended with @name, using thread number to
+    # make them unique, but any name will do.
+    # Set the queue's irq affinity to this $thread (processor)
+    thread=${cpu_array[$i]}
+    dev=${DEV}@${thread}
+    echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
+    echo "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`"

+    # Add remove all other devices and add_device $dev to thread
+    pg_thread $thread "rem_device_all"
+    pg_thread $thread "add_device" $dev
+
+    # select queue and bind the queue and $dev in 1:1 relationship
+    queue_num=$i
+    echo "queue number is $queue_num"
+    pg_set $dev "queue_map_min $queue_num"
+    pg_set $dev "queue_map_max $queue_num"
+
+    # Notice config queue to map to cpu (mirrors smp_processor_id())
+    # It is beneficial to map IRQ /proc/irq/*/smp_affinity 1:1 to CPU number
+    pg_set $dev "flag QUEUE_MAP_CPU"
+
+    # Base config of dev
+    pg_set $dev "count $COUNT"
+    pg_set $dev
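The 1:1 thread-to-IRQ binding step in the loop above can be tried out
safely against stand-in files rather than /proc/irq. In this sketch the
irq_array and cpu_array values are made up, and a temporary directory
replaces /proc/irq so no root or real hardware is needed:

```shell
#!/bin/bash
# Sketch: the script's 1:1 thread/IRQ affinity binding, pointed at a
# temporary directory instead of /proc/irq.  irq_array and cpu_array
# values are hypothetical stand-ins for real IRQs and NUMA-local CPUs.
irqdir=$(mktemp -d)
irq_array=(120 121 122 123)
cpu_array=(8 9 10 11)
THREADS=4

for ((i = 0; i < THREADS; i++)); do
	thread=${cpu_array[$i]}
	mkdir -p $irqdir/${irq_array[$i]}
	# On a real system this path is /proc/irq/<irq>/smp_affinity_list
	echo $thread > $irqdir/${irq_array[$i]}/smp_affinity_list
done

# Show the resulting mapping, mirroring the script's verification echo
grep -H . $irqdir/*/smp_affinity_list
```

Each queue's IRQ ends up pinned to exactly one CPU of the device's NUMA
node, which is what lets QUEUE_MAP_CPU keep the pktgen thread, the TX
queue, and the IRQ on the same processor.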