Re: [-RT] multiple streams have degraded performance
On Tue, 2007-06-19 at 20:38 -0700, Vernon Mauery wrote:
> On Monday 18 June 2007 10:12:21 pm Vernon Mauery wrote:
> > In looking at the performance characteristics of my network I found that
> > 2.6.21.5-rt15 suffers from degraded throughput with multiple threads. The
> > test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> > netperf at a time and measuring the total throughput. I have two 4-way
> > machines connected with 10GbE cards. I tested several kernels (some older
> > and some newer) and found that the only thing in common was that with -RT
> > kernels the performance went down with concurrent streams.
>
> I just tested this using lo instead of the 10GbE adapter. I found similar
> results. Since this makes it reproducible by just about anybody (maybe the
> only factor now is the number of CPUs), I have attached the script that I
> test things with.
>
> So with the script run like ./stream_test 127.0.0.1 on 2.6.21 and
> 2.6.21.5-rt17 I got the following:
>
> 2.6.21
> ======
> default: 1 streams: Send at 2790.3 Mb/s, Receive at 2790.3 Mb/s
> default: 2 streams: Send at 4129.4 Mb/s, Receive at 4128.7 Mb/s
> default: 4 streams: Send at 7949.6 Mb/s, Receive at 7735.5 Mb/s
> default: 8 streams: Send at 7930.7 Mb/s, Receive at 7910.1 Mb/s
> 1Msock: 1 streams: Send at 2810.7 Mb/s, Receive at 2810.7 Mb/s
> 1Msock: 2 streams: Send at 4093.4 Mb/s, Receive at 4092.6 Mb/s
> 1Msock: 4 streams: Send at 7887.8 Mb/s, Receive at 7880.4 Mb/s
> 1Msock: 8 streams: Send at 8091.7 Mb/s, Receive at 8082.2 Mb/s
>
> 2.6.21.5-rt17
> =============
> default: 1 streams: Send at 938.2 Mb/s, Receive at 938.2 Mb/s
> default: 2 streams: Send at 1476.3 Mb/s, Receive at 1436.9 Mb/s
> default: 4 streams: Send at 1489.8 Mb/s, Receive at 1145.0 Mb/s
> default: 8 streams: Send at 1099.8 Mb/s, Receive at 1079.1 Mb/s
> 1Msock: 1 streams: Send at 921.4 Mb/s, Receive at 920.4 Mb/s
> 1Msock: 2 streams: Send at 1332.2 Mb/s, Receive at 1311.5 Mb/s
> 1Msock: 4 streams: Send at 1483.0 Mb/s, Receive at 1137.8 Mb/s
> 1Msock: 8 streams: Send at 1446.2 Mb/s, Receive at 1135.6 Mb/s

Unfortunately I do not have a 4-way machine, but a dual core was enough to
show the problem:

./stream_test 127.0.0.1
** default: 1 streams: Send at 268.9 Mb/s, Receive at 268.0 Mb/s
** default: 2 streams: Send at 448.7 Mb/s, Receive at 448.4 Mb/s
** default: 4 streams: Send at 438.7 Mb/s, Receive at 438.3 Mb/s
** default: 8 streams: Send at 352.7 Mb/s, Receive at 351.2 Mb/s
** 1Msock: 1 streams: Send at 207.4 Mb/s, Receive at 207.4 Mb/s
** 1Msock: 2 streams: Send at 450.0 Mb/s, Receive at 450.0 Mb/s
** 1Msock: 4 streams: Send at 444.9 Mb/s, Receive at 444.9 Mb/s
** 1Msock: 8 streams: Send at 351.4 Mb/s, Receive at 351.4 Mb/s

and lock_stat shows the following offender:

------------------------------------------------------------------------------
class name        contentions  waittime-min  waittime-max  waittime-total  con-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  acq-bounces
udp_hash_lock-W:            1          1.58          1.58            1.58            1           163         0.100          4.38          184.44          110
udp_hash_lock-R:       539603          1.30       4848.34      7756821.91       539131      15095136          0.36        520.09     20404391.93     11527348
------------------------------------------------------------------------------
udp_hash_lock  539603  []  __udp4_lib_lookup+0x3c/0x10d
udp_hash_lock       1  []  __udp_lib_get_port+0x2c/0x21a
------------------------------------------------------------------------------

Which makes perfect sense, in that r/w locks are degraded to regular
exclusive locks on -rt. So I guess you're wanting to go help out with the
RCU-ification of whatever it is that this lock protects. :-)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
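The contention counts in a lock_stat report like the one above can be pulled
out mechanically. A minimal sketch, assuming lock_stat's usual layout where
the class name is followed by the contention count (the sample line mirrors
the udp_hash_lock-R entry above; this helper is illustrative, not part of any
kernel tooling):

```shell
# Extract the lock class and its contention count from a lock_stat-style
# line. Field 1 is the class name (with a trailing colon), field 2 the
# number of contentions.
line='udp_hash_lock-R: 539603 1.30 4848.34 7756821.91'
echo "$line" | awk '{ sub(":", "", $1); print $1 " contended " $2 " times" }'
```

Run against the report above, this singles out the read side of
udp_hash_lock as the overwhelmingly contended class.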
Re: [-RT] multiple streams have degraded performance
On Monday 18 June 2007 10:12:21 pm Vernon Mauery wrote:
> In looking at the performance characteristics of my network I found that
> 2.6.21.5-rt15 suffers from degraded throughput with multiple threads. The
> test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> netperf at a time and measuring the total throughput. I have two 4-way
> machines connected with 10GbE cards. I tested several kernels (some older
> and some newer) and found that the only thing in common was that with -RT
> kernels the performance went down with concurrent streams.

I just tested this using lo instead of the 10GbE adapter. I found similar
results. Since this makes it reproducible by just about anybody (maybe the
only factor now is the number of CPUs), I have attached the script that I
test things with.

So with the script run like ./stream_test 127.0.0.1 on 2.6.21 and
2.6.21.5-rt17 I got the following:

2.6.21
======
default: 1 streams: Send at 2790.3 Mb/s, Receive at 2790.3 Mb/s
default: 2 streams: Send at 4129.4 Mb/s, Receive at 4128.7 Mb/s
default: 4 streams: Send at 7949.6 Mb/s, Receive at 7735.5 Mb/s
default: 8 streams: Send at 7930.7 Mb/s, Receive at 7910.1 Mb/s
1Msock: 1 streams: Send at 2810.7 Mb/s, Receive at 2810.7 Mb/s
1Msock: 2 streams: Send at 4093.4 Mb/s, Receive at 4092.6 Mb/s
1Msock: 4 streams: Send at 7887.8 Mb/s, Receive at 7880.4 Mb/s
1Msock: 8 streams: Send at 8091.7 Mb/s, Receive at 8082.2 Mb/s

2.6.21.5-rt17
=============
default: 1 streams: Send at 938.2 Mb/s, Receive at 938.2 Mb/s
default: 2 streams: Send at 1476.3 Mb/s, Receive at 1436.9 Mb/s
default: 4 streams: Send at 1489.8 Mb/s, Receive at 1145.0 Mb/s
default: 8 streams: Send at 1099.8 Mb/s, Receive at 1079.1 Mb/s
1Msock: 1 streams: Send at 921.4 Mb/s, Receive at 920.4 Mb/s
1Msock: 2 streams: Send at 1332.2 Mb/s, Receive at 1311.5 Mb/s
1Msock: 4 streams: Send at 1483.0 Mb/s, Receive at 1137.8 Mb/s
1Msock: 8 streams: Send at 1446.2 Mb/s, Receive at 1135.6 Mb/s

--Vernon

> While the test was showing the numbers for receiving as well as sending,
> the receiving numbers are not reliable because that machine was running a
> -RT kernel for these tests.
>
> I was just wondering if anyone had seen this problem before or would have
> any idea on where to start hunting for the solution.
>
> --Vernon
>
> The key for this is 'default' was invoked like:
>   netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472
> and '1Msock' was invoked like:
>   netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472 -s 1M -S 1M
>
> 2.6.21
> ======
> default: 1 streams: Send at 2844.2 Mb/s, Receive at 2840.1 Mb/s
> default: 2 streams: Send at 3927.9 Mb/s, Receive at 3603.9 Mb/s
> default: 4 streams: Send at 4197.4 Mb/s, Receive at 3776.3 Mb/s
> default: 8 streams: Send at 4223.9 Mb/s, Receive at 3848.9 Mb/s
> 1Msock: 1 streams: Send at 4232.3 Mb/s, Receive at 3914.4 Mb/s
> 1Msock: 2 streams: Send at 5428.8 Mb/s, Receive at 3853.2 Mb/s
> 1Msock: 4 streams: Send at 6202.1 Mb/s, Receive at 3774.8 Mb/s
> 1Msock: 8 streams: Send at 6225.1 Mb/s, Receive at 3754.7 Mb/s
>
> 2.6.21.5-rt15
> =============
> default: 1 streams: Send at 3091.6 Mb/s, Receive at 3048.1 Mb/s
> default: 2 streams: Send at 3768.8 Mb/s, Receive at 3714.2 Mb/s
> default: 4 streams: Send at 1873.6 Mb/s, Receive at 1825.9 Mb/s
> default: 8 streams: Send at 1806.5 Mb/s, Receive at 1792.7 Mb/s
> 1Msock: 1 streams: Send at 3680.4 Mb/s, Receive at 3255.6 Mb/s
> 1Msock: 2 streams: Send at 4129.8 Mb/s, Receive at 3991.5 Mb/s
> 1Msock: 4 streams: Send at 1862.1 Mb/s, Receive at 1787.1 Mb/s
> 1Msock: 8 streams: Send at 1790.2 Mb/s, Receive at 1556.8 Mb/s

#!/bin/sh
# File: stream_test
# Description: test network throughput with a varying number of streams
# Author: Vernon Mauery <[EMAIL PROTECTED]>
# Copyright: IBM Corporation (C) 2007

instances="1 2 4 8"
msg_size=
time=60
TEST=UDP_STREAM
sock_size=

function usage() {
    echo "usage: $0 [-t|-u] [-c N] [-m N] [-s N] [-T N] [-x ...]"
    echo "    -t, -u    run TCP or UDP tests respectively (default: UDP)"
    echo "    -c N      run N concurrent instances (default \"1 2 4 8\")"
    echo "    -m N      set message size to N bytes (default 1472)"
    echo "    -s N      set socket buffer size (default 1M)"
    echo "    -T N      run test for N seconds (default 60)"
    echo "    -x '...'  pass extra flags to netperf (included after -- )"
    echo "    -h        display this message"
}

while [ $# -gt 0 ]; do
    case $1 in
        -t) TEST=TCP_STREAM ;;
        -u) TEST=UDP_STREAM ;;
        -c) shift; instances=$1 ;;
        -m) shift; msg_size=$1 ;;
        -s) shift; sock_size=$1 ;;
        -T) shift
Re: [-RT] multiple streams have degraded performance
On Tuesday 19 June 2007 8:38:50 am Peter Zijlstra wrote:
> On Tue, 2007-06-19 at 07:25 -0700, Vernon Mauery wrote:
> > On Monday 18 June 2007 11:51:38 pm Peter Zijlstra wrote:
> > > On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> > > > In looking at the performance characteristics of my network I found
> > > > that 2.6.21.5-rt15 suffers from degraded throughput with multiple
> > > > threads. The test that I did this with is simply invoking 1, 2, 4,
> > > > and 8 instances of netperf at a time and measuring the total
> > > > throughput. I have two 4-way machines connected with 10GbE cards. I
> > > > tested several kernels (some older and some newer) and found that the
> > > > only thing in common was that with -RT kernels the performance went
> > > > down with concurrent streams.
> > > >
> > > > While the test was showing the numbers for receiving as well as
> > > > sending, the receiving numbers are not reliable because that machine
> > > > was running a -RT kernel for these tests.
> > > >
> > > > I was just wondering if anyone had seen this problem before or would
> > > > have any idea on where to start hunting for the solution.
> > >
> > > could you enable CONFIG_LOCK_STAT
> > >
> > >   echo 0 > /proc/lock_stat
> > >
> > > and report the output of (preferably not 80 column wrapped):
> > >
> > >   grep : /proc/lock_stat | head
> >
> > /proc/lock_stat stayed empty for the duration of the test. I am guessing
> > this means there was no lock contention.
>
> Most likely caused by the issue below.
>
> > I do see this on the console:
> > BUG: scheduling with irqs disabled: IRQ-8414/0x/9494
> > caller is wait_for_completion+0x85/0xc4
> >
> > Call Trace:
> > [] dump_trace+0xaa/0x32a
> > [] show_trace+0x41/0x64
> > [] dump_stack+0x15/0x17
> > [] schedule+0x82/0x102
> > [] wait_for_completion+0x85/0xc4
> > [] set_cpus_allowed+0xa1/0xc8
> > [] do_softirq_from_hardirq+0x105/0x12d
> > [] do_irqd+0x2a8/0x32f
> > [] kthread+0xf5/0x128
> > [] child_rip+0xa/0x12
> >
> > INFO: lockdep is turned off.
> >
> > I haven't seen this until I enabled lock_stat and ran the test.
>
> I think this is what causes the lack of output. It disables all lock
> tracking features...
>
> > > or otherwise if there are any highly contended network locks listed?
> >
> > Any other ideas for debugging this?
>
> fixing the above bug would help :-)
>
> Ingo says that should be fixed in -rt17, so if you could give that a
> spin...

I just tested with -rt17 and the BUG message is gone, but I still don't see
any entries in. I don't see any other suspicious message, so I think it is
really strange that it is not listing any locks at all.

--Vernon
Re: [-RT] multiple streams have degraded performance
On Tue, 2007-06-19 at 07:25 -0700, Vernon Mauery wrote:
> On Monday 18 June 2007 11:51:38 pm Peter Zijlstra wrote:
> > On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> > > In looking at the performance characteristics of my network I found that
> > > 2.6.21.5-rt15 suffers from degraded throughput with multiple threads.
> > > The test that I did this with is simply invoking 1, 2, 4, and 8
> > > instances of netperf at a time and measuring the total throughput. I
> > > have two 4-way machines connected with 10GbE cards. I tested several
> > > kernels (some older and some newer) and found that the only thing in
> > > common was that with -RT kernels the performance went down with
> > > concurrent streams.
> > >
> > > While the test was showing the numbers for receiving as well as sending,
> > > the receiving numbers are not reliable because that machine was running
> > > a -RT kernel for these tests.
> > >
> > > I was just wondering if anyone had seen this problem before or would
> > > have any idea on where to start hunting for the solution.
> >
> > could you enable CONFIG_LOCK_STAT
> >
> >   echo 0 > /proc/lock_stat
> >
> > and report the output of (preferably not 80 column wrapped):
> >
> >   grep : /proc/lock_stat | head
>
> /proc/lock_stat stayed empty for the duration of the test. I am guessing
> this means there was no lock contention.

Most likely caused by the issue below.

> I do see this on the console:
> BUG: scheduling with irqs disabled: IRQ-8414/0x/9494
> caller is wait_for_completion+0x85/0xc4
>
> Call Trace:
> [] dump_trace+0xaa/0x32a
> [] show_trace+0x41/0x64
> [] dump_stack+0x15/0x17
> [] schedule+0x82/0x102
> [] wait_for_completion+0x85/0xc4
> [] set_cpus_allowed+0xa1/0xc8
> [] do_softirq_from_hardirq+0x105/0x12d
> [] do_irqd+0x2a8/0x32f
> [] kthread+0xf5/0x128
> [] child_rip+0xa/0x12
>
> INFO: lockdep is turned off.
>
> I haven't seen this until I enabled lock_stat and ran the test.

I think this is what causes the lack of output. It disables all lock
tracking features...

> > or otherwise if there are any highly contended network locks listed?
>
> Any other ideas for debugging this?

fixing the above bug would help :-)

Ingo says that should be fixed in -rt17, so if you could give that a
spin...
Re: [-RT] multiple streams have degraded performance
On Monday 18 June 2007 11:51:38 pm Peter Zijlstra wrote:
> On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> > In looking at the performance characteristics of my network I found that
> > 2.6.21.5-rt15 suffers from degraded throughput with multiple threads. The
> > test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> > netperf at a time and measuring the total throughput. I have two 4-way
> > machines connected with 10GbE cards. I tested several kernels (some
> > older and some newer) and found that the only thing in common was that
> > with -RT kernels the performance went down with concurrent streams.
> >
> > While the test was showing the numbers for receiving as well as sending,
> > the receiving numbers are not reliable because that machine was running a
> > -RT kernel for these tests.
> >
> > I was just wondering if anyone had seen this problem before or would have
> > any idea on where to start hunting for the solution.
>
> could you enable CONFIG_LOCK_STAT
>
>   echo 0 > /proc/lock_stat
>
> and report the output of (preferably not 80 column wrapped):
>
>   grep : /proc/lock_stat | head

/proc/lock_stat stayed empty for the duration of the test. I am guessing
this means there was no lock contention.

I do see this on the console:

BUG: scheduling with irqs disabled: IRQ-8414/0x/9494
caller is wait_for_completion+0x85/0xc4

Call Trace:
 [] dump_trace+0xaa/0x32a
 [] show_trace+0x41/0x64
 [] dump_stack+0x15/0x17
 [] schedule+0x82/0x102
 [] wait_for_completion+0x85/0xc4
 [] set_cpus_allowed+0xa1/0xc8
 [] do_softirq_from_hardirq+0x105/0x12d
 [] do_irqd+0x2a8/0x32f
 [] kthread+0xf5/0x128
 [] child_rip+0xa/0x12

INFO: lockdep is turned off.

I haven't seen this until I enabled lock_stat and ran the test.

> or otherwise if there are any highly contended network locks listed?

Any other ideas for debugging this?

--Vernon
Re: [-RT] multiple streams have degraded performance
On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> In looking at the performance characteristics of my network I found that
> 2.6.21.5-rt15 suffers from degraded throughput with multiple threads. The
> test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> netperf at a time and measuring the total throughput. I have two 4-way
> machines connected with 10GbE cards. I tested several kernels (some older
> and some newer) and found that the only thing in common was that with -RT
> kernels the performance went down with concurrent streams.
>
> While the test was showing the numbers for receiving as well as sending,
> the receiving numbers are not reliable because that machine was running a
> -RT kernel for these tests.
>
> I was just wondering if anyone had seen this problem before or would have
> any idea on where to start hunting for the solution.

could you enable CONFIG_LOCK_STAT

  echo 0 > /proc/lock_stat

and report the output of (preferably not 80 column wrapped):

  grep : /proc/lock_stat | head

or otherwise if there are any highly contended network locks listed?

 - Peter
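The measurement procedure above can be wrapped into a small script: reset
the counters, run the workload, then list the most contended lock classes.
A sketch assuming the standard /proc/lock_stat interface; the guard handles
kernels built without CONFIG_LOCK_STAT, and ./stream_test stands in for
whatever workload is being profiled:

```shell
# Profile lock contention around a workload run via /proc/lock_stat.
lockstat=/proc/lock_stat
if [ -w "$lockstat" ]; then
    echo 0 > "$lockstat"           # reset counters before the run
    ./stream_test 127.0.0.1        # the workload under test
    grep : "$lockstat" | head      # top contended lock classes
else
    echo "lock_stat unavailable (CONFIG_LOCK_STAT not set)"
fi
```

On a kernel without lock statistics this simply reports that lock_stat is
unavailable rather than failing partway through.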
[-RT] multiple streams have degraded performance
In looking at the performance characteristics of my network I found that
2.6.21.5-rt15 suffers from degraded throughput with multiple threads. The
test that I did this with is simply invoking 1, 2, 4, and 8 instances of
netperf at a time and measuring the total throughput. I have two 4-way
machines connected with 10GbE cards. I tested several kernels (some older
and some newer) and found that the only thing in common was that with -RT
kernels the performance went down with concurrent streams.

While the test was showing the numbers for receiving as well as sending, the
receiving numbers are not reliable because that machine was running a -RT
kernel for these tests.

I was just wondering if anyone had seen this problem before or would have any
idea on where to start hunting for the solution.

--Vernon

The key for this is 'default' was invoked like:
  netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472
and '1Msock' was invoked like:
  netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472 -s 1M -S 1M

2.6.21
======
default: 1 streams: Send at 2844.2 Mb/s, Receive at 2840.1 Mb/s
default: 2 streams: Send at 3927.9 Mb/s, Receive at 3603.9 Mb/s
default: 4 streams: Send at 4197.4 Mb/s, Receive at 3776.3 Mb/s
default: 8 streams: Send at 4223.9 Mb/s, Receive at 3848.9 Mb/s
1Msock: 1 streams: Send at 4232.3 Mb/s, Receive at 3914.4 Mb/s
1Msock: 2 streams: Send at 5428.8 Mb/s, Receive at 3853.2 Mb/s
1Msock: 4 streams: Send at 6202.1 Mb/s, Receive at 3774.8 Mb/s
1Msock: 8 streams: Send at 6225.1 Mb/s, Receive at 3754.7 Mb/s

2.6.21.5-rt15
=============
default: 1 streams: Send at 3091.6 Mb/s, Receive at 3048.1 Mb/s
default: 2 streams: Send at 3768.8 Mb/s, Receive at 3714.2 Mb/s
default: 4 streams: Send at 1873.6 Mb/s, Receive at 1825.9 Mb/s
default: 8 streams: Send at 1806.5 Mb/s, Receive at 1792.7 Mb/s
1Msock: 1 streams: Send at 3680.4 Mb/s, Receive at 3255.6 Mb/s
1Msock: 2 streams: Send at 4129.8 Mb/s, Receive at 3991.5 Mb/s
1Msock: 4 streams: Send at 1862.1 Mb/s, Receive at 1787.1 Mb/s
1Msock: 8 streams: Send at 1790.2 Mb/s, Receive at 1556.8 Mb/s
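The per-stream runs described above come down to launching N netperf
instances in parallel and waiting for them all to finish. A minimal sketch
under those assumptions (host address, stream count, and netperf arguments
are copied from the 'default' example above; the guard skips the run when
netperf is not installed):

```shell
# Launch N concurrent netperf UDP streams against one host and wait for
# all of them; per-instance throughput is then summed from their output.
host=10.2.2.4
streams=4
if command -v netperf >/dev/null 2>&1; then
    for i in $(seq "$streams"); do
        netperf -c -C -l 60 -H "$host" -t UDP_STREAM -- -m 1472 -M 1472 &
    done
    wait    # block until every instance has reported its result
else
    echo "netperf not installed; skipping run"
fi
```

stream_test presumably iterates this over the "1 2 4 8" instance counts and
over the default and 1M-socket-buffer variants.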