Hi Dave,

I played around with this a bit more. I switched to UDP flooding my cable modem through the router, and this certainly lets me create enough load on the wndr3700v2 to cause the allocation errors and, as a "bonus", to drive the router to reboot (triggered by the watchdog timer?). Here is the script I used over 5GHz wireless (from http://blog.ioshints.info/2008/03/udp-flood-in-perl.html):

#!/usr/bin/perl
##############
# udp flood.
##############

use Socket;
use strict;

# Expect exactly four arguments; otherwise print usage and exit.
if ($#ARGV != 3) {
  print "flood.pl <ip> <port> <size> <time>\n\n";
  print " port=0: use random ports\n";
  print " size=0: use random size between 64 and 1024\n";
  print " time=0: continuous flood\n";
  exit(1);
}

my ($ip,$port,$size,$time) = @ARGV;
my ($iaddr,$endtime,$psize,$pport);

$iaddr = inet_aton("$ip") or die "Cannot resolve hostname $ip\n";
# time=0 means "continuous": run for a very long time instead.
$endtime = time() + ($time ? $time : 1000000);

# Protocol number 17 is UDP.
socket(flood, PF_INET, SOCK_DGRAM, 17);

print "Flooding $ip " . ($port ? $port : "random") . " port with " .
  ($size ? "$size-byte" : "random size") . " packets" .
  ($time ? " for $time seconds" : "") . "\n";
print "Break with Ctrl-C\n" unless $time;

# Send datagrams back to back until the end time is reached.
for (;time() <= $endtime;) {
  $psize = $size ? $size : int(rand(1024-64)+64) ;
  $pport = $port ? $port : int(rand(65500))+1;

  send(flood, pack("a$psize","flood"), 0, pack_sockaddr_in($pport, $iaddr));
}

I called it as either

  udp_flood.pl 192.168.100.1 0 1024 240

or

  udp_flood.pl 192.168.100.1 32000 1024 240

The first version, with randomized port numbers, spreads the load nicely over many fq_codel bins/flows and seems slightly more likely to cause allocation errors and reboots than the second invocation, which restricts itself to port 32000 and presumably just one flow.

I wonder how to make cerowrt survive this kind of stress test…

best
Sebastian

On Aug 15, 2012, at 9:08 PM, Dave Taht wrote:

> re: ath: skbuff alloc of size 1926 failed
>
> as for the ath skbuff problem, I've seen that a lot. I had put hard
> packet limits (~600) on fq_codel in -11 and prior that were too low
> and it mostly went away, but I hit tail drop behavior everywhere,
> instead of codel behavior. What I have now (typically 1200) may well
> be too high, but not as overly high as the default (10k packets).
> There may be another means of increasing the size of that slab pool or
> making it less onerous.
>
> I would like it if codel "kicked in" earlier than it currently does.
> The code in ns2 is currently using half the period that the linux code
> is. This would control things better, or so I hope (planning on trying
> this as I get time)
>
> I am also considering means of artificially upscaling the drop
> scheduler when we get close to queue limits.
>
> See some discussions on the codel list for these issues. (sims are
> easier to deal with than cerowrt, too!)
>
> as for bind, it should be automagically restarted from xinetd, no need
> to fiddle with anything. However, since you are already under massive
> memory pressure, it may well fail to start up that way, too. At the
> moment, I've largely given up on bind on anything but a more core home
> gw, and am running dnsmasq on everything (3700v2, picostations,
> nanostations) but the 3800s. (and the ones I run it on, aren't being
> used for wifi right now).
>
> Lastly: Swap space won't help you on exhausting kernel limits.
>
> I'm glad you can reproduce the ath: slab problem - I can get it too at
> high rates using netperf over wifi. I will try a 3700v2 with and
> without bind to see if it's still there in 3.3.8-17. In the meantime
> if anyone knows how to get more allocations in that (2048? 4096?) slab
> by default, perhaps that will help?
>
>
>
> On Wed, Aug 15, 2012 at 10:23 AM, Sebastian Moeller <moell...@gmx.de> wrote:
>> Hi Dave,
>>
>> great work, as always I upgraded my production router to the latest and
>> greatest (since I only have one router…). And it works quite well for normal
>> usage…
>> Netalyzr reports around 2800ms seconds of uplink buffering, yet saturating
>> the uplink does not affect ping times to a remote target noticeably,
>> basically the same as for all codellized ceo versions I tested so far...
>>
>> Some notes and a question:
>> I noticed that even given plenty of swap space (1GB on a usb stick), using
>> http://broadband.mpi-sws.org/residential/ to exercise UDP stress (on the
>> uplink I assume) I can easily produce (I run the test from a macosx via 5GHz
>> wireless over 1.5 yards):
>> Aug 15 01:16:29 nacktmulle kern.err kernel: [175395.132812] ath: skbuff
>> alloc of size 1926 failed
>> (and plenty of those…).
>> What then happens is that the OOM killer will aim for bind (reasonable since
>> it is the largest single process) and kill it. When I try to restart bind by:
>> root@nacktmulle:~# /etc/rc.d/S47namedprep start
>> root@nacktmulle:~# /etc/rc.d/S48named restart
>> Stopping isc-bind
>> /etc/chroot/named//var/run/named/named.pid not found, trying brute force
>> killall: named: no process killed
>> Kicking isc-bind in xinetd
>> rndc: connect failed: 127.0.0.1#953: connection refused
>> And bind does not start again and the router becomes less than useful. Now I
>> assume I am doing something wrong, but what, if you have any idea how to
>> solve this short of a reboot of the router (my current method) I would be
>> happy to learn
>>
>>
>>
>> best regards
>> sebastian
>>
>> On Aug 12, 2012, at 11:08 PM, Dave Taht wrote:
>>
>>> I'm too tired to write up a full set of release notes, but I've been
>>> testing it all day,
>>> and it looks better than -10 and certainly better than -11, but I won't know
>>> until some more folk sit down and test it, so here it is.
>>>
>>> http://huchra.bufferbloat.net/~cero1/3.3/3.3.8-17/
>>>
>>> fresh merge with openwrt, fix to a bind CVE, fixes for 6in4 and quagga
>>> routing problems,
>>> and a few tweaks to fq_codel setup that might make voip better.
>>>
>>> Go forth and break things!
>>>
>>> In other news:
>>>
>>> Van Jacobson gave a great talk about bufferbloat, BQL, codel, and fq_codel
>>> at last week's ietf meeting. Well worth watching. At the end he outlines
>>> the deployment problems in particular.
>>>
>>> http://recordings.conf.meetecho.com/Recordings/watch.jsp?recording=IETF84_TSVAREA&chapter=part_3
>>>
>>> Far more interesting than this email!
>>>
>>>
>>> --
>>> Dave Täht
>>> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
>>> with fq_codel!"
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>
>
>
> --
> Dave Täht
> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
> with fq_codel!"
_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel
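
A note on the remark at the top of this thread that randomized destination ports spread the load over many fq_codel flows while a fixed port presumably lands in a single one: the rough sketch below illustrates why that is plausible. It is not the kernel's implementation. It assumes fq_codel's default of 1024 flow queues, uses crc32 as a stand-in for the kernel's flow hash, and the source address and source port are made-up values (the flood script sends from one socket, so the source port stays fixed).

```python
import random
import zlib

FLOWS = 1024  # fq_codel's default number of flow queues


def flow_bucket(src_ip, dst_ip, proto, sport, dport, flows=FLOWS):
    """Map a 5-tuple to a queue index. crc32 is only a stand-in hash."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % flows


def buckets_hit(dports, src_ip="192.168.1.2", dst_ip="192.168.100.1", sport=40000):
    """Return the set of queue indices used by UDP packets to the given ports.

    src_ip and sport are hypothetical; protocol 17 is UDP.
    """
    return {flow_bucket(src_ip, dst_ip, 17, sport, p) for p in dports}


rng = random.Random(0)  # seeded so repeated runs give the same picture
# Same port range as the flood script: int(rand(65500)) + 1
random_ports = [rng.randrange(65500) + 1 for _ in range(10000)]
fixed_port = [32000] * 10000

print("random ports:", len(buckets_hit(random_ports)), "queues used")
print("fixed port:  ", len(buckets_hit(fixed_port)), "queue(s) used")
```

With a fixed port every packet shares one 5-tuple and therefore one queue, so only that queue's codel instance sees the backlog. With random ports the same packet rate is split over many queues at once, which may be part of why the first invocation is harder on the router.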