Re: [Cerowrt-devel] ADSL Issue (PPPoE)
Hi William, On Jul 30, 2012, at 11:34 AM, William Katsak wrote: Hello, I am playing with a CeroWRT (3.3.8-6) router on my vacation in Russia and am seeing some weird behavior with simple_qos.sh that I am unsure if I should attribute to a bug, or to an Internet connection that is just that bad. Background: - The router is on my wife's parents' ADSL line (according to the modem, ~3000/500). The modem is a D-Link DSL-2500U. - Even though the link is 3000/500, and I can get speedtest.net to report 2.5mbps/0.42mbps on a clean connection (direct or Cero with no QOS on), as soon as I use a host that is outside of Rostelecom (local service), it drops to 0.9/0.4mbps. This is consistent with Netalyzr test: Upload 430 Kbit/sec, Download 970 Kbit/sec. This suggests that even though the DSL link is higher bitrate, the ISP doesn't have the outgoing bandwidth or is rate-limiting it somehow. - I don't necessarily intend to leave the router running Cero here, but I want to get a handle on the latency situation, as it makes Skype pretty messy...I am hoping to roll what I learn into a more stable build of OpenWRT. I have tried several different configurations of the modem including: 1) Default: Modem does PPPoE and hands out 192.168.1.xxx addresses. I tried just letting Cero route through that address. 2) PPP-IP extension: This has the effect of the modem handling the PPPoE connection and handing out the single real IP address over DHCP. In this case Cero would see the Internet IP on ge00. 3) Bridging: Allow Cero to establish the PPPoE connection and manage it. Right now I am in PPP-IP extension mode on the modem, and GUI QOS on the router. This seems to be reliable and also keeps the latency down, although I would imagine that PPPoE on the router and the GUI QOS would be fine too, but obviously I would rather use simple_qos. 
The problem: When I try simple_qos.sh, I see this:

insmod: can't insert 'cls_fw': File exists
insmod: can't insert 'sch_htb': File exists
RTNETLINK answers: No such file or directory
RTNETLINK answers: No such file or directory

If I run it again, the RTNETLINK errors go away...I assume this is just an annoyance. This gives me super stable ping times, etc. but a lot of websites hang loading, and the connection is unusable. If I reboot the router, the connection works fine again, although the high latency comes back. So, with all that out there, I have some questions with simple_qos:

1) If I am using PPPoE on the router, do I need to do IFACE=pppoe-ge00 or still just ge00?

2) Should I set PPOE to yes?

Since your DSL connection is running PPPoE you should set PPOE to yes in any case IF your DSL connection uses ATM as link layer (most probable). This will just make sure that the shaper calculates the right packet sizes to account against your link rates. But check against http://ace-host.stuart.id.au/russell/files/tc/tc-atm/ to figure out the proper value for overhead, as that depends on the specifics of your DSL connection. I found that http://www.linuxhowtos.org/manpages/8/tc-stab.htm also is quite interesting to read to better understand the overhead parameter. But unfortunately simple_qos does not (yet) use the generic tc-stab method but the ATM link layer adjustments specific to HTB. (Since I am on cable right now I have no way of testing whether the tc-stab method also works with HTB.) Especially for small packets (like VoIP), if you do not account for the fact that ATM always sends out an integer number of 48-byte cells and will pad if necessary, you will cause severe queueing way below the nominal link rate, as the shaper accounts for neither a) the padding nor b) the 5-byte ATM overhead per ATM cell (at least I think that is the case).
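To make the cell arithmetic concrete, here is a minimal Python sketch of how many bytes a packet occupies on an ATM link once padding and the 5-byte cell header are counted. The function names and the 40-byte overhead used in the example are illustrative assumptions, not values taken from simple_qos.sh; the real overhead depends on the encapsulation of the line.

```python
ATM_CELL_PAYLOAD = 48  # payload bytes carried by one ATM cell
ATM_CELL_SIZE = 53     # payload plus the 5-byte cell header


def atm_wire_bytes(packet_bytes: int, overhead_bytes: int = 0) -> int:
    """Bytes sent on the ATM link for one packet (hypothetical helper).

    The packet plus its encapsulation overhead is split into 48-byte
    payloads; the last cell is padded, and every cell costs 53 bytes.
    """
    cells = (packet_bytes + overhead_bytes + ATM_CELL_PAYLOAD - 1) // ATM_CELL_PAYLOAD
    return cells * ATM_CELL_SIZE


def atm_expansion(packet_bytes: int, overhead_bytes: int = 0) -> float:
    """Ratio of bytes on the wire to bytes the shaper sees without ATM accounting."""
    return atm_wire_bytes(packet_bytes, overhead_bytes) / packet_bytes


if __name__ == "__main__":
    # Assumed 40 bytes of per-packet overhead, purely for illustration.
    for size in (100, 1500):
        print(size, atm_wire_bytes(size, 40), round(atm_expansion(size, 40), 3))
```

Under that assumed overhead a 100-byte packet costs 159 bytes on the wire while a 1500-byte packet costs 1749, so a shaper that ignores ATM underestimates small VoIP-sized packets far more than bulk transfers.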
You should do this in any case so that shaping actually has a chance to work reliably and repeatably, independent of the size distribution of your shaped packets.

3) Is it possible that no matter what I do, the buffers at the speed drop between Rostelecom and their bandwidth provider are hurting me somehow?

If I understand correctly, yes, this is going to hurt you, so if your intended VoIP traffic leaves the Rostelecom net you might need to specify 974/430 for the shaped rates instead of 3000/512. But that sounds like something that is easy to test. Since your achievable uplink (430) is still quite close to your link rate (500), I would still recommend getting link layer and overhead specified correctly in simple_qos.

4) If 3, what to do other than yell at them?

As an emergency stop-gap measure, shape your rates to what the network path you are most interested in can deliver? That said, isn't that what codel is supposed to do automatically???

Overall, is anyone using Cero with a PPPoE connection with good results? What kind of configuration do you have?

No, but I used cerowrt with a bridged ATM-based DSL
Re: [Cerowrt-devel] cerowrt 3.3.8-17 is released
Hi Dave,

so I went to play around with this a bit more. I turned to UDP flooding my cable modem through the router, and this surely allows me to create enough load on the wndr3700v2 to cause the allocation errors and, as a bonus, also to drive the router to reboot (driven by the watchdog timer?). Here is the script I used over 5 GHz wireless (from http://blog.ioshints.info/2008/03/udp-flood-in-perl.html):

    #!/usr/bin/perl
    ##
    # udp flood.
    ##
    use Socket;
    use strict;

    # Expect exactly four arguments: ip port size time
    if ($#ARGV != 3) {
      print "flood.pl ip port size time\n\n";
      print " port=0: use random ports\n";
      print " size=0: use random size between 64 and 1024\n";
      print " time=0: continuous flood\n";
      exit(1);
    }

    my ($ip,$port,$size,$time) = @ARGV;
    my ($iaddr,$endtime,$psize,$pport);

    $iaddr = inet_aton($ip) or die "Cannot resolve hostname $ip\n";
    $endtime = time() + ($time ? $time : 100);

    # Protocol 17 is UDP
    socket(flood, PF_INET, SOCK_DGRAM, 17);

    print "Flooding $ip " . ($port ? $port : "random") . " port with " .
      ($size ? "$size-byte" : "random size") . " packets" .
      ($time ? " for $time seconds" : "") . "\n";
    print "Break with Ctrl-C\n" unless $time;

    # Send until the end time; pick random size/port where 0 was given
    for (;time() <= $endtime;) {
      $psize = $size ? $size : int(rand(1024-64)+64) ;
      $pport = $port ? $port : int(rand(65500))+1;
      send(flood, pack("a$psize","flood"), 0, pack_sockaddr_in($pport, $iaddr));}

called as either

    udp_flood.pl 192.168.100.1 0 1024 240

or

    udp_flood.pl 192.168.100.1 32000 1024 240

The first version, with randomized port numbers, spreads the load nicely over many fq_codel bins/flows and seems slightly more likely to cause allocation errors and reboots than the second invocation, which restricts itself to port 32000 and presumably just one flow. I wonder how to make cerowrt survive this kind of stress test…

best Sebastian

On Aug 15, 2012, at 9:08 PM, Dave Taht wrote:

re: ath: skbuff alloc of size 1926 failed

as for the ath skbuff problem, I've seen that a lot.
I had put hard packet limits (~600) on fq_codel in -11 and prior that were too low and it mostly went away, but I hit tail drop behavior everywhere, instead of codel behavior. What I have now (typically 1200) may well be too high, but not as overly high as the default (10k packets). There may be another means of increasing the size of that slab pool or making it less onerous. I would like it if codel kicked in earlier than it currently does. The code in ns2 is currently using half the period that the linux code is. This would control things better, or so I hope (planning on trying this as I get time) I am also considering means of artificially upscaling the drop scheduler when we get close to queue limits. See some discussions on the codel list for these issues. (sims are easier to deal with than cerowrt, too!) as for bind, it should be automagically restarted from xinetd, no need to fiddle with anything. However, since you are already under massive memory pressure, it may well fail to start up that way, too. At the moment, I've largely given up on bind on anything but a more core home gw, and am running dnsmasq on everything (3700v2, picostations, nanostations) but the 3800s. (and the ones I run it on, aren't being used for wifi right now). Lastly: Swap space won't help you on exhausting kernel limits. I'm glad you can reproduce the ath: slab problem - I can get it too at high rates using netperf over wifi. I will try a 3700v2 with and without bind to see if it's still there in 3.3.8-17. In the meantime if anyone knows how to get more allocations in that (2048? 4096?) slab by default, perhaps that will help? On Wed, Aug 15, 2012 at 10:23 AM, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, great work, as always I upgraded my production router to the latest and greatest (since I only have one router…). 
And it works quite well for normal usage… Netalyzr reports around 2800 ms of uplink buffering, yet saturating the uplink does not affect ping times to a remote target noticeably, basically the same as for all codellized cero versions I tested so far...

Some notes and a question: I noticed that even given plenty of swap space (1GB on a usb stick), using http://broadband.mpi-sws.org/residential/ to exercise UDP stress (on the uplink I assume) I can easily produce the following (I run the test from a macosx machine via 5GHz wireless over 1.5 yards):

Aug 15 01:16:29 nacktmulle kern.err kernel: [175395.132812] ath: skbuff alloc of size 1926 failed

(and plenty of those…). What then happens is that the OOM killer will aim for bind (reasonable since it is the largest single process) and kill it. When I try to restart bind by:

root@nacktmulle:~# /etc/rc.d/S47namedprep start
root@nacktmulle:~# /etc/rc.d/S48named restart
Stopping isc-bind
/etc/chroot/named//var/run/named/named.pid not found, trying brute force
killall: named: no process killed
Kicking isc-bind in xinetd
rndc: connect failed: 127.0.0.1#953: connection refused

And bind does
Re: [Cerowrt-devel] the agile thread, post-sugarland thoughts, etc
Hi Dave, sugarland really took stability under load to a new level. My typical UDP flooding experiments failed to take the router down even though I opened the flood gates for a full hour (against qos; will repeat against simple-qos once time permits); not even a single report in dmesg on the router. Nice work. Thanks for all the hard work. (If time allows I will try to run a few more stability tests and will report noteworthy results back, if any should show up) best sebastian On Sep 19, 2012, at 09:49 , Dave Taht wrote: I am enjoying the thread on agile over here: http://esr.ibiblio.org/?p=4564 Trying to formalize some stuff that I do instinctively into language more folk grok would be good. One of the better links to come from it was this one: http://blogs.valvesoftware.com/abrash/valve-how-i-got-here-what-its-like-and-what-im-doing-2/ This is something like what we've done with the bufferbloat effort - find something worthwhile, start a project to do it. However steam has a revenue model that we thus far lack. It does help to be making something lots of people want, and I suppose the hard problem is making people aware we have something they want. Speaking of that, the 3.6-rc6 kernel I was working on which has most of the cerowrt stuff in it, but for x86 and ubuntu is here: http://snapon.lab.bufferbloat.net/~cero1/deb/ and (in trying to lick the memory problems) I've been doing some builds for the 32MB ram nanostation M5 and picostation 2HP, based on the current cerowrt patch sets. With a single SSID I haven't been able to crash the 2HP yet with a variety of traffic. 
It's easy to calculate, however, how to crash nearly any access point with extra SSIDs:

if ((Total spare ram - (4 wireless queues, 1000 packets = 2Mbytes roughly for each = 8Mbytes) * SSIDS) < 0) boom()

This would be improvable with a multi hw queue fq_codel, as each hardware queue could share an overall fq_codel queue (factor of 4 decrease); however, it seems to make more sense to have the queueing in the mac layer below the SSID abstractions. What's currently in cerowrt is eric dumazet's suggestions to reduce packet allocations under load. The above math was worse before - no matter the packet size, it seemed as though 2k and 4k allocations would be exhausted.

...

After I recover from the sprint required to get sugarland out the door, I'd like to work on ways to do scrum and sprint-like things (google hangouts?) to spread the knowledge and work around, and to parallelize the effort more. So much work remains. Truly addressing the wireless problem hasn't even started. I have to admit that after doing something like 30 official releases of cerowrt over the last 18 months, I'd really like to hand over the reins to someone else. Worse is that after the openwrt unfreeze, new kernels will start to appear, and while working with Linux 3.6 and later would be helpful, I'd rather have stability for a while to work on higher layers of the stack, and get analytical. Doing both stable maintenance and trying to move forward on new kernels is a problem...

Next up for me is working on qos-scripts, analytical models and tests, and updating my test deployment to this generation of code if all goes well. I just dumped a ton of raw data into the deBloat repo, too. Also have a few patches for the linux and openwrt mainlines to polish... On other fronts, I'm still working the basic funding angles and trying to fix things with amazon.
I was encouraged enough by your (thus far failed) attempts at financial help to sink the time I did into sugarland (sugar helped too, I think she needs a job title). If it wasn't for the outpouring of your support, I'd have given up. Thx. I sure hope sugarland is better than -10. There has been an upswing in corporate interest in the last few weeks, I may have some news on that shortly. I had planned originally to get to barcelona for the wireless summit and the linux conference. I may still make the second (issue is in doubt, though). Is anyone besides jg going to this? http://www.wirelesssummit.org/ It's near the home of guifi.net which is one of the larger wireless networks I've ever heard of. -- Dave Täht http://www.bufferbloat.net/projects/cerowrt/wiki - 3.3.8-26 is out with fq_codel! ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
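Dave's back-of-the-envelope SSID memory estimate from this message can be written out as a short Python sketch. The constants follow his rough figures (4 wireless queues per SSID, 1000 packets per queue, about 2 KB per packet buffer); the function names and the 20 MB spare-RAM figure in the example are assumptions made for illustration only.

```python
QUEUES_PER_SSID = 4            # wireless hardware queues per SSID
PACKETS_PER_QUEUE = 1000       # packet limit per queue
BYTES_PER_PACKET_BUFFER = 2048 # roughly 2 KB per buffer, i.e. ~2 MB per queue


def queue_memory_bytes(ssids: int) -> int:
    """Worst-case memory pinned by full queues across all SSIDs."""
    return ssids * QUEUES_PER_SSID * PACKETS_PER_QUEUE * BYTES_PER_PACKET_BUFFER


def will_exhaust_memory(spare_ram_bytes: int, ssids: int) -> bool:
    """True when full queues would need more than the spare RAM ("boom")."""
    return spare_ram_bytes - queue_memory_bytes(ssids) < 0


def max_safe_ssids(spare_ram_bytes: int) -> int:
    """Largest SSID count whose full queues still fit in the spare RAM."""
    return spare_ram_bytes // queue_memory_bytes(1)


if __name__ == "__main__":
    spare = 20 * 1024 * 1024  # assumed spare RAM on a 32 MB device
    print(max_safe_ssids(spare), will_exhaust_memory(spare, 3))
```

With these rough numbers each SSID can pin about 8 MB, which is why a 32 MB device runs out of headroom after only a couple of SSIDs.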
Re: [Cerowrt-devel] blocking probes...
Hi Dave,

On Jan 12, 2013, at 20:50 , Dave Taht wrote:

one of the underused features of cerowrt is that I stuck a sensor on xinetd to detect attempts to telnet or ftp to the router and cut off access to some other services, notably ssh. I would have loved to extend this facility to either do it entirely in iptables or leverage xinetd to talk to iptables to (for example) disable access to the web server. I'm curious if anyone else's server logs ever show something like this in the Real World:

Jan 12 20:44:02 europa daemon.crit xinetd[3273]: 3273 {process_sensor} Adding 190.185.12.121 to the global_no_access list for 120 minutes

And I'm curious as to what more fully blown tools like this already exist.

This sounds remotely like a sort of reverse port knocking system, where you would connect to certain ports before allowing, say, ssh on some unusual port. You probably know this, but on the off chance it might be news…

best Sebastian

-- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
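For anyone who wants to check their own logs for such sensor hits, a small Python sketch along these lines might do. It only assumes the message format of the single log line quoted in this message; the function name and the regular expression are illustrative, not part of xinetd or cerowrt.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived solely from the one xinetd log line quoted in the thread.
SENSOR_RE = re.compile(
    r"\{process_sensor\} Adding (\S+) to the global_no_access list for (\d+) minutes"
)


def sensor_hits(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (address, minutes) for every sensor ban found in the log lines."""
    hits = []
    for line in lines:
        match = SENSOR_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__":
    sample = [
        "Jan 12 20:44:02 europa daemon.crit xinetd[3273]: 3273 {process_sensor} "
        "Adding 190.185.12.121 to the global_no_access list for 120 minutes",
    ]
    print(sensor_hits(sample))
```

Feeding it the lines of a saved syslog file would list every address the sensor banned and for how long.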
Re: [Cerowrt-devel] Fixing simple_qos.sh
Hi Maciej,

thanks for your thoughts.

On Jan 30, 2013, at 04:20 , Maciej Soltysiak wrote:

On Tue, Jan 29, 2013 at 10:21 PM, Sebastian Moeller moell...@gmx.de wrote:

Any idea of how to determine link speed by a script?

I assumed Dave meant this to be as simple as fetching a file and timing that. Basically a quick script form of http://speedtest.net/

Well, I am not sure whether that is a good idea, as speedtest.net might not be as well connected as your typical servers. So personally I try to rate limit my up- and download to line rates minus 5% to avoid the buffer bloat in the CMTS/DSLAM. I guess I am hoping that all real routers suffer less from over-buffering than the consumer-facing end nodes. (Then again this is a can of worms, but the minus 5% so far worked okay for me.)

As I intend to disable upnp it would be great if the link speeds could still be stored somewhere and/or manually overridden. I want a firewall since I do not trust a number of devices too much, like an iPod and a nexus7, and want to keep them under supervision, so allowing them to pierce the firewall makes me feel a bit uneasy. Then again, Skype and friends figured out how to do NAT traversal without upnp, so disabling it will only buy me a little more control with a lot more hassle. Any expert on the security tradeoff involved with UPNP willing to give their opinion on this question?

Well, UPNP or not, with a 3rd party server outside your network and proper client/server code Skype and friends can do hole punching. If you don't trust ipad and nexus, you're on privacy territory, not network security per se, so I think you're better off proxying and filtering (e.g. privoxy), than only disabling upnp.

I might have phrased that a bit awkwardly; I am not sure about the speed with which critical remotely exploitable bugs are fixed in an aging collection of devices (this certainly includes iPod and nexus, but honestly also my laptop). (If I were really concerned about privacy I guess I would need to disable networking in apple and google devices completely :) ) In related news: https://community.rapid7.com/community/infosec/blog/2013/01/29/security-flaws-in-universal-plug-and-play-unplug-dont-play So maybe my uneasiness has some grounding in reality. Mind you, I have not yet tested whether cerowrt is affected (and I doubt that, since the linked exploit requires old ). Related question: should cero's firewall drop tcp port 5000 and udp port 1900 connection requests on the wan interface to put in belt and suspenders against UPNP remote exploits? But how does that interact with using cerowrt as a secondary router? (Being away from the router I can not easily check/change the firewall settings…)

Yeah, this old thing. One thing is cerowrt firewall ruleset is a default ACCEPT with exceptions to block in zone_wan and that's one bad thing [tm] and should be the other way round.

Where is the file that contains the default ruleset? I guess this is what I will set my router to (default drop); I assume though that Dave's goal is rather to be open, so that end-to-end connectivity is open enough to easily allow running your own servers. Mmmh, thinking this over, I should bolt down the router itself from the outside a bit more, along with the secure network segments, and use the guest segments as permissive segments in which to run servers and such...

I'll try to confirm if blocking it breaks anything or not today. Perhaps running metasploit against cero from outside and inside could be beneficial? Or at least a thorough nmap scan.

I checked my 3.7.2-4 cerowrt router with ScanNOwUPnP.exe (from rapid7) and it comes up empty, meaning cerowrt is not affected by that issue (as is to be expected with cero's miniupnp 1.4). Thanks a lot for your thoughts.

best Sebastian

Maciej
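The two ideas discussed in this message, timing a file transfer to estimate link speed and shaping to the line rate minus 5%, boil down to simple arithmetic. The following Python sketch shows both; the function names are hypothetical, and no actual download is performed, since which server to fetch from is exactly the open question raised above.

```python
def measured_rate_kbit(bytes_transferred: int, seconds: float) -> float:
    """Throughput of a timed transfer in kbit/s (1 kbit = 1000 bit)."""
    return bytes_transferred * 8 / 1000 / seconds


def shaped_rate_kbit(line_rate_kbit: int, margin_percent: int = 5) -> int:
    """Shaper rate set a fixed percentage below the line rate.

    Integer arithmetic rounds down, so the result never exceeds the
    intended fraction of the line rate.
    """
    return line_rate_kbit * (100 - margin_percent) // 100


if __name__ == "__main__":
    # A 1 MB file fetched in 8 seconds corresponds to 1000 kbit/s.
    print(measured_rate_kbit(1_000_000, 8.0))
    # Example line rates in kbit/s, chosen for illustration.
    print(shaped_rate_kbit(16402), shaped_rate_kbit(2558))
```

A measured rate is only a lower bound on the line rate when the test server or the path to it is the bottleneck, which is the concern about speedtest-style measurements voiced above.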
Re: [Cerowrt-devel] trivial 6in4 fix(?)
Hi Toke,

On Jun 16, 2013, at 21:36 , Toke Høiland-Jørgensen t...@toke.dk wrote:

Dave Taht dave.t...@gmail.com writes:

Well, that result is mildly puzzling. netperf-wrapper -6 throughout? no ipv4?

There's some ipv4 traffic in the background. Dunno exactly how much.

You are on a dsl line, too? There have been some fixes to the overhead issue that have landed but encapsulation atm is still borked (you using atm?)

Yeah. I'm on VDSL. No idea what encapsulation (and can't access my isp-provided router since that is in bridge mode).

As far as I can tell, at least VDSL typically means VDSL2, and that probably means PTM instead of ATM. In essence this means you do not have to deal with ATM's 48 payload bytes per 53-byte cell transport inefficiencies. So all you need to deal with is per-packet overhead. Then again I am sure you probably know that already. (Sidenote: as far as I understand (so not very far), using ATM for DSL connections with POTS service in the lower frequency range never made much sense at all; the 5-byte ATM header typically was constant and thereby just ballast, and the 48-byte quantization on the last mile never came with any benefits, but I digress.)

Best Sebastian

-Toke
Re: [Cerowrt-devel] trivial 6in4 fix(?)
Hi Toke,

On Jun 16, 2013, at 22:55 , Toke Høiland-Jørgensen t...@toke.dk wrote:

Sebastian Moeller moell...@gmx.de writes:

As far as I can tell at least VDSL typically means VDSL2 and that probably means PTM instead of ATM. In essence this means you do not have to deal with ATMs 48 payload bytes per 53 byte cell transport inefficiencies. So all you need to deal with is per packet overhead. Then again I am sure you probably know that already. (Sidenote, as far as I understand (so not very far) using ATM for DSL connections with POTS service in the lower frequency range never made much sense at all, the 5 byte ATM header typically was constant and by that just ballast and the 48 byte quantization on the last mile never came with any benefits, but I digress)

Right, thanks. So that means the overhead is constant per (ethernet) packet?

That is my interpretation; I am still waiting for VDSL deployment in my area, so I have no actual hands-on experience yet. Honestly, I think the best thing to do is not so much assume ATM or lack of ATM, but simply measure it :) (while VDSL offers PTM, it can also operate over ATM if the telco wishes, so VDSL is technically not guaranteed to be free of ATM). If you collect a large quantity of pings to the nearest IP address outside of your control for 16 to 113 byte ping sizes (say 100 packets at each size), you should be able to see a step profile in the RTTs for an ATM carrier (with two steps) and no steps (but rather a ramp) for PTM.

Best Sebastian

-Toke
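Why a sweep of 16 to 113 byte pings should show two steps on an ATM carrier can be seen by counting cells per ping size. The Python sketch below does that; the 28 bytes of IPv4 plus ICMP headers are standard, while the 40-byte encapsulation overhead is only an assumed example value and the function names are hypothetical.

```python
IPV4_ICMP_HEADER_BYTES = 28  # 20-byte IPv4 header + 8-byte ICMP header
ATM_CELL_PAYLOAD = 48


def atm_cells(icmp_payload: int, overhead: int = 40) -> int:
    """Number of ATM cells needed for one ping of the given payload size."""
    total = icmp_payload + IPV4_ICMP_HEADER_BYTES + overhead
    return (total + ATM_CELL_PAYLOAD - 1) // ATM_CELL_PAYLOAD


def step_positions(first: int = 16, last: int = 113, overhead: int = 40) -> list:
    """Payload sizes at which one more ATM cell becomes necessary."""
    steps = []
    previous = atm_cells(first, overhead)
    for size in range(first + 1, last + 1):
        cells = atm_cells(size, overhead)
        if cells != previous:
            steps.append(size)
            previous = cells
    return steps


if __name__ == "__main__":
    print(step_positions())            # assumed 40-byte overhead
    print(step_positions(overhead=0))  # no encapsulation overhead
```

Because the steps are always 48 bytes apart, the position of the first step in measured RTT data also hints at the per-packet overhead of the line.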
Re: [Cerowrt-devel] trivial 6in4 fix(?)
Hi Dave, hi Toke,

On Jun 16, 2013, at 22:57 , Dave Taht dave.t...@gmail.com wrote:

On Sun, Jun 16, 2013 at 1:55 PM, Toke Høiland-Jørgensen t...@toke.dk wrote:

Sebastian Moeller moell...@gmx.de writes:

As far as I can tell at least VDSL typically means VDSL2 and that probably means PTM instead of ATM. In essence this means you do not have to deal with ATMs 48 payload bytes per 53 byte cell transport inefficiencies. So all you need to deal with is per packet overhead. Then again I am sure you probably know that already. (Sidenote, as far as I understand (so not very far) using ATM for DSL connections with POTS service in the lower frequency range never made much sense at all, the 5 byte ATM header typically was constant and by that just ballast and the 48 byte quantization on the last mile never came with any benefits, but I digress)

Right, thanks. So that means the overhead is constant per (ethernet) packet?

So what's the MTU on VDSL2? Is PPPOE used?

Easy to figure out empirically by hand, by finding the largest ping packet size that still passes without fragmentation (see http://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_finding_optimal_mtu)

Best Sebastian

-Toke

-- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
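The manual search for the largest unfragmented ping can be automated with a binary search. The Python sketch below keeps the actual probing abstract: `probe` is a hypothetical callable that should return True when a ping of the given ICMP payload size gets through with the don't-fragment bit set (on Linux, for instance, by wrapping `ping -c 1 -s SIZE -M do HOST`). The search bounds are assumptions as well.

```python
from typing import Callable

IPV4_ICMP_HEADER_BYTES = 28  # 20-byte IPv4 header + 8-byte ICMP header


def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 1500) -> int:
    """Largest packet size in [low, high] that passes unfragmented.

    probe(payload) must report whether a don't-fragment ping with that
    ICMP payload size succeeds; packet size = payload + 28 bytes.
    """
    lo = low - IPV4_ICMP_HEADER_BYTES
    hi = high - IPV4_ICMP_HEADER_BYTES
    if not probe(lo):
        raise ValueError("even the smallest probe size does not pass")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo + IPV4_ICMP_HEADER_BYTES


if __name__ == "__main__":
    # Stand-in probe simulating a path with a 1492-byte MTU (typical for PPPoE).
    fake_probe = lambda payload: payload + IPV4_ICMP_HEADER_BYTES <= 1492
    print(find_path_mtu(fake_probe))
```

A real probe should retry a few times before reporting failure, since a single lost ping would otherwise be mistaken for a fragmentation limit.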
Re: [Cerowrt-devel] trivial 6in4 fix(?)
Hi Toke,

On Jun 17, 2013, at 11:44 , Toke Høiland-Jørgensen t...@toke.dk wrote:

Sebastian Moeller moell...@gmx.de writes:

Honestly, I think the best thing to do is not so much assume ATM or lack of ATM, but simply measure it :)

Right, doing the ping test with payload sizes from 16 to 113 bytes gives me an almost completely flat ping time distribution ranging from 20.3 to 21.3 ms (see attached graphic). So probably I'm on PTM…

I fully believe you that it is flat (graph did not make it into my inbox…). So that looks like PTM. Good! But beware, the expected step size depends on your down- and uplink speeds; at VDSL I would only expect a very tiny increase, basically the time it takes to send an additional ATM cell back and forth:

RTT step per ATM cell in milliseconds = (53*8 / line.down.bit + 53*8 / line.up.bit) * 1000

This means that potentially a large sample size per ping packet size is required to be reasonably sure that there is no step.

Easy to figure out empirically by hand, by finding the largest ping packet size that still passes without fragmentation (see http://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_finding_optimal_mtu)

$ ping -c 1 -s $((1500-28)) -M do www.debian.org
PING www.debian.org (128.31.0.51) 1472(1500) bytes of data.
1480 bytes from senfl.debian.org (128.31.0.51): icmp_seq=1 ttl=45 time=114 ms
--- www.debian.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 114.522/114.522/114.522/0.000 ms

$ ping -c 1 -s $((1500-27)) -M do www.debian.org
PING www.debian.org (128.31.0.51) 1473(1501) bytes of data.
From 10.42.3.5 icmp_seq=1 Frag needed and DF set (mtu = 1500)
--- www.debian.org ping statistics ---
0 packets transmitted, 0 received, +1 errors

So the MTU seems to be 1500 bytes.

That is great!

Now, how do I figure out what the PTM overhead is and feed it to HTB? :)

All I know so far is that PTM will not drag in the quite baroque ATM encapsulation options. Googling for vdsl2 makes me hope that maybe there is no additional user-visible overhead; so if you have PPP, that would still need handling. It would be quite interesting to determine the overhead empirically. ATM's quantization makes overhead detection in ATM-based DSL lines conceptually easy, but for VDSL I am not so sure. In principle we expect to see buffer bloat and its signature increase of latency on saturated links if we shape with too high rates. So too small an overhead should fill the modem's buffers and might increase latency (depending on the modem's configuration, but assuming pfifo the buffer should just fill up slowly until latencies are noticeably affected, or?). Hence in theory using a saturating load and measuring the latencies for different overhead values should still work. I wonder whether rrul might just be the right probe? If you go that route I would be delighted to learn the outcome :). Sorry to be of no more help here.

Best Sebastian

-Toke
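To get a feel for how small the per-cell RTT step becomes at higher line rates, the formula from this message can be evaluated directly. A minimal Python sketch follows; the function name is hypothetical and the example rates are illustrative assumptions (a 3000/500 kbit/s ADSL line and a 50/10 Mbit/s VDSL line), not measurements.

```python
ATM_CELL_BITS = 53 * 8  # one full ATM cell, header included


def rtt_step_ms(down_bit_per_s: float, up_bit_per_s: float) -> float:
    """Extra round-trip time caused by one additional ATM cell in each direction."""
    return (ATM_CELL_BITS / down_bit_per_s + ATM_CELL_BITS / up_bit_per_s) * 1000


if __name__ == "__main__":
    print(round(rtt_step_ms(3_000_000, 500_000), 3))      # ADSL example
    print(round(rtt_step_ms(50_000_000, 10_000_000), 4))  # VDSL example
```

With the assumed ADSL rates a step is close to 1 ms, while at the assumed VDSL rates it shrinks to roughly 0.05 ms, which is why many pings per size are needed before concluding that no step exists.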
[Cerowrt-devel] late report on 3.7.4-3 with ATM linklayer
Hi Dave, hi group,

so finally I am getting around to repeating a few tests with cerowrt on a rather typical ADSL line. Using simple_qos.sh did not result in nice ping time behavior under up- and downlink saturating loads, as I had seen with earlier versions (like 1.5 years ago when I switched to a DOCSIS carrier), even though I had activated the script's provisions for ATM and my PPPoE overhead. Still the occasional ping was delayed for 400+ milliseconds (with unloaded ping time to the host of ~24 ms). The following two changes to (the old) simple_qos.sh script helped a lot (by somehow bounding the worst case ping at around 105 ms; still not great, but so much better than before…).

Note: I have no linux machine at home at the moment, so I do my measurements under macosx, and I have not managed to get netperf-wrapper to work for me, so all data are quite unscientifically recorded (start a long ping train against the nearest ISP-side host that reliably responds to my ping probes and shows robust timing without load, then add a big link-saturating file transfer to the net, then open 99 browser tabs at once). Finally, my encapsulation ends up in a per-packet overhead of 40 bytes per packet (in addition to the ip header and all the rest).

Anyway, here are the two additions:

    EGRESS_STAB_STRING="stab mtu 2048 tsize 128 overhead 26 linklayer atm"
    INGRESS_STAB_STRING="stab mtu 2048 tsize 128 overhead 40 linklayer atm"

(NOTE: mtu here is not the MTU but a stab-specific way of setting an upper limit for the size table to be built and used)

and here is where I put them:

1) in egress()

    tc qdisc add dev $IFACE root handle 1: ${EGRESS_STAB_STRING} htb ${RTQ} default 12

2) in ingress()

    tc qdisc add dev $DEV root handle 1: ${INGRESS_STAB_STRING} htb ${RTQ} default 12

and in addition I set:

    UPLINK=2400 #2558
    DOWNLINK=15582 #16402
    DEV=ifb0
    QDISC=nfq_codel # nfq_codel is higher overhead than fq_codel but does better on quantums. I hope.
    IFACE=ge00
    DEPTH=42
    TC=/usr/sbin/tc
    FLOWS=8000
    PERTURB="perturb 0" # Permutation is costly, disable
    FLOWS=16000
    # BQL_MAX=3000 # it is important to factor this into the RED calc
    CEIL=$UPLINK
    MTU=1500
    ADSLL=
    PPOE=

(using the same settings with empty [E|IN]GRESS_STAB_STRING variables but PPOE=1 yielded the quite miserable maximal ping times of 400ms).

So my hunch is that the more general stab mechanism (which to this layman seems to work not by fudging the rate in the kernel, but by telling the kernel the actual sizes of the data packets (in the link layer) so the kernel shapes correctly) seems to have taken less damage than the original HTB-internal mechanism of the same spirit. I will, once I get round to it, that is, try to repeat these measurements with the most recent cerowrt alpha build. I encourage everyone on an ATM link to test the described modifications and report back success or failure. Don't know whether your xDSL line uses ATM as link layer, don't know your overhead? Just let me know, I might be able to help :). Most VDSLs hopefully use packet transfer mode (PTM), which did away with the ATM cell quantization voodoo, so only ADSL, ADSL2 and ADSL2+ users need to worry about the link layer. The overhead however might also be an issue with VDSL…

Best Sebastian
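To show how the stab string and the HTB root qdisc line fit together, here is a small Python sketch that assembles the same tc command text for a given device and overhead. It only builds strings and runs nothing; the function names are hypothetical, and the `${RTQ}` variable from the script is left out for simplicity.

```python
def stab_string(overhead: int, linklayer: str = "atm",
                mtu: int = 2048, tsize: int = 128) -> str:
    """Size-table options as used in the modified simple_qos.sh.

    Note that mtu here is the upper limit of the stab size table,
    not the interface MTU.
    """
    return f"stab mtu {mtu} tsize {tsize} overhead {overhead} linklayer {linklayer}"


def htb_root_command(dev: str, overhead: int, tc: str = "tc",
                     default_class: int = 12) -> str:
    """tc command line for an HTB root qdisc with the stab options in front."""
    return (f"{tc} qdisc add dev {dev} root handle 1: "
            f"{stab_string(overhead)} htb default {default_class}")


if __name__ == "__main__":
    print(htb_root_command("ge00", 26))  # egress side
    print(htb_root_command("ifb0", 40))  # ingress side via the ifb device
```

The stab options have to come before the qdisc kind (`htb`), which is the one detail that is easy to get wrong when editing the script by hand.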
Re: [Cerowrt-devel] cerowrt-3.10.2-1 dev release + owamp
Hi Fred, hi List On Jul 26, 2013, at 08:21 , Fred Stratton fredstrat...@imap.cc wrote: I can certainly confirm this, having spent several fruitless hours with the build. 6in4 remains broken for henet. dnsmasq appears not to recognise additional domain name servers. The ISP I use has a very slow domain name service, to which the system now defaults. The consequence of this is that opkg times out, and no packages can be installed. It is still not possible to watch a video stream and download files simultaneously on an ADSL line. I can not comment on most of your issues, but I might have some information about the ADSL issue. I only got around to test 3.10.1-1 but I suspect that there are no changes in the relevant packages between these versions. I had a few issues with getting my ADSL line (atm carried adsl2+) to work reasonably; maybe some of these issues are at play in your setup as well. Anyway, it turned out to have probably two main reasons: 1) It looks that the AQM luci interface does not really propagate the requested bandwidth down to simple_qos.sh (the version Toke's AQM package supplies), to get this working I had to edit the bandwidth defaults in /usr/lib/aqm/functions.sh to make it work at all. (I then switched back to the stand alone simple_qos.sh; no time yet to debug why the luci-fied version did not honor the up- and download speeds from the gui) That helped a lot. Enabling simple_qos.sh's PPPOE option did not improve things to where I expected them. 2) It seems HTB's issues with regards to the ATM carrier (and potentially per packet overhead do not seem fully solved yet. 
I sidestepped this issue by handling it with the more generic tc-stab mechanism. I added the following to simple_qos.sh (my line has 40 bytes of encapsulation overhead, but Linux already accounts for the 14-byte ethernet header, so the additional overhead is 26; you probably know your overhead already*):

    EGRESS_STAB_STRING="stab mtu 2048 tsize 128 overhead 26 linklayer atm"
    INGRESS_STAB_STRING="stab mtu 2048 tsize 128 overhead 26 linklayer atm"

Then I changed egress() from:

    $TC qdisc add dev $IFACE root handle 1: htb default 12

to:

    $TC qdisc add dev $IFACE root handle 1: ${EGRESS_STAB_STRING} htb default 12

and ingress() from:

    $TC qdisc add dev $DEV root handle 1: htb default 12

to:

    $TC qdisc add dev $DEV root handle 1: ${INGRESS_STAB_STRING} htb default 12

That again helped a lot.

3) I also turned off polipo on my wndr3700 v2, assuming that the device has too little memory and flash storage for polipo to be actually useful. I intend to supply polipo with a larger backing store and enable it again in due time.

These changes turned maximum ping RTTs under load from initially up to almost 6 seconds (avg ~250ms) down to 226ms (avg 26ms). I just measured the ping times to a near host (RTT ~24ms) while saturating the upload with a single large transfer and stressing the download by opening around 100 media-heavy browser tabs at once. In case you test these changes I would love to hear whether this improves your situation or not.

*) Note: there is no universal ADSL overhead; it depends on the encapsulation method used by your ISP, so one either needs to look up the required information (see: http://ace-host.stuart.id.au/russell/files/tc/tc-atm/ and http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=tc-stab&sect=8 and http://www.faqs.org/rfcs/rfc2684.html) or figure it out empirically.

Best Regards Sebastian

On 26 Jul 2013, at 06:20, Dave Taht dave.t...@gmail.com wrote: sysupgrade -n doesn't work with this release. Stay away.
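The stab modification described above can be tried out as a dry run before touching a live qdisc. The following is only a sketch: the helper names make_stab_string and show_root_qdisc_cmd are hypothetical and not part of simple_qos.sh, and the values 26 and atm are simply the ones quoted in this message for one particular line. It prints the resulting tc command instead of executing it.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of simple_qos.sh.

# make_stab_string OVERHEAD LINKLAYER
# Builds the stab clause with the mtu/tsize values quoted in the message.
make_stab_string() {
    echo "stab mtu 2048 tsize 128 overhead $1 linklayer $2"
}

# show_root_qdisc_cmd DEV STAB_CLAUSE
# Prints (does not run) the root qdisc command; an empty clause yields
# the unmodified htb line.
show_root_qdisc_cmd() {
    echo "tc qdisc add dev $1 root handle 1: ${2:+$2 }htb default 12"
}

EGRESS_STAB_STRING=$(make_stab_string 26 atm)
show_root_qdisc_cmd ge00 "$EGRESS_STAB_STRING"
```

Printing the command first makes it easy to compare the modified line against the original one before applying it on the router.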
I have a new build of 3.10.3-1 and am trying to fix it... I did find the problem on the ubnt builds - I'd switched to the new babeld from quagga, but failed to install it by default. in openwrt trunk, elliptic curve has been enabled in openssl. It's long past time we enable https for configuration by default, and might as well figure out how to turn perfect forward secrecy on as well in the post-snowden era. owamp seemingly works well, with a couple glitches here and there. I got to where the lab was synced to about 1ms resolution... and 5 more gpses arrived today
Re: [Cerowrt-devel] cerowrt-3.10.2-1 dev release + owamp
Hi Fred,

On Jul 31, 2013, at 21:50 , Fred Stratton fredstrat...@imap.cc wrote: I have spent the time since last posting reinstalling, and retesting, the build. 6in4 works. As ever, the firewall configuration as reimported from backup blocks ipv6 function, even though the rules appear correct. I worked around this by only reimporting the configuration files needed. I tried the approach advocated by Sebastian, but achieved better results leaving the scripts in /usr/lib/aqm unaltered, which implies that the AQM GUI functions as intended.

Interesting, so I might have screwed something up on my router. Could you do me a favor and post the results from running the following on your cerowrt router via ssh (for the uplink):

    tc -s -d class show dev ge00

and (for the downlink):

    tc -s -d class show dev ifb0

I am especially interested to learn whether the NNN in "class htb 1:1 root rate NNN Kbit" reflects the values you configured in the AQM mask or whether those are the defaults (4000 for up and 2 for down if I recall correctly). That would be just great, as I have already mucked with these files and hence have no pristine sources. Checking this is on my todo list for the next cerowrt version, unless you beat me to it :) . And finally, did you check the ADSL connection checkbox or not?

Left with an uplink delay of 650 to 750 milliseconds, on a known poor ADSL line, which is better than previous builds, but still means that undertaking concurrent internet activities is problematic.

This looks quite abysmal. If I might ask, what is the unloaded ping RTT to one of the first hops on your ISP's side? And how did you measure the uplink delay (netalyzr by any chance)?

best Sebastian

On 26 Jul 2013, at 16:51, Sebastian Moeller moell...@gmx.de wrote: Hi Fred, On Jul 26, 2013, at 16:31 , Fred Stratton fredstrat...@imap.cc wrote: Thank you, Sebastian and David. Sebastian I was unaware of the problem with functions.sh I use polipo. I shall try your stab approach again.
I have a bridged connection, rather than using PPPoE, but can adapt.

There are a number of different encapsulations for bridged setups; just pick the one relevant for your link:

Connection: Bridged, VC/Mux RFC-1483/2684
Protocol (bytes): Ethernet Header (14), ATM pad (2), ATM AAL5 SAR (8) : Total 24

Connection: Bridged, VC/Mux+FCS RFC-1483/2684
Protocol (bytes): Ethernet Header (14), Ethernet PAD [8] (0), Ethernet Checksum (4), ATM pad (2), ATM AAL5 SAR (8) : Total 28

Connection: Bridged, LLC/SNAP RFC-1483/2684
Protocol (bytes): Ethernet Header (14), ATM LLC (3), ATM SNAP (5), ATM pad (2), ATM AAL5 SAR (8) : Total 32

Connection: Bridged, LLC/SNAP+FCS RFC-1483/2684
Protocol (bytes): Ethernet Header (14), Ethernet PAD [8] (0), Ethernet Checksum (4), ATM LLC (3), ATM SNAP (5), ATM pad (2), ATM AAL5 SAR (8) : Total 36

To my knowledge, the worst case overhead for ADSL connections is 44 bytes. If you do not have sufficient information about your encapsulation at hand, contact me and I am happy to figure it out empirically...

I suspect all European telcos use ADSL2+ over ATM.

At least in Germany you can also find ADSL1 over ATM, but that has no bearing on the overhead or on the general ATM quantization issue. As far as I know all ADSLs use an ATM carrier, while VDSL systems (hopefully) should use PTM (without the weird quantization issues). Hope that helps Sebastian

David I am encouraged that it works for you. Shall review settings via uci.

On 26 Jul 2013, at 11:51, David Personette dper...@gmail.com wrote: HEnet has been working consistently for me.
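The encapsulation totals quoted above can be turned into a value for tc's overhead keyword with a small lookup. This is a sketch only: the function names and the short encapsulation labels are made up for illustration, and the subtraction of 14 follows the statement earlier in the thread that Linux already accounts for the 14-byte Ethernet header; whether that holds for a given setup should be verified.

```shell
#!/bin/sh
# Hypothetical lookup; the labels are invented for this example.

# encapsulation_overhead NAME -> total per-packet overhead in bytes,
# using the totals quoted in the message for bridged RFC-1483/2684 setups.
encapsulation_overhead() {
    case "$1" in
        bridged-vcmux)       echo 24 ;;
        bridged-vcmux-fcs)   echo 28 ;;
        bridged-llcsnap)     echo 32 ;;
        bridged-llcsnap-fcs) echo 36 ;;
        *) echo "unknown encapsulation: $1" >&2; return 1 ;;
    esac
}

# stab_overhead NAME -> assumed value for the "overhead" keyword, i.e. the
# total minus the 14-byte Ethernet header the kernel is said to count already.
stab_overhead() {
    total=$(encapsulation_overhead "$1") || return 1
    echo $((total - 14))
}

for enc in bridged-vcmux bridged-vcmux-fcs bridged-llcsnap bridged-llcsnap-fcs; do
    echo "$enc: total $(encapsulation_overhead "$enc"), stab overhead $(stab_overhead "$enc")"
done
```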
In /etc/config/network make sure that se00, sw00, sw10, gw00, gw10, gw01, and gw11 all have the following line:

    option ip6assign 64

And add the following, using your information to replace the '###' fields; also remove any earlier configuration for it:

    config interface henet
        option proto 6in4
        option peeraddr ###
        option ip6addr ###
        option tunnelid ###
        option username ###
        option password ###
        option ip6prefix ###
        option mtu 1480
        option ttl 64

Finally add henet to the wan zone in /etc/config/firewall.

The way to set up custom DNS is also in /etc/config/network; add the following to your ge00 config:

    option peerdns 0
    option dns '208.67.222.222 208.67.220.220'

I ran into issues updating to the 3.10.1 build, something got borked with my configuration. Once I restored a backup, everything was fine (NOTE: this is an assumption, it could have just been the additional reboot that fixed the flakiness). The upgrade to 3.10.2
Re: [Cerowrt-devel] cerowrt-3.10.2-1 dev release + owamp
Hi Fred,

On Aug 8, 2013, at 01:21 , Fred Stratton fredstrat...@imap.cc wrote: On 7 Aug 2013, at 14:38, Sebastian Moeller moell...@gmx.de wrote: Hi Fred, this got a bit longish so I took the liberty to reduce the quoted text a bit On Aug 5, 2013, at 12:47 , Fred Stratton fredstrat...@imap.cc wrote: [snipp]

You are using 2 routers in series. I have disabled all routing functions on the 2wire. It is transparent to the network.

Which is exactly the situation I faced with the cable modem before; my cerowrt router was provisioned with an IP address through the bridged cable modem via DHCP, but I could still access the modem's 192.168.100.1 without any configuration required. I know there is some OpenWrt information (http://wiki.openwrt.org/doc/howto/access.modem.through.nat) that makes it look like one needs to do more involved fiddling with the firewall, but that turned out not to be required with cerowrt. I do not know how that works if one runs a pppoe client on cerowrt though, and I left cerowrt's ip address assignment in place. (My hunch is that since cerowrt leaves the typical 192.168.N.N ranges alone the whole issue gets reduced to a simple routing issue… and since Dave takes care that cero works well as a secondary (test) router in a typical home situation, I guess routing 192.168.N.N is well within cerowrt's scope.) But I guess you tried that already and it still does not work. Would be interesting to learn why…

The difference is that you have the ISP gateway as a primary device issuing a DHCP address to the cerowrt secondary router. The 2 devices are then obviously on the same ipv4 subnet. I use the 2700 transparently. DHCP is turned off. If I turn it on, I have to use the device in DMZ mode with its firewall on, which I do not want to do.

Sorry to keep harping on this, but this is pretty close to what I did with the cable modem.
As I said, I had it working with a setup similar to yours: cerowrt was assigned a public IP address (75.142.58.156) by the cable ISP's DHCP server, while the modem's configuration interface was running on the private 192.168.100.1. So the modem and cero were decidedly not on the same IP subnet, but I could still connect to it without needing to change anything. Initially, before I found out that it works out of the box, I had defined an alias IP address on the wan interface (WAN2CABLEMODEM ipv4-address: 192.168.100.2; ipv4-netmask: 255.255.255.0). But it turned out that this was not necessary: as of cerowrt 3.3.8-17 I did not need to do this any more, and accessing the cable modem just worked by directing a browser to 192.168.100.1. So, have you tried to access the modem recently by simply directing a browser to its address? And have you tried the same after just configuring an alias as hinted above? If so what was the result?

I configured an alias using uci at your prompting. It works. I can now access the 2700.

Excellent, that is solved then. On to the next issues...

Initially, I used the 2700 with the tomatoUSB router attached to that, and then a router running openWRT. This setup allowed access to the 2700, through a masquerade in tomatoUSB. Although ipv6 addresses were propagated throughout the network by Barrier Breaker, ipv6 did not work, probably because of the way radvd works in tomato. I have never used the cerowrt as a secondary device because of this. [snipp] I do not want to use cable, which is expensive. The DOCSIS box - a custom Netgear device - has a poor reputation. I do not want to use fibre, again, because when it comes here, it will be supplied by BT/, and is traffic shaped and capped. The BT web site has 35 pages of price increases for this year. I will continue with ADSL2+

I fully agree that getting ADSL(2+) links debloated and offering low latency internet access is worthwhile, as a considerable number of people simply have no other choice available.
And this is why your case is so interesting! If you can improve the interactivity in your home and document the required steps somewhere, others will have an easier time.

Yes. Hopefully others using ADSL will also participate.

Okay, let's assume for a minute that the ATM link layer adaptation mechanism might be busted. Since each packet (and its overhead) always consumes an integer number of ATM cells and no ATM cells are shared between packets, the worst case ATM overhead would be close to 50% (a small packet with 48 + 1 byte length will take 2 full 48 byte ATM cells, just looking at the payload here). For bigger packets the ATM quantization overhead will be a smaller percentage. So if you shape down to 50% of link rates (and do your testing not only with very small packets) this should make the shaping robust even with a busted ATM link layer adaptation mechanism. Assuming sane numbers of channels
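The worst case described in this message (a 49-byte packet occupying two full cells) can be checked with a little shell arithmetic. This is an illustrative sketch, looking only at the 48-byte cell payload as the message does; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration.

# atm_cells LEN -> number of ATM cells needed for LEN bytes,
# at 48 payload bytes per cell and no sharing of cells between packets.
atm_cells() {
    echo $(( ($1 + 47) / 48 ))
}

# atm_padding_percent LEN -> share of the occupied cell payload that is
# padding, as an integer percentage (rounded down).
atm_padding_percent() {
    cells=$(atm_cells "$1")
    carried=$((cells * 48))
    echo $(( (carried - $1) * 100 / carried ))
}

for len in 49 100 1500; do
    echo "$len bytes: $(atm_cells "$len") cells, $(atm_padding_percent "$len")% padding"
done
```

The padding share shrinks quickly as packets grow, which is why shaping to 50% of the link rate is a conservative bound rather than a typical cost.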
Re: [Cerowrt-devel] cerowrt-3.10.2-1 dev release + owamp
Hi Dave, I see that 3.10.5-1 is out.

On Jul 26, 2013, at 07:20 , Dave Taht dave.t...@gmail.com wrote: sysupgrade -n doesn't work with this release. Stay away. I have a new build of 3.10.3-1 and am trying to fix it…

Does sysupgrade -n work for this release?

best Sebastian

I did find the problem on the ubnt builds - I'd switched to the new babeld from quagga, but failed to install it by default. in openwrt trunk, elliptic curve has been enabled in openssl. It's long past time we enable https for configuration by default, and might as well figure out how to turn perfect forward secrecy on as well in the post-snowden era. owamp seemingly works well, with a couple glitches here and there. I got to where the lab was synced to about 1ms resolution... and 5 more gpses arrived today
Re: [Cerowrt-devel] cerowrt-3.10.2-1 dev release + owamp
Hi Dave,

On Aug 11, 2013, at 23:52 , Dave Taht dave.t...@gmail.com wrote: On Sun, Aug 11, 2013 at 1:25 PM, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, I see that 3.10.5-1 is out.

I'm just keeping up with the patches and the openwrt tree. Stephen walker also dropped in a huge update to a bunch of packages As for 3.10.5-1. It builds, it boots. 3.10 will be the stable version of linux for a while, so it's going to pay to backport some stuff. There are a ton of patches in net-next (a better fix to 6in4 encapsulation and a couple tcp fixes in particular), a mod to nfq_codel I want to put in and then there's this ongoing htb issue.

Good to know, thanks.

I apologize for merely watching that debate blur by and not doing much about it. It seemed like you were making progress…

Oh, nothing to see here. Fred and I are both actually on ADSL lines, so we need to do the testing anyway, and it is fun to some degree trying to figure out what the issue is with his setup. It looks like there is a window of opportunity to switch to VDSL2+ in autumn and I very much would like to do that, so I want to help with the ADSL issues while I still can test. That said, if anyone has additional input I would be happy to hear it.

and Jesper was on vacation, and I was wrapped up in ietf and then cluecon and just got back from chicago yesterday…

I saw the video on your site, but have not gotten around to actually watching it…

On the plus side the members of the aqm bof voted overwhelmingly (180+ folk) in favor to propose a working group for aqm and packet scheduling. It's up before the board now and the proposed charter is here: http://datatracker.ietf.org/wg/aqm/charter/ I spent a great deal of time this past month fooling with webrtc and fq_codel and variants and talked a bit about that experience at cluecon: https://plus.google.com/u/0/107942175615993706558/posts/hqitKPqAHkc I've asked a buddy to get the slides incorporated directly in the video... they are really needed to keep the context going.
Great, will wait for that

On Jul 26, 2013, at 07:20 , Dave Taht dave.t...@gmail.com wrote: sysupgrade -n doesn't work with this release. Stay away. I have a new build of 3.10.3-1 and am trying to fix it… Does sysupgrade -n work for this release?

I don't know. I just got back! :/ I'll be sure to find out in a day or so after I unpack my demo routers… The odds are extremely good I'll try a 3.10.6 based build this week and be able to poke harder into it. However, fixing the dsl issue is going to take some work and as I'm also prototyping some codel changes in x86 I will probably be testing the patchset on that arch rather than in cero, first. Felix has also been doing some serious rework around wifi aggregation handling... I don't have anything seriously conflicting on my schedule for the next several weeks so I hope we'll get much closer to a stable version of everything on everything soon. Are there any other serious roadblockers to a release? Any new packages that are needed?

Not a real package request, but OpenConnect (http://www.infradead.org/openconnect/) and ocserv (http://www.infradead.org/ocserv/ ) look quite interesting. (I have to access several servers using OpenConnect and it would be sweet to have this connection directly set up in cerowrt so I can keep the client computers pristine.) But I am unsure how well these integrate into current OpenWrt trunk and hence cerowrt...

I saw the aqm script patches go by a few minutes ago, I'll be sure to take look at them before popping something new out…

Great!

best Sebastian best Sebastian

I did find the problem on the ubnt builds - I'd switched to the new babeld from quagga, but failed to install it by default. in openwrt trunk, elliptic curve has been enabled in openssl. It's long past time we enable https for configuration by default, and might as well figure out how to turn perfect forward secrecy on as well in the post-snowden era. owamp seemingly works well, with a couple glitches here and there.
I got to where the lab was synced to about 1ms resolution... and 5 more gpses arrived today

-- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [Cerowrt-devel] AQM scripts modified for DSL exploration
Hi Dave,

On Aug 12, 2013, at 02:06 , Dave Taht dave.t...@gmail.com wrote: Alright I slammed them into ceropackages for the next build but I don't get this line:

    STABSTRING="stab mtu 2048 tsize ${TSIZE} overhead ${OVERHEAD} linklayer ${LINKLAYER}"

mtu 2048?

I agree that looks wrong on the face of it, but it is not as bad as it seems. From tc-stab's man page (http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=tc-stab&sect=8):

    NAME
        tc-stab - Generic size table manipulations

    SYNOPSIS
        tc qdisc add ... stab \
            [ mtu BYTES ] [ tsize SLOTS ] \
            [ mpu BYTES ] [ overhead BYTES ] [ linklayer TYPE ] ...

        TYPE := adsl | atm | ethernet

        For the description of BYTES - please refer to the UNITS section of tc(8).

        mtu        maximum packet size we create size table for, assumed 2048 if not specified explicitly
        tsize      required table size, assumed 512 if not specified explicitly
        mpu        minimum packet size used in computations
        overhead   per-packet size overhead (can be negative) used in computations
        linklayer  required linklayer adaptation.

So tc-stab's MTU value is only used to create a size table up to that value. HTB, by the way, also has:

    {
        fprintf(stderr, "Usage: ... qdisc add ... htb [default N] [r2q N]\n"
            " default  minor id of class to which unclassified packets are sent {0}\n"
            " r2q      DRR quantums are computed as rate in Bps/r2q {10}\n"
            " debug    string of 16 numbers each 0-3 {0}\n\n"
            "... class add ... htb rate R1 [burst B1] [mpu B] [overhead O]\n"
            "                      [prio P] [slot S] [pslot PS]\n"
            "                      [ceil R2] [cburst B2] [mtu MTU] [quantum Q]\n"
            " rate     rate allocated to this class (class can still borrow)\n"
            " burst    max bytes burst which can be accumulated during idle period {computed}\n"
            " mpu      minimum packet size used in rate computations\n"
            " overhead per-packet size overhead used in rate computations\n"
            " ceil     definite upper class rate (no borrows) {rate}\n"
            " cburst   burst but for ceil {computed}\n"
            " mtu      max packet size we create rate map for {1600}\n"
            " prio     priority of leaf; lower are served first {0}\n"
            " quantum  how much bytes to serve from leaf at once {use r2q}\n"
            "\nTC HTB version %d.%d\n", HTB_TC_VER>>16, HTB_TC_VER&0xffff
            );
    }

So, I guess this should have been called max_MTU_tablesize to not be confusing. My understanding is that the kernel determines the effective size for each packet length (taking ATM cell overhead and per-packet overhead into account) by looking it up in a table; the mtu here just tells tc how large a table to create. I picked 2048, since it seems to be the default, just to be verbose (and to account for baby giant frames that seem to be introduced at some DSL ISPs to allow 1500 effective MTU in spite of PPPoE overhead, but I digress). Looking at it again I assume that 1600 (HTB's default) might be large enough, but 2048 (actually 2047) allows an easier way to get 16-byte size table steps, which work well for 48-byte ATM payload per cell sizes… I guess I will change the AQM scripts to allow direct manipulation of MTU, MPU and TSIZE just to be complete (of these only TSIZE is stab only, the other two also apply to HTB), so that it becomes easier to experiment with those from the GUI...

hope that helps

best regards Sebastian
___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
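The relation between mtu, tsize and the 16-byte table steps mentioned above can be explored with the sketch below. It rests on an assumption that should be checked against the iproute2 source: that tc picks the smallest power-of-two step for which the mtu, divided by the step, still fits within tsize - 1 slots. The function name stab_step_bytes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; the selection rule is an assumption, not taken from
# the message or verified against a specific iproute2 version.

# stab_step_bytes MTU TSIZE -> assumed width in bytes of one size table slot
stab_step_bytes() {
    mtu=$1
    tsize=$2
    log=0
    # Grow the step (a power of two) until the table covers the mtu.
    while [ $((mtu >> log)) -gt $((tsize - 1)) ]; do
        log=$((log + 1))
    done
    echo $((1 << log))
}

for mtu in 1600 2047; do
    step=$(stab_step_bytes "$mtu" 128)
    echo "mtu $mtu, tsize 128: step $step bytes, 48 mod step = $((48 % step))"
done
```

A step that divides 48 evenly keeps slot boundaries aligned with ATM cell payload boundaries, which is the property the message describes as working well.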
Re: [Cerowrt-devel] AQM and ADSL
Hi Dave,

On Aug 13, 2013, at 22:33 , Dave Taht dave.t...@gmail.com wrote: On Tue, Aug 13, 2013 at 12:40 PM, Fred Stratton fredstrat...@imap.cc wrote: DT has taken to stealth with his releases.

Heh. No, these are not releases. If I don't get a chance to do some testing I don't post their existence to the list. I don't mind people being eager and willing to test 'em without me doing so, tho. To explain, I do what's called continuous integration whenever possible (see wikipedia)

Ah, understood, I guess continuous testing works well with this approach :)

Big triggers for that are new kernel releases (like 3.10.6), fixes like the boatload that felix just landed for the ethernet and the wifi aggregation rework, and tossing in a new version of the codel patches, and tossing in the AQM stuff that sebastian just did - and all that landed in 3.10.6-1. Merely getting this stuff to compile is often a chore. In fact, it took me all day to get all that to work... and my only spare router I dare sacrifice (the other 4 in the testbed are doing some benchmarks as I write) is still in my suitcase and in the car. Usually I get around to announcing a development release once it's been tested for a little while, at least a quick check, and preferably a couple hours under heavy load, and usually that will be at least a -3. I generally try to stay away from X.Y.0 and wait for X.Y.1... We still have a few things left to fold into this kernel - notably the new htb patches from jesper and to backport a few other fixes from net-next, and it looks like the changes to the AQM thing need a smarter radio box in the gui(?).

Ah, yes, I was looking for radio buttons initially, came up blank, then failed to get LUCI's :depends mechanism to work properly and just resorted to exposing both checkboxes. Far from ideal; I guess in the next iteration I will just have a drop-down box with none, htb, stab, then it will be unambiguous.
I'm also trying to freeze the ubnt picostation and nanostation on the same stuff (and what's causing even more delay is trying to get a guruplug and edgerouter booted with it too. I've always resisted expanding the scope of the cerowrt project to more hardware, but I need two higher power boxes to drive some tests with desperately)

More hardware, more pain? I wonder, have you found a successor to the WNDR3[7|8]00 yet?

I'm off to do my laundry in a bit, and will return with fresh clothes and that router this evening. I would very much like to get to a new stable release this month, it's been far too long since the last one! But to do that we have to have zero crash bugs

Okay, not that it is conclusive, but 3.10.1-1 did not crash on me once.

, and be feature complete. I think we are close to zero crashes (but felix fiddling with the ethernet and the wifi recently scares me and I still haven't checked to see if sysupgrade worked in a couple releases (I flash via tftp usually)), but there are some nagging features that I'd like to get in there.

So are Felix's changes already in 3.10.6? If so I will go ahead and try to sysupgrade it (I can always fall back to TFTP even if it is a bit inconvenient due to positioning the router above my reach :) )

Like this one. Keep plugging away, please. Are there any other must-have features for this puppy? I've been looking over gargoyle's ACC code in particular.

Many Thanks Best Regards Sebastian

So 3.10.6-1 fails with sysupgrade?
best Sebastian

-- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [Cerowrt-devel] AQM and ADSL
Hi Fred,

On Aug 14, 2013, at 14:01 , Fred Stratton fredstrat...@imap.cc wrote: On 14 Aug 2013, at 12:42, Sebastian Moeller moell...@gmx.de wrote: Hi Fred, On Aug 13, 2013, at 21:40 , Fred Stratton fredstrat...@imap.cc wrote: (apologies for wrecking the list, and introducing email addresses in error) Begin forwarded message On 13 Aug 2013, at 19:53, Sebastian Moeller moell...@gmx.de wrote: Hi Fred On Aug 13, 2013, at 17:28 , Fred Stratton fredstrat...@imap.cc wrote:

I have been experimenting with the two sets of modified scripts and AQM panels. Thank you for constructing them.

Thanks for testing...

To mention the string ''for ATM choose' is repeated erroneously in the extended panel.

Fixed… I will try to test whether it actually works before sending the next version...

The scripts work. The link layer giving best results is ethernet.

What and how did you measure? Using "Use HTB's private mechanism for linklayer and overhead" or "Use tc's stab mechanism for linklayer and overhead"? A little browsing of the kernel source makes me believe that the HTB version is fully busted and will not do anything at all (so I would have imagined adsl, atm and ethernet to behave the same). I am thinking about how to test whether a link layer adjustment works or not.

A mistake. I had both chosen. They are mutually exclusive options. 2 days of testing lost. Shall restart.

I will try to fix the AQM scripts to make these two mutually exclusive. That said, the HTB internal implementation does not seem to work at all, so enabling both should be equivalent to just enabling stab. In my quick and dirty testing (using netperf-wrapper, which I got working on macosx 10.8) it looks like activating both actually should work. BTW I am looking for an open netperf server in Europe, anybody any ideas?

I am actually getting better results from htb than tc-stab at present.
Then I will have to test and compare the RRUL performance for stab link layer adjustments (lla), htb-lla, no-lla, and no shaping at all, at 50% of link rate and at say 80% of link rate, and see which performs best. Alas I need a closer netperf 2.6.0 netserver binary than the ones in NY and CA. So far I am failing to find a windows binary I could run on one of the machines in the lab… How do you measure currently? I would love to run the same tests to figure out what is up with the two lla methods.

Pinging servers whilst running Netalyzr has no effect.

Not being a native english speaker, could you be more explicit, please? Was the ping RTT affected by the concurrent netalyzr run (especially up- and download testing)? Did you get netperf-wrapper to work on ubuntu?

You did not understand because I explained what I did, and I did the wrong thing. Not done properly. Will retry. Netperf-wrapper will not compile. I am going to move to a more recent version of Ubuntu.

Interesting, I managed to install it under 64bit Ubuntu 12.04 in a virtual machine, using the packages Toke supplied. I just added http://archive.tohojo.dk/ to Software Sources in Update Manager, then I could use the Synaptic Package Manager to install netperf and netperf-wrapper from Toke's repository; so I guess no need to compile anything. (Under macOS however installing netperf-wrapper was slightly more involved, as the recommended way via pip did not work, so I had to download the netperf-wrapper repository from https://github.com/tohojo/netperf-wrapper, cd into the downloaded directory and issue sudo python2.7 ./setup.py install there, and I had to symlink python2.7 to python2, but after that it also worked.) Just as an illustration of what to expect, please find attached the RRUL results with stab-based AQM and without any AQM; clearly fq_codel improves the ping RTTs a lot, so AQM works.
Alas, I did not repeat this test with shaping enabled but no link layer adjustments or with the HTB link layer adjustments, so I can not really tell whether RRUL is sensitive enough to show the effects of link layer adjustments or not (my bet is on not, as RRUL in my understanding uses large packets while the ATM quantization effects are strongest for small packets). I might try to do this tonight or when I get around to doing it… I would be really curious to see such plots from your setup for comparison.

Will try your suggestion for Ubuntu.

[Attachments: figure_5.png, figure_6_like_5_noAQM.png]

The tone buckets of the phone signal are translated into ATM packets by the DSP in the 2 Wire 2700. I have no idea what this closed source BSD implementation does to the packets before they are sent to CeroWRT. I am using 3.10.2-1, as I cannot get the latest version to install with sysupgrade. I was trying 3.10.5-1

Ah, good, I might try 3.10.6-1 then directly in tftp mode. Does anyone know how much time
Re: [Cerowrt-devel] Fwd: [PATCH] net_sched: restore linklayer atm handling
Hi Dave,

On Aug 15, 2013, at 02:46 , Dave Taht dave.t...@gmail.com wrote: jesper just dropped this set of patches on the netdev list, which I hope address some of the remaining issues on htb. There was some debate the last time this went by on the list, so the word is not in yet.

I know Jesper created the link layer adaptation in the first place and probably has a soft spot in his heart for the HTB implementation thereof, but really he should switch HTB to use the size table instead of the rate table approach. This should allow for backward compatibility and will eliminate one of two ways to achieve basically the same thing. (And to soothe him, it looks pretty much like the stab code borrowed heavily from Jesper's code.)

I note I would really like to get around to profiling the final aqm code in the hope of squeezing more performance out of it. Presently it tops out at about 25/25Mbit. (it's not fq_codel but htb)

One reason I keep harping on this chord is that, should we need to switch to a different qdisc (like hfsc), stab will just work…

best Sebastian

-- Forwarded message --
From: Jesper Dangaard Brouer bro...@redhat.com
Date: Wed, Aug 14, 2013 at 2:47 PM
Subject: [PATCH] net_sched: restore linklayer atm handling
To: David S. Miller da...@davemloft.net, Dave Taht dave.t...@gmail.com, net...@vger.kernel.org
Cc: Jesper Dangaard Brouer bro...@redhat.com, Stephen Hemminger shemmin...@vyatta.com

commit 56b765b79 (htb: improved accuracy at high rates) broke the linklayer atm handling.

    tc class add ... htb rate X ceil Y linklayer atm

The linklayer setting is implemented by modifying the rate table which is send to the kernel. No direct parameter were transferred to the kernel indicating the linklayer setting. The commit 56b765b79 (htb: improved accuracy at high rates) removed the use of the rate table system. To keep compatible with older iproute2 utils, this patch detects the linklayer by parsing the rate table.
It also supports future versions of iproute2 to send this linklayer parameter to the kernel directly. This is done by using the __reserved field in struct tc_ratespec to convey the chosen linklayer option, but only using the lower 4 bits of this field.

Linklayer detection is limited to speeds below 100Mbit/s, because at high rates the rtab gets too inaccurate, so bad that several fields contain the same values, thus resembling the ATM detect. Fields even start to contain 0 time to send, e.g. at 1000Mbit/s sending a 96 byte packet costs 0, thus the rtab has been more broken than we first realized.

Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
---
 include/net/sch_generic.h      |    9 ++++++++-
 include/uapi/linux/pkt_sched.h |   10 +++++++++-
 net/sched/sch_api.c            |   41 ++++++++++++++++++++++++++++++++++++++++
 net/sched/sch_generic.c        |    1 +
 net/sched/sch_htb.c            |   13 +++++++++++++
 5 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6eab633..e5ae0c5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -683,13 +683,19 @@ struct psched_ratecfg {
 	u64	rate_bytes_ps; /* bytes per second */
 	u32	mult;
 	u16	overhead;
+	u8	linklayer;
 	u8	shift;
 };
 
 static inline u64 psched_l2t_ns(const struct psched_ratecfg *r,
 				unsigned int len)
 {
-	return ((u64)(len + r->overhead) * r->mult) >> r->shift;
+	len += r->overhead;
+
+	if (unlikely(r->linklayer == TC_LINKLAYER_ATM))
+		return ((u64)(DIV_ROUND_UP(len,48)*53) * r->mult) >> r->shift;
+
+	return ((u64)len * r->mult) >> r->shift;
 }
 
 extern void psched_ratecfg_precompute(struct psched_ratecfg *r, const struct tc_ratespec *conf);
@@ -700,6 +706,7 @@ static inline void psched_ratecfg_getrate(struct tc_ratespec *res,
 	memset(res, 0, sizeof(*res));
 	res->rate = r->rate_bytes_ps;
 	res->overhead = r->overhead;
+	res->linklayer = (r->linklayer & TC_LINKLAYER_MASK);
 }
 
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index dbd71b0..09d62b9 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -73,9 +73,17 @@ struct tc_estimator {
 #define TC_H_ROOT	(0xFFFFFFFFU)
 #define TC_H_INGRESS    (0xFFFFFFF1U)
 
+/* Need to corrospond to iproute2 tc/tc_core.h "enum link_layer" */
+enum tc_link_layer {
+	TC_LINKLAYER_UNAWARE, /* Indicate unaware old iproute2 util */
+	TC_LINKLAYER_ETHERNET,
+	TC_LINKLAYER_ATM,
+};
+#define TC_LINKLAYER_MASK 0x0F /* limit use to lower 4 bits */
+
 struct tc_ratespec {
 	unsigned char	cell_log;
-	unsigned char	__reserved;
+	__u8		linklayer; /* lower 4 bits */
 	unsigned short	overhead;
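
For readers who want to see what the DIV_ROUND_UP(len,48)*53 expression in the patch does to packet sizes, here is a minimal shell sketch of the same arithmetic. It is an illustration only: atm_wire_bytes is a made-up helper name, not a kernel or tc interface, and the 40 byte overhead in the last call is simply the value used elsewhere in this thread.

```sh
#!/bin/sh
# Sketch only: mirrors the DIV_ROUND_UP(len,48)*53 arithmetic from the patch.
# atm_wire_bytes is an illustrative name, not part of the kernel or of tc.

# atm_wire_bytes LEN [OVERHEAD]
# Prints the number of bytes that occupy the ATM link for a packet of LEN
# bytes plus OVERHEAD bytes of per-packet encapsulation (default 0).
atm_wire_bytes() {
    len=$(( $1 + ${2:-0} ))
    cells=$(( (len + 47) / 48 ))   # round up to whole 48 byte cell payloads
    echo $(( cells * 53 ))         # every cell also carries a 5 byte header
}

atm_wire_bytes 96        # exactly two cells: prints 106
atm_wire_bytes 97        # one byte more spills into a third cell: prints 159
atm_wire_bytes 1500 40   # full-size packet with 40 bytes overhead: prints 1749
```

The jump from 106 to 159 bytes for a single extra payload byte is the padding effect that a shaper unaware of ATM cannot account for.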
Re: [Cerowrt-devel] some kernel updates
Hi Jesper,

On Aug 23, 2013, at 09:27 , Jesper Dangaard Brouer jbro...@redhat.com wrote:

On Thu, 22 Aug 2013 22:13:52 -0700 Dave Taht dave.t...@gmail.com wrote:

On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller moell...@gmx.de wrote:

Hi List, hi Jesper,

So I tested 3.10.9-1 to assess the status of the HTB atm link layer adjustments to see whether the recent changes resurrected this feature. Unfortunately the htb_private link layer adjustments are still broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same as without link layer adjustments). On the bright side the tc_stab method still works as well as before (ping RTT around 40ms). I would like to humbly propose to use the tc stab method in cerowrt to perform ATM link layer adjustments as default. To repeat myself, simply telling the kernel a lie about the packet size seems more robust than fudging HTB's rate tables.

After the (regression) commit 56b765b79 (htb: improved accuracy at high rates), the kernel no longer uses the rate tables.

See, I am quite a layman here; spelunking through the tc and kernel source code made me believe that the rate tables are still used (I might have looked at too old versions of both repositories though).

My commit 8a8e3d84b1719 (net_sched: restore linklayer atm handling) does the ATM cell overhead calculation directly on the packet length, see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53). Thus, the cell calc should actually be more precise now, but see below.

Is there any way to make HTB report which link layer it assumes?

Especially since the kernel already fudges the packet size to account for the ethernet header and then some, so this path should receive more scrutiny by virtue of having more users?

As you mention, the default kernel path (not tc stab) fudges the packet size for Ethernet headers, AND I made a mistake (back in approx 2006, sorry) that the overhead cannot be a negative number.

Mmh, does this also apply to stab?

Meaning that some ATM encap overheads simply cannot be configured correctly (as you need to subtract the ethernet header).

Yes, I see, luckily PPPoA and IPoA seem quite rare, and setting the overhead to be larger than it actually is is relatively benign, as it will overestimate packet size.

(And it's quite problematic to change the kABI to allow for a negative overhead)

Again I have no clue, but overhead seems to be an integer, not unsigned, so why can it not be negative?

Perhaps we should change to use tc stab for this reason. But I'm not sure stab does the right thing either, and its accuracy is also limited as it's actually also table based.

But why should a table be problematic here? As long as we can ensure the table is equal to or larger than the largest packet we are golden. So either we do the manly and stupid thing and go for 9000 byte jumbo packets for the table size. Or we assume that for the most part ATM users will at best use baby jumbo frames (I think BT does this to allow payload MTU 1500 in spite of PPPoE encapsulation overhead), but then we are quite fine with the default size table maxMTU of 2048 bytes, no?

We could easily change the kernel to perform the ATM cell overhead calc inside stab, and we should also fix the GSO packet overhead problem. (for now remember to disable GSO packets when shaping)

Yeah, I stumbled over the fact that the stab mechanism does not honor the kernel's earlier adjustments of packet length (but I seem to be unable to find the actual file and line where this initially is handled). It would seem relatively easy to make stab take the earlier adjustment into account. Regarding GSO, I assumed that GSO will not play nicely with an AQM anyway, as a single large packet will hog too much transfer time...

It's my hope that the atm code works but is misconfigured. You can output the tc commands by overriding the TC variable with TC="echo tc" and paste here.

I also hope it is a misconfig. Please show us the config/script.

Will do this later. I would be delighted if it is just me being stupid.

I would appreciate a link to the scripts you are using... perhaps a git tree?

Unfortunately I have no git tree and no experience with git. I do not think I will be able to set something up quickly. But I use a modified version of cerowrt's AQM scripts which I will post later.

Now, I have been testing this using Dave's most recent cerowrt alpha version with a 3.10.9 kernel on mips hardware, I think this kernel should contain all htb fixes including commit 8a8e3d84b17 (net_sched: restore linklayer atm handling) but am not fully sure.

It does. It has not hit the stable tree yet, but DaveM promised he would pass it along. It does seem Dave Taht has my patch applied: http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore
Re: [Cerowrt-devel] some kernel updates
Hi Dave,

On Aug 23, 2013, at 07:13 , Dave Taht dave.t...@gmail.com wrote:

On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller moell...@gmx.de wrote:

Hi List, hi Jesper,

So I tested 3.10.9-1 to assess the status of the HTB atm link layer adjustments to see whether the recent changes resurrected this feature. Unfortunately the htb_private link layer adjustments are still broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same as without link layer adjustments). On the bright side the tc_stab method still works as well as before (ping RTT around 40ms). I would like to humbly propose to use the tc stab method in cerowrt to perform ATM link layer adjustments as default. To repeat myself, simply telling the kernel a lie about the packet size seems more robust than fudging HTB's rate tables. Especially since the kernel already fudges the packet size to account for the ethernet header and then some, so this path should receive more scrutiny by virtue of having more users?

It's my hope that the atm code works but is misconfigured. You can output the tc commands by overriding the TC variable with TC="echo tc" and paste here.

So I went for TC="logger tc" and used logread to harvest, as I could not find the echo output, but I guess that should not matter.
So here is the result (slightly edited to get rid of the log timestamps and log level):

tc qdisc del dev ge00 root
tc qdisc add dev ge00 root handle 1: htb default 12
tc class add dev ge00 parent 1: classid 1:1 htb quantum 1500 rate 2430kbit ceil 2430kbit mpu 0 linklayer adsl overhead 40 mtu 2047
tc class add dev ge00 parent 1:1 classid 1:10 htb quantum 1500 rate 2430kbit ceil 2430kbit prio 0 mpu 0 linklayer adsl overhead 40 mtu 2047
tc class add dev ge00 parent 1:1 classid 1:11 htb quantum 1500 rate 128kbit ceil 810kbit prio 1 mpu 0 linklayer adsl overhead 40 mtu 2047
tc class add dev ge00 parent 1:1 classid 1:12 htb quantum 1500 rate 405kbit ceil 2366kbit prio 2 mpu 0 linklayer adsl overhead 40 mtu 2047
tc class add dev ge00 parent 1:1 classid 1:13 htb quantum 1500 rate 405kbit ceil 2366kbit prio 3 mpu 0 linklayer adsl overhead 40 mtu 2047
tc qdisc add dev ge00 parent 1:11 handle 110: fq_codel limit 600 noecn quantum 300
tc qdisc add dev ge00 parent 1:12 handle 120: fq_codel limit 600 noecn quantum 300
tc qdisc add dev ge00 parent 1:13 handle 130: fq_codel limit 600 noecn quantum 300
tc filter add dev ge00 parent 1:0 protocol all prio 999 u32 match ip protocol 0 0x00 flowid 1:12
tc filter add dev ge00 parent 1:0 protocol ip prio 1 handle 1 fw classid 1:11
tc filter add dev ge00 parent 1:0 protocol ip prio 2 handle 2 fw classid 1:12
tc filter add dev ge00 parent 1:0 protocol ip prio 3 handle 3 fw classid 1:13
tc filter add dev ge00 parent 1:0 protocol ipv6 prio 4 handle 1 fw classid 1:11
tc filter add dev ge00 parent 1:0 protocol ipv6 prio 5 handle 2 fw classid 1:12
tc filter add dev ge00 parent 1:0 protocol ipv6 prio 6 handle 3 fw classid 1:13
tc filter add dev ge00 parent 1:0 protocol arp prio 7 handle 1 fw classid 1:11
tc qdisc del dev ge00 handle ffff: ingress
tc qdisc add dev ge00 handle ffff: ingress
tc qdisc del dev ifb0 root
tc qdisc add dev ifb0 root handle 1: htb default 12
tc class add dev ifb0 parent 1: classid 1:1 htb quantum 1500 rate 15494kbit ceil 15494kbit
tc class add dev ifb0 parent 1:1 classid 1:10 htb quantum 1500 rate 15494kbit ceil 15494kbit prio 0
tc class add dev ifb0 parent 1:1 classid 1:11 htb quantum 1500 rate 32kbit ceil 5164kbit prio 1
tc class add dev ifb0 parent 1:1 classid 1:12 htb quantum 1500 rate 2582kbit ceil 15430kbit prio 2
tc class add dev ifb0 parent 1:1 classid 1:13 htb quantum 1500 rate 2582kbit ceil 15430kbit prio 3
tc qdisc add dev ifb0 parent 1:11 handle 110: fq_codel limit 1000 ecn quantum 500
tc qdisc add dev ifb0 parent 1:12 handle 120: fq_codel limit 1000 ecn quantum 1500
tc qdisc add dev ifb0 parent 1:13 handle 130: fq_codel limit 1000 ecn quantum 1500
tc filter add dev ifb0 parent 1:0 protocol all prio 999 u32 match ip protocol 0 0x00 flowid 1:12
tc filter add dev ifb0 protocol ip parent 1:0 prio 1 u32 match ip tos 0x00 0xfc classid 1:12
tc filter add dev ifb0 protocol ipv6 parent 1:0 prio 2 u32 match ip6 priority 0x00 0xfc classid 1:12
tc filter add dev ifb0 protocol ip parent 1:0 prio 3 u32 match ip tos 0x20 0xfc classid 1:13
tc filter add dev ifb0 protocol ipv6 parent 1:0 prio 4 u32 match ip6 priority 0x20 0xfc classid 1:13
tc filter add dev ifb0 protocol ip parent 1:0 prio 5 u32 match ip tos 0x10 0xfc classid 1:11
tc filter add dev ifb0 protocol ipv6 parent 1:0 prio 6 u32 match ip6 priority 0x10 0xfc classid 1:11
tc filter add dev ifb0 protocol ip parent 1:0 prio 7 u32 match ip tos 0xb8 0xfc classid 1:11
tc filter add dev ifb0 protocol ipv6 parent 1:0 prio 8 u32 match ip6 priority 0xb8 0xfc classid 1:11
tc filter add dev
Re: [Cerowrt-devel] some kernel updates
Hi Fred,

On Aug 23, 2013, at 15:02 , Fred Stratton fredstrat...@imap.cc wrote:

[snipp]

Thus, in kernels >= 3.9, you would need to change/reduce your tc overhead parameter with -14 bytes (if you accounted for the encapsulated Ethernet header before)

That is what I thought before, but my kernel spelunking made me reconsider and switch to not subtracting the 14 bytes, since as I understand it the kernel actively does not do it if stab is used.

The overhead of stab can be negative, so no problem here, in an int for stab.

Meaning that some ATM encap overheads simply cannot be configured correctly (as you need to subtract the ethernet header).

Yes, I see, luckily PPPoA and IPoA seem quite rare, and setting the overhead to be larger than it actually is is relatively benign, as it will overestimate packet size.

As a point of information, the entire UK uses PPPoA rather than PPPoE, and some hundreds of thousands of users IPoA.

Lucky you! I guess one more reason to switch cerowrt over to stab, since PPPoA with VC/mux just adds 10 bytes of overhead, so if the ethernet header would be accounted for already that would mean overhead -4, which HTB cannot represent anyway. That said, unlike Jesper, I am not sure that tc stab includes the ethernet header by itself currently. Thanks for your input.

[snipp]

Best Sebastian

___
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel
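
The overhead arithmetic in this message can be written out as a small shell sketch. It is an illustration under stated assumptions: shaper_overhead is a hypothetical helper (not part of tc or the AQM scripts), the 10 byte PPPoA VC/mux figure and the 14 byte Ethernet header are the numbers quoted in the thread, and whether a given kernel path already counts the Ethernet header is exactly the open question here, so it is passed in as a flag instead of being assumed.

```sh
#!/bin/sh
# Sketch only: shaper_overhead is a made-up helper, not part of tc or simple.qos.

ETH_HEADER=14   # Ethernet header size in bytes, as quoted in the thread

# shaper_overhead ENCAP_OVERHEAD ETH_ALREADY_COUNTED
#   ENCAP_OVERHEAD       real per-packet overhead on the DSL link, in bytes
#   ETH_ALREADY_COUNTED  1 if the shaper already sees the Ethernet header
#                        in the packet length, 0 if it does not
# Prints the value that would have to be configured as "overhead".
shaper_overhead() {
    if [ "$2" -eq 1 ]; then
        echo $(( $1 - $ETH_HEADER ))
    else
        echo "$1"
    fi
}

shaper_overhead 10 1   # PPPoA VC/mux, Ethernet header already counted: -4
shaper_overhead 10 0   # PPPoA VC/mux, Ethernet header not counted: 10
shaper_overhead 40 1   # 40 byte encapsulation, Ethernet already counted: 26
```

The first call yields the negative value that, according to the thread, HTB's overhead parameter cannot represent while stab's can.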
Re: [Cerowrt-devel] some kernel updates
Hi Jesper, hi List,

On Aug 23, 2013, at 09:27 , Jesper Dangaard Brouer jbro...@redhat.com wrote:

On Thu, 22 Aug 2013 22:13:52 -0700 Dave Taht dave.t...@gmail.com wrote:

On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller moell...@gmx.de wrote:

Hi List, hi Jesper,

So I tested 3.10.9-1 to assess the status of the HTB atm link layer adjustments to see whether the recent changes resurrected this feature. Unfortunately the htb_private link layer adjustments are still broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same as without link layer adjustments). On the bright side the tc_stab method still works as well as before (ping RTT around 40ms). I would like to humbly propose to use the tc stab method in cerowrt to perform ATM link layer adjustments as default. To repeat myself, simply telling the kernel a lie about the packet size seems more robust than fudging HTB's rate tables.

After the (regression) commit 56b765b79 (htb: improved accuracy at high rates), the kernel no longer uses the rate tables. My commit 8a8e3d84b1719 (net_sched: restore linklayer atm handling) does the ATM cell overhead calculation directly on the packet length, see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53). Thus, the cell calc should actually be more precise now, but see below.

Especially since the kernel already fudges the packet size to account for the ethernet header and then some, so this path should receive more scrutiny by virtue of having more users?

As you mention, the default kernel path (not tc stab) fudges the packet size for Ethernet headers, AND I made a mistake (back in approx 2006, sorry) that the overhead cannot be a negative number. Meaning that some ATM encap overheads simply cannot be configured correctly (as you need to subtract the ethernet header). (And it's quite problematic to change the kABI to allow for a negative overhead)

Perhaps we should change to use tc stab for this reason. But I'm not sure stab does the right thing either, and its accuracy is also limited as it's actually also table based. We could easily change the kernel to perform the ATM cell overhead calc inside stab, and we should also fix the GSO packet overhead problem. (for now remember to disable GSO packets when shaping)

It's my hope that the atm code works but is misconfigured. You can output the tc commands by overriding the TC variable with TC="echo tc" and paste here.

I also hope it is a misconfig. Please show us the config/script.

I guess you nailed it. While I got no output whatsoever from echo "func __detect_linklayer +p" > /sys/kernel/debug/dynamic_debug/control, I also followed Dave's advice to dump the tc commands to file (see earlier mail). It turns out that the script only added the HTB link layer adjustments to egress and not to ingress as well; fixing that pushed the ping RTT for HTB's link layer adjustments (at 95% of linerate) down to ~45ms, which is close enough to what stab delivers.

I would appreciate a link to the scripts you are using... perhaps a git tree?

Now, I have been testing this using Dave's most recent cerowrt alpha version with a 3.10.9 kernel on mips hardware, I think this kernel should contain all htb fixes including commit 8a8e3d84b17 (net_sched: restore linklayer atm handling) but am not fully sure.

It does. It has not hit the stable tree yet, but DaveM promised he would pass it along. It does seem Dave Taht has my patch applied: http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch

While I am not able to build kernels, it seems that I am able to quickly test whether link layer adjustments work or not. So I am happy to help where I can :)

So, what is your lab setup that allows you to test this quickly?

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
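
The finding in this message (link layer options present on egress but missing on ingress) is easy to spot with the dry-run trick Dave suggested. Below is a minimal sketch of that pattern, not the actual simple.qos code: add_root_class is a hypothetical helper, while the device names, rates and the link layer string are copied from the command dump posted earlier in the thread.

```sh
#!/bin/sh
# Sketch only: prints tc commands instead of executing them.
# add_root_class is a hypothetical helper, not part of the cerowrt AQM scripts.

TC="echo tc"    # dry run; use TC=tc to actually apply the commands
ADSLL="mpu 0 linklayer adsl overhead 40 mtu 2047"   # link layer options from the dump

# add_root_class DEV RATE_KBIT
# Emits the HTB root class for DEV with the link layer options attached.
add_root_class() {
    $TC class add dev "$1" parent 1: classid 1:1 htb quantum 1500 \
        rate "${2}kbit" ceil "${2}kbit" $ADSLL
}

add_root_class ge00 2430    # egress (uplink)
add_root_class ifb0 15494   # ingress (downlink), the direction that lacked $ADSLL
```

Comparing the two printed lines makes it obvious whether both directions carry the same linklayer and overhead settings before anything is applied to a live interface.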
Re: [Cerowrt-devel] some kernel updates
Hi Dave,

I guess I found the culprit: once I added $ADSLL to the ingress() in simple.qos:

ingress() {

CEIL=$DOWNLINK
PRIO_RATE=`expr $CEIL / 3` # Ceiling for priority
BE_RATE=`expr $CEIL / 6`   # Min for best effort
BK_RATE=`expr $CEIL / 6`   # Min for background
BE_CEIL=`expr $CEIL - 64`  # A little slop at the top

LQ="quantum `get_mtu $IFACE`"

$TC qdisc del dev $IFACE handle ffff: ingress 2> /dev/null
$TC qdisc add dev $IFACE handle ffff: ingress

$TC qdisc del dev $DEV root 2> /dev/null
$TC qdisc add dev $DEV root handle 1: ${STABSTRING} htb default 12
$TC class add dev $DEV parent 1: classid 1:1 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit $ADSLL
$TC class add dev $DEV parent 1:1 classid 1:10 htb $LQ rate ${CEIL}kbit ceil ${CEIL}kbit prio 0 $ADSLL
$TC class add dev $DEV parent 1:1 classid 1:11 htb $LQ rate 32kbit ceil ${PRIO_RATE}kbit prio 1 $ADSLL
$TC class add dev $DEV parent 1:1 classid 1:12 htb $LQ rate ${BE_RATE}kbit ceil ${BE_CEIL}kbit prio 2 $ADSLL
$TC class add dev $DEV parent 1:1 classid 1:13 htb $LQ rate ${BK_RATE}kbit ceil ${BE_CEIL}kbit prio 3 $ADSLL

# I'd prefer to use a pre-nat filter but that causes permutation...

$TC qdisc add dev $DEV parent 1:11 handle 110: $QDISC limit 1000 $ECN `get_quantum 500` `get_flows ${PRIO_RATE}`
$TC qdisc add dev $DEV parent 1:12 handle 120: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BE_RATE}`
$TC qdisc add dev $DEV parent 1:13 handle 130: $QDISC limit 1000 $ECN `get_quantum 1500` `get_flows ${BK_RATE}`

diffserv $DEV

ifconfig $DEV up

# redirect all IP packets arriving in $IFACE to ifb0

$TC filter add dev $IFACE parent ffff: protocol all prio 10 u32 \
  match u32 0 0 flowid 1:1 action mirred egress redirect dev $DEV

}

I get basically the same RRUL ping RTTs for htb_private as for tc_stab. So Jesper was right, the patch seems to fix the issue. I guess I should send out my current version of your and Toke's AQM scripts soon.

Best Sebastian

P.S.: I am not sure whether I want to tackle the PIE issue today...
On Aug 23, 2013, at 21:47 , Dave Taht dave.t...@gmail.com wrote:

quick note: running this script requires that you ifconfig ifb0 up at some point.

In my case on cerowrt you took care of that already...
Re: [Cerowrt-devel] some kernel updates
Hi Toke,

I guess I should have been clearer in stating that you are the author of the AQM scripts.

On Aug 23, 2013, at 19:23 , Toke Høiland-Jørgensen t...@toke.dk wrote:

Sebastian Moeller moell...@gmx.de writes:

Well, partly the option for HTB was already in his script but under-tested; I changed the script to add stab and to allow easier configuration of overhead, mpu, mtu and tsize (just for stab) from the GUI, but the code is Dave's. I attached the scripts. functions.sh gets the values from the configuration GUI. I extended the way the linklayer option strings are created, but basically it is the same method that Dave used. And I do see the right overhead values appear in tc -d qdisc, so at least something is reaching HTB. Sorry that I have no repository for easier access.

The repository containing the cerowrt-specific packages is at https://github.com/dtaht/ceropackages-3.3 -- the AQM script specifically is here: https://github.com/dtaht/ceropackages-3.3/tree/master/net/aqm-scripts

With the gui at: https://github.com/dtaht/ceropackages-3.3/tree/master/luci/luci-app-aqm

This is quite helpful, only Jesper would need access to my modified scripts, which are not in the repository (yet).

Best Sebastian

-Toke
Re: [Cerowrt-devel] some kernel updates
Hi Dave,

On Aug 23, 2013, at 22:29 , Dave Taht dave.t...@gmail.com wrote:

On Fri, Aug 23, 2013 at 12:56 PM, Sebastian Moeller moell...@gmx.de wrote:

Hi Dave, I guess I found the culprit: once I added $ADSLL to the ingress() in simple.qos:

I had that in there originally. I ripped it out because it seemed to help with ADSL at the time - as I was unaware of the extent to which the whole subsystem was busted!

Ah, and I had added my stab based version to both ingress() and egress(), assuming that both links need to be kept under control. So with fixed htb link layer adjustment (LLA) it only worked on the uplink, and in retrospect if I look at my initial test data I actually see one of the hallmarks of a working LLA for the upstream. (The upstream good-put was reduced compared to the no LLA test, caused by LLA making the actually sent packets larger so fewer packets fit through the shaped link.) But since I was not expecting only half a working system I overlooked that in the data. But looking at the latency of the ping RTT probes it becomes quite clear that only doing link layer adjustments on the uplink is even worse than not doing it at all (because the latency is still almost as bad as without LLA but the up-link bandwidth is reduced).

I like to think of the process we've just gone through as "wow, we just fixed the uk, and a few other countries". :) Feels kind of good, doesn't it? (Too bad the pay sucks.)

Oh, I cannot complain about pay, I have a day job in a totally different field, so this is more of a hobby for me :)

I mean, jeeze, chopping another 30+ms off the latency of that many systems should get medals from economists worldwide monitoring productivity.

Does anyone have a date/kernel version on when linklayer overhead compensation stopped working? There was a bug even prior to 3.8 that looked bad. (and RED was busted for 3 years).

Another step would be trying to improve openwrt's native qos system somewhat in the DSL case. They don't use this subsystem (probably because it didn't work), and it's also broken on ipv6. (They use conntrack)

Oh, in the bql-40 time frame I hacked the stab based LLA into their generate.sh and it worked quite well, even though at the time my measurements were quite crude. Since their qos scripts are HFSC based, the HTB private implementation is not going to do them any good. Luckily that does not seem to matter now, as both methods perform identically, as they should. (Well, Jesper's last changes are nicer than the old table lookup, but it should be relatively easy to implement the same for stab; heck, once I get my linux machine up I might take this as my first attempt at making local changes to the kernel :) ). So adding it to openwrt proper should be a piece of cake. Do you know by any chance who would be the best person to contact for that?

At some point I'd like to have a mechanism for saner diffserv classification on egress, and to clamp ingress values to egress ones. There is a ton of work going on on finding sane codepoints on webrtc in the ietf….
Re: [Cerowrt-devel] some kernel updates
Hi Dave,

so I got around to doing the PIE tests...

On Aug 23, 2013, at 07:13 , Dave Taht dave.t...@gmail.com wrote:

On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller moell...@gmx.de wrote:

Hi List, hi Jesper,

So I tested 3.10.9-1 to assess the status of the HTB atm link layer adjustments to see whether the recent changes resurrected this feature. Unfortunately the htb_private link layer adjustments are still broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same as without link layer adjustments). On the bright side the tc_stab method still works as well as before (ping RTT around 40ms). I would like to humbly propose to use the tc stab method in cerowrt to perform ATM link layer adjustments as default. To repeat myself, simply telling the kernel a lie about the packet size seems more robust than fudging HTB's rate tables. Especially since the kernel already fudges the packet size to account for the ethernet header and then some, so this path should receive more scrutiny by virtue of having more users?

It's my hope that the atm code works but is misconfigured. You can output the tc commands by overriding the TC variable with TC="echo tc" and paste here.

Now, I have been testing this using Dave's most recent cerowrt alpha version with a 3.10.9 kernel on mips hardware, I think this kernel should contain all htb fixes including commit 8a8e3d84b17 (net_sched: restore linklayer atm handling) but am not fully sure.

It does.

@Dave is there an easy way to find which patches you applied to the kernels of the cerowrt (testing-)releases?

Normally I DO commit stuff that is in testing, but my big push this time around was to get everything important into mainline 3.10, as it will be the stable release for a good long time. So I am still mostly working the x86 side at the moment. I WAS kind of hoping that everything I just landed would make it up to 3.10.

But for your perusal: http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/ has most of the kernel patches I used in it. 3.10.9-2 has the ipv6subtrees patch ripped out due to another weird bug I'm looking at. (It also has support for ipv6 nat thx to the ever prolific Stephen Walker heeding the call for patches...). 100% totally untested, I have this weird bug to figure out how to fix next: http://lists.alioth.debian.org/pipermail/babel-users/2013-August/001419.html I fear it's a comparison gone south, maybe in bradley's optimizations for not kernel trapping, don't know.

3.10.9-2 also disables dnsmasq's dhcpv6 in favor of 6relayd. I HATE losing the close naming integration, but had to try this.

If you guys want me to start committing and pushing patches again, I'll do it, but most of that stuff will end up in 3.10.10, I think, in a couple days. The rest might make 3.12. Pie has to survive scrutiny on the netdev list in particular.

While I have your attention :) I also tested 3.10.9-1's pie and it is way better than 3.10.6-1's (RRUL ping RTTs around 110 ms instead of 3000ms) but still worse than fq_codel (ping RTTs around 40ms with proper atm link layer adjustments).

This is with simple.qos I imagine? Simplest should do better than that with pie. Judging from how its estimator works I think it will do badly with multiple queues. But testing will tell...

But, yea, this pie is actually usable, and the previous wasn't. Thank you for looking at it!

It is different from cisco's last pie drop in that it can do ecn, does local congestion notification, has a better use of net_random, it's mostly KernelStyle, and I forget what else. There is still a major rounding error in the code, and I'd like cisco to fix the api so it uses identical syntax to codel. Right now you specify target 8 to get target 7, and the ms is implied. target 5 becomes target 3. This is as confusing as it is funny….

The default target is a whopping 20 (rounded to 19), which is in part where your 70+ms of extra delay came from. Multiple parties have the delusion that 20ms is good enough.

It certainly is better than nothing, if the hardware does not allow codel…, but it is not like changing target has that big an effect on RRUL ping RTT:

AQM   nominal target [ms]   estimated ping RTT [ms]   down avg good-put [Mbits/s]   up avg good-put [Mbits/s]
pie   20                    110                       2.8                           0.4
pie   8                     100                       2.7                           0.41
pie   5                     90                        2.7                           0.4

so the target does
Re: [Cerowrt-devel] some kernel updates
-c 100 -s 16 your.best.host.ip). Also test whether the RTTs are in the same ballpark when you reduce the ping period to 0.01 (you might have to increase the period until the RTTs are close to the standard 1 ping per second case). I can then run this through my matlab code to detect the actual overhead. (I am happy to share the code as well, if you have matlab available; it might even run under octave but I have not tested that since the last major changes.)

Whatever byte value is used for tc-stab makes no change.

I assume you are talking about the overhead? Missing link layer adjustment will eat between 50% and 10% of your link bandwidth, while missing overhead values will be more benign. The only advice I can give is to pick the overhead that actually describes your link. I am willing to help you figure this out.

I have applied the ingress modification to simple.qos, keeping the original version, and tested both.

For which cerowrt version? It is only expected to do something for 3.10.9-1 and upwards; before that the HTB linklayer adjustment did NOT work.

I have changed the Powerline adaptors I use to ones with known smaller buffers, though this is unlikely to be a rate-limiting step. I have changed the 2Wire gateway, known to be heavily buffered, with a bridged Huawei HG612, with a Broadcom 6368 SoC. This device has a permanently on telnet interface, with a simple password, which cannot be changed other than by firmware recompilation… Telnet, however, allows txqueuelen to be reduced from 1000 to 0. None of these changes affect the problematic uplink delay.

So how did you measure the uplink delay? The RRUL plots you sent me show an increase in ping RTT from around 50ms to 80ms with tc_stab and fq_codel on simplest.qos; how does that reconcile with 650ms uplink delay, netalyzr?
On 24 Aug 2013, at 21:51, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, On Aug 23, 2013, at 22:29 , Dave Taht dave.t...@gmail.com wrote: On Fri, Aug 23, 2013 at 12:56 PM, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, I guess I found the culprit: once I added $ADSLL to the ingress() in simple.qos: I had that in there originally. I ripped it out because it seemed to help with ADSL at the time - as I was unaware the extent that the whole subsystem was busted! Ah, and I had added my stab based version to both ingress() and egress() assuming that both links need to be kept under control. So with fixed htb link layer adjustment (LLA) it only worked on the uplink and in retrospect if I look at my initial test data I actually see one of the hallmarks of a working LLA for the upstream. (The upstream good-put was reduced compared to the no LLA test, caused by LLA making the actually sent packets larger so fewer packets fit through the shaped link). But since I was not expecting only half a working system I overlooked that in the data. But looking at the latency of the ping RTT probes it becomes quite clear that only doing link layer adjustments on the uplink is even worse than not doing it all (because the latency is still almost as bad as without LLA but the up-link bandwidth is reduced). I like to think of the process we've just gone through as wow, we just fixed the uk, and a few other countries. :) Feels kind of good, doesn't it? (Too bad the pay sucks.) Oh, I can not complain about pay, I have a day job in totally different field, so this is more of a hobby for me :) I mean, jeeze, chopping another 30+ms off the latency of that many systems should get medals from economists worldwide monitoring productivity. Does anyone have a date/kernel version on when linklayer overhead compensation stopped working? There was a bug even prior to 3.8 that looked bad. (and RED was busted for 3 years). 
Another step would be trying to improve openwrt's native qos system somewhat in the DSL case. They don't use this subsystem (probably because it didn't work), and it's also broken on ipv6. (They use conntrack)

Oh, in the bql-40 time frame I hacked the stab based LLA into their generate.sh and it worked quite well, even though at the time my measurements were quite crude. Since their qos scripts are HFSC based the HTB private implementation is not going to do them any good. Luckily that no longer seems to matter, as both methods now perform identically, as they should. (Well, now Jesper's last changes are nicer than the old table lookup, but it should be relatively easy to implement the same for stab, heck once I got my linux machine up I might take this as my first attempt at making local changes to the kernel :) ). So adding it to openwrt proper should be a piece of cake. Do you know by any chance who would be the best person to contact for that?

At some point I'd like to have a mechanism for saner diffserv classification on egress
Re: [Cerowrt-devel] some kernel updates
Hi Fred, On Aug 25, 2013, at 16:26 , Fred Stratton fredstrat...@imap.cc wrote: Thank you. This is an initial response. Am using 3.10.2-1 currently, with the standard AQM interface. This does not have the pull down menu of your interface, which is why I ask if both are active.

I have seen your follow-up mail that you actually used 3.10.9-2. I think that has the first cut of the script modifications that still allow selecting both. Since I have not tested it any other way I would recommend enabling just one of them at a time. Since the implementation of both is somewhat orthogonal and htb_private actually works in 3.10.9, best case you might actually get the link layer adjustments (LLA) and the overhead applied twice, wasting bandwidth. So please either use the last set of modified files I sent around or wait for Dave to include them in ceropackages...

On 25 Aug 2013, at 14:59, Sebastian Moeller moell...@gmx.de wrote: Hi Fred, On Aug 25, 2013, at 12:17 , Fred Stratton fredstrat...@imap.cc wrote: On 25 Aug 2013, at 10:21, Fred Stratton fredstrat...@imap.cc wrote: As the person with the most flaky ADSL link, I point out that none of these recent, welcome changes are having any effect here, with an uplink speed of circa 950 kbits/s.

Okay, how flaky is your link? What rate of errors do you have while testing? I am especially interested in CRC errors and ES, SES and HEC, just to get an idea how flaky the line is...

The reason I mention this is that it is still impossible to watch iPlayer Flash streaming video and download at the same time. The iPlayer stream fails. The point of the exercise was to achieve this. The uplink delay is consistently around 650ms, which appears to be too high for effective streaming. In addition, the uplink stream has multiple breaks, presumably outages, if the uplink rate is capped at, say, 700 kbits/s.
Well, watching video is going to stress your downlink, so the uplink should not be saturated by the ACKs, and the concurrent downloads also do not stress your uplink except for the ACKs, so this points to downlink errors as far as I can tell from the data you have given. If the uplink has repeated outages however, your problems might be unfixable because these, if long enough, will cause lost ACKs and will probably trigger retransmission, independent of whether the link layer adjustments work or not. (You could test this by shaping your up- and downlink to ≤ 50% of the link rates and disabling all link layer adjustments; 50% is larger than the ATM worst case so should have you covered. Well, unless your DSL link has an excessive number of tones reserved for forward error correction (FEC)).

Uptime 100655 downstream 12162 kbits/s CRC errors 10154 FEC Errors 464 HEC Errors 758 upstream 1122 kbits/s no errors in period.

Ah, I think you told me in the past that

Target snr upped to 12 deciBel. Line can sustain 10 megabits/s with repeated loss of sync at lower snr.

so sync at 12162 might be too aggressive, no? But the point is that, as I understand, iPlayer works fine without competing download traffic? To my eye the error numbers look small enough to not be concerned about. Do you know how long the error correction period is? Could you perform the following test by any chance: start iPlayer and your typical downloads and then have a look at http://gw.home.lan:81 and the following tab chain: Status - Realtime Graphs - Traffic - Realtime Traffic. If during your test the Outbound rate stays well below your shaped limit and you still encounter the stream failure, I would say it is safe to rule out the link layer adjustments as the cause of your issues.

Am happy reducing rate to fifty per cent, but the uplink appears to have difficulty operating below circa 500 kbits/s. This should not be so. I shall try a fourth time.
That sounds weird: if you shape to below 500, does upload stop working or does it just get choppier? Looking at your sync data 561 would fit the ~50% and above 500 requirements.

YouTube has no problems. I remain unclear whether the use of tc-stab and htb are mutually exclusive options, using the present stock interface.

Well, depending on the version of the cerowrt you use, 3.10.9-1 I believe lacks a functional HTB link layer adjustment mechanism, so you should select tc_stab. My most recent modifications to Toke and Dave's AQM package only allow you to select one or the other. In any case selecting BOTH is not a reasonable thing to do, because best case it will only apply overhead twice, worst case it would also do the link layer adjustments (LLA) twice.

See initial comments. The current ISP connection is IPoA LLC. Correction - Bridged LLC.

Well, I think you should try to figure out your overhead empirically and check the encapsulation. I would recommend you run
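
The 50% test rates follow directly from the reported sync rates; as a quick sanity check, here is a small Python sketch of that arithmetic. The sync values are the ones Fred posted (12162 and 1122 kbits/s); the helper name and the dictionary layout are only illustrative.

```python
def half_rate_kbit(sync_kbit):
    """Shaped rate for the 50% test: half the sync rate, in whole kbit/s."""
    return sync_kbit // 2


# Sync rates as reported by the modem (kbits/s).
sync = {"downstream": 12162, "upstream": 1122}

shaped = {direction: half_rate_kbit(rate) for direction, rate in sync.items()}
print(shaped)  # {'downstream': 6081, 'upstream': 561}

# The uplink reportedly misbehaves below circa 500 kbits/s,
# so check that the 50% value stays above that.
print(shaped["upstream"] > 500)  # True
```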
Re: [Cerowrt-devel] some kernel updates
Hi Fred, since you have a very good test with iPlayer, you can simply repeat your experiments to figure out where shaping becomes too unstable for iPlayer. It would be interesting to see whether RRUL (or other netperf-wrapper tests) show qualitative differences around the same numerical shaping values.

On Aug 25, 2013, at 21:31 , Fred Stratton fredstrat...@imap.cc wrote: Re-reading your comment, I have reset the upload rate higher to 900 kbits/s. On 25 Aug 2013, at 20:08, Fred Stratton fredstrat...@imap.cc wrote: That is very helpful. With a sync rate of about 12000 kbits/s, and a download rate of about 10900 kbits/s, I have set the download rate to 5000 kbits/s. For upload similarly 1200/970/500, all kbits/s. I can now mostly watch video in iPlayer and download at circa 300 - 400 kbits/s simultaneously, using htb, with tc-stab disabled. QED

So slowly increase both shaped rates until iPlayer becomes unhappy to better define the threshold? Best Sebastian

On 25 Aug 2013, at 19:41, Dave Taht dave.t...@gmail.com wrote: So it sounds like you need a lower setting for the download than what you are using? It's not the upload that is your problem. Netalyzr sends one packet stream and thus measures 1 queue only. fq_codel will happily give it one big queue for a while, while still interleaving other flows' packets into the stream at every opportunity. As for parsing rrul I generally draw a line with my hand and multiply by 4, then fudge in the numbers for the reverse ack and measurement streams.

You are saying that you judge the result solely by eye, presumably.

As written it was targeted at 4Mbit and up, which is why the samples are discontinuous in your much lower bandwidth situation.

Aha. Problem solved.

I do agree that rrul could use a simpler implementation, perhaps one that tested two download streams only, provided an estimate as to the actual bandwidth usage, and scaled better below 4Mbit.
On Sun, Aug 25, 2013 at 11:30 AM, Fred Stratton fredstrat...@imap.cc wrote: On 25 Aug 2013, at 18:53, Sebastian Moeller moell...@gmx.de wrote: So please either use the last set of modified files I sent around or wait for Dave to include them in ceropackages… I have retained the unmodified script. I shall return to that.
Re: [Cerowrt-devel] 3.10.10-1 development build released
Hi Dave, so I went for the shiny 3.10.11-2, worked great (using Fred's mtd -r method, thanks Fred)

On Sep 10, 2013, at 02:28 , Dave Taht dave.t...@gmail.com wrote:

+ readlink fix (hopefully fixes sysupgrade)

I guess this will be testable at the next version update...

+ usual merge with openwrt head (tons of ath9k changes)

Oh, as if you knew that I had a number of: ath: phy1: Failed to stop TX DMA, queues= lines in dmesg. Quick testing did not allow me to reproduce those with 3.10.11-2, but I will need to test further...

+ dnsmasq 2.67test10
+ ipv6subtrees back in
+ the final htb atm patches

So I tested tc_stab and htb_private from the AQM tab, both work equally well.

+ eliminated maxpacket check in codel
- did not fold in edumazet's new fq code
- 100% totally untested. May a braver soul than I give it a shot. I won't be near a cero box til thursday, otherwise. http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.10.10-1/
- I'm not sure if I got the last of the aqm gui patches in there or not…

I think so, at least it works :)

... Anyway... I had hopes to get a stable release out in august. I AM very happy about the major stuff that got fixed, instead... but... Since we didn't... I now have a ton of other matters piled up. Not least of which is a pending trip to england and the eu.

Have a great trip.

So for the next month I don't see how I'm going to be able to put more than a day a week into cerowrt. Tops. So I have tagged up this release and pushed all the baked portions of the sources to github.

Thanks a lot.

I'm still a little dubious of the ipv6 subtrees bit….

RRUL testing against Toke's server shows great results; local rrul testing between an osx 10.8.4 machine on sw10 and a netserver running on a linux x86_64 3.10.1 machine on se00 is quite bad though (I assume I now run into the wifi issues on the macbook or the router, as this is the first time I test against a machine with considerably larger bandwidth than the wlan).
The rrul plots still are quite interesting, as I could nicely see anticorrelation between up and down bandwidth (shared medium). If I get round to it I would like to re-enable fq_codel on all interfaces (now it is just running at ge00/ifb0) to see whether this can ameliorate the issue at least a bit. Note, I enabled the log for /usr/sbin/debloat (by editing /etc/hotplug.d/iface/00-debloat) and got the following:

root@nacktmulle:~# cat /tmp/debloat.log
fq_codel_ll
fq_codel_ll
fq_codel_ll
fq_codel_ll
root@nacktmulle:~# cat /tmp/debloat2.log
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] }
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
       tc [-force] -batch filename
where  OBJECT := { qdisc | class | filter | action | monitor }
       OPTIONS := { -s[tatistics] | -d[etails] |
Re: [Cerowrt-devel] 3.10.11-2 development build debloat bug
Hi Dave, On Sep 13, 2013, at 20:03 , Dave Taht dave.t...@gmail.com wrote: I have pushed out a 3.10.11-3 that has the encapsulation fixes for fred, and the fix for debloat. (It is otherwise untested, as is seemingly growing more usual for me)

Thanks a lot. I will try to test it soon, I hope, and report back :) .

Isolating wifi problems is very hard. The first step is finding and eliminating other sources of interference on the channels you are on or migrating to a different channel. There are multiple halfway decent scanning tools, a couple referenced here: https://plus.google.com/u/0/107942175615993706558/posts/PHPR7uL89Sq

Yeah, I had checked that earlier and had one other person on 36, so I switched to 44 (leaving 42 alone to avoid running into the other wlan's potential upper 20MHz band (36 is the lowest in Germany)), and I see no other SSID on 44 or 48, neither close to the router nor close to the laptop's usual position, so I think I can rule out the hidden node problem. There might be non-wifi interference sources though. But the funny thing is the same laptop works well to access the internet (probably due to the low bandwidth involved), so RRUL against a remote target always looks good, only against local targets it looks rather odd… It turns out this is from the MacBook's wifi sending without any inhibition, simply steamrolling the router. (I confirmed that hypothesis by shaping the MacBook's uplink… I guess wireless is quite weird...)

On the 5ghz spectrum you usually have more channels available, so nothing as fancy and graphical is needed (IMHO, but I'm a command line guy), so a simple iwlist gw11 scanning will show the ones in use, and then you can often find a clear channel from the approved list.
Ah, looks like a good method when running linux, I will remember that; currently my linux machine has no wifi adapter but is hooked up over traditional 1Gb ethernet to one of the router's switch ports.

I note that the 5ghz radio in cero is set to HT40+ - so being on channel 36 bleeds over onto 40. Some data indicates that competing with another AP on HT20 channel 40 (or some other competing set of channels) can be very bad. So you should try to find a HT40+ clear set of channels that are legal for your country, or go back to HT20 if you can't find a safe pair to use. http://en.wikipedia.org/wiki/List_of_WLAN_channels

Thanks a lot, that is good confirmation that the steps I undertook were not unreasonable ;). I guess it would be nice if wireless was not operating half duplex… ( http://sing.stanford.edu/fullduplex/ ) Best Sebastian

On Fri, Sep 13, 2013 at 3:01 AM, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, On Sep 12, 2013, at 06:18 , Dave Taht dave.t...@gmail.com wrote: Well, actually, I don't know when the syntax changed, but now the -b option needs a - for reading from standard input. Boy this file is getting crufty...

cero2@snapon:~/src/ceropackages-3.3/net/debloat/files$ git diff debloat
diff --git a/net/debloat/files/debloat b/net/debloat/files/debloat
index e675008..d1cf939 100755
--- a/net/debloat/files/debloat
+++ b/net/debloat/files/debloat
@@ -29,7 +29,7 @@ params = { MDISC, BIGDISC, NORMDISC, BINS, MAX_HWQ_BY
 -- Useful defaults
 env = {
 [TC] = /sbin/tc,
- [TCARG] = -b,
+ [TCARG] = -b -,
 [INSMOD] = /sbin/modprobe,
 [ETHTOOL] = /sbin/ethtool,
 [LSMOD] = /sbin/lsmod,
(END)

Thanks, that fixed the non-ge00 interfaces. As to the abysmal performance with macosx over 5GHz wlan, that still is there, but I suspect the macbook to be the culprit here (plus the wlan connection is somewhere on the edge between changing transmit rates so might be a moving target).
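
For anyone wanting to check the "-b -" behaviour outside of the debloat script, here is a minimal Python sketch of feeding a batch to tc on standard input. The tc path and the "-b -" arguments are the ones from the diff above; the helper names, the dry-run default, and the sample fq_codel command on se00 are illustrative assumptions, not what debloat itself runs. Batch lines are written without the leading "tc".

```python
import subprocess

TC = "/sbin/tc"        # path as listed in the env table of the diff
TC_ARGS = ["-b", "-"]  # "-" makes tc read the batch from standard input


def build_batch(commands):
    """Join tc sub-commands (without the leading 'tc') into batch text."""
    return "".join(cmd.strip() + "\n" for cmd in commands)


def run_batch(commands, dry_run=True):
    """Return (argv, batch); only call tc when dry_run is False."""
    argv = [TC, *TC_ARGS]
    batch = build_batch(commands)
    if not dry_run:
        # Needs root and a tc binary at the path above.
        subprocess.run(argv, input=batch, text=True, check=True)
    return argv, batch


# Hypothetical example command; adjust interface and qdisc to taste.
argv, batch = run_batch(["qdisc replace dev se00 root fq_codel"])
print(argv)
print(batch, end="")
```

With dry_run left at True nothing is changed on the box, so the sketch can be used to inspect what would be piped to tc before running it for real.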
Many thanks Sebastian
Re: [Cerowrt-devel] TSO sizing and FQ scheduler
Hi Maciej, On Nov 5, 2013, at 14:22 , Maciej Soltysiak mac...@soltysiak.com wrote: Hi list, 3.12 landed with TSO sizing and FQ scheduler. Is there significant benefit of trying to port these to Cero's 3.10 ?

According to Eric Dumazet, these two help for flows terminating on the device in question, not for flows just passing through the device. So unless your cerowrt router offers lots of network services it most likely will not profit from these features… Also, IIRC, we disable TSO on cerowrt by default (though TSO sizing might mean that this decision could be revisited).

I'm assuming we're not going head on to 3.12 for kernel base for cero as of yet?

I would hope for 3.10 to be the kernel for the foreseeable future, due to its promised 2 years? maintenance window. Best Sebastian

Best regards, Maciej

___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
Re: [Cerowrt-devel] CeroWrt 3.10.18-1 Field Report
Hi Richard, On Nov 12, 2013, at 06:47 , Richard E. Brown richb.hano...@gmail.com wrote: I used the sysupgrade process to upgrade my primary router from 3.7.5-2 firmware to 3.10.18-1.

- I initially goofed, and installed the wrong build firmware (I installed the WNDR3800 image on a WNDR3700v2 router.) The symptoms were that the router worked, but not very well. Speedtest gave miserable speeds; netalyzr didn’t work at all. (It said there were serious problems: see http://n3.netalyzr.icsi.berkeley.edu/summary/id=36ea240d-26536-45539c09-7334-456b-b81a ) I was able to download the proper system upgrade firmware, but it took forever. Don’t do it :-)
- After installing the proper image (for WNDR3700v2), PPPoE didn’t immediately come up on my 7000/768kbps ADSL from Fairpoint. I had to go to the Edit page for the ge00 interface, and click Apply (without making any changes to the saved settings). This caused the link to come right up.
- The henet 6in4 tunnel did not work. The router received the expected global IPv6 address, and handed an IPv6 global address to my notebook, but neither the router nor the notebook were able to ping ipv6.google.com. I removed that interface from the configs using the GUI.
- Had to enable and set AQM parameters, since they’re saved differently from the QoS settings in the 3.7.5-2 firmware. Set parameters to ~ 90% of link speeds

Just curious, did you specify overhead and encapsulation?

- The kernel.log shows lots of the stack traces below: 2-5 per second on a long-term basis.

These look quite weird, the error is a slow path warning from hfsc_schedule_watchdog. But hfsc is the queuing discipline used by stock OpenWrt; cerowrt, so far, has only used HTB (last I checked was cerowrt 3.10.11-3). So my guess is that you were running the default QOS system instead of (or worse, in addition to) cerowrt's.
It would be great to see the output of: tc -d qdisc ; tc -s class show dev ifb0 ; tc -s class show dev ge00 to check what is up with the AQM system... Did you by any chance use the QOS tab in 3.7.5-2 instead of running AQM or simple_qos.sh from rc.local/ifup? If so, did you direct sysupgrade to keep the old configuration files?

- This may be related to the netalyzr test - after netalyzr completed a run that complained that nothing worked (see above), these errors stopped for a while.
- However, using NetalyzrCLI.jar, I got the following results where most everything worked: http://n2.netalyzr.icsi.berkeley.edu/summary/id=43ca208a-24217-0cc69e65-e649-4be6-b2c5
- The PPPoE running on ge00 link seemed to bounce every 10-15 minutes, and I often had to bring it up manually.
- Reverting to 3.7.5-2.

[ 992.386718] [ cut here ]
[ 992.390625] WARNING: at net/sched/sch_hfsc.c:1428 hfsc_dequeue+0x258/0x49c [sch_hfsc]()
[ 992.398437] Modules linked in: ifb ath9k iptable_nat ath9k_common pppoe nf_nat_ipv4 nf_conntrack_ipv4 mac80211 cfg80211 ath9k_hw xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_policy xt_pkttype xt_physdev xt_owner xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_hashlimit xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY ts_kmp ts_fsm ts_bm pptp pppox ppp_async nf_nat_irc nf_nat_ftp nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp libcrc32c iptable_raw iptable_mangle iptable_filter ipt_ah ipt_REJECT ipt_MASQUERADE ipt_ECN ip_tables crc_ccitt compat ath sch_teql sch_tbf sch_sfq sch_red sch_qfq sch_prio sch_pie sch_ns2_codel sch_nfq_codel sch_netem sch_htb sch_gred sch_efq_codel sch_dsmark sch_codel em_text em_nbyte em_meta em_cmp cls_basic act_police act_ipt act_connmark act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress xt_set ip_set_list_set ip_set_hash_netport
ip_set_hash_netiface ip_set_hash_net ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_NPT ip6t_MASQUERADE ip6table_nat nf_nat_ipv6 nf_nat ip6t_REJECT ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 pppoatm ppp_generic slhc ip_gre gre sit ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 ip6_tunnel tunnel6 tunnel4 ip_tunnel tun tcp_ledbat af_key xfrm_user xfrm_ipcomp xfrm_algo vfat fat autofs4 br2684 atm nls_utf8 nls_iso8859_2 nls_iso8859_15 nls_iso8859_13 nls_iso8859_1 nls_cp437 ipv6 chainiv eseqiv crypto_wq sha1_generic krng rng md5 hmac des_generic deflate zlib_inflate zlib_deflate cbc authenc aead arc4 crypto_blkcipher usb_storage input_polldev leds_gpio ohci_hcd
Re: [Cerowrt-devel] CeroWrt 3.10.18-1 Field Report
Hi All, it turns out that not being able/willing to read can make you do busy work. It seems I forgot to add firmware as device to my mtd invocation… I guess I would never have tried the GUI if I had gotten the mtd command right the first time :) best Sebastian

On Nov 13, 2013, at 00:06 , Sebastian Moeller sebastian.moel...@gmail.com wrote: Hi Richard, hi Dave, hi list, so I could not resist the lure of 3.10.18-1 and upgraded my 3.10.11-2, which turned out to be slightly more involved than I had expected.

1) SYSUPGRADE

root@nacktmulle:/# sysupgrade -d 60 -n /home/persistent/cerowrts/3.10.18-1/openwrt-ar71xx-generic-wndr3700v2-squashfs-sysupgrade.bin
killall: watchdog: no process killed
Sending TERM to remaining processes ... netifd dynamic_dns_upd sleep minissdpd lighttpd crond lighttpd pimd snmpd xinetd dbus-daemon dnsmasq zebra babeld watchquagga smbd nmbd avahi-daemon ahcpd rngd ntpd ubusd askfirst
Sending KILL to remaining processes ... ubusd askfirst
Switching to ramdisk...
mount: /proc is not a block device
umount: /tmp/root: not mounted
Failed to switch over to ramfs. Please reboot.

Rebooting still returned me back to 3.10.11-2

2) MTD

root@nacktmulle:/tmp# mtd -r write /tmp/firmware.img
Usage: mtd [options ...] command [arguments ...] device[:device...]
The device is in the format of mtdX (eg: mtd4) or its label.
mtd recognizes these commands:
        unlock                  unlock the device
        refresh                 refresh mtd partition
        erase                   erase all data on device
        write imagefile|-       write imagefile (use - for stdin) to device
        jffs2write file         append file to the jffs2 partition on the device
        fixtrx                  fix the checksum in a trx header on first boot
Following options are available:
        -q                      quiet mode (once: no [w] on writing, twice: no status messages)
        -n                      write without first erasing the blocks
        -r                      reboot after successful command
        -f                      force write without trx checks
        -e device               erase device before executing the command
        -d name                 directory for jffs2write, defaults to tmp
        -j name                 integrate file into jffs2 data when writing an image
        -p                      write beginning at partition offset
        -o offset               offset of the image header in the partition(for fixtrx)
        -F part[:size[:entrypoint]][,part...]
                                alter the fis partition table to create new partitions replacing the partitions provided as argument to the write command (only valid together with the write command)
Example: To write linux.trx to mtd4 labeled as linux and reboot afterwards
        mtd -r write linux.trx linux

Still no upgrade performed, but at least it is clearer why, my command was incomplete… BUT I seem to recall that it was exactly this command that actually allowed me to install 3.10.11-2 in the first place, weird.

3) LUCI (http://gw.home.lan:81/cgi-bin/luci/;stok=19113c7f25269daca52ed92ef4d4b802/admin/system/flashops/)

I disabled the Keep Settings checkbox, uploaded the image (after making sure /tmp had enough space), followed the flash image… link et voilà, 3.10.18-1 up and running in no time. I have no idea what the GUI actually does differently from calling sysupgrade on the command line. So the upshot is Juergen Botz is right and the GUI seems to work, at least if one does not keep the old configuration. (And for that problem I followed caves advice and just saved /overlay before upgrading, so I could see the old configuration files and compare.)
Since I am using cerowrt as secondary router I have no input on the PPPoE issues…. best Sebastian

On Nov 12, 2013, at 22:17 , Sebastian Moeller moell...@gmx.de wrote: Hi Richard, On Nov 12, 2013, at 18:26 , Richard E. Brown richb.hano...@gmail.com wrote: On Nov 12, 2013, at 4:11 AM, Sebastian Moeller moell...@gmx.de wrote:

- Had to enable and set AQM parameters, since they’re saved differently from the QoS settings in the 3.7.5-2 firmware. Set parameters to ~ 90% of link speeds

Just curious, did you specify overhead and encapsulation?

No, I simply used the defaults on that page.

Ah, you might want to try setting the link layer adaptation mechanism to tc_stab, the link layer to adsl or atm and the overhead to 40 (or look at http://ace-host.stuart.id.au/russell/files/tc/tc-atm/ to figure out the fixed per packet overhead of your link). This should allow you to specify a larger percentage of your link rate as shaped rate… But see the attached screenshot of my AQM tab PastedGraphic-1.tiff
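
As a rough illustration of what the suggested settings amount to on the tc side, here is a small Python sketch that assembles a stab fragment from the values above (overhead 40, link layer atm). The keywords stab, overhead and linklayer are taken from the tc-stab man page; the helper name is made up, and where the AQM scripts actually place this fragment in their qdisc commands is not shown here, so treat it as a sketch and not as what the GUI generates.

```python
def stab_fragment(overhead_bytes, linklayer):
    """Build a tc 'stab' argument string for the given per-packet overhead."""
    if linklayer not in ("ethernet", "atm", "adsl"):
        raise ValueError("unknown link layer: " + linklayer)
    return "stab overhead {} linklayer {}".format(overhead_bytes, linklayer)


# Values suggested in the mail above; the right overhead depends on the link.
print(stab_fragment(40, "atm"))  # stab overhead 40 linklayer atm
```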
Re: [Cerowrt-devel] if you can't tell
Dave Taht dave.t...@gmail.com wrote: I am simply horribly, horribly behind on email. I just got back from Canada the day before yesterday and am trying to dig out. IETF was both productive and non-productive in many ways... was the consensus 3.10.18 was busted worse than 3.10.17? .19 is out but I'll sink more time into it overall to solve various bugs (sysupgrade via gui works? but not at the command line?) once and for all, rather than keep integrating -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel Hi Dave, I cannot offer a useful opinion, since I updated from 3.10.11-2. For me it seems to work, but I have not yet done thorough testing... Yes, sysupgrade from the command line gets stuck while updating from the GUI works. Once I find the time I want to figure out what is different between these two, but no ETA for that from me. What about just documenting to use either mtd or the GUI and considering this sufficiently worked around? Best Regards Sebastian
Re: [Cerowrt-devel] if you can't tell
Hi Dave, On Nov 14, 2013, at 20:57 , Dave Taht dave.t...@gmail.com wrote: On Thu, Nov 14, 2013 at 11:36 AM, Richard E. Brown richb.hano...@gmail.com wrote: Dave, The flurry of comments on 3.10.18 that I posted were partially caused because I used sysupgrade (the first time ever! Previously, I had used tftp.) and I think some problems were caused because I kept the configuration. I’m looking for a bit of time to re-flash, this time not keeping the configs and running my config.sh script to set things up. Rich PS I don’t see a 3.10.19 posted on http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/ yet. You guys are so eager to subject yourselves to new releases! take a weekend off! It's lovely in california right now… ;) it happens to be cold wet and foggy in Germany, a really nice autumn as it should be in that part of the world, but I sure miss California (I always loved how the rain season made SoCal green again) I didn't see anything in 3.10.19 that is truly needed right now. there is some good work being done on fixing random number generation in the mainline, dnsmasq has an update, pie is 99.9% ready for mainline, but there's nothing I can do to push those faster... and I am ashamed to admit that the big reason why I haven't dug in to fix sysupgrade was because I haven't cleaned up the yurt in a while. Somewhere in it is buried the bus pirate http://dangerousprototypes.com/docs/Bus_Pirate that I need to get to the serial port to get to see what the heck is going wrong at the command line. So I'm going to shovel out and reorg the place while the weather is good and hopefully that will show up. I have a feature request for the aqm gui - in that many fields don't need to be exposed if the encapsulation is ethernet. I fear in making the dsl users happy we will confuse the others. Okay, unless more skilled hands take this over, I will have a go at this (most likely next week, as I expect visitors over the weekend). 
Interestingly, all of these options are also valid for ethernet, so what about putting them on a second tab that is easier to ignore? Anyway, I will see what I can do... What other open questions do we have? Firewall rules good? minissd working? upnp working? I have not yet tested anything in particular, but then I see no problems either, so my (current) take is that 3.10.18 is a keeper :) Best Regards, enjoy the weekend, Sebastian -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [Cerowrt-devel] CeroWrt 3.10.18-1 Field Report
Hi All, small update, I have a hunch that comment 3 of https://dev.openwrt.org/ticket/13958 might be relevant for us: additional 2cents: turns out i had mount-utils installed, whose /usr/bin/mount had different output than the busybox one. this broke the function rootfs_type() in /lib/upgrade/common.sh, causing the sysupgrade script to switch something unswitchable... Indeed we have a binary in /usr/bin/mount and: export PATH=/usr/bin:/usr/sbin:/bin:/sbin in /etc/profile. So, it might be enough to replace the following in /lib/upgrade/common.sh: rootfs_type() { mount | awk '($3 ~ /^\/$/) && ($5 !~ /rootfs/) { print $5 }' } with rootfs_type() { busybox mount | awk '($3 ~ /^\/$/) && ($5 !~ /rootfs/) { print $5 }' } to get sysupgrade to work again. But I have not tested that (and most likely will not be able to do so before next week); yet I wanted to document this potential fix for the greater public... Best Regards Sebastian On Nov 13, 2013, at 00:11 , Sebastian Moeller moell...@gmx.de wrote: Hi All, it turns out that not being able/willing to read can make you do busy work. It seems I forgot to add firmware as device to my mtd invocation… I guess I would never have tried the GUI if I had gotten the mtd command right the first time :) best Sebastian On Nov 13, 2013, at 00:06 , Sebastian Moeller sebastian.moel...@gmail.com wrote: Hi Richard, hi Dave, hi list, so I could not resist the lure of 3.10.18-1 and upgraded my 3.10.11-2, which turned out to be slightly more involved than I had expected. 1) SYSUPGRADE root@nacktmulle:/# sysupgrade -d 60 -n /home/persistent/cerowrts/3.10.18-1/openwrt-ar71xx-generic-wndr3700v2-squashfs-sysupgrade.bin killall: watchdog: no process killed Sending TERM to remaining processes ... netifd dynamic_dns_upd sleep minissdpd lighttpd crond lighttpd pimd snmpd xinetd dbus-daemon dnsmasq zebra babeld watchquagga smbd nmbd avahi-daemon ahcpd rngd ntpd ubusd askfirst Sending KILL to remaining processes ... ubusd askfirst Switching to ramdisk...
mount: /proc is not a block device umount: /tmp/root: not mounted Failed to switch over to ramfs. Please reboot. Rebooting still returned me back to 3.10.11-2 2) MTD root@nacktmulle:/tmp# mtd -r write /tmp/firmware.img Usage: mtd [options ...] command [arguments ...] device[:device...] The device is in the format of mtdX (eg: mtd4) or its label. mtd recognizes these commands: unlock unlock the device refresh refresh mtd partition erase erase all data on device write imagefile|- write imagefile (use - for stdin) to device jffs2write file append file to the jffs2 partition on the device fixtrx fix the checksum in a trx header on first boot Following options are available: -q quiet mode (once: no [w] on writing, twice: no status messages) -n write without first erasing the blocks -r reboot after successful command -f force write without trx checks -e device erase device before executing the command -d name directory for jffs2write, defaults to tmp -j name integrate file into jffs2 data when writing an image -p write beginning at partition offset -o offset offset of the image header in the partition(for fixtrx) -F part[:size[:entrypoint]][,part...] alter the fis partition table to create new partitions replacing the partitions provided as argument to the write command (only valid together with the write command) Example: To write linux.trx to mtd4 labeled as linux and reboot afterwards mtd -r write linux.trx linux Still no upgrade performed, but at least it is clearer why, my command was incomplete… BUT I seem to recall that it was exactly this command that actually allowed me to install 3.10.11-2 in the first place, weird. 
3) LUCI (http://gw.home.lan:81/cgi-bin/luci/;stok=19113c7f25269daca52ed92ef4d4b802/admin/system/flashops/) I disabled the Keep Settings checkbox, uploaded the image (after making sure /tmp had enough space) followed the flash image… link e voila, 3.10.18-1 up and running in no time I have no idea what the GUI actually does differently from calling sysupgrade on the command line. So the upshot is Juergen Botz is right and the GUI seems to work, at least if one does not keep the old configuration. (And for that problem I followed caves advice and just saved /overlay before upgrading, so I could see the old configuration files and compare.) Since I am using cerowrt as secondary
Re: [Cerowrt-devel] if you can't tell
Hi Dave, On Nov 14, 2013, at 20:57 , Dave Taht dave.t...@gmail.com wrote: On Thu, Nov 14, 2013 at 11:36 AM, Richard E. Brown richb.hano...@gmail.com wrote: Dave, The flurry of comments on 3.10.18 that I posted were partially caused because I used sysupgrade (the first time ever! Previously, I had used tftp.) and I think some problems were caused because I kept the configuration. I’m looking for a bit of time to re-flash, this time not keeping the configs and running my config.sh script to set things up. Rich PS I don’t see a 3.10.19 posted on http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/ yet. You guys are so eager to subject yourselves to new releases! take a weekend off! It's lovely in california right now... I didn't see anything in 3.10.19 that is truly needed right now. there is some good work being done on fixing random number generation in the mainline, dnsmasq has an update, pie is 99.9% ready for mainline, but there's nothing I can do to push those faster... and I am ashamed to admit that the big reason why I haven't dug in to fix sysupgrade was because I haven't cleaned up the yurt in a while. Somewhere in it is buried the bus pirate So this is weird, but I noticed that we have a mount binary in /usr/bin that seems to be earlier in our path than busybox's /bin/mount. So I just went and replaced all mount invocations in /lib/upgrade/common.sh with busybox mount and that seems to have done the trick (it upgraded and rebooted into a pristine 3.10.18-1, I am not sure whether the firmware partition was really overwritten, but the overlay partition surely was wiped). It seems that the option handling of /usr/bin/mount chokes on the actual mount invocations in that script (I noticed that the move of /proc failed and from then on it was downhill).
I am not sure whether my modification to common.sh is not too ugly to live, but I assume that the grown-ups will find a proper way to fix this now :) And what I currently do not understand is why the GUI method worked, since that is just calling the sysupgrade script from /... http://dangerousprototypes.com/docs/Bus_Pirate that I need to get to the serial port to get to see what the heck is going wrong at the command line. So I'm going to shovel out and reorg the place while the weather is good and hopefully that will show up. I have a feature request for the aqm gui - in that many fields don't need to be exposed if the encapsulation is ethernet. I fear in making the dsl users happy we will confuse the others. So I had a quick go at this one: Please put the attached file (aqm.lua) at /usr/lib/lua/luci/model/cbi/aqm.lua; this should hide all confusing fields until htb_private or tc_stab is selected under the field named "Which linklayer adaptation mechanism to use; especially useful for ADSL/ATM links". So "none" will hide all the cruft. Most of the options can also work with ethernet links (think PPPoE on a non-ATM carrier, which will still cause an 8 byte per-packet overhead). What other open questions do we have? Firewall rules good? minissd working? upnp working? -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
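The rootfs_type() filter discussed in this thread depends entirely on the column layout of mount's output, which is why a different mount binary earlier in PATH can break it. Here is a minimal sketch, assuming the busybox layout "DEV on DIR type FS (OPTS)"; the function name rootfs_type_from_stdin and the sample lines are made up for illustration, and the awk program is the filter from common.sh as I understand it:

```shell
#!/bin/sh
# rootfs_type_from_stdin: the same awk filter, but reading mount output from
# stdin so it can be tried on captured text instead of the live system.
rootfs_type_from_stdin() {
    # Field 3 is the mount point, field 5 the filesystem type in the
    # assumed "DEV on DIR type FS (OPTS)" layout.
    awk '($3 ~ /^\/$/) && ($5 !~ /rootfs/) { print $5 }'
}

# Made-up sample lines in the assumed busybox layout.
printf '%s\n' \
    'rootfs on / type rootfs (rw)' \
    '/dev/root on /rom type squashfs (ro,relatime)' \
    'overlayfs:/overlay on / type overlayfs (rw,noatime)' |
    rootfs_type_from_stdin
```

On the sample above this prints overlayfs. Any tool whose output puts the mount point or the filesystem type in a different column yields an empty result, so the script can no longer tell what the root filesystem is.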
Re: [Cerowrt-devel] PIE and ADSL2+
Hi Fred, On Nov 20, 2013, at 11:08 , Fred Stratton fredstrat...@imap.cc wrote: I have been using PIE instead of fq_codel for approximately 10 days. It works well. Intrigued by your report I went ahead and tested simple.qos with fq_codel and pie (cerowrt 3.10.18-1) with rrul against demo.tohojo.dk: /netperf-wrapper -l 300 -H demo.tohojo.dk rrul -p all_scaled -t my_silly_name Pie (with the default target of 20ms(?)) shows around 120 ms ping delay (fq_codel shows 45ms); also the average downlink with fq_codel is roughly 10% higher than with pie. So at least in that test fq_codel seems better than pie. That said, compared to ping latencies of up to 300ms (my primary router somehow restricts latencies to roughly 300ms) with no AQM, just rate shaping with HTB, pie still keeps the internet more usable. Should it? I think its designers wanted it to be a competent qdisc, so I guess it should :) Has PIE been optimized for ADSL? Best Regards Sebastian
Re: [Cerowrt-devel] cerowrt-3.10.21-1 development release
Hi Dave, On Dec 2, 2013, at 16:17 , Dave Taht dave.t...@gmail.com wrote: As best as I recall it was needed for ext4 and btrfs support on mounting external devices, but that was years ago. Thanks for the information; so I used this as a starting point and I have the impression that nowadays the mount-utils are needed for mount by label and mount by UUID. And if I understand correctly the new block-mount and ubox opkgs already allow mount-by-UUID without mount-utils. So even though I am currently mounting by label, maybe mount-utils can be relegated to an optional install in cerowrt, hoping that might solve the sysupgrade challenges. best Sebastian On Mon, Dec 2, 2013 at 2:44 AM, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, On Dec 2, 2013, at 02:07 , Dave Taht dave.t...@gmail.com wrote: This is nothing more than a resync with openwrt and a bugfix for dnsmasq. It is completely untested. + fresh merge with openwrt ++ bunch of ath9k fixes + update to dnsmasq 2.68rc4 (fixes cname and a few other bugs) - haven't found time to address http://www.bufferbloat.net/issues/436 plan to update the machine involved to this version. hope to get more reports from the field. ? Would like to find someone with comcast ipv6 to try this on - the /sbin/mount bug explanation sounded plausible but haven't tried it will do so shortly The quickest test should be to deinstall mount-utils before running sysupgrade (as far as I know mount-utils is the source of the incompatible mount binary). The new cerowrt will automatically bring in its already installed mount-utils, so everything should work after the upgrade. I have not tested this yet, but I assume this is what I'll try the next time :) BTW, what is the reason we need mount-utils in the first place, or what is the busybox mount command missing? best regards sebastian - have several reports of a successful fragmentation? crash attack in openwrt in general, but no details.
I'm taking a bunch of machines into the lab thursday and hope to work on the latter problem while putting several new machines/OSes through their paces... It seemes likely I will do another build between now and thursday. In short, not a lot of reason to try this release. Feel free to keep digesting your turkey. https://www.youtube.com/watch?v=W5_8U4j51lI New version of pie should get dropped next week. -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
[Cerowrt-devel] sysupgrade failure solved
Dear list, for some time now we had the issue that the sysupgrade command did not work. So here is the output of my first attempt to sysupgrade 3.10.18-1 to 3.10.21-1: root@nacktmulle:~# sysupgrade -d 60 -n -v /home/persistent/cerowrts/3.10.21-1/3.10.21-1-sysupgrade.bin killall: watchdog: no process killed Sending TERM to remaining processes ... udhcpc lighttpd crond lighttpd snmpd xinetd dbus-daemon odhcp6c zebra babeld watchquagga avahi-daemon rngd ntpd pimd minissdpd dnsmasq sh ubusd askfirst netifd Sending KILL to remaining processes ... ubusd askfirst Switching to ramdisk... mount: /proc is not a block device umount: /tmp/root: not mounted Failed to switch over to ramfs. Please reboot. root@nacktmulle:~# As you can see this did not work. After a reboot I removed the mount-utils package and tried again: root@nacktmulle:~# sysupgrade -d 60 -n -v /home/persistent/cerowrts/3.10.21-1/3.10.21-1-sysupgrade.bin killall: watchdog: no process killed Sending TERM to remaining processes ... udhcpc lighttpd crond lighttpd snmpd xinetd dbus-daemon odhcp6c zebra babeld watchquagga avahi-daemon rngd ntpd pimd minissdpd dnsmasq ubusd askfirst netifd Sending KILL to remaining processes ... ubusd askfirst Switching to ramdisk... Performing system upgrade... Unlocking firmware ... Writing from stdin to firmware ... Upgrade completed The root cause for the sysupgrade failure with mount-utils installed is the fact that mount-utils' /usr/bin/mount has different calling conventions than busybox's /bin/mount, which in turn makes sysupgrade fail. So the easiest solution for sysupgrade is to remove mount-utils before running sysupgrade. (An alternative proposed earlier would be to edit the sysupgrade script to always call /bin/mount and /bin/umount.) Bonus: on my router I mount a swap partition and a home partition (ext4) from a USB stick, and both mount fine without mount-utils installed. So maybe we can move mount-utils out of the default installs?
Best Regards Sebastian
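Before running sysupgrade it can help to check which mount binary the shell would actually pick up. The following is a minimal sketch under the assumption described above (a mount binary in /usr/bin shadowing busybox's /bin/mount); the helper name first_in_path is made up for illustration and is not part of the sysupgrade scripts:

```shell
#!/bin/sh
# first_in_path NAME PATHLIST
# Hypothetical helper: print the first executable called NAME found in the
# colon-separated PATHLIST, the way the shell would resolve it.
first_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                      # split PATHLIST on colons
    for dir in $2; do
        if [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Report which mount would run with the current PATH.
first_in_path mount "$PATH" || echo "no mount found in PATH"
```

If this prints something other than /bin/mount on the router, removing the package first (opkg remove mount-utils) or calling /bin/mount explicitly in the upgrade scripts are the two workarounds discussed in this thread.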
Re: [Cerowrt-devel] Wireless failures 3.10.17-3
Hi List, hi Dave, On Dec 11, 2013, at 19:41 , Dave Taht dave.t...@gmail.com wrote: I have the regrettable problem of mostly testing the 5ghz channel due to interference issues on the 2ghz band. What I am seeing in the last several releases of the 3.8.x and 3.10 series is after tons of traffic and multiple days of uptime a DMA tx error which you can see via the logread or dmesg tool, and once it happens, at least sometimes, that radio can go away and not be resettable. cannot stop tx dma is the error. I think I can make tho error appear at will by running netperf-wrapper against my wndr3700v2, just tested under 3.10.21-1: /netperf-wrapper -l 300 -H gw.home.lan rrul -p all -t hms-beagle_cerowrt3.10.21-1_2_nacktmulle dmesg on the router: [ 53.007812] IPv6: ADDRCONF(NETDEV_CHANGE): gw11: link becomes ready [28792.039062] ath: phy1: Failed to stop TX DMA, queues=0x00e! [28794.078125] ath: phy1: Failed to stop TX DMA, queues=0x00e! [28807.164062] ath: phy1: Failed to stop TX DMA, queues=0x00e! [28809.191406] ath: phy1: Failed to stop TX DMA, queues=0x002! [28823.269531] ath: phy1: Failed to stop TX DMA, queues=0x00e! dmesg was clean before so these 5 failures are from the rrul test over the 5GHz radio running the same over the 2.4GHz radio adds the following: [29200.921875] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29206.980468] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29209.019531] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29211.066406] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29215.109375] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29227.195312] ath: phy0: Failed to stop TX DMA, queues=0x006! [29233.257812] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29238.308593] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29240.351562] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29247.417968] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29251.480468] ath: phy0: Failed to stop TX DMA, queues=0x00f! 
[29253.515625] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29256.558593] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29262.617187] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29264.652343] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29269.699218] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29273.75] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29278.804687] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29281.859375] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29291.933593] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29294.972656] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29304.050781] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29312.117187] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29315.167968] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29322.246093] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29325.292968] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29330.355468] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29332.390625] ath: phy0: Failed to stop TX DMA, queues=0x00a! [29334.445312] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29336.484375] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29337.527343] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29343.617187] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29349.679687] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29358.757812] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29361.816406] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29363.851562] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29364.882812] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29370.937500] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29371.976562] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29376.031250] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29378.062500] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29381.105468] ath: phy0: Failed to stop TX DMA, queues=0x00e! 
[29388.175781] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29393.230468] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29401.292968] ath: phy0: Failed to stop TX DMA, queues=0x003! [29403.332031] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29413.429687] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29417.480468] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29422.542968] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29424.582031] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29427.636718] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29429.671875] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29431.718750] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29433.765625] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29445.835937] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29449.898437] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29454.960937] ath: phy0: Failed to stop TX DMA, queues=0x00f! [29461.023437] ath: phy0: Failed to stop TX DMA, queues=0x00e! [29463.062500] ath: phy0: Failed to stop
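To compare runs without pasting the whole log, the errors can be tallied per radio. This is a minimal sketch, assuming the log lines look like the ones above; the function name count_tx_dma_errors is made up for illustration:

```shell
#!/bin/sh
# count_tx_dma_errors: read kernel log text on stdin and print one
# "phyN COUNT" line per radio that logged "Failed to stop TX DMA".
count_tx_dma_errors() {
    awk '/Failed to stop TX DMA/ {
             # find the "phyN:" field wherever it ends up after splitting
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^phy[0-9]+:$/) { sub(/:$/, "", $i); n[$i]++ }
         }
         END { for (p in n) print p, n[p] }' | sort
}

# Three lines taken from the log above; on the router one would instead
# pipe in the live log, e.g.: dmesg | count_tx_dma_errors
printf '%s\n' \
    '[28792.039062] ath: phy1: Failed to stop TX DMA, queues=0x00e!' \
    '[29200.921875] ath: phy0: Failed to stop TX DMA, queues=0x00f!' \
    '[29206.980468] ath: phy0: Failed to stop TX DMA, queues=0x00e!' |
    count_tx_dma_errors
```

For the three sample lines this prints "phy0 2" and "phy1 1". Running it before and after a test makes it easy to see how many new failures each radio produced.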
[Cerowrt-devel] 3.10.23-1 wifi issues
Hi Dave, so I tried to upgrade to 3.10.23-1 to test whether the ath TX DMA error would be gone. But alas, I cannot connect to the router via wifi after the upgrade (I did the upgrade twice, once from the GUI and once from the command line). So I am switching back to 3.10.21-1 for the time being. The symptoms are that only the two -guest interfaces show up in the AP list on my macbook and I cannot connect to those. This was the case with the fresh pristine cerowrt as well as after changing the configuration to what worked well so far and rebooting. I hope others have more luck with 3.10.23. best Sebastian
Re: [Cerowrt-devel] 3.10.23-1 wifi issues
Hi Fred, On Dec 12, 2013, at 01:12 , Fred Stratton fredstrat...@imap.cc wrote: I have the same problem. I cannot connect to wireless clients via g or n. The clients do not receive DHCP ipv4 addresses. I think that Fred characterized the issue way better than I did. I agree that my symptoms are best explained with a missing ipv4 address (on the gw interfaces); but I also did not see the sw interfaces show up, that is, my macbook only showed the gw's as available APs. So maybe it is the dnsmasq issue again that might require another kick to recognize all interfaces??? Best Sebastian Regression to 3.10.21-1 solves the problem, as you say. On 11/12/13 23:18, Sebastian Moeller wrote: Hi Dave, so I tried to upgrade to 3.10.23-1 to test whether the ath TX DMA error would be gone. But alas, I cannot connect to the router via wifi after the upgrade (I did the upgrade twice, once from the GUI and once from the command line). So I am switching back to 3.10.21-1 for the time being. The symptoms are that only the two -guest interfaces show up in the AP list on my macbook and I cannot connect to those. This was the case with the fresh pristine cerowrt as well as after changing the configuration to what worked well so far and rebooting. I hope others have more luck with 3.10.23. best Sebastian
Re: [Cerowrt-devel] Wireless failures 3.10.17-3
Hi Sujith, On Dec 13, 2013, at 10:27 , Sujith Manoharan suj...@msujith.org wrote: Sebastian Moeller wrote: It is a net gear WNDR3700 v2, so according to: http://wiki.openwrt.org/toh/netgear/wndr3700 it is a Atheros AR7161 rev 2 680 MHz soc with the following wireless parts: Atheros AR9223 802.11bgn / Atheros AR9220 802.11an. Sure, I hope I got the right one. Now this is not from the same boot as the one with the errors, but I assume that does not make a difference… Since I am located in Germany I set the regulatory domain to DE. please let me know if I you need any additional information or testing (note I am not set up to build cerowrt myself, so I would need Dave Täht's help to build a modified firmware) Can you try this patch ? I will, but it will take some time, as I cannot build the firmware for this device myself, but need help. So I let you know once I tested the patched kernel. Best Regards many thanks Sebastian

diff --git a/drivers/net/wireless/ath/ath9k/ar9002_mac.c b/drivers/net/wireless/ath/ath9k/ar9002_mac.c
index 8d78253..0337de7 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_mac.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_mac.c
@@ -76,9 +76,16 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
 				mask2 |= ATH9K_INT_CST;
 			if (isr2 & AR_ISR_S2_TSFOOR)
 				mask2 |= ATH9K_INT_TSFOOR;
+
+			if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) {
+				REG_WRITE(ah, AR_ISR_S2, isr2);
+				isr &= ~AR_ISR_BCNMISC;
+			}
 		}
 
-		isr = REG_READ(ah, AR_ISR_RAC);
+		if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)
+			isr = REG_READ(ah, AR_ISR_RAC);
+
 		if (isr == 0xffffffff) {
 			*masked = 0;
 			return false;
@@ -97,11 +104,23 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
 
 			*masked |= ATH9K_INT_TX;
 
-			s0_s = REG_READ(ah, AR_ISR_S0_S);
+			if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED) {
+				s0_s = REG_READ(ah, AR_ISR_S0_S);
+				s1_s = REG_READ(ah, AR_ISR_S1_S);
+			} else {
+				s0_s = REG_READ(ah, AR_ISR_S0);
+				REG_WRITE(ah, AR_ISR_S0, s0_s);
+				s1_s = REG_READ(ah, AR_ISR_S1);
+				REG_WRITE(ah, AR_ISR_S1, s1_s);
+
+				isr &= ~(AR_ISR_TXOK |
+					 AR_ISR_TXDESC |
+					 AR_ISR_TXERR |
+					 AR_ISR_TXEOL);
+			}
+
 			ah->intr_txqs |= MS(s0_s, AR_ISR_S0_QCU_TXOK);
 			ah->intr_txqs |= MS(s0_s, AR_ISR_S0_QCU_TXDESC);
-
-			s1_s = REG_READ(ah, AR_ISR_S1_S);
 			ah->intr_txqs |= MS(s1_s, AR_ISR_S1_QCU_TXERR);
 			ah->intr_txqs |= MS(s1_s, AR_ISR_S1_QCU_TXEOL);
 		}
@@ -120,7 +139,12 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
 	if (isr & AR_ISR_GENTMR) {
 		u32 s5_s;
 
-		s5_s = REG_READ(ah, AR_ISR_S5_S);
+		if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED) {
+			s5_s = REG_READ(ah, AR_ISR_S5_S);
+		} else {
+			s5_s = REG_READ(ah, AR_ISR_S5);
+		}
+
 		ah->intr_gen_timer_trigger =
 				MS(s5_s, AR_ISR_S5_GENTIMER_TRIG);
 
@@ -133,6 +157,16 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
 		if ((s5_s & AR_ISR_S5_TIM_TIMER) &&
 		    !(pCap->hw_caps & ATH9K_HW_CAP_AUTOSLEEP))
 			*masked |= ATH9K_INT_TIM_TIMER;
+
+		if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) {
+			REG_WRITE(ah, AR_ISR_S5, s5_s);
+			isr &= ~AR_ISR_GENTMR;
+		}
+	}
+
+	if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) {
+		REG_WRITE(ah, AR_ISR, isr);
+		REG_READ(ah, AR_ISR);
 	}
 
 	if (sync_cause) {

A version that applies over OpenWrt trunk is here: http://msujith.org/dir/patches/wl/Dec-13-2013/0001-ath9k-Interrupt-handling-fix-for-AR9002-family.patch Sujith -- Sandra, Okko, Joris, Sebastian Moeller Telefon: +49 7071 96 49 783, +49 7071 96 49 784, +49 7071 96 49 785 GSM: +49-1577-190 31 41 GSM: +49-1517-00 70 355 Moltkestrasse 6 72072 Tuebingen Deutschland ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt
Re: [Cerowrt-devel] Wireless failures 3.10.17-3
Hello Felix, On Dec 13, 2013, at 17:51 , Felix Fietkau n...@openwrt.org wrote: On 2013-12-13 10:48, Sebastian Moeller wrote: Hi Sujith, On Dec 13, 2013, at 10:27 , Sujith Manoharan suj...@msujith.org wrote: Sebastian Moeller wrote: It is a net gear WNDR3700 v2, so according to: http://wiki.openwrt.org/toh/netgear/wndr3700 it is a Atheros AR7161 rev 2 680 MHz soc with the following wireless parts: Atheros AR9223 802.11bgn / Atheros AR9220 802.11an. Sure, I hope I got the right one. Now this is not from the same boot as the one with the errors, but I assume that does not make a difference… Since I am located in Germany I set the regulatory domain to DE. please let me know if I you need any additional information or testing (note I am not set up to build cerowrt myself, so I would need Dave Täht's help to build a modified firmware) Can you try this patch ? I will, but it will take some time, as I cannot build the firmware for this device myself, but need help. So I let you know once I tested the patched kernel. On OpenWrt/CeroWrt you should not patch it into the kernel. You need to add it as a patch for package/kernel/mac80211. Ah, thanks, good to know. Vielen Dank. (I still need Dave's help in integrating this patch into a firmware image so I can actually test it...) Best Regards Sebastian - Felix ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
Re: [Cerowrt-devel] aqm gui feedback on cerowrt-3.10.24-1 for linklayer adaption
Hi Dave, On Dec 14, 2013, at 07:26 , Dave Taht dave.t...@gmail.com wrote: one of the things that makes me happy with all-up testing is that occasionally after completely blowing up my own work, I get to critique fresh work that isn't mine, in an area with which I have no expertise, with gratitude that I don't have to figure out the answer. :) So I spent some time clicking wildly all over the AQM gui webpage to see what I could break. 1) the aqm gui code doesn't work due to a bug at line 66. sc:depends(advanced, 1). sc has to be initialized first, which happens later in the file. Extra line removed in ceropackages, committed, pushed, you will need to do a pull. Merge failure? Worse, process failure: instead of actually committing tested code I manually edited some changes (that could not break) into the repository and pushed those without actually testing them. Will try to be more careful in the future. I put in an additional s, so instead of sc:depends(advanced, 1) it should have read c:depends(advanced, 1), as the current identifier in that section is c. I pulled your changes, made my edit and pushed it again. 2) it's not clear to me we have to support both the stab and htb_private methods of fixing htb's linklayer. It was important that these be fixed for everyone else that uses htb, but is one of these faster than the other? So, tc_stab is generic in that it should work with HFSC, HTB and TBF, while htb_private will only work with HTB (it seems TBF also has a private method but I have no clue whether that actually works, I just noticed that someone from huawei is posting changes to TBF on linux net-dev). My take is that we should just stick to stab so we can keep the same configuration fields for most scripts people might come up with (think free.qos without HTB). I have no idea which is faster though. I seem to recall one was a calculated value in the kernel, the other some sort of table.
If I recall correctly HTB lost its internal tables to allow higher rates and/or precision; Jesper than fixed the htb_private method to account for the link layer on the fly. So currently HTB is more advanced than tc_stab, since HTB will allow arbitrarily large packets and still do the right thing, while tc_stab will either need humungous tables or will not work with jumbo packets and GRO. I think for shaping on a home router we could not care less. People who can afford such large packets and GRO on the router probably have bandwidths available that cerowrt does not handle well anyway (picture me heavily handwaving here). Does this choice need to be made by the user? Well, no the user should not care. We should make that decision for the user; then again unless we are able to constantly check both methods against each other one will bitrott, so maybe we should default to tc_stab and make htb_private an advanced configuration option? Also we should try to drape tc_stab into the future and teach it the same trick htb_private does (and then also fix the fact that tc_stab ignores the information we have about overhead). If no one beats me to it I will try to prepare my first patch to the kernel to fix this sometime next year; but I reallr really have no time for that in the near future, as I have to papers to write and grants to write as well as apply for a new position. (Also I am not too keen of getting a patch into the kernel, I just want this issue fixed; but since it looks this itches me most...) The two variants benchmarked? Jesper? 3) Clicking advanced configuration on and off toggles display of the qdisc and qdisc script, Did it really? I think with your change only the qdisc script should have toggled (but I am no Luci expert). and twiddling with the linklayer value brings up all the extra DSL detail. Yea! ... 
and I think I was wrong in mentally visualizing the thing If these were made tabs [Basic, Queueing Discipline, Linklayer, Priorities], there would be more room for explanatory text in particular and better alignment with the look and feel of the rest of the gui. I agree, then we would not need to hide anything, and could bring on more options like interval and target (and/or some of the pie details) Note that priorities is a placeholder for somehow bringing out something remotely similar to what openwrt's qos system already does and what AQM (ceroshaper? some other name is needed) does implicitly with optimizing for dns and ntp. So a way to place different packets into the different priority bands (in simple.qos)? ECN enablement should be brought out in Queueing discipline via the ALLECN variable. It seems likely ALLECN needs to have 4 states rather than 3, which needs to also be fixed in the scripts. What about two fields then ECN inbound and ECN outbound, same for states but easier to understand... While I'm at it, perhaps having tabs
Re: [Cerowrt-devel] Field Report on CeroWrt 3.10.24-1
Hi Rich,

On Dec 15, 2013, at 06:16, Rich Brown richb.hano...@gmail.com wrote:

I did a tftp install of CeroWrt 3.10.24-1 on my secondary WNDR3800. I then used the “secondary” script to reconfigure the subnets and SSIDs to be different from my primary CeroWrt router. I know that a lot of things are still in flux, but I thought I should comment that I noticed the following:

0) It seems to work mostly. I could connect my MacBook on Ethernet, but not wireless (see below). I ran RRUL with reasonable results (I think).

1) Only the ge00 interface was in its proper firewall zone (wan); I used the GUI to move all the gwxx to guest and se00 and swxx interfaces to lan.

2) None of the wireless SSIDs (2.4 or 5 GHz) allowed connections. It appears that they’re there, my MacBook sees them, but it cannot get an address for itself on those SSIDs.

3) Clicking the AQM tab gave the following diagnostic info:

/usr/lib/lua/luci/dispatcher.lua:448: Failed to execute cbi dispatcher target for entry '/admin/network/aqm'.
The called action terminated with an exception:
/usr/lib/lua/luci/model/cbi/aqm.lua:63: attempt to index global 'sc' (a nil value)
stack traceback:
[C]: in function 'assert'
/usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch'
/usr/lib/lua/luci/dispatcher.lua:195: in function /usr/lib/lua/luci/dispatcher.lua:194

3a) To work around this (as noted in another message on the list), remove the leading “s” of line 63 of /usr/lib/lua/luci/model/cbi/aqm.lua to read: c:depends(advanced, 1)

Sorry for that, I committed an untested change (adding two lines, how much can go wrong?) and forgot to edit the 2nd copy...

4) In the AQM tab, I’m not sure which linklayer adaptation mechanism to use.

If you have DSL, either is good. In case you work with jumbo packets on ge00 or use GSO you should use htb_private, otherwise both are fine. (I will try to get patches for tc_stab into the kernel that make this difference moot and might allow us to consolidate on the generic tc_stab.) I will add this information to the GUI. (Since there are only few users for the link layer adjustments, both methods are somewhat prone to bitrot, so I think it has value to expose both so that we can cross-test them, assuming both will not go bad at the same kernel revision...)

It would be good to have a concise summary of the proper settings for various use cases.

Well, yes it would; unfortunately it is slightly tricky to do so. Dave proposed a redesign of the AQM GUI page with tabs for the different functional pieces. When I prototype this I will try to include more information about properly selecting those values. That said, typically mpu should be zero, tcMTU should be 2047 (as the interface MTU will be around 1500), and tsize should be 128. Overhead is the trickiest, as it depends on the actual encapsulation used on your link. If I am correct, the maximum for this is 44 and a typical value is 40, so we could default to 44 to do no damage, but we would waste bandwidth for almost everybody. What do you think about including links in the GUI so the user can go and read up on this? (I would recommend http://www.faqs.org/rfcs/rfc2684.html and http://ace-host.stuart.id.au/russell/files/tc/tc-atm/ but neither is that easy to digest...)

(And to install a set of defaults that will “do the right thing” for the majority of people, so we don’t have to explain it very often.)

Okay, realistically the most important thing is to select one of the mechanisms to account for the link layer if you are on ATM based DSL, so typically ADSL1, ADSL2, ADSL2+ (with an off chance of VDSL1). Assuming a typical link, the link MTU will be ~1500, so the defaults for tcMTU, tsize and MPU will work fine. We could set the default link layer to ADSL and the default overhead to 40 if Dave agrees, to preconfigure a reasonable default… I have been thinking about how to detect the link layer quantization and the protocol overhead automatically, but so far do not have anything useful to include with cerowrt (on a fast link one needs to measure all night to get small enough deviations to reliably detect the quantisation). If you are willing to play guinea pig I will send you the measurement script...

5) I did *not* try the Hurricane Electric 6in4 tunnel.

Best regards,
Rich Brown
Hanover, NH

___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
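The defaults suggested in this message (mpu 0, tcMTU 2047, tsize 128, link layer ADSL, overhead 40) could be turned into a tc stab invocation roughly as follows. This is only a sketch: the helper names, the interface ge00 as the shaped device, the handle 1: and the trailing "htb default 12" are assumptions for illustration, and the option spelling should be checked against tc-stab(8) before use. The code only assembles and prints the command; it does not run tc.

```python
def stab_args(linklayer="adsl", overhead=40, mtu=2047, tsize=128, mpu=0):
    """Build the 'stab ...' option words for a tc qdisc command.

    The defaults mirror the values proposed in the message above; the
    function itself is a hypothetical helper, not part of CeroWrt.
    """
    if linklayer not in ("adsl", "atm", "ethernet"):
        raise ValueError("unknown linklayer: %r" % linklayer)
    if overhead < 0 or mpu < 0:
        raise ValueError("overhead and mpu must not be negative")
    return [
        "stab",
        "mtu", str(mtu),
        "tsize", str(tsize),
        "mpu", str(mpu),
        "overhead", str(overhead),
        "linklayer", linklayer,
    ]


def htb_root_cmd(iface, **stab_options):
    """Assemble (but do not execute) a root HTB qdisc command with stab.

    'handle 1:' and 'htb default 12' are illustrative assumptions.
    """
    cmd = ["tc", "qdisc", "add", "dev", iface, "root", "handle", "1:"]
    cmd += stab_args(**stab_options)
    cmd += ["htb", "default", "12"]
    return cmd


if __name__ == "__main__":
    # ADSL link with the proposed default overhead of 40 bytes.
    print(" ".join(htb_root_cmd("ge00")))
    # Conservative variant using the suggested maximum overhead of 44.
    print(" ".join(htb_root_cmd("ge00", overhead=44)))
```

Keeping the stab words in one helper means the same string could be reused for other schedulers, which is the argument made above for preferring the generic mechanism.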
Re: [Cerowrt-devel] Field Report on CeroWrt 3.10.24-1
Hi Fred,

thanks for your input.

On Dec 15, 2013, at 12:55, Fred Stratton fredstrat...@imap.cc wrote:

Over the last 30 years, graphical interfaces have been bloated with pages of explanatory text. If more explanation of the interface is required, it could be incorporated in the wiki. Links could also be incorporated in the wiki.

So, I think it would be nice if the GUI contained enough information for a typical user to set up the system and forget about it. Alas, the ATM encapsulation is a bit complicated and arcane, so a bit of explanation seems required. What do the rest of you think: keep the GUI clean or include a bit of background information?

I suggest the AQM lua interface is kept simple and therefore easier to maintain.

I hope that there is not much maintenance necessary once all features are supported. At least the ATM encapsulation issues, hopefully, are constant and will not change in the future… (except, I dream, they will become obsolete once ATM goes the way of the Dodo…). But, hey, so far we have one voice for more detail/instructions and one for terseness. As said before, I will still try to rearrange the AQM single tab into a collection of tabs; that should allow us to include a bit more help, hopefully without sacrificing simplicity too much, so in the end both Rich and Fred might find it acceptable. (It just means that we will have to choose the terse help text very carefully; help would be greatly appreciated.)

Best
Sebastian
Re: [Cerowrt-devel] aqm gui feedback on cerowrt-3.10.24-1 for linklayer adaption
Hi Dave, hi list,

On Dec 14, 2013, at 07:26, Dave Taht dave.t...@gmail.com wrote:

one of the things that makes me happy with all-up testing is that occasionally after completely blowing up my own work, I get to critique fresh work that isn't mine, in an area with which I have no expertise, with gratitude that I don't have to figure out the answer. :) So I spent some time clicking wildly all over the AQM gui webpage to see what I could break.

1) the aqm gui code doesn't work due to a bug at line 66. sc:depends(advanced, 1). sc has to be initialized first, which happens later in the file. Extra line removed in ceropackages, committed, pushed, you will need to do a pull. Merge failure?

2) it's not clear to me we have to support both the stab and htb_private methods of fixing htb's linklayer. It was important that these be fixed for everyone else that uses htb, but is one of these faster than the other? I seem to recall one was a calculated value in the kernel, the other some sort of table. Does this choice need to be made by the user? The two variants benchmarked? Jesper?

So I just went ahead and hid htb_private for the time being (by commenting out the definition in aqm.lua; it can only be re-enabled by editing the file. This is not ideal, but at least less confusing than the situation before).

3) Clicking advanced configuration on and off toggles display of the qdisc and qdisc script, and twiddling with the linklayer value brings up all the extra DSL detail. Yea! ... and I think I was wrong in mentally visualizing the thing. If these were made tabs [Basic, Queueing Discipline, Linklayer, Priorities], there would be more room for explanatory text in particular and better alignment with the look and feel of the rest of the gui. Note that priorities is a placeholder for somehow bringing out something remotely similar to what openwrt's qos system already does and what AQM (ceroshaper? some other name is needed) does implicitly with optimizing for dns and ntp.

The tabs are in. Since Priorities would be empty, it does not exist yet. Let's see how you like the rest...

ECN enablement should be brought out in Queueing discipline via the ALLECN variable. It seems likely ALLECN needs to have 4 states rather than 3, which needs to also be fixed in the scripts.

While I'm at it, perhaps having tabs for each physical interface is not a horrible idea, but I shudder to think of people rate-limiting their wifi in the hope that that would help.

?

5) Adding a second interface shows @ge01 as an option, which isn't a real interface, and se00 as an option and not the gw* or sw* interfaces. Adding se00 with the default option gives me an error:

One or more required fields have no value!
One or more required fields have no value!
One or more required fields have no value!
One or more required fields have no value!

(and I'm pretty sure the aqm-scripts break even if this is correctly written to the config file)

6) feel free to add your copyright to the code. :)

I return now to figuring out why bringing up the wifi is so hosed. I will probably be reverting the kernel, netifd, and other things, way, way, way back to when they used to work.

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [Cerowrt-devel] aqm gui feedback on cerowrt-3.10.24-1 for linklayer adaption
Hi Dave,

On Dec 14, 2013, at 07:26, Dave Taht dave.t...@gmail.com wrote:

ECN enablement should be brought out in Queueing discipline via the ALLECN variable. It seems likely ALLECN needs to have 4 states rather than 3, which needs to also be fixed in the scripts.

Done; that is, all 4 states for inbound and outbound ECN can be configured via the GUI now.
Re: [Cerowrt-devel] cerowrt-3.10.24-5 dev build released
Hi Rich,

On Dec 16, 2013, at 14:45, Rich Brown richb.hano...@gmail.com wrote:

Dave, List,

+ hopefully nasty interface initialization bug fixed http://www.bufferbloat.net/issues/437

In a one-out-of-one test, this build causes all six wireless interfaces to start up: CEROwrt (and guest) on 2.4 and 5 GHz, and both babel SSID’s. In addition, the ‘wifi’ command does not give any diagnostic output as I had mentioned in my earlier message. Keepin’ my fingers crossed.

That sounds quite promising. I would be delighted, if stability permits, if you could test the current AQM implementation and send me your feedback and/or open questions. It seems I have been looking at the link layer issue for so long that I no longer realize which information is required, so please let me know if parts are too terse or too verbose.

Best Regards
Sebastian

Rich
Re: [Cerowrt-devel] treating 2.4ghz as -legacy?
Hi Dave,

On Dec 16, 2013, at 20:46, Dave Taht dave.t...@gmail.com wrote:

I have long used 5 as an indicator that the 5ghz channel was better. This goes back to a long thread on nanog, like 4? 5? years ago, where the hope was to train users that 5 was better. Well, it's turned out that 5 is frequently better, but not always, AND that clients tend to go for the shortest of the SSIDs available. So a thought would be to create another ad-hoc standard for deprecating 2.4 ghz, and have the shorter SSID be the 5ghz one.

Ideas for the 2ghz channel:

CEROwrt-legacy
CEROwrt2

I'm not huge on legacy because it's rather long but am stuck for standards. I'd like a default 2.4 ghz SSID that clearly indicates the real use to which 2.4ghz is suitable, like:

CEROwrt-GET-OFF-MY-BABY-MONITOR-YOU-FREAK

ideas for another ssid naming standard slightly longer than a single digit that would make sense to mom?

While not really for mom, I went for name_2.4GHz and name_5GHz. Pretty clear, and the latter name is shorter :)

best
Sebastian

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [Cerowrt-devel] aqm gui feedback on cerowrt-3.10.24-1 for linklayer adaption
Hi Jesper,

On Dec 17, 2013, at 09:03, Jesper Dangaard Brouer bro...@redhat.com wrote:

On Sat, 14 Dec 2013 13:24:06 +0100 Sebastian Moeller moell...@gmx.de wrote:

On Dec 14, 2013, at 07:26, Dave Taht dave.t...@gmail.com wrote:

2) it's not clear to me we have to support both the stab and htb_private methods of fixing htb's linklayer. It was important that these be fixed for everyone else that uses htb, but is one of these faster than the other?

So, tc_stab is generic in that it should work with HFSC, HTB and TBF, while htb_private will only work with HTB (it seems TBF also has a private method, but I have no clue whether that actually works; I just noticed that someone from Huawei is posting changes to TBF on linux net-dev). My take is that we should just stick to stab so we can keep the same configuration fields for most scripts people might come up with (think free.qos without HTB). I have no idea which is faster though.

I seem to recall one was a calculated value in the kernel, the other some sort of table.

If I recall correctly, HTB lost its internal tables to allow higher rates and/or precision; Jesper then fixed the htb_private method to account for the link layer on the fly. So currently HTB is more advanced than tc_stab, since HTB will allow arbitrarily large packets and still do the right thing, while tc_stab will either need humongous tables or will not work with jumbo packets and GRO. I think for shaping on a home router we could not care less. People who can afford such large packets and GRO on the router probably have bandwidths available that cerowrt does not handle well anyway (picture me heavily handwaving here).

Yes, with my recent fix, the HTB linklayer should be more precise than stab (as HTB linklayer is no longer table based).

But for DSL, stab is precise unless the MTU is larger than table size + overhead. stab modifies the apparent size of packets and that has no precision issue :) But I think stab should do the same as you did with HTB, namely calculate the link layer adjustment on the fly.

BUT as stab is more generic, e.g. works on all schedulers, we should move towards that. We should fix stab, in the kernel, to account for stuff like GSO, and if needed we could easily do on-the-fly ATM cell alignment (like the HTB linklayer patch).

I agree. Accounting for GSO is a single-line change, so it should be easy; the on-the-fly calculation is a tiny bit more involved. But in contrast to HTB, for stab the kernel knows the requested link layer, so no heuristic is needed!

Does this choice need to be made by the user?

Well, no, the user should not care. We should make that decision for the user; then again, unless we are able to constantly check both methods against each other, one will bitrot, so maybe we should default to tc_stab and make htb_private an advanced configuration option? Also we should try to drag tc_stab into the future and teach it the same trick htb_private does (and then also fix the fact that tc_stab ignores the information we have about overhead).

What! - Does stab ignore the overhead?!?

Oh, sorry for being imprecise here. Stab does take into account the overhead you put in the stab invocation, just like HTB. It does not currently use the kernel's information about GSO, so if handed a GSO packet it will not account for any ethernet header. For the non-offload situation this is not a big deal, you just include the 14(?) bytes of ethernet header in overhead, but it is hopelessly wrong in the GSO situation. Currently cerowrt does not use GSO, so that is theoretical for now.

The overhead for a small (e.g. ACK) packet is *very* important in the ATM/ADSL case, as the small encap overhead causes the packet to use two ATM frames, which is important to account for, because this represents a very big percentage overhead (62%). Moreover, ADSL is especially prone to have many ACK packets travel the upload link (due to the larger download link capacity).

If no one beats me to it, I will try to prepare my first patch to the kernel to fix this sometime next year; but I really, really have no time for that in the near future, as I have papers and grants to write as well as a new position to apply for. (Also I am not too keen on getting a patch into the kernel, I just want this issue fixed; but since it looks like this itches me most...)

The two variants benchmarked? Jesper?

I have actually not played with stab.

Best Regards
Sebastian

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
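The arithmetic behind the point about small packets can be sketched as follows. ATM carries 48 payload bytes in each 53-byte cell and pads the last cell, so the wire size is always a whole number of cells. The 40-byte ACK size and the 40-byte encapsulation overhead used here are assumptions chosen for illustration; with them the sketch arrives at the same 62% figure quoted above, but the thread does not state which exact values were used.

```python
ATM_CELL_BYTES = 53      # bytes on the wire per ATM cell (5 header + 48 payload)
ATM_PAYLOAD_BYTES = 48   # usable payload bytes per cell


def atm_cells(packet_len, overhead):
    """Number of ATM cells needed for a packet plus its encapsulation overhead."""
    size = packet_len + overhead
    # Integer ceiling division: a partially filled last cell is padded.
    return -(-size // ATM_PAYLOAD_BYTES)


def atm_wire_bytes(packet_len, overhead):
    """Bytes actually occupied on an ATM link."""
    return atm_cells(packet_len, overhead) * ATM_CELL_BYTES


def overhead_share(packet_len, overhead):
    """Fraction of the wire bytes that is not the packet itself."""
    wire = atm_wire_bytes(packet_len, overhead)
    return (wire - packet_len) / wire


if __name__ == "__main__":
    # Assumed sizes: a 40-byte TCP ACK and a full 1500-byte packet,
    # both with an assumed 40 bytes of per-packet overhead.
    for length in (40, 1500):
        print(length,
              atm_cells(length, 40),
              atm_wire_bytes(length, 40),
              "{:.0%}".format(overhead_share(length, 40)))
    # 40   -> 2 cells,  106 bytes on the wire, 62% overhead
    # 1500 -> 33 cells, 1749 bytes on the wire, 14% overhead
```

A shaper that counts only the 40 bytes of the ACK therefore sees well under half of what the link actually carries, which is why the per-packet overhead matters most on ACK-heavy upstream traffic.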
Re: [Cerowrt-devel] aqm gui feedback on cerowrt-3.10.24-1 for linklayer adaption
Hi Fred,

On Dec 17, 2013, at 12:39, Fred Stratton fredstrat...@imap.cc wrote:

In the new interface, htb_private is not an explicit option. IF aqm is enabled, and linklayer protocol is 'none', is htb_private implicitly chosen?

No, currently you have no access to htb_private. I am still thinking about how to expose this option, or whether to expose it at all. The best way forward would be to teach the kernel's stab implementation the two tricks htb_private knows better and then forget about htb_private altogether… But if you need htb_private, all you need to do is uncomment one line in aqm.lua.

best
Sebastian
Re: [Cerowrt-devel] cerowrt-3.10.24-5 dev build released
Hi David, On Dec 18, 2013, at 05:34 , David Lang da...@lang.hm wrote: On Tue, 17 Dec 2013, Rich Brown wrote: - From what you’ve said, I don’t have much hope for doing it automagically. But maybe we can provide clues to help the customer do to the right thing. Perhaps the first dropdown could be “Link Layer Adjustments (used on DSL or ATM)” with options for “None/ADSL/SDSL/VDSL over PTM/VDSL over ATM/PPPoATM” and maybe others. CeroWrt could automatically set the proper link layer adaptations for each. We could also include a link to the wiki for a flow chart for setting each of these cases, especially the questions they should ask their ISP. Let's start with the first question, what is the difference between these as far as what the config should be? 1) ATM based carriers (ADSL1, ADSL2, ADSL2+, potentially VDSL1): link layer has to be set to ADSL 2) PTM based carriers (VDSL2): link layer has to be set to ethernet 3) cable, GPON (fiber): link layer has to be set to ethernet 1) and 2) typically have additional overhead to account for, 3) may or may not Only 3) with no overhead is fine with no link layer adaptation mechanism. forget the GUI or automated settings. If I am configuring a Cerowrt box mmanually, what should I set differently for the different types of configs? 
Current GUI settings (might change)
A) ATM based transports:
   1) Which link layer adaptation mechanism: tc_stab
   2) link layer: ADSL
   3) overhead: see http://ace-host.stuart.id.au/russell/files/tc/tc-atm/ (section: Overhead and MTU Calculations)
B) PTM based transports:
   1) Which link layer adaptation mechanism: tc_stab
   2) link layer: ethernet
   3) overhead: unclear but see: http://www.dslreports.com/forum/r27565251-Internet-Per-packet-overhead-on-Bell-s-VDSL-ATM-based-
C) cable, GPON (fiber):
   1) Which link layer adaptation mechanism: none, assuming no per packet overhead
   otherwise
   1) Which link layer adaptation mechanism: tc_stab
   2) link layer: ethernet
   3) overhead: unclear but see: http://www.dslreports.com/forum/r27565251-Internet-Per-packet-overhead-on-Bell-s-VDSL-ATM-based-
There should be no need to fiddle with the advanced link layer options, unless your link MTU is 1500. Note for link layer ethernet no size table is constructed unless MPU 0.
What is the impact of getting it wrong? (if it's like VPN overhead, where setting the rate just slightly too high results in lots of wasted 'airtime' but setting it too low results in a small amount of wasted 'airtime', then a low enough value to be reasonable everywhere is a good default)
User on an ATM based link without link layer adaptation: The shaper will underestimate the relevant wire size of each packet and hence will not shape enough to avoid filling the potentially bloated buffers of the DSL modem. This effect gets worse the more the packet length distribution is skewed towards small packets (the estimate can be off by around 50% worst case, so this is not good). But note this basically is the status quo for most users (as far as I know no router/modem sets these options correctly, but I do not claim to know all such systems). Also, this in theory is testable: on such a system buffer bloat/latency increase should be more severe if one tries to fill the nominal transmit rates with small than with large packets.
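The A/B/C settings listed above map fairly directly onto the `stab` options of `tc` (`overhead` and `linklayer`, as documented in the tc-stab man page). The following is a rough sketch under stated assumptions, not the code the CeroWrt GUI or scripts actually use: the category names `atm`, `ptm` and `cable_fiber`, the function name, and the example overhead values are hypothetical, and the right overhead still has to be worked out per line.

```python
from typing import List

# Link layer keyword per transport category, following cases A, B and C above.
# The category names are illustrative, not CeroWrt identifiers.
LINKLAYER = {
    "atm": "adsl",              # A) ATM based transports
    "ptm": "ethernet",          # B) PTM based transports
    "cable_fiber": "ethernet",  # C) cable, GPON (fiber)
}


def stab_args(transport: str, overhead: int) -> List[str]:
    """Return the 'stab ...' fragment for a tc qdisc command, or [] if none is needed."""
    linklayer = LINKLAYER[transport]
    # Case C with no per-packet overhead needs no link layer adaptation at all.
    if transport == "cable_fiber" and overhead == 0:
        return []
    return ["stab", "overhead", str(overhead), "linklayer", linklayer]


# Example overhead values are placeholders only.
print(" ".join(stab_args("atm", 40)))   # stab overhead 40 linklayer adsl
print(" ".join(stab_args("ptm", 8)))    # stab overhead 8 linklayer ethernet
print(stab_args("cable_fiber", 0))      # []
```

Keeping the mapping in one table makes it easy to see that only the ATM case changes the link layer keyword; the other cases differ only in the overhead value.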
Misjudging the overhead either wastes bandwidth, if the value is too large, or, if it is too small, increases the likelihood of buffer bloat because the shaper overestimates the effective link capacity. Users on a PTM based link, when running with link layer ADSL, will waste 10% of the bandwidth right there (for taking the 48 in 53 encapsulation into consideration). Plus they will overestimate the effective size of small packets and will waste up to 50% of the remaining bandwidth there. Misjudging the overhead has the same effect as on ATM (except overheads on PTM typically should be smaller, I guess, so this effect might not be too relevant). To summarize, using the wrong link layer adaptation will hurt the user: ATM users will suffer buffer bloat, but will use all available bandwidth (well, more actually, since that creates the bloat); PTM users will suffer severe packet-size-dependent bandwidth decreases. The status quo is more or less fine for groups 2) and 3), not so good for 1). I think most people in 1) who cared enough reduced the shaped rates by a larger amount than people in groups 2) and 3) and just accepted that, depending on the packet size mix, latencies got more variable. Getting it wrong is not advisable… Getting it right requires some non-obvious information from one's ISP. While VDSL will become more prominent in the future, ADSL variants will not disappear for a long time, as VDSL only works (well) on short loops, so people far away from the DSLAM will stay on ATM (or one day
Re: [Cerowrt-devel] cerowrt-3.10.24-5 dev build released
Hi Fred, On Dec 18, 2013, at 12:27 , Fred Stratton fredstrat...@imap.cc wrote: VDSL2 uses PTM. This is what I understand from the available information as well. Also I think that even VDSL1 typically uses PTM, but my knowledge in these matters is quite limited (I am neither a telco nor do I work for one). In the UK, the regulator has mandated that VDSL2 must be run over fibre, normally to an MSAN within 200-300 metres of the user. So what is the fibre in your cable and fibre category? Fiber as in 3) of my mail would be fiber to the home (FTTH). So typically a fiber modem at the home that speaks ethernet to the internal network. Sidenote: to keep things cheap these modems typically are not set up in a switched fashion, but rather like a hub; each modem sees several users' traffic, but will only transmit traffic for its MAC (I think) into each home, giving a clear upgrade path should residential fiber become too slow :) . The important part is the last mile, basically where the bottleneck link sits. Even if a VDSL DSLAM were connected to the ISP backbone via an ATM link (and there is no reason I know why you could not run ATM over fiber), from the user's perspective this would not make link layer ADSL the correct choice, assuming the DSLAM backbone connection typically is not the bottleneck. (And even then all users would need to adapt for ATM for it to be useful; but all of this seems rather theoretical as there seems to be a push for telcos to consolidate on ethernet as infrastructure, as I understand it). Is it FTTC/FTTH as I describe, or fibre using some other transmission protocol? What you describe is FTTC, and as stated above we are mostly interested in the DSLAM to modem link. I hope to have cleared things up… best Sebastian
Re: [Cerowrt-devel] Intuition of this guy improved his 1, 5 Mbit DSL line
Hi Maciej, On Dec 18, 2013, at 12:23 , Maciej Soltysiak mac...@soltysiak.com wrote: Hi, Story from Slashdot: A guy moved to a low speed DSL connection. When describing what he's done to make it work better, among other things, he mentioned he configured QoS to be below his cap. He put a 1100 kbit limit on a 1500 kbit line. So he is shaping down to 73.33% of his link, probably following the instructions from http://www.linksysinfo.org/index.php?threads/qos-tutorial.68795/ . From personal experience I can say that with cerowrt's aqm and the proper link layer I get decent latency results with just shaping down to 95% of link rate. I just checked and it does not seem that the current tomato fork he uses takes the link layer into account…. So trying cerowrt seems a worthwhile experiment for him to undertake. But from the writeup I assume he is quite happy as is, and coming down from 32Mbit/s, 1.5Mbit/s will feel slower no matter what. He's not using the benefits of fq_codel on his TomatoUSB but he's on the right track. His intuition is good: http://www.tidbitsfortechs.com/2013/12/surviving-internet-on-low-speed-dsl/ Would be cool for him to check out latest openwrt if his device can handle it. Best regards, Maciej ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
Re: [Cerowrt-devel] Intuition of this guy improved his 1, 5 Mbit DSL line
Hi Fred, On Dec 18, 2013, at 14:35 , Fred Stratton fredstrat...@imap.cc wrote: I used TomatoUSB - the shibby builds, rather than the Toastman ones - for several years prior to moving to OpenWRT and then ceroWRT. The TomatoUSB community has repeatedly shouted down any attempts to discuss Gettys' original findings, so it has been up to individuals to find QoS solutions. Shibby has the best QoS implementation. http://tomato.groov.pl I did not get very far. It was not possible to watch flash video and download simultaneously on a 4 megabit/s ADSL2+ line. I am not too amazed; neither the Toastman nor the shibby forks of tomato seem to handle the ATM link layer properly, and at 4Mbit/s down and less up this omission is going to be noticeable no matter how good their AQM/QOS system might be otherwise. Looking today, it looks like only Gargoyle actually handles that properly (well almost, they turn on stab unconditionally when PPPoE is used, but that test has some false positives…). The problem has been largely solved with the last 3 builds of CeroWRT. Yeah, Dave (and the rest of course) really developed a fine solution to the home router problem! Best Regards Sebastian
Re: [Cerowrt-devel] CeroWrt 3.10 AQM page
Hi Fred, On Dec 19, 2013, at 16:27 , Fred Stratton fredstrat...@ydl.net wrote: On 19/12/13 15:07, Sebastian Moeller wrote: Hi All, On Dec 19, 2013, at 15:24 , Fred Stratton fredstrat...@ydl.net wrote: 2 comments: You talk about link speed. This has 2 meanings: the rate at which the link syncs; the download speed as measured by speedtest.net, or similar site. The text should be more specific. So what we need is the link speed (ideally already reduced by the link bandwidth dedicated to forward error correction, as that is not available to us…). There is a large number of reasons why speedtest.net might return variable speeds, but AQM should be tuned to the typical bottleneck link rate minus a few %. All links after the bottleneck (best case; with cable already the first link is shared) are basically out of our control, assuming that the other links are at least as wide as the home link. Think about it this way: all DSLAMs are oversubscribed, so they will not guarantee the full paid-for bandwidth to all connected lines concurrently. If we shape to the reduced fraction we get under congestion conditions we always waste a lot of bandwidth. As I understand it, we can only hope to control our next link reasonably well, so we should only aim for that link. Congestion inside the ISP's network or, say, overloaded peering connections on the way to speedtest.net are outside of the scope of what we can solve with AQM. end of rant I do not think the assertion that all DSLAMs are oversubscribed is correct. I would love to hear from people deeper in the know, but as far as I know, the old telephone system was oversubscribed (the central office had fewer total line equivalents on the uplink than subscriber lines connected). I am pretty sure that in Germany DSLAMs always have more total bandwidth on the user side than on the internet side; I do not want to claim that this typically leads to noticeable congestion, as let's face it, most users have very lean bandwidth usage...
The point I was making was different to the one you are addressing. I thought your point was that it is not clear which speed to measure; my core point is that it is the speed of the link to the ISP (the DSLAM, not necessarily the BRAS). The phrase 'contact your ISP' is in the text. This should be removed. Such contact is a traumatic exercise to which no users of ceroWRT should be subjected. I assume this ISP contact is about the ADSL encapsulation; the alternatives are trying to deduce this information from the DSL modem's status page (not all modems show this), from the ISP's website or by Googling, or measuring it empirically. Calling the ISP should be the quickest… The ISPs I use have contract support, which changes. They work according to a fault-finding script. Even if you are asking a question and do not have a fault, they go through the script. I have been through 3 levels of tech support, 'gurus', and ISP network support. None can answer simple questions. Germany may be different, but I am paying circa 5 euros a month for unlimited internet access by ADSL2+. This price sounds great (is that all, or do you have to pay for a mandatory phone line as well?). The situation in Germany is not much better. I could use a boutique ISP, and get native ipv6 with a /48, and intelligent customer support. Downloads are limited to 40GiB per month for circa 50 euros. So here, talking to customer support is not viable. Then the non-technical user pretty much is out of luck in getting this information. In that case we might think about defaulting overhead to 40 (the 44 maximum should be really really rare) so that almost all users should be okay… So we are back at needing a reliable, robust automatic link layer detection…. best regards sebastian Best Regards Sebastian On 19/12/13 13:42, Rich Brown wrote: Hi Sebastian and Fred, [I’m changing the subject line of this thread…] Great comments.
I knew my glib assertions and fuzzy explanations would bring out cogent thoughts. I’ll give the rest of the list a chance to peruse the draft page and then work on it tonight. http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 Rich On Dec 19, 2013, at 5:49 AM, Sebastian Moeller moell...@gmx.de wrote: Hi Rich, On Dec 19, 2013, at 05:12 , Rich Brown richb.hano...@gmail.com wrote: Hi Sebastian, Perhaps we could extend the Interface configuration page to add a “Link uses DSL/ADSL:” checkbox right below the Protocol dropdown. Default would be off, but when customers go to the GE00 interface to enter their PPPoE/PPPoATM/ISP credentials, they’d see this additional checkbox. Checking it would feed that info to the AQM tab. (And perhaps there could be a link there either to the AQM tab, or to the wiki for more information
Re: [Cerowrt-devel] CeroWrt 3.10 AQM page
Hi Fred, On Dec 20, 2013, at 11:33 , Fred Stratton fredstrat...@imap.cc wrote: On 20/12/13 10:12, Sebastian Moeller wrote: Hi Fred, On Dec 19, 2013, at 16:27 , Fred Stratton fredstrat...@ydl.net wrote: On 19/12/13 15:07, Sebastian Moeller wrote: Hi All, On Dec 19, 2013, at 15:24 , Fred Stratton fredstrat...@ydl.net wrote: 2 comments: You talk about link speed. This has 2 meanings: the rate at which the link syncs; the download speed as measured by speedtest.net, or similar site. The text should be more specific. So what we need is the link speed (ideally already reduced by the link bandwidth dedicated to forward error correction, as that is not available to us…). There is a large number of reasons why speedtest.net might return variable speeds, but AQM should be tuned to the typical bottleneck link rate minus a few %. All links after the bottleneck (best case; with cable already the first link is shared) are basically out of our control, assuming that the other links are at least as wide as the home link. Think about it this way: all DSLAMs are oversubscribed, so they will not guarantee the full paid-for bandwidth to all connected lines concurrently. If we shape to the reduced fraction we get under congestion conditions we always waste a lot of bandwidth. As I understand it, we can only hope to control our next link reasonably well, so we should only aim for that link. Congestion inside the ISP's network or, say, overloaded peering connections on the way to speedtest.net are outside of the scope of what we can solve with AQM. end of rant I do not think the assertion that all DSLAMs are oversubscribed is correct. I would love to hear from people deeper in the know, but as far as I know, the old telephone system was oversubscribed (the central office had fewer total line equivalents on the uplink than subscriber lines connected).
I am pretty sure that in Germany DSLAMs always have more total bandwidth on the user side than on the internet side; I do not want to claim that this typically leads to noticeable congestion, as let's face it, most users have very lean bandwidth usage... The point I was making was different to the one you are addressing. I thought your point was that it is not clear which speed to measure; my core point is that it is the speed of the link to the ISP (the DSLAM, not necessarily the BRAS). The end user cannot assess the speed of the link you mention. Well, typically this, what I call link speed, is reported either on the invoice/contract or somewhere on the modem's status page. There is also not necessarily one link only between the telephone exchange equipment and the ISP. As far as I understand the situation, the only relevant link for AQM is the slowest dedicated link between us and the backbone. On a DSL line that typically is the copper pair between the DSLAM and the socket at the user's home. The lines after that are typically shared (and hence larger than the user's dedicated paid-for access rates). For the shared components of the path we have to hope that the ISP manages these well. Here, BT, the Deutsche Telekom equivalent, does not have a monopoly. It has 37 per cent of the retail market. Its wholesale arm, OpenReach, operates the infrastructure independently. ISPs only use part of this infrastructure. TalkTalk use BT phone lines from here to the exchange 1.4km away. The DSLAM is TalkTalk equipment. The backhaul - fibre along the West Coast Main Line - is owned by TalkTalk. Sky use BT phone lines from here to the exchange 1.4km away. The DSLAM is Sky equipment. The backhaul is owned by BT/OpenReach. I am talking about the maximum sync rate for the line versus the actual achieved rate, which varies more. Both are available to the end user. Both can be used for calculation. Ah, we need the actual current sync rate (not the line capacity).
The point is we need to know how fast we can actually push bits over to the DSLAM. This is the speed we need to stay below to avoid filling the ample buffers in the DSL modem on the uplink, and to avoid filling the DSLAM's buffer on the downlink. I know that seamless rate adaptation (SRA) will make this somewhat more difficult, but it is rarely used. Heck, what we need is a standardized way for routers to acquire the current up and download speeds from dsl/cable-modems... The phrase 'contact your ISP' is in the text. This should be removed. Such contact is a traumatic exercise to which no users of ceroWRT should be subjected. I assume this ISP contact is about the ADSL encapsulation; the alternatives are trying to deduce this information from the DSL modem's status page (not all modems show this), from the ISP's website or by Googling, or measuring it empirically. Calling the ISP should be the quickest… The ISPs I use have contract support, which changes. They work
Re: [Cerowrt-devel] Anything but AQM
Hi David, On Dec 20, 2013, at 22:22 , dpr...@reed.com wrote: Given that there is no likelihood of making localized queue management intelligent because it has no global information whatsoever, I strongly suggest that "smart", "intelligent" and even "active" are hugely misleading. They are based on a completely false premise - that queues should be allowed to build at all, and that local information can solve highly transient global problems. Dumb Queue Management is going to be far superior. Keep the queue at zero length, and try to be fair. Well, this thread is about the marketing of the concept; we assume that the solution Dave created by combining a number of great components works much better than what is out there. Now we want other people to want and get the fruits of Dave's work in their home router. So I for one assume a catchy name is going to help to make it popular. DQM will need another expansion than dumb QM to be a keeper... There's a simple way to do the latter - use a filter (similar to a Bloom filter) that captures recent/frequent users of the queue, and when the queue on an outbound link grows more than about 2-3 packets (double buffering is all you need to keep the link full) discard the most recent and frequent packets (or send information that tells them to slow down). For the bloom filter have a look at the BLUE aqm, but the way I understood the ACM codel paper is that it basically does that: keep the standing queue short while allowing some burstiness in the input... There's been a lot of wasted time and effort trying to build queues long enough so that you can be intelligent, but by then you have already lost the battle. You've gotten into a positive feedback loop where you have encouraged the endpoints to send more packets than you can ever drain out of the queue. I truly, truly do not understand why people don't look at realistic network loads and structures.
On Friday, December 20, 2013 3:52pm, Rich Brown richb.hano...@gmail.com said: Dave, You wrote: What's in a name? AQM has been pretty thoroughly defined to equal active queue *length* management and not packet scheduling. Overloading AQM with what cerowrt does is apt to cause even more confusion in the field than it already does. We discussed using LBO as a word but that appears hopelessly overloaded with leveraged buy out. I go back to one I liked a while back: Smart Queue Management. (SQM) This got dissed on the aqm list too, but so far a viable alternative TLA has not appeared. It's sufficiently different to hang a different definition off of (“Smart queue management is an intelligent combination of better packet scheduling (flow queuing) techniques along with active queue length management (aqm)”) and Any ideas for a name for packet scheduling, prioritization, and active queue management better than just AQM, or QoS?
SQM - Smarter Queue Management
CeroShaper
LBO - Latency and Bandwidth Optimisation
I was prepared to agree with “SQM”, and had written a long note (below) when my brain uttered “Intelligent Queue Management”. I’m not convinced that one is better than the other… Rich
The benefits of SQM
Wikipedia sez… SQM may refer to:
- Sociedad Química y Minera de Chile - a Chilean mining and chemical enterprise
- Software quality management
- Spectrum quality management
- Supplier Quality Management
- Sensors Quality Management Inc. - provides unbiased evaluations of a company's operations relating to issues of quality, service, cleanliness and value
- Sky Quality Meter, a device for measuring light pollution
and also: São Miguel do Araguaia airport IATA code
sqm may refer to:
- square metre
- Windows Live Messenger log file extension
So it doesn’t appear that there are any seriously conflicting uses of that TLA… And I prefer “Smart Queue Management” to “Smarter Queue Management”.
We’ll leave to someone else to go out on the weak branch and espouse “Smarter queue management” and “Smartest queue management”. (What comes after that? “Smart and a half”, Smart**2?) Rich
Re: [Cerowrt-devel] CeroWrt 3.10 AQM page
Hi Rich, On Dec 20, 2013, at 22:25 , Rich Brown richb.hano...@gmail.com wrote: Folks, I have always hungered for a two-part entry for the up and download link speeds. It’s a little bit of a crock to make every customer break out their calculator to compute 95% (or 92% or whatever discount they wish to impose) and type in that computed number. I think I would prefer to let customers enter the link speed and the discount factor separately, and let the computer figure it out. Something like this photoshopped image: PastedGraphic-2.tiff This looks quite doable; if we go that route we definitely need independent toggles for up and down (at least as advanced options). What is harder is to actually report back the shaped rates in the GUI, at least with my level of expertise with the GUI and lua that is. Or even separate fudge factors for the Download and Upload stats - it would be cool if they were side-by-side, e.g.
Download Speed (kbit/s)   6500   95
Upload Speed (kbit/s)      700   95
That looks nice, no idea yet how to implement that though. @Sebastian: please don’t get heart failure :-) I have no idea whether the luci GUI toolbox gives you this kind of flexibility… Nor do I. Alas I am off for a two week family holiday away from home, so I will most likely not do any changes/prototyping before next year… Best Regards sebastian Best, Rich
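The two-part entry discussed above amounts to one multiplication per direction. A minimal sketch, assuming integer kbit/s rates and an integer percentage; the function name is hypothetical and this is not the luci or aqm-scripts code:

```python
def shaped_rate_kbit(link_kbit: int, percent: int) -> int:
    """Shaper rate in kbit/s: the link speed scaled by the discount percentage."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return link_kbit * percent // 100


# The example values from the side-by-side layout above.
print(shaped_rate_kbit(6500, 95))  # 6175 (download)
print(shaped_rate_kbit(700, 95))   # 665  (upload)
```

Keeping the percentage as a separate argument per call is what allows independent download and upload factors, as suggested in the thread.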
Re: [Cerowrt-devel] cerowrt-3.10.24-5 dev build released
Dave Taht dave.t...@gmail.com wrote: On Fri, Dec 20, 2013 at 1:01 PM, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, On Dec 20, 2013, at 19:01 , Dave Taht dave.t...@gmail.com wrote: I wanted to say how much I was enjoying catching up on this thread. I think only one question came up for me during it, which is support for a bfifo and pfifo qdisc? (if I missed something let me know) Support for these is darn useful for the research and I have long meant to fold in the modified code I use for that. Byte limits are very common for cable and dsl technologies and doing tests with 64k, 128k, 256k, and 512k bfifos is quite revealing. (I have a ton of plots lying about for this, I should put them up somewhere) Sooo... I just checked in the limit stuff (untested) into aqm-scripts. It requires that the limit option be dynamic and exposed to the gui, and in the case of a bfifo is a byte limit rather than a packet limit. There needs to be sane values for limit clamped somehow, as 1000 bytes would be bad, and 512000 packets would be bad also. I just noticed we probably should go for ingress_Limit and egress_Limit as they are different in simple_qos.sh, I assume for a good reason… I am not huge on CamelCase or HungarianNotation, or iThinkThis, btw. The way I tend to think about things is that shell environment variables tend to be ALLCAPS, and that C and openwrt uci variables tend to be lowercase. I'm not big on under_scores as they are somewhat hard to see, and I'm not really sure what luci's PreferredSyntax (?) is. There are now several different styles running through the aqm-scripts. Sorry, my bad. I do not really care that much, except I want expressive variable names, which tend to be longish. And longish names do need some sort of separation of the parts: INGRESSECN is harder to parse than INGRESS_ECN, and likewise for ingressecn and ingress_ecn, and camelcase serves the same purpose, just uglier. But from now on I will follow your wishes.
Best Sebastian But that said, yes, breaking apart the two limits for egress and ingress makes sense, particularly for the byte limits, where you might be emulating a dslam (64k bytes) on one side, and a dumb modem (256k bytes) on the other. Elsewhere, prior to now, the limits were there merely to keep memory usage under control. There is no need for 10k packets worth of buffering. There is not much need for more than 600 packets ever at the speeds we are running at today, and usually are in the dozens, so I'd defaulted to 600 packets on egress and 1000 on ingress as being big enough limits to nearly never hit on pie and fq_codel. I really do hate having more knobs that can be messed up. best sebastian As for folding the selection of bfifo or pfifo into the gui, it's not clear that we are doing researcher mode, vs mom mode in a suitably abstract way. Certainly I can imagine many a researcher wanting the gui. While I'm at it, there are some statistics like drops, and backlog, etc, that a gui-ish interface might help. polling tc -s qdisc show dev ge00 # and/or class show dev ge00 I am curious if anyone is seeing the DMA tx error in 3.10.24-5? I have one box that has now been up 4.4 days with no errors, but I haven't pushed it. I'll be beating it up through the weekend and taking a look at the gui work so far.
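The clamping of limit raised in this message could look roughly like the sketch below. It is only an illustration, not what aqm-scripts does: the bounds are hypothetical placeholders (the thread only says that 1000 bytes and 512000 packets would both be bad), and the function name is an assumption. The only detail taken from the discussion is that a bfifo limit is counted in bytes while the other qdiscs count packets.

```python
# Hypothetical bounds, chosen only for illustration.
BYTE_BOUNDS = (16 * 1024, 512 * 1024)  # bfifo: limit is a byte count
PACKET_BOUNDS = (10, 1000)             # pfifo, fq_codel, pie: limit is a packet count


def clamp_limit(limit: int, qdisc: str) -> int:
    """Clamp a user-supplied limit into the range that suits the qdisc's unit."""
    low, high = BYTE_BOUNDS if qdisc == "bfifo" else PACKET_BOUNDS
    return max(low, min(limit, high))


print(clamp_limit(1000, "bfifo"))     # 16384: 1000 bytes is raised to the floor
print(clamp_limit(512000, "pfifo"))   # 1000: 512000 packets is cut to the ceiling
print(clamp_limit(600, "fq_codel"))   # 600: the egress default passes unchanged
```

Selecting the bounds by unit rather than by individual qdisc keeps the number of knobs small, in line with the wish above to avoid settings that can be messed up.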
Re: [Cerowrt-devel] Proper AQM settings for my connection?
Hector Ordorica hechack...@gmail.com wrote: I'm running 3.10.13-2 on a WNDR3800, and have used the suggested settings from the latest draft: http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 I have a 30Mb down / 5Mb upload cable connection. With fq_codel, even undershooting network upload bandwidth by more than 95%, I'm seeing 500ms excessive upload buffering warnings from netalyzr. Download is ok at 130ms. I was previously on a 3.8 release and the same was true. So I have been fooled by netalyzr before, just as you are now. Netalyzr uses a very peculiar probe to measure the depth of the buffers: a totally nonreactive, inelastic flood of UDP packets of relatively short duration. The only real world traffic that looks like this is a denial of service attack on your router. Fq_codel tries very hard to be a good citizen that steers flows gently to their fair share of the bandwidth; in case flows do not react, fq_codel will slowly take the gloves off, so to say, and restrict these flows more aggressively. The netalyzr probe is too short for fq_codel to actually get serious in its packet dropping. Real traffic, be it TCP or UDP, typically tries to adjust to dropped packets by reducing the transmission rate. In other words netalyzr measures a sort of worst case buffering for fq_codel. Note that for pfifo_fast this worst case is actually something you encounter with real traffic as well. So what netalyzr is missing is a report telling you whether the reported buffering will increase the overall latency of the system, or not. To summarize, unless you see UDP floods as a typical use case for your internet connection, the netalyzr buffering numbers have no great significance for day to day use of your internet connection, if you are using a modern qdisc like fq_codel or pie. As Dave taught me in the past, you can easily test this hypothesis by modifying the limit parameter of fq_codel in simple.qos or simplest.qos.
The larger limit and the slower the link speed in the measured direction the greater the reported buffering. With pie (and default settings), the buffer warnings go away: http://n2.netalyzr.icsi.berkeley.edu/summary/id=43ca208a-32182-9424fd6e-5c5f-42d7-a9ea And the connection performs very well while torrenting and gaming. Should I try new code? Or can I tweak some variables and/or delay options in scripts for codel? Thanks for your work, Hector ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel Hi Hector, -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
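The relation between the queue limit, the link speed and the buffering that a non-reactive flood can provoke can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes the flood fills the queue completely with full-size packets, and the function name and the example numbers are made up for illustration, not taken from CeroWrt.

```python
def worst_case_delay_ms(limit_packets: int, packet_bytes: int, rate_kbit: int) -> float:
    """Time needed to drain a completely full queue, in milliseconds.

    Assumption: every one of the `limit_packets` slots holds a packet of
    `packet_bytes` bytes. 1 kbit/s equals 1 bit per millisecond, so
    bits / (kbit/s) directly gives milliseconds.
    """
    queued_bits = limit_packets * packet_bytes * 8
    return queued_bits / rate_kbit


if __name__ == "__main__":
    # Hypothetical limits and uplink rates, 1500 byte packets throughout.
    for limit in (100, 1000):
        for rate_kbit in (1000, 5000):
            delay = worst_case_delay_ms(limit, 1500, rate_kbit)
            print(f"limit {limit:5d} packets at {rate_kbit:5d} kbit/s: {delay:8.1f} ms")
```

A larger limit or a slower link both push the number up, which matches the observation above. Responsive traffic should never get near this bound, because fq_codel starts dropping long before the queue is full.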
Re: [Cerowrt-devel] Update to Setting up SQM for CeroWrt 3.10 web page. Comments needed.
Hi Rich, On Dec 28, 2013, at 15:27, Rich Brown richb.hano...@gmail.com wrote: Hi Sebastian, I would love to comment further, but after reloading, http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 just returns a blank page and I cannot get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces… I’m not sure what happened to this page for you. It’s available now (at least to me) at that URL… Well, it is back for me as well, Rich So without much further ado...

Queueing Discipline - the details... CeroWrt is the proof-of-concept for the CoDel and fq_codel algorithms that prevent large flows of data (downloads, videos, etc.) from affecting applications that use a small number of small packets. The default of fq_codel and the simple.qos script work very well for most people. [What are the major features of the simple.qos, simplest.qos, and drr.qos scripts?]

simple.qos has a shaper and three classes with different priorities; simplest.qos has a shaper and just one class for all traffic; drr.qos, no idea yet, I have neither tested it nor looked at it closely.

Explicit Congestion Notification (ECN) is a mechanism for notifying a sender that its packets are encountering congestion and that the sender should slow its packet delivery rate. We recommend that you turn ECN off for the Upload (outbound, egress) direction, because fq_codel handles and drops packets before the bottleneck, providing the congestion signal to local senders.

Well, we recommend disabling egress ECN because marked packets still need to go over the slow bottleneck link. Dropping them instead frees up the egress queue and allows faster reactivity on the slow uplink. With a slow enough uplink, every packet counts...

For the Download (inbound, ingress) link, we recommend you turn ECN on so that CeroWrt can inform the remote sender that it has detected congestion.

The same signaling is achieved by dropping the packet and not sending an ACK packet for that data, but this takes a bit longer as it relies on a timer in the sender.

[Is this still relevant? Arriving packets have already cleared the bottleneck, and hence dropping has no bandwidth advantage anymore.]

I think it still is relevant.

If you make your own queue setup script, you can pass parameters to them using the Dangerous Configuration strings. The name forewarns you.

Well, the dangerous string is just appended to the tc command that sets up the queuing disciplines, so you can use it to modify the existing invocation, say by changing values away from their implicit defaults. As in Fred's case, where he added target 25ms to the egress string to change the target from the 5ms default.

3. Link Layer Adaptation You must set the Link Layer Adaptation options correctly so that CeroWrt can perform its best with VoIP, gaming, and other protocols that rely on short packets. The general rule for selecting the Link Layer Adaptation is:

• If you use any kind of DSL/ADSL connection to the Internet (that is, if you get your internet service through the telephone line), you should choose the ATM item.

ADSL is the keyword here; people on VDSL most likely will not need to set ATM, but ethernet.

Leave the Per-packet Overhead set to zero.

I know I am quite wobbly on this topic, but we should recommend 40 as the default here if ATM was selected.

• [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link)

You will have at least 8 bytes of overhead, probably more. Unfortunately I have no idea how to measure the overhead on non-ATM links.

, PPPoATM, or bridging that isn’t Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]

I have to pass, maybe someone with such a link can chime in here? Then again these setups should be rare enough to just punt (we could let the users know they are on their own and ask for the conclusion they reached to incorporate into the wiki).

• If you use Ethernet, Cable modem, Fiber, or other kind of connection to the Internet, you should choose “none (default)”.

The decision tree should be: if you have no ATM carrier and you do not know of any per-packet overhead, you should select none.

If you cannot tell what kind of link you have, first try the ATM choice and run the Quick Test for Bufferbloat. If the results are good, you’re done.

This will not really work: on non-ATM links selecting ATM will overestimate the wire size of packets, thereby retaining excellent latency even at high nominal shaped ratios (it should even work well at 105% of link capacity). To really test for this we need a test that measures the link capacity for different packet sizes, but I digress.

You can also try the other link layer
Re: [Cerowrt-devel] Update to Setting up SQM for CeroWrt 3.10 web page. Comments needed.
Hi Fred, On Dec 28, 2013, at 21:09, Fred Stratton fredstrat...@imap.cc wrote: On 28/12/13 19:54, Sebastian Moeller wrote: Hi Fred, On Dec 28, 2013, at 15:27, Fred Stratton fredstrat...@imap.cc wrote: On 28/12/13 13:42, Sebastian Moeller wrote: Hi Fred, On Dec 28, 2013, at 12:09, Fred Stratton fredstrat...@imap.cc wrote:

The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.

I know that the recommendations have been lower in the past; I think this is partly because, before Jesper Brouer's and Russell Stuart's work to properly account for ATM quantization, people typically had to deal with a ~10% rate tax for the 5 byte per cell overhead (48 byte payload in 53 byte cells, i.e. a 90.57% usable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per-packet overhead, both of which affect the effective goodput far more for small than for large packets, so the 85% never worked well for all packet sizes. My hypothesis now is that since we can and do properly account for these effects of ATM framing, we can afford to start with a fudge factor of 90% or even 95%. As far as I know the recommended fudge factors are never explained by more than "this works empirically"...

The fudge factors are totally empirical. If you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.

My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.

You are now saying something slightly different. You are implying now that you are starting at 95 per cent, and then reducing the nominal download speed until you achieve an unspecified endpoint.

So I typically start with 95%, run RRUL and look at the ping latency increase under load. I try to go as high with the bandwidth as I can and still keep the latency increase close to 10ms (the default fq_codel target of 5ms will allow RTT increases of 5ms in both directions, so it adds up to 10). The last time I tried this I ended up at 97% of link rate.

Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.

The actual data rate is the relevant information for shaping; often DSL modems report the link capacity as maximum rate attainable or some such, while the actual bandwidth is limited by contract to a rate below what the line would support (often this bandwidth reduction is performed on the PPPoE link to the BRAS).

As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value, decremented by the traditional 15 per cent of 'overhead', and the 85 per cent fudge factor applied to that.

I would like to propose using the current rate as the starting point, as 'maximum rate attainable' >= 'current rate'.

'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.

No no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CTS), only this rate typically is for the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...

You are saying the same thing as I am.

I guess the point I want to make is that we are able to measure the unmeasurable actual rate; that is what the link layer adaptation does for us, if configured properly :) Best Regards Sebastian

As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.

I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...

No one in the UK uses SRA currently. One small ISP used to.

That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).

The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.

Now that is much better, as we should neither notice nor care; I assume that this even happens on layers below ATM.

The DSLAM is made by Infineon. Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.

On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so), so the current link speed will only get lower, not higher, so I would expect a relatively stable current rate (it might take a while, a few days, to actually slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point)

I understand
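The tuning procedure described in this thread (start at about 95% of the link rate, run RRUL, and go as high as possible while the latency increase under load stays close to 10ms) can be sketched in a few lines. This is only an illustration: the function name is hypothetical and the measurement values below are invented, not results from anyone's line.

```python
def pick_shaper_percent(samples: dict, budget_ms: float = 10.0):
    """Return the highest tested percentage whose latency increase fits the budget.

    `samples` maps a shaped rate (as percent of link rate) to the latency
    increase under load in ms, e.g. as read off an RRUL plot.
    Returns None if no tested setting stays within the budget.
    """
    within_budget = [pct for pct, increase_ms in samples.items() if increase_ms <= budget_ms]
    return max(within_budget) if within_budget else None


if __name__ == "__main__":
    # Invented example measurements: percent of link rate -> latency increase (ms).
    measurements = {90: 6.0, 95: 8.5, 97: 9.8, 99: 35.0}
    print(pick_shaper_percent(measurements))
```

The 10ms default budget follows the reasoning above: the fq_codel target of 5ms applies in each direction, so a round trip may grow by about 10ms.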
Re: [Cerowrt-devel] SQM Question #1: How does SQM shape the ingress?
Rich Brown richb.hano...@gmail.com wrote: As I write the SQM page, I find I have questions that I can’t answer myself. I’m going to post these questions separately because they’ll each generate their own threads of conversation. QUESTION #1: How does SQM shape the ingress? I know that fq_codel can shape the egress traffic by discarding traffic for an individual flow that has dwelt in its queue for too long (greater than the target). Other queue disciplines use other metrics for shaping the outbound traffic. But how does CeroWrt shape the inbound traffic? (I have a sense that the simple.qos and simplest.qos scripts are involved, but I’m not sure of anything beyond that.)

Hi Rich, ingress shaping conceptually works just like egress shaping. The shaper accepts packets at any speed from both directions but limits the speed used for transmitting them. So if your natural ingress bandwidth is 100Mbit/s, you would set the shaper to, say, 95Mbit/s; the shaper then creates an artificial internal bottleneck right at its own queue, so that it controls the critical queue. Technically, this works by creating an artificial intermediate functional block (IFB) device, moving all ingress traffic to this device and setting up classification and shaping on that device. I hope this helps... Sebastian
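To make the IFB explanation more tangible, here is a small sketch that only builds (and prints) the kind of tc command sequence involved; it does not run anything. It is a simplified single-class setup written for illustration, not the contents of simple.qos or simplest.qos. The interface names ge00 and ifb0, the 95000kbit rate and the helper name are assumptions, and ifb0 is assumed to exist already (the ifb kernel module loaded).

```python
def ingress_shaper_commands(wan: str = "ge00", ifb: str = "ifb0", rate_kbit: int = 95000) -> list:
    """Return shell commands that redirect ingress traffic to an IFB and shape it there."""
    return [
        # Bring up the intermediate functional block device.
        f"ip link set dev {ifb} up",
        # Attach the special ingress qdisc to the WAN interface.
        f"tc qdisc add dev {wan} handle ffff: ingress",
        # Match every incoming packet and redirect it to the IFB.
        f"tc filter add dev {wan} parent ffff: protocol all u32 match u32 0 0 "
        f"action mirred egress redirect dev {ifb}",
        # On the IFB the traffic looks like egress, so an ordinary shaper works.
        f"tc qdisc add dev {ifb} root handle 1: htb default 10",
        f"tc class add dev {ifb} parent 1: classid 1:10 htb rate {rate_kbit}kbit",
        # fq_codel manages the queue that builds up behind the shaper.
        f"tc qdisc add dev {ifb} parent 1:10 fq_codel",
    ]


if __name__ == "__main__":
    for command in ingress_shaper_commands():
        print(command)
```

Because the shaped rate is set slightly below the real ingress bandwidth, the queue builds up in the router, where fq_codel can manage it, rather than in the upstream equipment.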
Re: [Cerowrt-devel] SQM Question #2: How does CeroWrt use info gleaned from the link layer adaptation?
Rich Brown richb.hano...@gmail.com wrote: QUESTION #2: How does CeroWrt use info gleaned from the link layer adaptation?

Hi Rich, the link layer adaptations work by correcting the kernel's estimate of a packet's behavior on the wire. In the tc_stab case the kernel calculates the effective size of the packet on the wire, that is, it pretends the packet is larger than it really is, so for a given bandwidth it estimates the correct time it takes for that packet to be actually transmitted. In the htb_private case the kernel keeps the packet's size (more or less) intact but adjusts its estimate of the packet's transmit rate. Both methods boil down to the same idea: make sure the packet scheduler will only send packet N+1 after packet N has just cleared the wire.

Specifically, the link layer adaptations all seem to be designed to compute the actual time it takes to transmit a packet, accounting for Ethernet PPPoE header bytes, other overhead, and ATM 48-in-53 framing.

And the annoying size-dependent padding of the last ATM cell.

How does CeroWrt use this time calculation? Does it simply make sure that the target time doesn’t get too low for a particular flow’s queue?

Thanks to the link layer adjustments (lla), cero now estimates the correct time each packet takes and will not send any faster than the shaped rate allows. If no lla is performed, cero would overestimate the link capacity, send more than expected and potentially fill the modem's bloated buffers. Traditionally people tried to reduce their shaped rate by 10% to at least account for the 48-in-53 framing, but this failed miserably for small packets, since overhead and padding can more than double the wire size of a packet. Note that ACK packets typically are small, as are voice over IP packets. I hope this helps Sebastian

(I could imagine that a short packet over ATM would take 2x the (naive) expected/calculated time for a packet of that length, and that flow would be penalized. Is there more to it?)
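The claim that overhead and padding can more than double the wire size of a small packet is easy to check with a few lines of arithmetic. The sketch below is an illustration only: the function name is made up, and the 40 byte per-packet overhead is an assumption borrowed from the PPPoE, LLC/SNAP figure discussed elsewhere on this list, so the real value depends on the encapsulation of the actual line.

```python
ATM_CELL_BYTES = 53      # size of one ATM cell on the wire
ATM_PAYLOAD_BYTES = 48   # usable payload per cell (5 bytes are cell header)


def atm_wire_bytes(ip_packet_bytes: int, overhead_bytes: int = 40) -> int:
    """Bytes an IP packet occupies on an ATM link.

    The packet plus per-packet overhead is split into 48 byte payloads;
    the last cell is padded, so the cell count is rounded up.
    """
    payload = ip_packet_bytes + overhead_bytes
    cells = -(-payload // ATM_PAYLOAD_BYTES)  # integer ceiling division
    return cells * ATM_CELL_BYTES


if __name__ == "__main__":
    for size in (40, 64, 100, 576, 1500):
        wire = atm_wire_bytes(size)
        print(f"{size:5d} byte packet -> {wire:5d} bytes on the wire ({wire / size:.2f}x)")
```

Under these assumptions a 40 byte ACK needs two cells (106 bytes), while a 1500 byte packet needs 33 cells (1749 bytes). A flat 10% reduction of the shaped rate is therefore roughly adequate for large packets but far too optimistic for small ones.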
Re: [Cerowrt-devel] SQM Question #1: How does SQM shape the ingress?
looking at the code last year I figured it wouldn't correctly detect inbound problems on cable in particular, but I think I have a filter for that now) I would really like to get away from requiring a measurement from the user and am willing to borrow ideas from anyone. http://gargoylerouter.com/phpbb/viewtopic.php?f=5&t=2793&start=10

The gargoyle approach is to monitor the CMTS queue by sending periodic ping probes and adjusting its ingress shaping to keep the CMTS queue short. This relies on the CMTS being dumb; any ICMP slow-pathing or flow-based queueing will throw a wrench into ACC, as far as I understand it.

Incidentally I liked gargoyle when I tried it. Those of you that have secondary routers here might want to give it a go. (They did sfqred briefly then went back to sfq + acc) In reading what I just wrote I'm not sure how to make any of this clear to mom. Scheduling granularity?

Mom typically will not use openWRT let alone ceroWRT :) As much as I despise the unfairness of it, this will only become really useful once commercial home-router manufacturers/programmers include something similar in their products...

I like what sebastian wrote below, but I think a picture or animation would make it clearer.

On Sat, Dec 28, 2013 at 11:23 PM, Sebastian Moeller moell...@gmx.de wrote: Rich Brown richb.hano...@gmail.com wrote: As I write the SQM page, I find I have questions that I can’t answer myself. I’m going to post these questions separately because they’ll each generate their own threads of conversation. QUESTION #1: How does SQM shape the ingress? I know that fq_codel can shape the egress traffic by discarding traffic for an individual flow that has dwelt in its queue for too long (greater than the target). Other queue disciplines use other metrics for shaping the outbound traffic. But how does CeroWrt shape the inbound traffic? (I have a sense that the simple.qos and simplest.qos scripts are involved, but I’m not sure of anything beyond that.)

Hi Rich, ingress shaping conceptually works just like egress shaping. The shaper accepts packets at any speed from both directions but limits the speed used for transmitting them. So if your natural ingress bandwidth is 100Mbit/s, you would set the shaper to, say, 95Mbit/s; the shaper then creates an artificial internal bottleneck right at its own queue, so that it controls the critical queue. Technically, this works by creating an artificial intermediate functional block (IFB) device, moving all ingress traffic to this device and setting up classification and shaping on that device. I hope this helps... Sebastian

-- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
Re: [Cerowrt-devel] SQM Question #3: How shall we recommend people set their upload/download speeds?
Hi list, On Dec 29, 2013, at 09:53, Dave Taht dave.t...@gmail.com wrote: On Sat, Dec 28, 2013 at 8:33 PM, Rich Brown richb.hano...@gmail.com wrote:

QUESTION #3: How shall we recommend people set their upload/download speeds? Although we have already spent a lot of time on the list batting around ways to think about this, it seems to me that there are only two choices for recommendations, especially given that most people looking at CeroWrt are in a “TL;DR” mind set:

What does TL;DR mean?

Too Long; Didn't Read...

1) If you don’t have an accurate sense of your actual link speed (e.g., you haven’t done a link speed measurement), you should take your provider’s published specs, knock them down by 15%, and enter those values in the SQM Basic Settings tab. [Or should they take an additional 15% off that already reduced value?]

92% on the up, 85% on the down are starting points.

2) If you have done measurements of your link speed, you should enter values that are 95% of each direction’s measured speed. Is this the right recommendation?

That is what I thought, but it seems 95% was too optimistic… I have found that ShaperProbe's (http://www.measurementlab.net/tools/shaperprobe) capacity estimate is pretty good, at least for the low speeds I could test.

A short rant on the inadequacy of speedtest and other tests would be nice. Netalyzr is the closest thing to a good test these days but it requires java and is inaccurate above 20mbit.

The problem is that no one really wants to supply the large amount of bandwidth required for everyone to max out their download long enough for a reasonable test; this is why an often-read recommendation is to use curl or wget to concurrently download or upload large files from several beefy servers, to actually be able to assess the local link capacity. The only redeeming feature of speedtest and friends is that underestimating the available link capacity will lead to decent latency and lean buffers :). best Sebastian

NB: In the Details… section, we can recommend ways to measure current link speeds, encourage people to make the measurement during quiet times, link to the “Quick test for Bufferbloat” page, etc. for those who want to dig further.
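The starting points mentioned in this thread can be written down as a small calculation. This is a sketch of the numbers floated above (85% down and 92% up of the advertised rates, or 95% of measured rates), not a settled recommendation; as noted, 95% may turn out too optimistic. The function name and the example rates are made up for illustration.

```python
def starting_rates_kbit(down_kbit: int, up_kbit: int, measured: bool) -> tuple:
    """Return (download, upload) shaper rates in kbit/s to start tuning from.

    measured=True:  the inputs are measured link speeds, take 95% of each.
    measured=False: the inputs are the provider's advertised speeds,
                    take 85% of download and 92% of upload.
    Integer arithmetic keeps the results exact.
    """
    down_pct, up_pct = (95, 95) if measured else (85, 92)
    return down_kbit * down_pct // 100, up_kbit * up_pct // 100


if __name__ == "__main__":
    # Hypothetical 30 Mbit/s down, 5 Mbit/s up cable plan.
    print(starting_rates_kbit(30000, 5000, measured=False))
    # Hypothetical measured rates on the same line.
    print(starting_rates_kbit(28000, 4500, measured=True))
```

Either result is only a first guess to enter in the SQM Basic Settings tab; the values should then be checked against latency under load and adjusted.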
Re: [Cerowrt-devel] SQM Question #2: How does CeroWrt use info gleaned from the link layer adaptation?
Hi Dave, On Dec 29, 2013, at 09:54, Dave Taht dave.t...@gmail.com wrote: I would like it if we had a couple per-provider recommendations and relevant discussion.

I think this is a can of worms we should open carefully ;). In Germany several providers serve part of their area with their own infrastructure while reselling bitstream access provided by other carriers in other parts of their area, with potential differences in encapsulation. Also, in the US AT&T is in the process of relabeling their ADSL2 access as U-Verse High Speed Internet (the same label they use for VDSL as well), making it quite complicated to give proper recommendations… But to make a start, here are only the connections I actually measured (there is no guarantee that all connections of the same ISP use the same technology):

Date        Country  ISP               Nominal speed D/U     Line type  Link layer  Access type               Overhead  Comment
2013-12-29  Germany  Deutsche Telekom  16Mbit/s / 2.8Mbit/s  ADSL2+     ATM         PPPoE, LLC/SNAP RFC-2684  40 bytes  16M down also offered as VDSL version without ATM
2013-12-29  Germany  Deutsche Telekom  2Mbit/s / 0.2Mbit/s   ADSL1      ATM         PPPoE, LLC/SNAP RFC-2684  40 bytes  connections below 16M down are always ADSL, either v1 or v2+
2013-12-29  Germany  Netaachen         18Mbit/s / 1.0Mbit/s  ADSL2+     ATM         PPPoE, LLC/SNAP RFC-2684  40 bytes

General Notes: Deutsche Telekom: the network is slowly being moved to fiber to the node/curb, using VDSL2 (PTM) for speeds >= 16M down and ADSL2+ (ATM) for slower speeds. The new DSLAM replacements are called MSANs; customers on ADSL ports are still using ATM on the last mile and need the link layer adjustments. Best Regards Sebastian
Re: [Cerowrt-devel] been writing all night
Hi Dave, On Dec 29, 2013, at 14:28, Dave Taht dave.t...@gmail.com wrote: it is now 5:26 am. I have not had an all night writing or coding binge since I quit smoking back in july. I bought a pack this afternoon. It turns out that chain-smoking has benefits to my writing process... I revised the aqm page http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310

Great, some comments on the link layer situation:

3. Link Layer Adaptation You must set the Link Layer Adaptation options correctly so that CeroWrt can perform its best with VoIP, gaming, and other protocols that rely on short packets. The general rule for selecting the Link Layer Adaptation is:

• If you use any kind of DSL/ADSL connection to the Internet (that is, if you get your internet service through the telephone line), you should choose the ATM item.

This should read: if you use any kind of ADSL line you should use ATM; VDSL users should select ethernet (VDSL retained the ability to be deployed over ATM, but to my knowledge it typically uses PTM, a much saner transport layer for data).

Leave the Per-packet Overhead set to zero.

For ATM, setting the overhead too small will on average, across packet sizes, leave half an ATM cell of padding per packet that is not accounted for by HTB; setting it too large will make HTB overestimate the transmission time on the wire, wasting a bit of bandwidth (again by half a cell per packet). Underestimating the overhead has a worse effect than overestimating it, so I vote for changing our default to an overhead of 40 bytes; 44 is the absolute maximum and rare, and 40 is the second largest and, as far as I know, the most likely with PPPoE, LLC/SNAP, RFC2684 encapsulation. (Rant: from a user perspective IP over ATM would be the best, as it makes more of the paid-for link speed usable for actual traffic.) For non-ATM links, telcos often still use PPPoE (in Germany for VDSL2, and even fiber GPON), so a small overhead of 8 bytes would, I think, be applicable. But given the typical fiber and VDSL2 link rates, misjudging that will not have a really bad effect. (With ATM the actual overhead is way larger, and it might drag in an additional, almost empty, padded ATM cell.) Now, the recommendations in the wiki should mention that with VDSL there is a slight chance of ATM encapsulation and give a hint of how to diagnose that.

-- dtaht -- I am so unable to parse the huge email thread on the DSL issue.

• If you use Ethernet, Cable modem, Fiber, or other kind of connection to the Internet, you should choose “none (default)”, and move on.

Even though there might be a small overhead on all of those, I think this is sound advice to keep things simple. To complicate things further, VDSL1 uses HDLC as link layer, which would be quite nasty to handle (HDLC wire size does not simply depend on packet size as in ATM; it is actually data dependent, with a worst-case twofold increase from data size to wire size, which would require searching each data packet for occurrences of octets that will be escaped on the wire), if VDSL1 had a significant deployment, that is...

• [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link),

Select ethernet and specify a proper overhead, I assume 8 bytes.

PPPoATM,

You are on ATM, hence enable ATM; 40 bytes overhead will waste some bandwidth but retain latency.

or bridging that isn’t Ethernet,

I have to pass, no idea...

you should choose [what?] and set the Per-packet Overhead to [what?] If you cannot tell what kind of link you have, first try the ATM choice and run the Quick Test for Bufferbloat. If the results are good, you’re done. You can also try the other link layer adaptations to see which performs better.

Mmmh, the ATM link layer adjustments will work on all underlying carriers, as they will effectively just estimate a wire transmit time >= ethernet transmit time. So on non-ATM links that just results in more bandwidth wasted; latency stays good. The proof is rather the other way around: only with link layer ATM will people on an ATM link be able to set the shaped rates to around 90% at all. Unfortunately this effect is most pronounced for small packet sizes and we have no easy way for people to test the performance with small packets… (that said, maybe reducing the MTU in the router might work, I need to test this...)

(which I'll rename and crosslink to a few other places) and wrote http://www.bufferbloat.net/projects/cerowrt/wiki/Wondershaper_Must_Die

Nice.

in addition to all the other emails that came out of me today. I was unaware btw, that shaperprobe had found a home at mlabs. I've been shipping shaperprobe in cerowrt since the bismark days, so perhaps with an update to that and some code to
Re: [Cerowrt-devel] SQM Question #5: Link Layer Adaptation Overheads
Hi Rich, my first attempt to send this was lost somehow…, so a re-send. On Jan 4, 2014, at 19:16, Rich Brown richb.hano...@gmail.com wrote:

QUESTION #5: I still don’t have any great answers for the Link Layer Adaptation overhead descriptions and recommendations. In an earlier message, (see https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-December/001914.html and following messages), Fred Stratton described the overheads carried by various options, and Sebastian Moeller also gave some useful advice. After looking at the options, I despair of giving people a clear recommendation that would be optimal for their equipment. Consequently, I believe the best we can do is come up with “good enough” recommendations that are not wrong, and still give decent performance.

Not wanting to be a spoilsport, but IMHO the issue is complicated, hence no simple recommendations. I know that my last word was that 40 bytes would be a good default overhead, but today I had the opportunity to measure the overhead on a fast ADSL connection in Luxembourg and found that in this double-play situation (television and internet via DSL) an otherwise invisible VLAN was further increasing the overhead (from the expected 40 to 44 bytes). At least on faster links these combo packages (internet, phone and potentially television) are becoming more and more common, so maybe the recommendation should be 44 (hoping that FCS are truly rare).

In this spirit, I have changed Draft #3 of the “Setting up SQM” page to reflect this understanding. See http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310

ADSL/ATM link: Choose “ADSL/ATM”, and set Per Packet Overhead to 40

While I prefer ATM, I think all deployed ADSL is on ATM, so these are synonyms for our purpose. I prefer ATM since the most critical part of the link layer adjustments is caused by the impedance mismatch between what ATM offers and what the data transport layer requires. I have the impression that ADSL might still evolve to a different carrier, while ATM is basically in maintenance mode (not much new deployment, if any).

VDSL2 link: Choose “VDSL”, and set Per Packet Overhead to 8

There are several issues with this; VDSL is not the direct predecessor of VDSL2 (rather VDSL2 is the successor of ADSL2+ with some similarities to VDSL). Lumping VDSL with VDSL2 will require us to figure out whether both behave the same. From my cursory reading of the standards of both, I think VDSL is not unlikely to be using an ATM link layer and VDSL2 is unlikely to do the same; both seem technically able to use ATM.

Other kind of link (e.g., Cable, Fiber, Ethernet, other not listed): Choose “None (default)”, and set Per Packet Overhead to 0

This is not going to be worse than today, so sounds fine (it would be good to know whether there is truly no overhead on these links in practical usage). Quick vote: anyone on this list using ceroWRT on a VDSL/VDSL2 link or cable, fiber, whatnot that could do some quick testing for us?

NB: I have changed the first menu choice to “ADSL/ATM” and the second to “VDSL” in the description.

I am fine with changing names, just see what the consensus is for the names.

I would ask that we change the GUI to reflect those names as well. This makes it far easier/less confusing to talk about the options.

Agreed.

As always, I welcome help in setting out clear recommendations that work well for the vast majority of people who try CeroWrt. Thanks.

I guess we need a new wiki page detailing the procedure to figure out the link layer (and overhead if on ATM). Best Sebastian

Rich
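The draft recommendations discussed in this message amount to a small lookup from link type to menu choice and per-packet overhead. The sketch below merely restates the draft values quoted above; the dictionary keys and the function name are invented for illustration, and the numbers are still under discussion (44 bytes was suggested instead of 40 for ADSL lines that carry a VLAN).

```python
# Draft values as quoted in this thread: (menu choice, per-packet overhead in bytes).
DRAFT_RECOMMENDATIONS = {
    "adsl": ("ADSL/ATM", 40),        # 44 may be safer if an extra VLAN tag is present
    "vdsl2": ("VDSL", 8),            # assumes PPPoE over a non-ATM carrier
    "other": ("None (default)", 0),  # cable, fiber, ethernet, anything not listed
}


def recommend_link_layer(link_kind: str) -> tuple:
    """Return the draft (menu choice, overhead) pair for a link kind.

    Unknown kinds fall back to the "other" entry, mirroring the rule that
    without an ATM carrier and without known overhead one selects none.
    """
    return DRAFT_RECOMMENDATIONS.get(link_kind.lower(), DRAFT_RECOMMENDATIONS["other"])


if __name__ == "__main__":
    for kind in ("ADSL", "VDSL2", "cable"):
        choice, overhead = recommend_link_layer(kind)
        print(f"{kind}: choose {choice!r}, per-packet overhead {overhead} bytes")
```

A real procedure would first have to establish the link layer and, on ATM, measure the overhead, which is what the proposed new wiki page is meant to describe.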
Re: [Cerowrt-devel] SQM Question #5: Link Layer Adaptation Overheads
Hi Dave,

On Jan 6, 2014, at 04:46 , Dave Taht dave.t...@gmail.com wrote:

On Sat, Jan 4, 2014 at 12:22 PM, Sebastian Moeller moell...@gmx.de wrote:

On Jan 4, 2014, at 19:16 , Rich Brown richb.hano...@gmail.com wrote:

Other kind of link (e.g., Cable, Fiber, Ethernet, other not listed): Choose “None (default)”, and set Per Packet Overhead to 0

This is not going to be worse than today, so sounds fine (it would be good to know whether there is truly no overhead on these links in practical usage).

Well, I just spent a painful hour trying to find an optimum for the device on my mom's (cable) network, which I plan to replace with one that is ipv6 compatible shortly. (I am settled in one place for a while, so may set up a local dhcpv6-pd server, too - but my primary mission is to get a test suite going)

Before: http://snapon.lab.bufferbloat.net/~cero2/taht_mahal/taht_house_comcast_long-3.svg (note it took a long time to fully saturate the buffers as the test box was 50ms away. I usually test with one 16ms away)

After: http://snapon.lab.bufferbloat.net/~cero2/taht_mahal/fq_codel_30_8_taht_house_comcast-long.svg

The upload value isn't making that much sense, and I lost a lot of throughput. Variety of other tests in that dir.

Quick vote: is anyone on this list using ceroWRT on a VDSL/VDSL2 link, or cable, fiber, or whatnot, who could do some quick testing for us?
There doesn't appear to be any need for overhead related settings on cable. There might be some use on DOCSIS 3 in changing the htb burst value.

That is what I thought before as well; nowadays I am not so sure. Our problem is that we need a reliable estimate of the IP bandwidth between modem and CMTS, so that we can reliably avoid filling the modem's buffers. The catch is that the advertised speed is typically given in some raw reference frame that is not close enough to what a user/router can handle out of the box. With cable I am confident that there are some packet headers that make sure only one modem picks up the data (and from the euphemistic bandwidth descriptions on the DSL side I am quite confident that this is in-band, not out-of-band with an independent bandwidth supply).

Dave, would you be willing to collect a ping sample on your cable link, so I could use this as a control for my ATM detector code? As an example
Re: [Cerowrt-devel] SQM Question #5: Link Layer Adaptation Overheads
Hi Fred,

On Jan 6, 2014, at 10:52 , Fred Stratton fredstrat...@imap.cc wrote:

I have been operating the latest build with 6relayd disabled. The henet /48 I have been allocated is subnetted correctly, presumably by dnsmasq.

I adopted the suggestions to use nfq_codel and an egress target of 25ms, with an overhead of 40 on a PPPoE connection.

I chose to watch the first 2 episodes of the 3 part third series of 'Sherlock', live on iPlayer, and these streamed correctly and uninterrupted for 90 minutes. This was not previously possible. (Quite whether they were up to the standard of previous episodes is another matter.)

I can watch iPlayer with little stutter whilst downloading Arch Linux by torrent, downloading other files at the same time.

So, for a relatively slow ADSL2+ line, the current build works well.

Out of curiosity, to what percentage of the current line rate (you know, the one reported by your modem) did you shape up- and downlink? And in case you have too much time on your hands, how does the same feel with an overhead of 10 (to see how bad an overhead underestimate would feel for a user), since you currently happen to have a quite sensitive subjective latency evaluation system set up :)…

Best Regards
Sebastian

On 06/01/14 03:29, Dave Taht wrote:

On Sat, Jan 4, 2014 at 10:40 AM, Fred Stratton fredstrat...@imap.cc wrote:

Link Names: For consistency, if ADSL is used as a portmanteau term, then VDSL should be used as the equivalent for VDSL and VDSL2.

CeroWRT has to decide whether it is an experimental build, or something that will eventually be used in production, so these decisions can be made consistently.

Well, what I was aiming for was for us to get the sqm scripts and gui up to where they were better than the standard openwrt qos scripts and then push them up to openwrt to where they could be more widely deployed. Aside from being able to dynamically assign priorities in the gui, we are there.
Except that nfq_codel is currently getting better results than fq_codel at low bandwidths, and I'm tempted to pour all of simple.qos into C.

As for cero's future - certainly since all the snowden revelations I've been going around saying that friends don't let friends run factory firmware.

I would like a stable build of sqm and cerowrt to emerge, and to then go off and work on improving wifi. Regrettably what seems to be happening is more backwards than forwards on the former, and ramping up on the ath9k and ath10k is taking more time than I'd like, and it seems likely I'll be working on those primarily on another platform and only eventually pushing the results out to cero, mainline kernel.

So it's still at the keep plugging away point for sqm, ipv6, cero in general, with the stable release always just out of sight.

Tackling the ipv6 problem is next on my agenda on cero, and getting a test suite going is next on my day job.

I concur with your ADSL setup suggestion as default.

I have been running the Sebastian Moeller ping script overnight to calculate ADSL overhead for the last several days. After several hours of curve fitting using Octave, an overhead result is displayed. This novel approach works well.

It would be nice to get to where we could autoconfigure a router using tools like these with no human intervention. This includes bandwidth estimation.

The overhead for the particular setup I use was 40 for PPPoE, and 10 for PPPoA. The default you suggest is a suitable starting point, I suggest.
Re: [Cerowrt-devel] SQM Question #5: Link Layer Adaptation Overheads
Hi Fred,

On Jan 6, 2014, at 15:22 , Fred Stratton fredstrat...@imap.cc wrote:

The line rate is 11744/1022 kb/s, but changes moment to moment. SNR is 12.1 decibel. I am using 11000/950 kb/s as settings.

So 100 * 11000 / 11744 = 93.66% of downlink line rate and 100 * 950 / 1022 = 92.95% of uplink line rate; quite impressive given the common wisdom of 85% :).

I shall try your suggestion when there is something worth watching live, to provide a valid comparison, which may not be before 21:30 CET on Sunday.

Oh, take your time, this is really not essential, but it would be a nice data point for figuring out how important the correct overhead estimate really is in real life, theory being theory and all…

Best Regards
Sebastian
Re: [Cerowrt-devel] SQM Question #5: Link Layer Adaptation Overheads
Hi Dave,

thanks a lot for the explanation.

On Jan 6, 2014, at 16:03 , Dave Taht dave.t...@gmail.com wrote:

On Jan 6, 2014 5:56 AM, Sebastian Moeller moell...@gmx.de wrote:

Hi Dave, hi List,

On Jan 6, 2014, at 04:29 , Dave Taht dave.t...@gmail.com wrote:

On Sat, Jan 4, 2014 at 10:40 AM, Fred Stratton fredstrat...@imap.cc wrote:

Link Names: For consistency, if ADSL is used as a portmanteau term, then VDSL should be used as the equivalent for VDSL and VDSL2.

CeroWRT has to decide whether it is an experimental build, or something that will eventually be used in production, so these decisions can be made consistently.

Well, what I was aiming for was for us to get the sqm scripts and gui up to where they were better than the standard openwrt qos scripts and then push them up to openwrt to where they could be more widely deployed. Aside from being able to dynamically assign priorities in the gui, we are there.

Except that nfq_codel is currently getting better results than fq_codel at low bandwidths, and I'm tempted to pour all of simple.qos into C.

Since you wrote nfq_codel, what is the secret sauce here?

1) It uses a 'tighter' version of Codel than what is currently in Linux. It doesn't work as well on longer rtts but holds down queue lengths at shorter rtts better and responds quicker than normal codel. This is a slightly more expensive version of codel too, in that it uses two invsqrt steps via Newton's method to get more accurate results.

2) It rotates the flow list more like how sfq does, yielding better mixing, which leads to higher survival rates for sparse flows and more balance across all flows. (This is a one line change to fq_codel.) At higher bandwidths (say, 50mbit) being more drr-like (fq_codel) actually tends to do better than sfq-like, as bunching up some packet deliveries makes hosts respond quicker.

3) Common to all the codels in this was elimination of the maxpacket check, which mildly increases drop probability.
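For readers unfamiliar with the "invsqrt via Newton's method" in point 1: CoDel spaces its drops at interval / sqrt(count), and Newton's method can refine an estimate of 1/sqrt(count) without taking a square root. The Python sketch below shows the iteration in floating point and compares one against two refinement steps per increment of count. It is only an illustration: the kernel code works in fixed point, and the function names and the way the estimate is carried forward here are assumptions, not taken from nfq_codel.

```python
import math


def newton_step(x, count):
    """One Newton iteration moving x towards 1/sqrt(count)."""
    return x * (3.0 - count * x * x) / 2.0


def track_invsqrt(max_count, steps_per_increment):
    """Carry the estimate forward as count grows by one each time.

    Returns a dict mapping count -> estimate of 1/sqrt(count).
    """
    x = 1.0  # exact value of 1/sqrt(1)
    estimates = {1: x}
    for count in range(2, max_count + 1):
        for _ in range(steps_per_increment):
            x = newton_step(x, count)
        estimates[count] = x
    return estimates


def next_drop_delay_ms(interval_ms, inv_sqrt_count):
    """CoDel control law: delay to the next drop is interval / sqrt(count)."""
    return interval_ms * inv_sqrt_count


if __name__ == "__main__":
    one = track_invsqrt(16, 1)
    two = track_invsqrt(16, 2)
    for count in (2, 4, 16):
        exact = 1.0 / math.sqrt(count)
        print(f"count={count:2d} exact={exact:.4f} "
              f"one_step={one[count]:.4f} two_steps={two[count]:.4f}")
```

For count = 2 a single step gives 0.5 and two steps give 0.625, against an exact value of about 0.707, which hints at why a second iteration buys accuracy at low drop counts.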
Compared to the orders of magnitude we already get from fq_codel, the sum benefit of these fixes is in the very small percentage points. Without an extensive testing and simulation campaign I've been reluctant to attempt pushing them upstream.

What I have mostly thought about instead was bundling up simple.qos into C (call it cake or broadbandeq), using these mods, adding in fixes for things that are hard now, like full diffserv support and something lighter than htb. But enotime, funding etc.

Until 3 hit, the benefit from nfq_codel was even harder to see, and I'd like to drop out 3 and revisit the data to see if the improvement is a chimera or not.

In case 3) and potentially 2) are the critical parts, do you see a chance of getting these included upstream as part of fq_codel, behind special tc options that trigger them? I assume 1) to be larger and potentially harder to sell upstream (then again, since codel is intended to run knob-free, maybe even adding 2 and 3 is controversial...).

Best Regards
Sebastian

As for cero's future - certainly since all the snowden revelations I've been going around saying that friends don't let friends run factory firmware.

I would like a stable build of sqm and cerowrt to emerge, and to then go off and work on improving wifi. Regrettably what seems to be happening is more backwards than forwards on the former, and ramping up on the ath9k and ath10k is taking more time than I'd like, and it seems likely I'll be working on those primarily on another platform and only eventually pushing the results out to cero, mainline kernel.

So it's still at the keep plugging away point for sqm, ipv6, cero in general, with the stable release always just out of sight.

Tackling the ipv6 problem is next on my agenda on cero, and getting a test suite going is next on my day job.

Any further hints on the nature of your day job possible :)

I concur with your ADSL setup suggestion as default.

I have been running the Sebastian Moeller ping script overnight to calculate ADSL overhead for the last several days. After several hours of curve fitting using Octave, an overhead result is displayed. This novel approach works well.

It would be nice to get to where we could autoconfigure a router using tools like these with no human intervention. This includes bandwidth estimation.

I fully agree that it would be nice. Also, it will be hard unless we take control over the actual bottleneck link… With DSL connections, the DSL modem knows a lot about the link properties; if the modem were onboard we could programmatically ask about bandwidth and encapsulation. For the more typical case with an independent modem or even modem router we have no clear path to accessing that information. With cable I have even less hope
[Cerowrt-devel] regarding ECN
Hi Rich,

I had a quick look at the current state of the page and I really like what you created there. I noticed that for ECN you argue:

For the Download (inbound, ingress) link, we recommend you turn ECN on so that CeroWrt can inform the local receiver that it has detected congestion without losing a packet.

We might want to argue for ingress that the packet has already travelled most of its path. Dropping it would require some timer to expire, while marking it with ECN gets the throttle-back signal to our end point faster and hence allows our systems to react to congestion more promptly.

Best
Sebastian
Re: [Cerowrt-devel] SQM Question #5: Link Layer Adaptation Overheads
Hi David,

On Jan 7, 2014, at 13:11 , David Personette dper...@gmail.com wrote:

I was going to test the recommended bridge settings for overhead (32 IIRC), because as far as I can tell there is no PPPoE involved. I've never seen it in the modem's config (in the brief period it has an IP before I put it in bridge mode as well so the routable IP goes to my actual router), or needed to configure it on my router.

Ah, so there are 2 major variations of bridged:

1) LLC/SNAP: Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
2) VC-MUX: Bridged - 24 (ATM - 10, ethernet 14, possibly FCS - 4+padding)

(The FCS padding potentially turns this into 4 variations, but it should be really rare, or so I heard.)

You could just slowly reduce the overhead and see how the link behaves; honestly I do not know how prominent a slight overhead underestimate would feel, so by all means go ahead and try :).

If you have a mac or linux computer on your network, you could try to measure the overhead with the attached ping_sweeper5_dp.sh script (needs editing). Then you could run tc_stab_parameter_guide_04.m in matlab or octave: on the matlab command prompt, change into the directory containing the script and the log file and run

[ tmp ] = tc_stab_parameter_guide_04( fullfile(pwd, 'ping_sweep_ADSL2_20140104_122844.txt')) ;

making sure to replace ping_sweep_ADSL2_20140104_122844.txt with the name of your log file.

The measurement will take around 3 hours (for 1 samples per size, for your link 1000 would be enough) and wants an undisturbed network (I typically run this overnight); the parsing of the log file will also consume 20 minutes or more, while the actual analysis will take a few seconds…

If you go that route I would love it if you could share your log file, since I only have one old bridged LLC/SNAP example. (I intend to put all scripts and instructions on the wiki, with example plots for the different results.)
Best Regards
Sebastian

Attachments: tc_stab_parameter_guide_04.m, ping_sweeper5_dp.sh

I am seeing my effective bandwidth be higher by about 50 KB/s on downloads. On Netflix, my Roku used to try HD upon starting playback, then (after 20-30 seconds thinking about it) fall back to SD, but now the HD streams are working flawlessly for hours.

-- David P.

On Tue, Jan 7, 2014 at 6:34 AM, Sebastian Moeller moell...@gmx.de wrote:

Hi David,

On Jan 7, 2014, at 12:08 , David Personette dper...@gmail.com wrote:

I'm in the US, but live in a relatively rural area. My only internet options are DSL and satellite. The local provider is Century Link (it used to be Sprint, but they sold their copper phone business off). I have the fastest service that they offer (based on distance from the DSLAM), 4 down / .5 up.

And you are not alone; a considerable percentage of the population, wherever you look, is hanging on such connections. So cerowrt should really help those folk as well as luckier ones.

I have had SmokePing monitoring my latency to the first hop outside my network for over a year now (I've been on CeroWRT the whole time). My baseline (no load) latency is 31ms. I used to have AQM throttling back to 80% of my already pathetic bandwidth. I would still regularly see periods lasting minutes to hours when latency would be 80 - 120ms. I only recently grokked what you were talking about with tc_stab; since I got back from the holidays with the family, I set things up as you suggested for Fred (nfq_codel, target 25ms in advanced egress, ATM, per packet overhead 40,

The exact number depends on the encapsulation your ISP uses; 40 is right for a typical PPPoE over LLC/SNAP connection. If that is correct for your link you are fine, otherwise contact me if you want to empirically find out the proper value for your link.

and set my SQM bandwidth limits to 95%). Since the 30th my worst case latency has been 41ms.
The fq_codels really are great if in control of the bottleneck, really good work by bright people!

Plus I get to use more of my actual bandwidth.

Well, of that I am not so sure. By enabling link layer ATM the router will automatically take care of the ATM cell overhead for you (basically reducing the effective rate to ~90% of the link; in other words, you get the same effect by shaping to 90%). It will also handle the per packet overhead and the nasty potential padding of the last ATM cell. Both have a stronger effect on small packets and are hard to account for by static rate reduction; link layer ATM again comes to the rescue by taking these two into account individually for each packet, based on the packet size. So effectively 95% with link layer adjustments might mean a lower wire rate than 80% without; the important thing is that with the link layer
Re: [Cerowrt-devel] New Linksys router
Hi there,

On Jan 7, 2014, at 17:03 , David Personette dper...@gmail.com wrote:

RAM and flash size?

According to http://www.linksys.com/en-us/press/releases/2014-01-06_Linksys_wrt_revolutionizes_wireless_networking : RAM 256MB, flash 128MB, dual core ARM

-- David P.

On Tue, Jan 7, 2014 at 10:56 AM, Rich Brown richb.hano...@gmail.com wrote:

I was going to find some time today to call the product manager and/or marketing contact to see if they can give advice. What else would the collective wisdom care to know about?

Rich

On Jan 7, 2014, at 10:46 AM, Toke Høiland-Jørgensen t...@toke.dk wrote:

Looks promising, if somewhat pricey? http://www.linksys.com/en-us/press/releases/2014-01-06_Linksys_wrt_revolutionizes_wireless_networking

Does anyone know what chipset it features/will feature?

-Toke
Re: [Cerowrt-devel] SQM Question #5: Link Layer Adaptation Overheads
Hi Rich,

On Jan 11, 2014, at 17:29 , Rich Brown richb.hano...@gmail.com wrote:

Folks,

Thanks for all the good responses to this question. I have incorporated the info into the SQM page on the wiki, so that it now gives broad recommendations for each link type, and refers people to the (new) “Everything you wanted to know about Link Layer Adaptation” page.

This allows someone coming to CeroWrt for the first time to set up their router and get decent results that will be vastly better than any other router/firmware combo that they might encounter. I think this makes our project more approachable, while still giving an inquiring mind the ability to tune their own installation.

You can read the new pages at: http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310

This looks quite good.

-and- http://www.bufferbloat.net/projects/cerowrt/wiki/Everything_you_wanted_to_know_about_Link_Layer_Adaptation

This needs some love (probably by me), but I will most likely not find a lot of time for this before March…

Best Regards
Sebastian

Rich
Re: [Cerowrt-devel] VDSL
Hi Rich,

On Jan 11, 2014, at 17:29 , Rich Brown richb.hano...@gmail.com wrote:

The VDSL option is burning up way too many of my brain cells. Given that VDSL may well use ATM, while VDSL2 is unlikely to use ATM, I think that it isn’t useful to include it as a top-level choice. In the new pages, I propose removing the VDSL choice, leaving:

- “ATM (any kind of DSL link)” and
- “None”

Well, that looks like a decent recommendation for the wiki. The SQM configuration page still needs to expose all three values, atm, ethernet, and none, so that people can actually change things...

and letting people tune the settings after reading the “Everything you wanted to know…” page if they want to optimize.

Exactly, we still need the options exposed to make this possible.

http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 -and- http://www.bufferbloat.net/projects/cerowrt/wiki/Everything_you_wanted_to_know_about_Link_Layer_Adaptation

Rich

Best Regards
Sebastian
Re: [Cerowrt-devel] Perfection vs. Good Enough
Hi Rich,

On Jan 11, 2014, at 17:31 , Rich Brown richb.hano...@gmail.com wrote:

Folks,

I am so pleased with the state of CeroWrt. The software has improved enormously, to the point that we all get really good performance from our routers at home. If you want a real eyeful of the progress we’ve made, check the list at the bottom of the Release Notes: http://www.bufferbloat.net/projects/cerowrt/wiki/CeroWrt_310_Release_Notes

CeroWrt is working great. We have two great testimonials for how it has improved network performance (from Fred Stratton and David Personette, see https://lists.bufferbloat.net/pipermail/cerowrt-devel/2014-January/001961.html and https://lists.bufferbloat.net/pipermail/cerowrt-devel/2014-January/001970.html). I have been using 3.10.24-8 at home without hiccups (after I turned on SQM :-) since it was shipped. We’ve got a really great program.

But - I’m afraid we’re letting perfection be the enemy of the good. Here are a couple of indications:

- The rest of the world doesn’t know about this good work. If you look at the front page of the site, we’re recommending CeroWrt 3.7.5-2 from last February. It has Codel, but not much more. Our understanding of the world has expanded by an order of magnitude, but we’re not making it available to anyone.
- The entire discussion of link layers has held us back. That’s why I proposed to cut back the choices to ATM and None, and let people figure out the details if they want to/have time to optimize.
- We have tons of updated modules (dnsmasq, IPv6, quagga, mosh) which we should get out to the world.
- The entire product is much tighter, works better, and we can be proud of it.

As Dave Täht pointed out in a recent note:

Compared to the orders of magnitude we already get from fq codel, the sum benefit of these [Link Layer Adaptation] fixes is in the very small percentage points.
I do not agree with this sentiment. As I understood it, Dave was talking about different modifications to fq_codel (nfq_codel and efq_codel); this was not about the link layer. For an ATM link, if you get the link layer wrong the shaper works stochastically at best, and if the shaper does not work well we are back at square one: badly managed buffers out of our control filling up and causing delays of seconds. So on an ATM link you will get at least temporary buffer bloat unless you either shape down to ~50% of link rate or take all the ATM peculiarities into account (basically what link layer ATM is doing).

This is true of the entire CeroWrt build.

Proposal: We should “finish up the last bits” to make 3.10.24-8 (or a close derivative) be a stable release. It has been working fine AFAIK for lots and lots of us. It certainly has been as well tested as other branches. I see the following:

- Look through the release notes (very bottom of the page at the URL above) and review the items that Dave was worried about for the 3.10.24-8 release
- Make a decision on Link Layer Adaptation choices, and implement it.

It is quite clear to me that I failed to explain the matters surrounding ATM links properly. But if I cannot explain this to a small group of technical experts, there is no chance for me to explain it to lay persons. I will try my best to contribute to the “more than you ever wanted to know about link layer adaptation” page.

Best Regards
Sebastian

- What else?

Best,
Rich
Re: [Cerowrt-devel] VDSL
Hi Rich,

On Jan 11, 2014, at 18:55 , Rich Brown richb.hano...@gmail.com wrote:

Hi Sebastian, Well, that looks like a decent recommendation for the wiki.

The SQM configuration page still needs to expose all three values, atm, ethernet, and none, so that people can actually change things...

So two questions really: 1) (From my previous note) What’s the difference between the current “Ethernet” (for VDSL) and “None” link layer adaptations?

Currently, none completely disables the link layer adjustments; ethernet enables them, but will only use the overhead (unless you specify a tcMPU, but that is truly exotic).

2) When we distinguish the Ethernet/VDSL case, I would really like to use a different name from “Ethernet” because it seems confusingly similar to having a real Ethernet path/link (e.g., direct connection to internet backbone without any ADSL, cable modem, etc.)

On the one hand I agree, but the two options are called ATM (well, for tc, adsl is a valid alias for ATM) and ethernet if you pass them to tc (which is what we do), and I would really hate to hide this under fancy names. I see no chance of renaming those options in tc, so we are sort of stuck with them, and adding another layer of indirection seems too opaque to me. This is why I put some explanation behind the option names in the list box…

Best
Sebastian

Thanks. Rich

___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
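To make the difference between the three settings discussed above more concrete, here is a minimal Python sketch of how a shaper might charge one packet under none, ethernet and atm. It is only a model following the description of the linklayer option in the tc-stab man page, not tc's actual code; the function name accounted_size and the 40-byte overhead in the usage loop are assumptions chosen for illustration.

```python
ATM_CELL_PAYLOAD = 48  # payload bytes carried by one ATM cell
ATM_CELL_SIZE = 53     # payload plus the 5-byte cell header


def accounted_size(packet_len, linklayer="none", overhead=0, mpu=0):
    """Bytes a shaper would charge for one packet (illustrative model only)."""
    if linklayer == "none":
        # No link layer adjustment at all: the raw packet length is used.
        return packet_len
    size = max(packet_len + overhead, mpu)
    if linklayer == "ethernet":
        # Only the per-packet overhead (and an optional minimum size) applies.
        return size
    if linklayer == "atm":
        # Round up to whole cells, then count 53 bytes per cell on the wire.
        cells = (size + ATM_CELL_PAYLOAD - 1) // ATM_CELL_PAYLOAD
        return cells * ATM_CELL_SIZE
    raise ValueError("unknown link layer: %s" % linklayer)


if __name__ == "__main__":
    # Hypothetical example: a 64-byte packet with 40 bytes of overhead.
    for mode in ("none", "ethernet", "atm"):
        print(mode, accounted_size(64, mode, overhead=40))
```

In this model ethernet only adds a constant per packet, while atm also makes the charged size jump in steps of one cell, which is why the two need separate names.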
Re: [Cerowrt-devel] Perfection vs. Good Enough
Hi Rich,

On Jan 11, 2014, at 19:47 , Rich Brown richb.hano...@gmail.com wrote:

Hi Sebastian,

It is quite clear to me that I failed to explain the matters surrounding ATM links properly. But if I cannot explain this to a small group of technical experts, there is no chance for me to explain this to lay persons. I will try my best to contribute to the "more than you ever wanted to know about link layer adaptation" page.

Oh dear, please don’t think that.

Oh, I am sure there are people with more insight who could explain the complications away; I am just not one of those :) I think you strike a good balance in the description between being informative and improving the typical user's experience, without frightening people away with too much detail. I am happy to help with the scary part on the details page...

You really have done a great job describing the problems of ATM links. I have snagged all the relevant points from the Link Layer discussion, and plan to include them in the “more than you want to know…” page. Your points A-G describing link framing in https://lists.bufferbloat.net/pipermail/cerowrt-devel/2014-January/001963.html will be a major part of the discussion.

Note, I do not do networking for a living, so there will be inaccuracies in there (I tried my best though).

But the fact that we can see many ways to improve the software shouldn’t stop us from celebrating our immense success to date. One way to do this would be by shipping a stable version to the world to try. And to do that, we need to provide good enough instructions that’ll work for new people, and all the details for those who want to dig further.

Oh, yes, and I think your wiki page gets this balance right.

On the other hand, I won’t say No to any help you want to provide. :-) I know you’re busy, so I’m happy to take a shot at it later this week. Thanks for all you’ve contributed to the conversation.
Best Sebastian Rich ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
Re: [Cerowrt-devel] VDSL
Hi Aaron, On Jan 11, 2014, at 21:23 , Aaron Wood wood...@gmail.com wrote: Rich, Sebastian, (and others), First, a hello. I've been lurking on the bufferbloat mailing list for a bit, and just joined here as well to better follow what's going on, and see if there's any way I can help. Next, I have an ADSL+ link in Paris (Free.fr), and am willing to run a number of tests using the various LLA options and overhead estimations. But 3 hours on a dead-quiet link could be hard to deal with. Well, it takes 3 hours to collect 1 samples for 100 different packet sizes (creating a maximum transient traffic of 150kbit/s), you could collect less (your RTT will be dominated by your uplink, so 2000 to 3000 samples per size will be plenty, reducing the time to something more manageable). And as long as your link has 150 kbit/sec reserved on the uplink (since it is more critical there) you should be fine. Unless you saturate your link, additional traffic to the probe will increase the variance of the ping times, but you might be able to get away with, just try to measure for as long as you are comfortable with. (I typically just run this over night… and there is not much going on in our network at that time, but it is not dead-quiet either). I'm happy to run an hour's worth of netperf tests to a nearby server, slowly working through the parameter space, and then comparing the results.\ Again, I recommend to perform the ATM quantization probe against the nearest host (already in the ISPs network, ideally the DSLAM/MSAN) to reduce the variance in the data... My ad-hoc comparisons last week with various modes showed that too high of settings for the overhead (coupled with the already reduced bw limit in the shaper) killed the bulk upload/download performance (which I care about on a meager 18Mbps/1Mbps link). Yes, as expected you pay a bandwidth price for getting the overhead too large. 
The price you pay for specifying the overhead too low is that for some packet sizes there is no difference, while other packet sizes will drag in an additional ATM cell of 48 bytes. At the larger packet sizes typical for bulk upload, these additional 48 bytes will not really kill your throughput; for small packets this can be catastrophic. Think: you specify overhead 0 and no link layer adjustments and send a 50 byte packet. The shaper thinks this takes 64 bytes, but on an ATM link with PPPoE LLC/SNAP you actually have 40 bytes overhead + 64 byte packet = 104 bytes, which require 3 ATM cells of 48 bytes each, totaling 3*48 = 144 bytes on the wire; your packet more than doubled (factor 2.25). And if you specify an overhead that is too small, you still get some packet sizes that drag in an additional cell with the same consequences. But note that these effects depend on the size of the packets, which people typically do not vary during bandwidth tests...

But I found that setting the bw shaper limit to the reported line speed (from the modem), and then adjusting the overhead parameters got me the same bulk performances, with the same latencies (or what appeared to be the same, I need to do more data crunching on the results, and run again in a quieter setting).

As I tried to indicate above, for bulk traffic the effect of 47 bytes in 1500 might be lost in the noise (and 47 bytes per packet is the worst case); think roughly 3 percent worst case link capacity overestimation (and if you shape to 95% this already fits into the shaped margin).

It would also help if my target server was closer than it is. The only server I know of is in Germany, 55ms away (unloaded).

I think you should measure the per-packet overhead where it actually matters, on the link to the DSLAM/ISP; any congestion on shared network segments further away from the router cannot really be controlled well by shaping in the router, so let's not try to fix that part of the route :) .
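The arithmetic in the small-packet example above can be checked with a short sketch. This is a minimal illustration, not taken from any CeroWrt script: it counts 48 bytes per cell as the mail does (ignoring the 5-byte cell header), uses the 40-byte PPPoE LLC/SNAP overhead quoted in the text, and the function names are made up for this example.

```python
CELL_PAYLOAD = 48   # bytes of payload per ATM cell, as counted in the mail
OVERHEAD = 40       # PPPoE LLC/SNAP per-packet overhead quoted in the mail


def cells_needed(packet_len, overhead=OVERHEAD):
    """Number of ATM cells needed to carry the packet plus its overhead."""
    return (packet_len + overhead + CELL_PAYLOAD - 1) // CELL_PAYLOAD


def inflation(packet_len, overhead=OVERHEAD):
    """Ratio of bytes occupied in cells to the bytes the shaper assumed."""
    return cells_needed(packet_len, overhead) * CELL_PAYLOAD / packet_len


if __name__ == "__main__":
    # Packet sizes picked arbitrarily to show small versus bulk packets.
    for size in (64, 200, 576, 1500):
        print(size, cells_needed(size), round(inflation(size), 2))
```

Running the loop shows how the relative error shrinks as packets grow, which is the reason a bulk-transfer speed test tends to hide the problem.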
OTOH, I can say that any changes here over the defaults are still gilding the lily (to an end user). I think I need to come up with a worst case ATM carrier effects test, so people can feel the pain quicker. The dang dependence on packet size really is unfortunate as no bulk speed tester I know tries a decent set of packet sizes... But Free.fr's router/modem already uses codel, so it wasn't that bad to begin with (vs. Numericable on a docsis 3 Netgear router/modem). If I understand correctly, at least the newer free boxes should be very hard to beat, can you disable the modems AQM for testing? === I can also say that I found the current verbiage on that particular page a bit clear as mud. Which page? Even knowing what my network is (to some degree, since the Free.fr modem can tell me), it was difficult to follow, and I quickly found myself at about 3/4 my previous speeds, with no visible improvement in latency (although
Re: [Cerowrt-devel] Managed to break 802.11n (on a 3800)
Hi Aaron, On Jan 16, 2014, at 16:03 , Aaron Wood wood...@gmail.com wrote: All, I'm noting this here in case anyone is interested. After I write this up, I'm going to start from scratch on the configuration, and factory-reset the router. = The 5GHz radio on my 3800 seems to be in a very odd state. I'm not quire sure what state it's in, but it seems to be only doing HT20 1x1. And in a fairly broken manner at that. Running the rrul test (over wifi directly to the router as the netserver), tcp uploads were 25Mbps or so, but download was 5Mbps. This is with your mac? Try rrul_noclassification, macosx (at least 10.8) will not do RRUL fair to a fast host. Why I do not know… it always prioritizes the upload, as if it did not see/trust the downstream markings (heck maybe it is busy using all bandwidth for upstream so that it literally never sees the markings on the downstream packets..) About the other issue I do not know anything… Best Regards Sebastian This is me 1-2 meters from the router. Load was never more than 0.33. (I can share the results of people are interested). After a full power cycle, wifi isn't coming up at all. = How I got here: I'm in France, and had dutifully set my unit with the FR country code when setting up CeroWRT. I had noticed some odd latencies (periodic 100-200ms latency every 10-20 seconds over wifi) on the 5GHz network. The router was on channel 36, and I wanted to move it up to the far-upper ranges, so I tried to specify a custom channel to do so (140). This was the channel I thought I had been using with stock (Netgear) firmware. Wifi didn't come back up after applying the changes, and the luci interface seemed to be tripping up over stuff that it was reading out of the configuration files. I ssh'd in via ethernet, and fixed up the configurations by hand. Except the driver is still reporting that the 5GHz network won't kick into 802.11n modes, and won't use HT40. It seems to be sure it's configured for it, but isn't using it. 
Further, digging into the rc_stats files with the minstrel speeds, I found some very odd data (not what I was expecting to see):

(laptop, which can do 2x2 HT40)

      rate  throughput  ewma prob  this prob  this succ/attempt  success  attempts
D        6         6.0       99.9      100.0             2(  2)       65        65
         9         0.0        0.0        0.0             0(  0)        0         0
        12         2.9       25.0      100.0             0(  0)        1         1
        18         4.3       25.0      100.0             0(  0)        1         1
        24         5.6       25.0      100.0             0(  0)        1         1
A P     36        32.4       99.9      100.0             0(  0)       51        51
C       48        10.4       25.0      100.0             0(  0)        1         1
B       54        11.5       25.0      100.0             0(  0)        1         1

Total packet count::    ideal 53      lookaround 7

(AppleTV, 1x1 HT20)

root@cerowrt:/sys/kernel/debug/ieee80211/phy1/netdev:sw10# cat stations/58\:55\:ca\:51\:b5\:4b/rc_stats
      rate  throughput  ewma prob  this prob  this succ/attempt  success  attempts
         6         3.5       57.8      100.0             0(  0)        6         6
         9         3.9       43.7      100.0             0(  0)        2         2
        12         5.1       43.7      100.0             0(  0)        2         2
        18        10.0       57.8      100.0             0(  0)        3         3
D       24        13.1       57.8      100.0             0(  0)        3         3
C       36        14.2       43.7      100.0             0(  0)        2         2
B       48        18.2       43.7      100.0             0(  0)        2         2
A P     54        46.2       99.9      100.0             1(  1)      348       367

Total packet count::    ideal 331      lookaround 37

Whereas what I'm seeing for the 2.4GHz radio is:

root@cerowrt:/sys/kernel/debug/ieee80211/phy0/netdev:sw00/stations# cat 10\:9a\:dd\:30\:96\:34/rc_stats
type       rate   throughput  ewma prob  this prob  retry  this succ/attempt  success  attempts
CCK/LP     1.0M          0.7      100.0      100.0      0             0(  0)        2         2
CCK/SP     2.0M          0.0        0.0        0.0      0             0(  0)        0         0
CCK/SP     5.5M          0.0        0.0        0.0      0             0(  0)        0         0
CCK/SP    11.0M          0.0        0.0        0.0      0             0(  0)        0         0
HT20/LGI   MCS0          5.6      100.0      100.0      1             0(  0)        2         2
HT20/LGI   MCS1          0.0        0.0        0.0      0             0(  0)        0         0
HT20/LGI
Re: [Cerowrt-devel] Managed to break 802.11n (on a 3800)
Hi Dave, thanks again. On Jan 16, 2014, at 23:50 , Dave Taht dave.t...@gmail.com wrote: On Thu, Jan 16, 2014 at 3:10 PM, Sebastian Moeller moell...@gmx.de wrote: Hi Aaron, On Jan 16, 2014, at 20:08 , Aaron Wood wood...@gmail.com wrote: Sebastian, after sorting out the router, it's still biased, but far less so, about a 2:1 ratio between upload and download. So I See offen 10:1 and worse @165Mbit/s raw wireless rate I get mixed results, but they aren't good. It's hard to comment on each graph in email, but I'll try. I generally run rrul with the --disable-log option. Log scales helped back when we were still comparing against pfifo fast. Good point, I had not thought about this and just sheepishly copied the netperf-wrapper invocation from a scratch buffer…. oops. The really bad download graph. Crazy results Download bandwith is bad because the upload starts and fills the queue first, the download has to wait to fill the queue and generally gets dropped earlier than the upload. This is one of the many reasons I don't care for IW10…. But aren't those two different queues? I am confused... The upload gets better slowly due to how slow tcp is ramping up over the half-duplex wifi channel. Yepp, my sentiment as well, the sharing between up and down sucks badly. I naively assumed that cero would sort of manage TX-ops and share these equally between its own sending needs and the remote station… I guess wifi is too complicated (and I had thought last mile wired connectivity was wonderfully weird...) I just checked again and I get crazy results for both RRUL and RRUL_NOCLASSIFICATION: rrul_noclassification_macbook_2_cerowrt_5GHz.pngrrul_macbook_2_cerowrt_5GHz.png in both cases I get ~ 10:1 out-in imbalance. I think that with a larger quantum on the AP they will be in less imbalance, and you should try nfq_codel also. So for this I would modify the debloat script, correct? The larger quantum will also hurt, too right answer has always been per station queues. 
Which I will happily test once they are implemented :) And even crazier just had one rrul where both in and out came up almost perfectly at 1:1 Thinking of it again, it might have been a case of really really low total bandwidth, so until this reoccurs I think it is a fluke... Hmm. Wifi is weird, isn't it? It's not like ethernet at all. Too bad the universe insists on trying to defy the laws of physics by trying to make it act like ethernet…. Oh, there was one new blurb last year about going full duplex on wifi, which might help to make wiki behave closer to what people nowadays correlate with ethernet... . Interestingly the classification really works in giving different bandwidth for the different classes. (And in rrul_noclassification, where the still classified UDP probes make it through the EF flow gets shorter latencies…). having 4 full queues and a txop each is far worse than 1 queue with better aggregation, IMHO. So, the one queue would need to shave off all TOS (excuse my occasional shouting, but all caps is the quickest way to avoid auto correction turning my english even funnier), and have say HTB (or god forbid prio) keep some semblance of priority on the packets instead of letting wifi do its let's waste a few tx ops thing. Is it just me or should wifi basically get a better tx-ops sheduler? Note that measuring through cerowrt to a wired host (with too restrictive firewall settings) get: rrul_macbook_2_cerowrt_2_happy-horse_5GHz.pngrrul_noclassification_macbook_2_cerowrt_2_happy-horse_5GHz.png You are seeing the upload ramp up along tcp's lines and the download ramp down as it gets progressively more starved. The sum seems constant, so yes. with the MacBooks uplink still dominant (actually continually getting more bandwidth…). Well, you only have X bandwidth, in the air, total. A better way of saying it might be the macbook is taking better advantage of it's txops to ship more data in an aggregate. 
Mmh, it looks like it gets more tx-ops or cero gets increasingly bad in filling its tx-ops, no? Since I my only wireless connected machines are macs and nobody else complained about this issue I assume it is an osx issue I honestly think that aside from benchmarks, bandwidth is irrelevant on wifi. Lower latency is something that you actually feel, and when accessing the web or doing a videoconference, that's the part that matters. Oh, sure, and my quick and dirty real world test (bidirectional data transfer initiated from the macbook turned out quite useable and balanced). And I only see this on the local net were wireless is the bottleneck. (Silly Idea, all I need to do is switch the wired machine to 10Mbit ethernet and I will be fine
Re: [Cerowrt-devel] Managed to break 802.11n (on a 3800)
Hi Dave, On Jan 17, 2014, at 00:12 , Dave Taht dave.t...@gmail.com wrote: On Thu, Jan 16, 2014 at 5:56 PM, Sebastian Moeller moell...@gmx.de wrote: Hi Dave, many thanks for all the information elucidation, as always. I enjoy trying to find the words to explain. On Jan 16, 2014, at 23:30 , Dave Taht dave.t...@gmail.com wrote: On Thu, Jan 16, 2014 at 10:29 AM, Sebastian Moeller moell...@gmx.de wrote: Hi Aaron, On Jan 16, 2014, at 16:03 , Aaron Wood wood...@gmail.com wrote: All, I'm noting this here in case anyone is interested. After I write this up, I'm going to start from scratch on the configuration, and factory-reset the router. = The 5GHz radio on my 3800 seems to be in a very odd state. I'm not quire sure what state it's in, but it seems to be only doing HT20 1x1. And in a fairly broken manner at that. Running the rrul test (over wifi directly to the router as the netserver), tcp uploads were 25Mbps or so, but download was 5Mbps. This is with your mac? Try rrul_noclassification, macosx (at least 10.8) will not do RRUL fair to a fast host. Why I do not know… it always prioritizes the upload, as if it did not see/trust the downstream markings (heck maybe it is busy using all bandwidth for upstream so that it literally never sees the markings on the downstream packets..) rrul with classification blows up 802.11e on all devices, everywhere. The VO and VI queues generally get all the bandwidth. Been saying that a while. VO and VI should be strictly admission controlled and are not, anywhere. All the queues fill and bad things happen. What should happen in a 802.11n world is that a set of packets should wind up in the best queue for the TXOP, and VO used not at all. rrul_noclassification better looks like the intent for classification was for 802.11e and thus works better. There are a couple other tests in the netperf-wrapper suite that don't use classification at all, that might be saner to use. 
Ah, so in rrul_noclassification the UDP flows are still tos marked (at least that is reported in the plots and visible in the plots), but even using tcp_bidirectional I see a crazy imbalance of 80:1, so this laptop's Broadcom BCM43xx (apple is not as informative as I would like about the components, but the firmware marker points at broadcom I would say) isn't better than the intel wifi in yours, I would say…

the iwl is a nightmare. the 802.11ac stuff is looking bad too.

Another issue with the current implementation of rrul is that my intent with the specification was to test voip-like streams, an isochronous 10ms packet in each direction. The implementation currently sends measurement flows based on the RTT, just like ping. As the RTT declines in length, the amount of space used up by the measurement flow gets bigger and bigger. At a 3ms RTT, just the EF measurement flow eats ~2/3s of the available txops as it runs through the VO queue, which is limited to a single packet per txop.

So, how much data could one fit into a txop? Would it make sense for the driver to pad the VO txop with other data just to efficiently use the air bottleneck?

The other measurement flows, like the CS5 flow, eat the VI queue, and the BE and BK queues get starved for txops.

Ah, so this is why I only see the TOS UDP data in the rrul_noclassification test, as they are otherwise crowded out by the tcp streams of the same class, and not reported after the first drop...

I can barely explain to myself how the queues are supposed to get airtime scheduled, see the 802.11e page on wikipedia. I thought 802.11e was a bad idea in the first place... but what rrul does is try to get txops on all 4 queues, which means it needs 4x as much airtime (this is not accurate), and grabs airtime for its VO queue first most of the time, followed by VI, BE, and BK. I think for wifi testing with the current rrul test there needs to be a new test that does everything in BE. (toke?)
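The point about RTT-paced measurement flows can be illustrated with simple arithmetic. The sketch below only compares packet rates; it says nothing about airtime or txop limits, and it assumes that an RTT-paced flow keeps exactly one probe in flight. The function names are made up for this example.

```python
def probes_per_second_rtt_paced(rtt_ms):
    """Ping-like probing: a new probe goes out as soon as the last reply arrives."""
    return 1000.0 / rtt_ms


def probes_per_second_isochronous(interval_ms=10.0):
    """VoIP-like probing: one packet per fixed interval, independent of RTT."""
    return 1000.0 / interval_ms


if __name__ == "__main__":
    # RTT values chosen for illustration; 3 ms is the case named in the mail.
    for rtt in (50.0, 10.0, 3.0):
        print(rtt, round(probes_per_second_rtt_paced(rtt), 1),
              probes_per_second_isochronous())
```

At long RTTs the ping-like flow sends fewer packets than a 10ms isochronous stream, but on a short local wifi hop it sends several times more, each one competing for its own transmit opportunity.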
Classification is very rarely used in the real world anyway. So that means the UDP streams as well? Most of the usage of rrul to date has been over longer RTTs over ethernet... (again, I'm delighted y'all are doing this, and I do hope to get a more voip-like test) Yeah netperf-wrapper has been a delight in getting the ATM mess sorted out, great work. And now with the successor in the works things will get even better :) tcp_bidirectional_hms-beagle_2_cerowrt.png lastly, if you are doing a test over the internet, many providers pee on the tos bits. Unless you've done a packet capture, you can't trust that you are actually seeing classified packets coming back from the internet. Good point, comparing just the local rrul plots with the ones to demo, I see what you mean, there is a tiny bit of the priority classes visible in the uplink (bur barely) and none at all
Re: [Cerowrt-devel] going down the todo list
Hi Dave, hi list,

On Jan 19, 2014, at 13:51 , Dave Taht dave.t...@gmail.com wrote:

I am going to try to knock out a new release by tomorrow...

-1) has minidnssd and upnp been working for others correctly?

0) Presently fooling with a new skin with the gui (it's in 3.10.26-2 - don't! install that unless you merely want to look at the gui). I have no opinion on graphical matters, yours solicited.

1) I have found that sqm does not always start correctly on boot. There is some dependency on something firing to get it to start.

2) dnsmasq's dnssec support isn't quite baked enough to think about putting into a stabler release.

3) I updated most of the onboard doc, still have to finish the credits file

4) bcp38 turns out to be hard to do correctly in our commonly double-natted universe. I think I will try to make the facility available but only enable it partially by default.

In the double-NAT case, should not the primary router actually do the bcp38 processing? So then we could get away with detecting the double-NAT by looking at the external address on ge00, and only do the src sanitizing if the address is fully routable? Or would that interfere with carrier-grade NAT (some German ISPs only give out a 10.this.or.that address for IPv4)?

Question, since I do not understand bcp38's implications: Will this interfere with reaching private networks (like 192.168.N.N) on the external side of the router? It would be great if that would still be possible, as all(?) cable modems respond to 192.168.100.1 and it would be sweet if these could still be reached through cerowrt (there are recipes for stock OpenWrt on how to make that possible, so as long as these keep working all would be well :) )

best
Sebastian

5) David personette fixed https support for the gui so we will switch to https for the next round

6) squash incoming diffserv bits.
I think perhaps wireshark is grabbing the packets before iptables thus I don't see them squashed 7) native ipv6 and dhcpv6-pd support - as discussed on the list, a full solution is gated on steven barth. The massive rework of the routing infrastructure he put in friday needs to be tested too, though. I am hopefully gaining ipv6 from comcast today to see stuff for myself. 8) src/dst routing test of babels - needs work 9) updated shaperprobe, uftp4, and ditg - no progress 10) iwl related crash and unaligned instructions - I have some data on when and how much they happen now, still no insight as to why -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel ___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
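Sebastian's suggestion under item 4, to apply source sanitizing only when the external address on ge00 is fully routable, could be sketched roughly as below. This is only an illustration of the decision, not part of any CeroWrt script; the function name should_filter_sources is hypothetical, and the check relies on the is_global property of Python's standard ipaddress module.

```python
import ipaddress


def should_filter_sources(wan_address):
    """Hypothetical policy: sanitize sources only if the WAN address is globally routable."""
    return ipaddress.ip_address(wan_address).is_global


if __name__ == "__main__":
    # Example addresses: a public one, RFC 1918 space, the cable modem
    # address mentioned in the mail, and carrier-grade NAT shared space.
    for addr in ("8.8.8.8", "10.20.30.40", "192.168.100.1", "100.64.0.1"):
        print(addr, should_filter_sources(addr))
```

With this rule a router behind another NAT, or behind carrier-grade NAT, would leave the filtering to the upstream device, which also keeps a modem at 192.168.100.1 reachable.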
Re: [Cerowrt-devel] SQM restart and problems on boot fixed
Hi Dave,

On Jan 19, 2014, at 17:36 , Dave Taht dave.t...@gmail.com wrote:

The /etc/hotplug.d/iface/00-debloat script has been wrong in the face of the qos-scripts, aqm-scripts, and stuff inbetween. Thus on a fresh boot, or after a DHCP renew or a variety of other circumstances, the portion of sqm that sets up the egress portion of itself gets wiped out. This explains a lot of network performance issues that others have had after a cero box had been up for a while... the sqm code was getting partially disabled! This was also probably wrong on a ton of previous releases going back to 3.7.5 or earlier. (However, since the name has changed, it would be aqm for stuff prior to the great renaming, and for 3.7.5 the solution is also different because we weren't using uci at the time.) But for 3.10.24 and later replace /etc/hotplug.d/iface/00-debloat with this.

    #!/bin/sh
    #DEBLOAT_LOG=/tmp/debloat.log
    #DEBLOAT_LOG2=/tmp/debloat2.log
    DEBLOAT_LOG=/dev/null
    DEBLOAT_LOG2=/dev/null
    SQM=0
    # Ask uci whether sqm is enabled on the interface that triggered the event.
    SQM=`uci get sqm.${DEVICE}.enabled`
    # Only run debloat on ifup, and only if sqm does not manage this interface.
    [ "$ACTION" = ifup -a "$SQM" != 1 ] && {
        IFACE=$DEVICE QMODEL=fq_codel_ll /usr/sbin/debloat > $DEBLOAT_LOG 2> $DEBLOAT_LOG2
    }

Excellent find, must have been tricky to catch with it only triggering every so often. I assume that this triggering will also have removed the HTB from egress, so that tc -d qdisc always reported the actual running setup?

best
Sebastian

-- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
Re: [Cerowrt-devel] Managed to break 802.11n (on a 3800)
Hi Toke,

On Jan 19, 2014, at 20:01 , Toke Høiland-Jørgensen t...@toke.dk wrote:

Dave Taht dave.t...@gmail.com writes:

hah. Calling it that is the opposite of my intent with default blow-up of 802.11e - which has been to convince everyone it's busted and to fix it. rrul_be ?

Added an rrul_be test to netperf-wrapper git. Will do a new release version once I've fixed one or two other things. :)

Great! I just pulled the repository. I noticed the new "hostname is mandatory" policy kicks in:

bash-3.2$ ./netperf-wrapper --list-tests
Available tests:
  cisco_5tcpup : RTT Fair Realtime Response Under Load
  cisco_5tcpup_2udpflood : Cisco 5TCP up + 2 6Mbit UDP
Error occurred: No hostname specified.

Maybe you could allow listing the tests without a hostname? Anyway, I just ran the rrul_be test from my macbook to cerowrt and I still see the massive bias for upload of around 10:1, same as with rrul, rrul_noclassification, and tcp_bidirectional. When I ran tcp_bidirectional on the university's wireless network (the only test that actually runs there), I got larger downloads than uploads. Wireless is weird...

Best
Sebastian

-Toke

___ Cerowrt-devel mailing list Cerowrt-devel@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cerowrt-devel
Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)
Hi Dave, quick question, how does one turn of logging for babeld? It seems that if daemonized it defaults to logging to /var/log/babeld.log (or similar). Is setting the log file to /dev/null really the answer? (Since I have no the IPv6 issue not yet resolved, I assume babeld is unhappy) I resorted to stopping babeld completely, but that feels like a crutch… Best Sebastian On Jan 27, 2014, at 22:14 , Dave Taht dave.t...@gmail.com wrote: certainly turn off the babeld log! I will leave it off in the next release. On Mon, Jan 27, 2014 at 4:10 PM, Steve Jenson ste...@fruitless.org wrote: Looking more, the buffer errors are showing up in syslog well before tmpfs fills up. Is the memtester openwrt package available for cerowrt? I don't see it under `Available packages`. Thanks, Steve On Mon, Jan 27, 2014 at 1:06 PM, Steve Jenson ste...@fruitless.org wrote: On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht dave.t...@gmail.com wrote: On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson ste...@fruitless.org wrote: Hi everybody, I've been using cerowrt as a secondary wifi network (just a single AP for now) for a few weeks now. Recently, my wndr3800 got stuck in a bad state and eventually rebooted. I've had this happen a few times now and am looking for ways to debug the issue. I'm new to cerowrt and openwrt so any advice is appreciated. Since I use it as a secondary network, this is no way critical. Yea! I appreciate caution before putting alpha software on your gw. I'm not looking for free tech support but I couldn't find anything on the wiki about troubleshooting. I'd love to start a page and write some shell scripts to diagnose and report issues. I know that a cerowrt router is meant to be a research project rather a consumer device but these things seem helpful regardless. Sure, let me know your wiki account. I have been lax about granting access of late as the signup process is overrun by spammers. My username is stevej on the wiki. Thanks! 
Please let me know if you'd prefer I not email the list with these issues or if you'd rather I used trac or a different forum. The list is where most stuff happens. Also in the irc channel. If it gets to where it needs to be tracked we have a bugtracker at http://www.bufferbloat.net/projects/cerowrt/issues The first question I have is: Are you on comcast? Cerowrt had a dhcpv6-pd implementation that just worked from feburary through december. Regrettably they changed the RA announcement interval to a really low number around then... and this triggers a firewall reload every minute on everything prior to the release I point to below. If there is a memory leak somewhere that would have triggered it. I am on ATT ADSL2+ with a Motorola NVG510 modem. In this state, I can connect to the cerowrt base station via wifi but am unable to route packets to the internet. I can connect to :81 and see the login page but logging in results in a lua error at `/cgi-bin/luci` /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function dispatcher target for entry '/'. The called action terminated with an exception: /usr/lib/lua/luci/sauth.lua:87: Session data invalid! stack traceback: [C]: in function 'assert' /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch' /usr/lib/lua/luci/dispatcher.lua:195: in function /usr/lib/lua/luci/dispatcher.lua:194 I can ssh into the device and cat various log files until the router hangs and reboots. here's a few relevant lines from my terminal history before the device rebooted (I'm assuming a watchdog kicked in and rebooted it). 
root@buffy2-1:~# ping google.com
ping: bad address 'google.com'
root@buffy2-1:~# free
             total         used         free       shared      buffers
Mem:        126336       110332        16004            0         5616
-/+ buffers:             104716        21620
Swap:            0            0            0
root@buffy2-1:~# uptime
 02:08:54 up 2 days, 1:26, load average: 0.10, 0.21, 0.17
root@buffy2-1:~# dmesg
[0.00] Linux version 3.10.24 (cero2@snapon) (gcc version 4.6.4 (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24 10:50:15 PST 2013
[skipping some lines]
[ 13.156250] Error: Driver 'gpio-keys-polled' is already registered, aborting...
[ 19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready
[ 19.421875] ar71xx: pll_reg 0xb8050010: 0x
[ 19.429687] se00: link up (1000Mbps/Full duplex)
[ 22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready
[ 23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready
[ 23.757812] ar71xx: pll_reg 0xb8050014: 0x
[ 23.757812] ge00: link up (1000Mbps/Full duplex)
[ 23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready
root@buffy2-1:~# ifconfig
ge00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B1
inet
Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)
On January 29, 2014 5:10:18 PM CET, Dave Taht dave.t...@gmail.com wrote:

On Wed, Jan 29, 2014 at 4:45 AM, Sebastian Moeller moell...@gmx.de wrote:

Hi Dave, quick question, how does one turn off logging for babeld? It seems that if daemonized it defaults to logging to /var/log/babeld.log (or similar). Is setting the log file to /dev/null really the answer?

seems so.

Okay, I guess I will try that then... (Since I have not yet resolved the IPv6 issue, I assume babeld is unhappy.)

I resorted to stopping babeld completely, but that feels like a crutch...

no daemon in an embedded system should ever write to flash in an uncontrollable manner.

What bugs me is that it basically keeps repeating the same error over and over again. If it would rate limit and/or push messages to the system log it would be nicer.

I will also argue that not being able to find the channel is a bug that messes with diversity routing in particular.

I have actually not yet understood what it wants to tell me ;), since I got your attention, is there an easy way to run a babel client under macosx?

Best Regards
Sebastian

Best
Sebastian

On Jan 27, 2014, at 22:14 , Dave Taht dave.t...@gmail.com wrote:

certainly turn off the babeld log! I will leave it off in the next release.

On Mon, Jan 27, 2014 at 4:10 PM, Steve Jenson ste...@fruitless.org wrote:

Looking more, the buffer errors are showing up in syslog well before tmpfs fills up. Is the memtester openwrt package available for cerowrt? I don't see it under `Available packages`.

Thanks,
Steve

On Mon, Jan 27, 2014 at 1:06 PM, Steve Jenson ste...@fruitless.org wrote:
On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht dave.t...@gmail.com wrote:
On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson ste...@fruitless.org wrote:

Hi everybody, I've been using cerowrt as a secondary wifi network (just a single AP for now) for a few weeks now. Recently, my wndr3800 got stuck in a bad state and eventually rebooted.
I've had this happen a few times now and am looking for ways to debug the issue. I'm new to cerowrt and openwrt so any advice is appreciated. Since I use it as a secondary network, this is in no way critical.

Yea! I appreciate caution before putting alpha software on your gw.

I'm not looking for free tech support but I couldn't find anything on the wiki about troubleshooting. I'd love to start a page and write some shell scripts to diagnose and report issues. I know that a cerowrt router is meant to be a research project rather than a consumer device but these things seem helpful regardless.

Sure, let me know your wiki account. I have been lax about granting access of late as the signup process is overrun by spammers.

My username is stevej on the wiki. Thanks!
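On the point raised in this message that no daemon should write to a file without bound: one stopgap, short of silencing the log entirely, is to empty the file once it passes a size limit. This is a minimal sketch only; `cap_log` and the 64 kB limit in the usage comment are assumptions for illustration, not something CeroWrt is known to ship, and /var/log/babeld.log is simply the path mentioned in the thread.

```shell
# cap_log FILE MAX_BYTES
# Empty FILE once it has grown past MAX_BYTES; do nothing otherwise.
cap_log() {
    file=$1; max=$2
    # Nothing to do if the log does not exist (yet).
    [ -f "$file" ] || return 0
    # Arithmetic expansion strips any padding that wc adds around the number.
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$max" ]; then
        # Truncate rather than delete, so the path keeps existing.
        : > "$file"
    fi
    return 0
}

# Example (e.g. run periodically on the router):
#   cap_log /var/log/babeld.log 65536
```

Emptying the file throws away the history, so this only limits the damage; rate limiting in the daemon itself, as suggested above, would be the cleaner fix.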
Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)
Hi Dave,

On Jan 29, 2014, at 19:21 , Dave Taht dave.t...@gmail.com wrote:
On Wed, Jan 29, 2014 at 9:44 AM, Sebastian Moeller moell...@gmx.de wrote:
On January 29, 2014 5:10:18 PM CET, Dave Taht dave.t...@gmail.com wrote:
On Wed, Jan 29, 2014 at 4:45 AM, Sebastian Moeller moell...@gmx.de wrote:

Hi Dave, quick question, how does one turn off logging for babeld? It seems that if daemonized it defaults to logging to /var/log/babeld.log (or similar). Is setting the log file to /dev/null really the answer?

seems so.

Okay, I guess I will try that then... (Since I have not yet resolved the IPv6 issue, I assume babeld is unhappy.)

I resorted to stopping babeld completely, but that feels like a crutch...

no daemon in an embedded system should ever write to flash in an uncontrollable manner.

What bugs me is that it basically keeps repeating the same error over and over again. If it would rate limit and/or push messages to the system log it would be nicer.

I will also argue that not being able to find the channel is a bug that messes with diversity routing in particular.

I have actually not yet understood what it wants to tell me ;), since I got your attention, is there an easy way to run a babel client under macosx?

for coping with the mac I use macports to get a compiler and support for open source software.

Ah, same here (even though I would have thought you a homebrew user, no idea why) ;) Alas, `port search babel` does not find anything babeld related...

I haven't ever tried to run babeld on the mac I have, I will put it on my list… I just thought it would be nice to finally get into the stay connected while switching between wired and wireless fun, but this is in no way essential for me.

Best Regards
Sebastian

Best Regards
Sebastian

Best
Sebastian

On Jan 27, 2014, at 22:14 , Dave Taht dave.t...@gmail.com wrote:

certainly turn off the babeld log! I will leave it off in the next release.
Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)
Hi Steve,

On Jan 29, 2014, at 19:24 , Steve Jenson ste...@fruitless.org wrote:
On Wed, Jan 29, 2014 at 9:44 AM, Sebastian Moeller moell...@gmx.de wrote:
On January 29, 2014 5:10:18 PM CET, Dave Taht dave.t...@gmail.com wrote:
On Wed, Jan 29, 2014 at 4:45 AM, Sebastian Moeller moell...@gmx.de wrote:

Hi Dave, quick question, how does one turn off logging for babeld? It seems that if daemonized it defaults to logging to /var/log/babeld.log (or similar). Is setting the log file to /dev/null really the answer?

seems so.

Okay, I guess I will try that then...

Here's the directive I'm using in /etc/babeld.conf

log-file /dev/null

and then you can restart either via the web gui or `/etc/rc.d/S70babeld restart`

Ah, thanks. Since I am on 3.10.28-1 this was /etc/init.d/babeld restart. And I opted for putting:

option 'log-file' '/dev/null'

into /etc/config/babeld, since that seemed the more OpenWrt way of doing things; I wonder whether it really is wise to carry both files…

Babeld runs again, and no /var/log/babeld.log appeared, but whether it works I do not know (and I doubt it, given that babeld.log was growing due to nasty repeating error messages...)

Best Regards
Sebastian

___
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel
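To double-check which log target babeld has been given, the two files mentioned above can be inspected. A rough sketch, assuming the formats are exactly as quoted in this thread (`log-file /dev/null` in /etc/babeld.conf, `option 'log-file' '/dev/null'` in /etc/config/babeld); `babeld_log_target` is a hypothetical helper name, not an existing tool.

```shell
# babeld_log_target FILE
# Print the configured log-file value(s) found in FILE, one per line.
babeld_log_target() {
    # Drop single quotes so quoted and unquoted values look the same to awk.
    tr -d "'" < "$1" | awk '
        $1 == "option" && $2 == "log-file" { print $3 }   # /etc/config/babeld style
        $1 == "log-file"                   { print $2 }   # /etc/babeld.conf style
    '
}

# Example (on the router):
#   babeld_log_target /etc/config/babeld
#   babeld_log_target /etc/babeld.conf
```

If both files exist and name different targets, that would be one concrete reason not to carry both.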
Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)
Hi David,

On Jan 29, 2014, at 20:56 , David Personette dper...@gmail.com wrote:
On Wed, Jan 29, 2014 at 2:51 PM, Sebastian Moeller moell...@gmx.de wrote:

I have actually not yet understood what it wants to tell me ;), since I got your attention, is there an easy way to run a babel client under macosx?

for coping with the mac I use macports to get a compiler and support for open source software.

Ah, same here (even though I would have thought you a homebrew user, no idea why) ;) Alas, `port search babel` does not find anything babeld related...

I haven't ever tried to run babeld on the mac I have, I will put it on my list… I just thought it would be nice to finally get into the stay connected while switching between wired and wireless fun, but this is in no way essential for me.

FYI, it does look like homebrew has babel.

dperson@argos2$ brew search babel
babeld gpsbabel open-babel

Thanks. The first one seems to be the real McCoy (the others are also in macports, for what it is worth). I am very reluctant to switch to homebrew though, since I got macports working well and am wary of the learning curve involved in getting the same proficiency in homebrew….

A different question, did you manage to figure out your DSL link's overhead? If not I would be happy to help (especially if octave/matlab availability is an issue), just send me the ping log file (best via a file drop).

Best Regards
Sebastian

--
David P.
Re: [Cerowrt-devel] cerowrt issues (3.10.24-8)
Hi Dave, hi list,

On Jan 30, 2014, at 17:21 , Dave Taht dave.t...@gmail.com wrote:
On Wed, Jan 29, 2014 at 11:55 AM, Sebastian Moeller moell...@gmx.de wrote:

Hi Steve,

On Jan 29, 2014, at 19:24 , Steve Jenson ste...@fruitless.org wrote:
On Wed, Jan 29, 2014 at 9:44 AM, Sebastian Moeller moell...@gmx.de wrote:
On January 29, 2014 5:10:18 PM CET, Dave Taht dave.t...@gmail.com wrote:
On Wed, Jan 29, 2014 at 4:45 AM, Sebastian Moeller moell...@gmx.de wrote:

Hi Dave, quick question, how does one turn off logging for babeld? It seems that if daemonized it defaults to logging to /var/log/babeld.log (or similar). Is setting the log file to /dev/null really the answer?

seems so.

Okay, I guess I will try that then...

Here's the directive I'm using in /etc/babeld.conf

log-file /dev/null

and then you can restart either via the web gui or `/etc/rc.d/S70babeld restart`

Ah, thanks. Since I am on 3.10.28-1 this was /etc/init.d/babeld restart. And I opted for putting:

option 'log-file' '/dev/null'

into /etc/config/babeld, since that seemed the more OpenWrt way of doing things; I wonder whether it really is wise to carry both files...

I stuck it in /etc/config/babeld.

Babeld runs again, and no /var/log/babeld.log appeared, but whether it works I do not know (and I doubt it, given that babeld.log was growing due to nasty repeating error messages...)

It's working. It is just not making an optimal routing decision between AP-managed networks and meshy ones.

Ah, if babel can work around this issue then there is no good reason to spam the log with repeats (at least not at the frequency it currently does, once per day might be more reasonable...)

The feature is called diversity routing, and it is key to making wireless networks scale better. There are (now) quite a few papers on it, but I like Juliusz's best... http://www.pps.univ-paris-diderot.fr/~jch/software/babel/wbmv4.pdf

Interesting, thanks for the pointer.

Notably this feature is also in batman, but it's called something else that I forget. In an example with two radios on a cerowrt AP: If you have a packet come in from channel 36, it's best that it goes out via ethernet if possible, channel 11 if not, and not channel 36. Even if the number of hops seems less, don't go back out 36; if at all possible, use a different route.

So right now babel is incorrectly distinguishing between the AP managed SSIDs (sw00, sw10, gw10, gw00), so the routing decisions there are sub-optimal.

Is it really hard to get the radio and frequency for each interface from linux?

As in most cases you are going to go out ethernet or one of the more meshy interfaces, or you have no choice but to send stuff along on one SSID... it's not very sub-optimal.

Okay, that does not justify the log spam ;)

still, annoying. rule 22 in embedded design is never write infinitely long files, as the probability of running out of memory or flash always hits 100%

I agree (and then I would be happier if openwrt had a preemptive mechanism to avoid this situation, be it log rotation and deletion or similar methods.)

Best Regards, many thanks for the explanation.
Sebastian

Best Regards
Sebastian

--
Dave Täht
Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
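The channel preference described above (wired first, then a different channel, the incoming channel only as a last resort) can be sketched as a small selection function. This is only an illustration of the rule of thumb, not babeld's actual metric computation; `pick_egress`, the `name:channel` notation and the interface labels are all invented for the example.

```shell
# pick_egress IN_CHANNEL CANDIDATE...
# Each candidate is written "name:channel"; use the channel "wired" for ethernet.
# Prints the name of the preferred outgoing interface.
pick_egress() {
    in_chan=$1; shift
    other=""      # first candidate on a different channel
    fallback=""   # first candidate on the incoming channel
    for cand in "$@"; do
        name=${cand%%:*}
        chan=${cand#*:}
        if [ "$chan" = "wired" ]; then
            # Ethernet never interferes with the radio, so take it at once.
            echo "$name"
            return 0
        elif [ "$chan" != "$in_chan" ]; then
            [ -z "$other" ] && other=$name
        else
            [ -z "$fallback" ] && fallback=$name
        fi
    done
    if [ -n "$other" ]; then
        echo "$other"
    else
        echo "$fallback"
    fi
}

pick_egress 36 radio5:36 radio2:11 eth:wired   # prints: eth
pick_egress 36 radio5:36 radio2:11             # prints: radio2
pick_egress 36 radio5:36                       # prints: radio5
```

The last call shows the "no choice" case from the discussion: with only one interface available, the packet goes back out on the channel it came in on.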