Re: [Cake] [Codel] Proposing COBALT
> On 3 Jun, 2016, at 22:09, Noah Causin <n0manlet...@gmail.com> wrote: > > Was the issue, where the drops and marks did not seem to occur, resolved? Examination of packet dumps obtained under controlled conditions showed that marking and dropping *did* occur as normal, and I got a normal response from a local machine sending through a virtual delay line. My Internet connection is such that extremely short RTTs never occur. However, it seems that some Internet servers I use often do not respond as much as they should to ECN marking, resulting in excessively long queues despite a relatively small number of flows. It rather reminds me of the symptoms one would expect to see if DCTCP found its way onto the public Internet. And these are very popular servers with an extremely large userbase. However it’s also possible that the ECN information is somehow disappearing en route. I plan to investigate in more detail once COBALT is up and running, with behaviour I can reason about more intuitively than the “evolved Codel” Cake has been using up to now. With COBALT integrated into Cake, I’ll also be able to directly track the number of unresponsive flows. Part of that investigation may be to enquire as to whether DCTCP is in fact in use. If so, the TCP Prague people should be brought into the loop, as this would constitute evidence that Codel can’t control DCTCP via ECN under practical Internet conditions. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Possible BUG - parent backlog incorrectly updated in case of NET_XMIT_CN
> On 7 Jun, 2016, at 14:20, Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> wrote:
>
> I had a nose at CAKE but couldn't quite work out if a similar issue is present
> but I suspect it is. Certainly if Eric can't get it right "My prior attempt
> to fix the backlogs of parents failed." then it's not an 'obvious to solve'
> problem :-)

It appears my code already handles it correctly. This is most likely because it inherited the analogous handling from the old function called here.

- Jonathan Morton
Re: [Cake] Possible BUG - parent backlog incorrectly updated in case of NET_XMIT_CN
>> And there’s also the problem that we might not need to drop packets as
>> large as the incoming packet in order to fit the latter into the queue
>> - so this corrected correction may be *negative* (the queue is longer
>> than before) - but qdisc_tree_reduce_backlog() only takes an unsigned
>> parameter here.
>
> That's a very minor detail.
>
> If the code does :
>
> reduce_backlog(unsigned quantity)
> {
>     q->backlog -= quantity;
> }
>
> Then the fact that @quantity is signed or unsigned is irrelevant.

Not so - unless you are very sure that q->backlog is the same size as quantity. In an increasingly 64-bit world, that is by no means assured in the future. I don’t like relying on wraparound behaviour without making that assumption explicit, which is precisely what the signed types in C are for.

>> IMHO the NET_XMIT_CN semantics are broken. It might be better to drop
>> support for it, since it should rarely be triggered.
>
> What exact part is broken ?
> Semantic looks good to me.

It’s broken in that a negative correction may be required in the first place. It places additional burden on every producer of the CN signal who isn’t a tail-dropper. I can only assume that the behaviour was designed with only tail-drop in mind - and as we both know, that is not the only option any more.

It appears to me that there are four things that enqueue() may want to tell its caller:

1: That the packet was enqueued successfully (NET_XMIT_SUCCESS).

2: That the packet was enqueued successfully, but some other relevant packet had to be dropped due to link congestion.

3: That the packet was *not* enqueued, and this was due to link congestion. This is potentially useful for tail-dropping queues, including AQMs.

4: That the packet was *not* enqueued, for some reason other than link congestion; the “error case”.

Currently NET_XMIT_CN appears to cover case 3, whereas Cake normally wants cases 1 & 2 (and will only signal case 4 if a GSO aggregate couldn’t be segmented).
For the time being, I have changed my development branch of Cake to always signal case 1 except for the error case.

- Jonathan Morton
Re: [Cake] New to cake. Some questions
> why are the vdsl options suffixed with _ptm, but the atm options are not?

Because the “vcmux” and “llc” suffixes are sufficient to imply ATM cell framing.

> is the currently selected set of keywords minimal and complete?

I did some careful research back when I added that feature, including taking some suggestions from you, and according to that: yes, it is correct and complete, and every keyword is related to a real protocol. Though some are not widely used in practice, they *are* widely supported in ubiquitous consumer-grade equipment.

I haven’t seen any evidence to the contrary; if you have any, please show it, if not, PLEASE SHUT UP about it. That, by the way, is *me* being blunt to the point of rudeness.

> why name something conservative that will for all people not using an ATM
> link cost between 9 to 40% of goodput?

The use-case for the “conservative” keyword is essentially: “I know what the raw bitrate of the link is, but I have no sodding clue what overhead it has”. The goal is to prevent the dumb buffers elsewhere from filling up and undoing our good work.

Yes, it will overcompensate, leading to reduced throughput. That’s recognised and accepted as far as I’m concerned; the worst corner cases are with very small packets, which frankly matter less. If you don’t want overcompensation, figure out what the real overhead is and set that.

- Jonathan Morton
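As a rough illustration of why the overcompensation falls hardest on small packets, here is a sketch of ATM cell accounting. It assumes the standard 53-byte cell carrying 48 payload bytes; the function name is invented for the example, and the 48-byte overhead used in the note below is an assumed worst-case figure, not a value quoted in this thread.

```c
#include <stdint.h>

#define ATM_CELL_PAYLOAD 48u   /* payload bytes carried per ATM cell */
#define ATM_CELL_SIZE    53u   /* bytes each cell occupies on the wire */

/* Bytes accounted on an ATM link for a packet of `len` bytes plus
 * `overhead` bytes of encapsulation: round up to a whole number of
 * cells, then charge the full cell size for each one. */
uint32_t atm_wire_bytes(uint32_t len, uint32_t overhead)
{
    uint32_t cells = (len + overhead + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;
    return cells * ATM_CELL_SIZE;
}
```

With an assumed overhead of 48 bytes, a 1500-byte packet is accounted as 1749 bytes (about 17% more), while a 64-byte packet is accounted as 159 bytes (roughly 2.5 times its size). On a link that does not actually use ATM framing, that difference is capacity the shaper leaves unused.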
Re: [Cake] New to cake. Some questions
> On 10 Jun, 2016, at 00:36, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > >> 5. Is there still the udp packet dropping problem? e.g. games that are using >> udp. >> If yes does it make sense to apply diffserv classes manually? How to do this? > What udp packet problem? He’s probably referring to the tendency of non-flow-isolating AQMs to drop packets indiscriminately when under load. Cake is flow-isolating and thus applies a separate AQM algorithm to each flow. As such, UDP gaming/VoIP traffic won’t get dropped unless it exceeds its fair share of the link, which is unlikely for a well-designed, lightweight protocol. We really should make an effort to put a more intuitive GUI interface on this. These questions indicate a user overwhelmed by many options without guidance. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [Codel] Proposing COBALT
> On 4 Jun, 2016, at 04:01, Andrew McGregor <andrewm...@gmail.com> wrote: > > There are undoubtedly DCTCP-like ECN responses widely deployed, since > that is the default behaviour in Windows Server (gated on RTT in some > versions). But also, ECN bleaching exists, as do servers with ECN > response turned off even though they negotiate ECN. It would be good > to know some specifics as to which site, whose DC they're hosted in, > etc. I’m keeping my mouth shut until I’ve analysed the specific traffic in more detail, so I know what I’m accusing people of and precisely who to accuse. It’s even possible that the fault lies in my ISP’s network - I think they’ve made some significant changes recently. If people are really negotiating ECN and then ignoring its signals at the host level, that’s a clear RFC violation. Fortunately, I think this particular site would be interested in correcting such behaviour if confirmed and explained. > Also, do you have fallback behaviour such that an ECN-unresponsive > flow eventually sees drops? I think that will be essential. Yes, COBALT essentially *is* such a mechanism. The Codel half always uses ECN if it’s available (and drops otherwise), but the BLUE half - the part responsible for handling unresponsive flows in the first place - always uses packet drops. Cake also performs “head drop on the longest queue” when the global queue limit is reached (as does fq_codel). This can be considered a second such mechanism, though a much blunter one; it is significantly superior to tail-drop for two major reasons, but can easily result in burst loss. It is also this overflow which acts as the up-trigger for BLUE; the longest queue not only gets the instant head-drop but a notification to its COBALT instance. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
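The BLUE half described above can be sketched roughly as follows. This is a minimal illustration of the classic BLUE idea, not COBALT's code: the struct, field names and step sizes are hypothetical, the message only states that queue overflow is the up-trigger, and the decrease on an empty queue is an assumption taken from the original BLUE algorithm. Classic BLUE also rate-limits these updates with a freeze time, which is omitted here.

```c
#include <stdint.h>

/* Hypothetical BLUE state; names and scaling are illustrative only. */
struct blue_state {
    uint32_t p_drop;   /* drop probability, scaled so UINT32_MAX is ~1.0 */
    uint32_t p_inc;    /* step added on a queue overflow event */
    uint32_t p_dec;    /* step removed when the queue runs empty */
};

/* Up-trigger: the queue overflowed, so raise the drop probability
 * (saturating at the maximum). */
void blue_queue_full(struct blue_state *b)
{
    if (b->p_drop > UINT32_MAX - b->p_inc)
        b->p_drop = UINT32_MAX;
    else
        b->p_drop += b->p_inc;
}

/* Down-trigger (assumed): the queue ran empty, so lower the drop
 * probability (saturating at zero). */
void blue_queue_empty(struct blue_state *b)
{
    if (b->p_drop < b->p_dec)
        b->p_drop = 0;
    else
        b->p_drop -= b->p_dec;
}

/* Decide whether to drop a packet, given a uniformly distributed 32-bit
 * random number supplied by the caller. BLUE drops rather than marks. */
int blue_should_drop(const struct blue_state *b, uint32_t rnd)
{
    return rnd < b->p_drop;
}
```

Because the probability only rises on overflow, a responsive flow that Codel keeps under control never engages this path; an unresponsive one keeps overflowing until the drop rate is high enough to hold the queue down.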
Re: [Cake] [Codel] Proposing COBALT
> On 4 Jun, 2016, at 04:01, Andrew McGregor <andrewm...@gmail.com> wrote: > > ...servers with ECN response turned off even though they negotiate ECN. It appears that I’m looking at precisely that scenario. A random selection of connections from a packet dump show very high marking rates, which are apparently acknowledged using CWR, but a subsequent dropped packet (probably due to queue overflow) takes many seconds to be retransmitted (I’m using a rather high memory limit for observation purposes). Overall the TCP behaviour is approximately normal for NewReno on a dumb FIFO, and the ECN signalling is completely ignored. This doesn’t rule out the possibility that it’s a different Reno relative, such as Westwood+ or Compound. There’s often more than one CWR per RTT. This isn’t a consistent characteristic; some connections have normal-looking CWRs while others issue them every three packets, as if they’re fishing for “more accurate” ECN feedback. It might vary by host; I didn’t keep track of that. But this can’t be DCTCP; even that should back off in the face of a 100% marking rate, which is often achieved at my low bandwidth and with very persistent queues. Other servers respond normally to ECN signals, ruling out interference by my ISP. It’s possible the ECE flag is wiped and the CWRs are faked, but there’s no legitimate reason to do that. The CWRs ultimately make no difference, since at 100% CE marks, every ack has ECE set anyway. Turning off ECN negotiation at the client results in a much better managed queue with similar throughput. It’s not immediately obvious whether that’s due to a functioning congestion response or simply the AQM clearing out the queue the hard way. It’ll be interesting to see what effect COBALT has here, when I get it to actually work. As for who these servers are: Valve Software’s Steam platform. I did say they were large and popular. 
- Jonathan Morton
Re: [Cake] [Codel] Proposing COBALT
> On 4 Jun, 2016, at 17:01, moeller0 wrote:
>
> Maybe cake should allow to switch from the default mark by ECN policy to mark
> by drop per command line argument? At least that would allow much easier in
> the field testing… As is there is only the option of disabling ECN at the
> endpoint(s)…

I consider ignoring ECN in the way I described to be a fault condition inevitably resulting in unresponsive traffic. As a fault condition, it should be rare.

The main effect in practice is that the RTT for the affected flows grows well beyond normal, but since they are bulk transfers, this has only a minor detrimental effect (much of which is incurred sender-side in the form of retransmission buffers two orders of magnitude larger than necessary).

Rather than further complicate Codel or Cake, I’d like to simply apply a general solution for unresponsive traffic, i.e. COBALT.

- Jonathan Morton
Re: [Cake] cake separate qos for lan
> On 31 Mar, 2016, at 14:49, Allan Pinto <allan...@gmail.com> wrote: > > im not sure if we can route it to ifb this way,( maybe only works with tc > redirection?) as i feel that i have just set up a routing loop. Both IFB and IMQ use the original routing information to direct the packet to the original device after passing through. There is thus no second pass through the routing tables (or the firewall). I think to use routing tables in any way for this purpose, you would need to use a virtual-ethernet-pair device, not IFB or IMQ. Consider that a fallback option if we can’t get anything else to work. I’m also thinking of cleaner ways to do this by means of adding code, rather than masses of configuration. It really shouldn’t be this hard to produce a working setup. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] new code point proposed
> On 5 Apr, 2016, at 23:28, moeller0 <moell...@gmx.de> wrote:
>
>> Tin 0 = LLT “Lo” traffic (inc. existing low-loss & high-throughput classes),
>> 256/256, 100%, increased target & interval.
>> Tin 1 = Best Effort traffic, 256/256, 100%, standard target & interval.
>> Tin 2 = LLT “La” traffic (inc. existing low-latency classes), 256/256, 100%,
>> standard target, reduced interval.
>
> This might backfire, as far as I understand interval is the reaction
> time window for a flow, this needs to be roughly in the ballpark of the RTT,
> reducing it (significantly) will make the AQM quite trigger happy. This might
> be in line with the LA proposal, but what if LA traffic has to cross a
> satellite link?

The entire point *is* to make the AQM very trigger-happy for “La” traffic. By selecting the “La” DSCP (or any other low-latency DSCP, for that matter), the originator of the traffic is requesting that behaviour. Reduced throughput is an expected side-effect.

Satellite links have nasty effects on latency-sensitive traffic all by themselves. I don’t think we need to worry too much about that combination.

If the flow uses less than its fair share of bandwidth, the AQM won’t trigger anyway.

In any case, Codel’s behaviour and default parameters are tuned for conventional TCP. Latency-sensitive traffic generally doesn’t use conventional TCP, so the usual assumptions go out of the window. I propose retaining the standard “target” parameter on Tin 2 to avoid triggering AQM with a single large packet, but reducing “interval” to make Codel’s behaviour more suitable for UDP and DCTCP traffic.

- Jonathan Morton
Re: [Cake] cake separate qos for lan
> On 27 Mar, 2016, at 08:31, Allan Pinto <allan...@gmail.com> wrote:
>
> Cache-Server
> |
> internet Gateway ---> L2 switch --> Linux router with cake - - [ pppoe connection ] --> customer

Aha - that is a different topology than we usually assume. So the egress side of the port is the right one to consider. It’s nice to see Cake being considered for the provider's side of the link. Interesting.

Cake doesn’t have its own facilities to do the sort of specific discrimination you want. All the mechanisms it has are geared to sharing a fixed capacity as equitably as is feasible. So you will need to divide the traffic using some other mechanism, and pass it through two separate instances of Cake.

Ideally you want one instance set for the capacity of the physical link, with all traffic passing through it, and the second instance set for the allocation for non-cache traffic, with the cache traffic bypassing it. The customer will then get full link capacity when accessing the cache and nothing else, and latency will still be controlled well with a complex mix of traffic.

I recommend you use the IMQ mechanism (http://lartc.org/howto/lartc.imq.html) to achieve the ideal configuration above:

    ip link set imq0 up
    tc qdisc replace dev ppp0 root handle 1: cake pppoe-vcmux bandwidth $FULL_RATE triple-isolate
    tc qdisc replace dev imq0 root handle 2: cake raw bandwidth $NONCACHE_RATE flows
    iptables -t mangle -A PREROUTING -o ppp0 -s $CACHE_IP -j IMQ --todev 0

That reminds me - we need to update the documentation to properly describe the overhead and triple-isolation keywords. You might need a different overhead setting than “pppoe-vcmux” depending on the details of your link.

- Jonathan Morton
Re: [Cake] cake separate qos for lan
> On 26 Mar, 2016, at 17:14, Allan Pinto <allan...@gmail.com> wrote: > > I'm experimenting in replacing a mikrotik router with plain linux. by > following the instructions on the cake page, i have setup the following line > in the /etc/ppp/ip-ip script so that the user will be limited to bandwidth > using cake. > > /usr/sbin/tc qdisc add dev $pppdev root cake bandwidth ${BURST_DOWN}bit > > but i have certain lan traffic available to the customer which should be > available at higher speed, for eg. cache traffic and i want to set that speed > to 20mbit default . > > if i understand correctly i will have to mark traffic coming in from that lan > source using iptables, can someone guide me how to set bandwidth only for > that source to be higher. I’m not certain what your topology is here. Is the cache inside or outside the point where the router is fitted, or is it within the router, or off to the side on a separate port? Also, the command shown will apply a limit to *egress* traffic on the given port. If you need to do *ingress* shaping, there’s a different sequence of commands. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake separate qos for lan
> On 29 Mar, 2016, at 00:01, Stephen Hemminger <step...@networkplumber.org> wrote:
>
> IMQ was abandoned by original author because there was no way to make it
> reliable and SMP safe.

That’s a shame, because for this use-case, it’s a heck of a lot easier to get it doing the right thing - simply because iptables is that much more flexible than the u32 filter. But maybe u32, in the hashing configuration, scales better.

- Jonathan Morton
Re: [Cake] Proposing COBALT
> On 20 May, 2016, at 19:05, David Lang <da...@lang.hm> wrote:
>
> don't know, I was thinking more the dslreports speedtest site and that sort
> of thing.

I imagine both dslreports and netalyzr would find this metric interesting, if they don’t have it already. They’re both in a position to examine packet traces associated with each test.

- Jonathan Morton
Re: [Cake] [Codel] Proposing COBALT
> On 24 May, 2016, at 16:47, Jeff Weeks <jwe...@sandvine.com> wrote: > >> In COBALT, I keep the drop-scheduler running in this phase, but without >> actually dropping packets, and *decrementing* count instead of incrementing >> it; the backoff phase then >> naturally ends when count returns to zero, instead of after an arbitrary >> hard timeout. The loop simply ensures that count will reduce by the correct >> amount, even if traffic >> temporarily ceases on the queue. Ideally, this should cause Codel’s count >> value to stabilise where 50% of the time is spent above target sojourn time, >> and 50% below. (Actual >> behaviour won’t quite be ideal, but it should be closer than before.) > > I tried this as well, at one point, but can't remember, off-hand, why I > didn't stick with it; will have to see if I can find mention of it in my > notes. > What trigger are you using to decrement count? I initially did a crude > decrement of count every interval, but then you end up with a ramp-down time > which is considerably slower then the ramp-up (and the ramp up is slow to > begin with). > I assume you're actually re-calculating the next drop, using the > 1/sqrt(count) but instead of dropping and increasing count, you're simply > decreasing count, so the time to get from 1->N is the same as the time to get > to N->1? That’s basically right. In retrospect, it seems like a very obvious approach to the backoff problem. :-) Of course, due to the “priming” delay and the possibility of the signalling frequency exceeding the packet rate, it’s likely to take *less* time to ramp down than to ramp up; this is why the ramping down is guarded by a while loop. >> As another simplification, I eliminated the “primed” state (waiting for >> interval to expire) as an explicit entity, by simply scheduling the first >> drop event to be at now+interval when >> entering the dropping state. This also eliminates the first_above_time >> variable. 
Any packets with sojourn times below target will bump Codel out >> of the dropping state anyway. > > How do you handle the case where you're scheduled a drop event 100ms in the > future, and we immediately see low latency; is the event descheduled? > If not, what if we then see high latency again; can the still-scheduled-event > cause us to start dropping packets earlier than 100ms? The first drop event is scheduled by setting the “dropping” flag, ensuring that “count” is nonzero, and setting the “drop_next” timestamp to now+interval. Any packet below the target sojourn time clears the “dropping” flag, which prevents marking or dropping from occurring - which is why the explicit “primed” state is eliminated. Since the timestamp is set in this way whenever the “dropping” flag transitions from cleared to set, there are no spurious drop events. The code is in the sch_cake repo if you want to examine the details. I promise it’s a lot easier to read than the original Codel code. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
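The scheduling described in this message can be sketched as below. This is a reading of the prose, not the code from the sch_cake repo: `toy_codel`, `control_law` and `toy_codel_packet` are hypothetical names, times are plain microsecond counters, and the inverse square root is computed with integer arithmetic as sqrt(interval^2 / count) to keep the example free of floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

struct toy_codel {
    uint32_t count;      /* signalling frequency index */
    bool     dropping;   /* set while sojourn time is above target */
    usec_t   drop_next;  /* time of the next scheduled event */
};

/* Floor of the square root, by Newton's method on integers. */
uint64_t isqrt64(uint64_t n)
{
    uint64_t x = n, y = n / 2 + (n & 1);
    if (n < 2)
        return n;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}

/* Spacing between events: interval / sqrt(count). */
usec_t control_law(usec_t interval, uint32_t count)
{
    assert(count > 0);
    return isqrt64(interval * interval / count);
}

/* Process one dequeued packet; returns true if it should be marked or dropped. */
bool toy_codel_packet(struct toy_codel *c, usec_t now, usec_t sojourn,
                      usec_t target, usec_t interval)
{
    bool signal = false;

    if (sojourn > target) {
        if (!c->dropping) {
            /* Cleared -> set: the first event is one full interval away,
             * which replaces the explicit "primed" state. */
            c->dropping = true;
            if (c->count == 0)
                c->count = 1;
            c->drop_next = now + interval;
        }
    } else {
        c->dropping = false;
    }

    if (c->dropping) {
        if (now >= c->drop_next) {
            signal = true;
            if (c->count < UINT32_MAX)
                c->count++;
            c->drop_next += control_law(interval, c->count);
        }
    } else {
        /* Backoff: keep the scheduler running but decrement count. The
         * loop catches up if no packets arrived for a while. */
        while (c->count > 0 && now >= c->drop_next) {
            c->count--;
            if (c->count > 0)
                c->drop_next += control_law(interval, c->count);
        }
    }
    return signal;
}
```

In this sketch, a packet below target right after the first scheduling clears the flag, and the next packet above target resets `drop_next` to a full interval ahead, so the earlier schedule cannot cause an early drop.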
Re: [Cake] cake, codel5.h, ecn marking & dropping. Confused
> On 4 May, 2016, at 12:57, Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> wrote:
>
> In essence my (mis)understanding of this code is something like: We've
> got here because we've been dropping and codel is telling us to continue
> to drop. With that decided we enter a do..while, the first thing to
> happen is to ECN mark and let that marked packet escape to send the
> signal. Otherwise we appear to iterate around the loop. So here's the
> nub of my question: the INET_ECN_set_ce is done on every iteration of
> that loop...with its potential early escape..do we escape on every
> iteration? Do we need to twiddle the ECN bits on every packet that
> we're about to drop? And we seem to mark the packet on exit of the loop
> anyway.

It’s rather oddly structured code, to be sure.

The vital clue for you may be that you can only set CE on an IPv{4,6} packet which already has something *other* than Not-ECT set; it’s impossible for non-IP packets, and not ECN compliant for Not-ECT IP packets. So INET_ECN_set_ce() returns true only when it succeeds.

On a UDP flood stream, typically Not-ECT is set, so the early-out never triggers. Instead Codel drops a packet, schedules the next drop, and checks whether the next drop schedule has already been reached (which can happen for high drop rates and/or slow transmission rates).

It then attempts to set CE on the first packet transmitted after a drop sequence, just in case it was a mixed ECT/Not-ECT stream; strenuous efforts to get the congestion signal heard as early as possible. This is more likely when flow isolation isn’t in use, or when it is per-host instead of per-flow. That’s also why the first attempt to set CE is within the drop loop.

- Jonathan Morton
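The "returns true only when it succeeds" behaviour can be sketched on a bare TOS / traffic-class byte. The codepoint values are the standard ECN ones; `toy_set_ce` is a hypothetical helper for illustration, not the kernel's INET_ECN_set_ce(), and it skips the IPv4 header checksum update that a real implementation has to perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECN field: the two least significant bits of the TOS / traffic class. */
#define ECN_MASK    0x03u
#define ECN_NOT_ECT 0x00u
#define ECN_ECT_1   0x01u
#define ECN_ECT_0   0x02u
#define ECN_CE      0x03u

/* Try to mark Congestion Experienced. Returns false, leaving the byte
 * untouched, when the packet is Not-ECT; the caller must then drop it
 * instead of marking. */
bool toy_set_ce(uint8_t *tos)
{
    if ((*tos & ECN_MASK) == ECN_NOT_ECT)
        return false;
    *tos |= ECN_CE;
    return true;
}
```

A drop loop built on this pattern escapes early only for ECT packets, which is why a Not-ECT UDP flood never takes the early-out.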
Re: [Cake] Fwd: [Codel] fq_codel_drop vs a udp flood
> On 6 May, 2016, at 22:14, David Lang <da...@lang.hm> wrote: > > On Fri, 6 May 2016, Jonathan Morton wrote: > >>> On 6 May, 2016, at 21:50, David Lang <da...@lang.hm> wrote: >>> >>> what IP id are you referring to? I don't remember any such field in the >>> packet header. >> >> It’s the third halfword. > > half a word is hardly enough to be unique across the Internet, anything that > small would lead to lots of attackes that inserted garbage data into threads. It doesn’t need to be globally unique. It merely identifies, in conjunction with src/dst address pair (so 80 bits in total), a particular sequence of fragments to be reassembled into the original packet. If the fourth halfword is zero (or has only the Don’t Fragment bit set), the IP ID field has no meaning. Hence the entire second word can be considered fragmentation related. I agree that it’s not a very robust mechanism; it breaks under extensive packet reordering at high packet rates (circumstances which are probably showing up in iperf tests against flow-isolating AQMs). It would be better not to have fragmentation at the IP layer at all. But it’s not as bad as you say; it does work for low packet rates, which is all it was intended for. Here’s my preferred reference diagram: https://nmap.org/book/tcpip-ref.html - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
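To make the halfword layout concrete, here is a small sketch that reads the third and fourth halfwords of an IPv4 header from raw bytes. The helper names are invented for the example; the offsets and flag bits follow the standard IPv4 header layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV4_FLAG_DF      0x4000u  /* Don't Fragment */
#define IPV4_FLAG_MF      0x2000u  /* More Fragments */
#define IPV4_FRAG_OFFSET  0x1fffu  /* fragment offset, in 8-byte units */

/* Third halfword (bytes 4-5): the Identification field, in network order. */
uint16_t ipv4_id(const uint8_t *hdr)
{
    return (uint16_t)((hdr[4] << 8) | hdr[5]);
}

/* Fourth halfword (bytes 6-7): flags plus fragment offset. */
uint16_t ipv4_frag_field(const uint8_t *hdr)
{
    return (uint16_t)((hdr[6] << 8) | hdr[7]);
}

/* The ID only matters when the packet is part of a fragment sequence,
 * i.e. More Fragments is set or the offset is nonzero. A field that is
 * zero, or has only Don't Fragment set, means "not fragmented". */
bool ipv4_is_fragment(const uint8_t *hdr)
{
    return (ipv4_frag_field(hdr) & (IPV4_FLAG_MF | IPV4_FRAG_OFFSET)) != 0;
}
```

A reassembler would key its table on the source address, destination address and this 16-bit ID, which is the 80-bit combination mentioned above.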
Re: [Cake] [Codel] Proposing COBALT
> On 28 Jun, 2016, at 11:40, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > Would you like me to split out 'sparse_flows' and 'decaying_flows'? No. A flow with BLUE active won’t be in “decaying flows” continuously until traffic ceases on it, but will likely jump rapidly between “decaying”, “sparse” and possibly “bulk” as well, if the drop rate is temporarily too high. Jittery stats are hard to read and use. I also have an independent reason to avoid adding more stats (without deleting some others) - there is only so much vertical space on my PowerBook’s screen. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] flow dissector idea/enhancement - help
> On 30 Jun, 2016, at 12:33, Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> wrote:
>
> +#ifdef CONFIG_NET_SCH_ESFQ_NFCT
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
> +#endif

Good find. If this actually works the way we want it to, it’ll make all the host-dependent modes (including triple-isolation) much more useful on the outer side of a NAT.

My main concern is that the conntrack state might not be sorted out until it hits the firewall or routing logic. I’ll be very pleased if it happens sooner, or is actually triggered by the query rather than passing to some specific stage of processing.

I have other work to do on the host and flow processing, but I think that’ll be independent of the hash function, which is where you want to be looking.

- Jonathan Morton
Re: [Cake] conntrack lookup continuation
> On 3 Feb, 2017, at 21:01, John Sager <j...@sager.me.uk> wrote: > > As cake uses diffserv to classify, it would be good to carry dscp in the > conntrack & transfer it to incoming packets with an 'action' on the ingress > filter, but carrying dscp specifically in the conntrack record would be > quite a significant change to other parts of linux. Hence the use of fwmark > and the conntrack mark field, which already exist. Yes, this is what I thought you meant. As fwmark just sets “a number” on the conntrack record, there’s no reason in principle not to have that number be a DSCP (or some reasonably transformed representation of one). The trick is then for cake to extract that number from the conntrack record (having looked it up), and if it looks valid, to use it as the packet’s DSCP instead of the one on the packet itself. In principle, that should not be difficult. For the moment however, I’ve got my hands full with writing a report on performance tests I’ve been running, and then getting reacquainted with some code changes that happened while I was looking elsewhere. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
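One way the "if it looks valid" check might work is sketched below. The encoding is purely hypothetical: a flag bit marks the low six bits of the conntrack mark as carrying a DSCP, so that an unset mark of 0 is not mistaken for DSCP 0. None of these names exist in cake or the kernel, and the conntrack lookup itself is reduced to a boolean supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of a DSCP inside a conntrack mark. */
#define MARK_DSCP_VALID 0x40u   /* set when the low six bits hold a DSCP */
#define MARK_DSCP_MASK  0x3fu

/* Build the mark to store on egress. */
uint32_t dscp_to_mark(uint8_t dscp)
{
    return MARK_DSCP_VALID | (dscp & MARK_DSCP_MASK);
}

/* Choose the DSCP used for tin selection on ingress: the one saved in
 * the conntrack mark if there is one, else the packet's own. */
uint8_t effective_dscp(bool have_ct, uint32_t ct_mark, uint8_t packet_dscp)
{
    if (have_ct && (ct_mark & MARK_DSCP_VALID))
        return (uint8_t)(ct_mark & MARK_DSCP_MASK);
    return packet_dscp & MARK_DSCP_MASK;
}
```

The flag bit is the design choice worth noting: DSCP 0 is a legitimate codepoint, so validity has to be signalled separately from the value.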
Re: [Cake] conntrack lookup continuation
> On 31 Jan, 2017, at 16:49, Felix Resch <ful...@beif.de> wrote:
>
> Since we now already do the conntrack-lookup for the nat keyword, would it be
> expensive to implement a kind of internal conntrack-mark-and-restore by
> cake-tin?
>
> E.g. when traffic leaves through cake tin#x, the conntrack entry will get a
> fwmark and return traffic is put in the corresponding tin/bin on the ingress
> cake.

That’s an interesting idea. At this point I don’t know how easy it is to implement, though. Certainly we need to clean up some other things first.

- Jonathan Morton
Re: [Cake] diffserv3 tin 2 target 50% of interval?
> On 22 Feb, 2017, at 13:12, Pete Heist <petehe...@gmail.com> wrote: > > Ok, but for what it’s worth, so far I’m not seeing this confer any benefit as > far as latency is concerned. I will make full results available later, but > for now, here are two plots for the rrul test The RRUL test, when viewed in Flent, only shows the latency induced by one flow (bulk) on another (ping). This is influenced mainly by the flow-isolation and priority-queue mechanisms, not by AQM. Where AQM helps is the effect of a flow on its *own* latency. A bulk flow benefits relatively little from reduced latency, and mainly in the area of loss recovery; it also wants to operate in (or very near) the saturated regime as much as possible. A latency-sensitive flow, by contrast, wants to avoid the saturated regime and its induced latency completely, and will accept higher packet loss to achieve that. Cake keeps the inter-flow induced latency down to very near its theoretical minimum, given certain practical constraints such as timer and CPU-scheduling latency. That’s mostly why you don’t see a latency difference in RRUL. The other major factor is that RRUL loads all tins with bulk data, which means the Voice tin in particular is running in deprioritised mode. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] diffserv3 tin 2 target 50% of interval?
> ... ‘interval' should generally remain at 100ms and that ‘target' should be > computed at around 5-10% of interval, and preferably closer to 5%. > Is there a justification for setting the interval outside the guidelines > suggested by CoDel’s authors? The interval is reduced on Tin 2 because it is intended for latency-sensitive traffic, which merits a more aggressive AQM response than for best-effort traffic, which tends to be more throughput-sensitive. This has the happy side-effect of giving an additional incentive to not use latency-sensitive DSCPs for bulk traffic. It looks like you still aren’t using the latest version of tc, as that identifies the three tins as “Bulk”, “Best Effort”, and “Voice”, rather than numerically. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Cake latency update
> On 9 Feb, 2017, at 18:36, Pete Heist <petehe...@gmail.com> wrote:
>
> I’m seeing good latency results for Cake at lower MCS levels (graphs below),
> in case that wasn’t already known.

Yes - despite its complexity, Cake has always performed well on latency in comparison to other qdiscs.

I gather this time you’re comparing it against the mac80211 fq_codel, rather than a conventional qdisc stack?

- Jonathan Morton
Re: [Cake] Cake latency update
> On 10 Feb, 2017, at 11:21, Pete Heist <petehe...@gmail.com> wrote:
>
> Here are the results at various bitrates (all half-duplex rate limiting on
> this CPU).

Hold on a minute. What does “half-duplex rate limiting” mean exactly?

- Jonathan Morton
Re: [Cake] Cake latency update
> On 10 Feb, 2017, at 12:05, Pete Heist <petehe...@gmail.com> wrote: > > It means that both the ingress and egress have been redirected over the same > IFB device and QoS'd together. Okay, I guessed as much but wanted to be sure. I can’t think of any theoretical reason for these results. Cake’s flow isolation should be robust enough to cope transparently with bidirectional traffic in half-duplex mode. As you say, a C2D should easily be able to keep up, and at these modest rates I can even discount PCI bandwidth as a concern. So I might need to try to reproduce it here. Does the problem go away if you use a wired link with the same setup otherwise? Or is that inconvenient to try? I have some ath9k equipped machines, but they would need to be set up. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [Make-wifi-fast] Flent results for point-to-point Wi-Fi on LEDE/OM2P-HS available
> On 16 Feb, 2017, at 18:51, Pete Heist <petehe...@gmail.com> wrote: > > At first I was thinking to just remove diffserv markings entirely, say with > Cake’s besteffort flag, but I think that “good” and “otherwise unknowing” > users would suffer, which I think in FreeNet is a vast majority of users. That’s not what the “besteffort” flag does. It ignores DSCPs and puts all traffic into a single tin, but doesn’t remove the DSCP marking. >> In a sense if there are thresholds for permissible VO/VI traffic fractions >> below which the AP will not escalate its own priority this will come close >> to throttling the high priority senders, no? > > I thought Aaron’s suggestion sounds both sensible and not difficult to > implement. That way we wouldn’t even have to regularly monitor it, and anyone > who is marking all their packets thinking they’re doing themselves a favor is > just limiting their max throughput. > > Could there be another keyword in Cake to do this automatically, say > “fairdiffserv", or would this just be feature bloat for what is already a > sophisticated shaper? I don’t know if there are sensible mappings from dscp > value to max percentage throughput that would work most of the time, or if > there could also be an adjustable curve parameter that controls the > percentage backoff as you go up dscp levels. This is actually what Cake already does by default (the “diffserv3” mode). If you look at the detailed statistics (tc -s qdisc), you’ll see that each tin has a “threshold” bandwidth. If there’s more traffic than that threshold in that tin, the tin will be deprioritised - it can still use all of the bandwidth left spare by other tins’ traffic, but no more than that. Additionally, diffserv3 mode uses more aggressive AQM settings on the “voice” tin than the “best effort” tin, on the grounds that the former is a request for minimum latency. This should also discourage bulk traffic from using unnecessarily high DSCPs. 
However, in both the “besteffort” and “diffserv3” cases, the DSCP may be interpreted independently by the NIC as well as Cake. In the case of wifi, this affects the medium grant latency and priority. If the link isn’t saturated, this shouldn’t affect Cake’s prioritisation strategy much if at all, but it does have implications for the effect of other stations sharing the frequency. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake at longer rtts
> On 18 Jan, 2017, at 05:35, Dave Taht <dave.t...@gmail.com> wrote: > > It has been quite some time since I looked at cake at longer RTTs. > > Here's a single flow test comparing it to sonic fiber default fifo > (50ms delay in the ONT), to fq_codel, to cake at 110mbit. > > http://www.taht.net/~d/sonic_cake_vs_fq_codel_vs_fifo_70ms.png Seems to be more aggressive than fq_codel on the slow-start response, but very similar thereafter. I think that’s consistent with what I believe I’ve implemented, and with some tests I’ve been running over here. In my experience, Westwood+ performs better than CUBIC with Codel, because it estimates the true BDP better and hops directly to it when the signal arrives. Probably BBR has similar properties, but with different semantics. I sometimes use longer assumed-RTTs, ie. “oceanic” or “satellite”, in order to get a little more throughput. However, this significantly reduces the latency control in ingress mode, because every slow-start is then uncontrolled for a full second. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] 5ms target hurting tcp throughput tweakable?
> On 26 Feb, 2017, at 15:16, Andy Furniss <adf.li...@gmail.com> wrote: > > Is there any way or plans to allow users to relax slightly the target? You can do that by selecting a higher assumed RTT, for instance with the “oceanic” or “satellite” keywords. This also increases the interval, which makes the AQM less aggressive in general. - Jonathan Morton
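To make the relationship between assumed RTT, interval and target concrete, here is a minimal sketch. It applies the CoDel guideline quoted elsewhere in this archive (target at roughly 5% of interval) to a few assumed-RTT keywords. The keyword-to-RTT table and the function name `codel_params` are assumptions made for illustration; Cake's real computation also takes the shaped rate into account and is not reproduced here.

```python
# Hypothetical table: assumed RTT (used as the CoDel interval), in milliseconds.
# The values are illustrative assumptions, not read from Cake's source.
ASSUMED_RTT_MS = {"internet": 100, "oceanic": 300, "satellite": 1000}


def codel_params(keyword, target_divisor=20):
    """Return (interval_ms, target_ms) for an assumed-RTT keyword.

    target = interval / 20, i.e. 5% of the interval, using integer maths.
    """
    interval_ms = ASSUMED_RTT_MS[keyword]
    target_ms = interval_ms // target_divisor
    return interval_ms, target_ms


for kw in ASSUMED_RTT_MS:
    interval_ms, target_ms = codel_params(kw)
    print(f"{kw:10s} interval={interval_ms} ms target={target_ms} ms")
```

Under these assumptions a larger assumed RTT relaxes both numbers together, which is why selecting it trades latency control for throughput.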
Re: [Cake] Putting cake under dsmark on ingress ifb
> On 26 Feb, 2017, at 15:41, Andy Furniss <adf.li...@gmail.com> wrote: > > This works in the sense that I can now get tcpdump on a PC on the > lan to show incoming icmp with tos 0xb8. > > So I try to add cake under dsmark in the hope that it sees icmp as ef > but it doesn't, icmp goes to best effort. I suspect packets are queuing in cake first and only then going through dsmark. What happens if you reverse the nesting? - Jonathan Morton
Re: [Cake] Putting cake under dsmark on ingress ifb
> On 28 Feb, 2017, at 16:11, Andy Furniss <adf.li...@gmail.com> wrote: > > tc filter add dev ppp0 parent : prio 1 protocol ip u32 match ip protocol > 1 0xff action xt -j DSCP --set-dscp-class ef action mirred egress redirect > dev ifb0 Okay, that’s potentially useful information - that a qdisc filter action can set DSCPs (albeit in this case by circuitous means). An action module which does that more directly would undoubtedly be easier to use and more efficient, but one doesn’t seem to exist yet. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [RFC PATCH] Don't give malicious CE senders a free ride
> On 6 Oct, 2016, at 07:28, Dave Taht <dave.t...@gmail.com> wrote: > > After doing myself in with BBR not respecting CE marks, I'm now thinking > that perhaps the right thing for red/fq_codel/cake/pie/etc to do is actually > *drop* stuff that it sees that is already CE marked, when it too wants > to mark, instead of giving it a free ride. > > It's an unfriendly world out there. I’m consciously relying on FQ behaviour and the BLUE component of COBALT to take care of flows unresponsive to CE marks. They harm their own latency and don’t get (much) improvement in throughput, FQ keeps them from impacting other flows, and BLUE starts dropping if their window growth is unbounded. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] BUG_ON vs WARN_ON
> On 6 Oct, 2016, at 07:19, Dave Taht <dave.t...@gmail.com> wrote: > > master Right - that’s stable code. I’m doing experimental stuff in the cobalt branch. - Jonathan Morton
Re: [Cake] BUG_ON vs WARN_ON
> On 5 Oct, 2016, at 18:45, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > I wonder what it was that caused yesterday's issues? I really must try again > when I've more time to get proper access. I’m having trouble reproducing it here. I know one of my boxes froze the very first time I loaded it, but it’s been running fine ever since. Another machine is currently refusing to insert the module, claiming a wrong exec format. It’s all a bit bizarre. I do have a few more avenues of enquiry to explore, though. - Jonathan Morton
Re: [Cake] BUG_ON vs WARN_ON
> On 5 Oct, 2016, at 18:24, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > How amenable are you to changing all 4 BUG_ON instances in cake to WARN_ON? > > Linus isn't a complete fan and I'm thinking that producing a stack trace and > trying to carry on is more helpful to a remote accessed, no serial interface > type device than just killing the kernel dead. > > Quite possibly other bad things(tm) will happen shortly after...or maybe > there will be enough time look at dmesg for that stack trace. The two in cake_heap_swap() can probably go away completely - they were there to make sure the heap algorithms were working correctly. They never did trigger, as it happens. The other two are genuine serious bugs (array overflow) if they ever trigger. It’s safer to leave them as-is. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Master branch updated
> On 4 Oct, 2016, at 11:46, moeller0 <moell...@gmx.de> wrote: > > About that PTM accounting, could you explain why you want to perform the > adjustment as a “virtual” size increase per packet instead of a “virtual” > rate reduction? The shaper works by calculating the time occupied by each packet on the wire, and advancing a virtual clock in step with a continuous stream of packets. The time occupation, in turn, is calculated as the number of bytes which appear on the wire divided by the number of bytes that wire can pass per second. As an optimisation, the division is turned into a multiplication by the reciprocal. I’m quite keen to keep the “bytes per second” purely derived from the raw bitrate of the link, because that is the value widely advertised by ISPs and network equipment manufacturers everywhere. Hence, overhead compensation is implemented purely by increasing the accounted size of the packets. I have been careful to calculate ceil(len * 65/64) here, so that the overhead is never underestimated. For example, a 1500-byte IP packet becomes 1519 with bridged PTM or 1527 with PPPoE over PTM, before the PTM calculation itself. These both round up to 1536 before division, so 24 more bytes will be added in both cases. This is less than 2 bits more than actually required (on average), so wastes less than 1/6200 of the bandwidth when full-sized packets dominate the link (as is the usual case). Users are unlikely to notice this in practice. Next to all the other stuff Cake does for each packet, the overhead compensation is extremely quick. And, although the code looks very similar, the PTM compensation is faster than the ATM compensation, because the factor involved is a power of two (which GCC is very good at optimising into shifts and masks). This is fortunate, since PTM is typically used on higher-bandwidth links than ATM. 
Now, if you can show me that the above is in fact incorrect - that significant bandwidth is wasted on some real traffic profile, or that cake_overhead() figures highly in a CPU profile on real hardware - then I will reconsider. - Jonathan Morton
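The overhead arithmetic in this message can be checked with a short sketch. It is not Cake's code: the names `ptm_adjusted_len`, `wire_time_ns` and `VirtualClock` are hypothetical, and the clock is only a simplified reading of "advancing a virtual clock in step with a continuous stream of packets". Since the length is an integer, ceil(len * 65/64) equals len + ceil(len/64), which needs only an add and a shift.

```python
def ptm_adjusted_len(length):
    """Accounted size on a PTM link: ceil(length * 65 / 64), never underestimated."""
    return length + ((length + 63) >> 6)


def wire_time_ns(accounted_len, rate_bytes_per_sec):
    """Nanoseconds the packet occupies the wire, rounded up."""
    return (accounted_len * 1_000_000_000 + rate_bytes_per_sec - 1) // rate_bytes_per_sec


class VirtualClock:
    """Simplified shaper clock: each packet pushes the next send time forward."""

    def __init__(self, rate_bytes_per_sec):
        self.rate = rate_bytes_per_sec  # derived from the raw link bitrate only
        self.next_packet_ns = 0

    def send(self, now_ns, length):
        """Return when this packet may start, then advance the clock past it."""
        start = max(now_ns, self.next_packet_ns)
        self.next_packet_ns = start + wire_time_ns(ptm_adjusted_len(length), self.rate)
        return start


# The two worked examples from the message: both gain 24 bytes.
print(ptm_adjusted_len(1519), ptm_adjusted_len(1527))
```

Keeping the rate untouched and inflating only the accounted length, as the message describes, means the configured bandwidth stays equal to the advertised bitrate.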
Re: [Cake] BUG_ON vs WARN_ON
> On 7 Oct, 2016, at 16:27, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > It's now ok...so far :-) Okay. I think I’ve found a couple of other things to improve, so stand by… - Jonathan Morton
Re: [Cake] BUG_ON vs WARN_ON
>> I wonder what it was that caused yesterday's issues? I really must try >> again when I've more time to get proper access. > > I’m having trouble reproducing it here. I know one of my boxes froze the > very first time I loaded it, but it’s been running fine ever since. Another > machine is currently refusing to insert the module, claiming a wrong exec > format. It’s all a bit bizarre. > > I do have a few more avenues of enquiry to explore, though. Aha - I managed to capture a kernel panic, which appears to trace to the lookup in the accelerator array. It’s a read-only access, so it only panics if it hits unpaged memory, rather than corrupting anything. Of course, if it reads outside the array, it’ll increment the deficit by a random value, but that usually won’t prevent traffic flowing. The lookup is indexed on the host refcnt, which I’m using as the count of flows attached to that host. It seems likely that it isn’t being maintained correctly in all cases, so it can wrap around past zero very soon after being attached, without needing much traffic. I’ll try to fix that, and put a sanity check in as well to be certain. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Master branch updated
> On 4 Oct, 2016, at 18:22, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > Ha ha! I don't know if you're back from shopping yet...and I'm not sure that > I've broken it (cobalt branch)...but it has broken my router! Hmm. It’s been running all day with plenty of traffic over here - but it did crash the very first time I loaded it, just not the second. I will need to exercise it some more, preferably on a non-critical machine. - Jonathan Morton
Re: [Cake] cake for net-next 4.8
> On 25 Sep, 2016, at 21:30, Dave Taht <dave.t...@gmail.com> wrote: > > I quickly got sch_cake to work on top of net next. The attached diff > is probably not correct in some respect or another (what's to_free > for? And it looks like statistics collection has been parallelized > elsewhere) Yet another mail I had to fish out of my spam folder. Google really doesn’t seem to like you - I’m going to try looking for a whitelist option in the webmail interface (which I usually avoid). I’m pretty sure to_free is for attaching a list of skbs for deferred bulk deletion. Might help under sustained overload, but not in the common case. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] de-natting & host fairness
> On 28 Sep, 2016, at 06:33, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > those ternaries are if/else in disguise... Many CPUs can handle those as conditional moves without branching - including ARM in particular; near-universal conditional execution was one of its original headline features. Most x86 CPUs (except very old ones) and some of the embedded-class PowerPCs (which are often found in “big” network appliances) also qualify. Unswitching those would potentially be a retrograde step on those CPUs. However the presence of a conditional function call suggests that unswitching would not in fact be harmful, except for some duplication of source code - since the branch has to be made anyway. I think many compilers would be able to perform the loads before the branch and the stores after it, which would execute very slickly, while some CPUs do not execute large numbers of conditional moves very efficiently. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] de-natting & host fairness
> On 26 Sep, 2016, at 06:20, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > Another github user 'tegularius' presented some beautifully crafted code that > did the lookups in a much neater way. Originally it too had an 'ingress' > lookup problem. This was worked on and I hacked some conditional 'denat' > options into cake & tc. > > For your 'delight' a denat cake > https://github.com/kdarbyshirebryant/sch_cake/tree/natoptions along with a > matching tc https://github.com/kdarbyshirebryant/tc-adv/tree/denat As I’m now at the stage of trying to merge this, I’m going to make some executive design decisions: - De-NAT IPv4 packets only. I think it’s safe to assume that IPv6 NAT will be rare, and in any case will typically preserve host distinctions. This eliminates switch blocks in favour of simple if blocks. - Don’t bother with the distinction between src-NAT and dst-NAT lookups. The full lookup has to be done anyway and then masked off, the use-case for the limited functionality is nebulous, and all we’re doing is adding a lot of nasty conditional branches to the fast path. This in turn reduces the configuration interface for the feature to a flag, which I’ll call “nat”. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake for net-next 4.8
> On 27 Sep, 2016, at 19:04, Dave Taht <dave.t...@gmail.com> wrote: > > Annoying. Perhaps my link to the blog in my .sig? Perhaps they object > to my verbosity? This seems relevant in the headers: Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com; spf=pass (google.com: best guess record for domain of cake-boun...@lists.bufferbloat.net designates 45.79.142.77 as permitted sender) smtp.mailfrom=cake-boun...@lists.bufferbloat.net; dmarc=fail (p=NONE dis=NONE) header.from=gmail.com I forget exactly how DKIM and DMARC work, but DKIM seems to be stymied by the list adding its own message footer (which is nevertheless best-practice). I don’t know how relevant a DMARC fail is to their filter, or how relevant it *should* be. Could the listserver replace an original, verified DKIM certificate with its own after adding the footer? On the upside, I was able to add a filter specifically saying “never send to Spam folder”, and it appears to be working so far. But everyone probably needs to do that; it’s not a scalable solution, only a workaround. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake for net-next 4.8
> On 25 Sep, 2016, at 21:30, Dave Taht <dave.t...@gmail.com> wrote: > > Judging from me tearing apart how TCP BBR works (presently) with ecn, > it looks like we need to add the equivalent to fq_codel ce_threshold > behaviors as well. If I’m reading the legend correctly, you are setting ce_threshold to 1ms to get the better-controlled result. But that effectively disables the codel algorithm and turns it into a simple “mark all packets over 1ms sojourn” for ECN capable traffic, because it’s a tighter limit than codel’s target. That’s too aggressive for non-BBR traffic. In these cases, I think you have to relax and let the FQ action take care of it. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake for net-next 4.8
> On 27 Sep, 2016, at 22:29, Dave Taht wrote: > > OK, at some point, but I have to abandon the lab, it looks like: Yet another reason to be glad I live in a cold and slightly damp country. > https://groups.google.com/forum/#!topic/bbr-dev/VNUBKAeJSdc If BBR is not currently responding to CE marks, then ce_threshold should have no effect. This leaves me confused as to what the graph actually shows. But COBALT does have a better chance of controlling the spike after a short gap, because its resuming behaviour is more aggressive than standard Codel. It will just have to do so with drops rather than marks. - Jonathan Morton
Re: [Cake] de-natting & host fairness
> On 26 Sep, 2016, at 16:28, moeller0 <moell...@gmx.de> wrote: > > Does that mean an initial packet(s) for a flow will be “misclassified” (not > really since there should be no record yet to snatch the translated IP from) > do all those initially non-classified packets end up in the same bin? The initial packet will normally be outgoing, so it’ll go through conntrack before reaching the qdisc. If it’s incoming, then it’ll be “related to” an existing connection or else won’t be natted - though I’m not sure whether “related” connections pre-emptively get conntrack entries before traffic has been seen. If not, that initial packet will be associated with the NAT box by the qdisc, rather than the internal host, while subsequent packets will correctly be associated with the internal host. That assumes we have qdiscs attached to the egress and ingress sides of a WAN-facing interface, as normally desired. The code looks sane at first glance, so I’ll give it a try at my end. With any luck, I’ll be able to improve triple-isolate’s performance enough to make that the default, too. I should probably use a different data structure than a ring buffer, so that there is less in the way of linear searching for an unblocked flow. The current default is “flows”, which doesn’t need NAT information to unambiguously distinguish flows from each other. However, “hosts” mode does need it when running in a NAT environment, otherwise internal hosts will erroneously be lumped together with the NAT box. Triple-isolate is effectively a combination of “hosts” and “flows” - that is probably the easiest way to understand it. I think it is reasonable to turn on conntrack lookups by default whenever host information is relevant. This is potentially true for all modes except “flowblind” and “flows”. Also long overdue are the more subtle overhead compensation factor for PTM, and the two extra keywords for DOCSIS’ asymmetric overhead. 
- Jonathan Morton
[Cake] Master branch updated
I’ve just merged the NAT, PTM and Linux-4.8 compatibility stuff into the master branch of Cake. It’s stable code and a definite improvement. This frees up the Cobalt branch for more experimentation, such as the rewrite of triple-isolate that I also just pushed. I found a way to make it more DRR-like, by simply scaling down the quantum used for each host by the number of flows attached to that host. I still need to test whether it works as well as the old version, but it should at least be less CPU intensive. In particular it should no longer require bursts of CPU activity when the host deficits expire, and host deficit expiry should no longer be explicitly synchronised. See if, between you, you can break it before I get back from shopping. :-) - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Cake Diffserv8 Incorrect?
That looks normal to me. If you really want a 1:1 mapping from CS to tin, you should use "precedence". But that's an obsolete interpretation of the TOS field. - Jonathan Morton
Re: [Cake] diffserv based on firewall mark
> On 12 Oct, 2016, at 08:52, ching lu <lschin...@gmail.com> wrote: > > I deprioritize bittorrent traffic by marking related connections in > iptables (e.g. detect by port number) and route them to corresponding > HTB class and qdisc. > > How can I achieve the same goal using the cake qdisc? Modify your iptables rules to set the DSCP rather than a kernel-internal mark. You probably want “-j DSCP --set-dscp-class CS1”, as CS1 is the “bulk low priority” code. Cake’s default Diffserv mode will pick that up appropriately. You also need to make sure Cake sees your packets *after* they’ve been through the firewall, which generally means attaching it to the egress port in each direction, not the ingress port. You’ve probably already done this, if you’re happy with your HTB setup. If you have multiple LAN interfaces (eg, both Ethernet and wifi), you should loop the inbound traffic through a common IFB device (and attach Cake to that instead of the physical interfaces) to simplify configuration. - Jonathan Morton
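When checking with a packet capture that such a rule has taken effect, it helps to know what TOS byte a given DSCP class produces. The sketch below packs the 6-bit DSCP and 2-bit ECN fields into the IPv4 TOS byte; the codepoint values are the standard ones (CS1 = 8, EF = 46), while the table and the function name `tos_byte` are only illustrative.

```python
# Standard DSCP codepoints for a few classes (6-bit values).
DSCP_CLASS = {"CS0": 0, "CS1": 8, "EF": 46}


def tos_byte(dscp, ecn=0):
    """Pack a 6-bit DSCP and a 2-bit ECN field into the IPv4 TOS byte."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn < 4:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


for name, dscp in DSCP_CLASS.items():
    print(f"{name}: dscp={dscp} tos=0x{tos_byte(dscp):02x}")
```

A packet marked CS1 should therefore show a TOS of 0x20 in a capture, and one marked EF shows 0xb8, matching the value reported in the dsmark thread in this archive.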
Re: [Cake] Master branch updated
> On 4 Oct, 2016, at 19:28, Jonathan Morton <chromati...@gmail.com> wrote: > >> Ha ha! I don't know if you're back from shopping yet...and I'm not sure >> that I've broken it (cobalt branch)...but it has broken my router! > > Hmm. It’s been running all day with plenty of traffic over here - but it did > crash the very first time I loaded it, just not the second. I will need to > exercise it some more, preferably on a non-critical machine. Okay, that bug is fixed and I’ve made further improvements to the triple-isolate algorithm. It no longer needs quite as much spaghetti logic in the fast path, and might even be easier to understand from reading the code, since it’s now more obviously a modification of DRR++ rather than a brute-force wrapper around it. It should certainly give smoother behaviour and be less CPU intensive in common cases. In brief, what I now do is to scale the *flow* quantum down by the higher of the two hosts’ flow counts. I’ve even dealt with underflow of the quotient using a dithering mechanism, which should also ensure that flows random-walk out of lockstep with each other. It works sufficiently well that I was able to set Cake to 2.5Mbit besteffort triple-isolate, then watch a 720p YouTube video on one machine while another was downloading a game update using a 30-flow swarm. I’d call that a success. Hammer away at it, and then we’ll see if we can merge it up to master. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
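The quantum scaling described in this message can be sketched as follows. This is a guess at the shape of the idea rather than Cake's implementation: the message does not say how the dithering works, so the sketch assumes simple stochastic rounding, and the name `scaled_quantum` and its parameters are hypothetical.

```python
import random


def scaled_quantum(quantum, src_host_flows, dst_host_flows, rng=random):
    """Scale a flow's DRR quantum down by the busier host's flow count.

    When the division leaves a remainder (or underflows to zero), one extra
    byte is granted with probability remainder/divisor, so the long-run
    average still equals quantum/divisor. That dithering rule is an assumption.
    """
    divisor = max(src_host_flows, dst_host_flows, 1)
    base, remainder = divmod(quantum, divisor)
    return base + (1 if rng.randrange(divisor) < remainder else 0)


rng = random.Random(1)
# A host running a 30-flow swarm: each flow gets about 1/30 of the quantum.
print([scaled_quantum(1514, 30, 2, rng) for _ in range(5)])
```

Because every flow of a busy host receives a proportionally smaller quantum, the host as a whole gets roughly one flow's share per round, and the random extra byte also helps flows drift out of lockstep with each other.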
Re: [Cake] upstreaming cake in 2017?
> On 23 Dec, 2016, at 03:43, Stephen Hemminger <step...@networkplumber.org> > wrote: > > It would also help to have a description of which use-case cake is trying to > solve: > - how much configuration (lots HTB) or zero (fq_codel) One of Cake’s central goals is that configuration should be straightforward for non-experts. Some flexibility is sacrificed as a result, but many common use-cases are covered with very concise configuration. That is why there are so many keywords. > - AP, CPE, backbone router, host system? The principal use-case is for either end of last-mile links, ie. CPE and head-end equipment - though actual deployment in the latter is much less likely than in the former, it remains a goal worth aspiring to. This is very often a bottleneck link for consumers and businesses alike. Cake could also be used in strategic locations in internal (corporate or ISP) networks, eg. building-to-building or site-to-site links. For APs, the make-wifi-fast stuff is a better choice, because it adapts natively to the wifi environment. Cake could gainfully be used on the wired LAN side of an AP, if inbound wifi traffic can saturate the wired link. Deployment on core backbone networks is not a goal. For that, you need hardware-accelerated simple AQM, if anything, simply to keep up. > Also what assumptions about the network are being made? As far as Diffserv is concerned, I explicitly assume that the standard RFC-defined DSCPs and PHBs are in use, which obviates any concerns about Diffserv policy boundaries. No other assumption makes sense, other than that Diffserv should be ignored entirely (which is also RFC-compliant), or that legacy Precedence codes are in use (which is deprecated but remains plausible) - and both of these additional cases are also supported. Cake does *not* assume that DSCPs are trustworthy. It respects them as given, but employs straightforward countermeasures against misuse (eg. 
higher “priority” applies only up to some fraction of capacity), and incentives for correct use (eg. latency-sensitive tins get more aggressive AQM). This improves deployability, and thus solves one half of the classic chicken-and-egg deployment problem. So, if Cake gets deployed widely, an incentive for applications to correctly mark their traffic will emerge. Incidentally, the biggest arguments against Precedence are: that it has no class of *lower* priority than the default (which is useful for swarm traffic), and that it was intended for use with strict priority, which only makes sense in a trusted network (which the Internet isn’t). If you have complex or unusual Diffserv needs, you can still use Cake as leaf qdiscs to a classifier, ignoring its internal Diffserv support. Cake's shaper assumes that the link has consistent throughput. This assumption tends to break down on wireless links; you have to set the shaped bandwidth conservatively and still accept some occasional reversion to device buffering. BQL helps a lot, but implementing it for certain types of device is very hard. Conversely, Cake’s shaper carefully tries *not* to rely on downstream devices having large buffers of their own, unlike token-bucket shapers. Indeed, avoiding this assumption improves latency performance at a given throughput and vice versa. Cake also assumes in general that the number of flows on the link at any given instant is not too large - a few hundred is acceptable. Behaviour should degrade fairly gracefully once flow-hash collisions can no longer be avoided, and will self-recover to peak performance after anomalous load spikes. This assumption is however likely to break down on backbones and major backhaul networks. Cake does support treating entire IP addresses as single flows, which may extend its applicability. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] an experiment with an alternate hasher
> On 26 Mar, 2017, at 19:00, Dave Taht <dave.t...@gmail.com> wrote: > > popcount is, regrettably, an sse4.2-only instruction A read through the ARM ISA Quick Reference Card: http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf …shows that there is no equivalent instruction on ARM CPUs at least up to ARMv7, which I think covers all current-generation consumer-grade routers. However, the operation can be constructed using log2(N) operations on any modern CPU as a sequence of masks, shifts and adds. GCC has a “builtin” intrinsic function to use a popcnt instruction where present, and this algorithm otherwise. Obviously this will only be of any use if the resulting hash is of good quality. An obvious problem with popcnt is that inputs of 1, 2, 4, 8, etc have the same popcnt (1), and it is trivial for an attacker to exploit this property. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
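The "masks, shifts and adds" construction mentioned in this message is the well-known parallel bit count. A sketch for 32-bit values follows (five steps, since log2(32) = 5); the function name `popcount32` is just a label for this illustration. It also shows the weakness noted above: every power of two has the same population count.

```python
def popcount32(x):
    """Count set bits in a 32-bit value using log2(32) = 5 mask/shift/add steps."""
    x &= 0xFFFFFFFF
    x = (x & 0x55555555) + ((x >> 1) & 0x55555555)   # sums of adjacent bits
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # sums of 2-bit fields
    x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F)   # sums of 4-bit fields
    x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF)   # sums of bytes
    x = (x & 0x0000FFFF) + (x >> 16)                 # sum of the two halves
    return x


# Inputs 1, 2, 4, 8 all collide, which is trivial for an attacker to exploit.
print([popcount32(v) for v in (1, 2, 4, 8)])
```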
Re: [Cake] Choosing a tin to work on
> On 12 Apr, 2017, at 15:48, xnor <xno...@gmail.com> wrote: > > Hey, > >>>s64 tdiff = b->tin_time_next_packet - now; >>>if(tdiff <= 0 || tdiff <= best_time) { >>>best_time = tdiff; >>>best_tin = tin; >>>} >> >> I'll try to answer this based on a vague bit of understanding...and then >> Jonathan can shoot me down as well :-) >> >> So the 'best_time' is signed because we can have a packet in a tin that is >> not yet due to be sent (a positive result...ie. we have got here early) >> and/or we can have a tin that is due now/overdue (a zero/negative result, >> we've got here late) >> >> The complication is that we can have multiple tins overdue, so we want the >> highest priority *and* least overdue (least late) tin - this is the reason >> for searching in tin_order[oi] and as a result tdiff can be <=0 and bigger >> than best_time. >> >> best_time is initialised to the 'most early' time possible. > that makes no sense to me. > > best_time is initialized to some high value (though why is it not 0x7FFF > L, the highest possible positive s64?) such that no matter how far > tdiff is in the future, best_time will always be set to the lower tdiff. > (Just like you would initialize a variable keeping track of the max to the > lowest possible value, you set a min variable to the highest possible value.) > > But if you wanted best_time to end up as the lowest value, then you would > have to only set it if tdiff < best_time (or <= if you actually want to > prefer the last tin if they happen to have the same tdiff), and not also if > tdiff <= 0. > > For example, if tdiff values were 5000, -5000, 0 then best_time would be set > to 0. The last value less than or equal to zero will always win as best_tin. > If all tdiff values are positive then best_time will end up as the lowest > value however. For some reason I never seem to have got the initial question post. The two of you are broadly right though. 
The intention here is: - Find the highest-priority tin with a packet already due; - If none are due yet, find the one with a packet due soonest. This should have the same type of “soft priority” behaviour as the previous WRR version, but I was hoping it could reduce the latency for processing sparse low-latency tins while under steady bulk load. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
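The two-step intention stated above can be sketched like this. It is a simplified model, not the sch_cake code: the list-of-tuples representation, the backlog flag and the name `choose_tin` are assumptions for illustration, and the list is assumed to be ordered with the highest-priority tin first.

```python
def choose_tin(tins, now):
    """Pick a tin index from `tins`, a priority-ordered list of
    (time_next_packet, has_backlog) pairs, highest priority first.

    Rule 1: the first (highest-priority) tin whose packet is already due wins.
    Rule 2: if nothing is due yet, the tin whose packet is due soonest wins.
    Returns None if no tin has anything queued.
    """
    best_tin = None
    best_time = None
    for index, (time_next_packet, has_backlog) in enumerate(tins):
        if not has_backlog:
            continue
        tdiff = time_next_packet - now
        if tdiff <= 0:
            # Already due: stop at the first hit, since we scan in priority order.
            return index
        if best_time is None or tdiff < best_time:
            best_time = tdiff
            best_tin = index
    return best_tin
```

With the tdiff values from the example in the thread (5000, -5000, 0), this version returns the second tin, the highest-priority one that is already due, rather than the last tin with a non-positive tdiff.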
Re: [Cake] A quick question about FQ_CoDel vs Cake
> On 6 Apr, 2017, at 16:51, Luis E. Garcia <l...@bitamins.net> wrote: > > I've been doing some testing of Cake on LEDE (WD MyNet 750) and on EdgeOS > (Ubiquity ERPoe). One big question that I have is why does Cake have a > higher/better average throughput than FQ_CoDel? The graph seems a bit > smoother through the speed test. > > The test are against a 10down/2up Mbps connection from a local provider. The main difference that’s probably responsible for this is Cake’s integrated deficit-mode shaper, which is more accurate on short timescales than the more typical token-bucket shaper that fq_codel is used with. There’s also some difference in the Codel implementation which might or might not be relevant, specifically in the calculation of “count” after a relatively brief exit from dropping state. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] flow isolation for ISPs
> On 6 Apr, 2017, at 11:27, Pete Heist <petehe...@gmail.com> wrote: > > There is a table of member ID to a list of MAC addresses for the member, so > if there could somehow be fairness based on that table and by MAC address, > that could solve it, but I don’t see how it could be implemented. One option would be to use HTB with FLOWER filters to sort out the subscribers into classes, and use Cake or fq_codel as a child qdisc per class. Remember that Cake can be used in “unlimited” mode to rely on an external shaping source. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Questions on Cake/cobalts experiment ingress mode
> On 6 Apr, 2017, at 12:17, Kevin Darbyshire-Bryant > <ke...@darbyshire-bryant.me.uk> wrote: > > Can I now apply two qdiscs to eth0 directly??? An ingress mode cake and an > egress mode cake? Or rather do I just need to tell my existing IFB cake > instance to run in 'ingress' mode. The latter. The fundamental limitation has not been removed (and is not in Cake itself); this is just a modification to how dropped packets are accounted for when the bottleneck is upstream of Cake. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Getting Cake to work better with Steam and similar applications
> On 20 Apr, 2017, at 18:23, Dendari Marini <dendar...@gmail.com> wrote: > > > Could you post the output of calling “tc -s qdisc” here on the list please? > > That should allow to figure out what you actually told cake to do ;0 > qdisc cake 8001: dev eth0 root refcnt 2 bandwidth 900Kbit diffserv3 > dual-srchost nat rtt 100.0ms raw > qdisc cake 8002: dev ifb4eth0 root refcnt 2 bandwidth 16Mbit diffserv3 > dual-dsthost nat ingress rtt 100.0ms raw Looks like most of your options are okay, including the correct “dual” modes and “ingress” mode in the right place. However, I think you need to adjust your bandwidth and overhead settings, otherwise Cake isn’t reliably in control of the bottleneck queues. Try these to begin with: … bandwidth 850Kbit conservative dual-srchost nat … bandwidth 15Mbit conservative dual-dsthost nat ingress That should give you correct operation, and you can fine-tune from there. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Getting Cake to work better with Steam and similar applications
> On 20 Apr, 2017, at 19:05, Dendari Marini <dendar...@gmail.com> wrote: > > Just did quick test with your settings. First thing I noticed is my final > download bandwidth is about 12Mbps, Steam on PC1 downloads at 1.4-1.5MB/s > while downloading a file on PC2 seems to max out at ~250KB/s. From my > understanding I should see each PC download at ~700KB/s, or am I mistaken? It should be possible to get equal bandwidth to each PC, yes. It could be that with such a strongly asymmetric connection, Cake still isn’t actually the bottleneck in the download direction. Try setting it to 10Mbit and see if that brings it under better control. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Getting Cake to work better with Steam and similar applications
>> So please add “atm overhead 32” to cake on eth0 or “atm overhead 40” to cake >> instances on pppoe (these packets do not have the PPPoE header added yet and >> hence appear 8 bytes too small). > > Thanks for your help, will definitely use them. Just wondering if I use > "pppoe-vcmux/bridged-llcsnap" on eth0 or "pppoe-llcsnap" on pppoe0 would have > the same effect? Or are there some other "under-the-hood" changes when using > them? On the pppoe interface, use pppoe-vcmux if your modem is set to use VC-MUX, or pppoe-llcsnap if it’s set to use LLC-SNAP (they might be described using slightly different terms, but should still be recognisable as one or the other). This probably depends on your ISP, and may further vary regionally within the same ISP. I really prefer to use the self-explanatory keywords (which is why I added them in the first place) instead of opaque magic numbers. This is a point on which Sebastian has long disagreed with me. >> Question: if you set the shapers to 50% of line rate (8.75/0.5?) do you >> still see that unfairness? And if you add “atm overhead 40” to cake on >> pppoe0 and set the shaper to 90% of line rates (15.75/0.9) how does the >> Steam affect per-host fairness? Also how transient are these connections >> Steam uses? > > Actually did more testing about this and it seems that as long as I have set the > bandwidth to ~15Mbps (so ~15% less than my max speed) and use the "nat" > parameter, the per-host fairness works even without the "dual-host" and > "overhead" parameters. I definitely find this very interesting, is this > behaviour caused by the way Steam downloads games? By default, Cake uses triple-isolate mode, which uses information about both source and destination hosts to perform per-host isolation; this usually works well regardless of which side of the connection has the LAN hosts. The “dual” modes let you specify that fact explicitly, making it a little more robust and predictable. 
Without overhead compensation, Cake will actually use more of the physical link than it thinks it does - by default it only accounts for raw IP or Ethernet packets, depending on the type of interface it’s attached to. With full-size packets as in a bulk download, the difference is relatively small, so the 15% margin is just about sufficient to make things work. But with small packets mixed in, the difference grows, such that Cake might no longer control the bottleneck with some traffic mixes. The “conservative” keyword I recommended earlier (which is exactly equivalent to Sebastian’s recommendation of “atm overhead 48”) reverses that situation; Cake will then always end up using *less* of the physical link than it accounts for, which is safe for troubleshooting with. The keyword is there specifically so that we don’t have to figure out the precise overhead profile before tackling more substantive issues. At any rate, it has nothing to do with Steam specifically. >> As far as I can tell cake can drill down to the required IP/TCP/UDP fields >> independent of whether there are VLAN tags or PPPoE headers so cake should >> not care (except for the different overhead specifications you need to add >> as stated above). BUT if instantiated on eth0 cake will see pppoe LCP >> packets and might decide to drop them, which can take down the link, so out >> of caution I would still instantiate on pppoe in your case. > > Yeah, with further testing it seems the interface wasn't the culprit but I'll > still do all my testing on pppoe0 just to be safe. > > Anyway I was wondering if there's some kind of manual for Cake and the > various parameters, I'm looking to set it up best way possible but there are > some parameters which I'm not sure what they do (one of them being "ingress"). With the correct version of iproute2 installed, just issue “man tc-cake”. That’s the official documentation. Currently it doesn’t have the ingress keyword yet. That’ll be fixed soon. 
> Also while reading on the bufferbloat.net Cake page I noticed a possible > "fix" for BitTorrent (by setting it as "background", > https://www.bufferbloat.net/projects/codel/wiki/Cake/#diffserv-support), I'm > wondering if this can be done with Steam too? It’s possible, if you can figure out which traffic is Steam in the first place, and write filters to match on it. This is complicated by the fact that Valve runs a sophisticated CDN to handle their rather impressive bandwidth load. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
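The point made earlier in this message about overhead compensation can be illustrated with a little arithmetic. The sketch below assumes standard ATM framing (48 payload bytes carried in each 53-byte cell) and uses the 40-byte per-packet overhead suggested in the thread for pppoe0; the function names are made up for this example, and it models only the cell rounding, not Cake's actual accounting code.

```python
ATM_CELL_PAYLOAD = 48  # payload bytes carried per ATM cell
ATM_CELL_SIZE = 53     # bytes each cell occupies on the wire


def atm_wire_bytes(ip_len: int, overhead: int) -> int:
    """Bytes occupied on an ATM link by one packet, rounded up to whole cells."""
    cells = -(-(ip_len + overhead) // ATM_CELL_PAYLOAD)  # integer ceiling division
    return cells * ATM_CELL_SIZE


def expansion(ip_len: int, overhead: int) -> float:
    """Ratio of real wire usage to the size an uncompensated shaper would count."""
    return atm_wire_bytes(ip_len, overhead) / ip_len


full_size = expansion(1500, 40)  # bulk download packet
small = expansion(64, 40)        # small packet, e.g. an ack or game update
margin = 1 / 0.85                # headroom bought by shaping 15% below line rate
```

Under these assumptions a 1500-byte packet expands by a factor just under the 15% margin, while a 64-byte packet more than doubles in size on the wire, which matches the observation that the margin is "just about sufficient" for bulk traffic but not for mixes with many small packets.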
Re: [Cake] low bandwidth default params best effort vs voice latency.
Okay, I think I’ve worked out what is happening. At 250KB/s, it takes 6ms to get one 1500-byte bulk packet down the pipe. This is unavoidable, so having a bulk flow competing with your game traffic will always increase your peak latency by that much. With three independent game streams in play, it’s possible for them all to transmit simultaneously *and* to coincide with a bulk packet having just been sent. With overheads, it will take a total of 8.5ms to get all four packets through. This corresponds nicely to your best-effort results; you’re getting very close to the theoretical best performance there. So Diffserv marking actually can’t improve your performance in this particular case. But it shouldn’t make it worse either. You’re actually seeing a nearly 6ms increase in peak latency, which corresponds neatly to an additional bulk packet ending up ahead of the game traffic in the queue. That’s not supposed to happen, but I think I can see how it *can* happen with the current Diffserv logic. It’s a weighted DRR, much like what is used between flow queues a little further down - but it *doesn’t* have a special bonus for sparse tins. That’s something I clearly need to fix. To help remind me, please do open an issue on the Github project. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
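The 6ms figure above is plain serialisation delay, and can be checked with a short calculation. This sketch assumes "250KB/s" means 250,000 bytes per second; the sizes of the game packets are not given in the thread, so the second calculation only works backwards from the quoted 8.5ms to the total number of bytes it implies.

```python
def serialization_delay_ms(packet_bytes: int, link_bytes_per_sec: int) -> float:
    """Time in milliseconds to clock one packet onto the link."""
    return packet_bytes * 1000 / link_bytes_per_sec


LINK_RATE = 250_000  # bytes per second (assumed meaning of 250KB/s)

bulk_ms = serialization_delay_ms(1500, LINK_RATE)

# Working backwards: how many bytes fit into the quoted 8.5 ms worst case?
implied_total_bytes = 8.5 * LINK_RATE / 1000
implied_game_bytes = implied_total_bytes - 1500  # left for the three game packets
```

On these assumptions the three game packets together account for about 625 bytes including overheads, so one extra 1500-byte bulk packet ahead of them adds roughly another 6ms, in line with the increase described.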
Re: [Cake] [LEDE-DEV] Cake SQM killing my DIR-860L - was: [17.01] Kernel: bump to 4.4.51
> On 3 Mar, 2017, at 06:31, Eric Luehrsen <ericluehr...@hotmail.com> wrote: > > Also with SQM you may not want idealized entropy in your queue > distribution. It is desired by some to have host-connection fairness, > and not so much interest in stream-type fairness. So overlap in a few > hash "tags" may not be always such a bad thing depending on how it works > itself out. That sort of thing is explicitly catered for by the triple-isolate algorithm. I don’t want to rely on particular hash behaviour to achieve an inferior result. I’d much rather have a good hash with maximal entropy. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [LEDE-DEV] Cake SQM killing my DIR-860L - was: [17.01] Kernel: bump to 4.4.51
> On 3 Mar, 2017, at 01:55, John Yates <j...@yates-sheets.org> wrote: > > The virtue of a prime number of buckets is that when you mod > your 32-bit hash value to get a bucket index you harvest _all_ > of the entropy in the hash, not just the entropy in the bits you > preserve. True, but you incur the cost of a division, which is very much non-trivial on ARM CPUs, which are increasingly common in CPE. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [LEDE-DEV] Cake SQM killing my DIR-860L - was: [17.01] Kernel: bump to 4.4.51
> On 3 Mar, 2017, at 01:16, John Yates <j...@yates-sheets.org> wrote: > > What are the requirements for this hashing function? > - How much data is being hashed? I am guessing a limited number of bytes > rather than an entire packet payload. Generally it’s what we call the “5-tuple”: two addresses (which could be IPv4, IPv6, or potentially Ethernet MAC), two port numbers (16 bits each), and a transport protocol ID (1 byte). In Cake, the hash function is by default run three times, in order to get separate hashes over just the source address and just the destination address, as well as the full 5-tuple. These are necessary to operate the triple-isolate algorithm. There may be an opportunity for optimisation by producing all three hashes in parallel. > - What is the typical number of hash table buckets? Is it prime or a power > of 2? It’s a power of two, yes. The actual number of buckets is 1024, but Cake uses the full 32-bit hash as a “tag” for hash collision detection without having to store and compare the entire 5-tuple. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
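A rough sketch of the scheme described above: three hashes per packet, a power-of-two bucket count so the index is a mask rather than a division, and the full 32-bit hash kept as a tag. CRC32 from Python's standard library stands in for the real hash function purely for illustration, and the names `FiveTuple`, `three_hashes` and `bucket_and_tag` are hypothetical, not Cake's.

```python
import zlib
from typing import NamedTuple, Tuple

NUM_BUCKETS = 1024  # power of two, so indexing needs no division


class FiveTuple(NamedTuple):
    src: bytes   # source address (IPv4, IPv6 or MAC)
    dst: bytes   # destination address
    sport: int   # 16-bit source port
    dport: int   # 16-bit destination port
    proto: int   # 1-byte transport protocol ID


def hash32(data: bytes) -> int:
    """Stand-in 32-bit hash (CRC32); the real qdisc uses a different function."""
    return zlib.crc32(data) & 0xFFFFFFFF


def three_hashes(ft: FiveTuple) -> Tuple[int, int, int]:
    """Hashes over source only, destination only, and the full 5-tuple."""
    flow_bytes = (ft.src + ft.dst + ft.sport.to_bytes(2, "big")
                  + ft.dport.to_bytes(2, "big") + bytes([ft.proto]))
    return hash32(ft.src), hash32(ft.dst), hash32(flow_bytes)


def bucket_and_tag(h: int) -> Tuple[int, int]:
    """Bucket index from the low bits; the full hash is kept as the tag."""
    return h & (NUM_BUCKETS - 1), h


a = FiveTuple(bytes([192, 168, 1, 10]), bytes([10, 0, 0, 1]), 1000, 443, 6)
b = FiveTuple(bytes([192, 168, 1, 10]), bytes([10, 0, 0, 1]), 1001, 443, 6)
src_a, dst_a, flow_a = three_hashes(a)
src_b, dst_b, flow_b = three_hashes(b)
index_a, tag_a = bucket_and_tag(flow_a)
```

Two flows from the same host share the source hash but get different flow hashes, and comparing the stored tag against a new packet's hash detects most bucket collisions without storing the whole 5-tuple.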
Re: [Cake] flow isolation for ISPs
> On 6 Apr, 2017, at 11:27, Pete Heist <petehe...@gmail.com> wrote: > > Suppose there is a cooperative ISP that has some members who access the > network through a single device (like a router with NAT), while others use > multiple devices and leave routing to the ISPs routers. (No need to suppose, > actually.) > > There’s fairness at the IP address level (currently with esfq, maybe soon > with Cake), but it's not fair that members with multiple devices effectively > get one hash bucket per device, so if you have more devices connected at > once, you win. There is a table of member ID to a list of MAC addresses for > the member, so if there could somehow be fairness based on that table and by > MAC address, that could solve it, but I don’t see how it could be implemented. > > Is it possible to customize the hashing algorithm used for flow isolation, > either with Cake or some other way? That is an important use-case, and one that Cake is not presently designed to explicitly accommodate. Currently, the design assumes a single Cake instance per subscriber or household, and fairness between hosts within a household is assumed to be a relatively simple problem. Also, Cake’s general philosophy of simplifying configuration means that it’s unlikely to ever support “lists” or “tables” of explicit parameters. This is a conscious design decision to enable its use by relative non-experts. Arguably, even some of the existing options could reasonably be streamlined away. With that said, a related qdisc *with* such support is eminently feasible, and could easily be the focus of a project. I think it would be worth gathering requirements for such a thing and considering potential funding sources. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Getting Cake to work better with Steam and similar applications
> On 25 Apr, 2017, at 21:22, Dendari Marini <dendar...@gmail.com> wrote: > > The good news is that using switch0 as inbound and pppoe0 as outbound works, > and I was able to set up Steam as bulk using the interface on the ER-X (used > DSCP 8 and used a custom DPI category). I confirmed this was working by > looking at the bulk traffic increasing (using the "tc -s qdisc" command) and > by starting another download (Steam gets pretty much nothing in this case). > > The bad news is this isn't enough to fix my gaming issue (still having ping > spikes, latency variation and packet loss), and even using it with Steam > configured to use just one connection didn't change much from my previous > testing. > > So I'm really confused :\ > What could cause ping spikes in this case (assuming the multiple connections > aren't the issue)? As noted, it’s far more difficult to control latency from downstream of a bottleneck link. If a bulk sender decides to send burstily, those bursts will always collect in the dumb queue at the far end and delay other traffic. The only true solution is to install a smart queue at the upstream end - but that’s not under your control. You may see some improvement from wholesale reducing the inbound bandwidth, to say 10Mbit. This is especially true given the high asymmetry of your connection, which might require dropped acks upstream to keep filled downstream - and dropped acks will tend to increase burstiness of sending on unpaced senders. You should also try to ensure ECN is fully enabled on your LAN hosts, especially the ones running Steam. This will help to reduce retransmissions and loss-recovery cycles. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] flow isolation with ipip
Cake makes use of Linux' "packet dissecting" infrastructure. If the latter knows about the tunnelling protocol, Cake should naturally see the IP and port numbers of the inner payload rather than the outer tunnel. I don't know, however, precisely what tunnels are supported. At minimum, don't ever expect encrypted tunnels to behave this way! - Jonathan Morton On 18 Jun 2017 21:13, "Cong Xu" <davidx...@gmail.com> wrote: > Hi, > > I wonder if cake's flow isolation works with the ipip tunnel? I hope to > guarantee the networking fair-share among containers/VMs in the same host. > Thus, I used sfq/fq to associate with each tc class created in advance to > provide both shaping and scheduling. The scripts roughly look like this > (Assume 2 containers hosting iperf client run in the same host. One > container sends 100 parallel streams via -P 100 to iperf server running in > another host, the other one sends 10 parallel streams with -P 10.): > > tc qdisc add dev $NIC root handle 1: htb default 2 > tc class add dev $NIC parent 1: classid 1:1 htb rate ${NIC_RATE}mbit burst 1m cburst 1m > tc class add dev $NIC parent 1:1 classid 1:2 htb rate ${RATE1}mbit ceil ${NIC_RATE}mbit burst 1m cburst 1m > tc class add dev $NIC parent 1:1 classid 1:3 htb rate ${RATE2}mbit ceil ${NIC_RATE}mbit burst 1m cburst 1m > tc qdisc add dev $NIC parent 1:2 handle 2 sfq perturb 10 > tc qdisc add dev $NIC parent 1:3 handle 3 sfq perturb 10 > tc filter ad ... > > It works well, each container running iperf gets almost the same bandwidth > regardless of the number of flows. (Without the sfq, the container sending 100 > streams achieves much higher bandwidth than the 10 streams guy.) 
> > -- simultaneous 2 unlimited (100 conns vs 10 conns) > job "big-unlimited-client" created > job "small-unlimited-client" created > -- unlimited server <-- unlimited client (100 conns) > [SUM] 0.00-50.01 sec 24.9 GBytes 4.22 Gbits/sec 16874 sender > [SUM] 0.00-50.01 sec 24.8 GBytes 4.21 Gbits/sec receiver > > -- unlimited server <-- unlimited client (10 conns) > [SUM] 0.00-50.00 sec 24.4 GBytes 4.19 Gbits/sec 13802 sender > [SUM] 0.00-50.00 sec 24.4 GBytes 4.19 Gbits/sec receiver > > However, if the ipip is enabled, sfq does not work anymore. > > -- simultaneous 2 unlimited (100 conns vs 10 conns) > job "big-unlimited-client" created > job "small-unlimited-client" created > -- unlimited server <-- unlimited client (100 conns) > [SUM] 0.00-50.00 sec 27.2 GBytes 4.67 Gbits/sec 391278 sender > [SUM] 0.00-50.00 sec 27.1 GBytes 4.65 Gbits/sec receiver > > -- unlimited server <-- unlimited client (10 conns) > [SUM] 0.00-50.00 sec 6.85 GBytes 1.18 Gbits/sec 64153 sender > [SUM] 0.00-50.00 sec 6.82 GBytes 1.17 Gbits/sec receiver > > The reason behind this is that the src/dst ip addresses using ipip tunnel are > the same for all flows, namely the src/dst ip of the host NICs instead of the > veth ip of each container/VM, and there are no port numbers in the outer > header of an ipip packet. I verified this by capturing the traffic on NIC and > analyzing it with wireshark. The real src/dst ip of container/VM is visible > on the tunnel device (e.g. tunl0). Theoretically, this issue can be solved > if I set up tc class and sfq on tunl0 instead of host NIC. I tried it, > unfortunately, it did not work either. fq does not work for the same > reason, because both sfq and fq use the same flow classifier (src/dst ips > and ports). So, I just wonder if cake works with ipip tunnel or not. > > I would appreciate any help you can provide based on your expertise. Thanks. 
> > Regards, > Cong ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] overhead and mpu
I designed Cake to require as little networking expertise as feasible of the end user. You only really need to know your network topology. So don't overthink this - just set "bandwidth 48Mbit docsis ingress" on your inbound shaper, and "bandwidth (whatever) docsis egress" on the outbound. Why 48 instead of 50? Because Cake needs that when downstream of the bottleneck link, just like any other shaper, so that it controls the bottleneck queue. If your ISP decided to install Cake themselves, they wouldn't need that hack, because they have access to the upstream end of the downlink. Now if you still want to know the technical details... Cake's shaper works by calculating the time it takes to get a given packet through the bottleneck link, measured from the beginning of transmission of that packet to the beginning of transmission of the next packet. To do that, it needs to know the number of bytes per second that the link can carry, and the actual number of byte times occupied by the packet on the wire - not all of which might be included in your typical frame header and payload; we also have to count preamble, checksum, quiet times, etc. These are layer 2 concerns, and in fact Cake normally sees layer 2 packets when attached to an Ethernet interface. Just give it the raw data, and it'll take care of nearly all the nasty calculations and expert knowledge you'd need with most other shapers. The rest of Cake is largely concerned with choosing which packet to deliver next, given the choice among all the packets in its queue. This involves examining layer 3 and layer 4 information within the packet, where available. There are layer 3 (IP) packets without a layer 4 (TCP/UDP) payload, and layer 2 packets without an IP payload, which Cake also has to deal with sanely. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
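The timing rule described above (one packet's on-wire time, measured from the start of its transmission to the start of the next) can be sketched as a tiny virtual-clock shaper. This is a simplified model under stated assumptions, not Cake's implementation: the class and method names are invented, `wire_bytes` is taken as already including all framing overhead, and catching up to `now` after an idle period is an assumption of this sketch.

```python
NSEC_PER_SEC = 1_000_000_000


def packet_time_ns(wire_bytes: int, rate_bits_per_sec: int) -> int:
    """Nanoseconds the link is occupied by a packet of `wire_bytes` bytes."""
    bytes_per_sec = rate_bits_per_sec // 8
    return wire_bytes * NSEC_PER_SEC // bytes_per_sec


class Shaper:
    """Allows a packet out only once the previous one has 'left the wire'."""

    def __init__(self, rate_bits_per_sec: int) -> None:
        self.rate = rate_bits_per_sec
        self.time_next_packet = 0  # earliest start time of the next transmission

    def may_send(self, now_ns: int) -> bool:
        return now_ns >= self.time_next_packet

    def sent(self, now_ns: int, wire_bytes: int) -> None:
        # Start from whichever is later: the schedule, or the present if idle.
        start = max(now_ns, self.time_next_packet)
        self.time_next_packet = start + packet_time_ns(wire_bytes, self.rate)


shaper = Shaper(48_000_000)  # the 48Mbit inbound setting suggested above
shaper.sent(0, 1514)         # one full-size Ethernet frame, overhead not added
```

At 48Mbit a 1514-byte frame occupies the link for about a quarter of a millisecond, so the next packet is held back until then.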
Re: [Cake] overhead and mpu
Because 1514 is the normal maximum size of an Ethernet frame. This can easily differ from the size actually used on the wire and from the size used by the provisioning shaper. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] best way at getting at tcp ack data?
On 01/10/17 06:10, Dave Taht wrote: > I was thinking about how I'd go about adding saner ack filtering [1] to cake I didn't see any obvious improvement in that reference over what Cake does already. What are you thinking of? -- - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] best way at getting at tcp ack data?
Ah, so effectively you want to drop acks more aggressively than data packets when they become a saturating flow in themselves, but without disturbing the cues that TCP relies on. There is some logic behind that, since COBALT ramps up quite slowly with very small packets like acks. I think it's worth opening an issue to remind me to look into that. I still don't have a replacement for my MBP, which is complicating matters here. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] best way at getting at tcp ack data?
There's a much better way to achieve what you want anyway - ECN. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] best way at getting at tcp ack data?
No, because acks are cumulative. Dropping an ack doesn't normally trigger any retransmissions, and is actually difficult to detect reliably at the sender. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake flenter results round 1
Here's the difference between "srchost" and "dual-srchost": the latter imposes per-flow fairness on traffic to each host, with a separate queue/AQM per flow like with "flows". The former only has one queue/AQM per host. Analogously for dsthost. Then "hosts" mode allocates a separate queue for each host-pair encountered. But "triple-isolate" isn't quite analogous to "hosts". Instead it tries to heuristically behave like either of the dual modes, depending on which one is likely to be on the LAN side of the link. This allows it to be a reasonable default setting, though the "dual" modes will perform more reliably if chosen correctly. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake flenter results round 1
It's not at all obvious how we'd detect that. Packets are staying in the queue for less time than the codel target, which is exactly what you'd get if you weren't saturated at all. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake flenter results round 1
Looking at the Cake stats for that run, it doesn't seem to have been signalling congestion at all, when you'd expect it to with 13 bulk flows running through it. Something odd is going on there. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Recomended HW to run cake and fq_codel?
My pet suggestion here is to represent latency as its inverse, "responsiveness" with units of Hz. This has the dual advantages of bigger numbers being better, and the figures being directly comparable with framerates. As you say, the methodology will need to be very carefully specified, so that we get a meaningful measurement that's hard to game. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
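As a small worked example of the proposed unit, assuming responsiveness is simply the reciprocal of the measured latency (the function name is made up here, and the measurement methodology itself is left open, as noted above):

```python
def responsiveness_hz(latency_ms: float) -> float:
    """Responsiveness in Hz as the inverse of latency given in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


good_link = responsiveness_hz(20)      # 20 ms of latency
bloated_link = responsiveness_hz(500)  # half a second of induced latency
```

A 20ms path comes out at 50Hz, close to a typical display frame rate, while a badly bloated 500ms path comes out at 2Hz.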
Re: [Cake] allocate_src allocate_dst
Not quite. The "dual" enums have more than one bit set, so you have to test them for equality. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
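To illustrate why a bitwise test is not enough, here is a sketch with flag values chosen for the example; they are assumptions meant to mirror the idea of composite modes, not values copied from the source, and the function names are hypothetical.

```python
# Illustrative flag layout (assumed for this example).
FLOW_NONE = 0
FLOW_SRC_IP = 1
FLOW_DST_IP = 2
FLOW_HOSTS = FLOW_SRC_IP | FLOW_DST_IP               # 3
FLOW_FLOWS = 4
FLOW_DUAL_SRC = FLOW_SRC_IP | FLOW_FLOWS             # 5: two bits set
FLOW_DUAL_DST = FLOW_DST_IP | FLOW_FLOWS             # 6: two bits set
FLOW_TRIPLE = FLOW_SRC_IP | FLOW_DST_IP | FLOW_FLOWS  # 7: all three bits


def naive_is_dual_src(mode: int) -> bool:
    """Wrong: true for any mode sharing at least one bit with DUAL_SRC."""
    return bool(mode & FLOW_DUAL_SRC)


def is_dual_src(mode: int) -> bool:
    """Right: the mode must be exactly the composite value."""
    return mode == FLOW_DUAL_SRC
```

The naive test also fires for plain source-host, plain flows and triple-isolate modes, since each shares a bit with the dual value; only the equality test singles out the dual mode.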
Re: [Cake] allocate_src allocate_dst
Gotos are fairly common in kernel code, chiefly for exception handling. Obviously structured code is still preferred where it makes sense, but there are cases where it would actually confuse matters. I hope to be able to spend most of tomorrow going over the code as it currently stands. I even have an up-to-date net-next tree on one of my machines to build against. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] new patchset for upstream net-next
It won't link unless conntrack is in the kernel, and *that* is costly for some. What we could do is make NAT support optional in Kconfig and have that option depend on conntrack. Would need a little fettling of cake itself to make the NAT support properly optional at compile time. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] [PATCH 1/3] pkt_sched.h: add support for sch_cake API
The stats are a little bit unwieldy, true. Is there an example of the TLV style of operation that I could study? - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] lan keyword affects host fairness
This is most likely an interaction of the AQM with Linux' scheduling latency. At the 'lan' setting, the time constants are similar in magnitude to the delays induced by Linux itself, so congestion might be signalled prematurely. The flows will then become sparse and total throughput reduced, leaving little or no back-pressure for the fairness logic to work against. For this reason, you might have better luck with the next higher RTT setting. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake flenter results round 3
I'd like to see variations using Cake's "internet", "oceanic" and "satellite" presets at this RTT. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake flenter results round 3
If you use netem to simulate the physical link, you could test Cake's ingress mode downstream of it. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] Cannot install tc-adv(for cake) on Fedora 27
It would need stdint.h and limits.h. It's possible one of those has been left out of the code. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] signed-off-by requests
Probably there should be one from me - but I need to have built and tried it myself first, to be sure. There's still some new code I haven't gone over in detail yet. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] cake_heap* could use some comments
These are standard heap algorithms whose operation should be easily recognisable. The build loop runs over the set of non-leaf nodes, implicitly accessing both children which are automatically at double the index. The timeout is simply there to elide the overhead of maintaining the heap when hard dropping hasn't been necessary for a while. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
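For readers less familiar with the pattern, the textbook build loop looks roughly like this. It is a generic max-heap over a plain list with zero-based indexing (children of node i at 2i+1 and 2i+2), written for illustration; it is not the cake_heap code, and the names are made up.

```python
def sift_down(heap: list, i: int) -> None:
    """Move heap[i] down until neither child is larger than it."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2  # children sit at double the index
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def build_max_heap(heap: list) -> None:
    """Visit only the non-leaf nodes, from the last one back to the root."""
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, i)


backlogs = [3, 9, 2, 7, 1, 8]  # e.g. per-flow backlog sizes
build_max_heap(backlogs)
```

After the build, the largest entry sits at index 0, which is what makes it cheap to find the fattest queue when a hard drop is needed.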
Re: [Cake] cake vs fqcodel with 1 client, 4 servers
I might try to implement a dynamic target adjustment later today. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] ack filter rrul result at 1000/100
On 16/11/17 15:55, Bret Towe wrote: > I have a docsis setup atm that is 300/7 Obligatory: https://www.youtube.com/watch?v=XnFSb8xcmN4 -- - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] total download rate with many flows
In fact, that's why I put a failsafe into ingress mode, so that it would never stall completely. It can happen, however, that throughput is significantly reduced when the drop rate is high. If throughput is more important to you than induced latency, switch to egress mode. Unfortunately it's not possible to guarantee both low latency and high throughput when operating downstream of the bottleneck link. ECN gives you better results, though. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] total download rate with many flows
It's a kind offer, and since I'm in Finland (also EU), there shouldn't be any problem with VAT. However, I just managed to find a copy of the failsafe code on one of my other machines, and merged it to cobalt branch. I honestly thought it had been pushed already, and that the only reason for it being otherwise was that it must be stranded. - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] total download rate with many flows
On 16/11/17 19:06, Georgios Amanakis wrote: > I am skimming through the code but cannot see where the failsafe shaper's rate of advance is set at one-quarter rate. > Could you point this out? Oddly enough, I can't find it either. It could be that the code wasn't pushed yet - or even that it only exists on my MBP hard drive, which I still don't have a working MBP to put in again. Due to HFS+ compression, I can't actually read the file by attaching it to a Linux machine. -- - Jonathan Morton ___ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake
Re: [Cake] total download rate with many flows
I'm curious as to why you think Cobalt is more aggressive than Codel. It does use more accurate approximations to the mathematical ideal than the "reference" codel does. It is, however, very odd that the Diffserv mode has any effect on this at all. It could be explained if a lot of the traffic is marked CS1, since the Bulk tin has looser AQM parameters. That suggests that selecting 'satellite' might help similarly. Something worth trying would be to alter the failsafe shaper's rate of advance. Currently it has a one-quarter rate, which might be too restrictive. Tests at one-half and three-quarters might therefore be interesting. Otherwise, I don't think trying to modify the way ingress mode works will do the right thing. Ultimately, this arises because Cake is having to drop packets in order to signal congestion, and when there are a lot of flows, a lot of signals must be sent to get through to them all. With ECN working, it doesn't need to waste bandwidth merely to signal. The only other reasonable approach is to somehow reduce the signalling rate under heavy flow load. That requires informing Cobalt of the number of bulk flows, and using that to scale the signalling frequency somehow. - Jonathan Morton
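The effect of the failsafe shaper's rate of advance can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the cobalt branch (which the thread itself could not locate): it assumes the main clock is charged for every packet, delivered or dropped, that the failsafe clock is charged only for delivered packets at a fraction of the configured rate, and that a packet may be dequeued once either clock has expired. The names `DualClockShaper`, `failsafe_fraction` and `delivered_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DualClockShaper:
    """Toy ingress shaper with a failsafe clock (all names hypothetical)."""
    rate_bps: float                  # configured shaper rate, bits per second
    failsafe_fraction: float = 0.25  # failsafe clock runs at this fraction of rate_bps
    next_packet: float = 0.0         # main clock: charged for delivered AND dropped packets
    failsafe_next: float = 0.0       # failsafe clock: charged for delivered packets only

    def may_send(self, now: float) -> bool:
        # Assumption: a packet may be dequeued once EITHER clock has expired.
        return now >= self.next_packet or now >= self.failsafe_next

    def charge(self, length_bytes: int, dropped: bool) -> None:
        duration = length_bytes * 8 / self.rate_bps
        self.next_packet += duration
        if not dropped:
            self.failsafe_next += duration / self.failsafe_fraction


def delivered_rate(shaper, drop_every, packet_bytes=1500, horizon=10.0):
    """Deliver one packet in every `drop_every` and drop the rest.

    Returns the delivered throughput in bits per second over `horizon` seconds.
    """
    now, seen, delivered_bits = 0.0, 0, 0
    while now < horizon:
        if not shaper.may_send(now):
            # Jump to the moment the earlier of the two clocks expires.
            now = min(shaper.next_packet, shaper.failsafe_next)
            continue
        seen += 1
        dropped = seen % drop_every != 0
        shaper.charge(packet_bytes, dropped)
        if not dropped:
            delivered_bits += packet_bytes * 8
    return delivered_bits / horizon


quarter = delivered_rate(DualClockShaper(1_000_000, 0.25), drop_every=10)
half = delivered_rate(DualClockShaper(1_000_000, 0.5), drop_every=10)
no_drops = delivered_rate(DualClockShaper(1_000_000, 0.25), drop_every=1)
```

In this model, a 90% drop rate leaves delivered throughput pinned near the failsafe fraction of the configured rate, so raising the fraction from one-quarter to one-half roughly doubles it, while a drop-free run is unaffected by the failsafe clock.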
Re: [Cake] [Bloat] Update Cake page on bufferbloat.net?
I think "mostly true but incomplete" is the best way to describe the current pages. - Jonathan Morton
Re: [Cake] Fwd: [iproute2 1/2] tc: Add support for the CBS qdisc
I looked up how it's supposed to work. Seems like it relies on a fixed-bandwidth underlying link, and piggybacks on its clocks to maintain the associated state in realtime. Obviously designed to be implemented in hardware using an adder and an accumulator, rather than on a CPU. The parameters required are a direct reflection of what such hardware requires, and are not at all user-friendly. - Jonathan Morton
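The "adder and an accumulator" description can be made concrete with a small per-tick model of a credit-based shaper. This is a simplified sketch, not the qdisc's implementation: the attribute names mirror the `idleslope`, `sendslope`, `hicredit` and `locredit` parameters of the CBS qdisc, but the units here are arbitrary credit per clock tick (an assumption made for illustration), and the class and method names are hypothetical.

```python
class CreditBasedShaper:
    """Per-tick credit accumulator in the style of a credit-based shaper."""

    def __init__(self, idleslope, sendslope, hicredit, locredit):
        self.idleslope = idleslope  # credit gained per tick while waiting (> 0)
        self.sendslope = sendslope  # credit change per tick while sending (< 0)
        self.hicredit = hicredit    # upper bound on accumulated credit
        self.locredit = locredit    # lower bound on accumulated credit
        self.credit = 0

    def may_start_frame(self):
        # A new frame may only start when credit is non-negative.
        return self.credit >= 0

    def tick(self, transmitting, backlog):
        """Advance one link clock tick: one add, then one clamp."""
        if transmitting:
            self.credit = max(self.locredit, self.credit + self.sendslope)
        elif backlog or self.credit < 0:
            # Waiting frames may build credit up to hicredit; with an empty
            # queue, negative credit only recovers as far as zero.
            limit = self.hicredit if backlog else 0
            self.credit = min(limit, self.credit + self.idleslope)
        else:
            # Empty queue with non-negative credit: credit is discarded.
            self.credit = 0


shaper = CreditBasedShaper(idleslope=2, sendslope=-8, hicredit=20, locredit=-40)
for _ in range(3):
    shaper.tick(transmitting=True, backlog=True)
credit_after_send = shaper.credit
```

Each tick is a single addition followed by a clamp, which is why the scheme maps so directly onto hardware driven by the link clock, and why its parameters are slopes and credit bounds rather than a single rate.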
Re: [Cake] Fwd: [iproute2 1/2] tc: Add support for the CBS qdisc
The name is certainly intriguing, but just look at all those parameters! Cake's shaper doesn't need them. - Jonathan Morton
Re: [Cake] cake flenter results round 4
Just to note, deficiencies in host fairness are most likely directly linked to less-than-full throughput. One host is able to use bandwidth left unused by the other. - Jonathan Morton
Re: [Cake] cake vs fqcodel with 1 client, 4 servers
The latest push now enforces a minimum of 4 x MTU x flows on the target intra-flow latency. When lightly loaded, the normal target still applies. This gives a noticeable improvement on the goodput of World of Warships' updater, which does its level best to stuff 150 HTTP flows through my 512Kbps downlink, while retaining reasonable inter-flow latency as measured by using other applications and/or hosts. I'd like to see a more controlled test of this, to compare with the previous behaviour. - Jonathan Morton
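The flow-scaled target described above can be read as a floor on the AQM target, expressed as the time needed to serialise 4 x MTU x flows bytes at the shaped rate. The sketch below follows that reading, which is an interpretation rather than the pushed code; `effective_target` and its parameters are hypothetical names, and the 1500-byte default MTU is an assumption.

```python
def effective_target(base_target_s, rate_bps, flows, mtu_bytes=1500):
    """Return the larger of the normal target and the flow-scaled floor.

    The floor is the serialisation time of 4 * MTU * flows bytes at rate_bps.
    """
    floor_s = 4 * mtu_bytes * 8 * flows / rate_bps
    return max(base_target_s, floor_s)


# Lightly loaded fast link: the normal 5 ms target still applies.
light = effective_target(0.005, rate_bps=100_000_000, flows=1)

# 150 flows through a 512 kbit/s link, as in the message above.
heavy = effective_target(0.005, rate_bps=512_000, flows=150)
```

Assuming a 1500-byte MTU and 512,000 bit/s, the floor for 150 flows comes to roughly 14 seconds, so under that load the flow-scaled term dominates the normal target by a wide margin.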
Re: [Cake] cake vs fqcodel with 1 client, 4 servers
Ah, the push didn't actually complete. It's there now. - Jonathan Morton
Re: [Cake] [Fwd: Re: RHODIUM - nuking blue (for testing)]
Smells like the NAS and OneDrive are using nonstandard, and at least partially noncompliant, TCP stacks. Did the ack stream carry ECE flags? But yes, that is exactly the sort of abuse that the BLUE portion of COBALT is meant to cope with. - Jonathan Morton
Re: [Cake] cake vs fqcodel with 1 client, 4 servers
These ideas have all been considered at great length in the past, and resulted in the Codel algorithm in the first place. You might want to read some of the original literature on it to understand my reasoning. - Jonathan Morton