Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On 16 May 2016 at 02:07, Eric Dumazet wrote:
> On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:
>
>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>> Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
>> backlog 1541252b 1018p requeues 17
>> maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>> new_flows_len 0 old_flows_len 1
>
> Why do you have ce_threshold set? You really should not (even if it
> does not matter for the kind of traffic you have at this moment).

No idea, it was always there. How do I unset it? Setting it to 0 doesn't help.

> If your expected link speed is around 1Gbps, or 80,000 packets per
> second, then you have to understand that a 1024 packet limit is about
> 12 ms at most.
>
> Even if the queue is full, the max sojourn time of a packet would be 12 ms.
>
> I really do not see how 'target 80ms' could be hit.

Well, as I said, I've tried different options. Neither target 20ms (as
Dave proposed) nor 12ms saves the situation.

> You basically have FQ, with no Codel effect, but with the associated
> cost of Codel (having to take timestamps).
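For reference: tc does not appear to offer an "off" value for ce_threshold
once it has been set, so one plausible way to clear it - a sketch only,
assuming the wlan0 device and the parent handles shown in the stats above -
is to delete and recreate the child qdisc without the option:

  # assumption: a freshly created fq_codel instance leaves ce_threshold disabled
  tc qdisc del dev wlan0 parent :3
  tc qdisc add dev wlan0 parent :3 fq_codel limit 1024 flows 16 \
      target 5ms interval 100ms ecn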
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On Mon, 2016-05-16 at 01:34 +0300, Roman Yeryomin wrote:

> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
> target 80.0ms ce_threshold 32us interval 100.0ms ecn
> Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
> backlog 1541252b 1018p requeues 17
> maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
> new_flows_len 0 old_flows_len 1

Why do you have ce_threshold set? You really should not (even if it
does not matter for the kind of traffic you have at this moment).

If your expected link speed is around 1Gbps, or 80,000 packets per
second, then you have to understand that a 1024 packet limit is about
12 ms at most.

Even if the queue is full, the max sojourn time of a packet would be 12 ms.

I really do not see how 'target 80ms' could be hit.

You basically have FQ, with no Codel effect, but with the associated
cost of Codel (having to take timestamps).
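A quick sketch of the arithmetic here, taking the packet size from the
quantum (1514 bytes) in the stats above:

  # ~1 Gbps of 1514-byte packets is ~82k packets/s, so a full
  # 1024-packet queue drains in roughly 12 ms
  awk 'BEGIN {
      pps = 1e9 / (1514 * 8);     # packets per second at 1 Gbps
      ms  = 1024 / pps * 1000;    # worst-case sojourn time in ms
      printf "%.0f pkt/s -> %.1f ms\n", pps, ms
  }'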
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On 7 May 2016 at 12:57, Kevin Darbyshire-Bryant wrote:
>
> On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
>> Hi Felix,
>>
>> This is an important fix for OpenWRT, please read!
>>
>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>> without also adjusting q->flows_cnt. Eric explains below that you must
>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>> adjust it to 128.)
>>
>> Problematic OpenWRT commit in question:
>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from
>> causing too much cpu load with higher speed (#21326)")
>
> I 'pull requested' this to the lede-staging tree on github.
> https://github.com/lede-project/staging/pull/11
>
> One way or another Felix & co should see the change :-)

If you would follow the white rabbit, you would see that it doesn't help.
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On 6 May 2016 at 22:43, Dave Taht wrote:
> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin wrote:
>> On 6 May 2016 at 21:43, Roman Yeryomin wrote:
>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer wrote:
>>>>
>>>> I've created an OpenWRT ticket[1] on this issue, as it seems that
>>>> someone[2] closed Felix's OpenWRT email account (bad choice! emails
>>>> bouncing). Sounds like OpenWRT and the LEDE https://www.lede-project.org/
>>>> project is in some kind of conflict.
>>>>
>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>
>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>
>>> OK, so, after porting the patch to the 4.1 openwrt kernel and playing a
>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>
>> Forgot to mention, I've reduced drop_batch_size down to 32
>
> 0) Not clear to me if that's the right line; there are 4 wifi queues,
> and the third one is the BE queue.

That was an example, sorry, should have stated that. I've applied the
same settings to all 4 queues.

> That is too low a limit, also, for normal use. And: for the purpose of
> this particular UDP test, flows 16 is ok, but not ideal.

I played with different combinations, it doesn't make any (significant)
difference: 20-30Mbps, not more.
What numbers would you propose?

> 1) What's the tcp number (with a simultaneous ping) with this latest
> patchset?
> (I care about tcp performance a lot more than udp floods - surviving a
> udp flood yes, performance, no)

During the test (both TCP and UDP) it's roughly 5ms on average, ~2ms when
not running tests. Actually I'm now wondering if target is working at
all, because I had the same result with target 80ms..
So, yes, latency is good, but performance is poor.

> before/after?
>
> tc -s qdisc show dev wlan0 during/after results?

during the test:

qdisc mq 0: root
 Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
 backlog 1545794b 1021p requeues 17
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
 backlog 1541252b 1018p requeues 17
  maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

after the test (60sec):

qdisc mq 0: root
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
  maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

> IF you are doing builds for the archer c7v2, I can join in on this... (?)

I'm not, but I have a c7 somewhere, so I can do a build for it and also
test, so we are on the same page.

> I did do a test of the ath10k "before", fq_codel *never engaged*, and
> tcp induced latencies
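For completeness, "same settings to all 4 queues" reduces to a short loop;
a sketch only, with the handles :1 through :4 taken from the mq stats above
and the values from the command quoted earlier:

  # apply identical fq_codel settings to each of the four mq children
  for q in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$q fq_codel flows 16 limit 256
  done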
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
> Hi Felix,
>
> This is an important fix for OpenWRT, please read!
>
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt. Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128.)
>
> Problematic OpenWRT commit in question:
> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from
> causing too much cpu load with higher speed (#21326)")

I 'pull requested' this to the lede-staging tree on github.
https://github.com/lede-project/staging/pull/11

One way or another Felix & co should see the change :-)

> I also highly recommend you cherry-pick this very recent commit:
> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
> https://git.kernel.org/davem/net-next/c/9d18562a227
>
> This should fix very high CPU usage in case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation-wise
> it was chosen to be more expensive (to save cycles in normal mode).
> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
>
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop). That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.

I haven't done the above cherry-pick patch & backport patch creation for
4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats
me to it :-)

Kevin
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin wrote:
> On 6 May 2016 at 21:43, Roman Yeryomin wrote:
>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer wrote:
>>>
>>> I've created an OpenWRT ticket[1] on this issue, as it seems that
>>> someone[2] closed Felix's OpenWRT email account (bad choice! emails
>>> bouncing). Sounds like OpenWRT and the LEDE https://www.lede-project.org/
>>> project is in some kind of conflict.
>>>
>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>
>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>
>> OK, so, after porting the patch to the 4.1 openwrt kernel and playing a
>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>
> Forgot to mention, I've reduced drop_batch_size down to 32

0) Not clear to me if that's the right line; there are 4 wifi queues,
and the third one is the BE queue.

That is too low a limit, also, for normal use. And: for the purpose of
this particular UDP test, flows 16 is ok, but not ideal.

1) What's the tcp number (with a simultaneous ping) with this latest
patchset?
(I care about tcp performance a lot more than udp floods - surviving a
udp flood yes, performance, no)

before/after?

tc -s qdisc show dev wlan0 during/after results?

IF you are doing builds for the archer c7v2, I can join in on this... (?)

I did do a test of the ath10k "before": fq_codel *never engaged*, and
tcp induced latencies under load, at 100mbit, cracked 600ms, while
staying flat (20ms) at 100mbit. (Not the same patches you are testing.)
On x86. I have got tcp 300Mbit out of an osx box, similar latency; have
yet to get anything more on anything I currently have, before/after
patchsets. I'll go add flooding to the tests; I just finished a series
comparing two different speed stations and life was good on that.

"before" - fq_codel never engages, we see seconds of latency under load.

root@apu2:~# tc -s qdisc show dev wlp4s0
qdisc mq 0: root
 Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
  new_flows_len 1 old_flows_len 3
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 1 old_flows_len 0

>> This is certainly better than 30Mbps but still more than two times
>> less than before (900).

The number that I still am not sure we got: were you sending 900mbit udp
and receiving 900mbit on the prior tests?

>> TCP also improved a little (550 to ~590).

The limit is probably a bit low, also. You might want to try target 20ms
as well.

>> Felix, others, do you want to see the ported patch, maybe I did
>> something wrong?
>> Doesn't look like it will save ath10k from performance regression.

what was tcp "before"? (I'm sorry, such a long thread)
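A sketch of the variant suggested here: keep the packet limit nearer the
1024 already configured and raise target to 20ms, leaving flows at its
default rather than 16 (everything beyond target and limit is an
assumption, not a tested result from this thread):

  # try a 20ms target with a less aggressive limit on each wifi queue
  for q in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$q fq_codel limit 1024 target 20ms
  done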
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On 6 May 2016 at 21:43, Roman Yeryomin wrote:
> OK, so, after porting the patch to the 4.1 openwrt kernel and playing a
> bit with fq_codel limits I was able to get 420Mbps UDP like this:
> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256

Forgot to mention, I've reduced drop_batch_size down to 32

> This is certainly better than 30Mbps but still more than two times
> less than before (900).
> TCP also improved a little (550 to ~590).
>
> Felix, others, do you want to see the ported patch, maybe I did
> something wrong?
> Doesn't look like it will save ath10k from performance regression.
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On 6 May 2016 at 15:47, Jesper Dangaard Brouer wrote:
>
> I've created an OpenWRT ticket[1] on this issue, as it seems that
> someone[2] closed Felix's OpenWRT email account (bad choice! emails
> bouncing). Sounds like OpenWRT and the LEDE https://www.lede-project.org/
> project is in some kind of conflict.
>
> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>
> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335

OK, so, after porting the patch to the 4.1 openwrt kernel and playing a
bit with fq_codel limits I was able to get 420Mbps UDP like this:

tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256

This is certainly better than 30Mbps but still more than two times less
than before (900).
TCP also improved a little (550 to ~590).

Felix, others, do you want to see the ported patch, maybe I did
something wrong?
Doesn't look like it will save ath10k from performance regression.
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
I've created an OpenWRT ticket[1] on this issue, as it seems that
someone[2] closed Felix's OpenWRT email account (bad choice! emails
bouncing). Sounds like OpenWRT and the LEDE https://www.lede-project.org/
project is in some kind of conflict.

OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349

[2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335

On Fri, 6 May 2016 11:42:43 +0200
Jesper Dangaard Brouer wrote:

> Hi Felix,
>
> This is an important fix for OpenWRT, please read!
>
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt. Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128.)
>
> Problematic OpenWRT commit in question:
> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from
> causing too much cpu load with higher speed (#21326)")
>
> I also highly recommend you cherry-pick this very recent commit:
> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
> https://git.kernel.org/davem/net-next/c/9d18562a227
>
> This should fix very high CPU usage in case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation-wise
> it was chosen to be more expensive (to save cycles in normal mode).
> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
>
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop). That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.
>
> To Eric: should we recommend OpenWRT adjust the default to (max) 64 bulk
> drop, given we also recommend a bucket size of 128? (Thus the amount of
> memory to scan is less, but their CPU is also much smaller.)
>
> --Jesper
>
> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet wrote:
>
>> On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>> On 5 May 2016 at 19:12, Eric Dumazet wrote:
>>>> On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>>
>>>>> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>>> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>>> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>>> backlog 0b 0p requeues 0
>>>>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>> new_flows_len 0 old_flows_len 0
>>>>
>>>> A limit of 1024 packets and 1024 flows is not wise, I think.
>>>>
>>>> (If all buckets are in use, each bucket has a virtual queue of 1
>>>> packet, which is almost the same as having no queue at all.)
>>>>
>>>> I suggest having at least 8 packets per bucket, to let Codel have a
>>>> chance to trigger.
>>>>
>>>> So you could either reduce the number of buckets to 128 (if memory is
>>>> tight), or increase the limit to 8192.
>>>
>>> Will try, but what I've posted is the default, I didn't
>>> change/configure that.
>>
>> fq_codel has a default of 10240 packets and 1024 buckets.
>>
>> http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>
>> If someone changed that in the linux variant you use, he probably should
>> explain the rationale.
--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
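Taken together, the recommendations in this thread reduce to a single
line; a sketch, assuming a kernel carrying 9d18562a2278 and an iproute2
recent enough to expose the drop_batch keyword:

  # 1024-packet limit with 128 buckets gives ~8 packets per bucket, as
  # Eric suggests; drop_batch 64 bounds the per-packet cost of drop mode
  tc qdisc replace dev eth0 root fq_codel limit 1024 flows 128 drop_batch 64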