Re: [Bloat] CAKE in openwrt high CPU

2020-09-04 Thread Sebastian Moeller
Hi Mikael,

Thanks! That looks like a fully saturated core, no? I do not know how to parse 
the symbols here, so I am not sure what "class" of load the star denotes, but 
I would guess something including sirqs? Anyway, the average over the four 
cores is only ~50% load, while one CPU is clearly pegged already. I assume the 
htop data is from the HGW...

best regards
Sebastian

> On Sep 4, 2020, at 15:37, Mikael Abrahamsson  wrote:
> 
> On Thu, 3 Sep 2020, Sebastian Moeller wrote:
> 
>>  Mmmh, how did you measure the sirq percentage? Some top versions show 
>> overall percentage with 100% meaning all CPUs, so 35% in a quadcore could 
>> mean 1 fully maxed out CPU (25%) plus an additional 10% spread over the 
>> other three, or something more benign. Better top (so not busybox's) or htop 
>> versions also can show the load per CPU which is helpful to pinpoint 
>> hotspots...
> 
> If I run iperf3 with 10 parallel sessions then htop shows this (in the CAKE 
> upstream direction I believe):
> 
>   1  [*           0.7%]   Tasks: 19, 0 thr; 2 running
>   2  [*         100.0%]   Load average: 0.48 0.16 0.05
>   3  [#***       44.4%]   Uptime: 10 days, 04:46:37
>   4  [           54.2%]
>   Mem[|#*  36.7M/3.84G]
>   Swp[           0K/0K]
> 
> The other direction (-R), typically this:
> 
>   1  [#***       13.0%]   Tasks: 19, 0 thr; 2 running
>   2  [***        53.9%]   Load average: 0.54 0.25 0.09
>   3  [#*         55.8%]   Uptime: 10 days, 04:47:36
>   4  [**         84.4%]
> 
> Topology is:
> 
> PC - HGW -> Internet
> 
> iperf3 is run on the PC, HGW has CAKE in the -> Internet direction.
> 
>> Best Regards
>>  Sebastian
>> 
>>> 
>>> root@OpenWrt:~# tc -s qdisc
>>> qdisc noqueue 0: dev lo root refcnt 2
>>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>> backlog 0b 0p requeues 0
>>> qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 
>>> triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
>>> overhead 0
>>> Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 
>>> 179)
>>> backlog 0b 0p requeues 179
>>> memory used: 2751976b of 15140Kb
>>> capacity estimate: 900Mbit
>>> min/max network layer size:   42 /1514
>>> min/max overhead-adjusted size:   42 /1514
>>> average network hdr offset:   14
>>> 
>>>  Bulk  Best EffortVoice
>>> thresh  56250Kbit  900Mbit  225Mbit
>>> target  5.0ms5.0ms5.0ms
>>> interval  100.0ms  100.0ms  100.0ms
>>> pk_delay  0us 22us232us
>>> av_delay  0us  6us  7us
>>> sp_delay  0us  4us  5us
>>> backlog0b   0b   0b
>>> pkts0   959747   90
>>> bytes   0   93543739440
>>> way_inds0229640
>>> way_miss0  2752
>>> way_cols000
>>> drops   0  1340
>>> marks   000
>>> ack_drop000
>>> sp_flows031
>>> bk_flows010
>>> un_flows000
>>> max_len 068130 3714
>>> quantum  1514 1514 1514
>>> 
>>> 
>>> --
>>> Mikael Abrahamssonemail: 
>>> swm...@swm.pp.se___
>>> Bloat mailing list
>>> Bloat@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/bloat
>> 
> 
> -- 
> Mikael Abrahamsson    email: swm...@swm.pp.se

___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] CAKE in openwrt high CPU

2020-09-04 Thread Mikael Abrahamsson via Bloat

On Thu, 3 Sep 2020, Sebastian Moeller wrote:

> Mmmh, how did you measure the sirq percentage? Some top versions 
> show overall percentage with 100% meaning all CPUs, so 35% in a quadcore 
> could mean 1 fully maxed out CPU (25%) plus an additional 10% spread 
> over the other three, or something more benign. Better top (so not 
> busybox's) or htop versions also can show the load per CPU which is 
> helpful to pinpoint hotspots...


If I run iperf3 with 10 parallel sessions then htop shows this (in the 
CAKE upstream direction I believe):


  1  [*           0.7%]   Tasks: 19, 0 thr; 2 running
  2  [*         100.0%]   Load average: 0.48 0.16 0.05
  3  [#***       44.4%]   Uptime: 10 days, 04:46:37
  4  [           54.2%]
  Mem[|#*  36.7M/3.84G]
  Swp[           0K/0K]

The other direction (-R), typically this:

  1  [#***       13.0%]   Tasks: 19, 0 thr; 2 running
  2  [***        53.9%]   Load average: 0.54 0.25 0.09
  3  [#*         55.8%]   Uptime: 10 days, 04:47:36
  4  [**         84.4%]

Topology is:

PC - HGW -> Internet

iperf3 is run on the PC, HGW has CAKE in the -> Internet direction.


> Best Regards
> Sebastian



root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179)
backlog 0b 0p requeues 179
memory used: 2751976b of 15140Kb
capacity estimate: 900Mbit
min/max network layer size:   42 /1514
min/max overhead-adjusted size:   42 /1514
average network hdr offset:   14

  Bulk  Best EffortVoice
 thresh  56250Kbit  900Mbit  225Mbit
 target  5.0ms5.0ms5.0ms
 interval  100.0ms  100.0ms  100.0ms
 pk_delay  0us 22us232us
 av_delay  0us  6us  7us
 sp_delay  0us  4us  5us
 backlog0b   0b   0b
 pkts0   959747   90
 bytes   0   93543739440
 way_inds0229640
 way_miss0  2752
 way_cols000
 drops   0  1340
 marks   000
 ack_drop000
 sp_flows031
 bk_flows010
 un_flows000
 max_len 068130 3714
 quantum  1514 1514 1514


--
Mikael Abrahamsson    email: swm...@swm.pp.se




--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Jonathan Morton
> On 3 Sep, 2020, at 5:32 pm, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler.

Offload of *just* shaping could be valuable in itself at higher rates, when 
combined with BQL, as it would avoid having to interact with the CPU-side timer 
infrastructure so much.  It would also not be difficult at all to implement in 
hardware at line rate, even with overhead compensation.  It's the sort of thing 
you could sensibly do with 74-series logic and a lookup table in a cheap SRAM, 
up to millions of PPS, and considerably faster in FPGA or ASIC territory.
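
To give a feel for how little state a bare shaper with overhead compensation needs, here is a minimal sketch of a virtual-clock shaper. It is not Cake's actual code; the class name, the 38-byte overhead and the 84-byte minimum packet size are assumptions picked for illustration.

```python
class VirtualClockShaper:
    """Toy shaper: tracks the earliest time the next packet may be sent."""

    def __init__(self, rate_bps, overhead=38, mpu=84):
        self.rate_bps = rate_bps   # shaped rate in bits per second
        self.overhead = overhead   # assumed per-packet framing overhead (bytes)
        self.mpu = mpu             # assumed minimum on-wire packet size (bytes)
        self.next_tx = 0.0         # earliest permitted send time (seconds)

    def wire_bytes(self, ip_len):
        """Size the packet occupies on the link after overhead compensation."""
        return max(ip_len + self.overhead, self.mpu)

    def send(self, now, ip_len):
        """Return the release time of this packet and advance the clock."""
        start = max(now, self.next_tx)
        self.next_tx = start + self.wire_bytes(ip_len) * 8 / self.rate_bps
        return start


# Three back-to-back 1500-byte IP packets offered at t=0 to a 900 Mbit/s shaper.
shaper = VirtualClockShaper(rate_bps=900e6)
releases = [shaper.send(now=0.0, ip_len=1500) for _ in range(3)]
```

The only per-packet work is one addition, one max and one multiply-divide on the packet length, which is the part that could plausibly become a lookup table in hardware.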

I think that's what the questions about combining "unlimited Cake" with some 
other shaper are angling towards, though I suspect that the way Cake's shaper 
is integrated is still better than having an external one in software.

With that said, it's also possible that something a bit lighter than Cake might 
be appropriate at cable speeds.  There is background work in this general area 
going on, so don't despair.

 - Jonathan Morton


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat


On 3 September 2020 17:31:07 CEST, Luca Muscariello  
wrote:
>On Thu, Sep 3, 2020 at 4:32 PM Toke Høiland-Jørgensen 
>wrote:
>>
>> Luca Muscariello  writes:
>>
>> > On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
>> >  wrote:
>> >>
>> >> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>> >>
>> >> > Yup, the number of cores is only going to go up, so for CAKE to
>stay
>> >> > relevant it'll need to be able to take advantage of this
>eventually :)
>> >>
>> >> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting
>platform,
>> >> it has a quad core machine with 2 x 2.5GbE NICs.
>> >>
>> >> When using something like this for routing with HTB+CAKE for
>bidirectional
>> >> shaping below line rate, what would be the main things that would
>need to
>> >> be improved?
>> >
>> > IMO, hardware offloading for shaping, beyond this specific
>platform.
>> > I don't know whether there is any roadmap with that objective.
>>
>> Yeah, offloading of some sort is another option, but I consider that
>> outside of the "CAKE stays relevant" territory, since that will most
>> likely involve an entirely programmable packet scheduler. There was
>some
>> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
>> algorithm seems promising.
>>
>> -Toke
>>
>> [0] https://linuxplumbersconf.org/event/7/contributions/679/
>> [1] https://www.usenix.org/conference/nsdi19/presentation/saeed
>
>These are all interesting efforts for scheduling but orthogonal to
>shaping
>and not going to help make shaping more scalable.

Eiffel says it can do shaping by way of a global calendar queue... Planning to 
put that to the test :)
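
For readers unfamiliar with the idea, a toy calendar queue used as a shaper might look like the sketch below. This is only a generic illustration of bucketing packets by release time; it is not Eiffel's actual data structure, and the class and parameter names are made up.

```python
class CalendarQueue:
    """Toy calendar queue: packets are bucketed by their release time slot."""

    def __init__(self, slot_us, num_slots):
        self.slot_us = slot_us                        # width of one slot (microseconds)
        self.buckets = [[] for _ in range(num_slots)]
        self.current = 0                              # next absolute slot to drain

    def schedule(self, send_time_us, packet):
        slot = max(send_time_us // self.slot_us, self.current)
        # Clamp to the horizon so a far-future packet cannot alias an earlier slot.
        slot = min(slot, self.current + len(self.buckets) - 1)
        self.buckets[slot % len(self.buckets)].append(packet)

    def advance(self, now_us):
        """Release every packet whose slot has started by now_us."""
        released = []
        target = now_us // self.slot_us
        while self.current <= target:
            bucket = self.buckets[self.current % len(self.buckets)]
            released.extend(bucket)
            bucket.clear()
            self.current += 1
        return released


cq = CalendarQueue(slot_us=100, num_slots=8)
cq.schedule(250, "b")
cq.schedule(50, "a")
cq.schedule(260, "c")
first = cq.advance(99)     # drains slot 0
second = cq.advance(199)   # drains slot 1, which is empty
third = cq.advance(300)    # drains slots 2 and 3
```

A shaper built this way computes each packet's release time from the rate and inserts it once; the dequeue side only walks slots in time order.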

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Luca Muscariello
On Thu, Sep 3, 2020 at 4:32 PM Toke Høiland-Jørgensen  wrote:
>
> Luca Muscariello  writes:
>
> > On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
> >  wrote:
> >>
> >> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
> >>
> >> > Yup, the number of cores is only going to go up, so for CAKE to stay
> >> > relevant it'll need to be able to take advantage of this eventually :)
> >>
> >> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
> >> it has a quad core machine with 2 x 2.5GbE NICs.
> >>
> >> When using something like this for routing with HTB+CAKE for bidirectional
> >> shaping below line rate, what would be the main things that would need to
> >> be improved?
> >
> > IMO, hardware offloading for shaping, beyond this specific platform.
> > I don't know whether there is any roadmap with that objective.
>
> Yeah, offloading of some sort is another option, but I consider that
> outside of the "CAKE stays relevant" territory, since that will most
> likely involve an entirely programmable packet scheduler. There was some
> discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
> algorithm seems promising.
>
> -Toke
>
> [0] https://linuxplumbersconf.org/event/7/contributions/679/
> [1] https://www.usenix.org/conference/nsdi19/presentation/saeed

These are all interesting efforts for scheduling but orthogonal to shaping
and not going to help make shaping more scalable.


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat
Luca Muscariello  writes:

> On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
>  wrote:
>>
>> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>>
>> > Yup, the number of cores is only going to go up, so for CAKE to stay
>> > relevant it'll need to be able to take advantage of this eventually :)
>>
>> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
>> it has a quad core machine with 2 x 2.5GbE NICs.
>>
>> When using something like this for routing with HTB+CAKE for bidirectional
>> shaping below line rate, what would be the main things that would need to
>> be improved?
>
> IMO, hardware offloading for shaping, beyond this specific platform.
> I don't know whether there is any roadmap with that objective.

Yeah, offloading of some sort is another option, but I consider that
outside of the "CAKE stays relevant" territory, since that will most
likely involve an entirely programmable packet scheduler. There was some
discussion of adding such a qdisc to Linux at LPC[0]. The Eiffel[1]
algorithm seems promising.

-Toke

[0] https://linuxplumbersconf.org/event/7/contributions/679/
[1] https://www.usenix.org/conference/nsdi19/presentation/saeed


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Sebastian Moeller
Hi Toke,

> On Sep 3, 2020, at 15:29, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> Mikael Abrahamsson  writes:
> 
>> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>> 
>>> And what about when you're running CAKE in 'unlimited' mode?
>> 
>> I tried this:
>> 
>> # tc qdisc add dev eth0 root cake bandwidth 900mbit
> 
> So the difference from before is just the lack of inbound shaping, or?

Good point, so worst-case just half the load to handle, indicating that 
a single CPU is sufficient for gigabit shaping, but not for dual-gigabit 
shaping, no?

Best Regards
Sebastian


> 
> -Toke
> ___
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat



Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Sebastian Moeller
Hi Mikael,



> On Sep 3, 2020, at 15:10, Mikael Abrahamsson via Bloat 
>  wrote:
> 
> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
> 
>> And what about when you're running CAKE in 'unlimited' mode?
> 
> I tried this:
> 
> # tc qdisc add dev eth0 root cake bandwidth 900mbit

That still employs the cake shaper, so it is not equivalent to 
'unlimited' mode, I believe.

[PEDANT_MODE]

900 Mbps without explicit overhead will result in a typical maximum TCP/IPv4 
goodput of

900 * ((1500-20-20)/(1500+14)) = 867.899603699 Mbps

but since Ethernet overhead is actually 38 bytes instead of 14, this actually 
occupies

(900 * ((1500-20-20)/(1500+14))) * ((1500+38)/(1500-20-20)) = 914.266842801 Mbps 
on the Ethernet link,

which for small packets will become problematic:

(900 * ((150-20-20)/(150+14))) * ((150+38)/(150-20-20)) = 1031.70731707 Mbps 
gross speed out of the 1000 Mbps that Gigabit Ethernet offers.

In fact, packet sizes below 202 bytes will spend all the "credit" you got from 
reducing the shaper rate to 900 Mbps in the first place:

(900 * ((202-20-20)/(202+14))) * ((202+38)/(202-20-20)) = 1000 Mbps

Maybe tell cake that you run on Ethernet by adding the "ethernet" keyword, which 
will take care of both the per-packet overhead of 38 bytes and the minimum 
packet size on the link of 84 bytes?

Please note that for throughput this does not really matter that much, but 
latency-under-load is not going to be pretty when too many small packets are in 
flight...

[/PEDANT_MODE]
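
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same calculation; the function names are made up, and the defaults (14 bytes accounted by the shaper, 38 bytes really used on the wire) are the figures from the text.

```python
def wire_rate_mbps(shaper_mbps, ip_len, accounted_overhead=14, real_overhead=38):
    """Gross link rate produced when the shaper under-counts per-packet overhead."""
    return shaper_mbps * (ip_len + real_overhead) / (ip_len + accounted_overhead)


def break_even_ip_len(shaper_mbps, link_mbps, accounted_overhead=14, real_overhead=38):
    """IP packet size at which the gross rate exactly equals the link rate."""
    # Solve shaper * (L + real) = link * (L + accounted) for L.
    return (shaper_mbps * real_overhead - link_mbps * accounted_overhead) / (
        link_mbps - shaper_mbps
    )


full_size = wire_rate_mbps(900, 1500)      # gross rate for full-size packets
threshold = break_even_ip_len(900, 1000)   # packet size where 900 Mbps fills 1 Gbps
```

Any IP packet smaller than the break-even size pushes the gross rate above what the link can carry, which is where queueing would move out of cake and into the NIC.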


> 
> This seems fine from a performance point of view (not that high sirq%, around 
> 35%) and does seem to limit my upstream traffic correctly. Not sure it helps 
> though, at these speeds the bufferbloat problem is not that obvious and easy 
> to test over the Internet :)

Mmmh, how did you measure the sirq percentage? Some top versions show 
overall percentage with 100% meaning all CPUs, so 35% in a quadcore could mean 
1 fully maxed out CPU (25%) plus an additional 10% spread over the other three, 
or something more benign. Better top (so not busybox's) or htop versions also 
can show the load per CPU which is helpful to pinpoint hotspots...
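
If no per-CPU top is at hand, the per-core softirq share can be derived from two /proc/stat samples. The sketch below assumes the usual field order of the per-CPU lines (user, nice, system, idle, iowait, irq, softirq, steal, ...); the sample texts are invented to show how one pegged core hides behind a 50% average.

```python
def parse_cpu_lines(stat_text):
    """Map 'cpuN' -> list of jiffy counters from /proc/stat-style text."""
    cpus = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like 'cpu0 ...'; skip the aggregate 'cpu' line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus


def softirq_percent(before, after):
    """Per-CPU softirq share (0-100) between two /proc/stat samples."""
    first, second = parse_cpu_lines(before), parse_cpu_lines(after)
    result = {}
    for cpu, old in first.items():
        if cpu not in second:
            continue
        deltas = [new - prev for prev, new in zip(old, second[cpu])]
        total = sum(deltas[:8])   # user..steal; guest time is already in user
        result[cpu] = 100.0 * deltas[6] / total if total else 0.0
    return result


# Invented samples: cpu0 does ordinary work, cpu1 spends the interval in softirq.
SAMPLE_BEFORE = """cpu  200 0 200 1600 0 0 0 0 0 0
cpu0 100 0 100 800 0 0 0 0 0 0
cpu1 100 0 100 800 0 0 0 0 0 0
"""
SAMPLE_AFTER = """cpu  210 0 210 1680 0 0 100 0 0 0
cpu0 110 0 110 880 0 0 0 0 0 0
cpu1 100 0 100 800 0 0 100 0 0 0
"""
shares = softirq_percent(SAMPLE_BEFORE, SAMPLE_AFTER)
```

On a real router one would read /proc/stat twice, about a second apart, and pass the two texts to softirq_percent().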

Best Regards
Sebastian

> 
> root@OpenWrt:~# tc -s qdisc
> qdisc noqueue 0: dev lo root refcnt 2
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 
> triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
> Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 179)
> backlog 0b 0p requeues 179
> memory used: 2751976b of 15140Kb
> capacity estimate: 900Mbit
> min/max network layer size:   42 /1514
> min/max overhead-adjusted size:   42 /1514
> average network hdr offset:   14
> 
>   Bulk  Best EffortVoice
>  thresh  56250Kbit  900Mbit  225Mbit
>  target  5.0ms5.0ms5.0ms
>  interval  100.0ms  100.0ms  100.0ms
>  pk_delay  0us 22us232us
>  av_delay  0us  6us  7us
>  sp_delay  0us  4us  5us
>  backlog0b   0b   0b
>  pkts0   959747   90
>  bytes   0   93543739440
>  way_inds0229640
>  way_miss0  2752
>  way_cols000
>  drops   0  1340
>  marks   000
>  ack_drop000
>  sp_flows031
>  bk_flows010
>  un_flows000
>  max_len 068130 3714
>  quantum  1514 1514 1514
> 
> 
> -- 
> Mikael Abrahamsson    email: swm...@swm.pp.se



Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat
Mikael Abrahamsson  writes:

> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>
>> And what about when you're running CAKE in 'unlimited' mode?
>
> I tried this:
>
> # tc qdisc add dev eth0 root cake bandwidth 900mbit

So the difference from before is just the lack of inbound shaping, or?

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Toke Høiland-Jørgensen via Bloat
Mikael Abrahamsson  writes:

> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>
>> Yup, the number of cores is only going to go up, so for CAKE to stay 
>> relevant it'll need to be able to take advantage of this eventually :)
>
> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform, 
> it has a quad core machine with 2 x 2.5GbE NICs.
>
> When using something like this for routing with HTB+CAKE for bidirectional 
> shaping below line rate, what would be the main things that would need to 
> be improved?

The aforementioned multi-processor support...

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Luca Muscariello
On Thu, Sep 3, 2020 at 3:19 PM Mikael Abrahamsson via Bloat
 wrote:
>
> On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:
>
> > Yup, the number of cores is only going to go up, so for CAKE to stay
> > relevant it'll need to be able to take advantage of this eventually :)
>
> https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform,
> it has a quad core machine with 2 x 2.5GbE NICs.
>
> When using something like this for routing with HTB+CAKE for bidirectional
> shaping below line rate, what would be the main things that would need to
> be improved?

IMO, hardware offloading for shaping, beyond this specific platform.
I don't know whether there is any roadmap with that objective.

>
> --
> Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Mikael Abrahamsson via Bloat

On Tue, 1 Sep 2020, Toke Høiland-Jørgensen wrote:

> Yup, the number of cores is only going to go up, so for CAKE to stay 
> relevant it'll need to be able to take advantage of this eventually :)


https://www.hardkernel.com/shop/odroid-h2plus/ is an interesting platform, 
it has a quad core machine with 2 x 2.5GbE NICs.


When using something like this for routing with HTB+CAKE for bidirectional 
shaping below line rate, what would be the main things that would need to 
be improved?


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-09-03 Thread Mikael Abrahamsson via Bloat

On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:


> And what about when you're running CAKE in 'unlimited' mode?


I tried this:

# tc qdisc add dev eth0 root cake bandwidth 900mbit

This seems fine from a performance point of view (not that high sirq%, 
around 35%) and does seem to limit my upstream traffic correctly. Not sure 
it helps though, at these speeds the bufferbloat problem is not that 
obvious and easy to test over the Internet :)


root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8034: dev eth0 root refcnt 9 bandwidth 900Mbit diffserv3 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0
 Sent 772001 bytes 959703 pkt (dropped 134, overlimits 221223 requeues 
179)

 backlog 0b 0p requeues 179
 memory used: 2751976b of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:   42 /1514
 min/max overhead-adjusted size:   42 /1514
 average network hdr offset:   14

   Bulk  Best EffortVoice
  thresh  56250Kbit  900Mbit  225Mbit
  target  5.0ms5.0ms5.0ms
  interval  100.0ms  100.0ms  100.0ms
  pk_delay  0us 22us232us
  av_delay  0us  6us  7us
  sp_delay  0us  4us  5us
  backlog0b   0b   0b
  pkts0   959747   90
  bytes   0   93543739440
  way_inds0229640
  way_miss0  2752
  way_cols000
  drops   0  1340
  marks   000
  ack_drop000
  sp_flows031
  bk_flows010
  un_flows000
  max_len 068130 3714
  quantum  1514 1514 1514


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-09-02 Thread Toke Høiland-Jørgensen via Bloat
Jonathan Foulkes  writes:

>> Right, so some benefit might be possible here. Does the NIC have
>> multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell
>> you)?
>
> Here is the output of:
> /sys/devices/virtual/net/eth0.2/queues# ls
> rx-0  tx-0
> /sys/devices/virtual/net/eth0.2/queues/rx-0# cat rps_cpus 
> 0
>
> /sys/devices/virtual/net/eth0.2/queues/tx-0# cat xps_cpus 
> 0

Hmm, so no multiq support on this driver, it looks like. So not sure to
what extent it will be possible to effectively utilise both cores on
this box, sadly :/
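
For checking several boxes, the same inspection can be scripted. A minimal sketch, assuming the standard sysfs layout under /sys/class/net/<iface>/queues; the function names are made up, and a missing or unreadable mask file is reported as None rather than treated as an error.

```python
import os


def list_queues(iface, sysfs_root="/sys/class/net"):
    """Return {queue_name: cpu_mask_or_None} for every rx-N/tx-N queue of iface."""
    queue_dir = os.path.join(sysfs_root, iface, "queues")
    info = {}
    for name in sorted(os.listdir(queue_dir)):
        # rx-N queues expose rps_cpus, tx-N queues expose xps_cpus.
        mask_file = "rps_cpus" if name.startswith("rx-") else "xps_cpus"
        try:
            with open(os.path.join(queue_dir, name, mask_file)) as fh:
                info[name] = fh.read().strip()
        except OSError:
            info[name] = None   # file missing or not readable on this driver
    return info


def has_multiqueue(queues):
    """True if the interface exposes more than one transmit queue."""
    return sum(1 for name in queues if name.startswith("tx-")) > 1


# Example (needs a real interface, so it is left as a comment):
#   queues = list_queues("eth0")
#   print(queues, has_multiqueue(queues))
```

A single rx-0/tx-0 pair, as in the output quoted above, means has_multiqueue() returns False.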

>> Yup, the number of cores is only going to go up, so for CAKE to stay
>> relevant it'll need to be able to take advantage of this eventually :)
>
> True, the mid-range market is already there, and so soon will be the
> lower-end. And with ISPs lighting up more and more capacity, the
> demand will be there to be able to shape higher and higher rates.
>
> But I agree with Jonathan Morton that once every device has sufficient
> capacity, more makes no difference. I went from 100/15 to 300/24 and
> never noticed the difference.
>
> Hell, there are days I switch to my backup 10/0.7 DSL line for a test,
> and forget to switch back, and will work for hours and not notice I’m
> not on the 300Mbps line ;-)

Heh, if you can live with a 10/0.7 line without noticing I think you're
more patient than me ;) But still, fair point; doesn't mean that people
won't still *want* to run at higher speeds, though... :)

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-02 Thread Jonathan Foulkes
> Right, so some benefit might be possible here. Does the NIC have
> multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell
> you)?

Here is the output of:
/sys/devices/virtual/net/eth0.2/queues# ls
rx-0  tx-0
/sys/devices/virtual/net/eth0.2/queues/rx-0# cat rps_cpus 
0

/sys/devices/virtual/net/eth0.2/queues/tx-0# cat xps_cpus 
0

> Yup, the number of cores is only going to go up, so for CAKE to stay
> relevant it'll need to be able to take advantage of this eventually :)

True, the mid-range market is already there, and so soon will be the lower-end.
And with ISPs lighting up more and more capacity, the demand will be there to 
be able to shape higher and higher rates.

But I agree with Jonathan Morton that once every device has sufficient capacity, 
more makes no difference. 
I went from 100/15 to 300/24 and never noticed the difference.

Hell, there are days I switch to my backup 10/0.7 DSL line for a test, and 
forget to switch back, and will work for hours and not notice I’m not on the 
300Mbps line ;-)

Cheers,

Jonathan

> On Sep 1, 2020, at 5:11 PM, Toke Høiland-Jørgensen  wrote:
> 
> Jonathan Foulkes  writes:
> 
>> Thanks Toke, we currently are on an MT7621a @880, so a dual-core.
> 
> Right, so some benefit might be possible here. Does the NIC have
> multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell
> you)?
> 
>> And we are looking for a good quad-core platform that will support
>> 600Mbps or more with Cake enabled, hopefully with AX radios as well.
> 
> Yup, the number of cores is only going to go up, so for CAKE to stay
> relevant it'll need to be able to take advantage of this eventually :)
> 
> -Toke



Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Toke Høiland-Jørgensen via Bloat
Jonathan Foulkes  writes:

> Thanks Toke, we currently are on an MT7621a @880, so a dual-core.

Right, so some benefit might be possible here. Does the NIC have
multiple hardware queues (`ls /sys/class/net/$IFACE/queues` should tell
you)?

> And we are looking for a good quad-core platform that will support
> 600Mbps or more with Cake enabled, hopefully with AX radios as well.

Yup, the number of cores is only going to go up, so for CAKE to stay
relevant it'll need to be able to take advantage of this eventually :)

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Toke Høiland-Jørgensen via Bloat
Jonathan Morton  writes:

>> On 1 Sep, 2020, at 9:45 pm, Toke Høiland-Jørgensen via Bloat 
>>  wrote:
>> 
>> CAKE takes the global qdisc lock.
>
> Presumably this is a default mechanism because CAKE doesn't handle any
> locking itself.
>
> Obviously it would need to be replaced with at least a lock over
> CAKE's complete data structures, taking the lock on each entry point
> and releasing it at each return point, and I assume there is a flag we
> can set to indicate we do so. Finer-grained locking might be possible,
> but CAKE is fairly complex so that might be hard to implement. Locking
> per CAKE instance would at least allow running ingress and egress on
> different CPUs.

What you're describing here is basically the existing qdisc root lock.
It is per instance of the qdisc, and it is held only while enqueueing
and dequeueing packets from that qdisc. So it is possible today to run
the ingress and egress instances of CAKE on different CPUs. All you have
to do is schedule the packets to be processed on different CPUs in the
different directions - which usually means messing with RPS settings for
the NIC, and as I remarked to Sebastian, for many OpenWrt SOCs this is
not really supported...
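
The RPS masks themselves are plain hexadecimal CPU bitmasks. As a sketch of how the values written to rps_cpus could be derived, assuming a single rx-0 queue per interface (the interface names below are hypothetical, and nothing is written to sysfs here):

```python
def cpu_mask(cpus):
    """Hex bitmask string for a list of CPU numbers, e.g. [1] -> '2'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def rps_settings(rx_iface_to_cpus):
    """Map each interface's rx-0 rps_cpus path to the mask that would be written."""
    return {
        f"/sys/class/net/{iface}/queues/rx-0/rps_cpus": cpu_mask(cpus)
        for iface, cpus in rx_iface_to_cpus.items()
    }


# Hypothetical layout: packets received on the LAN side are processed on CPU 0,
# packets received on the WAN side on CPU 1.
plan = rps_settings({"eth0.1": [0], "eth0.2": [1]})
```

Whether writing these values has any effect depends on the driver, as noted above.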

To make CAKE truly take advantage of multiple CPUs, there are two
options:

1. Make it aware of multiple hardware queues. To do this, we would need to
   implement the 'attach()' method in the Qdisc_ops struct (see sch_mq
   for an example). The idea here would be to create stub child qdiscs
   with a separate struct Qdisc_ops implementing enqueue() and
   dequeue(). These would be called separately for each hardware queue,
   with their separate locks held at the time; and with proper XPS
   steering, each hardware queue can be serviced by a separate CPU.

2. Set the TCQ_F_NOLOCK in the qdisc flags; this will cause the existing
   enqueue() and dequeue() functions to be called without the root lock
   being held, and the qdisc is responsible for dealing with that
   itself.

Of course in either case, the trick is to get the CAKE data structures
to play nice with concurrent access from multiple CPUs. For option 1.
above, we could just duplicate all the flow queues for each netdev queue
and take the hit in wasted space - or we could partition the data
structure, either statically at init, or dynamically as each flow
becomes active. But at a minimum there would need to be some way for the
shaper to enforce the maximum rate. Maybe a granular lock or an atomic
is good enough for this, though?
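
To make the "granular lock" idea concrete, here is a user-space sketch in Python, not kernel code: each per-queue instance has its own lock and packet queue, and only the rate budget is shared behind one small lock (standing in for an atomic). All names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
import threading
from collections import deque


class SharedRateBudget:
    """Global token bucket guarded by one small lock."""

    def __init__(self, rate_bps, burst_bytes):
        self._lock = threading.Lock()
        self.rate_bps = rate_bps
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def try_consume(self, now, nbytes):
        with self._lock:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bps / 8)
            self.last = max(self.last, now)
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return True
            return False


class PerQueueState:
    """State for one hardware queue; its lock covers only its own packets."""

    def __init__(self, budget):
        self.lock = threading.Lock()
        self.packets = deque()
        self.budget = budget

    def enqueue(self, nbytes):
        with self.lock:
            self.packets.append(nbytes)

    def dequeue(self, now):
        with self.lock:
            if self.packets and self.budget.try_consume(now, self.packets[0]):
                return self.packets.popleft()
            return None


# 8000 bit/s is 1000 bytes/s; the burst allows exactly one 1500-byte packet.
budget = SharedRateBudget(rate_bps=8000, burst_bytes=1500)
q0, q1 = PerQueueState(budget), PerQueueState(budget)
q0.enqueue(1500)
q1.enqueue(1500)
results = [q0.dequeue(0.0), q1.dequeue(0.0), q1.dequeue(1.5)]
```

The two queues never contend on each other's packet lists; the only shared critical section is the few arithmetic operations in try_consume(), which is the part an atomic might replace.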

Note also that for 2. there's an ongoing issue[0] with packets getting
stuck which is still unresolved, as far as I can tell - so not sure if
this is the right way to go. However, apart from this, the benefit of 2.
is that CAKE could *potentially* process packets on multiple CPUs
without relying on hardware multi-Q. I'm not quite sure if the stack
will actually process packets on more than one CPU without them,
though.

Either way, I suppose some experimentation would be needed to find the
best solution.

-Toke

[0] 
https://lore.kernel.org/netdev/CACS=qq+a0H=e8ylfu95ae7hr0bq9ytcbbn2rfx82ojnppkb...@mail.gmail.com/


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Jonathan Morton
> On 1 Sep, 2020, at 11:04 pm, Sebastian Moeller  wrote:
> 
>> The challenge is the end users, who only understand the silly ‘speed’ 
>> metric, and feel anything that lowers that number is a ‘bad’ thing. It takes 
>> effort to get even technical users to get it.
> 
>   I repeatedly fall into that trap...

For a lot of users, I rather suspect that setting 40/10 Mbps would give them 
entirely sufficient speed, and most existing CPE would be able to keep up with 
those settings even with all of Cake's bells and whistles turned on.

The trouble is that that might be 10% of what the cable company is advertising 
to them.

 - Jonathan Morton


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Sebastian Moeller
Hi Jonathan,



> On Sep 1, 2020, at 21:31, Jonathan Foulkes  wrote:
> 
> Hi Sebastian, Cake functions wonderfully, it’s a marvel in terms of goodput.
> 
> My comment was more oriented at the metrics process users use to evaluate 
> results. Only those who spend time analyzing just how busy an ‘idle’ network 
> can be know that there are a lot of processes in constant communications with 
> their cloud services. 

True, interestingly, quite a number of speedtests seem to err on the 
side of too high, probably because that way users are happy to see something 
close to their contracted rates...

> The challenge is the end users, who only understand the silly ‘speed’ 
> metric, and feel anything that lowers that number is a ‘bad’ thing. It takes 
> effort to get even technical users to get it.

I repeatedly fall into that trap...

> But even beyond the basic, the further cuts induced by fairness is the new 
> wrinkle in dealing with widely varying speed test results with isolation 
> enabled on a busy network.

Yes, but one can try to make lemonade out of it, by running speedtests 
from two devices while observing something like "sudo mtr -ezb4 -i 0.3 8.8.8.8" 
not budging much even though the tests come and go, demonstrating the quality of 
the isolation and that low queueing delay can "happen" even on a busy link.

> 
> The high density of devices and constant chatter with cloud services means 
> the average home has way more devices and connections than many realize. Keep 
> a note of the number of ‘active connections’ displayed on the OpenWRT 
> overview page, you might be surprised (well, not you Seb ;) )

Count me in, I just switched over to a Turris Omnia (which I had 
crowd-funded before I realized IQrouters would be delivered to Germany ;) ) and 
while playing with its pakon feature I was quite baffled by how many addresses 
are used even in a short amount of time. (All of this is just a hobby to me, so 
I keep forgetting stuff regularly, because I do approach things a bit casually 
at times).

> 
> As an example, on my network, I average 1,000 active connections all day, it 
> rarely drops below 700. And it’s just two WFH professionals and 60+ network 
> devices, not all of which are active at any one time.
> I actually run some custom firewall rules to de-prioritize four IoT devices 
> that generate a LOT of traffic to their services. Two of which power panel 
> monitors with real-time updates. This is why my bulk tin on egress has such 
> high traffic.

Nice, I think being able to deprioritize stuff is one of the best 
reasons for using diffserv.

> 
> Since you like to see tc output, here’s the one from my system after nearly a 
> week.
> I run four-layer Cake as we do a lot of Zoom calls and our accounts are set 
> up to do the appropriate DSCP marking.

I saw your nice writeup of how to do that on the OpenWrt forum IIRC. 
Need to talk to our IT guys at work, whether they are willing to actually 
configure it in the first place.


> 
> root@IQrouter:~# tc -s qdisc
> qdisc noqueue 0: dev lo root refcnt 2 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0
> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 
> target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
>  Sent 51311363856 bytes 86785488 pkt (dropped 53, overlimits 0 requeues 9114) 
>  backlog 0b 0p requeues 9114
>   maxpacket 12112 drop_overlimit 0 new_flow_count 691740 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc noqueue 0: dev br-lan root refcnt 2 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0
> qdisc noqueue 0: dev eth0.1 root refcnt 2 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0
> qdisc cake 8005: dev eth0.2 root refcnt 2 bandwidth 22478Kbit diffserv4 
> dual-srchost nat nowash ack-filter split-gso rtt 100.0ms raw overhead 0 mpu 
> 64 
>  Sent 6943407136 bytes 35467722 pkt (dropped 51747, overlimits 3912091 
> requeues 0) 
>  backlog 0b 0p requeues 0
>  memory used: 843816b of 4Mb
>  capacity estimate: 22478Kbit
>  min/max network layer size:   42 /1514
>  min/max overhead-adjusted size:   64 /1514
>  average network hdr offset:   14
> 
>                   Bulk  Best Effort        Video        Voice
>  thresh       1404Kbit    22478Kbit    11239Kbit     5619Kbit
>  target         12.9ms        5.0ms        5.0ms        5.0ms
>  interval      107.9ms      100.0ms      100.0ms      100.0ms
>  pk_delay        5.9ms        6.4ms        3.7ms        1.6ms
>  av_delay        426us        445us        124us        188us
>  sp_delay         13us         13us         12us          8us
>  backlog            0b           0b           0b           0b
>  pkts          3984407     30899121       474818       161123
>  bytes       789740113   5883832402    246917562     30556915
>  way_inds        65175

Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Jonathan Foulkes
Hi Sebastian, Cake functions wonderfully, it’s a marvel in terms of goodput.

My comment was more oriented at the metrics process users use to evaluate 
results. Only those who spend time analyzing just how busy an ‘idle’ network 
can be know that there are a lot of processes in constant communications with 
their cloud services. 
The challenge is the end users, who only understand the silly ‘speed’ metric, 
and feel anything that lowers that number is a ‘bad’ thing. It takes effort to 
get even technical users to get it.
But even beyond the basics, the further cuts induced by fairness are the new 
wrinkle in dealing with widely varying speed test results with isolation 
enabled on a busy network.

The high density of devices and constant chatter with cloud services means the 
average home has way more devices and connections than many realize. Keep a 
note of the number of ‘active connections’ displayed on the OpenWRT overview 
page, you might be surprised (well, not you Seb ;) )

As an example, on my network, I average 1,000 active connections all day, it 
rarely drops below 700. And it’s just two WFH professionals and 60+ network 
devices, not all of which are active at any one time.
I actually run some custom firewall rules to de-prioritize four IoT devices 
that generate a LOT of traffic to their services. Two of which power panel 
monitors with real-time updates. This is why my bulk tin on egress has such 
high traffic.

Since you like to see tc output, here’s the one from my system after nearly a 
week.
I run four-layer Cake as we do a lot of Zoom calls and our accounts are set up 
to do the appropriate DSCP marking.

root@IQrouter:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 
target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 51311363856 bytes 86785488 pkt (dropped 53, overlimits 0 requeues 9114) 
 backlog 0b 0p requeues 9114
  maxpacket 12112 drop_overlimit 0 new_flow_count 691740 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc cake 8005: dev eth0.2 root refcnt 2 bandwidth 22478Kbit diffserv4 
dual-srchost nat nowash ack-filter split-gso rtt 100.0ms raw overhead 0 mpu 64 
 Sent 6943407136 bytes 35467722 pkt (dropped 51747, overlimits 3912091 requeues 
0) 
 backlog 0b 0p requeues 0
 memory used: 843816b of 4Mb
 capacity estimate: 22478Kbit
 min/max network layer size:   42 /1514
 min/max overhead-adjusted size:   64 /1514
 average network hdr offset:   14

                   Bulk  Best Effort        Video        Voice
  thresh       1404Kbit    22478Kbit    11239Kbit     5619Kbit
  target         12.9ms        5.0ms        5.0ms        5.0ms
  interval      107.9ms      100.0ms      100.0ms      100.0ms
  pk_delay        5.9ms        6.4ms        3.7ms        1.6ms
  av_delay        426us        445us        124us        188us
  sp_delay         13us         13us         12us          8us
  backlog            0b           0b           0b           0b
  pkts          3984407     30899121       474818       161123
  bytes       789740113   5883832402    246917562     30556915
  way_inds        65175      2580935         1064            5
  way_miss         1427       918529        15960         1120
  way_cols            0            0            0            0
  drops               0         2966          511            7
  marks               0          105            0            0
  ack_drop            0        48263            0            0
  sp_flows            2            4            1            0
  bk_flows            0            0            0            0
  un_flows            0            0            0            0
  max_len          1035        43094         3094          590
  quantum           300          685          342          300

qdisc ingress : dev eth0.2 parent :fff1  
 Sent 43188461026 bytes 67870269 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-guest root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0-1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1-1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0

Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Jonathan Morton
> On 1 Sep, 2020, at 9:45 pm, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> CAKE takes the global qdisc lock.

Presumably this is a default mechanism because CAKE doesn't handle any locking 
itself.

Obviously it would need to be replaced with at least a lock over CAKE's 
complete data structures, taking the lock on each entry point and releasing it 
at each return point, and I assume there is a flag we can set to indicate we do 
so.  Finer-grained locking might be possible, but CAKE is fairly complex so 
that might be hard to implement.  Locking per CAKE instance would at least 
allow running ingress and egress on different CPUs.

Is there an example anywhere on how to do this?

 - Jonathan Morton
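
For illustration only, the per-instance locking idea described above can be sketched in user space. This is a toy Python analogy and not kernel code: `ShaperInstance` and `pump` are hypothetical names, and a real qdisc would use kernel spinlocks rather than `threading.Lock`. It only shows the shape of the scheme: one lock per instance, taken at each entry point and released at each return, so that an ingress and an egress instance never contend with each other.

```python
import threading
from collections import deque


class ShaperInstance:
    """Toy stand-in for one qdisc instance guarded by its own lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # covers all of this instance's state
        self._queue = deque()
        self.enqueued = 0
        self.dequeued = 0

    def enqueue(self, pkt):
        # Lock taken on entry, released on return by the with-block.
        with self._lock:
            self._queue.append(pkt)
            self.enqueued += 1

    def dequeue(self):
        with self._lock:
            if not self._queue:
                return None
            self.dequeued += 1
            return self._queue.popleft()


def pump(instance, n_packets):
    """Push n_packets through one instance, as one CPU might."""
    for i in range(n_packets):
        instance.enqueue(i)
        instance.dequeue()


egress = ShaperInstance("egress")
ingress = ShaperInstance("ingress")
threads = [threading.Thread(target=pump, args=(inst, 10_000))
           for inst in (egress, ingress)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(egress.dequeued, ingress.dequeued)
```

Because each instance owns its lock, the two threads never wait on each other. Python's interpreter lock means this does not demonstrate real parallel speedup, only the locking structure.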
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Jonathan Foulkes
Thanks Toke, we currently are on an MT7621a @880, so a dual-core.
And we are looking for a good quad-core platform that will support 600Mbps or 
more with Cake enabled, hopefully with AX radios as well.

Jonathan

> On Sep 1, 2020, at 12:11 PM, Toke Høiland-Jørgensen  wrote:
> 
> Jonathan Foulkes  writes:
> 
>> Toke, that link returns a 404 for me.
> 
> Ah, seems an extra character snuck in at the end - try this:
> 
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6
> 
>> For others, I’ve found that testing cake throughput with isolation options 
>> enabled is tricky if there are many competing connections. 
>> Like I keep having to tell my customers, fairness algorithms mean no one 
>> device will ever gain 100% of the bandwidth so long as there are other open 
>> & active connections from other devices.
>> 
>> That said, I’d love to find options to increase throughput for
>> single-tin configs.
> 
> Yeah, doing something about this is on my list, one way or another. Not
> sure how much more we can do in terms of overhead, so we may have to go
> for multi-q (and multi-CPU) support. How many CPU cores does the
> IQrouter have?
> 
> -Toke



Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Toke Høiland-Jørgensen via Bloat
Sebastian Moeller  writes:

> Hi Toke,
>
>
>> On Sep 1, 2020, at 18:11, Toke Høiland-Jørgensen via Bloat 
>>  wrote:
>> 
>> Jonathan Foulkes  writes:
>> 
>>> Toke, that link returns a 404 for me.
>> 
>> Ah, seems an extra character snuck in at the end - try this:
>> 
>> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6
>> 
>>> For others, I’ve found that testing cake throughput with isolation options 
>>> enabled is tricky if there are many competing connections. 
>>> Like I keep having to tell my customers, fairness algorithms mean no one 
>>> device will ever gain 100% of the bandwidth so long as there are other open 
>>> & active connections from other devices.
>>> 
>>> That said, I’d love to find options to increase throughput for
>>> single-tin configs.
>> 
>> Yeah, doing something about this is on my list, one way or another. Not
>> sure how much more we can do in terms of overhead, so we may have to go
>> for multi-q (and multi-CPU) support. How many CPU cores does the
>> IQrouter have?
>
>   It might be worth looking at how the typical two cake instances
>   distribute across the available CPUs; in some versions of OpenWrt
>   all of cake's and the ethernet interrupt processing crowded onto a
>   single CPU, leading to "out of CPU" behaviour with 50% idle
>   remaining... I think that using a different RPS scheme might
>   work better.

Well, many home routers don't have any functional RPS at all. Also, it
doesn't help since CAKE takes the global qdisc lock. Both of those
issues should be fixed, ideally :)

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Sebastian Moeller
Hi Toke,


> On Sep 1, 2020, at 18:11, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> Jonathan Foulkes  writes:
> 
>> Toke, that link returns a 404 for me.
> 
> Ah, seems an extra character snuck in at the end - try this:
> 
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6
> 
>> For others, I’ve found that testing cake throughput with isolation options 
>> enabled is tricky if there are many competing connections. 
>> Like I keep having to tell my customers, fairness algorithms mean no one 
>> device will ever gain 100% of the bandwidth so long as there are other open 
>> & active connections from other devices.
>> 
>> That said, I’d love to find options to increase throughput for
>> single-tin configs.
> 
> Yeah, doing something about this is on my list, one way or another. Not
> sure how much more we can do in terms of overhead, so we may have to go
> for multi-q (and multi-CPU) support. How many CPU cores does the
> IQrouter have?

It might be worth looking at how the typical two cake instances distribute 
across the available CPUs; in some versions of OpenWrt all of cake's and the 
ethernet interrupt processing crowded onto a single CPU, leading to "out of CPU" 
behaviour with 50% idle remaining... I think that using a different RPS scheme 
might work better.
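
One way to spot this kind of single-CPU pile-up is to compare two snapshots of /proc/stat and look at the softirq share per core instead of the overall average. The following is a minimal sketch under stated assumptions: the column order (user, nice, system, idle, iowait, irq, softirq, steal) is the usual Linux layout, the helper names are invented here, and the two snapshots are made-up numbers rather than data from any router in this thread.

```python
SOFTIRQ = 6  # column index: user nice system idle iowait irq softirq steal ...


def parse_proc_stat(text):
    """Map 'cpuN' to its list of jiffies counters from /proc/stat-style text."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line; keep only per-core "cpu0", "cpu1", ...
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus


def softirq_share(before, after):
    """Percentage of each core's elapsed time spent in softirq."""
    shares = {}
    for cpu, new in after.items():
        delta = [n - o for n, o in zip(new, before[cpu])]
        total = sum(delta[:8])  # guest time is already counted inside user
        shares[cpu] = 100.0 * delta[SOFTIRQ] / total if total else 0.0
    return shares


# Hypothetical snapshots of a dual-core box taken a moment apart.
SNAP_A = """cpu  200 0 200 1600 0 0 0 0 0 0
cpu0 100 0 100 800 0 0 0 0 0 0
cpu1 100 0 100 800 0 0 0 0 0 0"""
SNAP_B = """cpu  210 0 210 1680 0 0 100 0 0 0
cpu0 110 0 110 880 0 0 0 0 0 0
cpu1 100 0 100 800 0 0 100 0 0 0"""

shares = softirq_share(parse_proc_stat(SNAP_A), parse_proc_stat(SNAP_B))
for cpu, pct in sorted(shares.items()):
    print(f"{cpu}: {pct:.1f}% softirq")
```

In this made-up sample one core is fully busy with softirq work while the other does none, so a tool that reports only the machine-wide average would show 50% and still look like there is headroom. On a real router the two snapshots would come from reading /proc/stat twice, a second or so apart.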

Best Regards
Sebastian

> 
> -Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Sebastian Moeller
Hi Jonathan,

> On Sep 1, 2020, at 17:41, Jonathan Foulkes  wrote:
> 
> Toke, that link returns a 404 for me.
> 
> For others, I’ve found that testing cake throughput with isolation options 
> enabled is tricky if there are many competing connections. 

Are you talking about the fact that with competing connections, you 
only see the current isolation quantum's equivalent of the actual rate? In that 
case maybe parse the "tc -s qdisc" output to get an idea how much data/packets 
cake managed to push through in total in each direction instead of relying on 
the measured goodput? I am probably barking up the wrong tree here...
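
A minimal sketch of that idea, assuming the "Sent N bytes M pkt" line format seen in the tc output elsewhere in this thread: take the counters per qdisc and device, sample twice, and turn the byte delta into a rate. The function names are invented for this example, and the second byte count and the 10 second interval are hypothetical.

```python
import re

QDISC_RE = re.compile(r"^qdisc (\S+) \S+ dev (\S+)")
SENT_RE = re.compile(r"Sent (\d+) bytes (\d+) pkt")


def sent_counters(tc_output):
    """Return {(qdisc_kind, dev): (bytes, packets)} from `tc -s qdisc` text."""
    counters = {}
    current = None
    for line in tc_output.splitlines():
        line = line.strip()
        match = QDISC_RE.match(line)
        if match:
            current = (match.group(1), match.group(2))
            continue
        match = SENT_RE.search(line)
        if match and current is not None:
            counters[current] = (int(match.group(1)), int(match.group(2)))
            current = None  # only the first Sent line belongs to this qdisc
    return counters


def rate_mbit(bytes_before, bytes_after, seconds):
    """Average rate in Mbit/s between two byte-counter samples."""
    return (bytes_after - bytes_before) * 8 / seconds / 1e6


# First sample: trimmed from the tc output posted in this thread.
SAMPLE = """qdisc cake 8030: dev eth0 root refcnt 9 bandwidth 900Mbit besteffort
 Sent 4346128 bytes 11681 pkt (dropped 0, overlimits 1004 requeues 17)
 backlog 0b 0p requeues 17
qdisc cake 8031: dev ifb4eth0 root refcnt 2 bandwidth 900Mbit besteffort
 Sent 4946683 bytes 10592 pkt (dropped 0, overlimits 492 requeues 0)
"""

counters = sent_counters(SAMPLE)
first_bytes = counters[("cake", "eth0")][0]
# Hypothetical second sample taken 10 seconds later.
second_bytes = first_bytes + 1_125_000_000
print(rate_mbit(first_bytes, second_bytes, 10.0))
```

This counts everything cake pushed through an interface, so it reflects the aggregate rate across all competing flows, which a single speed test's goodput does not.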

> Like I keep having to tell my customers, fairness algorithms mean no one 
> device will ever gain 100% of the bandwidth so long as there are other open & 
> active connections from other devices.

That sounds like solid advice ;) Especially in the light of the 
exceedingly useful "ingress" keyword, which, under load, will drop depending on a 
flow's "unresponsiveness" such that more responsive flows end up getting a 
somewhat bigger share of the post-cake throughput...

> 
> That said, I’d love to find options to increase throughput for single-tin 
> configs.

With or without isolation options?

Best Regards
Sebastian

> 
> Cheers,
> 
> Jonathan
> 
>> On Aug 31, 2020, at 7:35 AM, Toke Høiland-Jørgensen via Bloat 
>>  wrote:
>> 
>> Mikael Abrahamsson via Bloat  writes:
>> 
>>> Hi,
>>> 
>>> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential 
>>> router, from my previous WRT1200AC (marvell armada 385).
>>> 
>>> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3 
>>> on the APU2.
>>> 
>>> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much 
>>> CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000 
>>> and tried it again, and even the APU2 can only do CAKE up to ~300 
>>> megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in 
>>> OpenWrt).
>>> 
>>> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I 
>>> see when CPU based forwarding is maxed out. From my recollection of 
>>> running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE 
>>> using more CPU than FQ_CODEL.
>>> 
>>> Anyone know what's up? I'm fine running FQ_CODEL, it solves any 
>>> bufferbloat but... I thought CAKE supposedly should use less CPU, not 
>>> more?
>> 
>> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that
>> would be FQ-CoDel+HTB)? An exact config might be useful (or just the
>> output of tc -s qdisc).
>> 
>> If you are indeed not shaping, maybe you're hitting the issue fixed by this 
>> commit?
>> 
>> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n
>> 
>> -Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Toke Høiland-Jørgensen via Bloat
Jonathan Foulkes  writes:

> Toke, that link returns a 404 for me.

Ah, seems an extra character snuck in at the end - try this:

https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6

> For others, I’ve found that testing cake throughput with isolation options 
> enabled is tricky if there are many competing connections. 
> Like I keep having to tell my customers, fairness algorithms mean no one 
> device will ever gain 100% of the bandwidth so long as there are other open & 
> active connections from other devices.
>
> That said, I’d love to find options to increase throughput for
> single-tin configs.

Yeah, doing something about this is on my list, one way or another. Not
sure how much more we can do in terms of overhead, so we may have to go
for multi-q (and multi-CPU) support. How many CPU cores does the
IQrouter have?

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-09-01 Thread Jonathan Foulkes
Toke, that link returns a 404 for me.

For others, I’ve found that testing cake throughput with isolation options 
enabled is tricky if there are many competing connections. 
Like I keep having to tell my customers, fairness algorithms mean no one device 
will ever gain 100% of the bandwidth so long as there are other open & active 
connections from other devices.

That said, I’d love to find options to increase throughput for single-tin 
configs.

Cheers,

Jonathan

> On Aug 31, 2020, at 7:35 AM, Toke Høiland-Jørgensen via Bloat 
>  wrote:
> 
> Mikael Abrahamsson via Bloat  writes:
> 
>> Hi,
>> 
>> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential 
>> router, from my previous WRT1200AC (marvell armada 385).
>> 
>> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3 
>> on the APU2.
>> 
>> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much 
>> CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000 
>> and tried it again, and even the APU2 can only do CAKE up to ~300 
>> megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in 
>> OpenWrt).
>> 
>> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I 
>> see when CPU based forwarding is maxed out. From my recollection of 
>> running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE 
>> using more CPU than FQ_CODEL.
>> 
>> Anyone know what's up? I'm fine running FQ_CODEL, it solves any 
>> bufferbloat but... I thought CAKE supposedly should use less CPU, not 
>> more?
> 
> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that
> would be FQ-CoDel+HTB)? An exact config might be useful (or just the
> output of tc -s qdisc).
> 
> If you are indeed not shaping, maybe you're hitting the issue fixed by this 
> commit?
> 
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n
> 
> -Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-08-31 Thread Toke Høiland-Jørgensen via Bloat
Mikael Abrahamsson  writes:

> On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:
>
>> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that
>> would be FQ-CoDel+HTB)? An exact config might be useful (or just the
>> output of tc -s qdisc).
>
> Yeah, I guess I'm also using HTB to get the 900 megabit/s SQM is looking 
> for.

Ah, right, makes more sense :)

> If I only use FQ_CODEL to get interface speeds my performance is fine.

And what about when you're running CAKE in 'unlimited' mode?

>> If you are indeed not shaping, maybe you're hitting the issue fixed by this 
>> commit?
>>
>> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n
>
> I enabled it just now to get the config.
>
> qdisc cake 8030: dev eth0 root refcnt 9 bandwidth 900Mbit besteffort 
> triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
> overhead 0

Hmm, right, you could try no-split-gso as an option as well; you're
pretty close to the point where we turn it off by default, and you're
getting pretty large packets (max_len), so your performance may be
suffering from the splitting...
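
To put a rough number on the splitting cost, here is a back-of-the-envelope sketch. It assumes every packet is either a full 1514-byte frame or a 22710-byte aggregate, the `max_len` reported in the tc output posted in this thread, so it gives an upper bound on the ratio and not a measurement.

```python
def packets_per_second(rate_bit_s, packet_bytes):
    """Packets per second needed to fill a link with packets of one size."""
    return rate_bit_s / (packet_bytes * 8)


RATE_BIT_S = 900e6   # shaper rate from the cake configuration
FRAME_BYTES = 1514   # full-size Ethernet frame after GSO splitting
GSO_BYTES = 22710    # largest aggregate seen (max_len) with no splitting

split = packets_per_second(RATE_BIT_S, FRAME_BYTES)
unsplit = packets_per_second(RATE_BIT_S, GSO_BYTES)
print(f"split-gso:    ~{split:,.0f} packets/s")
print(f"no-split-gso: ~{unsplit:,.0f} packets/s")
print(f"ratio:        {split / unsplit:.0f}x")
```

Since 22710 is exactly 15 x 1514, the qdisc may have to handle up to 15 times as many packets per second with splitting enabled, which is at least consistent with the extra CPU load being discussed.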

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-08-31 Thread Mikael Abrahamsson via Bloat

On Mon, 31 Aug 2020, Toke Høiland-Jørgensen wrote:

> Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that
> would be FQ-CoDel+HTB)? An exact config might be useful (or just the
> output of tc -s qdisc).

Yeah, I guess I'm also using HTB to get the 900 megabit/s SQM is looking 
for.

If I only use FQ_CODEL to get interface speeds my performance is fine.

> If you are indeed not shaping, maybe you're hitting the issue fixed by this 
> commit?
>
> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n

I enabled it just now to get the config.

qdisc cake 8030: dev eth0 root refcnt 9 bandwidth 900Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0

 Sent 4346128 bytes 11681 pkt (dropped 0, overlimits 1004 requeues 17)
 backlog 0b 0p requeues 17
 memory used: 33328b of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:   42 /1514
 min/max overhead-adjusted size:   42 /1514
 average network hdr offset:   14

                  Tin 0
  thresh        900Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         18us
  av_delay          6us
  sp_delay          4us
  backlog            0b
  pkts            11681
  bytes         4346128
  way_inds           30
  way_miss          735
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            3
  bk_flows            1
  un_flows            0
  max_len         22710
  quantum          1514

qdisc ingress : dev eth0 parent :fff1 
 Sent 4716199 bytes 10592 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

...

qdisc cake 8031: dev ifb4eth0 root refcnt 2 bandwidth 900Mbit besteffort 
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw 
overhead 0

 Sent 4946683 bytes 10592 pkt (dropped 0, overlimits 492 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 35Kb of 15140Kb
 capacity estimate: 900Mbit
 min/max network layer size:   60 /1514
 min/max overhead-adjusted size:   60 /1514
 average network hdr offset:   14

                  Tin 0
  thresh        900Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         19us
  av_delay          6us
  sp_delay          4us
  backlog            0b
  pkts            10592
  bytes         4946683
  way_inds           33
  way_miss          969
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            2
  bk_flows            1
  un_flows            0
  max_len         21196
  quantum          1514


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: [Bloat] CAKE in openwrt high CPU

2020-08-31 Thread Toke Høiland-Jørgensen via Bloat
Mikael Abrahamsson via Bloat  writes:

> Hi,
>
> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential 
> router, from my previous WRT1200AC (marvell armada 385).
>
> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3 
> on the APU2.
>
> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much 
> CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000 
> and tried it again, and even the APU2 can only do CAKE up to ~300 
> megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in 
> OpenWrt).
>
> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I 
> see when CPU based forwarding is maxed out. From my recollection of 
> running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE 
> using more CPU than FQ_CODEL.
>
> Anyone know what's up? I'm fine running FQ_CODEL, it solves any 
> bufferbloat but... I thought CAKE supposedly should use less CPU, not 
> more?

Hmm, you say CAKE and FQ-Codel - so you're not enabling the shaper (that
would be FQ-CoDel+HTB)? An exact config might be useful (or just the
output of tc -s qdisc).

If you are indeed not shaping, maybe you're hitting the issue fixed by this 
commit?

https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n

-Toke


Re: [Bloat] CAKE in openwrt high CPU

2020-08-30 Thread Dave Taht
Cake reschedules too much compared to the tweaks we have to keep htb
fed, at these rates.

It was kind of my hope to gain a hw assist in future versions of the
APU series. A programmable completion interrupt is available in some
versions of that chipset.

On Sun, Aug 30, 2020 at 10:27 AM Mikael Abrahamsson via Bloat
 wrote:
>
>
> Hi,
>
> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as residential
> router, from my previous WRT1200AC (marvell armada 385).
>
> I was running OpenWrt 18.06 on that one, now I am running latest 19.07.3
> on the APU2.
>
> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much
> CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000
> and tried it again, and even the APU2 can only do CAKE up to ~300
> megabit/s. With FQ_CODEL I get full speed (configure 900/900 in SQM in
> OpenWrt).
>
> Looking in top, I see sirq% sitting at 50% pegged. This is typical what I
> see when CPU based forwarding is maxed out. From my recollection of
> running CAKE on earlier versions of openwrt (17.x) I don't remember CAKE
> using more CPU than FQ_CODEL.
>
> Anyone know what's up? I'm fine running FQ_CODEL, it solves any
> bufferbloat but... I thought CAKE supposedly should use less CPU, not
> more?
>
> --
> Mikael Abrahamsson    email: swm...@swm.pp.se



-- 
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

d...@taht.net  CTO, TekLibre, LLC Tel: 1-831-435-0729