Re: [Cake] Fwd: net-next is OPEN...

2018-04-16 Thread Toke Høiland-Jørgensen
Dave Taht  writes:

> do we consider cake ready this time?

I'm not aware of anything outstanding, at least...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

 your solution significantly hurts performance in the common case
>>> 
>>> I'm sorry - did someone actually describe such a case?  I must have
>>> missed it.
>> 
>> I started this whole thread by pointing out that this behaviour results
>> in the delay of the TCP flows scaling with the number of active flows;
>> and that for 32 active flows (on a 10Mbps link), this results in the
>> latency being three times higher than for FQ-CoDel on the same link.
>
> Okay, so intra-flow latency is impaired for bulk flows sharing a
> relatively low-bandwidth link. That's a metric which few people even
> know how to measure for bulk flows, though it is of course important
> for sparse flows. I was hoping you had a common use-case where
> *sparse* flow latency was impacted, in which case we could actually
> discuss it properly.
>
> But *inter-flow* latency is not impaired, is it? Nor intra-sparse-flow
> latency? Nor packet loss, which people often do measure (or at least
> talk about measuring) - quite the opposite? Nor goodput, which people
> *definitely* measure and notice, and is influenced more strongly by
> packet loss when in ingress mode?

As I said, I'll run more tests and post more data once I have time.

> The measurement you took had a baseline latency in the region of 60ms.

The baseline link latency is 50 ms; which is sorta what you'd expect
from a median non-CDN'en internet connection.

> That's high enough for a couple of packets per flow to be in flight
> independently of the bottleneck queue.

Yes. As is the case for most flows going over the public internet...

> I would take this argument more seriously if a use-case that mattered
> was identified.

Use cases where intra-flow latency matters, off the top of my head:

- Real-time video with congestion response
- Multiple connections multiplexed over a single flow (HTTP/2 or
  QUIC-style)
- Anything that behaves more sanely than TCP at really low bandwidths.

But yeah, you're right, no one uses any of those... /s

> So far, I can't even see a coherent argument for making this tweak
> optional (which is of course possible), let alone removing it
> entirely; we only have a single synthetic benchmark which shows one
> obscure metric move in the "wrong" direction, versus a real use-case
> identified by an actual user in which this configuration genuinely
> helps.

And I've been trying to explain why you are the one optimising for
pathological cases at the expense of the common case.

But I don't think we are going to agree based on a theoretical
discussion. So let's just leave this and I'll return with some data once
I've had a chance to run some actual tests of the different use cases.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] Diffserv LLT mode

2018-04-19 Thread Toke Høiland-Jørgensen
Is anyone actually using the LLT diffserv setting? The draft describing
it seems to have expired ages ago:

https://datatracker.ietf.org/doc/draft-you-tsvwg-latency-loss-tradeoff/

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] net-next is OPEN...

2018-04-20 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Apr 18, 2018, at 7:43 AM, Pete Heist  wrote:
>> 
>> I also think I saw this happen at lower bandwidths as well, when the CPU 
>> wasn’t loaded. What I’ll do is re-test on the current version I have at, 
>> say, 50Mbit (or to where load drops substantially), then update to the head 
>> and test again, and let you know...
>> 
>>> On Apr 17, 2018, at 3:52 PM, Jonathan Morton  wrote:
>>> 
 On 16 Apr, 2018, at 11:55 pm, Pete Heist  wrote:
 
 I remember that fairness behavior at low RTTs (< 20ms) needed to be either 
 improved or documented
>>> 
>>> The reason for the behaviour, IIRC, was that throughput dropped below 100% 
>>> when the latency target was reduced too much.  Since then there has been a 
>>> small change which might improve it a little, so a retest would be 
>>> reasonable.
>
> At 50mbit I don’t see nearly as much fairness degradation at low RTTs, 
> although there’s some. Even at 100us, “fairness” is around 1.1 (1.0 being 
> perfectly fair) instead of the 2.x I saw at 500mbit. I do not see much of a 
> difference between the latest code (16d7fed, 2018-04-17) and the previous 
> code I tested (7061401, 2017-12-01), if that info is of use.
>
> RTT: tcp_1up upload Mbps / tcp_12up upload Mbps
>
> 7061401 (2017-12-01):
>
>100us: 23.80 / 25.85
>1ms: 23.89 / 29.46
>10ms: 23.93 / 24.66
>40ms: 23.96 / 24.10
>100ms: 23.97 / 24.12
>
> 16d7fed (2018-04-17):
>
>100us: 23.97 / 26.49
>1ms: 23.89 / 26.27
>10ms: 23.98 / 26.37
>40ms: 23.94 / 24.08
>100ms: 23.97 / 24.12
>
> I can post reports / flent files on request.
>
> So it appears this is CPU related, and not worth exploring further(?)
> and not worth documenting(?) other than that once things have
> stabilized, documenting how Cake degrades under various extreme
> conditions would be informative.

Awesome, thanks for re-testing! :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next v2] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-24 Thread Toke Høiland-Jørgensen
Georgios Amanakis  writes:

> On Tue, Apr 24, 2018 at 11:47 AM, Georgios Amanakis  
> wrote:
>>>
>>> Does anyone know if there is a way to do this so the module/builtin
>>> split doesn't bite us?
>>>
>> #ifdef CONFIG_NF_CONNTRACK ??

That is basically what we're doing. But it looks like there's an
IS_REACHABLE macro which does what we need. Will fix, thanks for
pointing it out :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Testing wanted: Statistics API rework

2018-04-25 Thread Toke Høiland-Jørgensen
Pete Heist <p...@eventide.io> writes:

>> On Apr 24, 2018, at 11:30 PM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>> 
>> If someone has time to test it (especially on mips!), that would be
>> great; I think we still have a few days left of net-next being open, so
>> if it works for you, I'll resubmit tomorrow...
>
> I requested a re-build over in the Ubiquiti EdgeOS forums. Between
> ER-X and ER-L I think that would cover 32-bit mips (little endian) and
> 64-bit mips (big endian). Anything specific to test besides “it
> works"? :)

We need to verify that it will still print statistics and that the
values do not look weird (especially the per-tin 'bytes' counter since
that is a 64bit value).

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH ipruote2-next v4] Add support for cake qdisc

2018-04-25 Thread Toke Høiland-Jørgensen
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort

Cake is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency variation.

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.t...@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
---
Changelog:
v4:
  - Switch stats parsing to use nested netlink attributes
  - Tweaks to JSON stats output keys
  
v3:
  - Remove accidentally included test flag
  
v2:
  - Updated netlink config ABI
  - Remove diffserv-llt mode
  - Various tweaks and clean-ups of stats output

 man/man8/tc-cake.8 | 632 +
 man/man8/tc.8  |   1 +
 tc/Makefile|   1 +
 tc/q_cake.c| 761 +
 4 files changed, 1395 insertions(+)
 create mode 100644 man/man8/tc-cake.8
 create mode 100644 tc/q_cake.c

diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
new file mode 100644
index ..30e41bc9
--- /dev/null
+++ b/man/man8/tc-cake.8
@@ -0,0 +1,632 @@
+.TH CAKE 8 "23 November 2017" "iproute2" "Linux"
+.SH NAME
+CAKE \- Common Applications Kept Enhanced (CAKE)
+.SH SYNOPSIS
+.B tc qdisc ... cake
+.br
+[
+.BR bandwidth
+RATE |
+.BR unlimited*
+|
+.BR autorate_ingress
+]
+.br
+[
+.BR rtt
+TIME |
+.BR datacentre
+|
+.BR lan
+|
+.BR metro
+|
+.BR regional
+|
+.BR internet*
+|
+.BR oceanic
+|
+.BR satellite
+|
+.BR interplanetary
+]
+.br
+[
+.BR besteffort
+|
+.BR diffserv8
+|
+.BR diffserv4
+|
+.BR diffserv3*
+]
+.br
+[
+.BR flowblind
+|
+.BR srchost
+|
+.BR dsthost
+|
+.BR hosts
+|
+.BR flows
+|
+.BR dual-srchost
+|
+.BR dual-dsthost
+|
+.BR triple-isolate*
+]
+.br
+[
+.BR nat
+|
+.BR nonat*
+]
+.br
+[
+.BR wash
+|
+.BR nowash*
+]
+.br
+[
+.BR ack-filter
+|
+.BR ack-filter-aggressive
+|
+.BR no-ack-filter*
+]
+.br
+[
+.BR memlimit
+LIMIT ]
+.br
+[
+.BR ptm
+|
+.BR atm
+|
+.BR noatm*
+]
+.br
+[
+.BR overhead
+N |
+.BR conservative
+|
+.BR raw*
+]
+.br
+[
+.BR mpu
+N ]
+.br
+[
+.BR ingress
+|
+.BR egress*
+]
+.br
+(* marks defaults)
+
+
+.SH DESCRIPTION
+CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline
+which uses both AQM and FQ.  It combines COBALT, which is an AQM algorithm
+combining Codel and BLUE, a shaper which operates in deficit mode, and a 
variant
+of DRR++ for flow isolation.  8-way set-associative hashing is used to 
virtually
+eliminate hash collisions.  Priority queuing is available through a simplified
+diffserv implementation.  Overhead compensation for various encapsulation
+schemes is tightly integrated.
+
+All settings are optional; the default settings are chosen to be sensible in
+most common deployments.  Most people will only need to set the
+.B bandwidth
+parameter to get useful results, but reading the
+.B Overhead Compensation
+and
+.B Round Trip Time
+sections is strongly encouraged.
+
+.SH SHAPER PARAMETERS
+CAKE uses a deficit-mode shaper, which does not exhibit the initial burst
+typical of token-bucket shapers.  It will automatically burst precisely as much
+as required to maintain the configured throughput.  As such, it is very
+straightforward to configure.
+.PP
+.B unlimit

[Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Toke Høiland-Jørgensen
sch_cake targets the home router use case and is intended to squeeze the
most bandwidth and latency out of even the slowest ISP links and routers,
while presenting an API simple enough that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash

CAKE is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Extensive support for DSL framing types.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency
  variation.

A paper describing the design of CAKE is available at
https://arxiv.org/abs/1804.07617

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

CAKE's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

tc -s qdisc show dev eth2
qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 
100.0ms raw overhead 0
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 0b of 500b
 capacity estimate: 100Mbit
 min/max network layer size:65535 /   0
 min/max overhead-adjusted size:65535 /   0
 average network hdr offset:0

   Bulk  Best EffortVoice
  thresh   6250Kbit  100Mbit   25Mbit
  target  5.0ms5.0ms5.0ms
  interval  100.0ms  100.0ms  100.0ms
  pk_delay  0us  0us  0us
  av_delay  0us  0us  0us
  sp_delay  0us  0us  0us
  pkts000
  bytes   000
  way_inds000
  way_miss000
  way_cols000
  drops   000
  marks   000
  ack_drop000
  sp_flows000
  bk_flows000
  un_flows000
  max_len 000
  quantum   300 1514  762

Tested-by: Pete Heist <petehe...@gmail.com>
Tested-by: Georgios Amanakis <gamana...@gmail.com>
Signed-off-by: Dave Taht <dave.t...@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
---
Changelog:

v3:
  - Use IS_REACHABLE() macro to fix compilation when sch_cake is
built-in and conntrack is a module.
  - Switch the stats output to use nested netlink attributes instead
of a versioned struct.
  - Remove GPL boilerplate.
  - Fix array initialisation style.

v2:
  - Fix kbuild test bot complaint
  - Clean up the netlink ABI
  - Fix checkpatch complaints
  - A few tweaks to the behaviour of cake based on testing carried out
while writing the paper.

 include/uapi/linux/pkt_sched.h |  104 ++
 net/sched/Kconfig  |   11 +
 net/sched/Makefile |1 +
 net/sched/sch_cake.c   | 2616 
 4 files changed, 2732 insertions(+)
 create mode 100644 net/sched/sch_cake.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096ae97b..a54474ac8259 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -934,4 +934,108 @@ enum {
 
 #define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
 
+/* CAKE */
+enum {
+   TCA_CAKE_UNSPEC,
+   TCA_CAKE_BASE_RATE,
+   TCA_CAKE_DIFFSERV_MODE,
+   TCA_CAKE_ATM,
+   TCA_CAKE_FLOW_MODE,
+   TCA_CAKE_OVERHEAD,
+   TCA_CAKE_RTT,
+ 

[Cake] Pre-print of Cake paper available

2018-04-23 Thread Toke Høiland-Jørgensen
Last week we submitted an academic paper describing Cake. A pre-print is
now available on arXiv: https://arxiv.org/abs/1804.07617

Comments welcome, of course :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] SKB queue API usage

2018-04-23 Thread Toke Høiland-Jørgensen
I've started looking at what changes are needed before Cake can be
submitted upstream (see the upstream-4.18 branch of the repo).

While looking over the data structures I noticed that struct cake_flow
contains pointers to struct sk_buff instread of using the in-kernel
struct sk_buff_head queueing structure.

Why is that?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Testing variants of the MTU latency scaling

2018-04-23 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> Based on all this, I propose we change the scaling mechanism so that it
>> is only active in egress mode, and change it from 4 MTUs to 2. I'll
>> merge Kevin's patch to do this unless someone complains loudly :)
>
> I suppose that makes enough sense.  But let me look over the patch
> again - there's probably a better way to do it.

Right. Pushed the patch to the cobalt branch, feel free to fix it up :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Pre-print of Cake paper available

2018-04-24 Thread Toke Høiland-Jørgensen
Sebastian Moeller <moell...@gmx.de> writes:

> Hi Toke,
>
>
>
>> On Apr 24, 2018, at 10:47, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>> 
>> Sebastian Moeller <moell...@gmx.de> writes:
>> 
>>>> On Apr 24, 2018, at 01:01, Pete Heist <p...@eventide.io> wrote:
>>>> 
>>>> 
>>>>> On Apr 23, 2018, at 10:39 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>>>>> 
>>>>> Last week we submitted an academic paper describing Cake. A pre-print is
>>>>> now available on arXiv: https://arxiv.org/abs/1804.07617
>>>>> 
>>>>> Comments welcome, of course :)
>>>> 
>>>> 
>>>> Nice work overall… :) Below is some feedback on content, and attached is a 
>>>> marked up PDF with some feedback on grammar and wording. Click the vanilla 
>>>> squares to show the notes.
>>>> 
>>>> Content:
>>>> 
>>>> - I wish there were some reference on how widespread of a problem 
>>>> bufferbloat actually is on the current Internet. That would bolster the 
>>>> initial assertion in the introduction.
>>>> 
>>>> - Thank you, I finally “get" triple-isolate. :) But I find it easier
>>>> to understand the behavior of dual-srchost and dual-dsthost, and I
>>>> think most would prefer its behavior, despite the fact that it needs
>>>> to be configured manually. Just a thought, knowing that cake
>>>> currently targets home gateways, and that there are now the egress
>>>> and ingress keywords, could host isolation default to dual-srchost
>>>> for egress mode and dual-dsthost for ingress mode? Or since using the
>>>> keywords would be fragile, is there a better way to know the proper
>>>> sense for dual-srchost and dual-dsthost?
>>> 
>>> The challenge is to find a heuristic that covers all reasonable
>>> use cases and does no harm in unexpected cases. I could envision
>>> setting up a cake instance on the upstream end of say a
>>> microwave link, there "ingress" seems like the appropriate
>>> keyword (as the goal would be to keep the link non-congested),
>>> but for customer fairness "dual-srchost" would be the
>>> appropriate keyword (or just srchost if all the ISP cares for is
>>> inter-customer fairness). Sure this will not work with IPv6 (for
>>> that we would either need to llok at the MACs or IMHO preferably
>>> the IPv6 prefix (or the partially masked IPv6 IP-address, I
>>> believe this to be better than MAC adresses as the ISP can
>>> easily control the prefix, but I digress)).
>> 
>> I don't think we can make assumptions on ISP deployments.
>
> Sure we do not really need to:
> https://forum.lede-project.org/t/transparent-cake-box/2161/4?u=moeller0
> and
> https://forum.lede-project.org/t/lede-as-a-dedicated-qos-bufferbloat-appliance/1861/14?u=moeller0
> so it looks like one person already use cake in an small ISP context.
> Now 1 is not a very convincing number, but certainly larger than
> zero...

Well I just meant that there are many ways to deploy a shaper in an ISP
context (centrally in the network, at the next hop from the customer,
etc).

Looking at those threads, they seem to be increasing the number of
queues. Not sure they need to, but, well, there's nothing in principle
that says this couldn't be configurable (it is in FQ-CoDel). It would
need a bit of a reorg of the current code, though, so that would be a
thing for later I guess...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Pre-print of Cake paper available

2018-04-24 Thread Toke Høiland-Jørgensen
Sebastian Moeller  writes:

>> Looking at those threads, they seem to be increasing the number of
>> queues. Not sure they need to, but, well, there's nothing in principle
>> that says this couldn't be configurable (it is in FQ-CoDel). It would
>> need a bit of a reorg of the current code, though, so that would be a
>> thing for later I guess...
>
>   I had hoped that orangetek could be convinced to do measurements
>   with different number of queues to answer that question, but I
>   guess the current default of 1000 queues is decent for a typical
>   homenet, but will simply become a bit tight with 600 concurrent
>   households (I assume that its the peak usage that would want
>   more queues on the "back-haul", on average 1000 might still be
>   okay, especially with Jonathan's clever set associativity
>   design)

Well, there's
https://www.researchgate.net/profile/S_Oueslati/publication/247045479_On_the_Scalability_of_Fair_Queueing/links/5451fab10cf285a067c6af0a/On-the-Scalability-of-Fair-Queueing.pdf

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH] Remove plain pkt_sched.h dependency

2018-04-24 Thread Toke Høiland-Jørgensen
Georgios Amanakis  writes:

> I just tried an in-tree built using net-next. Aside from that line it
> compiles on x86_64 successfully. I will give it a try on my router
> later on.

Yeah, I was getting to that; but removing that include means the changes
have to be imported into the kernel-tree pkt_sched.h (which I assume you
did to compile it?), so I wanted to finish up any other changes first...

Thanks for testing!

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Pre-print of Cake paper available

2018-04-24 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Apr 24, 2018, at 10:50 AM, Jonathan Morton  wrote:
>> 
>>> I think, if we wanted to support the ISP case, that a per-customer *shaper* 
>>> is more useful.
>> 
>> Yes, I think the technology can be recoded to better suit a
>> multi-subscriber environment; it would no longer be Cake, but would
>> use some of the same key algorithms.
>
> Right, I was going to comment on the paper- where’s the ISP backhaul
> use case? (Because I might still try to use it that way.) But I was
> sure that this was already considered, and am not surprised that it
> might be a different but related project.

Well, you could use it on an ISP backhaul by having a separate CAKE
instance per customer, and having another mechanism to assign customer
traffic to each. An HTB tree would be one way to do this; separate
interfaces per customer would be another. The latter works well if
customers are on separate VLANs, for instance...

But yeah, having a single instance of CAKE solve all of your customer
shaping problems is probably not in scope currently... ;)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Pre-print of Cake paper available

2018-04-24 Thread Toke Høiland-Jørgensen
Sebastian Moeller <moell...@gmx.de> writes:

>> On Apr 24, 2018, at 01:01, Pete Heist <p...@eventide.io> wrote:
>> 
>> 
>>> On Apr 23, 2018, at 10:39 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>>> 
>>> Last week we submitted an academic paper describing Cake. A pre-print is
>>> now available on arXiv: https://arxiv.org/abs/1804.07617
>>> 
>>> Comments welcome, of course :)
>> 
>> 
>> Nice work overall… :) Below is some feedback on content, and attached is a 
>> marked up PDF with some feedback on grammar and wording. Click the vanilla 
>> squares to show the notes.
>> 
>> Content:
>> 
>> - I wish there were some reference on how widespread of a problem 
>> bufferbloat actually is on the current Internet. That would bolster the 
>> initial assertion in the introduction.
>> 
>> - Thank you, I finally “get" triple-isolate. :) But I find it easier
>> to understand the behavior of dual-srchost and dual-dsthost, and I
>> think most would prefer its behavior, despite the fact that it needs
>> to be configured manually. Just a thought, knowing that cake
>> currently targets home gateways, and that there are now the egress
>> and ingress keywords, could host isolation default to dual-srchost
>> for egress mode and dual-dsthost for ingress mode? Or since using the
>> keywords would be fragile, is there a better way to know the proper
>> sense for dual-srchost and dual-dsthost?
>
>   The challenge is to find a heuristic that covers all reasonable
>   use cases and does no harm in unexpected cases. I could envision
>   setting up a cake instance on the upstream end of say a
>   microwave link, there "ingress" seems like the appropriate
>   keyword (as the goal would be to keep the link non-congested),
>   but for customer fairness "dual-srchost" would be the
>   appropriate keyword (or just srchost if all the ISP cares for is
>   inter-customer fairness). Sure this will not work with IPv6 (for
>   that we would either need to llok at the MACs or IMHO preferably
>   the IPv6 prefix (or the partially masked IPv6 IP-address, I
>   believe this to be better than MAC adresses as the ISP can
>   easily control the prefix, but I digress)).

I don't think we can make assumptions on ISP deployments. The shaper may
or may not be at the point of NAT, and per-customer prefix size can
vary. To properly support the ISP per-customer fairness use case, we'd
probably need to support arbitrary filtering (like what FQ-CoDel
supports with 'tc filter'). And I think, if we wanted to support the ISP
case, that a per-customer *shaper* is more useful...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Pre-print of Cake paper available

2018-04-24 Thread Toke Høiland-Jørgensen
Pete Heist <p...@eventide.io> writes:

>> On Apr 23, 2018, at 10:39 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>> 
>> Last week we submitted an academic paper describing Cake. A pre-print is
>> now available on arXiv: https://arxiv.org/abs/1804.07617
>> 
>> Comments welcome, of course :)
>
>
> Nice work overall… :) Below is some feedback on content, and attached
> is a marked up PDF with some feedback on grammar and wording. Click
> the vanilla squares to show the notes.

Thanks! A lot of those should have been fixed before submission; boy,
did I make a mess of verb tenses... Ah well, I'll incorporate your
fixes, so they will be fixed for the camera ready (assuming it gets
accepted) :)

> Content:
>
> - I wish there were some reference on how widespread of a problem
> bufferbloat actually is on the current Internet. That would bolster
> the initial assertion in the introduction.

Hmm, I do actually have a paper of my own that I could cite for this ;)
The trouble is that we have a pretty tight page limit, and adding
another reference takes us over that, meaning we would have to cut
something else. And I think we can do without in an academic paper
context at least...

> - Thank you, I finally “get" triple-isolate. :) But I find it easier
> to understand the behavior of dual-srchost and dual-dsthost, and I
> think most would prefer its behavior, despite the fact that it needs
> to be configured manually. Just a thought, knowing that cake currently
> targets home gateways, and that there are now the egress and ingress
> keywords, could host isolation default to dual-srchost for egress mode
> and dual-dsthost for ingress mode? Or since using the keywords would
> be fragile, is there a better way to know the proper sense for
> dual-srchost and dual-dsthost?
>
> - If ‘nat’ is not the default, won’t host isolation not work by
> default for most home gateways, almost all of which do nat? (Untested
> assumption.)

I think these questions are actually better handled as userspace policy
issues. For instance, in sqm-scripts we could reasonably default to
dual-srchost on egress and dual-dsthost on ingress, as we know which is
which.

> - Not in the paper, but is the ‘wash’ keyword really needed?

Is anything really *needed*? ;) It's useful in settings where you want to
clear diffserv markings, and not in other places...

> - Is it worth mentioning that when the home gateway’s uplink is WiFi,
> shaping is hard to do reliably, overhead and framing compensation
> can’t even be implemented, and that this is all more properly handled
> in the WiFi specific work?
>
> - One of the biggest deployment challenges (not unique to cake) is
> that most people have to use shaping, since deploying cake on their
> gateway’s external interface isn’t practical. But setting the rate
> properly for shaping isn’t always straightforward. This is sheer
> speculation, but could observed latency (obtained by passively
> measuring TCP RTT, for instance) be used as a signal to control the
> rate? I can only imagine this might be difficult to get right (though
> I would’ve thought what BBR does is also), so just take this as food
> for thought...

I'd categorise these as relevant issues that we don't have space to
discuss in the paper ;)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Testing variants of the MTU latency scaling

2018-04-24 Thread Toke Høiland-Jørgensen
Jonas Mårtensson  writes:

> One thing that is still not clear to me from these results: if I run
> cake on an IFB without ingress mode (i.e. the default?), does the MTU
> scaling have any impact on TCP download throughput?

Odds are that not using ingress mode will make Cake lose control of the
bottleneck (that is what happened when I tried running a quick test),
and so will mess up both latency and throughput as you hit the bloated
upstream link buffer...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net-next v2] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-24 Thread Toke Høiland-Jørgensen
sch_cake targets the home router use case and is intended to squeeze the
most bandwidth and latency out of even the slowest ISP links and routers,
while presenting an API simple enough that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash

CAKE is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Extensive support for DSL framing types.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency
  variation.

A paper describing the design of CAKE is available at
https://arxiv.org/abs/1804.07617

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

CAKE's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

tc -s qdisc show dev eth2
qdisc cake 1: root refcnt 2 bandwidth 100Mbit diffserv3 triple-isolate rtt 
100.0ms raw overhead 0
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 0b of 500b
 capacity estimate: 100Mbit
 min/max network layer size:65535 /   0
 min/max overhead-adjusted size:65535 /   0
 average network hdr offset:0

   Bulk  Best EffortVoice
  thresh   6250Kbit  100Mbit   25Mbit
  target  5.0ms5.0ms5.0ms
  interval  100.0ms  100.0ms  100.0ms
  pk_delay  0us  0us  0us
  av_delay  0us  0us  0us
  sp_delay  0us  0us  0us
  pkts000
  bytes   000
  way_inds000
  way_miss000
  way_cols000
  drops   000
  marks   000
  ack_drop000
  sp_flows000
  bk_flows000
  un_flows000
  max_len 000
  quantum   300 1514  762

Tested-by: Pete Heist <petehe...@gmail.com>
Tested-by: Georgios Amanakis <gamana...@gmail.com>
Signed-off-by: Dave Taht <dave.t...@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
---
Changes since v1:
  - Fix kbuild test bot complaint
  - Clean up the netlink ABI
  - Fix checkpatch complaints
  - A few tweaks to the behaviour of cake based on testing carried out
while writing the paper.

include/uapi/linux/pkt_sched.h |  107 ++
 net/sched/Kconfig  |   11 +
 net/sched/Makefile |1 +
 net/sched/sch_cake.c   | 2620 
 4 files changed, 2739 insertions(+)
 create mode 100644 net/sched/sch_cake.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096ae97b..919f7c698fed 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -934,4 +934,111 @@ enum {
 
 #define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
 
+/* CAKE */
+enum {
+   TCA_CAKE_UNSPEC,
+   TCA_CAKE_BASE_RATE,
+   TCA_CAKE_DIFFSERV_MODE,
+   TCA_CAKE_ATM,
+   TCA_CAKE_FLOW_MODE,
+   TCA_CAKE_OVERHEAD,
+   TCA_CAKE_RTT,
+   TCA_CAKE_TARGET,
+   TCA_CAKE_AUTORATE,
+   TCA_CAKE_MEMORY,
+   TCA_CAKE_NAT,
+   TCA_CAKE_RAW, // was _ETHERNET
+   TCA_CAKE_WASH,
+   TCA_CAKE_MPU,
+   TCA_CAKE_INGRESS,
+   TCA_CAKE_ACK_FILTER,
+   __TCA_CAKE_MAX
+};
+#define

[Cake] [PATCH iproute2-next v2] Add support for cake qdisc

2018-04-24 Thread Toke Høiland-Jørgensen
sch_cake targets the home router use case and is intended to squeeze the
most bandwidth and latency out of even the slowest ISP links and routers,
while presenting an API simple enough that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash

CAKE is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency
  variation.

A paper describing the design of CAKE is available at
https://arxiv.org/abs/1804.07617

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.t...@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
---
Changes since v1:
  - Updated netlink ABI
  - Remove diffserv-llt mode
  - Various tweaks and clean-ups of stats output
  
 man/man8/tc-cake.8 | 632 +++
 man/man8/tc.8  |   1 +
 tc/Makefile|   1 +
 tc/q_cake.c| 797 +
 4 files changed, 1431 insertions(+)
 create mode 100644 man/man8/tc-cake.8
 create mode 100644 tc/q_cake.c

diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
new file mode 100644
index ..30e41bc9
--- /dev/null
+++ b/man/man8/tc-cake.8
@@ -0,0 +1,632 @@
+.TH CAKE 8 "24 April 2018" "iproute2" "Linux"
+.SH NAME
+CAKE \- Common Applications Kept Enhanced (CAKE)
+.SH SYNOPSIS
+.B tc qdisc ... cake
+.br
+[
+.BR bandwidth
+RATE |
+.BR unlimited*
+|
+.BR autorate_ingress
+]
+.br
+[
+.BR rtt
+TIME |
+.BR datacentre
+|
+.BR lan
+|
+.BR metro
+|
+.BR regional
+|
+.BR internet*
+|
+.BR oceanic
+|
+.BR satellite
+|
+.BR interplanetary
+]
+.br
+[
+.BR besteffort
+|
+.BR diffserv8
+|
+.BR diffserv4
+|
+.BR diffserv3*
+]
+.br
+[
+.BR flowblind
+|
+.BR srchost
+|
+.BR dsthost
+|
+.BR hosts
+|
+.BR flows
+|
+.BR dual-srchost
+|
+.BR dual-dsthost
+|
+.BR triple-isolate*
+]
+.br
+[
+.BR nat
+|
+.BR nonat*
+]
+.br
+[
+.BR wash
+|
+.BR nowash*
+]
+.br
+[
+.BR ack-filter
+|
+.BR ack-filter-aggressive
+|
+.BR no-ack-filter*
+]
+.br
+[
+.BR memlimit
+LIMIT ]
+.br
+[
+.BR ptm
+|
+.BR atm
+|
+.BR noatm*
+]
+.br
+[
+.BR overhead
+N |
+.BR conservative
+|
+.BR raw*
+]
+.br
+[
+.BR mpu
+N ]
+.br
+[
+.BR ingress
+|
+.BR egress*
+]
+.br
+(* marks defaults)
+
+
+.SH DESCRIPTION
+CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline
+which uses both AQM and FQ.  It combines COBALT, which is an AQM algorithm
+combining Codel and BLUE, a shaper which operates in deficit mode, and a 
variant
+of DRR++ for flow isolation.  8-way set-associative hashing is used to 
virtually
+eliminate hash collisions.  Priority queuing is available through a simplified
+diffserv implementation.  Overhead compensation for various encapsulation
+schemes is tightly integrated.
+
+All settings are optional; the default settings are chosen to be sensible in
+most common deployments.  Most people will only need to set the
+.B bandwidth
+parameter to get useful results, but reading the
+.B Overhead Compensation
+and
+.B Round Trip Time
+sections is strongly encouraged.
+
+.SH SHAPER PARAMETERS
+CAKE uses a deficit-mode shaper, which does not exhibit the initial burst
+typical of token-bucket shapers.  It will automatically burst precisely as much
+as required to maintain the configured throughput.  As such, it is very
+straightforward to configure.
+.PP
+.B unlimited
+(default)
+.br
+   No limit on the b

[Cake] [PATCH iproute2-next v3] Add support for cake qdisc

2018-04-24 Thread Toke Høiland-Jørgensen
sch_cake is intended to squeeze the most bandwidth and latency out of even
the slowest ISP links and routers, while presenting an API simple enough
that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash besteffort

Cake is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Support for DSL framing types and shapers.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency variation.

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

Cake's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Guido Sarducci, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

Signed-off-by: Dave Taht <dave.t...@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
---
Changes since v2:
  - Accidentally sent a version of the tc patch from my testing branch
which includes a test flag that shouldn't be in there. Sorry for the
mistake.
  
 man/man8/tc-cake.8 | 632 +++
 man/man8/tc.8  |   1 +
 tc/Makefile|   1 +
 tc/q_cake.c| 778 +
 4 files changed, 1412 insertions(+)
 create mode 100644 man/man8/tc-cake.8
 create mode 100644 tc/q_cake.c

diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
new file mode 100644
index ..30e41bc9
--- /dev/null
+++ b/man/man8/tc-cake.8
@@ -0,0 +1,632 @@
+.TH CAKE 8 "23 November 2017" "iproute2" "Linux"
+.SH NAME
+CAKE \- Common Applications Kept Enhanced (CAKE)
+.SH SYNOPSIS
+.B tc qdisc ... cake
+.br
+[
+.BR bandwidth
+RATE |
+.BR unlimited*
+|
+.BR autorate_ingress
+]
+.br
+[
+.BR rtt
+TIME |
+.BR datacentre
+|
+.BR lan
+|
+.BR metro
+|
+.BR regional
+|
+.BR internet*
+|
+.BR oceanic
+|
+.BR satellite
+|
+.BR interplanetary
+]
+.br
+[
+.BR besteffort
+|
+.BR diffserv8
+|
+.BR diffserv4
+|
+.BR diffserv3*
+]
+.br
+[
+.BR flowblind
+|
+.BR srchost
+|
+.BR dsthost
+|
+.BR hosts
+|
+.BR flows
+|
+.BR dual-srchost
+|
+.BR dual-dsthost
+|
+.BR triple-isolate*
+]
+.br
+[
+.BR nat
+|
+.BR nonat*
+]
+.br
+[
+.BR wash
+|
+.BR nowash*
+]
+.br
+[
+.BR ack-filter
+|
+.BR ack-filter-aggressive
+|
+.BR no-ack-filter*
+]
+.br
+[
+.BR memlimit
+LIMIT ]
+.br
+[
+.BR ptm
+|
+.BR atm
+|
+.BR noatm*
+]
+.br
+[
+.BR overhead
+N |
+.BR conservative
+|
+.BR raw*
+]
+.br
+[
+.BR mpu
+N ]
+.br
+[
+.BR ingress
+|
+.BR egress*
+]
+.br
+(* marks defaults)
+
+
+.SH DESCRIPTION
+CAKE (Common Applications Kept Enhanced) is a shaping-capable queue discipline
+which uses both AQM and FQ.  It combines COBALT, which is an AQM algorithm
+combining Codel and BLUE, a shaper which operates in deficit mode, and a 
variant
+of DRR++ for flow isolation.  8-way set-associative hashing is used to 
virtually
+eliminate hash collisions.  Priority queuing is available through a simplified
+diffserv implementation.  Overhead compensation for various encapsulation
+schemes is tightly integrated.
+
+All settings are optional; the default settings are chosen to be sensible in
+most common deployments.  Most people will only need to set the
+.B bandwidth
+parameter to get useful results, but reading the
+.B Overhead Compensation
+and
+.B Round Trip Time
+sections is strongly encouraged.
+
+.SH SHAPER PARAMETERS
+CAKE uses a deficit-mode shaper, which does not exhibit the initial burst
+typical of token-bucket shapers.  It will automatically burst precisely as much
+as required to maintain the configured throughput.  As such, it is very
+straightforward to configure.
+.PP
+.B unlimited
+(default)
+.br
+   No limit on the bandwidth.
+.PP
+.B bandwidth
+RATE
+.br

Re: [Cake] [PATCH iproute2-next v3] Add support for cake qdisc

2018-04-24 Thread Toke Høiland-Jørgensen
Stephen Hemminger <step...@networkplumber.org> writes:

> On Tue, 24 Apr 2018 14:30:46 +0200
> Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>
>> +static void cake_print_json_tin(struct tc_cake_tin_stats *tst, uint version)
>> +{
>> +open_json_object(NULL);
>> +print_uint(PRINT_JSON, "threshold_rate", NULL, tst->threshold_rate);
>> +print_uint(PRINT_JSON, "target", NULL, tst->target_us);
>> +print_uint(PRINT_JSON, "interval", NULL, tst->interval_us);
>> +print_uint(PRINT_JSON, "peak_delay", NULL, tst->peak_delay_us);
>> +print_uint(PRINT_JSON, "average_delay", NULL, tst->avge_delay_us);
>> +print_uint(PRINT_JSON, "base_delay", NULL, tst->base_delay_us);
>> +print_uint(PRINT_JSON, "sent_packets", NULL, tst->sent.packets);
>> +print_uint(PRINT_JSON, "sent_bytes", NULL, tst->sent.bytes);
>> +print_uint(PRINT_JSON, "way_indirect_hits", NULL, 
>> tst->way_indirect_hits);
>> +print_uint(PRINT_JSON, "way_misses", NULL, tst->way_misses);
>> +print_uint(PRINT_JSON, "way_collisions", NULL, tst->way_collisions);
>> +print_uint(PRINT_JSON, "drops", NULL, tst->dropped.packets);
>> +print_uint(PRINT_JSON, "ecn_mark", NULL, tst->ecn_marked.packets);
>> +print_uint(PRINT_JSON, "ack_drops", NULL, tst->ack_drops.packets);
>> +print_uint(PRINT_JSON, "sparse_flows", NULL, tst->sparse_flows);
>> +print_uint(PRINT_JSON, "bulk_flows", NULL, tst->bulk_flows);
>> +print_uint(PRINT_JSON, "unresponsive_flows", NULL, 
>> tst->unresponse_flows);
>> +print_uint(PRINT_JSON, "max_pkt_len", NULL, tst->max_skblen);
>> +if (version >= 0x102)
>> +print_uint(PRINT_JSON, "flow_quantum", NULL, tst->flow_quantum);
>
> Please don't version objects in netlink. That is not how netlink is
> supposed to be used.

Well, this is leftover from keeping track of different versions of the
out-of-tree patch, and we already broke compatibility pretty thoroughly
as a preparation for upstreaming. So I'm fine with dropping the version
check; will resend.

That being said, the versioning comes from the XSTATS API, which does
not use netlink attributes for its members, but rather passes through as
struct. So what is one supposed to do in this case?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH iproute2-next v3] Add support for cake qdisc

2018-04-24 Thread Toke Høiland-Jørgensen
Stephen Hemminger <step...@networkplumber.org> writes:

> On Tue, 24 Apr 2018 16:52:57 +0200
> Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>
>> Well, this is leftover from keeping track of different versions of the
>> out-of-tree patch, and we already broke compatibility pretty thoroughly
>> as a preparation for upstreaming. So I'm fine with dropping the version
>> check; will resend.
>> 
>> That being said, the versioning comes from the XSTATS API, which does
>> not use netlink attributes for its members, but rather passes through as
>> struct. So what is one supposed to do in this case?
>
> If a structure is likely to change, then it should be decomposed into nested
> netlink attributes. Once you send a structure over userspace API in netlink
> it is fixed forever.

Right. Is decomposing stats into netlink attributes actually possible
within the qdisc dump_stats callback? Could we do something like:

nla_start_nest(d->skb, TCA_CAKE_STATS);

from within cake_dump_stats() and read it in cake_print_xstats in tc?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next v2] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-24 Thread Toke Høiland-Jørgensen
Georgios Amanakis <gamana...@gmail.com> writes:

> On Tue, 24 Apr 2018 13:44:06 +0200
> Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>
>> +config NET_SCH_CAKE
>> +   tristate "Common Applications Kept Enhanced (CAKE)"
>> +   help
>> + Say Y here if you want to use the Common Applications Kept Enhanced
>> +  (CAKE) queue management algorithm.
>> +
>> + To compile this driver as a module, choose M here: the module
>> + will be called sch_cake.
>
> In net/sched/Kconfig we should probably add NF_CONNTRACK as a dependency:
> "depends on NF_CONNTRACK"
>
> Otherwise if NET_SCH_CAKE=y and NF_CONNTRACK=m compilation fails with:
>
> net/sched/sch_cake.o: In function `cake_enqueue':
> sch_cake.c:(.text+0x3e10): undefined reference to `nf_ct_get_tuplepr'
> sch_cake.c:(.text+0x3e3a): undefined reference to `nf_conntrack_find_get'
> make: *** [Makefile:1041: vmlinux] Error 1

Hmm we really don't want to have a hard depend on conntrack. We
currently ifdef the conntrack-specific bits thus:
#if IS_ENABLED(CONFIG_NF_CONNTRACK).

Does anyone know if there is a way to do this so the module/builtin
split doesn't bite us?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Toke Høiland-Jørgensen
Eric Dumazet <eric.duma...@gmail.com> writes:

> On 04/25/2018 06:42 AM, Toke Høiland-Jørgensen wrote:
>> sch_cake targets the home router use case and is intended to squeeze the
>> most bandwidth and latency out of even the slowest ISP links and routers,
>> while presenting an API simple enough that even an ISP can configure it.
>> 
>
> * Support for ack filtering.
>
> Oh my god. Cake became a monster.

Haha, you either die a hero or live long enough to see yourself become
the villain, I guess? ;)

We do realise there are lots of good reasons not to do ACK filtering;
but it does make a difference on highly asymmetrical links (see
http://blog.cerowrt.org/post/ack_filtering/), which sadly some people
are still behind.

> syzkaller will be very happy to trigger all kind of bugs in it.

Heh, right.

> Lack of any pskb_may_pull() is really concerning.

By this you mean "check that the packet is long enough to contain the
header we are looking for before trying to do ACK filtering", right?

> How ack filter deals with reorders ?

I think this will do the right thing?

/* new ack sequence must be greater
 */
if (thisconn &&
(ntohl(tcph_check->ack_seq) > ntohl(tcph->ack_seq)))
continue;

> Also the forced GSO segmentation looks wrong to me.

Wrong as in it's done wrong, or wrong as in you object to the whole
concept?

> This kills xmit_more gain we have when GSO is performed after
> qdisc dequeue before hitting device.
>
> This should be driven by a parameter really, some threshold on the skb size.
>
> What performance number do you get on a 10Gbit NIC for example ?

Single-flow throughput through 2 hops on a 40Gbit connection (with CAKE
in unlimited mode vs pfifo_fast on the router):

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 
() port 0 AF_INET : demo
Recv   SendSend  
Socket Socket  Message  Elapsed  
Size   SizeSize Time Throughput  
bytes  bytes   bytessecs.10^6bits/sec  

 87380  16384  1638410.0018840.40   

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 
() port 0 AF_INET : demo
Recv   SendSend  
Socket Socket  Message  Elapsed  
Size   SizeSize Time Throughput  
bytes  bytes   bytessecs.10^6bits/sec  

 87380  16384  1638410.0024804.77   


Note, however, that CAKE is explicitly targeted at home gateways, which
generally run at speed a few orders of magnitude slower than this, and
we have heavily optimised for lowest possible latency. So this is a
conscious design choice, I guess you could say.

> Also, how ack filter can suppress packets after skb_gso_segment() ?

Hmm, because pure ACKs are not generally aggregated (sorry, I'm not
quite clear on when exactly GSO will kick in)?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Testing wanted: Statistics API rework

2018-04-25 Thread Toke Høiland-Jørgensen
Pete Heist <p...@eventide.io> writes:

>> On Apr 25, 2018, at 11:15 AM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>> 
>> Pete Heist <p...@eventide.io> writes:
>> 
>>> I requested a re-build over in the Ubiquiti EdgeOS forums. Between
>>> ER-X and ER-L I think that would cover 32-bit mips (little endian) and
>>> 64-bit mips (big endian). Anything specific to test besides “it
>>> works"? :)
>> 
>> We need to verify that it will still print statistics and that the
>> values do not look weird (especially the per-tin 'bytes' counter since
>> that is a 64bit value).
>
>
> In case you didn’t see it before, Lochnair has automatic builds set up
> (cool) for EdgeOS that are failing as of yesterday:
>
> https://build.lochnair.net/job/ubiquiti/job/bufferbloat/job/cake_cobalt/
>
> It may very well be due to the 3.10(!) kernel it ships with, but in
> case it helps somehow, go to the latest build and “Console Output”. In
> case you commit a change, the git repo is polled once an hour,
> apparently. Otherwise, I don’t have an easy way to test mips…

Yup, definitely an issue with kernel versions. Guess we'll have to go
back and add in compatibility stuff later.

Kevin was nice enough to test on an Archer C7, so I think we are good
for now :)

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] gcc 7.3.1 issue on arch

2018-03-29 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> On 29 Mar, 2018, at 1:30 am, Bret Towe  wrote:
>> 
>> ./include/linux/kernel.h:6:10: fatal error: stdarg.h: No such file or 
>> directory
>
> I've started seeing the same error on my PowerPC box, running Gentoo.
> It seemed specific to that machine, but the GCC version may well be
> relevant since Gentoo tends to stabilise new GCC versions earlier on
> minority platforms.

Yeah, definitely only shows up in very new GCCs.

> Shouldn't stdarg.h be a built-in header?

Yup, which is why this is really odd...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] bufferbloat still misunderstood & ignored

2018-03-31 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> On 30 Mar, 2018, at 11:05 am, Pete Heist  wrote:
>> 
>> So this mapping from member (subscriber) to their MACs or IPs would need to 
>> be configurable somewhere
>
> Yes, I assumed something like that would be required to assign the
> correct tier of service (or BRAS rate) to each subscriber, and that's
> what makes it ISP-focused rather than end-user focused.
>
> In Linux it appears to be possible to assign a flow mark (ie. a
> number) to packets using firewall rules or qdisc filters, and the
> subscriber ID and tier ID could probably be encoded into that. It
> would then only be necessary to inform the qdisc of this encoding (how
> many bits for each portion) and to define the tiers.

Even better, you can do arbitrary programmable matching these days,
using tc-bpf(8). Theoretically FQ-CoDel supports this in place of the
built-in flow hashing; but I have never managed to get it configured
correctly (haven't tried that hard, though).

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] gcc 7.3.1 issue on arch

2018-03-29 Thread Toke Høiland-Jørgensen
Bret Towe  writes:

> for anyone seeing a compile error like the below on Arch Linux
>
>   CC [M]  /root/sch_cake/sch_cake.o
> In file included from ./include/linux/list.h:9:0,
>  from ./include/linux/module.h:9,
>  from /root/sch_cake/sch_cake.c:42:
> ./include/linux/kernel.h:6:10: fatal error: stdarg.h: No such file or 
> directory
>  #include 
>   ^~
> compilation terminated.

I was seeing the same issue on a full kernel compile, which was fixed by
make clean && make

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] bufferbloat still misunderstood & ignored

2018-03-28 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant via Cake  writes:

> From: Kevin Darbyshire-Bryant 
> Subject: bufferbloat still misunderstood & ignored
> To: Cake List 
> Date: Wed, 28 Mar 2018 15:46:47 +
>
> http://forums.thinkbroadband.com/talktalk/t/4587923-sensible-discussion-with-talktalk-about-bufferbloat.html
>
> The thing that bothers me more than anything….. the first reply comes
> from a staff member of ‘thinkbroadband’.

"Solution is two get two lines, so if a big download is using up one
they can do latency sensitive stuff on the other."

Lovely; an ISP that found a way to monetise bufferbloat... :/

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Compiling under net-next

2018-03-19 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen <t...@toke.dk> writes:

> George Amanakis <gamana...@gmail.com> writes:
>
>> It seems the change was introduced here:
>> https://patchwork.kernel.org/patch/9671147/
>>
>> I drafted the following very simplistic patch, could somebody take a
>> look at it?
>
> LGTM. We may want to actually use the extack feature at some point, but
> we're not really returning error messages from the configuration except
> when parsing fails, so I guess it doesn't matter too much in this
> instance...

Ah, wait, nla_parse_nested() takes an extack parameter now, so we should
pass that through in cake_change() after this change.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH 1/2] cake: print_uint format fixes

2018-03-17 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> writes:

>> On 12 Mar 2018, at 15:11, Stephen Hemminger <step...@networkplumber.org> 
>> wrote:
>> 
>> On Mon, 12 Mar 2018 10:56:09 +0100
>> Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>>> 
>>> Stephen, would you accept patches to fix the API (to add
>>> print_{u,}int64() variants and turn print_uint() into native-int size)?
>>> Or should we stick with the API currently there and live with the
>>> inconsistency? :)
>>> 
>>> -Toke
>> 
>> I agree print_int should take int, print_uint should take unsigned int, and 
>> there should
>> be print_u64 (and print_u32, print_u8)
>
> Submitted a patch to netdev but almost certainly did it wrong/based on
> wrong tree etc ‘cos I don’t normally inhabit that space. The quite
> simple attached patch restores corrrect functioning on my BE MIPS box.
> Validation that the MANY calls to print_uint are using the correct
> type is left as an exercise for the reader :-) But at least there’s no
> longer a hidden promotion from uint to uint_64t going on.

Awesome! Good work :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Compiling under net-next

2018-03-19 Thread Toke Høiland-Jørgensen
George Amanakis  writes:

> It seems the change was introduced here:
> https://patchwork.kernel.org/patch/9671147/
>
> I drafted the following very simplistic patch, could somebody take a
> look at it?

LGTM. We may want to actually use the extack feature at some point, but
we're not really returning error messages from the configuration except
when parsing fails, so I guess it doesn't matter too much in this
instance...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Fwd: Compiling under net-next

2018-03-19 Thread Toke Høiland-Jørgensen
Georgios Amanakis  writes:

>  #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 12, 0)
>  err = nla_parse_nested(tb, TCA_CAKE_MAX, opt, cake_policy);
> -#else
> +#elif LINUX_VERSION_CODE < KERNEL_VERSION(4, 16, 0)
>  err = nla_parse_nested(tb, TCA_CAKE_MAX, opt, cake_policy, NULL);
> +#else
> +err = nla_parse_nested(tb, TCA_CAKE_MAX, opt, cake_policy, extack);
>  #endif

Yes, exactly!

Could you squash the two patches into one, add a Signed-off-by tag and
send it using git send-email? It's somewhat mangled in transit, it
seems; but otherwise looks good to apply :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] eliminating link_ms and increasing packets to 64 bits

2018-03-23 Thread Toke Høiland-Jørgensen
Dave Taht  writes:

>   fprintf(f, "  pkts");
>   FOR_EACH_TIN(stnc, tst, i)
> - fprintf(f, " %12u", tst->sent.packets);
> + fprintf(f, " %12llu", tst->sent.packets);
>   fprintf(f, "\n");

Presumably this can potentially break column alignment? Not sure that is
too much of a problem, though; mostly cosmetic?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] eliminating link_ms and increasing packets to 64 bits

2018-03-23 Thread Toke Høiland-Jørgensen
Dave Taht  writes:

> the preceding space handles column alignment. the %12 is probably
> going to be misleading.

The space separates columns, the %12 aligns them. So it will break
alignment. But as I said, mostly a cosmetic issue, so probably not
something we should spend a lot of time on fixing.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH] q_cake: Update xstats format to use per-tin structure

2018-03-04 Thread Toke Høiland-Jørgensen
Toke Høiland-Jørgensen <t...@toke.dk> writes:

> Jonathan Morton <chromati...@gmail.com> writes:
>
>>> On 11 Feb, 2018, at 7:26 pm, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>>> 
>>> This updates tc to understand the updated cake xstats structure (which
>>> splits out the tin stats in a separate structure the length of which is
>>> included in the containing struct).
>>> 
>>> Old versions of the cake stats will no longer be understood by the
>>> resulting version of tc.
>>
>> I think you could make a pull request for these patches now.  I can
>> then add new stats to the new code instead of duplicating work.
>
> I think I actually have write access to the repos, so I'll just push the
> commits. Sometime tonight; the code is on my laptop :)

And totally forgot about it, of course. Sorry about that, pushed now :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Toke Høiland-Jørgensen
David Miller <da...@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <t...@toke.dk>
> Date: Wed, 25 Apr 2018 15:42:48 +0200
>
>> +static void *cake_zalloc(size_t sz)
>> +{
>> +void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
>> +
>> +if (!ptr)
>> +ptr = vzalloc(sz);
>> +return ptr;
>> +}
>
> This is just kvzalloc(sz, GFP_KERNEL | __GFP_NOWARN)?

Ah yes; we actually fixed that before, but guess it got lost this time
around. Will fix :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Toke Høiland-Jørgensen
Eric Dumazet <eric.duma...@gmail.com> writes:

> On 04/25/2018 11:34 AM, Toke Høiland-Jørgensen wrote:
>> Eric Dumazet <eric.duma...@gmail.com> writes:
>> 
>>> On 04/25/2018 09:52 AM, Jonathan Morton wrote:
>>>>> We can see here the high cost of forcing software GSO :/
>>>>>
>>>>> Really, this should be done only :
>>>>> 1) If requested by the admin ( tc  gso )
>>>>>
>>>>> 2) If packet size is above a threshold.
>>>>>  The threshold could be set by the admin, and/or based on a fraction of 
>>>>> the bandwidth parameter.
>>>>>
>>>>> I totally understand why you prefer to segment yourself for < 100 Mbit 
>>>>> links.
>>>>>
>>>>> But this makes no sense on 10Gbit+
>>>>
>>>> It is absolutely necessary, so far as I can see, to segment GSO
>>>> superpackets when overhead compensation is selected - as it very
>>>> often should be, even on pure Ethernet links. Without that, the
>>>> calculation of link occupancy time will be wrong. (The actual
>>>> transmission time of an Ethernet frame is rather more than just 14
>>>> bytes longer than the underlying IP packet.)
>>>
>>> Just fix the overhead compensation computation in the code.
>>>
>>> skb in a qdisc have everything you need.
>>>
>>> qdisc_pkt_len_init() has initialized qdisc_skb_cb(skb)->pkt_len with
>>> the exact bytes on the wire, and you have gso_segs to perform any
>>> adjustement you need to do.
>> 
>> The problem is that may not be the right values. For example, in many
>> CPEs there's a built-in switch that strips VLAN tags before the packet
>> actually hits the wire. So we do need to be able to get the actual
>> packet size. Is it possible to get the sizes of the individual segments
>> of a GSO packet? That way we could do the calculation for the whole
>> super-packet...
>
> All segments of  GSO packets have the same size, by definition.
>
> Only the last segment might be smaller, and again this can be inferred
> from gso_size and gso_segs

Gotcha. Until we are confident that we've implemented this in a way that
works, will this do?

@@ -88,6 +88,7 @@
 #define CAKE_SET_WAYS (8)
 #define CAKE_MAX_TINS (8)
 #define CAKE_QUEUES (1024)
+#define CAKE_SPLIT_GSO_THRESHOLD (12500) /* 1Gbps */
@@ -1437,7 +1439,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
 * or if we need to know individual packet sizes for framing overhead.
 */
 
-   if (skb_is_gso(skb)) {
+   if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) {
struct sk_buff *segs, *nskb;
netdev_features_t features = netif_skb_features(skb);
/* signed slen to handle corner case
@@ -2337,6 +2339,12 @@ static int cake_change(struct Qdisc *sch, struct nlattr 
*opt,
if (tb[TCA_CAKE_MEMORY])
q->buffer_config_limit = nla_get_u32(tb[TCA_CAKE_MEMORY]);
 
+   if (q->rate_bps && (q->rate_bps <= CAKE_SPLIT_GSO_THRESHOLD ||
+   q->rate_flags & CAKE_FLAG_OVERHEAD))
+   q->rate_flags |= CAKE_FLAG_SPLIT_GSO;
+   else
+   q->rate_flags &= ~CAKE_FLAG_SPLIT_GSO;
+
if (q->tins) {
sch_tree_lock(sch);
cake_reconfigure(sch);

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Pre-print of Cake paper available

2018-04-25 Thread Toke Høiland-Jørgensen
David Lang <da...@lang.hm> writes:

> On Tue, 24 Apr 2018, Toke Høiland-Jørgensen wrote:
>
>> Pete Heist <p...@eventide.io> writes:
>>
>>>> On Apr 24, 2018, at 7:58 AM, Jonathan Morton <chromati...@gmail.com> wrote:
>>>> 
>>>> Turning NAT support on by default might actually be reasonable, since
>>>> it doesn't really break anything if it's not needed - it just eats a
>>>> bit of CPU with unnecessary conntrack lookups.
>>>
>>> I would be for it, if it eats say < 1% additional CPU, and preferably
>>> less. I expect the impact to increase with packet rates.
>>
>> I'm a bit worried that the way it is implemented now, if we turn it on
>> by default we risk activating conntrack even when it was otherwise
>> disabled...
>
> I will say that just about every system ships with conntrack enabled, and 
> disabling it can be pretty difficult (especially in LEDE/OpenWRT), there are 
> so 
> many things that require it that tracking them all down and disabling them is 
> very difficult.
>
> There are not that many places where Cake is going to be used that NAT or 
> some 
> other thing that requires connection tracking is not also going to be used, 
> in 
> the remaining cases, can it be disabled manually in configs after it's been 
> sucked in automatically?

Hmm, actually it looks like just compiling against the conntrack code
adds a module dependency on conntrack. And as far as I can tell, the
code doesn't initiate any new conntrack state if it doesn't already
exist. So I think it's safe to turn on NAT mode by default. Will add
that :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Toke Høiland-Jørgensen
Eric Dumazet  writes:

> On 04/25/2018 09:52 AM, Jonathan Morton wrote:
>>> We can see here the high cost of forcing software GSO :/
>>>
>>> Really, this should be done only :
>>> 1) If requested by the admin ( tc  gso )
>>>
>>> 2) If packet size is above a threshold.
>>>  The threshold could be set by the admin, and/or based on a fraction of the 
>>> bandwidth parameter.
>>>
>>> I totally understand why you prefer to segment yourself for < 100 Mbit 
>>> links.
>>>
>>> But this makes no sense on 10Gbit+
>> 
>> It is absolutely necessary, so far as I can see, to segment GSO
>> superpackets when overhead compensation is selected - as it very
>> often should be, even on pure Ethernet links. Without that, the
>> calculation of link occupancy time will be wrong. (The actual
>> transmission time of an Ethernet frame is rather more than just 14
>> bytes longer than the underlying IP packet.)
>
> Just fix the overhead compensation computation in the code.
>
> skb in a qdisc have everything you need.
>
> qdisc_pkt_len_init() has initialized qdisc_skb_cb(skb)->pkt_len with
> the exact bytes on the wire, and you have gso_segs to perform any
> adjustement you need to do.

The problem is that may not be the right values. For example, in many
CPEs there's a built-in switch that strips VLAN tags before the packet
actually hits the wire. So we do need to be able to get the actual
packet size. Is it possible to get the sizes of the individual segments
of a GSO packet? That way we could do the calculation for the whole
super-packet...

> Do not kill GSO only because you do not want to deal with it.

It's not (just) that we don't want to deal with it, it is also a very
aggressive optimisation for latency, which makes a lot of sense at
residential internet bandwidths.

>> Another reason to apply GSO segmentation is to achieve maximal
>> smoothness of flow isolation. This should still be achievable within
>> some tolerance at high link rates, but calculating this tolerance is
>> complicated by the triple-isolate algorithm.
>> 
>> If there's a way to obtain the individual packet sizes without
>> incurring the full segmentation overhead, it may be worth considering
>> (at high link rates only). I would want to leave it on by default,
>> because some of Cake's demonstrably superior latency performance
>> depends on seeing the real packets, not the aggregates, and the
>> overhead only becomes significant above 100Mbps on weak MIPS CPUs and
>> 1Gbps on vaguely modern x86 stuff.
>
> Again, these arguments are moot on 10Gbit+.
>
> Lets build the future, do not pretend GSO/TSO is not part of it.

I'm all for being future proof, but we also do want well-tested code.
Reworking this feature to function correctly with GSO is going to take
some work. Would you accept gating GSO splitting on bandwidth for now,
and adding correct overhead compensation for GSO packets later?

something like:

if (shaper_active() && (bandwidth <= 1Gbps || overhead_configured))
  split_gso()

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] CAKE upstreaming - testers wanted, ACK filtering rescuers needed

2018-04-25 Thread Toke Høiland-Jørgensen
For those who have not been following the discussion on the upstreaming
patches, here's an update:

- I've just pushed patches to only split GSO packets when shaping below
  one gigabit; and hopefully made the overhead compensation code deal
  gracefully with GSO packets if someone for some reason wants to use
  the shaper at speeds higher than that and still use the overhead
  compensation code.

- It turns out that the ACK filtering code does not properly sanity
  check the packet sizes, and so can potentially crash the box running
  CAKE if it receives malformed packets. So if no one steps up to fix
  that within the next few days, or I'll submit the next version without
  it (I'm not going to open that particular can of worms)... This
  doesn't mean it can't be added back later, of course, it just means it
  won't go upstream this time around.

- NAT mode is now enabled by default; doesn't seem to be a good reason
  not to as the compile time dependency already makes the module depend
  on conntrack.


So please do test the current git version (cobalt branch, still). I'm
planning to resubmit on Friday.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] Linux 4.19 released...

2018-10-22 Thread Toke Høiland-Jørgensen
...which means that CAKE is now officially in upstream Linux. Woohoo!

It's even listed at the top of the "Coolest features" list on the
kernelnewbies overview:
https://kernelnewbies.org/Linux_4.19#Better_networking_experience_with_the_CAKE_queue_management_algorithm

Congratulations, and thanks, to all involved :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Cerowrt-devel] apu2 sqm/htb issue + a minor win for speeding up fq_codel itself

2018-10-03 Thread Toke Høiland-Jørgensen
Dave Taht  writes:

> Well, fq_codel does use a lot less cpu but everything seems slower

I don't suppose 18.06 enables any of the SPECTRE mitigations (was that
an issue on ARM)?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Cerowrt-devel] apu2 sqm/htb issue + a minor win for speeding up fq_codel itself

2018-10-03 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

> I'm not familiar with precisely what mitigations are now in use on
> ARM. I am however certain that, on a device running only trustworthy
> code (ie. not running a Web browser), mitigating Spectre is
> unnecessary. If an attacker gets into a position to exploit it, he's
> already compromised the device enough to run a botnet anyway.

Yup, especially on openwrt, where most daemons run as root anyway :)

I would assume that something like the retpoline indirect function call
protection is not actually enabled on openwrt; but since we were talking
about performance regressions, that is certainly a major one...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] arp flow dissector

2018-09-03 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> On 2 Sep, 2018, at 10:37 pm, Dave Taht  wrote:
>> 
>> Didn't know we had this now.
>> 
>> https://www.spinics.net/lists/netdev/msg413986.html
>
> Does that automagically give Cake an idea of the IP addresses
> associated with the packet, for host-fairness purposes?  If not, we
> should fix that.

Don't think so. There's a follow-up patch to enable the matching for
cls_flower: https://www.spinics.net/lists/netdev/msg413985.html

However, in normal operation ARPs should be fairly rare, so adding this
support to CAKE would mostly be to protect against flooding, wouldn't
it?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH] MAINTAINERS: Add entry for CAKE qdisc

2018-11-16 Thread Toke Høiland-Jørgensen
We would like the existing community to be kept in the loop for any new
developments on CAKE; and I certainly plan to keep maintaining it. Reflect
this in MAINTAINERS.

Signed-off-by: Toke Høiland-Jørgensen 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bd775ba51ce..96be48879a7b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3276,6 +3276,12 @@ F:   include/uapi/linux/caif/
 F: include/net/caif/
 F: net/caif/
 
+CAKE QDISC
+M: Toke Høiland-Jørgensen 
+L: cake@lists.bufferbloat.net (moderated for non-subscribers)
+S: Maintained
+F: net/sched/sch_cake.c
+
 CALGARY x86-64 IOMMU
 M: Muli Ben-Yehuda 
 M: Jon Mason 
-- 
2.19.1

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 4, 2019, at 11:34 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Pete Heist  writes:
>> 
>> This basically means that we can't use CAKE as a leaf qdisc with GSO
>> splitting as it stands currently. I *think* the solution is for CAKE to
>> notify its parents; could you try the patch below and see if it helps?
>
> Aha, good news. :)
>
> I’m probably not currently in a position to try it on my old kernels with the 
> out of tree build:
>
> On 3.16.7:

Hmm, try this version for 3.16 - probably doesn't work on later kernels.
I'll look into a proper backport once you've confirmed that it works :)

-Toke

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index b910cd5c56f7..ef3acdbb8429 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1617,6 +1617,44 @@ static u32 cake_classify(struct Qdisc *sch, struct 
cake_tin_data **t,
 
 static void cake_reconfigure(struct Qdisc *sch);
 
+
+static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
+{
+   struct Qdisc *q;
+
+   if (!(root->flags & TCQ_F_BUILTIN) &&
+   root->handle == handle)
+   return root;
+
+   list_for_each_entry(q, >list, list) {
+   if (q->handle == handle)
+   return q;
+   }
+   return NULL;
+}
+
+void adjust_parent_qlen(struct Qdisc *sch, unsigned int n,
+  unsigned int len)
+{
+   u32 parentid;
+   if (n == 0 && len == 0)
+   return;
+   rcu_read_lock();
+   while ((parentid = sch->parent)) {
+   if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
+   break;
+
+   sch = qdisc_match_from_root(qdisc_dev(sch), TC_H_MAJ(parentid));
+   if (sch == NULL) {
+   WARN_ON_ONCE(parentid != TC_H_ROOT);
+   break;
+   }
+   sch->q.qlen += n;
+   sch->qstats.backlog += len;
+   }
+   rcu_read_unlock();
+}
+
 static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
 {
@@ -1667,7 +1705,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) {
struct sk_buff *segs, *nskb;
netdev_features_t features = netif_skb_features(skb);
-   unsigned int slen = 0;
+   unsigned int slen = 0, numsegs = 0;
 
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
if (IS_ERR_OR_NULL(segs))
@@ -1684,6 +1722,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
 
sch->q.qlen++;
slen += segs->len;
+   numsegs++;
q->buffer_used += segs->truesize;
b->packets++;
segs = nskb;
@@ -1696,7 +1735,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
sch->qstats.backlog += slen;
q->avg_window_bytes += slen;
 
-   qdisc_tree_reduce_backlog(sch, 1, len);
+   adjust_parent_qlen(sch, numsegs - 1, slen - len);
consume_skb(skb);
} else {
/* not splitting */
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic

2019-01-03 Thread Toke Høiland-Jørgensen
> Jon, is there anything I can check by instrumenting the code somewhere
> specific?

Is there any way you could test with a bulk UDP flow? I'm wondering
whether this is a second-order effect where TCP ACKs are limited in a
way that cause the imbalance? Are you using ACK compression?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

> Ok, that fixes the compiler warnings, but I get this now. Same as
> before it’s repeated until reboot, stack sometimes changes but I
> always see sch_hfsc.c:1427 at the beginning:

Hmm, that's odd. Could you try adding this debugging line in
adjust_parent_qlen(), right before the sch->q.qlen += n line:

net_info_ratelimited("Adjusting parent qdisc %d with pkt += %d, 
len += %d",
 parentid, n, len);

And see if you actually get any of those lines in your dmesg?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen


On 5 January 2019 11:44:44 CET, Jonathan Morton  wrote:
>> On 5 Jan, 2019, at 12:34 am, Toke Høiland-Jørgensen 
>wrote:
>> 
>> This basically means that we can't use CAKE as a leaf qdisc with GSO
>> splitting as it stands currently. I *think* the solution is for CAKE
>to
>> notify its parents; could you try the patch below and see if it
>helps?
>
>Is this also a problem on current kernels, or only older ones?

Newer ones as well (I assume - haven't tested).

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 5, 2019, at 2:35 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Pete Heist  writes:
>> 
>>>> On Jan 5, 2019, at 2:10 PM, Toke Høiland-Jørgensen  wrote:
>>>> 
>>>> Hmm, that's odd. Could you try adding this debugging line in
>>>> adjust_parent_qlen(), right before the sch->q.qlen += n line:
>>>> 
>>>>net_info_ratelimited("Adjusting parent qdisc %d with pkt += %d, 
>>>> len += %d",
>>>> parentid, n, len);
>>>> 
>>>> And see if you actually get any of those lines in your dmesg?
>>> 
>>> I do see the messages twice, then not after that in the rest of the
>>> output...
>> 
>> Right. Looking at the HFSC code some more, I think the bug is actually
>> caused by another, but related, interaction between HFSC and CAKE.
>> 
>> Specifically, this line:
>> 
>> https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1605
>> 
>> where HFSC checks whether the child queue len is 1, which it interprets
>> as the event that activates that queue. However, because CAKE splits the
>> packet, this check will fail, and the HFSC class will not be activated.
>> This also explains why you only see the bug with HFSC, and not with HTB
>> (although I do think that we still need to update the hierarchy).
>> 
>> The good news it that it is a fairly simple to fix in HFSC. The bad news
>> is that it's something that's hard to work around from the out-of-tree
>> CAKE...
>
> Aha, well, I wonder if we’ll see this problem with other qdiscs- maybe
> cbq, if I ever get a chance to try it (not hurrying yet). Ideally this
> interaction between qdiscs would be clarified somewhere, at some
> point. :)
>
> Thanks a lot for doing the discovery though!

You're welcome, and thanks for you help :)

> We may not have hfsc+cake with GSO splitting on older kernels very
> soon, but what should we do with this? There’s nobody in MAINTAINERS
> for hfsc, so we may not get much of a response to any bug
> submissions...

$ ./scripts/get_maintainer.pl net/sched/sch_hfsc.c 
Jamal Hadi Salim  (maintainer:TC subsystem)
Cong Wang  (maintainer:TC subsystem)
Jiri Pirko  (maintainer:TC subsystem)
"David S. Miller"  (maintainer:NETWORKING [GENERAL])
net...@vger.kernel.org (open list:TC subsystem)

I'll submit a patch sometime next week, and also look into the qlen
adjustment for CAKE GSO splitting...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

> Quick update to the trace because I had to apply the patch manually
> and missed one line to remove (qdisc_tree_reduce_backlog...), just so
> it doesn’t through off the addresses for you, but it still does the
> same thing:

Ah, my bad; you need to replace

qdisc_match_from_root(qdisc_dev(sch), TC_H_MAJ(parentid));

with

qdisc_match_from_root(qdisc_dev(sch)->qdisc, TC_H_MAJ(parentid));

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 5, 2019, at 2:10 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Hmm, that's odd. Could you try adding this debugging line in
>> adjust_parent_qlen(), right before the sch->q.qlen += n line:
>> 
>>  net_info_ratelimited("Adjusting parent qdisc %d with pkt += %d, 
>> len += %d",
>>   parentid, n, len);
>> 
>> And see if you actually get any of those lines in your dmesg?
>
>
> I do see the messages twice, then not after that in the rest of the
> output...

Right. Looking at the HFSC code some more, I think the bug is actually
caused by another, but related, interaction between HFSC and CAKE.

Specifically, this line:

https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1605

where HFSC checks whether the child queue len is 1, which it interprets
as the event that activates that queue. However, because CAKE splits the
packet, this check will fail, and the HFSC class will not be activated.
This also explains why you only see the bug with HFSC, and not with HTB
(although I do think that we still need to update the hierarchy).

The good news it that it is a fairly simple to fix in HFSC. The bad news
is that it's something that's hard to work around from the out-of-tree
CAKE...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

> That first bug report looks decidedly similar to mine, but Toke would
> have to comment on the specifics. So far I see the patch to
> sch_codel.c you mentioned and another two-liner to remove the warning
> in hfsc.c (https://patchwork.ozlabs.org/patch/933611/). It would be
> really good to know that that warning is truly bogus, that it wasn’t
> put there by the author for good reason, as Toke may have been
> thinking of a different way to fix hfsc.

Well, it's the same WARN_ON(), and if that patch had been applied,
debugging our issue would have been a lot harder, I think. But the
underlying issue is different, and we still need to fix HFSC (and
probably CAKE as well).

> Thanks for bringing this up! I see that I ought to search
> OpenWRT/kernel.org next time… :)

Yeah, nice find. I'll make sure that CAKE doesn't suffer from the same
issue that https://www.spinics.net/lists/netdev/msg450655.html fixes for
CoDel while I'm writing patches... :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-05 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 5, 2019, at 9:10 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Well, it's the same WARN_ON(), and if that patch had been applied,
>> debugging our issue would have been a lot harder, I think.
>
> Yikes, this is what I mean. I’d rather suffer the warning than be
> troubleshooting flaky behavior. That patch is applied in the latest
> kernel, so hopefully it’s the right thing.

Well, if it causes false positives, getting rid of it is probably worth
it just to avoid spurious bug reports :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-06 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 5, 2019, at 11:27 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Pete Heist  writes:
>> 
>>>> On Jan 5, 2019, at 9:10 PM, Toke Høiland-Jørgensen  wrote:
>>>> 
>>>> Well, it's the same WARN_ON(), and if that patch had been applied,
>>>> debugging our issue would have been a lot harder, I think.
>>> 
>>> Yikes, this is what I mean. I’d rather suffer the warning than be
>>> troubleshooting flaky behavior. That patch is applied in the latest
>>> kernel, so hopefully it’s the right thing.
>> 
>> Well, if it causes false positives, getting rid of it is probably worth
>> it just to avoid spurious bug reports :)
>
> If it helps finds bugs, I’d rather know about it.
>
> But, a warning once in a while might have been better than a repeated
> one that sometimes makes a hard reboot necessary, causing need for a
> manual, offline fsck in order to boot again. Just sayin’… ;)

Yes, that is why WARN_ON tends to be frowned upon ;)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-06 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

> Lastly, is using cake as a leaf to htb risky until a fix is made? I’ve
> been doing that for a while without any apparent issues, though I’m
> hesitating now to try that in a production environment.

Hmm, that's a good question. I would expect so; but I would also expect
the issue to show up pretty much straight away, so if you haven't hit it
yet, I may be wrong. I'll do some more digging... Should probably also
try to replicate all this stuff on my own machine :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic

2019-01-18 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>>> Yes, exactly. Would be interesting to hear what Jonathan, Toke and
>>> others think. I want to see if fairness is preserved in this case with
>>> sparse flows only. Could flent do this?
>> 
>> Well, sparse flows are (by definition) not building a queue, so it
>> doesn't really make sense to talk about fairness for them. How would you
>> measure that?
>> 
>> This is also the reason I agree that they shouldn't be counted for host
>> fairness calculation purposes, BTW...
>
> The trick is that we need to keep fairness of the deficit
> replenishments, which occur for sparse flows as well as bulk ones, but
> in smaller amounts. The number of active flows is presently the
> stand-in for this. It's possible to have a host backlogged with
> hundreds of new flows which are, by definition, sparse.

Right, there's some care needed to ensure we don't get weird behaviour
during transients such as flow startup.

> I'm still trying to get my head around how the modified code works in
> detail.  It's possible that a different implementation would either be
> more concise and readable, or better model what is actually needed.
> But I can't tell until I grok it.

Cool, good to know you are on it; I'm happy to wait until you've had
some time to form an opinion on this :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic

2019-01-18 Thread Toke Høiland-Jørgensen
Sebastian Moeller  writes:

> Hi Toke,
>
>> On Jan 18, 2019, at 14:33, Toke Høiland-Jørgensen  wrote:
>> 
>> Georgios Amanakis  writes:
>> 
>>> Yes, exactly. Would be interesting to hear what Jonathan, Toke and
>>> others think. I want to see if fairness is preserved in this case with
>>> sparse flows only. Could flent do this?
>> 
>> Well, sparse flows are (by definition) not building a queue, so it
>> doesn't really make sense to talk about fairness for them. How would you
>> measure that?
>> 
>> This is also the reason I agree that they shouldn't be counted for host
>> fairness calculation purposes, BTW...
>
> That leads to a question (revealing my lack of detailed knowledge) if
> there is a sufficient number of new flows (that should qualify as
> new/sparse) that servicing all of them takes longer than each queue
> accumulating new packets, at what point in time are these flows
> considered "unworthy" of sparse flow boosting? Or differetly how i
> cake going to deal with a UDP flood where the 5 tuple hash is
> different for all packets (say by spoofing ports or randomly picking
> dst addresses)?

Well, what is considered a sparse flow is a function of both the flow
rate itself, *as well as* the link rate, number of competing flows, etc.
So a flow can be sparse on one link, but turn into a bulk flow on
another because that link has less capacity.

I explore this in some detail here:
https://doi.org/10.1109/LCOMM.2018.2871457

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic

2019-01-18 Thread Toke Høiland-Jørgensen
George Amanakis  writes:

> A better version of the patch for testing.

So basically, you're changing the host fairness algorithm to only
consider bulk flows instead of all active flows to that host, right?
Seems reasonable to me. Jonathan, any opinion?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic

2019-01-18 Thread Toke Høiland-Jørgensen
Georgios Amanakis  writes:

> Yes, exactly. Would be interesting to hear what Jonathan, Toke and
> others think. I want to see if fairness is preserved in this case with
> sparse flows only. Could flent do this?

Well, sparse flows are (by definition) not building a queue, so it
doesn't really make sense to talk about fairness for them. How would you
measure that?

This is also the reason I agree that they shouldn't be counted for host
fairness calculation purposes, BTW...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] Defending my thesis on Friday - live stream available

2018-11-21 Thread Toke Høiland-Jørgensen
Hey everyone

I'm defending my PhD thesis this Friday (Nov 23rd), which contains
content relevant to all three lists (hence the three-way cross post).

In case anyone is interested in watching, I'm setting up a live stream
on Youtube, see link below. The defence starts at 09:15 CET (UTC+1), but
the video should be available for later viewing as well.

Blog post with embedded video, and links to slides and the full thesis:
https://blog.tohojo.dk/2018/11/nearing-the-end-thesis-defence.html

Direct Youtube link: https://youtu.be/upvx6rpSLSw

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Defending my thesis on Friday - live stream available

2018-11-21 Thread Toke Høiland-Jørgensen
Georgios Amanakis  writes:

> Thank you very much for doing this, Toke! All the best of luck on
> Friday!!!

Thanks! :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Emulating Bufferbloat

2018-11-22 Thread Toke Høiland-Jørgensen
Fabian Ruffy  writes:

> Hello,
>
> this is a somewhat esoteric question. I am trying to actually force 
> bufferbloat
> in an emulation setup I am using. I set up a dumbbell topology and push 
> traffic
> through it, causing congestion at the central link. I use this setup to 
> compare
> congestion avoidance algorithms such as DCTCP to other solutions.
> This has worked nicely with the 4.18 kernel. However, after upgrading to 4.19 
> I
> cannot reproduce bufferbloat anymore. The traffic (even UDP packets) is
> perfectly rate limited and I never see any congestion happening. This is 
> great,
> but in practice it prevents me from prototyping algorithms.

Ha, that's awesome! :D

> My interface configuration for bottlenecked links is:
>
> qdisc tbf 5: dev OBcbnsw1-eth2 root refcnt 2 rate 10Mbit burst 15000b lat
> 12.0ms
>  Sent 6042 bytes 51 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc netem 10: dev OBcbnsw1-eth2 parent 5:1 limit 500
>  Sent 6042 bytes 51 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>
>
> I have the suspicion that it is related to the CAKE changes in the 4.19 
> kernel,
> but I am not exactly sure. I am not using tc cake at all. Do you maybe know
> what could cause this behavior? Apologies if this is the wrong mailing
> list.

My guess would be changes in the TCP stack; can't point you to anything
specific off the top of my head, though...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-08 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

> Your patch in the latest kernels looks simpler. Bringing the patch
> back to prior kernel versions would be appreciated, but I can
> understand how 3.16 becomes less and less relevant as time goes on,
> although, it’s not at end of life yet. :)

Could you try the latest git version of the out-of-tree version of cake;
that has the same patch as upstream, and I *think* it will work on old
kernels as well (for the HTB issue; can't fix HFSC without patching it).

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-07 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 6, 2019, at 9:56 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Pete Heist  writes:
>> 
>>> Lastly, is using cake as a leaf to htb risky until a fix is made? I’ve
>>> been doing that for a while without any apparent issues, though I’m
>>> hesitating now to try that in a production environment.
>> 
>> Hmm, that's a good question. I would expect so; but I would also expect
>> the issue to show up pretty much straight away, so if you haven't hit it
>> yet, I may be wrong. I'll do some more digging... Should probably also
>> try to replicate all this stuff on my own machine :)
>
>
> Ok, after what I’m seeing on my APU1 tests on 3.16.7, I’m definitely
> not putting split GSO into production. I just turned it on and off
> three times and here’s what I got:
>
> Split GSO on:
>
> https://www.heistp.net/downloads/htb_split_gso/htb_cake_split_gso.svg
> https://www.heistp.net/downloads/htb_split_gso/htb_cake_split_gso2.svg
> https://www.heistp.net/downloads/htb_split_gso/htb_cake_split_gso3.svg
>
> Split GSO off:
>
> https://www.heistp.net/downloads/htb_split_gso/htb_cake_no_split_gso.svg
> https://www.heistp.net/downloads/htb_split_gso/htb_cake_no_split_gso2.svg
> https://www.heistp.net/downloads/htb_split_gso/htb_cake_no_split_gso3.svg
>
> I’ve seen these square waves before with htb and wondered where they
> came from, and I think we may finally have an answer! What manner of
> thing causes this I don’t know, but there’s a chance you may end up
> finding out… :)

Is this without the patch to CAKE that adjusts the qlen? And have you
tried running with that patch (with HTB)?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH 1/4] sched: Avoid dereferencing skb pointer after child enqueue

2019-01-07 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

Parent qdiscs may dereference the pointer to the enqueued skb after
enqueue. However, both CAKE and TBF call consume_skb() on the original skb
when splitting GSO packets, leading to a potential use-after-free in the
parent. Fix this by avoiding dereferencing the skb pointer after enqueueing
to the child.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cbs.c|  3 ++-
 net/sched/sch_drr.c|  3 ++-
 net/sched/sch_dsmark.c |  3 ++-
 net/sched/sch_hfsc.c   |  5 ++---
 net/sched/sch_htb.c|  3 ++-
 net/sched/sch_prio.c   |  3 ++-
 net/sched/sch_qfq.c| 16 +---
 net/sched/sch_tbf.c|  3 ++-
 8 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index e689e11b6d0f..c6a502933fe7 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -88,13 +88,14 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct 
Qdisc *sch,
 struct Qdisc *child,
 struct sk_buff **to_free)
 {
+   unsigned int len = qdisc_pkt_len(skb);
int err;
 
err = child->ops->enqueue(skb, child, to_free);
if (err != NET_XMIT_SUCCESS)
return err;
 
-   qdisc_qstats_backlog_inc(sch, skb);
+   sch->qstats.backlog += len;
sch->q.qlen++;
 
return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index cdebaed0f8cf..feaf47178653 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -350,6 +350,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, 
struct Qdisc *sch,
 static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
   struct sk_buff **to_free)
 {
+   unsigned int len = qdisc_pkt_len(skb);
struct drr_sched *q = qdisc_priv(sch);
struct drr_class *cl;
int err = 0;
@@ -376,7 +377,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
cl->deficit = cl->quantum;
}
 
-   qdisc_qstats_backlog_inc(sch, skb);
+   sch->qstats.backlog += len;
sch->q.qlen++;
return err;
 }
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index f6f480784bc6..42471464ded3 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -199,6 +199,7 @@ static struct tcf_block *dsmark_tcf_block(struct Qdisc 
*sch, unsigned long cl,
 static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch,
  struct sk_buff **to_free)
 {
+   unsigned int len = qdisc_pkt_len(skb);
struct dsmark_qdisc_data *p = qdisc_priv(sch);
int err;
 
@@ -271,7 +272,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
return err;
}
 
-   qdisc_qstats_backlog_inc(sch, skb);
+   sch->qstats.backlog += len;
sch->q.qlen++;
 
return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index b18ec1f6de60..6bb8f73a8473 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1539,6 +1539,7 @@ hfsc_dump_qdisc(struct Qdisc *sch, struct sk_buff *skb)
 static int
 hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
 {
+   unsigned int len = qdisc_pkt_len(skb);
struct hfsc_class *cl;
int uninitialized_var(err);
 
@@ -1560,8 +1561,6 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, 
struct sk_buff **to_free)
}
 
if (cl->qdisc->q.qlen == 1) {
-   unsigned int len = qdisc_pkt_len(skb);
-
if (cl->cl_flags & HFSC_RSC)
init_ed(cl, len);
if (cl->cl_flags & HFSC_FSC)
@@ -1576,7 +1575,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, 
struct sk_buff **to_free)
 
}
 
-   qdisc_qstats_backlog_inc(sch, skb);
+   sch->qstats.backlog += len;
sch->q.qlen++;
 
return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 58b449490757..30f9da7e1076 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -581,6 +581,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
   struct sk_buff **to_free)
 {
int uninitialized_var(ret);
+   unsigned int len = qdisc_pkt_len(skb);
struct htb_sched *q = qdisc_priv(sch);
struct htb_class *cl = htb_classify(skb, sch, );
 
@@ -610,7 +611,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
htb_activate(q, cl);
}
 
-   qdisc_qstats_backlog_inc(sch, skb);
+   sch->qstats.backlog += len;
sch->q.qlen++;
return NET_XMIT_SUCCESS;
 }
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index cdf68706e40f..847141cd900f 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -72,6 +72,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int 
*qer

[Cake] [PATCH 4/4] sch_cake: Correctly update parent qlen when splitting GSO packets

2019-01-07 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

To ensure parent qdiscs have the same notion of the number of enqueued
packets even after splitting a GSO packet, update the qdisc tree with the
number of packets that was added due to the split.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index b910cd5c56f7..73940293700d 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1667,7 +1667,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) {
struct sk_buff *segs, *nskb;
netdev_features_t features = netif_skb_features(skb);
-   unsigned int slen = 0;
+   unsigned int slen = 0, numsegs = 0;
 
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
if (IS_ERR_OR_NULL(segs))
@@ -1683,6 +1683,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
flow_queue_add(flow, segs);
 
sch->q.qlen++;
+   numsegs++;
slen += segs->len;
q->buffer_used += segs->truesize;
b->packets++;
@@ -1696,7 +1697,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
sch->qstats.backlog += slen;
q->avg_window_bytes += slen;
 
-   qdisc_tree_reduce_backlog(sch, 1, len);
+   qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);
consume_skb(skb);
} else {
/* not splitting */
-- 
2.20.1

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH 3/4] sch_api: Allow reducing queue backlog by a negative value

2019-01-07 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

With GSO splitting in sch_cake, we can decrease as well as increase the
qlen. To make it possible to signal this up the qdisc tree, change
qdisc_tree_reduce_backlog() to accept signed integer values as the number
of packets and bytes to reduce the backlog by.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/sch_generic.h | 3 +--
 net/sched/sch_api.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9481f2c142e2..7a4957599874 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -580,8 +580,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue 
*dev_queue,
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
 void qdisc_put_unlocked(struct Qdisc *qdisc);
-void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
-  unsigned int len);
+void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, int n, int len);
 #ifdef CONFIG_NET_SCHED
 int qdisc_offload_dump_helper(struct Qdisc *q, enum tc_setup_type type,
  void *type_data);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 7e4d1ccf4c87..03e26e8d0ec9 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -758,8 +758,7 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
return 0;
 }
 
-void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
-  unsigned int len)
+void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 {
bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
const struct Qdisc_class_ops *cops;
-- 
2.20.1

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH 0/4] sched: Fix qdisc interactions exposed by using sch_cake as a leaf qdisc

2019-01-07 Thread Toke Høiland-Jørgensen
This series fixes a couple of issues exposed by running sch_cake as a
leaf qdisc in an HFSC tree, which were discovered and reported by Pete
Heist. The interaction between CAKE's GSO splitting and the parent
qdisc's notion of its own queue length could cause queue stalls. While
investigating the report, I also noticed that several qdiscs would
dereference the skb pointer after dequeue, which is potentially
problematic since the GSO splitting code also frees the original skb.

See the individual patches in the series for details.

Toke Høiland-Jørgensen (4):
  sched: Avoid dereferencing skb pointer after child enqueue
  sched: Fix detection of empty queues in child qdiscs
  sch_api: Allow reducing queue backlog by a negative value
  sch_cake: Correctly update parent qlen when splitting GSO packets

 include/net/sch_generic.h |  3 +--
 net/sched/sch_api.c   |  3 +--
 net/sched/sch_cake.c  |  5 +++--
 net/sched/sch_cbs.c   |  3 ++-
 net/sched/sch_drr.c   |  7 +--
 net/sched/sch_dsmark.c|  3 ++-
 net/sched/sch_hfsc.c  |  9 +
 net/sched/sch_htb.c   |  3 ++-
 net/sched/sch_prio.c  |  3 ++-
 net/sched/sch_qfq.c   | 20 
 net/sched/sch_tbf.c   |  3 ++-
 11 files changed, 37 insertions(+), 25 deletions(-)

-- 
2.20.1

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-04 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 4, 2019, at 10:34 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Pete Heist  writes:
>> 
>>> Ok, the lockup goes away if you use no-split-gso on the cake qdiscs for the 
>>> default traffic (noted below in the drr and hfsc cases with "!!! must use 
>>> no-split-gso here !!!"). Only I’d like my 600 μs back. :)
>>> 
>>> This smells of a bug Toke fixed on Sep 12, 2018 in 
>>> 42e87f12ea5c390bf5eeb658c942bc810046160a, but then reverted in the next 
>>> commit because it was fixed upstream. However, if I re-apply that commit, 
>>> it still doesn’t fix it.
>>> 
>>> Perhaps there are more cases where skb_reset_mac_len(skb) needs to be 
>>> called somewhere for VLAN support?
>>> 
>>> I managed to capture some output from what happens to hfsc:
>>> 
>>> [  683.864456] [ cut here ]
>>> [  683.869116] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1427
>>> 0xf9ced4ef()
>> 
>> So this seems to be this line:
>> 
>> WARN_ON(next_time == 0);
>> 
>> See 
>> https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1427
>> 
>> Which seems to indicate that HFSC can't find the next class to schedule.
>> Not entirely sure why, nor why this only happens with CAKE as a qdisc.
>> But I don't think it's actually an infinite loop that's causing it...
>
>
> Ok, fwiw one doesn’t actually need a one-armed router or VLANs to reproduce 
> this. Just do this:
>
> tc qdisc add dev $IFACE root handle 1: hfsc default 1
> tc class add dev $IFACE parent 1: classid 1:1 hfsc ls rate $RATE ul rate $RATE
> tc qdisc add dev $IFACE parent 1:1 cake # add split-gso here, or else…
>
> I’ve tried it as far as 4.9.0-8, but no farther. It’s not much of a
> priority for me now that I have a workaround for it...

Ah, I think I know what's going on:

On enqueue, HFSC will increase its own internal notion of qlen (q.qlen++
in hfsc_enqueue()), and in dequeue, it will return immediately if this
qlen is 0. Now, with GSO packet splitting, a single packet on enqueue
can turn in to several packets on dequeue, which means that HFSC will
think the queue is empty after dequeueing the first on, and refuse to
dequeue any more packets.

This basically means that we can't use CAKE as a leaf qdisc with GSO
splitting as it stands currently. I *think* the solution is for CAKE to
notify its parents; could you try the patch below and see if it helps?

-Toke

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index b910cd5c56f7..77b0ebd673ac 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1617,6 +1617,30 @@ static u32 cake_classify(struct Qdisc *sch, struct 
cake_tin_data **t,
 
 static void cake_reconfigure(struct Qdisc *sch);
 
+void adjust_parent_qlen(struct Qdisc *sch, unsigned int n,
+  unsigned int len)
+{
+   u32 parentid;
+   if (n == 0 && len == 0)
+   return;
+   rcu_read_lock();
+   while ((parentid = sch->parent)) {
+   if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
+   break;
+
+   if (sch->flags & TCQ_F_NOPARENT)
+   break;
+   sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
+   if (sch == NULL) {
+   WARN_ON_ONCE(parentid != TC_H_ROOT);
+   break;
+   }
+   sch->q.qlen += n;
+   sch->qstats.backlog += len;
+   }
+   rcu_read_unlock();
+}
+
 static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
 {
@@ -1667,7 +1691,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) {
struct sk_buff *segs, *nskb;
netdev_features_t features = netif_skb_features(skb);
-   unsigned int slen = 0;
+   unsigned int slen = 0, numsegs = 0;
 
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
if (IS_ERR_OR_NULL(segs))
@@ -1684,6 +1708,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
 
sch->q.qlen++;
slen += segs->len;
+   numsegs++;
q->buffer_used += segs->truesize;
b->packets++;
segs = nskb;
@@ -1696,7 +1721,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
sch->qstats.backlog += slen;
q->avg_window_bytes += slen;
 
-   qdisc_tree_reduce_backlog(sch, 1, len);
+   adjust_parent_qlen(sch, numsegs - 1, slen - len);
consume_skb(skb);
} else {
/* not splitting */
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] dual-src/dsthost unfairness, only with bi-directional traffic

2019-01-03 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 3, 2019, at 12:03 PM, Toke Høiland-Jørgensen  wrote:
>> 
>>> Jon, is there anything I can check by instrumenting the code somewhere
>>> specific?
>> 
>> Is there any way you could test with a bulk UDP flow? I'm wondering
>> whether this is a second-order effect where TCP ACKs are limited in a
>> way that cause the imbalance? Are you using ACK compression?
>
>
> Not using ack-filter, if that’s what’s meant by ACK compression. I
> thought about the TCP ACK traffic, but would be very surprised if that
> amount of ACK traffic could cause that large of an imbalance, although
> it’s worth trying to find out.
>
> I tried iperf3 in UDP mode, but cake is treating these flows
> aggressively. I get the impression that cake penalizes flows heavily
> that do not respond to congestion control signals. If I pit one 8 TCP
> flows against a single UDP flow at 40mbit, the UDP flow goes into a
> death spiral with increasing drops over time (iperf3 output attached).
>
> I’m not sure there’d be any way I can test fairness with iperf3 in UDP
> mode. We’d need something that has some congestion control feedback,
> right? Otherwise, I don’t think there are any rates I can choose to
> both reach saturation and not be severely punished. And if it has
> congestion control feedback, it has the ACK-like traffic we’re trying
> to avoid for the test. :)

Try setting cake to 'interplanetary' - that should basically turn off
the AQM dropping...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH 0/4] sched: Fix qdisc interactions exposed by using sch_cake as a leaf qdisc

2019-01-09 Thread Toke Høiland-Jørgensen
David Miller  writes:

> From: Toke Høiland-Jørgensen 
> Date: Mon,  7 Jan 2019 20:47:29 +0100
>
>> This series fixes a couple of issues exposed by running sch_cake as a
>> leaf qdisc in an HFSC tree, which were discovered and reported by Pete
>> Heist. The interaction between CAKE's GSO splitting and the parent
>> qdisc's notion of its own queue length could cause queue stalls. While
>> investigating the report, I also noticed that several qdiscs would
>> dereference the skb pointer after dequeue, which is potentially
>> problematic since the GSO splitting code also frees the original skb.
>> 
>> See the individual patches in the series for details.
>
> Toke, can you please resubmit this without patch #3.
>
> If you want to push for patch #3 still, it is much easier to submit
> it separately.

Can do :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-09 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

>> On Jan 8, 2019, at 11:27 PM, Toke Høiland-Jørgensen  wrote:
>> 
>> Pete Heist  writes:
>> 
>>> Your patch in the latest kernels looks simpler. Bringing the patch
>>> back to prior kernel versions would be appreciated, but I can
>>> understand how 3.16 becomes less and less relevant as time goes on,
>>> although, it’s not at end of life yet. :)
>> 
>> Could you try the latest git version of the out-of-tree version of cake;
>> that has the same patch as upstream, and I *think* it will work on old
>> kernels as well (for the HTB issue; can't fix HFSC without patching it).
>
> Yes, it compiles fine on 3.16, and gives statistically identical
> results for HTB. :)

Excellent! That's at least one bug squashed :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH 1/4] sched: Avoid dereferencing skb pointer after child enqueue

2019-01-09 Thread Toke Høiland-Jørgensen
Cong Wang  writes:

> On Mon, Jan 7, 2019 at 11:50 AM Toke Høiland-Jørgensen  wrote:
>> @@ -1254,7 +1256,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct 
>> Qdisc *sch,
>> if (cl->qdisc->q.qlen != 1) {
>> if (unlikely(skb == cl->qdisc->ops->peek(cl->qdisc)) &&
>
>
> Isn't this comparison problematic too? While you are on it...

Well, I was only looking at safety issues, and since it's not
dereferencing the pointer, that's not really an issue here. The check is
just going to always fail if GSO splitting is enabled. Which I'm not
actually sure is an error in this case?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake infinite loop(?) with hfsc on one-armed router

2019-01-04 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

> Ok, the lockup goes away if you use no-split-gso on the cake qdiscs for the 
> default traffic (noted below in the drr and hfsc cases with "!!! must use 
> no-split-gso here !!!"). Only I’d like my 600 μs back. :)
>
> This smells of a bug Toke fixed on Sep 12, 2018 in 
> 42e87f12ea5c390bf5eeb658c942bc810046160a, but then reverted in the next 
> commit because it was fixed upstream. However, if I re-apply that commit, it 
> still doesn’t fix it.
>
> Perhaps there are more cases where skb_reset_mac_len(skb) needs to be called 
> somewhere for VLAN support?
>
> I managed to capture some output from what happens to hfsc:
>
> [  683.864456] [ cut here ]
> [  683.869116] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1427
> 0xf9ced4ef()

So this seems to be this line:

WARN_ON(next_time == 0);

See https://elixir.bootlin.com/linux/v3.16.7/source/net/sched/sch_hfsc.c#L1427

Which seems to indicate that HFSC can't find the next class to schedule.
Not entirely sure why, nor why this only happens with CAKE as a qdisc.
But I don't think it's actually an infinite loop that's causing it...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Munin-Plugins

2018-09-13 Thread Toke Høiland-Jørgensen
Ruben  writes:

> Hey guys, 
>
> I've already mentioned this in a response to dtaht on GitHub, but here
> again for everyone:
>
> I was wondering if it's possible to extend the tin statistics by
> packets for backlog.

Why do you need packets when there's already bytes?

> Also I haven't found details on the man page, but haven't looked in
> the code how exactly flows are defined for the new 2 level fair
> queueings.

A flow is a flow (hash on the regular 5-tuple). Unless you install
custom tc filters, in which case a flow could be anything you want it to
be...

> I would like to show hosts as well as flows in a graph, in the various
> states as summary as well as per tin. But hosts aren't shown in the
> statistics. But they could be helpful to see unexpected rises if
> someone is abusing the system.

Hmm, guess we could do "number of hosts per tin" accounting as well as
flows. But it would be bloating up the already bloated statistics. But
on the other hand, they are already quite bloated, so another pair of
stats maybe wouldn't make much of a difference?

Does anyone else have an opinion on this?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] gso_segment: Reset skb->mac_len after modifying network header

2018-09-13 Thread Toke Høiland-Jørgensen


On 13 September 2018 16:11:18 CEST, Eric Dumazet  wrote:
>
>
>On 09/11/2018 03:19 PM, Toke Høiland-Jørgensen wrote:
>> When splitting a GSO segment that consists of encapsulated packets,
>the
>> skb->mac_len of the segments can end up being set wrong, causing
>packet
>> drops in particular when using act_mirred and ifb interfaces in
>> combination with a qdisc that splits GSO packets.
>> 
>> This happens because at the time skb_segment() is called,
>network_header
>> will point to the inner header, throwing off the calculation in
>> skb_reset_mac_len(). The network_header is subsequently adjust by the
>> outer IP gso_segment handlers, but they don't set the mac_len.
>> 
>> Fix this by adding skb_reset_mac_len() calls to both the IPv4 and
>IPv6
>> gso_segment handlers, after they modify the network_header.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen 
>
>Looks good but I would have appreciated a thanks or something
>after the help I gave on this problem.

Yes, of course. Should have mentioned that in the commit message.

My apologies, won't happen again. And thanks! :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net v2] gso_segment: Reset skb->mac_len after modifying network header

2018-09-13 Thread Toke Høiland-Jørgensen
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.

This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.

Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.

Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.

Acked-by: Dave Taht 
Reviewed-by: Eric Dumazet 
Signed-off-by: Toke Høiland-Jørgensen 
---
v2:
  - Properly credit Eric for his help
  - Add review and ack tags

 net/ipv4/af_inet.c | 1 +
 net/ipv6/ip6_offload.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 20fda8fb8ffd..1fbe2f815474 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1377,6 +1377,7 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
if (encap)
skb_reset_inner_headers(skb);
skb->network_header = (u8 *)iph - skb->head;
+   skb_reset_mac_len(skb);
} while ((skb = skb->next));
 
 out:
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 37ff4805b20c..c7e495f12011 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -115,6 +115,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
payload_len = skb->len - nhoff - sizeof(*ipv6h);
ipv6h->payload_len = htons(payload_len);
skb->network_header = (u8 *)ipv6h - skb->head;
+   skb_reset_mac_len(skb);
 
if (udpfrag) {
int err = ip6_find_1stfragopt(skb, );
-- 
2.18.0

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH iproute2] q_cake: Also print nonat, nowash and no-ack-filter keywords

2018-09-14 Thread Toke Høiland-Jørgensen
Similar to the previous patch for no-split-gso, the negative keywords for
'nat', 'wash' and 'ack-filter' were not printed either. Add those well.

Signed-off-by: Toke Høiland-Jørgensen 
---
 tc/q_cake.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tc/q_cake.c b/tc/q_cake.c
index 077bf84f..e827e3f1 100644
--- a/tc/q_cake.c
+++ b/tc/q_cake.c
@@ -468,6 +468,8 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
 
if (nat)
print_string(PRINT_FP, NULL, "nat ", NULL);
+   else
+   print_string(PRINT_FP, NULL, "nonat ", NULL);
print_bool(PRINT_JSON, "nat", NULL, nat);
 
if (tb[TCA_CAKE_WASH] &&
@@ -508,6 +510,8 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
 
if (wash)
print_string(PRINT_FP, NULL, "wash ", NULL);
+   else
+   print_string(PRINT_FP, NULL, "nowash ", NULL);
print_bool(PRINT_JSON, "wash", NULL, wash);
 
if (ingress)
@@ -520,7 +524,7 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
else if (ack_filter == CAKE_ACK_FILTER)
print_string(PRINT_ANY, "ack-filter", "ack-filter ", "enabled");
else
-   print_string(PRINT_JSON, "ack-filter", NULL, "disabled");
+   print_string(PRINT_ANY, "ack-filter", "no-ack-filter ", 
"disabled");
 
if (split_gso)
print_string(PRINT_FP, NULL, "split-gso ", NULL);
-- 
2.18.0

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Munin-Plugins

2018-09-19 Thread Toke Høiland-Jørgensen
Ruben  writes:

> Hey Toke,
>
> Am 13.09.2018 21:12 schrieb Toke Høiland-Jørgensen :
>
>
> Ruben  writes:
>
> > Hey Toke,
> >
> > Thanks for your fast response!
> >
> > Am 13.09.2018 12:27 schrieb Toke Høiland-Jørgensen :
> >
> >
> > Ruben  writes:
> >
> > > Hey guys,
> > >
> > > I've already mentioned this in a response to dtaht on GitHub, but
> here
> > > again for everyone:
> > >
> > > I was wondering if it's possible to extend the tin statistics by
> > > packets for backlog.
> >
> > Why do you need packets when there's already bytes?
> >
> > Easy: dtaht requested a packets graph with ecn marks, which is also
> > packets, so backlog as bytes do not fit, backlog as packets do.
> >
> > The idea was to do a multi-graph which is one graph with combined
> > stats for all tins and sub graphs for all tins.
> >
> > On the main graph a backlog in packets is available, but I would need
> > to leave out the backlog for the tins, which is somewhat confusing.
>
> Why not just do both backlogs in bytes?
>
> There's no counter for ecn marked packets in bytes, so it's impossible to
> implement it that way, too.
>
> ECN-Marks is in packets, so everything else need to be in packets as
> well.

Hmm, so the obvious follow-up question would be "why do you need to have
backlog and number of drops on the same graph?" :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Munin-Plugins

2018-09-13 Thread Toke Høiland-Jørgensen
Ruben  writes:

> Hey Toke,
>
> Thanks for your fast response!
>
> Am 13.09.2018 12:27 schrieb Toke Høiland-Jørgensen :
>
>
> Ruben  writes:
>
> > Hey guys,
> >
> > I've already mentioned this in a response to dtaht on GitHub, but here
> > again for everyone:
> >
> > I was wondering if it's possible to extend the tin statistics by
> > packets for backlog.
>
> Why do you need packets when there's already bytes?
>
> Easy: dtaht requested a packets graph with ecn marks, which is also
> packets, so backlog as bytes do not fit, backlog as packets do.
>
> The idea was to do a multi-graph which is one graph with combined
> stats for all tins and sub graphs for all tins.
>
> On the main graph a backlog in packets is available, but I would need
> to leave out the backlog for the tins, which is somewhat confusing.

Why not just do both backlogs in bytes?

> > Also I haven't found details on the man page, but haven't looked in
> > the code how exactly flows are defined for the new 2 level fair
> > queueings.
>
> A flow is a flow (hash on the regular 5-tuple). Unless you install
> custom tc filters, in which case a flow could be anything you want it to
> be...
>
> > I would like to show hosts as well as flows in a graph, in the various
> > states as summary as well as per tin. But hosts aren't shown in the
> > statistics. But they could be helpful to see unexpected rises if
> > someone is abusing the system.
>
> Hmm, guess we could do "number of hosts per tin" accounting as well as
> flows. But it would be bloating up the already bloated statistics. But
> on the other hand, they are already quite bloated, so another pair of
> stats maybe wouldn't make much of a difference?
>
>
> How important is a row more or less? It's quite verbose already (what
> I like) - as long as the number is available without much additionally
> work needed.

It's the additional complexity of recording the stats internally in cake
I'm worried about, not the extra lines of output :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] SCE/CAKE coding curiosities

2019-04-02 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

> Ahoy there team Bufferboat :-)
>
> I’ve been looking at the new SCE related code and a few questions have fallen 
> out of the brain cell.
>
> 1)
> https://github.com/dtaht/sch_cake/blob/47d821f89f39c1b2216d6f65b014f609e46fc57c/sce.h#L50
> Curious as to why htons isn’t used here as per other instances in
> CAKE?

No idea about this.

> 2) Immediately below that line we have a skb length guard before
> trying to set ECT ultimately via ipv?_change_dsfield - we have other
> places in CAKE where we play with dsfield bits that don’t have an
> (obvious) guard - should they?
>
> 3) And saving the most curious for last, in cake_update_flowkeys we check the 
> protocol again (we want IPv4 packets only) but we use tc_skb_protocol(skb) 
> https://github.com/dtaht/sch_cake/blob/47d821f89f39c1b2216d6f65b014f609e46fc57c/sch_cake.c#L628
>
> tc_skb_protocol says "/* We need to take extra care in case the skb came via
>  * vlan accelerated path. In that case, use skb->vlan_proto
>  * as the original vlan header was already stripped.
>  */
> if (skb_vlan_tag_present(skb))
> return skb->vlan_proto;
> return skb->protocol;
> "
>
> What???  Should cake_handle_diffserv use tc_skb_protocol too?  and
> INET_ECN_set_sce ??

Yup, I believe you are right about both of these; nice catch! I pushed
fixes for cake_handle_diffserv(); keeping my hands off the sce bits for
now ;)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH iproute2] q_cake: Add support for setting the fwmark option

2019-04-04 Thread Toke Høiland-Jørgensen
This adds support for the newly added fwmark option to CAKE, which allows
overriding the tin selection from the per-packet firewall marks. The fwmark
field is a bitmask that is applied to the fwmark to select the tin.

Signed-off-by: Toke Høiland-Jørgensen 
---
 man/man8/tc-cake.8 | 16 
 tc/q_cake.c| 24 
 2 files changed, 40 insertions(+)

diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
index eda436e1..8c57eadd 100644
--- a/man/man8/tc-cake.8
+++ b/man/man8/tc-cake.8
@@ -91,6 +91,10 @@ TIME |
 LIMIT ]
 .br
 [
+.BR fwmark
+MASK ]
+.br
+[
 .BR ptm
 |
 .BR atm
@@ -524,6 +528,18 @@ preset on the modern Internet is firmly discouraged.
 .br
Voice (CS7, CS6, EF, VA, TOS4), 25% threshold, reduced Codel 
interval.
 
+.PP
+.B fwmark
+MASK
+.br
+   This options turns on fwmark-based overriding of CAKE's tin selection.
+If set, the option specifies a bitmask that will be applied to the fwmark
+associated with each packet. If the result of this masking is non-zero, the
+result will be right-shifted by the number of least-significant unset bits in
+the mask value, and the result will be used as a the tin number for that 
packet.
+This can be used to set policies in a firewall script that will override CAKE's
+built-in tin selection.
+
 .SH OTHER PARAMETERS
 .B memlimit
 LIMIT
diff --git a/tc/q_cake.c b/tc/q_cake.c
index e827e3f1..307a12c0 100644
--- a/tc/q_cake.c
+++ b/tc/q_cake.c
@@ -82,6 +82,7 @@ static void explain(void)
 "[ split-gso* | no-split-gso ]\n"
 "[ ack-filter | ack-filter-aggressive | no-ack-filter* ]\n"
 "[ memlimit LIMIT ]\n"
+"[ fwmark MASK ]\n"
 "[ ptm | atm | noatm* ] [ overhead N | conservative | raw* ]\n"
 "[ mpu N ] [ ingress | egress* ]\n"
 "(* marks defaults)\n");
@@ -106,6 +107,7 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, 
char **argv,
int autorate = -1;
int ingress = -1;
int overhead = 0;
+   int fwmark = -1;
int wash = -1;
int nat = -1;
int atm = -1;
@@ -332,6 +334,16 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, 
char **argv,
"Illegal value for \"memlimit\": 
\"%s\"\n", *argv);
return -1;
}
+   } else if (strcmp(*argv, "fwmark") == 0) {
+   unsigned int fwm;
+
+   NEXT_ARG();
+   if (get_u32(, *argv, 0)) {
+   fprintf(stderr,
+   "Illegal value for \"fwmark\": 
\"%s\"\n", *argv);
+   return -1;
+   }
+   fwmark = fwm;
} else if (strcmp(*argv, "help") == 0) {
explain();
return -1;
@@ -376,6 +388,9 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, 
char **argv,
if (memlimit)
addattr_l(n, 1024, TCA_CAKE_MEMORY, ,
  sizeof(memlimit));
+   if (fwmark != -1)
+   addattr_l(n, 1024, TCA_CAKE_FWMARK, ,
+ sizeof(fwmark));
if (nat != -1)
addattr_l(n, 1024, TCA_CAKE_NAT, , sizeof(nat));
if (wash != -1)
@@ -409,6 +424,7 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
struct rtattr *tb[TCA_CAKE_MAX + 1];
unsigned int interval = 0;
unsigned int memlimit = 0;
+   unsigned int fwmark = 0;
__u64 bandwidth = 0;
int ack_filter = 0;
int split_gso = 0;
@@ -507,6 +523,10 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
RTA_PAYLOAD(tb[TCA_CAKE_RTT]) >= sizeof(__u32)) {
interval = rta_getattr_u32(tb[TCA_CAKE_RTT]);
}
+   if (tb[TCA_CAKE_FWMARK] &&
+   RTA_PAYLOAD(tb[TCA_CAKE_FWMARK]) >= sizeof(__u32)) {
+   fwmark = rta_getattr_u32(tb[TCA_CAKE_FWMARK]);
+   }
 
if (wash)
print_string(PRINT_FP, NULL, "wash ", NULL);
@@ -559,6 +579,10 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
 sprint_size(memlimit, b1));
}
 
+   if (fwmark)
+   print_uint(PRINT_FP, NULL, "fwmark 0x%x ", fwmark);
+   print_0xhex(PRINT_JSON, "fwmark", NULL, fwmark);
+
return 0;
 }
 
-- 
2.21.0

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net 1/2] sch_cake: Use tc_skb_protocol() helper for getting packet protocol

2019-04-04 Thread Toke Høiland-Jørgensen
We shouldn't be using skb->protocol directly as that will miss cases with
hardware-accelerated VLAN tags. Use the helper instead to get the right
protocol number.

Reported-by: Kevin Darbyshire-Bryant 
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index acc9b9da985f..a3b55e18df04 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1519,7 +1519,7 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 
wash)
 {
u8 dscp;
 
-   switch (skb->protocol) {
+   switch (tc_skb_protocol(skb)) {
case htons(ETH_P_IP):
dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
if (wash && dscp)

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net 2/2] sch_cake: Make sure we can write the IP header before changing DSCP bits

2019-04-04 Thread Toke Høiland-Jørgensen
There is not actually any guarantee that the IP headers are valid before we
access the DSCP bits of the packets. Fix this using the same approach taken
in sch_dsmark.

Reported-by: Kevin Darbyshire-Bryant 
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index a3b55e18df04..259d97bc2abd 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1517,16 +1517,27 @@ static unsigned int cake_drop(struct Qdisc *sch, struct 
sk_buff **to_free)
 
 static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 {
+   int wlen = skb_network_offset(skb);
u8 dscp;
 
switch (tc_skb_protocol(skb)) {
case htons(ETH_P_IP):
+   wlen += sizeof(struct iphdr);
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
if (wash && dscp)
ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, 0);
return dscp;
 
case htons(ETH_P_IPV6):
+   wlen += sizeof(struct ipv6hdr);
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
if (wash && dscp)
ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, 0);

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net 0/2] sched: A few small fixes for sch_cake

2019-04-04 Thread Toke Høiland-Jørgensen
Hi Dave

Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP
diffserv fields. This series fixes those two issues, and should probably go to
in 4.19 as well. However, the previous refactoring patch means they don't apply
as-is; I can send a follow-up directly to stable if that's OK with you?

-Toke

---

Toke Høiland-Jørgensen (2):
  sch_cake: Use tc_skb_protocol() helper for getting packet protocol
  sch_cake: Make sure we can write the IP header before changing DSCP bits


 net/sched/sch_cake.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net 2/2] sch_cake: Make sure we can write the IP header before changing DSCP bits

2019-04-04 Thread Toke Høiland-Jørgensen
Stephen Hemminger  writes:

> On Thu, 04 Apr 2019 22:44:33 +0200
> Toke Høiland-Jørgensen  wrote:
>
>> Stephen Hemminger  writes:
>> 
>> > On Thu, 04 Apr 2019 15:01:33 +0200
>> > Toke Høiland-Jørgensen  wrote:
>> >  
>> >>  static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> >>  {
>> >> + int wlen = skb_network_offset(skb);  
>> >
>> > In theory this could be negative, you should handle that?
>> > Rather than calling may_pull() with a huge unsigned value.  
>> 
>> Huh, that would imply that skb->network_header points to before
>> skb->head; when does that happen?
>> 
>> Also, pskb_may_pull() does check for len > skb->len, so I guess a
>> follow-up question would be, "does it happen often enough to warrant
>> handling at this level"?
>> 
>> Also, I copied that bit from sch_dsmark, so if you really thing it needs
>> to be fixed, I guess we should fix both...
>> 
>> -Toke
>
> It should never happen just paranoid

Ah, right. Well in that case I think any overflow is handled fine inside
pskb_may_pull():

if (unlikely(len > skb->len))
return 0;


-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH for-4.19 2/3] sch_cake: Use tc_skb_protocol() helper for getting packet protocol

2019-04-05 Thread Toke Høiland-Jørgensen
Commit bbd669a868bba591ffd38b7bc75a7b361bb54b04 upstream.

We shouldn't be using skb->protocol directly as that will miss cases with
hardware-accelerated VLAN tags. Use the helper instead to get the right
protocol number.

Reported-by: Kevin Darbyshire-Bryant 
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 640f00e9f665..de92b5d81ca6 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1512,7 +1512,7 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 
wash)
 {
u8 dscp;
 
-   switch (skb->protocol) {
+   switch (tc_skb_protocol(skb)) {
case htons(ETH_P_IP):
dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
if (wash && dscp)

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH for-4.19 0/3] Backport of series: 'sched: A few small fixes for sch_cake' from 5.1

2019-04-05 Thread Toke Høiland-Jørgensen
Hi Greg

This series contains backports for a couple of fixes to sch_cake that was just
merged for 5.1. This series backports an earlier refactoring commit, which makes
the fixes themselves apply cleanly from upstream.

-Toke

---

Toke Høiland-Jørgensen (3):
  sch_cake: Simplify logic in cake_select_tin()
  sch_cake: Use tc_skb_protocol() helper for getting packet protocol
  sch_cake: Make sure we can write the IP header before changing DSCP bits


 net/sched/sch_cake.c |   57 +-
 1 file changed, 28 insertions(+), 29 deletions(-)

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH for-4.19 0/3] Backport of series: 'sched: A few small fixes for sch_cake' from 5.1

2019-04-05 Thread Toke Høiland-Jørgensen
Greg Kroah-Hartman  writes:

> On Fri, Apr 05, 2019 at 02:13:18PM +0200, Toke Høiland-Jørgensen wrote:
>> Greg Kroah-Hartman  writes:
>> 
>> > On Fri, Apr 05, 2019 at 12:28:21PM +0200, Toke Høiland-Jørgensen wrote:
>> >> Hi Greg
>> >> 
>> >> This series contains backports for a couple of fixes to sch_cake that was 
>> >> just
>> >> merged for 5.1. This series backports an earlier refactoring commit, 
>> >> which makes
>> >> the fixes themselves apply cleanly from upstream.
>> >
>> > You have read:
>> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>> > for how to submit networking patches to the stable trees, right?
>> >
>> > I suggest trying it that way first...
>> 
>> Yeah, Dave already queued the original fixes up for stable, but they are
>> not going to apply cleanly on 4.19; hence the first patch in this
>> series.
>> 
>> I thought it was better to just include the full series with that, for
>> context, but maybe that was wrong? Should I just have sent the first
>> one? If so, feel free to just take the first patch in this series and
>> let the others go through the usual stable submission process...
>
> Dave queues up and sends me the stable backports for networking code,
> as the document states. If there is a special series needed for 4.19,
> I'm sure he would be glad to take them. Or, I can take them directly,
> after I have the 5.0 series queued up, if I get an ack from him.
>
> But to seemingly circumvent the normal process, isn't ok, I need to
> know that at least you tried :)

Well I did ask Dave if he wanted me to supply a backport, see
https://marc.info/?l=linux-netdev=155440061002032=2 - I just thought
sending that directly to you was the natural thing to do. Certainly
wasn't trying to circumvent process, just haven't done any backports for
the net tree before :)

Actually, looking at the log again now, I see that I got my versions
mixed up, and the backport patch is also needed for 5.0... so I guess
I'll just (re-)submit that to Dave to put into his stable queue...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net 2/2] sch_cake: Make sure we can write the IP header before changing DSCP bits

2019-04-04 Thread Toke Høiland-Jørgensen
Stephen Hemminger  writes:

> On Thu, 04 Apr 2019 15:01:33 +0200
> Toke Høiland-Jørgensen  wrote:
>
>>  static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>  {
>> +int wlen = skb_network_offset(skb);
>
> In theory this could be negative, you should handle that?
> Rather than calling may_pull() with a huge unsigned value.

Huh, that would imply that skb->network_header points to before
skb->head; when does that happen?

Also, pskb_may_pull() does check for len > skb->len, so I guess a
follow-up question would be, "does it happen often enough to warrant
handling at this level"?

Also, I copied that bit from sch_dsmark, so if you really thing it needs
to be fixed, I guess we should fix both...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Using firewall connmarks as tin selectors

2019-02-28 Thread Toke Høiland-Jørgensen
> I also equally aware that this is ‘creeping featuritis’ and doing
> nothing to speed cake up…

Yeah, this is the crux of the issue, really: it's a tradeoff between
ease of use and featuritis. Now in this case the actual impact is a
single check it might actually be acceptable

> actually I may have improved BESTEFFORT a little - we no longer look
> for matching TC Major numbers if there’s no actual choice of tin to be
> made :-)

Well, you made the besteffort case slightly faster, but every other mode
slightly slower... :)

If you are going to send a patch (or pull request), please leave out the
refactoring, and only include the feature. This makes it easier to see
the impact of the feature addition on its own.

Also, I assume you have a companion patch for iproute2 somewhere?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Using firewall connmarks as tin selectors

2019-02-27 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

> How unpopular would the idea of having cake look at skb->mark directly be?
>
> https://github.com/ldir-EDB0/sch_cake/commit/64d0e6ac9368a271221db888ab91a367fcd37ae1
>
> https://github.com/ldir-EDB0/tc-adv/commit/4f16ae5d588d44f8a5c83fe2f2b7dcad97843cbc

Hmm, not impossible, but seeing as we already have a way to achieve that
with BPF, is it really needed?

> I did the equivalent in eBPF here
> https://github.com/ldir-EDB0/cls_bpf_connmark_to_caketin but I can’t
> work out how to make the major number a tc command line argument into
> the BPF code.

You can't, but you can set the major number explicitly when you create
the qdisc:

$ sudo tc qdisc replace dev eth0 root handle 1: cake
$ tc qdisc show dev eth0
qdisc cake 1: root refcnt 2 bandwidth unlimited diffserv3 triple-isolate nonat 
nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0


-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Using firewall connmarks as tin selectors

2019-02-28 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

>> On 28 Feb 2019, at 09:54, Toke Høiland-Jørgensen  wrote:
>> 
>>> I also equally aware that this is ‘creeping featuritis’ and doing
>>> nothing to speed cake up…
>> 
>> Yeah, this is the crux of the issue, really: it's a tradeoff between
>> ease of use and featuritis. Now in this case the actual impact is a
>> single check it might actually be acceptable
>> 
>>> actually I may have improved BESTEFFORT a little - we no longer look
>>> for matching TC Major numbers if there’s no actual choice of tin to be
>>> made :-)
>> 
>> Well, you made the besteffort case slightly faster, but every other mode
>> slightly slower... :)
>
> Ah, ok, tc filter really is king ;-)

Well, I don't mind doing the simplification, but it should be done as a
separate commit; I'll do that after mering your pull request.

>> If you are going to send a patch (or pull request), please leave out the
>> refactoring, and only include the feature. This makes it easier to see
>> the impact of the feature addition on its own.
>
> Ha ha, sending a patch to the kernel lists???  Never again!

Haha, no, I meant to this list. I'll submit upstream, along with the
other stuff.

> The non-refactored version is
> https://github.com/ldir-EDB0/sch_cake/commit/f30bb18adc97c827c81c9a3297ab14bfaf2adcb0
> and I’ve created a PR

Great, thanks!

>> Also, I assume you have a companion patch for iproute2 somewhere?
>
> In the first email: https://github.com/ldir-EDB0/tc-adv/commits/fwmark
> - based on 4.20.0 ‘cos that’s where openwrt is at the moment.

A pull request against tc-adv would be useful :)

> When I’ve found a suitable editor I was even going to update the man
> page!

I usually just use a regular editor and copy-paste the syntax bits from
another entry...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Using firewall connmarks as tin selectors

2019-03-04 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

>> On 3 Mar 2019, at 12:22, John Sager  wrote:
>> 
>> If you are going to do that, I would suggest using a few of the upper bits
>> of the 32-bit fwmark/connmark space available, rather than the lowest bits.
>> Then that would allow to use fwmarks other purposes, and to use the lowest
>> bits, as well in the future. As iptables allows a mask before comparison,
>> then choose a specific mask for the bits you use both for setting and 
>> testing.
>
> That’s a good point and I’m sort of hoping upstream reject the current
> submission on that basis.  I think the ‘use of fwmarks’ needs more
> thought as to how it’s done for the future - too keen to get something
> out.  Maybe it’s sufficient as is, I don’t know.

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc

This means it'll be in 5.1; so we have until that is released (~8 weeks
or so) to set the behaviour in stone.

I do think we at least need to add masking of the mark before using it
for tin selection; the question is just which bits to use from it.

As for setting the fwmark back in conntrack, I'm not sure I agree that
this is something CAKE should be doing. Mostly because it means even
tighter coupling between CAKE and the conntrack subsystem. However, I
may be convinced by a sufficiently neat implementation, and anyway this
is definitely something that will need to wait for 5.2 for upstream.

So I think the priority is to agree on semantics for masking the fwmark
when reading, and getting that implemented in a way that is compatible
with both other uses of marks, and with anything we else we might want
to do down the road.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Using firewall connmarks as tin selectors

2019-03-04 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

>> On 4 Mar 2019, at 08:39, Pete Heist  wrote:
>> 
>> 
>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant 
>>>  wrote:
>>> 
>>> The very bad idea:
>>> 
>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>> implementation as described above. So an awful lot of our
>>> shenanigans above is due to DSCP not traversing the internet
>>> particularly well. The solution above abstracts DSCP into ’tins’
>>> which we put into fwmarks. Another approach would be to put the DSCP
>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>> local domain preserved.
>> 
>> If I understand it right, another use case for this “very bad idea”
>> is preserving DSCP locally while traversing upstream WiFi links as
>> besteffort, which avoids airtime efficiency problems that can occur
>> with 802.11e (WMM). In cases where the router config can’t be changed
>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>> it hides DSCP from the WiFi stack while preserving the values through
>> the tunnel, but this would be easier. Neat… :)
>
> Everyone has understood the intent & maybe the implementation
> correctly. 2 patches attached, one for cake, one for tc.
>
> They are naively coded and some of it undoes Toke’s recent tidying up
> (sorry!)

Heh. First comment: Don't do that ;)

A few more below.

> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
> diff --git a/pkt_sched.h b/pkt_sched.h
> index a2f570c..d1f288d 100644
> --- a/pkt_sched.h
> +++ b/pkt_sched.h
> @@ -879,6 +879,7 @@ enum {
>   TCA_CAKE_ACK_FILTER,
>   TCA_CAKE_SPLIT_GSO,
>   TCA_CAKE_FWMARK,
> + TCA_CAKE_ICING,
>   __TCA_CAKE_MAX
>  };
>  #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
> diff --git a/sch_cake.c b/sch_cake.c
> index 733b897..5aca0f3 100644
> --- a/sch_cake.c
> +++ b/sch_cake.c
> @@ -270,7 +270,8 @@ enum {
>   CAKE_FLAG_INGRESS  = BIT(2),
>   CAKE_FLAG_WASH = BIT(3),
>   CAKE_FLAG_SPLIT_GSO= BIT(4),
> - CAKE_FLAG_FWMARK   = BIT(5)
> + CAKE_FLAG_FWMARK   = BIT(5),
> + CAKE_FLAG_ICING= BIT(6)

This implies that icing and fwmark can be enabled completely
independently of each other. Are you sure about the semantics for that?
>  };
>  
>  /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>  };
>  
>  static const u8 diffserv4[] = {
> - 0, 2, 0, 0, 2, 0, 0, 0,
> + 0, 1, 0, 0, 2, 0, 0, 0,
>   1, 0, 0, 0, 0, 0, 0, 0,
>   2, 0, 2, 0, 2, 0, 2, 0,
>   2, 0, 2, 0, 2, 0, 2, 0,
> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>  };
>  
>  static const u8 diffserv3[] = {
> - 0, 0, 0, 0, 2, 0, 0, 0,
> + 0, 1, 0, 0, 2, 0, 0, 0,

Why are you messing with the diffserv mappings in this patch?

>   1, 0, 0, 0, 0, 0, 0, 0,
>   0, 0, 0, 0, 0, 0, 0, 0,
>   0, 0, 0, 0, 0, 0, 0, 0,
> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, 
> struct sk_buff **to_free)
>   return idx + (tin << 16);
>  }
>  
> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
> +{
> + switch (skb->protocol) {
> + case htons(ETH_P_IP):
> + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
> + break;
> + case htons(ETH_P_IPV6):
> + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
> + break;
> + default:
> + break;
> + }
> +
> +}

So washing is just a special case of this (wash is
cake_update_diffserv(skb,0)). So you shouldn't need to add another
function, just augment the existing handling code.

> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>  {
>   u8 dscp;
>  
> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, 
> u16 wash)
>   }
>  }
>  
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)

Save an ifdef below by moving the ifdef inside the function definition.

> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
> +{
> + enum ip_conntrack_info ctinfo;
> + struct nf_conn *ct;
> +
> + ct = nf_ct_get(skb, );
> + if (!ct)
> + return;
> +
> + ct->mark &= 0x80ff;
> + ct->mark |= (0x40 | dscp) << 24;

Right, so we *might* have an argument that putting the *tin* into the
fwmark is CAKE's business, but copying over the dscp mark is not
something a qdisc should be doing...

> + nf_conntrack_event_cache(IPCT_MARK, ct);
> +}
> +#endif

Also, are you sure this will work in all permutations of conntrack being
a module vs 

Re: [Cake] Using firewall connmarks as tin selectors

2019-03-04 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

>> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen  wrote:
>> 
>> Kevin Darbyshire-Bryant  writes:
>> 
>>>> On 4 Mar 2019, at 08:39, Pete Heist  wrote:
>>>> 
>>>> 
>>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant 
>>>>>  wrote:
>>>>> 
>>>>> The very bad idea:
>>>>> 
>>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>>> implementation as described above. So an awful lot of our
>>>>> shenanigans above is due to DSCP not traversing the internet
>>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>>> local domain preserved.
>>>> 
>>>> If I understand it right, another use case for this “very bad idea”
>>>> is preserving DSCP locally while traversing upstream WiFi links as
>>>> besteffort, which avoids airtime efficiency problems that can occur
>>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>>> it hides DSCP from the WiFi stack while preserving the values through
>>>> the tunnel, but this would be easier. Neat… :)
>>> 
>>> Everyone has understood the intent & maybe the implementation
>>> correctly. 2 patches attached, one for cake, one for tc.
>>> 
>>> They are naively coded and some of it undoes Toke’s recent tidying up
>>> (sorry!)
>> 
>> Heh. First comment: Don't do that ;)
>
> I did say naively coded.
>
>> 
>> A few more below.
>> 
>>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>>> diff --git a/pkt_sched.h b/pkt_sched.h
>>> index a2f570c..d1f288d 100644
>>> --- a/pkt_sched.h
>>> +++ b/pkt_sched.h
>>> @@ -879,6 +879,7 @@ enum {
>>> TCA_CAKE_ACK_FILTER,
>>> TCA_CAKE_SPLIT_GSO,
>>> TCA_CAKE_FWMARK,
>>> +   TCA_CAKE_ICING,
>>> __TCA_CAKE_MAX
>>> };
>>> #define TCA_CAKE_MAX(__TCA_CAKE_MAX - 1)
>>> diff --git a/sch_cake.c b/sch_cake.c
>>> index 733b897..5aca0f3 100644
>>> --- a/sch_cake.c
>>> +++ b/sch_cake.c
>>> @@ -270,7 +270,8 @@ enum {
>>> CAKE_FLAG_INGRESS  = BIT(2),
>>> CAKE_FLAG_WASH = BIT(3),
>>> CAKE_FLAG_SPLIT_GSO= BIT(4),
>>> -   CAKE_FLAG_FWMARK   = BIT(5)
>>> +   CAKE_FLAG_FWMARK   = BIT(5),
>>> +   CAKE_FLAG_ICING= BIT(6)
>> 
>> This implies that icing and fwmark can be enabled completely
>> independently of each other. Are you sure about the semantics for that?
>
> No, I’m not.  I sent the patches so others could see my lunacy in action and 
> hopefully improve it.
>
> The actual operation links FWMARK, INGRESS & ICING in a variety of
> combinations.

Right, so obviously this needs to be thought through...

>>> };
>>> 
>>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>>> };
>>> 
>>> static const u8 diffserv4[] = {
>>> -   0, 2, 0, 0, 2, 0, 0, 0,
>>> +   0, 1, 0, 0, 2, 0, 0, 0,
>>> 1, 0, 0, 0, 0, 0, 0, 0,
>>> 2, 0, 2, 0, 2, 0, 2, 0,
>>> 2, 0, 2, 0, 2, 0, 2, 0,
>>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>>> };
>>> 
>>> static const u8 diffserv3[] = {
>>> -   0, 0, 0, 0, 2, 0, 0, 0,
>>> +   0, 1, 0, 0, 2, 0, 0, 0,
>> 
>> Why are you messing with the diffserv mappings in this patch?
>
> This is a combination patch of Dave’s new LE coding and the
> fwmark/dscp mangling.

Ah. Well let's keep that separate from this patch/discussion...

>> 
>>> 1, 0, 0, 0, 0, 0, 0, 0,
>>> 0, 0, 0, 0, 0, 0, 0, 0,
>>> 0, 0, 0, 0, 0, 0, 0, 0,
>>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, 
>>> struct sk_buff **to_free)
>>> return idx + (tin << 16);
>>> }
>>> 
>>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 

Re: [Cake] Using firewall connmarks as tin selectors

2019-03-04 Thread Toke Høiland-Jørgensen
Kevin Darbyshire-Bryant  writes:

>> On 4 Mar 2019, at 16:39, Toke Høiland-Jørgensen  wrote:
>> 
>> Kevin Darbyshire-Bryant  writes:
>> 
>> [ ... snipping a bit of context here ... ]
>> 
>>>>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>>>>>> +{
>>>>>>> +   enum ip_conntrack_info ctinfo;
>>>>>>> +   struct nf_conn *ct;
>>>>>>> +
>>>>>>> +   ct = nf_ct_get(skb, );
>>>>>>> +   if (!ct)
>>>>>>> +   return;
>>>>>>> +
>>>>>>> +   ct->mark &= 0x80ff;
>>>>>>> +   ct->mark |= (0x40 | dscp) << 24;
>>>>>> 
>>>>>> Right, so we *might* have an argument that putting the *tin* into the
>>>>>> fwmark is CAKE's business, but copying over the dscp mark is not
>>>>>> something a qdisc should be doing…
>>>>> 
>>>>> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
>>>>> priority table, it just happens to look like the DSCP ;-)
>>>> 
>>>> If it quacks like a duck…
>>> 
>>> I honestly don’t know where to go from here. I’m clearly trying to do
>>> something that the kernel doesn’t want to do.
>> 
>> I'm not disputing that what you're trying to do (moving DSCP field into
>> connmark) is useful. I'm just questioning whether CAKE is the right
>> place to do this. I think it would fit better in a TC action; either
>> modify act_connmark, or create a new action to sync fwmarks and DSCP
>> marks…
>
> Interesting.  Thinks out loud - Two actions - ‘act_storedscp’,
> ‘act_restoredscp'

Or act_dscp with 'get' and 'set' options :)

> As I said earlier I couldn't work out how m_conntrack did… anything at
> all to be honest!

act_connmark is pretty straight forward:
https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34

>>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct 
>>> Qdisc *sch,
>>> tin = 0;
>>> 
>>> else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>>> -skb->mark &&
>>> -skb->mark <= q->tin_cnt)
>>> -   tin = q->tin_order[skb->mark - 1];
>>> -
>>> -   else if (TC_H_MAJ(skb->priority) == sch->handle &&
>>> -TC_H_MIN(skb->priority) > 0 &&
>>> -TC_H_MIN(skb->priority) <= q->tin_cnt)
>>> +  skb->mark & 0x4000) {
>> 
>> I think there's something odd with this mask?  There's only one bit set
>> in it…
>
> I use the single bit as a flag to indicate cake has stored the  DSCP
> in the lower 6 bits of the byte.  Otherwise with a DSCP of 0 (the
> default) it’s rather difficult to know if a connection has been
> through the cake ’save dscp to fwmark’ process or not.  That also
> indicates to user space whether it should consider mangling packets or
> not e.g.

Ah, right. But that breaks the previous use case where someone just
wants to set fwmarks that get turned into CAKE tins?

I think this definitely is leaning towards decoupling the
fw-mark-to-DSCP settings to an action. And probably making the shift and
mask configurable rather than hard-coding something...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


<    1   2   3   4   5   6   >