Re: [Cake] net-next is OPEN...

2018-04-19 Thread Pete Heist
> On Apr 18, 2018, at 7:43 AM, Pete Heist  wrote:
> 
> I also think I saw this happen at lower bandwidths as well, when the CPU 
> wasn’t loaded. What I’ll do is re-test on the current version I have at, say, 
> 50Mbit (or to where load drops substantially), then update to the head and 
> test again, and let you know...
> 
>> On Apr 17, 2018, at 3:52 PM, Jonathan Morton  wrote:
>> 
>>> On 16 Apr, 2018, at 11:55 pm, Pete Heist  wrote:
>>> 
>>> I remember that fairness behavior at low RTTs (< 20ms) needed to be either 
>>> improved or documented
>> 
>> The reason for the behaviour, IIRC, was that throughput dropped below 100% 
>> when the latency target was reduced too much.  Since then there has been a 
>> small change which might improve it a little, so a retest would be 
>> reasonable.

At 50mbit I don’t see nearly as much fairness degradation at low RTTs, although 
there’s some. Even at 100us, “fairness” is around 1.1 (1.0 being perfectly 
fair) instead of the 2.x I saw at 500mbit. I do not see much of a difference 
between the latest code (16d7fed, 2018-04-17) and the previous code I tested 
(7061401, 2017-12-01), if that info is of use.

RTT: tcp_1up upload Mbps / tcp_12up upload Mbps

7061401 (2017-12-01):

   100us: 23.80 / 25.85
   1ms: 23.89 / 29.46
   10ms: 23.93 / 24.66
   40ms: 23.96 / 24.10
   100ms: 23.97 / 24.12

16d7fed (2018-04-17):

   100us: 23.97 / 26.49
   1ms: 23.89 / 26.27
   10ms: 23.98 / 26.37
   40ms: 23.94 / 24.08
   100ms: 23.97 / 24.12

I can post reports / flent files on request.
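
For clarity on the metric: as far as I can tell, the "fairness" figure is
simply the ratio of the two columns above (the 12-flow host's aggregate
divided by the single-flow host's throughput), so 1.0 means both hosts got
an equal share. A minimal sketch of that calculation over the 16d7fed
numbers, assuming that interpretation is right:

# Minimal sketch, assuming "fairness" is just the ratio of the two per-host
# aggregates reported by flent (tcp_12up host / tcp_1up host).
# Throughput numbers are copied from the 16d7fed table above.

def fairness_ratio(tcp_1up_mbps, tcp_12up_mbps):
    """1.0 = equal host shares; >1.0 = the 12-flow host gets more."""
    return tcp_12up_mbps / tcp_1up_mbps

results_16d7fed = {   # RTT: (tcp_1up Mbps, tcp_12up Mbps)
    "100us": (23.97, 26.49),
    "1ms":   (23.89, 26.27),
    "10ms":  (23.98, 26.37),
    "40ms":  (23.94, 24.08),
    "100ms": (23.97, 24.12),
}

for rtt, (one_up, twelve_up) in results_16d7fed.items():
    print(f"{rtt:>6}: fairness ~ {fairness_ratio(one_up, twelve_up):.2f}")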

So it appears this is CPU related, and not worth exploring further(?) or
documenting(?), other than that, once things have stabilized, it would be
informative to document how Cake degrades under various extreme conditions.

Well, here’s to science and a good walk in the weeds…



[Cake] Diffserv LLT mode

2018-04-19 Thread Toke Høiland-Jørgensen
Is anyone actually using the LLT diffserv setting? The draft describing
it seems to have expired ages ago:

https://datatracker.ietf.org/doc/draft-you-tsvwg-latency-loss-tradeoff/

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Luca Muscariello
I don't think that this feature really hurts TCP. TCP is robust to that in
any case, even if there is an increase in both average RTT and RTT standard
deviation.

And, I agree that what is more important is the performance of sparse
flows, which is not affected by this feature.

There is one little thing that might appear negligible, but from my point
of view it is not: giving transport end-points an incentive to behave in
the right way. For instance, a transport end-point that sends traffic using
pacing should be considered better behaved than one that sends in bursts,
and should be rewarded for that.

Flow isolation creates an incentive to pace transmissions and so create
less queueing in the network. This feature reduces the strength of that
incentive. I am not saying that it eliminates the incentive, because there
is still flow isolation, but it makes it less effective: if you send fewer
bursts, you no longer get lower latency in return.

When I say transport end-point I am not thinking only of TCP but also of
QUIC and all the other possible TCPs; as we all know, TCP is a variety of
protocols.

But I understand Jonathan's point.

Luca


On Thu, Apr 19, 2018 at 12:33 PM, Toke Høiland-Jørgensen wrote:

> Jonathan Morton  writes:
>
> >>>> your solution significantly hurts performance in the common case
> >>>
> >>> I'm sorry - did someone actually describe such a case?  I must have
> >>> missed it.
> >>
> >> I started this whole thread by pointing out that this behaviour results
> >> in the delay of the TCP flows scaling with the number of active flows;
> >> and that for 32 active flows (on a 10Mbps link), this results in the
> >> latency being three times higher than for FQ-CoDel on the same link.
> >
> > Okay, so intra-flow latency is impaired for bulk flows sharing a
> > relatively low-bandwidth link. That's a metric which few people even
> > know how to measure for bulk flows, though it is of course important
> > for sparse flows. I was hoping you had a common use-case where
> > *sparse* flow latency was impacted, in which case we could actually
> > discuss it properly.
> >
> > But *inter-flow* latency is not impaired, is it? Nor intra-sparse-flow
> > latency? Nor packet loss, which people often do measure (or at least
> > talk about measuring) - quite the opposite? Nor goodput, which people
> > *definitely* measure and notice, and is influenced more strongly by
> > packet loss when in ingress mode?
>
> As I said, I'll run more tests and post more data once I have time.
>
> > The measurement you took had a baseline latency in the region of 60ms.
>
> The baseline link latency is 50 ms, which is sorta what you'd expect
> from a median non-CDN'ed internet connection.
>
> > That's high enough for a couple of packets per flow to be in flight
> > independently of the bottleneck queue.
>
> Yes. As is the case for most flows going over the public internet...
>
> > I would take this argument more seriously if a use-case that mattered
> > was identified.
>
> Use cases where intra-flow latency matters, off the top of my head:
>
> - Real-time video with congestion response
> - Multiple connections multiplexed over a single flow (HTTP/2 or
>   QUIC-style)
> - Anything that behaves more sanely than TCP at really low bandwidths.
>
> But yeah, you're right, no one uses any of those... /s
>
> > So far, I can't even see a coherent argument for making this tweak
> > optional (which is of course possible), let alone removing it
> > entirely; we only have a single synthetic benchmark which shows one
> > obscure metric move in the "wrong" direction, versus a real use-case
> > identified by an actual user in which this configuration genuinely
> > helps.
>
> And I've been trying to explain why you are the one optimising for
> pathological cases at the expense of the common case.
>
> But I don't think we are going to agree based on a theoretical
> discussion. So let's just leave this and I'll return with some data once
> I've had a chance to run some actual tests of the different use cases.
>
> -Toke
>


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>>>> your solution significantly hurts performance in the common case
>>> 
>>> I'm sorry - did someone actually describe such a case?  I must have
>>> missed it.
>> 
>> I started this whole thread by pointing out that this behaviour results
>> in the delay of the TCP flows scaling with the number of active flows;
>> and that for 32 active flows (on a 10Mbps link), this results in the
>> latency being three times higher than for FQ-CoDel on the same link.
>
> Okay, so intra-flow latency is impaired for bulk flows sharing a
> relatively low-bandwidth link. That's a metric which few people even
> know how to measure for bulk flows, though it is of course important
> for sparse flows. I was hoping you had a common use-case where
> *sparse* flow latency was impacted, in which case we could actually
> discuss it properly.
>
> But *inter-flow* latency is not impaired, is it? Nor intra-sparse-flow
> latency? Nor packet loss, which people often do measure (or at least
> talk about measuring) - quite the opposite? Nor goodput, which people
> *definitely* measure and notice, and is influenced more strongly by
> packet loss when in ingress mode?

As I said, I'll run more tests and post more data once I have time.

> The measurement you took had a baseline latency in the region of 60ms.

The baseline link latency is 50 ms, which is sorta what you'd expect
from a median non-CDN'ed internet connection.

> That's high enough for a couple of packets per flow to be in flight
> independently of the bottleneck queue.

Yes. As is the case for most flows going over the public internet...

> I would take this argument more seriously if a use-case that mattered
> was identified.

Use cases where intra-flow latency matters, off the top of my head:

- Real-time video with congestion response
- Multiple connections multiplexed over a single flow (HTTP/2 or
  QUIC-style)
- Anything that behaves more sanely than TCP at really low bandwidths.

But yeah, you're right, no one uses any of those... /s

> So far, I can't even see a coherent argument for making this tweak
> optional (which is of course possible), let alone removing it
> entirely; we only have a single synthetic benchmark which shows one
> obscure metric move in the "wrong" direction, versus a real use-case
> identified by an actual user in which this configuration genuinely
> helps.

And I've been trying to explain why you are the one optimising for
pathological cases at the expense of the common case.

But I don't think we are going to agree based on a theoretical
discussion. So let's just leave this and I'll return with some data once
I've had a chance to run some actual tests of the different use cases.

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
>>> your solution significantly hurts performance in the common case
>> 
>> I'm sorry - did someone actually describe such a case?  I must have
>> missed it.
> 
> I started this whole thread by pointing out that this behaviour results
> in the delay of the TCP flows scaling with the number of active flows;
> and that for 32 active flows (on a 10Mbps link), this results in the
> latency being three times higher than for FQ-CoDel on the same link.

Okay, so intra-flow latency is impaired for bulk flows sharing a relatively 
low-bandwidth link.  That's a metric which few people even know how to measure 
for bulk flows, though it is of course important for sparse flows.  I was 
hoping you had a common use-case where *sparse* flow latency was impacted, in 
which case we could actually discuss it properly.

But *inter-flow* latency is not impaired, is it?  Nor intra-sparse-flow 
latency?  Nor packet loss, which people often do measure (or at least talk 
about measuring) - quite the opposite?  Nor goodput, which people *definitely* 
measure and notice, and is influenced more strongly by packet loss when in 
ingress mode?

The measurement you took had a baseline latency in the region of 60ms.  That's 
high enough for a couple of packets per flow to be in flight independently of 
the bottleneck queue.  Therefore, the most severe effects of fq_codel's 
configuration (and Cake's old configuration) are less obvious, since TCP is 
still kept operating in a regime where its behaviour is vaguely acceptable.  
Aggregate goodput remains high anyway, due to the large number of flows 
involved, but I would expect the goodput of individual flows to show odd 
behaviour under fq_codel.
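
As a rough back-of-envelope (my numbers: 10 Mbit/s, a baseline RTT in the
region of 60 ms, 1514-byte packets, 32 flows; illustrative assumptions, not
measurements):

# Back-of-envelope sketch: how many full-size packets fit in the path
# outside the bottleneck queue, under the assumptions listed above.

rate_bps = 10e6        # bottleneck rate (assumed, matching the earlier test)
base_rtt_s = 0.060     # baseline latency in the region of 60 ms
mtu_bytes = 1514       # assumed full-size Ethernet frame
flows = 32

bdp_bytes = rate_bps * base_rtt_s / 8
pkts_in_flight = bdp_bytes / mtu_bytes
print(f"BDP ~ {bdp_bytes:.0f} B ~ {pkts_in_flight:.0f} packets"
      f" ~ {pkts_in_flight / flows:.1f} per flow outside the queue")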

I would take this argument more seriously if a use-case that mattered was 
identified.  So far, I can't even see a coherent argument for making this tweak 
optional (which is of course possible), let alone removing it entirely; we only 
have a single synthetic benchmark which shows one obscure metric move in the 
"wrong" direction, versus a real use-case identified by an actual user in which 
this configuration genuinely helps.

And I've tried to explain why I believe this to be the Right Thing to do in 
general, contrary to Dave's opinion.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
> If you turn off the AQM entirely for the
> first four packets, it is going to activate when the fifth packet
> arrives, resulting in a tail loss and... an RTO!

That isn't what happens.

First of all, Cake explicitly guards against tail loss by exempting the last 
packet in each queue from being dropped.  If a tail loss and RTO actually 
occurs, it's extremely unlikely that Cake caused it, unless it's been driven 
far beyond its design load in terms of flow count.

Secondly, and as you should very well know, Codel only starts marking or 
dropping when the *standing* queue exceeds the threshold set.  COBALT 
implements that logic in a different way to the reference version, but it's 
still there.  It's not a case of the fifth packet in a flow getting dropped, 
but of a five-packet standing queue being the smallest that *can* experience 
drops.
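
To illustrate the logic (a sketch only, not the actual sch_cake/COBALT
source; the names and structure are made up):

# Illustrative sketch only -- NOT the sch_cake code.  It just encodes the
# two guards described above: the last packet in a queue is exempt from
# dropping, and drops or marks are only considered once the *standing*
# queue (sojourn time) is above the threshold.

def consider_drop(queue_backlog_pkts, sojourn_time_us, threshold_us,
                  codel_schedule_says_drop):
    # Guard 1: never drop the final packet left in this queue, so Cake
    # itself cannot cause a tail loss.
    if queue_backlog_pkts <= 1:
        return False
    # Guard 2: a queue whose sojourn time is at or below the threshold is
    # never eligible; only a standing queue above it can see drops.
    if sojourn_time_us <= threshold_us:
        return False
    # Otherwise defer to the usual Codel/COBALT drop schedule.
    return codel_schedule_says_drop

# A one-packet queue is never dropped from, whatever its sojourn time:
print(consider_drop(1, sojourn_time_us=50_000, threshold_us=5_000,
                    codel_schedule_says_drop=True))   # -> False

# A standing queue above the threshold defers to the drop schedule:
print(consider_drop(5, sojourn_time_us=8_000, threshold_us=5_000,
                    codel_schedule_says_drop=True))   # -> True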

So please don't strawman this.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> your solution significantly hurts performance in the common case
>
> I'm sorry - did someone actually describe such a case?  I must have
> missed it.

I started this whole thread by pointing out that this behaviour results
in the delay of the TCP flows scaling with the number of active flows;
and that for 32 active flows (on a 10Mbps link), this results in the
latency being three times higher than for FQ-CoDel on the same link.

This was the message:
https://lists.bufferbloat.net/pipermail/cake/2018-April/003405.html

And this graph, specifically:
https://lists.bufferbloat.net/pipermail/cake/attachments/20180417/1e56d8f3/attachment-0002.png

It's even worse for 64 flows, obviously; and there's no change in
goodput.

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
> your solution significantly hurts performance in the common case

I'm sorry - did someone actually describe such a case?  I must have missed it.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>>> I'm saying that there's a tradeoff between intra-flow induced latency and 
>>> packet loss, and I've chosen 4 MTUs as the operating point.
>> 
>> Is there a reason for picking 4 MTUs vs 2 MTUs vs 2 packets, etc?
>
> To be more precise, I'm using a sojourn time equivalent to 4 MTU-sized
> packets per bulk flow at line rate, as a modifier to existing AQM
> behaviour.
>
> The worst case for packet loss within the AQM occurs when the inherent
> latency of the links is very low but the available bandwidth per flow
> is also low. This is easy to replicate using a test box flanked by
> GigE links to endpoint hosts; GigE has sub-millisecond inherent
> delays. In this case, the entire BDP of each flow exists within the
> queue.
>
> A general recommendation exists for TCP to use a minimum of 4 packets
> in flight, in order to keep the ack-clock running smoothly in the face
> of packet losses which might otherwise trigger an RTO (retransmit
> timeout).  This allows one packet to be lost and detected by the
> triple-repetition ACK method, without SACK.
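
For concreteness, a back-of-envelope of what that operating point implies
for the scenario I tested earlier, assuming a 1514-byte MTU and that every
flow counts as bulk (my arithmetic, not a measurement):

# Sketch: a floor of 4 MTU-sized packets per bulk flow translates into a
# standing queue, and hence an intra-flow delay, that grows with flow count.

mtu_bits = 1514 * 8      # assumed full-size Ethernet frame
rate_bps = 10e6          # the 10 Mbit/s link from the earlier test
pkts_per_flow = 4        # the chosen operating point described above

for flows in (1, 8, 32, 64):
    standing_queue_s = flows * pkts_per_flow * mtu_bits / rate_bps
    print(f"{flows:>3} bulk flows -> ~{standing_queue_s * 1e3:.0f} ms standing queue")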

But for triple-dupack to work you actually need to drop packets (the
first one, to be precise), not let them sit around in a bloated queue and
wait for the full RTO timeout. If you turn off the AQM entirely for the
first four packets, it is going to activate when the fifth packet
arrives, resulting in a tail loss and... an RTO!

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> This is why I think that any fix that tries to solve this problem in
>> the queueing system should be avoided. It does not solve the real
>> problem (overload) and introduces latency.
>
> Most people, myself included, prefer systems that degrade gracefully
> instead of simply failing or rejecting new loads. Systems that exhibit
> the latter behaviours tend to be open to DoS attacks, which are
> obviously bad. Or users obsessively retry the failed requests until
> they succeed, increasing total load for the same goodput and inferior
> perceived QoS. Or ignorant application developers try to work around a
> perceived-unreliable system by spamming it with connections so that
> *their* traffic ends up getting through somehow.
>
> By designing a system which exhibits engineering elegance where
> practical, and graceful degradation otherwise, I try to encourage
> others to do the Right Thing by providing suitable incentives in the
> system's behaviour. The conventional way (of just throwing up one's
> hands when load exceeds capacity) has already been tried, extensively,
> and obviously doesn't work. Cake does better.

Except this is not simply a question of "better and more elegant". It is
a tradeoff between different concerns, and your solution significantly
hurts performance in the common case to accommodate a corner case that
quite fundamentally *can't* be solved properly at the queueing level, as
Luca points out.


-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Kevin Darbyshire-Bryant via Cake


> On 18 Apr 2018, at 19:16, Kevin Darbyshire-Bryant via Cake wrote:
> 
> I know this can be writted betterrer but I think this is the sort of thing 
> we’re pondering over?
> 
> https://github.com/ldir-EDB0/sch_cake/commit/334ae4308961e51eb6ad0d08450cdcba558ef4e3
> 
> Warning: compiles, not yet actually run in any way whatsoever ;-)

Writted betterrer.

https://github.com/ldir-EDB0/sch_cake/commit/eb5543f397fb3522bc60cc80805596282fbe076f

And this is currently running on a box here.  The impact/change is not yet
tested, but it doesn’t blow up.

KDB




Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
> This is why I think that any fix that tries to solve this problem in the 
> queueing system should be avoided. It does not solve the real problem 
> (overload) and introduces latency.

Most people, myself included, prefer systems that degrade gracefully instead of 
simply failing or rejecting new loads.  Systems that exhibit the latter 
behaviours tend to be open to DoS attacks, which are obviously bad.  Or users 
obsessively retry the failed requests until they succeed, increasing total load 
for the same goodput and inferior perceived QoS.  Or ignorant application 
developers try to work around a perceived-unreliable system by spamming it with 
connections so that *their* traffic ends up getting through somehow.

By designing a system which exhibits engineering elegance where practical, and 
graceful degradation otherwise, I try to encourage others to do the Right Thing 
by providing suitable incentives in the system's behaviour.  The conventional 
way (of just throwing up one's hands when load exceeds capacity) has already 
been tried, extensively, and obviously doesn't work.  Cake does better.

Since Pacific islands are topical, perhaps look up the story of the California 
Clipper, which had to trek from NZ to NY "the long way round" after Japan 
entered the war.  To do so, the crew had to push the aircraft's endurance 
beyond the normal limits several times, and run it on the 90-octane fuel that 
was available in India and Africa, rather than the 100-octane fuel that the 
engines had been designed for.  Eventually part of the exhaust fell off one 
engine, and they had no spare - but the engine kept working, so they just 
posted a lookout to account for the increased fire hazard, and kept on flying.  
They could do that because it was a well-designed aircraft that had some 
tolerance for hard running, and comparatively graceful failure modes (as you'd 
hope an airliner would).

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Luca Muscariello
I think that this discussion is about trying to solve an almost impossible
problem.
When the link is in overload, and this is the case here, there is nothing
one can do with flow queuing or AQM.

It is just too late to do anything useful.

Overload means that the number of active backlogged flows is just too
large and the fair share is too low for the application in the first
place, and for the transport too.

Jonathan tries to make TCP work in a desperate situation.

In real life what would happen is that applications would just stop, and
so the number of flows would decrease to normal numbers. For those apps
that don't stop, the best approach would be to just kill them in a
selective manner, best if driven by a policy set by the user.

This is why I think that any fix that tries to solve this problem in the
queueing system should be avoided. It does not solve the real problem
(overload) and introduces latency.

My2c

Luca


On Wed, Apr 18, 2018 at 6:25 PM, Dave Taht  wrote:

> I would like to revert this change.
>
> On Wed, Apr 18, 2018 at 9:11 AM, Toke Høiland-Jørgensen wrote:
> > Jonathan Morton  writes:
> >
> >>> On 18 Apr, 2018, at 6:17 pm, Sebastian Moeller wrote:
> >>>
> >>> Just a thought, in egress mode in the typical deployment we expect,
> >>> the bandwidth leading into cake will be >> than the bandwidth out of
> >>> cake, so I would argue that the packet droppage might be acceptable
> >>> on egress as there is bandwidth to "waste" while on ingress the issue
> >>> very much is that all packets cake sees already used up parts of the
> >>> limited transfer time on the bottleneck link and hence are more
> >>> "precious", no? Users wanting this new behavior could still use the
> >>> ingress keyword even on egress interfaces?
> >>
> >> Broadly speaking, that should indeed counter most of the negative
> >> effects you'd expect from disabling this tweak in egress mode. But it
> >> doesn't really answer the question of whether there's a compelling
> >> *positive* reason to do so. I want to see a use case that holds up.
> >
> > What you're saying here is that you basically don't believe there are
> > any applications where a bulk TCP flow would also want low queueing
> > latency? :)
> >
> > -Toke
>
>
>
> --
>
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619
>