Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Luca Muscariello
I don't think this feature really hurts TCP.
TCP is robust to it in any case, even if the average RTT and the RTT
standard deviation increase.

And I agree that what matters more is the performance of sparse
flows, which is not affected by this feature.

There is one little thing that might appear negligible, but from my
point of view it is not: giving transport end-points incentives to
behave in the right way. For instance, a transport end-point that
paces its traffic should be considered better behaved than one that
sends in bursts, and it should be rewarded for that.

Flow isolation creates an incentive to pace transmissions and thus to
create less queueing in the network.
This feature reduces that incentive.
I am not saying that it eliminates the incentive, because there is still
flow isolation, but it makes it less effective: if you send fewer
bursts, you no longer get correspondingly lower latency.

When I say transport end-point I don't mean only TCP but also QUIC and
all the other possible TCPs; as we all know, TCP is really a family of
protocols.

But I understand Jonathan's point.

Luca


On Thu, Apr 19, 2018 at 12:33 PM, Toke Høiland-Jørgensen wrote:

> Jonathan Morton  writes:
>
>  your solution significantly hurts performance in the common case
> >>>
> >>> I'm sorry - did someone actually describe such a case?  I must have
> >>> missed it.
> >>
> >> I started this whole thread by pointing out that this behaviour results
> >> in the delay of the TCP flows scaling with the number of active flows;
> >> and that for 32 active flows (on a 10Mbps link), this results in the
> >> latency being three times higher than for FQ-CoDel on the same link.
> >
> > Okay, so intra-flow latency is impaired for bulk flows sharing a
> > relatively low-bandwidth link. That's a metric which few people even
> > know how to measure for bulk flows, though it is of course important
> > for sparse flows. I was hoping you had a common use-case where
> > *sparse* flow latency was impacted, in which case we could actually
> > discuss it properly.
> >
> > But *inter-flow* latency is not impaired, is it? Nor intra-sparse-flow
> > latency? Nor packet loss, which people often do measure (or at least
> > talk about measuring) - quite the opposite? Nor goodput, which people
> > *definitely* measure and notice, and is influenced more strongly by
> > packet loss when in ingress mode?
>
> As I said, I'll run more tests and post more data once I have time.
>
> > The measurement you took had a baseline latency in the region of 60ms.
>
> The baseline link latency is 50 ms, which is sorta what you'd expect
> from a median non-CDN'ed internet connection.
>
> > That's high enough for a couple of packets per flow to be in flight
> > independently of the bottleneck queue.
>
> Yes. As is the case for most flows going over the public internet...
>
> > I would take this argument more seriously if a use-case that mattered
> > was identified.
>
> Use cases where intra-flow latency matters, off the top of my head:
>
> - Real-time video with congestion response
> - Multiple connections multiplexed over a single flow (HTTP/2 or
>   QUIC-style)
> - Anything that behaves more sanely than TCP at really low bandwidths.
>
> But yeah, you're right, no one uses any of those... /s
>
> > So far, I can't even see a coherent argument for making this tweak
> > optional (which is of course possible), let alone removing it
> > entirely; we only have a single synthetic benchmark which shows one
> > obscure metric move in the "wrong" direction, versus a real use-case
> > identified by an actual user in which this configuration genuinely
> > helps.
>
> And I've been trying to explain why you are the one optimising for
> pathological cases at the expense of the common case.
>
> But I don't think we are going to agree based on a theoretical
> discussion. So let's just leave this and I'll return with some data once
> I've had a chance to run some actual tests of the different use cases.
>
> -Toke
>


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

 your solution significantly hurts performance in the common case
>>> 
>>> I'm sorry - did someone actually describe such a case?  I must have
>>> missed it.
>> 
>> I started this whole thread by pointing out that this behaviour results
>> in the delay of the TCP flows scaling with the number of active flows;
>> and that for 32 active flows (on a 10Mbps link), this results in the
>> latency being three times higher than for FQ-CoDel on the same link.
>
> Okay, so intra-flow latency is impaired for bulk flows sharing a
> relatively low-bandwidth link. That's a metric which few people even
> know how to measure for bulk flows, though it is of course important
> for sparse flows. I was hoping you had a common use-case where
> *sparse* flow latency was impacted, in which case we could actually
> discuss it properly.
>
> But *inter-flow* latency is not impaired, is it? Nor intra-sparse-flow
> latency? Nor packet loss, which people often do measure (or at least
> talk about measuring) - quite the opposite? Nor goodput, which people
> *definitely* measure and notice, and is influenced more strongly by
> packet loss when in ingress mode?

As I said, I'll run more tests and post more data once I have time.

> The measurement you took had a baseline latency in the region of 60ms.

The baseline link latency is 50 ms, which is sorta what you'd expect
from a median non-CDN'ed internet connection.

> That's high enough for a couple of packets per flow to be in flight
> independently of the bottleneck queue.

Yes. As is the case for most flows going over the public internet...

> I would take this argument more seriously if a use-case that mattered
> was identified.

Use cases where intra-flow latency matters, off the top of my head:

- Real-time video with congestion response
- Multiple connections multiplexed over a single flow (HTTP/2 or
  QUIC-style)
- Anything that behaves more sanely than TCP at really low bandwidths.

But yeah, you're right, no one uses any of those... /s

> So far, I can't even see a coherent argument for making this tweak
> optional (which is of course possible), let alone removing it
> entirely; we only have a single synthetic benchmark which shows one
> obscure metric move in the "wrong" direction, versus a real use-case
> identified by an actual user in which this configuration genuinely
> helps.

And I've been trying to explain why you are the one optimising for
pathological cases at the expense of the common case.

But I don't think we are going to agree based on a theoretical
discussion. So let's just leave this and I'll return with some data once
I've had a chance to run some actual tests of the different use cases.

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
>>> your solution significantly hurts performance in the common case
>> 
>> I'm sorry - did someone actually describe such a case?  I must have
>> missed it.
> 
> I started this whole thread by pointing out that this behaviour results
> in the delay of the TCP flows scaling with the number of active flows;
> and that for 32 active flows (on a 10Mbps link), this results in the
> latency being three times higher than for FQ-CoDel on the same link.

Okay, so intra-flow latency is impaired for bulk flows sharing a relatively 
low-bandwidth link.  That's a metric which few people even know how to measure 
for bulk flows, though it is of course important for sparse flows.  I was 
hoping you had a common use-case where *sparse* flow latency was impacted, in 
which case we could actually discuss it properly.

But *inter-flow* latency is not impaired, is it?  Nor intra-sparse-flow 
latency?  Nor packet loss, which people often do measure (or at least talk 
about measuring) - quite the opposite?  Nor goodput, which people *definitely* 
measure and notice, and is influenced more strongly by packet loss when in 
ingress mode?

The measurement you took had a baseline latency in the region of 60ms.  That's 
high enough for a couple of packets per flow to be in flight independently of 
the bottleneck queue.  Therefore, the most severe effects of fq_codel's 
configuration (and Cake's old configuration) are less obvious, since TCP is 
still kept operating in a regime where its behaviour is vaguely acceptable.  
Aggregate goodput remains high anyway, due to the large number of flows 
involved, but I would expect the goodput of individual flows to show odd 
behaviour under fq_codel.

I would take this argument more seriously if a use-case that mattered was 
identified.  So far, I can't even see a coherent argument for making this tweak 
optional (which is of course possible), let alone removing it entirely; we only 
have a single synthetic benchmark which shows one obscure metric move in the 
"wrong" direction, versus a real use-case identified by an actual user in which 
this configuration genuinely helps.

And I've tried to explain why I believe this to be the Right Thing to do in 
general, contrary to Dave's opinion.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
> If you turn off the AQM entirely for the
> first four packets, it is going to activate when the fifth packet
> arrives, resulting in a tail loss and... an RTO!

That isn't what happens.

First of all, Cake explicitly guards against tail loss by exempting the last 
packet in each queue from being dropped.  If a tail loss and RTO actually 
occurs, it's extremely unlikely that Cake caused it, unless it's been driven 
far beyond its design load in terms of flow count.

Secondly, and as you should very well know, Codel only starts marking or 
dropping when the *standing* queue exceeds the threshold set.  COBALT 
implements that logic in a different way to the reference version, but it's 
still there.  It's not a case of the fifth packet in a flow getting dropped, 
but of a five-packet standing queue being the smallest that *can* experience 
drops.
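
Schematically (my own restatement of those two guards, not the actual COBALT
code):

/* Schematic of the two guards described above: the last packet in a
 * queue is never dropped, and dropping is only considered while the
 * sojourn time has been persistently over target (a standing queue). */
#include <stdbool.h>
#include <stdio.h>

struct flow_state {
    bool over_target;      /* sojourn time persistently above target? */
    int  backlog_packets;  /* packets currently queued for this flow  */
};

static bool may_drop(const struct flow_state *f)
{
    if (f->backlog_packets <= 1)    /* tail-loss guard: keep the last packet */
        return false;
    return f->over_target;          /* only a standing queue triggers the AQM */
}

int main(void)
{
    struct flow_state tail     = { .over_target = true, .backlog_packets = 1 };
    struct flow_state standing = { .over_target = true, .backlog_packets = 8 };

    printf("last packet: %s, standing queue: %s\n",
           may_drop(&tail) ? "drop" : "keep",
           may_drop(&standing) ? "drop" : "keep");
    return 0;
}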

So please don't strawman this.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
> your solution significantly hurts performance in the common case

I'm sorry - did someone actually describe such a case?  I must have missed it.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>>> I'm saying that there's a tradeoff between intra-flow induced latency and 
>>> packet loss, and I've chosen 4 MTUs as the operating point.
>> 
>> Is there a reason for picking 4 MTUs vs 2 MTUs vs 2 packets, etc?
>
> To be more precise, I'm using a sojourn time equivalent to 4 MTU-sized
> packets per bulk flow at line rate, as a modifier to existing AQM
> behaviour.
>
> The worst case for packet loss within the AQM occurs when the inherent
> latency of the links is very low but the available bandwidth per flow
> is also low. This is easy to replicate using a test box flanked by
> GigE links to endpoint hosts; GigE has sub-millisecond inherent
> delays. In this case, the entire BDP of each flow exists within the
> queue.
>
> A general recommendation exists for TCP to use a minimum of 4 packets
> in flight, in order to keep the ack-clock running smoothly in the face
> of packet losses which might otherwise trigger an RTO (retransmit
> timeout).  This allows one packet to be lost and detected by the
> triple-repetition ACK method, without SACK.

But for triple-dupack to work you actually need to drop a packet (the
first one, to be precise), not let it sit around in a bloated queue
while the sender waits for the RTO to expire. If you turn off the AQM
entirely for the first four packets, it is going to activate when the
fifth packet arrives, resulting in a tail loss and... an RTO!

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> This is why I think that any fix that tries to solve this problem in
>> the queueing system should be avoided. It does not solve the real
>> problem (overload) and introduces latency.
>
> Most people, myself included, prefer systems that degrade gracefully
> instead of simply failing or rejecting new loads. Systems that exhibit
> the latter behaviours tend to be open to DoS attacks, which are
> obviously bad. Or users obsessively retry the failed requests until
> they succeed, increasing total load for the same goodput and inferior
> perceived QoS. Or ignorant application developers try to work around a
> perceived-unreliable system by spamming it with connections so that
> *their* traffic ends up getting through somehow.
>
> By designing a system which exhibits engineering elegance where
> practical, and graceful degradation otherwise, I try to encourage
> others to do the Right Thing by providing suitable incentives in the
> system's behaviour. The conventional way (of just throwing up one's
> hands when load exceeds capacity) has already been tried, extensively,
> and obviously doesn't work. Cake does better.

Except this is not simply a question of "better and more elegant". It is
a tradeoff between different concerns, and your solution significantly
hurts performance in the common case to accommodate a corner case that
quite fundamentally *can't* be solved properly at the queueing level, as
Luca points out.


-Toke


Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Jonathan Morton
> This is why I think that any fix that tries to solve this problem in the 
> queueing system should be avoided. It does not solve the real problem 
> (overload) and introduces latency.

Most people, myself included, prefer systems that degrade gracefully instead of 
simply failing or rejecting new loads.  Systems that exhibit the latter 
behaviours tend to be open to DoS attacks, which are obviously bad.  Or users 
obsessively retry the failed requests until they succeed, increasing total load 
for the same goodput and inferior perceived QoS.  Or ignorant application 
developers try to work around a perceived-unreliable system by spamming it with 
connections so that *their* traffic ends up getting through somehow.

By designing a system which exhibits engineering elegance where practical, and 
graceful degradation otherwise, I try to encourage others to do the Right Thing 
by providing suitable incentives in the system's behaviour.  The conventional 
way (of just throwing up one's hands when load exceeds capacity) has already 
been tried, extensively, and obviously doesn't work.  Cake does better.

Since Pacific islands are topical, perhaps look up the story of the California 
Clipper, which had to trek from NZ to NY "the long way round" after Japan 
entered the war.  To do so, the crew had to push the aircraft's endurance 
beyond the normal limits several times, and run it on the 90-octane fuel that 
was available in India and Africa, rather than the 100-octane fuel that the 
engines had been designed for.  Eventually part of the exhaust fell off one 
engine, and they had no spare - but the engine kept working, so they just 
posted a lookout to account for the increased fire hazard, and kept on flying.  
They could do that because it was a well-designed aircraft that had some 
tolerance for hard running, and comparatively graceful failure modes (as you'd 
hope an airliner would).

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-19 Thread Luca Muscariello
I think that this discussion is about trying to solve an almost impossible
problem.
When the link is overloaded, and this is the case here, there is nothing
one can do with flow queueing or AQM.

It is just too late to do anything useful.

Overload means that the number of active backlogged flows is just too
large and the fair share is too low for the applications in the first
place, and for the transport too.

Jonathan tries to make TCP work in a desperate situation.

In real life what would happen is that applications would just stop, and
so the number of flows would decrease to normal numbers.
For those apps that don't stop, the best approach would be to kill flows
in a selective manner, ideally driven by a policy set by the user.

This is why I think that any fix that tries to solve this problem in the
queueing system should be avoided. It does not solve the real problem
(overload) and introduces latency.

My2c

Luca


On Wed, Apr 18, 2018 at 6:25 PM, Dave Taht  wrote:

> I would like to revert this change.
>
> On Wed, Apr 18, 2018 at 9:11 AM, Toke Høiland-Jørgensen wrote:
> > Jonathan Morton  writes:
> >
> >>> On 18 Apr, 2018, at 6:17 pm, Sebastian Moeller wrote:
> >>>
> >>> Just a thought, in egress mode in the typical deployment we expect,
> >>> the bandwidth leading into cake will be >> than the bandwidth out of
> >>> cake, so I would argue that the package droppage might be acceptable
> >>> on egress as there is bandwidth to "waste" while on ingress the issue
> >>> very much is that all packets cake sees already used up parts of the
> >>> limited transfer time on the bottleneck link and hence are more
> >>> "precious", no? Users wanting this new behavior could still use the
> >>> ingress keyword even on egress interfaces?
> >>
> >> Broadly speaking, that should indeed counter most of the negative
> >> effects you'd expect from disabling this tweak in egress mode. But it
> >> doesn't really answer the question of whether there's a compelling
> >> *positive* reason to do so. I want to see a use case that holds up.
> >
> > What you're saying here is that you basically don't believe there are
> > any applications where a bulk TCP flow would also want low queueing
> > latency? :)
> >
> > -Toke
>
>
>
> --
>
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619
>


Re: [Cake] A few puzzling Cake results

2018-04-18 Thread Jonathan Morton
>> I'm saying that there's a tradeoff between intra-flow induced latency and 
>> packet loss, and I've chosen 4 MTUs as the operating point.
> 
> Is there a reason for picking 4 MTUs vs 2 MTUs vs 2 packets, etc?

To be more precise, I'm using a sojourn time equivalent to 4 MTU-sized packets 
per bulk flow at line rate, as a modifier to existing AQM behaviour.

The worst case for packet loss within the AQM occurs when the inherent latency 
of the links is very low but the available bandwidth per flow is also low.  
This is easy to replicate using a test box flanked by GigE links to endpoint 
hosts; GigE has sub-millisecond inherent delays.  In this case, the entire BDP 
of each flow exists within the queue.

A general recommendation exists for TCP to use a minimum of 4 packets in 
flight, in order to keep the ack-clock running smoothly in the face of packet 
losses which might otherwise trigger an RTO (retransmit timeout).  This allows 
one packet to be lost and detected by the triple-repetition ACK method, without 
SACK.

It isn't necessary for these packets to all carry an MSS payload; theoretically 
a TCP could reduce the payload per packet to maintain four packets in flight 
with a congestion window below 4x MSS.  I'm not aware of any TCP which actually 
bothers to do that, though I might have missed recent developments in Linux TCP.

It's also possible for a TCP to pace its output so that fewer than 4 packets 
are physically in flight at a time, but still functionally have a congestion 
window that's significantly larger.  BBR could be said to fall into that 
category under some conditions.  TSQ might also produce this behaviour under 
some conditions.

The vast majority of widely deployed TCPs, however, are unable to operate 
efficiently at less than 4x MSS congestion windows.  Additionally, actual use 
of ECN remains deplorably low.  That's the reason for choosing 4 MTUs per bulk 
flow.

Originally, Cake had a similar AQM tweak but imposing a flat minimum of 1.5 
MTUs, irrespective of flow count.  This mechanism is what was adapted into the 
present scheme.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-18 Thread David Lang

On Wed, 18 Apr 2018, Jonathan Morton wrote:


> I'm saying that there's a tradeoff between intra-flow induced latency and
> packet loss, and I've chosen 4 MTUs as the operating point.


Is there a reason for picking 4 MTUs vs 2 MTUs vs 2 packets, etc?


Re: [Cake] A few puzzling Cake results

2018-04-18 Thread Jonathan Morton
> On 18 Apr, 2018, at 6:17 pm, Sebastian Moeller  wrote:
> 
> Just a thought, in egress mode in the typical deployment we expect, the 
> bandwidth leading into cake will be >> than the bandwidth out of cake, so I 
> would argue that the package droppage might be acceptable on egress as there 
> is bandwidth to "waste" while on ingress the issue very much is that all 
> packets cake sees already used up parts of the limited transfer time on the 
> bottleneck link and hence are more "precious", no? Users wanting this new 
> behavior could still use the ingress keyword even on egress interfaces?

Broadly speaking, that should indeed counter most of the negative effects you'd 
expect from disabling this tweak in egress mode.  But it doesn't really answer 
the question of whether there's a compelling *positive* reason to do so.  I 
want to see a use case that holds up.

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-18 Thread Jonathan Morton
>>> So if there is one active bulk flow, we allow each flow to queue four
>>> packets. But if there are ten active bulk flows, we allow *each* flow to
>>> queue *40* packets.
>> 
>> No - because the drain rate per flow scales inversely with the number
>> of flows, we have to wait for 40 MTUs' serialisation delay to get 4
>> packets out of *each* flow.
> 
> Ah right, yes. Except it's not 40 MTUs it's 40 quantums (as each flow
> will only dequeue a packet each MTU/quantum rounds of the scheduler). 

The maximum quantum in Cake is equal to the MTU, and obviously you can't 
increase the drain rate by decreasing the quantum below the packet size.
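
A toy deficit-round-robin model (not Cake's actual DRR++ code) illustrates why:

/* Toy DRR: with a quantum Q smaller than the packet size L, a flow just
 * accumulates deficit over several rounds before it may send a packet,
 * so its long-run share stays at about Q bytes per scheduling round,
 * and shrinking the quantum cannot raise any flow's drain rate. */
#include <stdio.h>

int main(void)
{
    int Q = 256, L = 1500;          /* quantum and packet size, bytes */
    int deficit = 0, sent = 0;
    const int rounds = 60;

    for (int r = 0; r < rounds; r++) {
        deficit += Q;               /* credit granted each scheduling round */
        while (deficit >= L) {      /* may only send once credit covers a packet */
            deficit -= L;
            sent += L;
        }
    }
    printf("%d rounds: %d bytes sent (~%d bytes/round, i.e. roughly Q)\n",
           rounds, sent, sent / rounds);
    return 0;
}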

>> Without that, we can end up with very high drop rates which, in
>> ingress mode, don't actually improve congestion on the bottleneck link
>> because TCP can't reduce its window below 4 MTUs, and it's having to
>> retransmit all the lost packets as well.  That loses us a lot of
>> goodput for no good reason.
> 
> I can sorta, maybe, see the point of not dropping packets that won't
> cause the flow to decrease its rate *in ingress mode*. But this is also
> enabled in egress mode, where it doesn't make sense.

I couldn't think of a good reason to switch it off in egress mode.  That would 
improve a metric that few people care about or can even measure, while severely 
increasing packet loss and retransmissions in some situations, which is 
something that people *do* care about and measure.

> Also, the minimum TCP window is two packets including those that are in
> flight but not yet queued; so allowing four packets at the bottleneck is
> way excessive.

You can only hold the effective congestion window in NewReno down to 2 packets 
if you have a 33% AQM signalling rate (dropping one packet per RTT), which is 
hellaciously high if the hosts aren't using ECN.  If they *are* using ECN, then 
goodput in ingress mode doesn't depend inversely on signalling rate anyway, so 
it doesn't matter.  At 4 packets, the required signalling rate is still pretty 
high (1 packet per 3 RTTs, if it really does go down to 2 MTUs meanwhile) but a 
lot more manageable - in particular, it's comfortably within the margin 
required by ingress mode - and gets a lot more goodput through.

We did actually measure the effect this had in a low-inherent-latency, 
low-bandwidth environment.  Goodput went up significantly, and peak inter-flow 
latency went *down* due to upstream queuing effects.

>> So I do accept the increase in intra-flow latency when the flow count
>> grows beyond the link's capacity to cope.
> 
> TCP will always increase its bandwidth above the link's capacity to
> cope. That's what TCP does.
> 
>> It helps us keep the inter-flow induced latency low
> 
> What does this change have to do with inter-flow latency?
> 
>> while maintaining bulk goodput, which is more important.
> 
> No, it isn't! Accepting a factor of four increase in latency to gain a
> few percents' goodput in an edge case is how we got into this whole
> bufferbloat mess in the first place...

Perhaps a poor choice of wording; I consider *inter-flow latency* to be the 
most important factor.  But users also consider goodput relative to link 
capacity to be important, especially on slow links.  Intra-flow latency, by 
contrast, is practically invisible except for traffic types that are usually 
sparse.

As I noted during the thread Kevin linked, Dave originally asserted that the 
AQM target should *not* depend on the flow count, but the total number of 
packets in the queue should be held constant.  I found that assertion had to be 
challenged once cases emerged where it was clearly detrimental.  So now I 
assert the opposite: that the queue must be capable of accepting a minimum 
number of packets *per flow*, and not just transiently, if the inherent latency 
is not greater than what corresponds to the optimal BDP for TCP.

This tweak has zero theoretical effect on inter-flow latency (which is 
guaranteed by the DRR++ scheme, not the AQM), but can improve goodput and 
sender load at the expense of intra-flow latency.  The practical effect on 
inter-flow latency can actually be positive in some scenarios.

Feel free to measure.  Just be aware of what this is designed to handle.

And obviously I need to write about this in the paper...

 - Jonathan Morton



Re: [Cake] A few puzzling Cake results

2018-04-18 Thread Toke Høiland-Jørgensen
Jonas Mårtensson  writes:

> On Wed, Apr 18, 2018 at 1:25 PM, Toke Høiland-Jørgensen wrote:
>
>> Toke Høiland-Jørgensen  writes:
>>
>> > Jonathan Morton  writes:
>> >
>> >>> On 17 Apr, 2018, at 12:42 pm, Toke Høiland-Jørgensen wrote:
>> >>>
>> >>> - The TCP RTT of the 32 flows is *way* higher for Cake. FQ-CoDel
>> >>>  controls TCP flow latency to around 65 ms, while for Cake it is all
>> >>>  the way up around the 180ms mark. Is the Codel version in Cake too
>> >>>  lenient, or what is going on here?
>> >>
>> >> A recent change was to increase the target dynamically so that at
>> >> least 4 MTUs per flow could fit in each queue without AQM activity.
>> >> That should improve throughput in high-contention scenarios, but it
>> >> does come at the expense of intra-flow latency when it's relevant.
>> >
>> > Ah, right, that might explain it. In the 128 flow case each flow has
>> > less than 100 Kbps available to it, so four MTUs are going to take a
>> > while to dequeue...
>>
>> OK, so I went and looked at the code and found this:
>>
>> bool over_target = sojourn > p->target &&
>>sojourn > p->mtu_time * bulk_flows * 4;
>>
>>
>> Which means that we scale the allowed sojourn time for each flow by the
>> time of four packets *times the number of bulk flows*.
>>
>> So if there is one active bulk flow, we allow each flow to queue four
>> packets. But if there are ten active bulk flows, we allow *each* flow to
>> queue *40* packets.
>
>
> I'm confused. Isn't the sojourn time for a packet a result of the
> total number of queued packets from all flows? If each flow were
> allowed to queue 40 packets, the sojourn time would be mtu_time *
> bulk_flows * 40, no?

No, the 40 in my example came from the bulk_flows multiplier.

Basically, what the current code does is that it scales the AQM target
by the number of active flows, so that the less effective bandwidth is
available to a flow, the more lenient the AQM is going to be.

Which is wrong; the AQM should signal the flow to slow down when it
exceeds its available bandwidth and starts building a queue. So if the
available bandwidth decreases (by more flows sharing it), the AQM is
*expected* to react by sending more "slow down" signals (dropping more
packets).
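
To put some numbers on the current behaviour (a quick sketch of my own; I'm
assuming p->mtu_time is simply the serialisation time of one MTU at the shaped
rate), this is what the second clause of the check quoted above works out to
at 10 Mbit/s with a 1500-byte MTU:

/* Sketch only: evaluate the "mtu_time * bulk_flows * 4" clause for a
 * range of bulk flow counts at 10 Mbit/s with a 1500-byte MTU. */
#include <stdio.h>

int main(void)
{
    double rate_bps = 10e6;
    double mtu_bits = 1500 * 8;
    double mtu_time = mtu_bits / rate_bps;   /* ~1.2 ms per MTU */

    for (int bulk_flows = 1; bulk_flows <= 128; bulk_flows *= 2)
        printf("%3d bulk flows: over_target requires sojourn > %6.1f ms\n",
               bulk_flows, mtu_time * bulk_flows * 4 * 1e3);
    return 0;
}

With 32 bulk flows that is ~154 ms of permitted standing queue; the ~180 ms
TCP RTT reported at the start of this thread (about 50 ms of base RTT plus
~130 ms of queueing) sits just below that threshold, which would explain why
the AQM barely engages. With 128 flows the figure grows to ~614 ms.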

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-18 Thread Jonas Mårtensson
On Wed, Apr 18, 2018 at 1:25 PM, Toke Høiland-Jørgensen wrote:

> Toke Høiland-Jørgensen  writes:
>
> > Jonathan Morton  writes:
> >
> >>> On 17 Apr, 2018, at 12:42 pm, Toke Høiland-Jørgensen wrote:
> >>>
> >>> - The TCP RTT of the 32 flows is *way* higher for Cake. FQ-CoDel
> >>>  controls TCP flow latency to around 65 ms, while for Cake it is all
> >>>  the way up around the 180ms mark. Is the Codel version in Cake too
> >>>  lenient, or what is going on here?
> >>
> >> A recent change was to increase the target dynamically so that at
> >> least 4 MTUs per flow could fit in each queue without AQM activity.
> >> That should improve throughput in high-contention scenarios, but it
> >> does come at the expense of intra-flow latency when it's relevant.
> >
> > Ah, right, that might explain it. In the 128 flow case each flow has
> > less than 100 Kbps available to it, so four MTUs are going to take a
> > while to dequeue...
>
> OK, so I went and looked at the code and found this:
>
> bool over_target = sojourn > p->target &&
>sojourn > p->mtu_time * bulk_flows * 4;
>
>
> Which means that we scale the allowed sojourn time for each flow by the
> time of four packets *times the number of bulk flows*.
>
> So if there is one active bulk flow, we allow each flow to queue four
> packets. But if there are ten active bulk flows, we allow *each* flow to
> queue *40* packets.


I'm confused. Isn't the sojourn time for a packet a result of the total
number of queued packets from all flows?  If each flow were allowed to
queue 40 packets, the sojourn time would be mtu_time * bulk_flows * 40, no?


Re: [Cake] A few puzzling Cake results

2018-04-17 Thread Luca Muscariello
I will check that later, still unsure.

First guess: the quantum component should only influence how close you
get to a fluid, bit-wise approximation; Cake gets closer thanks to its
automatic quantum adjustment.

The correction factor should be computed as the probability that a packet
of a sparse flow loses priority because of the quantum: a bad setting
gives a higher probability, an ideal setting gives probability 0.

So your formula still seems wrong to me...


On Tue, Apr 17, 2018 at 4:25 PM, Toke Høiland-Jørgensen wrote:

> Luca Muscariello  writes:
>
> > I'm not sure that the quantum correction factor is correct.
>
> No, you're right, there's an off-by-one error. It should be:
>
> R_s < R / ((L/L_s)(N+1) + 1)
>
> -Toke
>


Re: [Cake] A few puzzling Cake results

2018-04-17 Thread Toke Høiland-Jørgensen
Luca Muscariello  writes:

> I'm not sure that the quantum correction factor is correct.

No, you're right, there's an off-by-one error. It should be:

R_s < R / ((L/L_s)(N+1) + 1)

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-17 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> On 17 Apr, 2018, at 12:42 pm, Toke Høiland-Jørgensen  wrote:
>> 
>> - The TCP RTT of the 32 flows is *way* higher for Cake. FQ-CoDel
>>  controls TCP flow latency to around 65 ms, while for Cake it is all
>>  the way up around the 180ms mark. Is the Codel version in Cake too
>>  lenient, or what is going on here?
>
> A recent change was to increase the target dynamically so that at
> least 4 MTUs per flow could fit in each queue without AQM activity.
> That should improve throughput in high-contention scenarios, but it
> does come at the expense of intra-flow latency when it's relevant.

Ah, right, that might explain it. In the 128 flow case each flow has
less than 100 Kbps available to it, so four MTUs are going to take a
while to dequeue...

> To see whether Diffserv actually prioritises correctly, you'll need to
> increase the number of bulk flows beyond the point where the VoIP flow
> no longer receives the bandwidth it needs purely from its fair share
> of the link.

Yup, which is what I have been unable to do for the VoIP flow case. I
guess I'll have to try with a flow that has a bit higher bandwidth...

-Toke


Re: [Cake] A few puzzling Cake results

2018-04-17 Thread Luca Muscariello
I'm not sure that the quantum correction factor is correct.

On Tue, Apr 17, 2018 at 2:22 PM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:

> Y via Cake <cake@lists.bufferbloat.net> writes:
>
> > From: Y <intruder_t...@yahoo.fr>
> > Subject: Re: [Cake] A few puzzling Cake results
> > To: cake@lists.bufferbloat.net
> > Date: Tue, 17 Apr 2018 21:05:12 +0900
> >
> > Hi.
> >
> > Any certain fomula of fq_codel flow number?
>
> Well, given N active bulk flows with packet size L, and assuming the
> quantum Q=L (which is the default for FQ-CoDel at full-size 1500-byte
> packets), the maximum rate for a sparse flow, R_s, is bounded by
>
> R_s < R / ((L/L_s)(N+1))
>
> Where R is the link rate and L_s is the packet size of the sparse flow.
> This assumes that the sparse flow has constant spacing between its
> packets, which is often the case for a VoIP flow...
>
> -Toke
>


Re: [Cake] A few puzzling Cake results

2018-04-17 Thread Luca Muscariello
It is more complex than that. The general formula is the max-min fair rate.
The formula Toke provided works only if you have a single sparse flow s
and all the other flows are bottlenecked at this link, i.e. the
experiment he reported.

If you have N_s sparse flows, each consuming R_s,i, and N_b bottlenecked
flows, the max-min fair rate is

(R - sum_i R_s,i) / N_b

The simplest way to compute max-min fair rates is the water-filling
procedure (starting from the lowest rate and working upwards), which sets
the threshold that determines whether a given flow is in N_s or N_b; see
the sketch below.

BTW, this is well-known literature; search for max-min rate calculations.
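
A minimal sketch of that water-filling computation (the demand values are
made-up numbers, purely for illustration):

/* Water-filling max-min fair allocation: repeatedly divide the leftover
 * capacity among unsatisfied flows; flows whose demand is below the
 * current fair share end up in N_s, the rest are bottlenecked (N_b). */
#include <stdbool.h>
#include <stdio.h>

#define NFLOWS 5

int main(void)
{
    double R = 10e6;                                  /* link rate, bit/s */
    double demand[NFLOWS] = { 64e3, 200e3, 5e6, 8e6, 8e6 };
    double alloc[NFLOWS]  = { 0 };
    bool   done[NFLOWS]   = { false };
    double capacity  = R;
    int    remaining = NFLOWS;

    while (remaining > 0) {
        double share = capacity / remaining;          /* current water level */
        bool progress = false;
        for (int i = 0; i < NFLOWS; i++) {
            if (!done[i] && demand[i] <= share) {     /* sparse at this level */
                alloc[i]  = demand[i];
                capacity -= demand[i];
                done[i]   = true;
                remaining--;
                progress  = true;
            }
        }
        if (!progress) {                              /* the rest are bottlenecked */
            for (int i = 0; i < NFLOWS; i++)
                if (!done[i])
                    alloc[i] = share;
            break;
        }
    }
    for (int i = 0; i < NFLOWS; i++)
        printf("flow %d: demand %.0f -> allocation %.0f bit/s\n",
               i, demand[i], alloc[i]);
    return 0;
}

The two low-rate flows get their full demand (they form the sparse set N_s),
and the remaining capacity is split evenly among the bottlenecked flows.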

On Tue, Apr 17, 2018 at 2:22 PM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:

> Y via Cake <cake@lists.bufferbloat.net> writes:
>
> > From: Y <intruder_t...@yahoo.fr>
> > Subject: Re: [Cake] A few puzzling Cake results
> > To: cake@lists.bufferbloat.net
> > Date: Tue, 17 Apr 2018 21:05:12 +0900
> >
> > Hi.
> >
> > Any certain fomula of fq_codel flow number?
>
> Well, given N active bulk flows with packet size L, and assuming the
> quantum Q=L (which is the default for FQ-CoDel at full-size 1500-byte
> packets), the maximum rate for a sparse flow, R_s, is bounded by
>
> R_s < R / ((L/L_s)(N+1))
>
> Where R is the link rate and L_s is the packet size of the sparse flow.
> This assumes that the sparse flow has constant spacing between its
> packets, which is often the case for a VoIP flow...
>
> -Toke
>


Re: [Cake] A few puzzling Cake results

2018-04-17 Thread Jonas Mårtensson
On Tue, Apr 17, 2018 at 2:22 PM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:

> Y via Cake <cake@lists.bufferbloat.net> writes:
>
> > From: Y <intruder_t...@yahoo.fr>
> > Subject: Re: [Cake] A few puzzling Cake results
> > To: cake@lists.bufferbloat.net
> > Date: Tue, 17 Apr 2018 21:05:12 +0900
> >
> > Hi.
> >
> > Any certain fomula of fq_codel flow number?
>
> Well, given N active bulk flows with packet size L, and assuming the
> quantum Q=L (which is the default for FQ-CoDel at full-size 1500-byte
> packets), the maximum rate for a sparse flow, R_s, is bounded by
>
> R_s < R / ((L/L_s)(N+1))
>
> Where R is the link rate and L_s is the packet size of the sparse flow.
> This assumes that the sparse flow has constant spacing between its
> packets, which is often the case for a VoIP flow...


For 10-Mbit/s link rate and 32 bulk flows with 1500-byte packets this
formula gives roughly 25 pps (packets per second) as maximum for a sparse
flow. A VoIP flow is typically 50 pps (20 ms voice payload).

Does this mean that cake sets the quantum to less than 750 bytes for a
10-Mbit/s link?

Do you see any benefit with cake diffserv if you increase the number of
flows?

Does the adjusted quantum also explain the "*way* higher" TCP RTT for cake?
How?

/Jonas


Re: [Cake] A few puzzling Cake results

2018-04-17 Thread Toke Høiland-Jørgensen
Y via Cake <cake@lists.bufferbloat.net> writes:

> From: Y <intruder_t...@yahoo.fr>
> Subject: Re: [Cake] A few puzzling Cake results
> To: cake@lists.bufferbloat.net
> Date: Tue, 17 Apr 2018 21:05:12 +0900
>
> Hi.
>
> Any certain fomula of fq_codel flow number?

Well, given N active bulk flows with packet size L, and assuming the
quantum Q=L (which is the default for FQ-CoDel at full-size 1500-byte
packets), the maximum rate for a sparse flow, R_s, is bounded by

R_s < R / ((L/L_s)(N+1))

Where R is the link rate and L_s is the packet size of the sparse flow.
This assumes that the sparse flow has constant spacing between its
packets, which is often the case for a VoIP flow...
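
For concreteness, here is a quick sketch evaluating this bound for the
10 Mbit/s, 32-bulk-flow scenario from my tests (the 200-byte VoIP packet size
is just an assumption for illustration):

/* Sketch: evaluate R_s < R / ((L/L_s)(N+1)) for one example case. */
#include <stdio.h>

int main(void)
{
    double R   = 10e6;       /* link rate, bit/s */
    double L   = 1500 * 8;   /* bulk packet size (MTU), bits */
    double L_s = 200 * 8;    /* sparse (VoIP) packet size, bits - assumed */
    int    N   = 32;         /* active bulk flows */

    double r_s = R / ((L / L_s) * (N + 1));  /* upper bound on the sparse flow's rate */
    printf("sparse-flow bound: %.0f bit/s (~%.0f packets/s)\n", r_s, r_s / L_s);
    return 0;
}

That comes out at roughly 25 packets per second, i.e. below the ~50 packets
per second of a typical 20 ms VoIP stream.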

-Toke