Re: [Codel] hardware multiqueue in fq_codel?

2013-07-15 Thread Dave Taht
On Mon, Jul 15, 2013 at 9:57 AM, Eric Dumazet  wrote:
> On Mon, 2013-07-15 at 15:40 +0200, Jesper Dangaard Brouer wrote:
>
>> Then they should also be smart enough to change their default fq_codel
>> qdisc, to be a prio band based qdisc... shouldn't they ;-)
>>
>
>
> Some companies do this classification at the edge of their network, so
> that they do not have to worry for each machine of their fleet.
>
> Forcing them to learn how to 'fix' things once a new linux version is
> installed would be quite lame. I wont be the guy responsible for this.
>
> Listen, there is no point trying to tell me how fq_codel is better than
> pfifo_fast. Is an apple better than an orange ?

Is a Tesla better than an oil tanker? :)

> Instead, we only have to create a clear path.

+10!

> 1) Allow the default qdisc to be specified/chosen in Kconfig, a bit
>   like tcp congestion module (cubic is the default)

Concur. Presently that would require exposing some structures
that are private (the qdisc *ops) to this, tho... (?)

> 2) Allow the default qdisc to be selected by a /proc/sys entry, like TCP
> congestion module.

Concur.

Not clear where a good place for it would be. /proc/sys/net/core? I am not fond
of all the tcp related stuff that ended up in ipv4...

I don't see the need for the equivalent of a tcp_allowed_congestion_control,
just a qdisc_default or default_qdisc.
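
To make that concrete, here is a minimal sketch of what choosing the default
from userspace could look like. The key name net.core.default_qdisc is an
assumption in the context of this thread (it mirrors
net.ipv4.tcp_congestion_control, and is the name mainline Linux later used);
the helper name default_qdisc_cmd is hypothetical, and it only prints the
command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the sysctl command that would
# select the default qdisc. The key name net.core.default_qdisc is an
# assumption as far as this thread is concerned.
default_qdisc_cmd() {
    qdisc=${1:-pfifo_fast}   # fall back to the current default
    printf 'sysctl -w net.core.default_qdisc=%s\n' "$qdisc"
}

default_qdisc_cmd fq_codel
```

A single key like this would only affect qdiscs attached after it is set, so
an existing interface would presumably still need a tc command.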

> 3) Define the PRIO + codel/band0 + fq_codel/band1 + codel/band2 as a new
> standalone qdisc

Stressing "a". Concur.

> 4) Eventually switch the default Kconfig from pfifo_fast to this beast.

After tons more testing on ever wider deployments. Sure.

There's a net-next window opening up now...



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
___
Codel mailing list
Codel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/codel


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-15 Thread Eric Dumazet
On Mon, 2013-07-15 at 06:57 -0700, Eric Dumazet wrote:

> By the way, tcp_cong.c has a race in its list handling, list_move() is
> not RCU compatible.

Oh well, list_move() is fine, ignore this false statement.





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-15 Thread Jesper Dangaard Brouer
On Mon, 15 Jul 2013 06:57:54 -0700
Eric Dumazet  wrote:

> On Mon, 2013-07-15 at 15:40 +0200, Jesper Dangaard Brouer wrote:
> 
> > Then they should also be smart enough to change their default
> > fq_codel qdisc, to be a prio band based qdisc... shouldn't they ;-)
> > 
> 
> Some companies do this classification at the edge of their network, so
> that they do not have to worry for each machine of their fleet.
>
> Forcing them to learn how to 'fix' things once a new linux version is
> installed would be quite lame. I wont be the guy responsible for this.

Agreed. (And at big companies the network-router-guys and sysadm-guys
are different people/groups, thus harder to coordinate this change. I
was mostly trolling ;-))

 
> Listen, there is no point trying to tell me how fq_codel is better
> than pfifo_fast. Is an apple better than an orange ?
> 
> Instead, we only have to create a clear path.
> 
> 1) Allow the default qdisc to be specified/chosen in Kconfig, a bit
>   like tcp congestion module (cubic is the default)
> 
> 2) Allow the default qdisc to be selected by a /proc/sys entry, like
> TCP congestion module.
> 
> 3) Define the PRIO + codel/band0 + fq_codel/band1 + codel/band2 as a
> new standalone qdisc
> 
> 4) Eventually switch the default Kconfig from pfifo_fast to this
> beast.

Agreed, sounds like a good plan to me, Dave?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-15 Thread Eric Dumazet
On Mon, 2013-07-15 at 15:40 +0200, Jesper Dangaard Brouer wrote:

> Then they should also be smart enough to change their default fq_codel
> qdisc, to be a prio band based qdisc... shouldn't they ;-)
> 


Some companies do this classification at the edge of their network, so
that they do not have to worry about each machine in their fleet.

Forcing them to learn how to 'fix' things once a new Linux version is
installed would be quite lame. I won't be the guy responsible for this.

Listen, there is no point trying to tell me how fq_codel is better than
pfifo_fast. Is an apple better than an orange?

Instead, we only have to create a clear path.

1) Allow the default qdisc to be specified/chosen in Kconfig, a bit
  like tcp congestion module (cubic is the default)

2) Allow the default qdisc to be selected by a /proc/sys entry, like TCP
congestion module.

3) Define the PRIO + codel/band0 + fq_codel/band1 + codel/band2 as a new
standalone qdisc

4) Eventually switch the default Kconfig from pfifo_fast to this beast.

By the way, tcp_cong.c has a race in its list handling, list_move() is
not RCU compatible.





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-15 Thread Jesper Dangaard Brouer
On Fri, 12 Jul 2013 09:54:17 -0700
Eric Dumazet  wrote:

> On Fri, 2013-07-12 at 18:36 +0200, Sebastian Moeller wrote:
> 
> > 
> > Question, what stops the same attacker to also fudge the
> > TOS bits (say to land in priority band 0)? Just asking...
> 
> This kind of thing is filtered before those packets arrive to the tx
> queue where pfifo_fast is plugged ;)
> 
> TOS is properly checked/rewritten when alien packets enter your
> network.
> 
> People caring with this do their own classification using iptables or
> tc filter rules.

Then they should also be smart enough to change their default fq_codel
qdisc, to be a prio band based qdisc... shouldn't they ;-)

Something as "easy" as:

ETH=eth66
NQUEUES=16  # or more, check how many tx queues your NIC supports
tc qdisc del dev $ETH root 2>/dev/null
tc qdisc add dev $ETH root handle 100: mq
for i in `seq 1 $NQUEUES`; do
  h=`printf %x $i`  # tc parses handles and class IDs as hex
  tc qdisc add dev $ETH parent 100:$h handle $h: prio bands 3
  tc qdisc add dev $ETH parent $h:1 pfifo limit 10
  tc qdisc add dev $ETH parent $h:2 fq_codel
  tc qdisc add dev $ETH parent $h:3 fq_codel
done

(p.s. sorry, I'm in a troll mood today ;-))
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-15 Thread Jesper Dangaard Brouer
On Fri, 12 Jul 2013 10:19:49 -0700
Eric Dumazet  wrote:

> On Fri, 2013-07-12 at 12:54 -0400, Dave Taht wrote:
> 
> > My point was that same program would be just as damaging against
> > pfifo_fast.
> > 
> > > Or just think of SYN flood attack.
> > 
> > For which other defenses exist.
> 
> If someone uses pfifo_fast, it needs no particular protection right
> now to be able to log in into his machine.

I actually like your SSH use-case better than the high-avail heartbeat
use-case, as the HA guys should just change the qdisc by hand, as they
(should) know what they are doing (setting up their complicated configs).


Then I say: Not if the attacker also sets the TOS bits.

Then you say: But the TOS bits should be stripped at the border-gateway.

Then I say: But my server is at a cloud provider, thus I'm logging in
remotely and the cloud provider is stripping my SSH TOS bits. Thus, it's
not helping me... ;-)

Your SSH use-case is more valid, but when we are under real hard
SYN DoS-attacks then all CPUs are pinned down on the listen-spinlock
problem... troll running away hiding ;-)


ps. I usually have a separate NIC on the machine for management/SSH
(using ip rule and routing tables to ensure this NIC has a separate
default gateway).
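
For readers who have not set this up before, a rough sketch of the two
iproute2 commands involved follows. The interface name, the addresses
(documentation range 192.0.2.0/24) and table number 100 are placeholders,
and the helper name mgmt_routing_cmds is hypothetical; it prints the
commands instead of running them.

```shell
#!/bin/sh
# Print the policy-routing commands for a dedicated management NIC.
# All arguments are placeholders; table number 100 is an arbitrary choice.
mgmt_routing_cmds() {
    dev=$1; addr=$2; gw=$3; table=${4:-100}
    # Separate routing table whose default route leaves via the management NIC.
    echo "ip route add default via $gw dev $dev table $table"
    # Traffic sourced from the management address consults that table.
    echo "ip rule add from $addr lookup $table"
}

mgmt_routing_cmds eth1 192.0.2.10 192.0.2.1
```

Piping the output to sh as root would apply it; replies from the management
address then leave via eth1 regardless of the main default route.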
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Dave Taht
On Fri, Jul 12, 2013 at 1:47 PM, Eric Dumazet  wrote:
> On Fri, 2013-07-12 at 13:35 -0400, Dave Taht wrote:
>
>> Against a syn flood attack?
>
>
> Yes. SYNACK messages are in the band 1. SYNACK messages might be
> dropped, but your precious management traffic will not.

I think I'm beginning to gain a clue here, in that 1) in a big outward-facing
deployment, some work is done to ensure that certain kinds of traffic
end up in the priority queue, by matching against a range of internal
IPs, etc. Sure, this always happens and is another good argument for
a multiband solution, but I note that those deployments have special
requirements in general, as you've noted from the htb scheme you've
had to deal with.

And/or 2)  there is in-kernel stuff that utilizes skb->priority to ensure that
some other in-kernel systems (like the synack defense) do that?

Which is what I think you mean... (?) so I'll go poke around there. I have
treated that feature as a black box, sorry.

I should probably try to return this thread to establishing
"a reasonable default for desktops, servers, androids, routers, etc"

and having a mechanism to provide that at kernel buildtime...

but we're making progress, so...

>
>>
>> > Thats the point you absolutely missed. Its kind of incredible.
>>
>> I guess I'm still entirely missing it. By default the networks I have
>> are protected by the syn_flood mechanism as enabled in openwrt.
>
> Most servers coping with synflood or any kind of traffic flood do not
> use openwrt ;)

Heh. I have plenty of other servers at linode, I just don't attack them,
because then I get nasty notes from their management.

I ran operations for a large ISP and later a large banner ad company back in
the 90s, so my skills are out of date, the hair loss still memorable, and the
tools I use to protect them (things like xinetd) archaic.

I should probably have said merely that the syn flooding stuff in sysctl is
turned on. To me it was a magic option, and I (still) have no idea how it
works, so if I'm reading your mind correctly about an innate usage of
pfifo_fast, cool, I'll go read.

I know it's a wild and wooly internet, believe me. I rant on ECN a bit...

But I do not as a rule talk about security/attack problems publicly.

I should probably note that a reason for wanting a service guarantee for a
background queue is part of a half thought out approach towards being able
to deal with certain kinds of floods better (ICMP etoobig and related in
particular). If it was up to me, I'd toss nearly all non-link-local icmp
traffic into a background queue.

The design goal is  "something less horrible than pfifo_fast".
It is good to identify features and problems and to figure out what
can be solved to move along incrementally.

and a mechanism for enabling it (or whatever)

>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Eric Dumazet
On Fri, 2013-07-12 at 13:35 -0400, Dave Taht wrote:

> Against a syn flood attack?


Yes. SYNACK messages are in band 1. SYNACK messages might be
dropped, but your precious management traffic will not.


> 
> > Thats the point you absolutely missed. Its kind of incredible.
> 
> I guess I'm still entirely missing it. By default the networks I have
> are protected by the syn_flood mechanism as enabled in openwrt.

Most servers coping with synflood or any kind of traffic flood do not
use openwrt ;)





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Dave Taht
On Fri, Jul 12, 2013 at 1:19 PM, Eric Dumazet  wrote:
> On Fri, 2013-07-12 at 12:54 -0400, Dave Taht wrote:
>
>> My point was that same program would be just as damaging against
>> pfifo_fast.
>>
>> > Or just think of SYN flood attack.
>>
>> For which other defenses exist.
>
> If someone uses pfifo_fast, it needs no particular protection right now
> to be able to log in into his machine.

Against a syn flood attack?

> Thats the point you absolutely missed. Its kind of incredible.

I guess I'm still entirely missing it. By default the networks I have
are protected by the syn_flood mechanism as enabled in openwrt.

I have hit them with attack tools like thc and related stuff, and well,
that list is rather incredibly large but not bound to the queue type
and I'd rather discuss it offlist.

So if you can point me at some code that thoroughly disables
fq_codel worse than pfifo_fast (offlist), I'll gladly run it on
the testbed here, against everything:

http://results.lab.taht.net/

One of the big reasons why I haven't advocated a smaller number
of flows by default in fq_codel was the attack protection I
surmised it (plus the permuted hash) provided.

> If fq_codel could replace pfifo_fast as is, why do you think I did not
> submit the patch doing the change 

I have generally always thought a three tier system was still
needed, just far less so. The characteristics of that system
are what we are discussing now. The time spent analyzing
fq_codel's behavior


>
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread luca.muscariello


I agree with Dave,

any active flow list used by a per-flow scheduler must store
as much state as the maximum number of distinct flows
active at the same time in the buffer memory, i.e. flows having
at least one packet in the total available buffering.

This max number is bounded and the test described by Eric
is the worst case. In that case, however, the same configuration
would be observed in a FIFO queue with as much buffer memory
as the fq_codel system. In that configuration the service order
of the packets from the queue is meaningless, and so is the hash.


Luca


--
France Telecom R&D - Orange Labs
MUSCARIELLO Luca - OLN/NMP
38 - 40 rue du General Leclerc
92794 Issy Les Moulineaux Cedex 9 - France
http://perso.rd.francetelecom.fr/muscariello

Dave Taht  wrote:
On Fri, Jul 12, 2013 at 12:50 PM, Eric Dumazet  wrote:
> On Fri, 2013-07-12 at 12:37 -0400, Dave Taht wrote:
>
>> This is not strictly true, as the hash is permuted by a secret random
>> number, any level of dumb attack as an attempt to fill all available queues
>> will need to vastly exceed the packet limit rather than the number of queues,
>> thus yielding the same behavior as a normal attack against pfifo_fast, and
>> in the general case an attack that would overwhelm pfifo_fast won't be
>> anywhere near as damaging against fq_codel.
>
> I can give you a program doing a flood on random destination IP, and I
> will tell you it will fill your fq_codel buckets. All of them. secret
> random number wont help at all.

My point was that same program would be just as damaging against
pfifo_fast.

> Or just think of SYN flood attack.

For which other defenses exist.
>
>
>



--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Eric Dumazet
On Fri, 2013-07-12 at 12:54 -0400, Dave Taht wrote:

> My point was that same program would be just as damaging against
> pfifo_fast.
> 
> > Or just think of SYN flood attack.
> 
> For which other defenses exist.

If someone uses pfifo_fast, it needs no particular protection right now
to be able to log in to his machine.

That's the point you absolutely missed. It's kind of incredible.

If fq_codel could replace pfifo_fast as is, why do you think I did not
submit the patch doing the change?





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Dave Taht
On Fri, Jul 12, 2013 at 12:54 PM, Eric Dumazet  wrote:
> On Fri, 2013-07-12 at 18:36 +0200, Sebastian Moeller wrote:
>
>>
>>   Question, what stops the same attacker to also fudge the TOS bits (say 
>> to land in priority band 0)? Just asking...
>
> This kind of thing is filtered before those packets arrive to the tx
> queue where pfifo_fast is plugged ;)

Agree.

>
> TOS is properly checked/rewritten when alien packets enter your network.

Agree.

>
> People caring with this do their own classification using iptables or tc
> filter rules.

Linux wifi automagically tosses stuff currently based on CSX diffserv
values into what it thinks is the appropriate mq driven hardware queue.
I've already shown how damaging it is to use 802.11e and the hardware
VI and VO queues from a wifi client elsewhere, so regard this as a separate
(and harder) problem from finding a pfifo_fast replacement.

(and kernel build mechanism for making it a default)

>
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Eric Dumazet
On Fri, 2013-07-12 at 18:36 +0200, Sebastian Moeller wrote:

> 
>   Question, what stops the same attacker to also fudge the TOS bits (say 
> to land in priority band 0)? Just asking...

This kind of thing is filtered before those packets arrive at the tx
queue where pfifo_fast is plugged ;)

TOS is properly checked/rewritten when alien packets enter your network.

People who care about this do their own classification using iptables or tc
filter rules.





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Dave Taht
On Fri, Jul 12, 2013 at 12:50 PM, Eric Dumazet  wrote:
> On Fri, 2013-07-12 at 12:37 -0400, Dave Taht wrote:
>
>> This is not strictly true, as the hash is permuted by a secret random
>> number, any level of dumb attack as an attempt to fill all available queues
>> will need to vastly exceed the packet limit rather than the number of queues,
>> thus yielding the same behavior as a normal attack against pfifo_fast, and
>> in the general case an attack that would overwhelm pfifo_fast won't be
>> anywhere near as damaging against fq_codel.
>
> I can give you a program doing a flood on random destination IP, and I
> will tell you it will fill your fq_codel buckets. All of them. secret
> random number wont help at all.

My point was that same program would be just as damaging against
pfifo_fast.

> Or just think of SYN flood attack.

For which other defenses exist.
>
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Eric Dumazet
On Fri, 2013-07-12 at 12:37 -0400, Dave Taht wrote:

> This is not strictly true, as the hash is permuted by a secret random
> number, any level of dumb attack as an attempt to fill all available queues
> will need to vastly exceed the packet limit rather than the number of queues,
> thus yielding the same behavior as a normal attack against pfifo_fast, and
> in the general case an attack that would overwhelm pfifo_fast won't be
> anywhere near as damaging against fq_codel.

I can give you a program doing a flood on random destination IPs, and I
will tell you it will fill your fq_codel buckets. All of them. The secret
random number won't help at all.

Or just think of a SYN flood attack.





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Dave Taht
aghh, er, this message was riddled with off-by-one errors. In the
first part of the message I started from 0, then I started to
start from 1...

My coffee machine broke this morning.

On Fri, Jul 12, 2013 at 12:37 PM, Dave Taht  wrote:
> On Fri, Jul 12, 2013 at 11:13 AM, Eric Dumazet  wrote:
>> On Fri, 2013-07-12 at 11:34 +0200, Jesper Dangaard Brouer wrote:
>>
>>> I also think of "fq_codel" as a good replacement for pfifo_fast.  As
>>> the 3-PRIO bands in pfifo_fast is replaced with something smarter in
>>> "fq_codel". (IMHO please don't try to add a prio_fq_codel, just be because
>>> pfifo_fast had prio bands, people can just enable a prio qdisc if they
>>> really need it).
>>
>> Nope. Its really easy for an attacker to flood your fq_codel with say
>> UDP messages on all available hash slots.
>
> This is not strictly true, as the hash is permuted by a secret random
> number, any level of dumb attack as an attempt to fill all available queues
> will need to vastly exceed the packet limit rather than the number of queues,
> thus yielding the same behavior as a normal attack against pfifo_fast, and
> in the general case an attack that would overwhelm pfifo_fast won't be
> anywhere near as damaging against fq_codel.
>
> While it is possible to determine the permutation value it would take a while.
>
>> Some people really want the high prio packets to be sent before any
>> med/low prio packets. Not everybody uses a separate ethernet port for
>> management and heartbeats.
>
> I agree this is a strong argument for a strictly priority queue to exist,
> but would prefer it codeled. Don't mind it fq_codeled either...
>
>> If we want to replace pfifo_fast as the default qdisc, we want some
>> integrated qdisc with 3 bands.
>
> Agree.
>
>> I presume something really simple like :
>>
>> a fifo for band 0 messages
>> a fq_codel for band 1 messages
>> a fifo for band 2 messages
>>
>> Would be more than enough, and this also should use device txqueue len
>> as the (dynamic) limit, because some existing scripts expect to control
>> qdisc limit using "ifconfig eth0 txqueuelen 100", not a tc script.
>
> I believe this would suffice! although I continue to argue for
> fq_codel on band 2
> with a very limited number of queues by default (say, 8), and some level of
> service guarantee better than starvation.
>
> txqueuelen 100 is rather low for codel queue, so I wouldn't
> mind if the lowest value was capped at say, 600, but informed by the
> txqueuelen setting to do so.
>
> in one version of cake I'd merely taken out some queues for 1 and 3
> out of the flows array, changed the hash to account for the offsets
> using band2prio on the skb->priority field, converted the new_flows
> and old_flows pointers to a flows[4].
>
> I got stuck on trying to provide some service guarantee for all three
> queues. (well, I was trying at the time to do weights or more than
> three queues, too) Gave up and misplaced the work.
>
> So I've come around to where I can live with a strict priority queue,
> a la pfifo_fast, that can starve the other queues, and should come
> with a large red warning label if used.
>
> This simplifies providing a service guarantee to an integer value, say,
> a default of 10 (so service is provided every 10th attempt at delivery
> from queue 2),
> to the 3rd queue.
>
>>
>>
>>
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt: 
> http://www.teklibre.com/cerowrt/subscribe.html



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Dave Taht
On Fri, Jul 12, 2013 at 11:13 AM, Eric Dumazet  wrote:
> On Fri, 2013-07-12 at 11:34 +0200, Jesper Dangaard Brouer wrote:
>
>> I also think of "fq_codel" as a good replacement for pfifo_fast.  As
>> the 3-PRIO bands in pfifo_fast is replaced with something smarter in
>> "fq_codel". (IMHO please don't try to add a prio_fq_codel, just be because
>> pfifo_fast had prio bands, people can just enable a prio qdisc if they
>> really need it).
>
> Nope. Its really easy for an attacker to flood your fq_codel with say
> UDP messages on all available hash slots.

This is not strictly true. As the hash is permuted by a secret random
number, any dumb attempt to fill all available queues will need to
vastly exceed the packet limit rather than the number of queues,
thus yielding the same behavior as a normal attack against pfifo_fast. In
the general case, an attack that would overwhelm pfifo_fast won't be
anywhere near as damaging against fq_codel.

While it is possible to determine the permutation value it would take a while.

> Some people really want the high prio packets to be sent before any
> med/low prio packets. Not everybody uses a separate ethernet port for
> management and heartbeats.

I agree this is a strong argument for a strict priority queue to exist,
but would prefer it codeled. Don't mind it fq_codeled either...

> If we want to replace pfifo_fast as the default qdisc, we want some
> integrated qdisc with 3 bands.

Agree.

> I presume something really simple like :
>
> a fifo for band 0 messages
> a fq_codel for band 1 messages
> a fifo for band 2 messages
>
> Would be more than enough, and this also should use device txqueue len
> as the (dynamic) limit, because some existing scripts expect to control
> qdisc limit using "ifconfig eth0 txqueuelen 100", not a tc script.

I believe this would suffice! Although I continue to argue for fq_codel on
band 2 with a very limited number of queues by default (say, 8), and some
level of service guarantee better than starvation.

txqueuelen 100 is rather low for a codel queue, so I wouldn't mind if the
lowest value was capped at, say, 600, but informed by the txqueuelen
setting to do so.

In one version of cake I'd merely taken some queues for 1 and 3
out of the flows array, changed the hash to account for the offsets
using band2prio on the skb->priority field, and converted the new_flows
and old_flows pointers to a flows[4].

I got stuck on trying to provide some service guarantee for all three
queues. (well, I was trying at the time to do weights or more than
three queues, too) Gave up and misplaced the work.

So I've come around to where I can live with a strict priority queue,
a la pfifo_fast, that can starve the other queues, and should come
with a large red warning label if used.

This simplifies providing a service guarantee to an integer value, say,
a default of 10 (so service is provided every 10th attempt at delivery
from queue 2), to the 3rd queue.
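
One possible reading of that rule, as a toy model rather than qdisc code:
bands are numbered from 0, all three are assumed to be permanently
backlogged, and every Nth dequeue attempt is handed to band 2 while the rest
follow strict priority. The function name simulate and this interpretation
of "every 10th attempt" are assumptions.

```shell
#!/bin/sh
# Toy model: strict priority over three always-backlogged bands, except
# that every Nth dequeue attempt is handed to the lowest-priority band.
# simulate ATTEMPTS [N] prints how many packets each band got to send.
simulate() {
    attempts=$1; guarantee=${2:-10}
    b0=0; b1=0; b2=0; n=1
    while [ "$n" -le "$attempts" ]; do
        if [ $(( n % guarantee )) -eq 0 ]; then
            b2=$(( b2 + 1 ))      # guaranteed slot for the lowest band
        else
            b0=$(( b0 + 1 ))      # strict priority: band 0 always wins
        fi
        n=$(( n + 1 ))
    done
    echo "band0=$b0 band1=$b1 band2=$b2"
}

simulate 100   # prints: band0=90 band1=0 band2=10
```

Note that band 1 still gets nothing while band 0 stays backlogged, which is
the starvation the red warning label would be about.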

>
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Sebastian Moeller
Hi there,


On Jul 12, 2013, at 17:13 , Eric Dumazet  wrote:

> On Fri, 2013-07-12 at 11:34 +0200, Jesper Dangaard Brouer wrote:
> 
>> I also think of "fq_codel" as a good replacement for pfifo_fast.  As
>> the 3-PRIO bands in pfifo_fast is replaced with something smarter in
>> "fq_codel". (IMHO please don't try to add a prio_fq_codel, just be because
>> pfifo_fast had prio bands, people can just enable a prio qdisc if they
>> really need it).
> 
> Nope. Its really easy for an attacker to flood your fq_codel with say
> UDP messages on all available hash slots.

Question, what stops the same attacker from also fudging the TOS bits (say 
to land in priority band 0)? Just asking...

> 
> Some people really want the high prio packets to be sent before any
> med/low prio packets. Not everybody uses a separate ethernet port for
> management and heartbeats.
> 
> If we want to replace pfifo_fast as the default qdisc, we want some
> integrated qdisc with 3 bands.
> 
> I presume something really simple like :
> 
> a fifo for band 0 messages
> a fq_codel for band 1 messages
> a fifo for band 2 messages
> 
> Would be more than enough, and this also should use device txqueue len
> as the (dynamic) limit, because some existing scripts expect to control
> qdisc limit using "ifconfig eth0 txqueuelen 100", not a tc script.
> 
> 
> 


Best
Sebastian


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Eric Dumazet
On Fri, 2013-07-12 at 11:34 +0200, Jesper Dangaard Brouer wrote:

> I also think of "fq_codel" as a good replacement for pfifo_fast.  As
> the 3-PRIO bands in pfifo_fast is replaced with something smarter in
> "fq_codel". (IMHO please don't try to add a prio_fq_codel, just be because
> pfifo_fast had prio bands, people can just enable a prio qdisc if they
> really need it).

Nope. It's really easy for an attacker to flood your fq_codel with, say,
UDP messages on all available hash slots.

Some people really want the high prio packets to be sent before any
med/low prio packets. Not everybody uses a separate ethernet port for
management and heartbeats.

If we want to replace pfifo_fast as the default qdisc, we want some
integrated qdisc with 3 bands.

I presume something really simple like :

a fifo for band 0 messages
a fq_codel for band 1 messages
a fifo for band 2 messages

Would be more than enough, and this also should use device txqueue len
as the (dynamic) limit, because some existing scripts expect to control
qdisc limit using "ifconfig eth0 txqueuelen 100", not a tc script.
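
Until such an integrated qdisc exists, the layout can be approximated with
the existing prio qdisc. The sketch below only prints the tc commands; the
helper name three_band_cmds and the handle numbers are arbitrary choices,
and the limit is a static snapshot (for example the value read from
/sys/class/net/eth0/tx_queue_len), not the dynamic txqueuelen tracking
described above.

```shell
#!/bin/sh
# Print tc commands approximating the proposed 3-band layout with the
# existing prio qdisc. Nothing is executed; pipe the output to sh as root
# to apply it. The handle numbers are arbitrary choices.
three_band_cmds() {
    dev=$1
    limit=$2   # packets; e.g. the value of /sys/class/net/$dev/tx_queue_len
    echo "tc qdisc replace dev $dev root handle 1: prio bands 3"
    echo "tc qdisc replace dev $dev parent 1:1 handle 10: pfifo limit $limit"
    echo "tc qdisc replace dev $dev parent 1:2 handle 20: fq_codel limit $limit"
    echo "tc qdisc replace dev $dev parent 1:3 handle 30: pfifo limit $limit"
}

three_band_cmds eth0 100
```

On a multiqueue NIC this would replace the mq root, so it is only a
single-queue approximation of the idea.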





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-12 Thread Jesper Dangaard Brouer

On Thu, 11 Jul 2013 14:18:31 -0700 Dave Taht  wrote:
> On Thu, Jul 11, 2013 at 11:54 AM, Eric Dumazet  wrote:
> > On Thu, 2013-07-11 at 11:06 -0700, Dave Taht wrote:
> >
[...]
> >>
> >> B) people that expect pfifo_fast semantics, for which substituting
> >> fq_codel behaves oddly in two ways -
[...]
> >
> > There is no 'one solution fits every needs'.
> >
> > codel is _not_ a replacement of pfifo_fast, its a replacement for
> > pfifo.

Correct: "codel" is not a replacement for pfifo_fast.

But I do see "fq_codel" as a good replacement for the default qdisc
(placed on each MQ qdisc)

I also think of "fq_codel" as a good replacement for pfifo_fast, as
the 3 PRIO bands in pfifo_fast are replaced with something smarter in
"fq_codel". (IMHO please don't try to add a prio_fq_codel just because
pfifo_fast had prio bands; people can just enable a prio qdisc if they
really need it).


 
> Semantically here I'm trying to "replace the default qdisc" that
> 99.98% of people use, not "replace pfifo_fast" (that 99.99% of people
> use)
> 
> or rather, come up with a strategy for doing such, one day, in some
> more easily deployable fashion.

Yes, we want to replace "the default qdisc", not discuss which qdisc
combos are semantically equivalent.  Okay, the current default is bad and
causes bufferbloat, so we want to *replace* that qdisc. Yes, replacing it
will change the semantics, and that is okay, as people needing the old
semantics can still change their qdisc back via tc.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-11 Thread Dave Taht
On Thu, Jul 11, 2013 at 5:06 PM, Eric Dumazet  wrote:
> On Thu, 2013-07-11 at 14:18 -0700, Dave Taht wrote:
>
>> I have incidentally long thought that you are also tweaking target and
>> interval for your environment?
>
> Google data centers are spread all over the world, and speed of light
> was not increased by wonderful Google engineers,
> thats a real pity I can tell you.

Heh. Interesting. I once circulated a proposal among some VCs for
developing neutrino based networking. After extolling the advantages
for 3 pages (shrinking RTT's halfway around the world by a FACTOR OF
3!), I then moved to the problem of shrinking the emitter and detector
from their present sizes down to handset sizes on page 4,
with a big picture of the sun, and a big picture of the mountainful of water...

Was a nice april 1 prank, that... still, it would be nice to do one
day. So many more crazy ideas were being circulated in the late 90s.

>
>>
>> > Whole point of codel is that number of packets in the queue is
>> > irrelevant. Only sojourn time is.
>>
>> Which is a lovely thing in worlds with infinite amounts of memory.
>
> The limit is a tunable, not hard-coded in qdisc, like other qdiscs.
>
> I chose a packet count limit because pfifo was the target, and pfifo
> limit is given in packets, not in bytes.
>
> There is bfifo variant for a byte limited fifo, so feel free to add
> bcodel and fq_bcodel.

I've often thought that a byte limit was far saner than a strict
packet limit, yes.

> Linux Qdiscs have some existing semantic.
>
> If every new qdisc had strange and new behavior, what would happen ?

http://xkcd.com/927/

:)

>>
>> > Now if your host has memory concerns, that's a separate issue, and you
>> > can adjust the qdisc limit, or add a callback from mm to be able to
>> > shrink queue in case of memory pressure, if you deal with non elastic
>> > flows.
>>
>> So my take on this is that the default limit should be 1k on devices
>> with less than 256MB of ram overall, and 10k (or more) elsewhere. This
>> matches current txqueuelen behavior and has the least surprise.
>
> What is current 'txqueuelen behavior' ?

The default is largely 1000. I have seen a multitude of other settings
ranging from 0 to 10s of thousands, mostly done from userspace. The
vast majority of which are wrong for most workloads...

> Current 'txqueuelen behavior' is a limit of 16.000 packets on typical
> 10G hardware with 16 tx queues, not 1000 as you seem to believe.

Huh? no, I'm aware multiqueue is like that.

It's one of my kvetches in that I've been able to easily crash small
wireless routers with multiple SSIDs with the default queue lengths,
by using classification (e.g. rrul)

This was sort of my motivation for wanting a single qdisc for multiple
hardware queues, or at least, a single limit covering multiple queues,
as by and large the 3 extra queues in wifi are unused, but as best I
can tell from our dialog today that's not doable. I don't care about
the single queue lock that much either (still single core here)...

> If txqueuelen was a percentage of available memory, we would be in a bad
> situation : With txqueuelen being 100 15 years ago, it would be 100.000
> today, since ram was increased by 3 orders of magnitude.
>
> 0.1 % would be too much for my laptop with 4GB of ram, and not enough
> for your 256MB host.

I concur that percentage-based sizing is nuts. An outer limit derived
from the device's current maximum bandwidth would not be horrible, but
it should be in bytes rather than packets.

> I do not want to bloat codel/fq_codel with some strange concerns about
> skb truesize resizing. No other qdisc does this, with codel/fq_codel
> would be different ?

Actually, that patch has been in openwrt for months and months, applied
to the common qdiscs, not just fq_codel.

It could be a compile time option, as I said, for small devices. I
thought we'd discussed this back then?

> This adds a huge performance penalty and latencies. A router should
> never even touch (read or write) a _single_ byte of the payload.

I tend to agree, but the core problem solved here was memory
starvation caused by 2k+ memory allocations for (predominantly)
64-byte packets on the receive path. The performance hit is only
incurred in case of overload/attack.

In my tests, I observed a mild increase in forwarding performance, but
there were many other variables at the time. Certainly forwarding
performance on the box I work on is at an all-time high, even when
"attacked". And reliability is up.

> Whole point of having a queue is to absorb bursts : You don't want to spend
> cpu cycles when bursts are coming, or else there won't be bursts anymore,
> but loss episodes.

The default (and somewhat arbitrary) limit of 128 above was chosen so
that if a queue built during a burst we still wouldn't bother touching
the packets until things got out of hand - and *it was there to fix
the allocation problem without which we ran out of memory* in which to
put the bursts in the first p

Re: [Codel] hardware multiqueue in fq_codel?

2013-07-11 Thread Eric Dumazet
On Thu, 2013-07-11 at 14:18 -0700, Dave Taht wrote:

> I have incidentally long thought that you are also tweaking target and
> interval for your environment?

Google data centers are spread all over the world, and the speed of light
was not increased by wonderful Google engineers,
that's a real pity I can tell you.


> 
> > Whole point of codel is that number of packets in the queue is
> > irrelevant. Only sojourn time is.
> 
> Which is a lovely thing in worlds with infinite amounts of memory.

The limit is a tunable, not hard-coded in qdisc, like other qdiscs.

I chose a packet count limit because pfifo was the target, and pfifo
limit is given in packets, not in bytes.

There is bfifo variant for a byte limited fifo, so feel free to add
bcodel and fq_bcodel.

Linux qdiscs have existing semantics.

If every new qdisc had strange and new behavior, what would happen ?

> 
> > Now if your host has memory concerns, that's a separate issue, and you
> > can adjust the qdisc limit, or add a callback from mm to be able to
> > shrink queue in case of memory pressure, if you deal with non elastic
> > flows.
> 
> So my take on this is that the default limit should be 1k on devices
> with less than 256MB of ram overall, and 10k (or more) elsewhere. This
> matches current txqueuelen behavior and has the least surprise.

What is current 'txqueuelen behavior' ?

Current 'txqueuelen behavior' is a limit of 16.000 packets on typical
10G hardware with 16 tx queues, not 1000 as you seem to believe.
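
As a quick check of that figure, here is a small arithmetic sketch. The helper names are made up for illustration, and the 2048 bytes of memory per queued packet is only an assumed figure for a small packet, not a measurement.

```shell
#!/bin/sh
# Worst-case backlog when every tx queue gets its own qdisc with a
# txqueuelen-sized limit.
#   $1 = number of tx queues, $2 = txqueuelen (packets per queue)
total_packet_limit() {
    echo $(( $1 * $2 ))
}

# Memory pinned by that backlog.
#   $1 = packets, $2 = assumed bytes of memory per queued packet
backlog_bytes() {
    echo $(( $1 * $2 ))
}

pkts=$(total_packet_limit 16 1000)
echo "$pkts packets"                          # 16000 packets
echo "$(backlog_bytes "$pkts" 2048) bytes"    # 32768000 bytes
```

Under those assumptions the per-device ceiling is about 31 MiB, which is negligible on a server and a large share of RAM on a small router.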

If txqueuelen was a percentage of available memory, we would be in a bad
situation : With txqueuelen being 100 15 years ago, it would be 100.000
today, since ram was increased by 3 orders of magnitude.

0.1 % would be too much for my laptop with 4GB of ram, and not enough
for your 256MB host.

I do not want to bloat codel/fq_codel with some strange concerns about
skb truesize resizing. No other qdisc does this, so why would
codel/fq_codel be different ?

This adds a huge performance penalty and latencies. A router should
never even touch (read or write) a _single_ byte of the payload.

Whole point of having a queue is to absorb bursts : You don't want to spend
cpu cycles when bursts are coming, or else there won't be bursts anymore,
but loss episodes.





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-11 Thread Dave Taht
On Thu, Jul 11, 2013 at 11:54 AM, Eric Dumazet  wrote:
> On Thu, 2013-07-11 at 11:06 -0700, Dave Taht wrote:
>
>> Gotcha. So what I actually did (felix did, in openwrt, actually) was
>> just make fq_codel the default qdisc to avoid having to inspect things
>> to set the number of queues in mq and mqprio. I see, for example, that
>> mq is the default for tg3...
>>
>> http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0003-Use-FQ_codel-by-default.patch
>>
>> I just added it to htb and hfsc too:
>>
>> http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0008-Make-fq_codel-the-default-qdisc-for-htb-and-hfsc.patch
>>
>> There's a patch to obsolete pfifo_fast entirely in openwrt, which is a
>> tad premature.
>>
>> A remaining concern is to what this affects:
>>
>> A) people that expect ifconfig X txqueuelen Y to do anything will be
>> misled. Perhaps this could be fixed by having the fq_codel default
>> limit be txqueuelen rather than the default (and overlarge) limit of
>> 10k, but as tons of people are supplying oddball txqueuelens, I tend
>> to think just ignoring txqueuelen going forward is more the right
>> thing.
>>
>> Do you actually get close to 10k packets outstanding in 10GigE under
>> any sane circumstances?
>
>
> 10GigE can send 10.000.000 packets per second.
>
> 10k is only 1ms of buffering, which is pretty low considering the cpu
> able to restart a queue might be blocked ~10 ms in a softirq handler.

I have incidentally long thought that you are also tweaking target and
interval for your environment?

> Whole point of codel is that number of packets in the queue is
> irrelevant. Only sojourn time is.

Which is a lovely thing in worlds with infinite amounts of memory.

> Now if your host has memory concerns, that's a separate issue, and you
> can adjust the qdisc limit, or add a callback from mm to be able to
> shrink queue in case of memory pressure, if you deal with non elastic
> flows.

So my take on this is that the default limit should be 1k on devices
with less than 256MB of ram overall, and 10k (or more) elsewhere. This
matches current txqueuelen behavior and has the least surprise.
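
A minimal sketch of that policy done from userspace rather than in the kernel. The helper names are hypothetical, and reading "1k" as 1024 and "10k" as 10240 packets is an assumption (10240 being the fq_codel default limit).

```shell
#!/bin/sh
# Hypothetical policy: pick a qdisc packet limit from total RAM.
#   $1 = total memory in kB (as reported by MemTotal in /proc/meminfo)
limit_for_mem_kb() {
    if [ "$1" -lt 262144 ]; then   # less than 256 MB
        echo 1024
    else
        echo 10240
    fi
}

# Read MemTotal (in kB) from /proc/meminfo.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

limit_for_mem_kb 131072    # 128 MB box: prints 1024
limit_for_mem_kb 4194304   # 4 GB box: prints 10240
```

The result could then be applied with something like: tc qdisc replace dev eth0 root fq_codel limit "$(limit_for_mem_kb "$(mem_total_kb)")"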

It does strike me as useful but probably hurtful to try and resize the
queue when it gets too large as a callback from the mm subsystem,
better to just drop packets?

There are other patches out there to reduce memory pressure under load
(also used in openwrt) by reducing skb size, those have also worked
out well... typically they look like:

 static int pfifo_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-       if (likely(skb_queue_len(&sch->q) < sch->limit))
+       if (likely(skb_queue_len(&sch->q) < sch->limit)) {
+               if (skb_queue_len(&sch->q) > 128)
+                       skb = skb_reduce_truesize(skb);
                return qdisc_enqueue_tail(skb, sch);
-
+       }

If these were wrapped in a define


>
>>
>> B) people that expect pfifo_fast semantics, for which substituting
>> fq_codel behaves oddly in two ways -
>>
>> 1) if you are explicitly setting skb->priority for the default
>> pfifo_fast 3 bands  and expecting a result, nothing happens - but in
>> the general case, people setting skb->priority are trying to get
>> better latency in the first place, and I really don't think almost
>> anybody will notice.

I can also sit down and go through all the various overloaded uses
that skb->priority has which make life really confusing and difficult.
I really don't mind ignoring it entirely by default. :)

>> 2) if you are using a filter on pfifo_fast that expects 3 bands, and
>> end up using fq_codel by default anyway we get DRR-like behavior over
>> codel rather than strict prioritization and lose fq_codel's full
>> benefits... which is still a win IMHO. I am not fond of being able to
>> starve the other two bands

>> 3) trying to explicitly set pfifo_fast via tc doesn't work with this patch.
>>
>> 4) ECN processing is enabled by default (but off by default in sysctl)
>
> There is no 'one solution fits every needs'.
>
> codel is _not_ a replacement of pfifo_fast, it's a replacement for pfifo.

Semantically here I'm trying to "replace the default qdisc" that
99.98% of people use, not "replace pfifo_fast" (that 99.99% of people
use)

or rather, come up with a strategy for doing such, one day, in some
more easily deployable fashion.

> If you want to replace pfifo_fast, you want PRIO + 3 codel, because
> pfifo_fast is really PRIO + 3 pfifo.

This is where this dialog died last time. This time however I'm trying
to assemble consensus as to the steps required to build a viable
*default* qdisc that is better than pfifo_fast, for desktops, servers,
android boxes, routers, etc - which fq_codel seems to win at (nearly)
across the board.

Certainly those users that override pfifo_fast should be allowed to
continue to do so.

I agree a three tier system on top of fq_codel, would be a pure
superset of pfifo_fast, and probably better in a few respects than
pure fq_codel, but disagree strongly t

Re: [Codel] hardware multiqueue in fq_codel?

2013-07-11 Thread Jonathan Morton
> 4) ECN processing is enabled by default (but off by default in sysctl)

I don't see any harm in handling ECN correctly in the qdisc at all times.
It is routers that screw it up that require ECN negotiation to be disabled
at the endpoints by default - I assume that's the sysctl you're referring
to.

The more ECN traffic there is in the wild, the more visible the problems
will be with the broken routers, and the greater the likelihood that they
might actually get fixed. And the greater benefit perceived to be attached
to ECN by end users and net admins, the more likely they are to actually
turn it on.
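
For reference, a small sketch for checking the endpoint setting in question. It assumes the sysctl meant is net.ipv4.tcp_ecn, and the value descriptions are paraphrased from my understanding of the kernel's sysctl documentation; the helper name is made up.

```shell
#!/bin/sh
# Describe a net.ipv4.tcp_ecn value.
describe_tcp_ecn() {
    case "$1" in
        0) echo "ECN disabled" ;;
        1) echo "ECN requested on outgoing and accepted on incoming connections" ;;
        2) echo "ECN accepted on incoming connections only" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the current endpoint setting, if the file is readable.
if [ -r /proc/sys/net/ipv4/tcp_ecn ]; then
    describe_tcp_ecn "$(cat /proc/sys/net/ipv4/tcp_ecn)"
fi
```

On the qdisc side, fq_codel accepts "ecn" and "noecn" keywords on the tc command line, which is independent of this endpoint setting.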

- Jonathan Morton


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-11 Thread Eric Dumazet
On Thu, 2013-07-11 at 11:06 -0700, Dave Taht wrote:

> Gotcha. So what I actually did (felix did, in openwrt, actually) was
> just make fq_codel the default qdisc to avoid having to inspect things
> to set the number of queues in mq and mqprio. I see, for example, that
> mq is the default for tg3...
> 
> http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0003-Use-FQ_codel-by-default.patch
> 
> I just added it to htb and hfsc too:
> 
> http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0008-Make-fq_codel-the-default-qdisc-for-htb-and-hfsc.patch
> 
> There's a patch to obsolete pfifo_fast entirely in openwrt, which is a
> tad premature.
> 
> A remaining concern is to what this affects:
> 
> A) people that expect ifconfig X txqueuelen Y to do anything will be
> misled. Perhaps this could be fixed by having the fq_codel default
> limit be txqueuelen rather than the default (and overlarge) limit of
> 10k, but as tons of people are supplying oddball txqueuelens, I tend
> to think just ignoring txqueuelen going forward is more the right
> thing.
> 
> Do you actually get close to 10k packets outstanding in 10GigE under
> any sane circumstances?


10GigE can send 10.000.000 packets per second.

10k is only 1ms of buffering, which is pretty low considering the cpu
able to restart a queue might be blocked ~10 ms in a softirq handler.
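
The arithmetic behind those two sentences, as a small sketch (helper names are illustrative; the 10,000,000 packets per second rate is the figure quoted above):

```shell
#!/bin/sh
# Time needed to drain a queue, in microseconds.
#   $1 = packets queued, $2 = packets per second the link can send
drain_time_us() {
    echo $(( $1 * 1000000 / $2 ))
}

# Packets the link could have sent during a stall.
#   $1 = packets per second, $2 = stall duration in microseconds
packets_sent_in_us() {
    echo $(( $1 * $2 / 1000000 ))
}

drain_time_us 10000 10000000        # prints 1000   (10k packets = 1 ms)
packets_sent_in_us 10000000 10000   # prints 100000 (a 10 ms stall)
```

So a 10 ms softirq stall corresponds to ten times more packets than a 10k limit can hold.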

Whole point of codel is that number of packets in the queue is
irrelevant. Only sojourn time is.

Now if your host has memory concerns, that's a separate issue, and you
can adjust the qdisc limit, or add a callback from mm to be able to
shrink queue in case of memory pressure, if you deal with non elastic
flows.

> 
> B) people that expect pfifo_fast semantics, for which substituting
> fq_codel behaves oddly in two ways -
> 
> 1) if you are explicitly setting skb->priority for the default
> pfifo_fast 3 bands  and expecting a result, nothing happens - but in
> the general case, people setting skb->priority are trying to get
> better latency in the first place, and I really don't think almost
> anybody will notice.
> 
> 2) if you are using a filter on pfifo_fast that expects 3 bands, and
> end up using fq_codel by default anyway we get DRR-like behavior over
> codel rather than strict prioritization and lose fq_codel's full
> benefits... which is still a win IMHO. I am not fond of being able to
> starve the other two bands
> 
> 3) trying to explicitly set pfifo_fast via tc doesn't work with this patch.
> 
> 4) ECN processing is enabled by default (but off by default in sysctl)

There is no 'one solution fits every needs'.

codel is _not_ a replacement of pfifo_fast, it's a replacement for pfifo.

If you want to replace pfifo_fast, you want PRIO + 3 codel, because
pfifo_fast is really PRIO + 3 pfifo.
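
To make the "PRIO + 3 pfifo" structure concrete, here is a sketch of how skb->priority selects one of the three bands. The table is the default priomap as I recall it from the kernel sources and the tc-prio man page; the helper name is hypothetical.

```shell
#!/bin/sh
# Default priomap shared by pfifo_fast and prio: index = skb->priority & 15,
# value = band (band 0 is dequeued first, band 2 last).
PRIOMAP="1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"

# Hypothetical helper: print the band a given skb->priority lands in.
band_for_priority() {
    idx=$(( $1 & 15 ))
    # Word splitting of PRIOMAP into positional parameters is intended.
    set -- $PRIOMAP
    shift "$idx"
    echo "$1"
}

band_for_priority 0   # best effort: prints 1
band_for_priority 6   # interactive: prints 0
band_for_priority 2   # bulk:        prints 2
```

A plain fq_codel ignores this mapping entirely, which is the semantic difference being discussed.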





Re: [Codel] hardware multiqueue in fq_codel?

2013-07-11 Thread Dave Taht
On Thu, Jul 11, 2013 at 10:44 AM, Eric Dumazet  wrote:
> On Thu, 2013-07-11 at 10:09 -0700, Dave Taht wrote:
>> In my default environments (wifi, mainly) the hardware queues have
>> very different properties.
>>
>> I'm under the impression that in at least a few ethernet devices they
>> are essentially the same. That said, in the sch_mq case, an entirely
>> separate qdisc is created per hardware queue, and it's always been
>> puzzling to me as to how to attempt to use them within a single qdisc
>> in the pull-through manner.
>>
>> logically, you should be able to take the fq_codel hash index (idx %
>> dev->num_tx_queues) and spread out across the hardware queues that
>> way, but I have no idea where that info would go (the skb? the flow?)
>> or even if it were possible as per the pull through problem...
>>
>> (This does not mean that I necessarily think hardware multiqueues are
>> a good idea... (certainly the results I get out of 802.11e are
>> terrible - but it would be nice to have a unified solution for hw
>> multiqueue devices)
>>
>
> We do not have a fixed/unified queue selection.
>
> It can be tweaked by many different things, depending on exact needs.
>
> MQ is not a qdisc per se, it's only a fake one, a demux if you want, so
> that each tx queue has a separate qdisc lock.
>
> If you stick one fq_codel at the top of the hierarchy (instead of MQ),
> then you lose all the pros of having multiple locks : sending packets
> from fq_codel to different queues on hardware makes no sense, since the
> single qdisc lock is the bottleneck.
>
> So if you want fq_codel and MQ, to be able to drive 40G links from many
> cpus, just use :
>
> ETH=eth0
> NQUEUES=16  # or more, check how many tx queues your NIC supports
> tc qd del dev $ETH root 2>/dev/null
> tc qd add dev $ETH root handle 1: mq
> for i in `seq 1 $NQUEUES`
> do
>  # tc parses the minor part of a classid as hex, so format $i accordingly
>  tc qd add dev $ETH parent 1:$(printf '%x' $i) fq_codel
> done
>
> That only replaces the default pfifo_fast on each slave qdisc with
> fq_codel.

Gotcha. So what I actually did (felix did, in openwrt, actually) was
just make fq_codel the default qdisc to avoid having to inspect things
to set the number of queues in mq and mqprio. I see, for example, that
mq is the default for tg3...

http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0003-Use-FQ_codel-by-default.patch

I just added it to htb and hfsc too:

http://snapon.lab.bufferbloat.net/~cero2/deb/patches/0008-Make-fq_codel-the-default-qdisc-for-htb-and-hfsc.patch

There's a patch to obsolete pfifo_fast entirely in openwrt, which is a
tad premature.

A remaining concern is what this affects:

A) people that expect ifconfig X txqueuelen Y to do anything will be
misled. Perhaps this could be fixed by having the fq_codel default
limit be txqueuelen rather than the default (and overlarge) limit of
10k, but as tons of people are supplying oddball txqueuelens, I tend
to think just ignoring txqueuelen going forward is more the right
thing.

Do you actually get close to 10k packets outstanding in 10GigE under
any sane circumstances?

B) people that expect pfifo_fast semantics, for which substituting
fq_codel behaves oddly in several ways -

1) if you are explicitly setting skb->priority for the default
pfifo_fast 3 bands  and expecting a result, nothing happens - but in
the general case, people setting skb->priority are trying to get
better latency in the first place, and I really don't think almost
anybody will notice.

2) if you are using a filter on pfifo_fast that expects 3 bands, and
end up using fq_codel by default anyway we get DRR-like behavior over
codel rather than strict prioritization and lose fq_codel's full
benefits... which is still a win IMHO. I am not fond of being able to
starve the other two bands

3) trying to explicitly set pfifo_fast via tc doesn't work with this patch.

4) ECN processing is enabled by default (but off by default in sysctl)


>
>
>



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


Re: [Codel] hardware multiqueue in fq_codel?

2013-07-11 Thread Eric Dumazet
On Thu, 2013-07-11 at 10:09 -0700, Dave Taht wrote:
> In my default environments (wifi, mainly) the hardware queues have
> very different properties.
> 
> I'm under the impression that in at least a few ethernet devices they
> are essentially the same. That said, in the sch_mq case, an entirely
> separate qdisc is created per hardware queue, and it's always been
> puzzling to me as to how to attempt to use them within a single qdisc
> in the pull-through manner.
> 
> logically, you should be able to take the fq_codel hash index (idx %
> dev->num_tx_queues) and spread out across the hardware queues that
> way, but I have no idea where that info would go (the skb? the flow?)
> or even if it were possible as per the pull through problem...
> 
> (This does not mean that I necessarily think hardware multiqueues are
> a good idea... (certainly the results I get out of 802.11e are
> terrible - but it would be nice to have a unified solution for hw
> multiqueue devices)
> 

We do not have a fixed/unified queue selection.

It can be tweaked by many different things, depending on exact needs.

MQ is not a qdisc per se, it's only a fake one, a demux if you want, so
that each tx queue has a separate qdisc lock.

If you stick one fq_codel at the top of the hierarchy (instead of MQ),
then you lose all the pros of having multiple locks : sending packets
from fq_codel to different queues on hardware makes no sense, since the
single qdisc lock is the bottleneck.

So if you want fq_codel and MQ, to be able to drive 40G links from many
cpus, just use :

ETH=eth0
NQUEUES=16  # or more, check how many tx queues your NIC supports
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 $NQUEUES`
do
 # tc parses the minor part of a classid as hex, so format $i accordingly
 tc qd add dev $ETH parent 1:$(printf '%x' $i) fq_codel
done

That only replaces the default pfifo_fast on each slave qdisc with
fq_codel.
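
For the "check how many tx queues your NIC supports" step, one possible sketch is to count the tx-N directories under sysfs and generate one command per queue. The helper names and the optional sysfs-root argument are assumptions for illustration; the commands are only printed, not run.

```shell
#!/bin/sh
# Count the tx queues a device exposes under sysfs.
#   $1 = device, $2 = sysfs root (optional, defaults to /sys)
count_tx_queues() {
    n=0
    for q in "${2:-/sys}/class/net/$1/queues"/tx-*; do
        [ -d "$q" ] && n=$(( n + 1 ))
    done
    echo "$n"
}

# Print (not run) one fq_codel command per tx queue. tc reads the minor
# part of a classid as hexadecimal, so queue 10 is 1:a and queue 16 is 1:10.
#   $1 = device, $2 = number of tx queues
print_mq_fq_codel_cmds() {
    echo "tc qdisc replace dev $1 root handle 1: mq"
    i=1
    while [ "$i" -le "$2" ]; do
        printf 'tc qdisc replace dev %s parent 1:%x fq_codel\n' "$1" "$i"
        i=$(( i + 1 ))
    done
}

# Example: review the output, then pipe it to "sh" as root to apply it.
print_mq_fq_codel_cmds eth0 "$(count_tx_queues eth0)"
```

If the device does not exist, the count is 0 and only the root mq line is printed.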


