Re: State of QoS peering in Nanog

2011-04-04 Thread Jim Gettys

On 04/03/2011 12:50 PM, Stefan Fouant wrote:

-Original Message-
From: Leo Bicknell [mailto:bickn...@ufp.org]
Sent: Saturday, April 02, 2011 10:24 PM

But it also only affects priority queue traffic.  I realize I'm making
a value judgment, but many customers under DDoS would find things
vastly improved if their video conferencing went down, but everything
else continued to work (if slowly), compared to today when everything
goes down.

I'd like to observe that discussion when the Netflix guys come calling on
the support line - "Hey Netflix, yeah, you're under attack and your
subscribers can't watch videos at the moment, but the good news is that all
other apps running on our network are currently unaffected." ;-)


In closing, I want to push folks back to the buffer bloat issue though.
More than once I've been asked to configure QoS on the network to
support VoIP, Video Conferencing or the like.  These things were
deployed and failed to work properly.  I went into the network and
_reduced_ the buffer sizes, and _increased_ packet drops.  Magically
these applications worked fine, with no QoS.

Video conferencing can tolerate a 1% packet drop, but can't tolerate a
4 second buffer delay.  Many people today who want QoS are actually
suffering from buffer bloat. :(

Concur 100%.  In my experience, I've gotten much better performance w/
VoIP/Video Conferencing and other delay-intolerant applications when setting
buffer sizes to a temporal value rather than based on a _fixed_ number of
packets.



There is no magic here at all.

There are dark buffers all over the Internet: some network operators run 
their routers and broadband links without RED enabled, our broadband gear 
suffers from excessive buffering, and so do our home routers and hosts.


What is happening, as I outlined at the transport area meeting at the 
IETF in Prague, is that by putting in excessive buffers everywhere in 
the name of avoiding packet loss, we've destroyed TCP congestion 
avoidance and badly damaged slow start while adding terrible latency and 
jitter.  Tail drop with long buffers delays notification of congestion 
to TCP, and defeats the algorithms.  Even without this additional 
problem (which causes further havoc), TCP will always fill buffers on 
either side of your bottleneck link in your path.


So your large buffers add latency, and when a link is saturated, the 
buffers on either side of the saturated links fill, and stay so (most 
commonly in the broadband gear, but often also in the hosts/home routers 
over 802.11 links).
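The latency cost of such a standing queue is straightforward to estimate; the sketch below is illustrative only (the 256 KB buffer and the 1 Mbit/s uplink are assumed figures, not measurements from this thread):

```python
def queue_delay_s(buffer_bytes: int, link_rate_bps: float) -> float:
    """Delay added by a full standing queue: bytes waiting / drain rate."""
    return buffer_bytes * 8 / link_rate_bps

# Once TCP fills a 256 KB buffer sitting ahead of a 1 Mbit/s uplink,
# every packet behind it waits about two seconds:
print(queue_delay_s(256 * 1024, 1e6))  # ~2.1 seconds
```

That delay is added to every packet crossing the link, interactive or not, for as long as the bulk transfer keeps the buffer full.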


By running with AQM (or small buffers), you reduce the need for QoS 
(which doesn't yet seriously exist at the network edge).


See my talk in http://mirrors.bufferbloat.net/Talks/PragueIETF/ 
(slightly updated since the Prague IETF) and you can listen to it at


 http://ietf80streaming.dnsalias.net/ietf80/ietf80-ch4-wed-am.mp3

A longer version of that talk is at:
http://mirrors.bufferbloat.net/Talks/BellLabs01192011/

Note that there is a lot you can do immediately to reduce your personal 
suffering, by using bandwidth shaping to reduce/eliminate the buffer problem in 
your home broadband gear, and by ensuring that your 802.11 wireless bandwidth 
is always greater than your home broadband bandwidth (since the bloat in 
current home routers can be even worse than in the broadband gear).
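On a Linux-based home router, the shaping described above can be sketched roughly as follows with `tc` (a hypothetical configuration: the interface name `eth0` and the 900 kbit rate, chosen just below an assumed 1 Mbit/s uplink, are placeholders to adapt to your own link):

```shell
# Shape egress to just under the uplink rate so the queue forms here,
# where we control its size, instead of in the modem's oversized buffer.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 900kbit ceil 900kbit
# SFQ keeps a single bulk flow from starving interactive traffic.
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
```

Keeping the shaped rate a few percent below the true uplink speed is what keeps the broadband gear's own buffer empty.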

See http://gettys.wordpress.com for more detail.  Please come help fix this 
mess at bufferbloat.net.
The bloat mailing list is bl...@lists.bufferbloat.net.

We're all in this bloat together.
- Jim








RE: State of QoS peering in Nanog

2011-04-03 Thread Stefan Fouant
 -Original Message-
 From: Leo Bicknell [mailto:bickn...@ufp.org]
 Sent: Saturday, April 02, 2011 5:56 PM
 
 In an IP network, the bandwidth constraints are almost always across an
 administrative boundary.  This means in the majority of cases across
 transit circuits, not peering.  80-90% of the packet loss in the
 network happens at the end user access port, inbound or outbound.
 Another 5-10% occurs where regional or non-transit free providers buy
 transit.  Lastly, 3-5% occurs where there are geographic or
 geopolitical issues (oceans to cross, country borders with restrictive
 governments to cross).

Hi Leo,

I think you bring up some interesting points here, and my experience and
observations largely lend credence to what you are saying.  I'd like to know,
however, just for my own personal knowledge: are the numbers you are using
above based on some broad analysis or study of multiple providers, or are
you deriving these numbers from your own personal observations?

Thanks,

Stefan Fouant





RE: State of QoS peering in Nanog

2011-04-03 Thread Stefan Fouant
 -Original Message-
 From: Leo Bicknell [mailto:bickn...@ufp.org]
 Sent: Saturday, April 02, 2011 10:24 PM
 
 But it also only affects priority queue traffic.  I realize I'm making
 a value judgment, but many customers under DDoS would find things
 vastly improved if their video conferencing went down, but everything
 else continued to work (if slowly), compared to today when everything
 goes down.

I'd like to observe that discussion when the Netflix guys come calling on
the support line - "Hey Netflix, yeah, you're under attack and your
subscribers can't watch videos at the moment, but the good news is that all
other apps running on our network are currently unaffected." ;-)

 In closing, I want to push folks back to the buffer bloat issue though.
 More than once I've been asked to configure QoS on the network to
 support VoIP, Video Conferencing or the like.  These things were
 deployed and failed to work properly.  I went into the network and
 _reduced_ the buffer sizes, and _increased_ packet drops.  Magically
 these applications worked fine, with no QoS.
 
 Video conferencing can tolerate a 1% packet drop, but can't tolerate a
 4 second buffer delay.  Many people today who want QoS are actually
 suffering from buffer bloat. :(

Concur 100%.  In my experience, I've gotten much better performance w/
VoIP/Video Conferencing and other delay-intolerant applications when setting
buffer sizes to a temporal value rather than based on a _fixed_ number of
packets.
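The temporal sizing described above amounts to a simple conversion; here is a minimal sketch (the 10 ms target, the link rates, and the 1500-byte average packet size are illustrative assumptions, not values from any particular platform):

```python
def buffer_packets(link_rate_bps: float, target_delay_s: float,
                   avg_packet_bytes: int = 1500) -> int:
    """Convert a temporal buffer target into a packet-count limit.

    A buffer sized in time holds link_rate * delay worth of bits at any
    link speed; a fixed packet count produces wildly different delays
    on slow versus fast links.
    """
    bits = link_rate_bps * target_delay_s
    return max(1, int(bits // (avg_packet_bytes * 8)))

# A 10 ms target on a 10 Mbit/s link versus a 1 Gbit/s link:
print(buffer_packets(10e6, 0.010))  # 8 packets
print(buffer_packets(1e9, 0.010))   # 833 packets
```

The same fixed 100-packet default that is harmless at 1 Gbit/s would hold over a second of traffic at 1 Mbit/s, which is exactly the failure mode delay-intolerant applications trip over.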

Stefan Fouant





State of QoS peering in Nanog

2011-04-02 Thread Francois Menard
Folks,

The Canadian telecommunications regulator, the CRTC, has just launched a public 
notice with possible worldwide implications IMHO, Telecom Notice of 
Consultation CRTC 2011-206:

http://www.crtc.gc.ca/eng/archive/2011/2011-206.htm

I think this is the very first regulatory inquiry into IP to IP interconnection 
for PSTN local interconnection.

One of the postulates that I intend to defend, is that in the PSTN today, in 
addition to interconnecting for the purpose of exchanging voice calls, it is 
possible to LOCALLY (at the Local Interconnection Region, roughly a US LATA) 
interconnect with guaranteed QoS for ISDN video conferencing.

In other words, there is more to PSTN interconnection than the support of the 
G.711 CODEC.  Other CODECs are supported, such as H.320.

This brings me to a point. Why should we lose this important feature of the 
PSTN, support for multiple CODECs, as we carelessly level IP-to-IP 
interconnection down to G.711 only?

Video conferencing on the Internet, particularly at high resolution, is 
hardly a reality today, to say the least, never mind guessing what the future 
will hold.


Why not consider HD audio?

Therefore:

A) I want to capture all instances where this issue has been addressed 
worldwide.

B) I also want to understand what is going on, insofar as enabling guaranteed 
QoS peering across BGP-4 interconnections in the Nanog community.

C) I also want to understand whether there is inter-service-provider RSVP or 
other per-session QoS establishment protocols.

I call upon the Nanog community to consider this proceeding as very important 
and contribute to this thread.

And I will try to provide a forum for discussing this outside of Nanog when 
required.

Regards,

-=Francois=-




Re: State of QoS peering in Nanog

2011-04-02 Thread Leo Bicknell
In a message written on Sat, Apr 02, 2011 at 04:00:30PM -0400, Francois Menard 
wrote:
 One of the postulates that I intend to defend, is that in the
 PSTN today, in addition to interconnecting for the purpose of
 exchanging voice calls, it is possible to LOCALLY (at the Local
 Interconnection Region, roughly a US LATA) interconnect with
 guaranteed QoS for ISDN video conferencing.

The PSTN features fixed, known bandwidth.  QoS isn't really the
right term.  When I nail up a BRI, I know I have 128kb of bandwidth,
never more, never less.  There is no function on that channel similar
to IP QoS.

When talking about IP QoS people like to talk about guaranteed, or
reserved bandwidth for particular applications.  The reality is
though that's not how IP QoS works.  IP QoS is really about identifying
which traffic can be thrown away first in the face of congestion.
Guaranteeing 128kb for a video call really means making sure all
other traffic is thrown away first, in the face of congestion.
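A toy model makes that point concrete (entirely hypothetical code; real schedulers and policers are far more involved):

```python
def serve(packets: list[str], link_capacity: int) -> list[str]:
    """One congestion interval: 'priority' packets are served first and
    best-effort gets whatever capacity is left over.  Nothing is
    reserved; QoS here only decides which packets get thrown away."""
    pri = [p for p in packets if p == "priority"]
    be = [p for p in packets if p == "best-effort"]
    sent = pri[:link_capacity]
    sent += be[:link_capacity - len(sent)]
    return sent

offered = ["priority"] * 3 + ["best-effort"] * 5
print(serve(offered, 4))  # 3 priority packets and 1 best-effort survive
```

Note that when the link is uncongested (capacity >= offered load), the marking changes nothing at all, which is why QoS on uncongested peering links goes unused.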

 In other words, there is more to PSTN interconnection than the
 support of the G.711 CODEC.  Other CODECs are supported, such as
 H.320.
 
 This brings me to a point. Why should we lose this important
 feature of the PSTN, support for multiple CODECs, as we carelessly
 level IP-to-IP interconnection down to G.711 only?

IP networks can't tell the difference between G.711, H.320, and the
SMTP packets used to deliver this e-mail.  IP networks know nothing
about CODECs, and operate entirely on IP address and port information.

 B) I also want to understand what is going on, insofar as enabling
 guaranteed QoS peering across BGP-4 interconnections in the Nanog
 community.

You're looking at the wrong point in the network.  In my experience,
full peering circuits are very much the exception, not the rule.
While almost all the exceptions hit NANOG and are the subject of
fun and lively discussion, the reality is they are rare.

When there is no congestion, there is no reason to drop a packet.
A QoS policy would go unused, or if you want to look from the other
direction everything has 100% bandwidth across that link.

In an IP network, the bandwidth constraints are almost always across
an administrative boundary.  This means in the majority of cases
across transit circuits, not peering.  80-90% of the packet loss
in the network happens at the end user access port, inbound or
outbound.  Another 5-10% occurs where regional or non-transit free
providers buy transit.  Lastly, 3-5% occurs where there are geographic
or geopolitical issues (oceans to cross, country borders with
restrictive governments to cross).

Basically, you could mandate QoS on every peering link in the
Internet and I suspect 99% of the end users would never notice any
change.

If you want to advocate for useful changes to end users that provide a
better network experience, you need to focus your efforts in three
areas:

1) Fight bufferbloat.  http://en.wikipedia.org/wiki/Bufferbloat
   
http://arstechnica.com/tech-policy/news/2011/01/understanding-bufferbloat-and-the-network-buffer-arms-race.ars
   http://www.bufferbloat.net/

2) Get access ISPs to offer QoS on customer access ports, ideally in
   some user configurable way.

3) Get ISP's who purchase transit further up the line to implement QoS
   with their transit provider for their customers' traffic, if they are
   going to run those links at full.

-- 
   Leo Bicknell - bickn...@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/




Re: State of QoS peering in Nanog

2011-04-02 Thread Jeff Wheeler
On Sat, Apr 2, 2011 at 5:56 PM, Leo Bicknell bickn...@ufp.org wrote:
 The PSTN features fixed, known bandwidth.  QoS isn't really the
 right term.  When I nail up a BRI, I know I have 128kb of bandwidth,
 never more, never less.  There is no function on that channel similar
 to IP QoS.

The PSTN also has exactly one unidirectional flow per access port.
This is not true of IP networks, where an end-user access port may
have dozens of flows going at once for common web browsing, and
perhaps hundreds of flows when using P2P file sharing applications,
etc.  The lifetime of these flows may be several hours (streaming
movie) or under a second (web browser.)

Where the PSTN has channels between two access ports (which might be
packetized within the backbone) and a relatively complex control plane
for establishing flows, the IP network has little or no knowledge of
flows, and if it does have any knowledge of them, it's not because a
control plane exists to establish them, it's because punting from the
data plane to the control plane allows flow state to be established
for things like NAT.

 Basically, you could mandate QoS on every peering link in the
 Internet and I suspect 99% of the end users would never notice any
 change.

I don't agree with this.  IMO all DDoS traffic would suddenly be
marked into the highest priority forwarding class that doesn't have an
absurdly low policer for the DDoS source's access port, and as a
result, DDoS would more easily cripple the network, either from
hitting policers on the higher-priority traffic and killing streaming
movies/voip/etc, or in the absence of policers, it would more easily
cause significant packet loss to best-effort traffic.

I think end-users would notice because their ISP would suddenly grind
to a halt anytime a clever DDoS was directed their way.

We will no sooner see a practical solution to this than we will one
for large-scale multicast in backbone and subscriber access networks.
The limitations are similar: to be effective, you need a lot more
state for multicast.  For a truly good QoS implementation, you need a
lot more hardware counters and policers (more state.)  If you don't
have this, all your QoS setup will do, deployed across a large
Internet subscriber access network, is work a little better under
ideal conditions, and probably a lot worse when subjected to malicious
traffic.

 2) Get access ISPs to offer QoS on customer access ports, ideally in
   some user configurable way.

I do agree that QoS should be available to end-users across access
links, but I don't agree with pushing it further towards the core
unless per-subscriber policers are available beyond those on access
routers.  Otherwise, all someone has to do to be mean to Netflix is
send a short-term, high-volume DoS attack that looks like Netflix
traffic towards an end-user IP, which would interrupt movie-viewing
for a potentially larger number of users, or at least as many
end-users as the same DoS would in the absence of any QoS.  The case
of per-subscriber policers pushed further towards the ISP core fares
better.

-- 
Jeff S Wheeler j...@inconcepts.biz
Sr Network Operator  /  Innovative Network Concepts



Re: State of QoS peering in Nanog

2011-04-02 Thread Leo Bicknell
In a message written on Sat, Apr 02, 2011 at 07:00:52PM -0400, Jeff Wheeler 
wrote:
 I don't agree with this.  IMO all DDoS traffic would suddenly be
 marked into the highest priority forwarding class that doesn't have an
 absurdly low policer for the DDoS source's access port, and as a
 result, DDoS would more easily cripple the network, either from
 hitting policers on the higher-priority traffic and killing streaming
 movies/voip/etc, or in the absence of policers, it would more easily
 cause significant packet loss to best-effort traffic.

Agree in part, and disagree in part.

No doubt DDoS programs will try and masquerade as high priority
traffic.  This will create a new set of problems, and require some
new solutions.

Let's separate the problem into two parts.  The first is best
effort traffic.  Provided the QoS policy only prioritizes a fraction
of the bandwidth (20 to maybe 40%), the impact of a DDoS that came
in prioritized would only be a few percentage points worse than a
standard DDoS.

Today it takes about 10x link speed to make a link completely
unusable (although YMMV, and it depends a lot on your traffic mix
and definition of unusable).  With a 25% priority queue, and the
DDoS hitting it, that may drop to 8x.  I think it is both statistically
interesting, but also relatively minor.
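One back-of-the-envelope way to arrive at that 8x figure (the linear scaling below is a simplifying assumption for illustration, not a stated derivation):

```python
def unusable_multiple(base_multiple: float, priority_fraction: float) -> float:
    """Rough model: an attack admitted into the priority class contends
    only with the best-effort share of the link, so the 'unusable'
    threshold scales down with the remaining fraction of capacity."""
    return base_multiple * (1 - priority_fraction)

print(unusable_multiple(10, 0.25))  # 7.5, i.e. roughly 8x
```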

The second problem is what happens to priority traffic.  You are
correct that if DDoS traffic can come in prioritized then you only
need fill the priority queue 2x-4x to generate issues (as streaming
traffic is more sensitive), assuming traffic over the limit is not
dropped but rather allowed best effort.  This is likely a lower
threshold than filling the entire link 5x-10x, and thus easier for
the attacker.

But it also only affects priority queue traffic.  I realize I'm
making a value judgment, but many customers under DDoS would find
things vastly improved if their video conferencing went down, but
everything else continued to work (if slowly), compared to today
when everything goes down.

In closing, I want to push folks back to the buffer bloat issue
though.  More than once I've been asked to configure QoS on the
network to support VoIP, Video Conferencing or the like.  These
things were deployed and failed to work properly.  I went into the
network and _reduced_ the buffer sizes, and _increased_ packet
drops.  Magically these applications worked fine, with no QoS.

Video conferencing can tolerate a 1% packet drop, but can't tolerate
a 4 second buffer delay.  Many people today who want QoS are actually
suffering from buffer bloat. :(

This is very hard to explain; while people on NANOG might get it, 99% of
the network engineers in the world think minimizing packet loss is the
goal.  It is very much an uphill battle to make them understand higher
packet loss often _increases_ end user performance on full links.

-- 
   Leo Bicknell - bickn...@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/

