Re: [Cerowrt-devel] [Bloat] some good bloat related stuff on the ICCRG agenda, IETF #86 Tuesday, March 12 2013, 13:00-15:00, room Caribbean 6

2013-02-28 Thread Wesley Eddy
On 2/28/2013 10:53 AM, Dave Taht wrote:
 
 For those that don't attend ietf meetings in person, there is usually
 live audio and jabber chat hooked up into the presentations.
 
 See y'all there, next month, in one form or another.
 


In the TSVAREA meeting, we've also set aside some time to talk
about AQM and whether there's interest and energy to do some
more specific work on AQM algorithms in the IETF (e.g., CoDel and PIE):

https://datatracker.ietf.org/meeting/86/agenda/tsvarea

I'm working with Martin on some slides to seed the discussion,
but we hope that it's mostly the community that we hear from,
following up in the higher-bandwidth face-to-face time from
the thread we had on the tsv-a...@ietf.org mailing list a few
months ago.


-- 
Wes Eddy
MTI Systems
___
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel


Re: [Cerowrt-devel] [Bloat] some good bloat related stuff on the ICCRG agenda, IETF #86 Tuesday, March 12 2013, 13:00-15:00, room Caribbean 6

2013-02-28 Thread dpreed

A small suggestion.  Instead of working on *algorithms*, focus on getting 
something actually *deployed* to fix the very real issues that we have today 
(preserving the option to upgrade later if need be).
 
The folks who built the Internet (I was there, as you probably know) focused on 
making stuff that worked and interoperated, not publishing papers or RFCs.
 
___
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel


Re: [Cerowrt-devel] Google working on experimental 3.8 Linux kernel for Android

2013-02-28 Thread dpreed

Doesn't fq_codel need an estimate of link capacity?  Where will it get that 
from the 4G or 3G uplink?
 
-Original Message-
From: Maciej Soltysiak mac...@soltysiak.com
Sent: Thursday, February 28, 2013 1:03pm
To: cerowrt-devel@lists.bufferbloat.net
Subject: [Cerowrt-devel] Google working on experimental 3.8 Linux kernel for 
Android



Hiya,

Looks like Google's experimenting with 3.8 for Android:
https://android.googlesource.com/kernel/common/+/experimental/android-3.8
Sounds great if this means they will utilize fq_codel, TFO, BQL, etc.

Anyway, my Nexus 7 says it has 3.1.10, and this 3.8 will probably go into Android 
5.0, so I hope the Nexus 7 will get it too some day, or at least 3.3+.

Phoronix coverage: http://www.phoronix.com/scan.php?page=news_item&px=MTMxMzc
Their 3.8 changelog:
https://android.googlesource.com/kernel/common/+log/experimental/android-3.8

Regards,
Maciej
___
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel


Re: [Cerowrt-devel] Google working on experimental 3.8 Linux kernel for Android

2013-02-28 Thread Jim Gettys
I've got a bit more insight into LTE than I did in the past, courtesy of
the last couple days.

To begin with, LTE runs with several classes of service (they call them
bearers).  Your VoIP traffic goes into one of them, and I think there is
another as well for guaranteed-bit-rate traffic.  One transmit opportunity
may carry a bunch of chunks of data, and that data may be destined for more
than one device (IIRC).  It's substantially different from WiFi.

But most of what we think of as Internet traffic (web surfing, DNS, etc.) all
gets dumped into a single best-effort (BE) class.

The BE class is definitely badly bloated.  I can't say how much, because I
don't really know yet (the test my colleague ran wasn't run long enough to be
confident it filled the buffers), but I will say it is worse than most cable
modems I've seen.  I expect this will be true to different degrees on
different hardware.  The other traffic classes haven't been tested yet for
bufferbloat, though I suspect they will have it too.  I was told that those
classes have much shorter queues, and that when they grow, they dump the
whole queue (because delivering late real-time traffic is useless).  But
trust *and* verify...  Verification hasn't been done for anything but BE
traffic, and that hasn't been quantified.

But each device gets a fair shot at bandwidth in the cell (or sector of a
cell; they run 3 radios in each cell), where fair is basically time based:
if you are at the edge of a cell, you'll get a lot less bandwidth than
someone near a tower.  This fairness is guaranteed by a scheduler that runs
in the base station (called an eNodeB, IIRC).  So the base station
guarantees some sort of fairness between devices (a place where Linux's
wifi stack today fails utterly, since there is a single queue per device,
rather than one per station).
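
To make the contrast concrete, here is a rough, purely illustrative Python
sketch of "one queue per station, scheduled by airtime" -- the kind of
time-based fairness the eNodeB enforces and the Linux wifi stack currently
lacks.  The names and numbers are invented; this is not kernel or
base-station code.

from collections import deque

class Station:
    """One queue per station, rather than one queue per wifi device."""
    def __init__(self, name, rate_bps):
        self.name = name
        self.rate_bps = rate_bps    # current link rate to this station
        self.queue = deque()        # packet sizes (bytes) awaiting airtime
        self.deficit_us = 0.0       # airtime credit, in microseconds

def airtime_us(nbytes, rate_bps):
    return nbytes * 8 * 1e6 / rate_bps

def one_round(stations, quantum_us=1000):
    """Deficit round robin in units of airtime: each station gets the same
    transmit time per round, so a station at the cell edge (low rate) moves
    fewer bytes than one near the tower -- time-based fairness."""
    sent = []
    for st in stations:
        if not st.queue:
            st.deficit_us = 0.0
            continue
        st.deficit_us += quantum_us
        while st.queue and airtime_us(st.queue[0], st.rate_bps) <= st.deficit_us:
            nbytes = st.queue.popleft()
            st.deficit_us -= airtime_us(nbytes, st.rate_bps)
            sent.append((st.name, nbytes))
    return sent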

Whether there are bloat problems at the link level in LTE due to error
correction I don't know yet, but it wouldn't surprise me; I know there was
in 3G.  The people I talked to this morning aren't familiar with the HARQ
layer in the system.

The base stations are complicated beasts; they have both a Linux system in
them as well as a real-time-OS-based device inside.  We don't know where the
bottleneck(s) are yet.  I spent lunch upping their paranoia and getting them
through some conceptual hurdles (e.g. multiple bottlenecks that may move,
and the like).  They will try to get me some of the data so I can help them
figure it out.  I don't know if the data flow goes through the Linux system
in the eNodeB or not, for example.

Most carriers are now trying to ensure that their backhauls from the base
station are never congested, though that is another known source of
problems.  And then there is the lack of AQM at peering-point routers...
You'd think they might run WRED there, but many/most do not.
 - Jim





On Thu, Feb 28, 2013 at 2:08 PM, Dave Taht dave.t...@gmail.com wrote:



 On Thu, Feb 28, 2013 at 1:57 PM, dpr...@reed.com wrote:

 Doesn't fq_codel need an estimate of link capacity?


 No, it just measures delay. Since, so far as I know, the outgoing portion of
 LTE is not soft-rate limited but is sensitive to the actual available link
 bandwidth, fq_codel should work pretty well (if the underlying interfaces
 weren't horribly overbuffered) in that direction.
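
A much-simplified sketch of the delay-only logic being described, using the
published CoDel defaults (5 ms target, 100 ms interval) and omitting the
control law that speeds up dropping; this is illustrative Python, not the
Linux fq_codel implementation:

TARGET_S = 0.005      # 5 ms: acceptable standing queue delay
INTERVAL_S = 0.100    # 100 ms: how long delay may stay above TARGET_S

class CodelSketch:
    def __init__(self):
        self.first_above = None     # when sojourn time first exceeded TARGET_S

    def should_drop(self, enqueue_time, now):
        sojourn = now - enqueue_time        # the only measurement: delay
        if sojourn < TARGET_S:
            self.first_above = None         # standing queue has drained
            return False
        if self.first_above is None:
            self.first_above = now + INTERVAL_S
            return False
        return now >= self.first_above      # delay persisted a full interval: drop

Note that no link rate appears anywhere in it, which is the point being made.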

 I'm looking forward to some measurements of actual buffering at the device
 driver/device levels.

 I don't know how inbound to the handset is managed via LTE.

 Still quite a few assumptions left to smash in the above.

 ...

 in the home router case

 ...

 When there are artificial rate limits in play (in, for example, a cable
 modem/CMTS hooked up via gigE yet rate limiting to 24Mbit down/4Mbit up), then
 a rate limiter (tbf, htb, hfsc) needs to be applied locally to move that rate
 limiter/queue management into the local device, so we can manage it better.

 I'd like to be rid of the need to use htb and come up with a rate limiter
 that could be adjusted dynamically from a daemon in userspace, probing for
 short-term bandwidth fluctuations while monitoring the load. It needn't send
 much data very often to come up with a stable result.
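
A purely hypothetical sketch of such a daemon, to make the idea concrete.  It
assumes an htb class 1:1 already exists on the WAN interface (called ge00
here) with fq_codel underneath, and uses a one-packet ping as the crudest
possible delay probe; the device name, probe address, and thresholds are all
invented, and this is not gargoyle's code or anything shipping in CeroWrt.

import re, subprocess, time

DEV, CLASSID = "ge00", "1:1"     # assumed WAN device and pre-existing htb class
PROBE_HOST = "192.0.2.1"         # something just beyond the bottleneck link
rate_kbit, MIN_KBIT, MAX_KBIT = 4000, 1000, 20000

def probe_rtt_ms():
    """One ping across the bottleneck; returns None if it was lost."""
    out = subprocess.run(["ping", "-c", "1", "-W", "1", PROBE_HOST],
                         capture_output=True, text=True).stdout
    m = re.search(r"time=([\d.]+) ms", out)
    return float(m.group(1)) if m else None

def set_rate(kbit):
    """Nudge the existing htb class; fq_codel below it keeps the queue short."""
    subprocess.run(["tc", "class", "change", "dev", DEV, "parent", "1:",
                    "classid", CLASSID, "htb",
                    "rate", "%dkbit" % kbit, "ceil", "%dkbit" % kbit],
                   check=True)

while True:
    rtt = probe_rtt_ms()
    if rtt is None or rtt > 60:      # latency blowing up: back the shaper off 10%
        rate_kbit = max(MIN_KBIT, int(rate_kbit * 0.9))
    elif rtt < 25:                   # plenty of headroom: creep back up 2%
        rate_kbit = min(MAX_KBIT, int(rate_kbit * 1.02))
    set_rate(rate_kbit)
    time.sleep(5)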

 You've described one soft-rate sensing scheme (piggybacking on TCP), and
 I've thought up a few others, that could feed back some samples from a daemon
 into a soft(er) rate limiter that would keep control of the queues in the
 home router. I am thinking it's going to take way too long to fix the CPE,
 and it is far easier to fix the home router via this method; certainly it's
 too painful and inaccurate to merely measure the bandwidth once, then set a
 hard rate, when

 So far as I know the gargoyle project was experimenting with this
 approach.

 A problem is in places that connect more than one device to the cable
 modem... then you end up with those needing to communicate their perception
 of the actual bandwidth beyond the 

Re: [Cerowrt-devel] Google working on experimental 3.8 Linux kernel for Android

2013-02-28 Thread Jim Gettys
In short, people who build hardware devices, or device drivers, don't
understand TCP.

There is a first-class education failure in all this.

We have yet to find almost any device that isn't bloated; the only question
is how badly.
 - Jim



On Thu, Feb 28, 2013 at 3:58 PM, dpr...@reed.com wrote:

 At least someone actually saw what I've been seeing for years now in Metro
 area HSPA and LTE deployments.



 As you know, when I first reported this on the e2e list I was told it
 could not possibly be happening and that I didn't know what I was talking
 about.  No one in the phone companies was even interested in replicating my
 experiments, just dismissing them.  It was sad.



 However, I had the same experience on the original Honeywell 6180 dual CPU
 Multics deployment in about 1973.  One day all my benchmarks were running
 about 5 times slower every other time I ran the code.  I suggested that one
 of the CPUs was running 5x slower, and it was probably due to the CPU cache
 being turned off.   The hardware engineer on site said that that was
 *impossible*.  After 4 more hours of testing, I was sure I was right.  That
 evening, I got him to take the system down, and we hauled out an
 oscilloscope.  Sure enough, the gate that received the cache hit signal
 had died in one of the processors.   The machine continued to run, since
 all that caused was for memory to be fetched every time, rather than using
 the cache.



 Besides the value of finding the root cause of anomalies, the story
 points out that you really need to understand software and hardware
 sometimes.  The hardware engineer didn't understand the role of a cache,
 even though he fully understood timing margins, TTL logic, core memory
 (yes, this machine used core memory), etc.



 We both understood oscilloscopes, fortunately.



 In some ways this is like the LTE designers understanding TCP.   They
 don't.  But sometimes you need to know about both in some depth.



 Congratulations, Jim.  More Internet Plumbing Merit Badges for you.




Re: [Cerowrt-devel] Google working on experimental 3.8 Linux kernel for Android

2013-02-28 Thread dpreed

It all started when CS departments decided they didn't need EE courses or
affiliation with EE depts., and continued with the idea that digital
communications had nothing to do with the folks who design the gear, so all
you needed to know was the bit layouts of packets in memory to be a "network
expert."
 
You can see this in the curricula at all levels. Cisco certifies network people 
who have never studied control theory, queueing theory, ..., and the phone 
companies certify communications engineers who have never run traceroute or 
ping, much less debugged the performance of a web-based UI.
 
Modularity is great.  But it comes at a cost.  Besides this kind of failure, 
it's the primary cause of security vulnerabilities.
 