[RFC PATCH net-2.6.25 uncompilable] [TCP]: Avoid breaking GSOed skbs when SACKed one-by-one (Was: Re: [RFC] TCP illinois max rtt aging)

2007-12-11 Thread Ilpo Järvinen
On Fri, 7 Dec 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Fri, 7 Dec 2007 15:05:59 +0200 (EET)
> 
> > On Fri, 7 Dec 2007, David Miller wrote:
> > 
> > > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> > > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)
> > > 
> > > > I guess if you get a large cumulative ACK, the amount of processing is 
> > > > still overwhelming (added DaveM if he has some idea how to combat it).
> > > > 
> > > > Even a simple scenario (this isn't anything fancy at all, will occur 
> > > > all 
> > > > the time): Just one loss => rest skbs grow one by one into a single 
> > > > very large SACK block (and we do that efficiently for sure) => then the 
> > > > fast retransmit gets delivered and a cumulative ACK for whole 
> > > > orig_window 
> > > > arrives => clean_rtx_queue has to do a lot of processing. In this case 
> > > > we 
> > > > could optimize RB-tree cleanup away (by just blanking it all) but still 
> > > > getting rid of all those skbs is going to take a larger moment than I'd 
> > > > like to see.
> > > 
> > > Yes, it's the classic problem.  But it ought to be at least
> > > partially masked when TSO is in use, because we'll only process
> > > a handful of SKBs.  The more effectively TSO batches, the
> > > less work clean_rtx_queue() will do.
> > 
> > No, that's not what is going to happen, TSO won't help at all
> > because one-by-one SACKs will fragment every single one of them
> > (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
> > case, or am I missing something?
> 
> You're of course right, and it's ironic that I wrote the SACK
> splitting code so I should have known this :-)
> 
> A possible approach just occurred to me wherein we maintain
> the SACK state external to the SKBs so that we don't need to
> mess with them at all.
> 
> That would allow us to eliminate the TSO splitting but it would
> not remove the general problem of clean_rtx_queue()'s overhead.
> 
> I'll try to give some thought to this over the weekend.

How about this...

...I've left a couple of FIXMEs in there still; they should be quite simple &
straightforward to handle if this seems like a viable solution at all.


Beware, this doesn't even compile yet because not all parameters are 
passed through currently (first_sack_index was killed earlier, I need to 
summon it back for this). Also, I'd need to do the dirty work of killing 
recv_sack_cache first to make this not produce, well, interesting 
effects due to missed SACK blocks... :-)

Applies cleanly only after this:
  [TCP]: Push fack_count calculation deeper into functions


--
 i.

--
[RFC PATCH net-2.6.25 uncompilable] [TCP]: Avoid breaking GSOed skbs when SACKed one-by-one

Because the receiver reports out-of-order segments one by one using
SACK, tcp_fragment may do a lot of unnecessary splits that would
be avoided if the sender could see the upcoming future. Not only
does SACK processing suffer, but clean_rtx_queue also takes a
considerable hit when the corresponding cumulative ACK arrives.

Thus, implement a local cache for a single skb to avoid enormous
splitting effort while the latest SACK block is still growing.
Messily enough, other parts must be made aware of this change as
well because the skb state is a bit fuzzy while we have not yet
marked it in tcp_sacktag_one.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 include/linux/tcp.h   |    2 +
 include/net/tcp.h     |   19 
 net/ipv4/tcp_input.c  |  110 ++--
 net/ipv4/tcp_output.c |    7 +++
 4 files changed, 133 insertions(+), 5 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 56342c3..4fbfa46 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -360,6 +360,8 @@ struct tcp_sock {
 	u32	fackets_out;	/* FACK'd packets			*/
 	u32	high_seq;	/* snd_nxt at onset of congestion	*/
 
+	u32	sack_pending;	/* End seqno of postponed SACK tagging	*/
+
 	u32	retrans_stamp;	/* Timestamp of the last retransmit,
 				 * also used in SYN-SENT to remember stamp of
 				 * the first SYN. */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5e6c433..e2b88e3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -462,6 +462,21 @@ extern void tcp_send_delayed_ack(struct sock *sk);
 
 /* tcp_input.c */
 extern void tcp_cwnd_application_limited(struct sock *sk);
+extern void __tcp_process_postponed_sack(struct sock *sk, struct sk_buff *skb);
+extern struct sk_buff *tcp_find_postponed_skb(struct sock *sk);
+
+static inline void tcp_process_postponed_sack(struct sock *sk)
+{
+   if (tcp_sk(sk)->sack_pending)
+   __tcp_process_postponed_sack(sk, tcp_find_postponed_skb(sk));
+}
+
+static inline void tcp_process_postponed_sack_overlapping(struct sock *sk,
+ struct sk_buff *skb)
+{
+   if (tcp_sk(s

Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 15:05:59 +0200 (EET)

> On Fri, 7 Dec 2007, David Miller wrote:
> 
> > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)
> > 
> > > I guess if you get a large cumulative ACK, the amount of processing is 
> > > still overwhelming (added DaveM if he has some idea how to combat it).
> > > 
> > > Even a simple scenario (this isn't anything fancy at all, will occur all 
> > > the time): Just one loss => rest skbs grow one by one into a single 
> > > very large SACK block (and we do that efficiently for sure) => then the 
> > > fast retransmit gets delivered and a cumulative ACK for whole orig_window 
> > > arrives => clean_rtx_queue has to do a lot of processing. In this case we 
> > > could optimize RB-tree cleanup away (by just blanking it all) but still 
> > > getting rid of all those skbs is going to take a larger moment than I'd 
> > > like to see.
> > > 
> > > That tree blanking could be extended to cover anything which ACK more 
> > > than 
> > > half of the tree by just replacing the root (and dealing with potential 
> > > recolorization of the root).
> > 
> > Yes, it's the classic problem.  But it ought to be at least
> > partially masked when TSO is in use, because we'll only process
> > a handful of SKBs.  The more effectively TSO batches, the
> > less work clean_rtx_queue() will do.
> 
> No, that's not what is going to happen, TSO won't help at all
> because one-by-one SACKs will fragment every single one of them
> (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
> case, or am I missing something?

You're of course right, and it's ironic that I wrote the SACK
splitting code so I should have known this :-)

A possible approach just occurred to me wherein we maintain
the SACK state external to the SKBs so that we don't need to
mess with them at all.

That would allow us to eliminate the TSO splitting but it would
not remove the general problem of clean_rtx_queue()'s overhead.

I'll try to give some thought to this over the weekend.
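
To make the idea concrete, here is a minimal, self-contained sketch of SACK
state kept outside the skbs; the names (sack_scoreboard, scoreboard_mark) are
made up and this is not a proposal for the actual kernel structures, just an
illustration of marking a range without touching or splitting any skb:

#include <stdint.h>

#define MAX_SACK_RANGES 16

struct sack_range {
	uint32_t start_seq;
	uint32_t end_seq;		/* exclusive */
};

struct sack_scoreboard {
	struct sack_range range[MAX_SACK_RANGES];
	int nr;
};

/* Record [start, end) as SACKed, merging with one overlapping or adjacent
 * range for brevity; the retransmit queue skbs are never modified, so a
 * block that grows one-by-one just widens an existing entry. */
static void scoreboard_mark(struct sack_scoreboard *sb,
			    uint32_t start, uint32_t end)
{
	int i;

	for (i = 0; i < sb->nr; i++) {
		struct sack_range *r = &sb->range[i];

		/* sequence-space comparison: overlap or touch? */
		if ((int32_t)(start - r->end_seq) <= 0 &&
		    (int32_t)(r->start_seq - end) <= 0) {
			if ((int32_t)(start - r->start_seq) < 0)
				r->start_seq = start;
			if ((int32_t)(end - r->end_seq) > 0)
				r->end_seq = end;
			return;
		}
	}
	if (sb->nr < MAX_SACK_RANGES) {
		sb->range[sb->nr].start_seq = start;
		sb->range[sb->nr].end_seq = end;
		sb->nr++;
	}
}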


Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread Ilpo Järvinen
On Fri, 7 Dec 2007, Ilpo Järvinen wrote:

> On Fri, 7 Dec 2007, David Miller wrote:
> 
> > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)
> > 
> > > I guess if you get a large cumulative ACK, the amount of processing is 
> > > still overwhelming (added DaveM if he has some idea how to combat it).
> > > 
> > > Even a simple scenario (this isn't anything fancy at all, will occur all 
> > > the time): Just one loss => rest skbs grow one by one into a single 
> > > very large SACK block (and we do that efficiently for sure) => then the 
> > > fast retransmit gets delivered and a cumulative ACK for whole orig_window 
> > > arrives => clean_rtx_queue has to do a lot of processing. In this case we 
> > > could optimize RB-tree cleanup away (by just blanking it all) but still 
> > > getting rid of all those skbs is going to take a larger moment than I'd 
> > > like to see.
> > > 
> > > That tree blanking could be extended to cover anything which ACK more 
> > > than 
> > > half of the tree by just replacing the root (and dealing with potential 
> > > recolorization of the root).
> > 
> > Yes, it's the classic problem.  But it ought to be at least
> > partially masked when TSO is in use, because we'll only process
> > a handful of SKBs.  The more effectively TSO batches, the
> > less work clean_rtx_queue() will do.
> 
> No, that's not what is going to happen, TSO won't help at all
> because one-by-one SACKs will fragment every single one of them
> (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
> case, or am I missing something?

Hmm... this could be solved, though, by postponing the fragmentation of a 
partially SACKed skb when the first SACK block can (is likely to) still 
grow and remove the need for fragmentation. This has some implications for 
packet processing, increases burstiness a bit & tcp_max_burst kicks in too 
easily.
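
As a rough sketch of that postponement check (hypothetical helper, not the
patch itself; standard TCP sequence-number wraparound comparison assumed):
if the newest, still-growing SACK block covers the head of an skb but not
all of it, skip tcp_fragment() and only remember how far the skb is SACKed.

#include <stdbool.h>
#include <stdint.h>

struct seg {
	uint32_t seq;		/* first sequence number in the segment */
	uint32_t end_seq;	/* one past the last sequence number */
};

static inline bool after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Returns true when tagging of @skb should be postponed: the growing SACK
 * block [start_seq, end_seq) covers the start of the skb but ends inside
 * it.  *sack_pending records where tagging must resume once the block
 * stops growing (next cumulative ACK or a different SACK block). */
static bool postpone_partial_sack(const struct seg *skb,
				  uint32_t start_seq, uint32_t end_seq,
				  bool block_is_growing,
				  uint32_t *sack_pending)
{
	bool partial = !after(start_seq, skb->seq) &&
		       after(skb->end_seq, end_seq) &&
		       after(end_seq, skb->seq);

	if (partial && block_is_growing) {
		*sack_pending = end_seq;	/* remember the covered part */
		return true;			/* no tcp_fragment() for now */
	}
	return false;
}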

-- 
 i.

Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread Ilpo Järvinen
On Fri, 7 Dec 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
> Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)
> 
> > I guess if you get a large cumulative ACK, the amount of processing is 
> > still overwhelming (added DaveM if he has some idea how to combat it).
> > 
> > Even a simple scenario (this isn't anything fancy at all, will occur all 
> > the time): Just one loss => rest skbs grow one by one into a single 
> > very large SACK block (and we do that efficiently for sure) => then the 
> > fast retransmit gets delivered and a cumulative ACK for whole orig_window 
> > arrives => clean_rtx_queue has to do a lot of processing. In this case we 
> > could optimize RB-tree cleanup away (by just blanking it all) but still 
> > getting rid of all those skbs is going to take a larger moment than I'd 
> > like to see.
> > 
> > That tree blanking could be extended to cover anything which ACK more than 
> > half of the tree by just replacing the root (and dealing with potential 
> > recolorization of the root).
> 
> Yes, it's the classic problem.  But it ought to be at least
> partially masked when TSO is in use, because we'll only process
> a handful of SKBs.  The more effectively TSO batches, the
> less work clean_rtx_queue() will do.

No, that's not what is going to happen, TSO won't help at all
because one-by-one SACKs will fragment every single one of them
(see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
case, or am I missing something?

> Web100 just provides statistics and other kinds of connection data
> to userspace, all the actual algorithm etc. modifications have been
> merged upstream and yanked out of the web100 patch.  I was looking
> at it the other night and it's frankly totally uninteresting these
> days :-)

...Thanks, I'll keep that in mind when looking... :-)


-- 
 i.

Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

> I guess if you get a large cumulative ACK, the amount of processing is 
> still overwhelming (added DaveM if he has some idea how to combat it).
> 
> Even a simple scenario (this isn't anything fancy at all, will occur all 
> the time): Just one loss => rest skbs grow one by one into a single 
> very large SACK block (and we do that efficiently for sure) => then the 
> fast retransmit gets delivered and a cumulative ACK for whole orig_window 
> arrives => clean_rtx_queue has to do a lot of processing. In this case we 
> could optimize RB-tree cleanup away (by just blanking it all) but still 
> getting rid of all those skbs is going to take a larger moment than I'd 
> like to see.
> 
> That tree blanking could be extended to cover anything which ACK more than 
> half of the tree by just replacing the root (and dealing with potential 
> recolorization of the root).

Yes, it's the classic problem.  But it ought to be at least
partially masked when TSO is in use, because we'll only process
a handful of SKBs.  The more effectively TSO batches, the
less work clean_rtx_queue() will do.

When not doing TSO the behavior is super-stupid: we bump reference
counts on the same page multiple times while running over the SKBs,
since consecutive SKBs cover data in different spans of the same
page.

The core issue is that we have a poorly behaving data container,
and therefore that's obviously what we need to change.

Conceptually what we probably need to do is separate the data
maintenance from the SKB objects themselves.  There is a blob
that maintains the paged data state for everything in the
retransmit queue.  SKBs are built and get the page pointers
but don't actually grab references to the pages; the blob
does that, and it keeps track of how many SKB references to each
page there are, non-atomically.
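
A self-contained sketch of that kind of blob, with made-up names
(rtxq_page_state, rtxq_track_page) and struct page treated as opaque; it only
illustrates the non-atomic per-queue bookkeeping, not any real kernel
interface:

#include <stddef.h>
#include <stdlib.h>

struct page;				/* opaque page handle */

struct rtxq_page_ref {
	struct page *page;		/* page shared by several segments */
	unsigned int skb_refs;		/* non-atomic per-queue ref count */
};

struct rtxq_page_state {
	struct rtxq_page_ref *refs;
	size_t nr, cap;
};

/* A segment entering the retransmit queue notes that it uses @page; the
 * blob holds the single real page reference, so no atomic refcount bump
 * per skb is needed. */
static int rtxq_track_page(struct rtxq_page_state *st, struct page *page)
{
	size_t i;

	for (i = 0; i < st->nr; i++) {
		if (st->refs[i].page == page) {
			st->refs[i].skb_refs++;
			return 0;
		}
	}
	if (st->nr == st->cap) {
		size_t cap = st->cap ? 2 * st->cap : 16;
		void *p = realloc(st->refs, cap * sizeof(*st->refs));

		if (!p)
			return -1;
		st->refs = p;
		st->cap = cap;
	}
	st->refs[st->nr].page = page;
	st->refs[st->nr].skb_refs = 1;
	st->nr++;
	return 0;
}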

The hardest part is dealing with the page lifetime issues.
Unfortunately, when we trim the rtx queue, references to the clones
can still exist in the driver output path.  It's a difficult problem
to overcome in fact, so in the end my suggestion above might not
even be workable.

> No idea about what it could do, haven't yet looked web100, I was planning 
> at some point of time...

Web100 just provides statistics and other kinds of connection data
to userspace, all the actual algorithm etc. modifications have been
merged upstream and yanked out of the web100 patch.  I was looking
at it the other night and it's frankly totally uninteresting these
days :-)


Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread Ilpo Järvinen
On Thu, 6 Dec 2007, Lachlan Andrew wrote:
> On 04/12/2007, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > On Mon, 3 Dec 2007, Lachlan Andrew wrote:
> > >
> > > When SACK is active, the per-packet processing becomes more involved,
> > > tracking the list of lost/SACKed packets.  This causes a CPU spike
> > > just after a loss, which increases the RTTs, at least in my
> > > experience.
> >
> > I suspect that as long as old code was able to use hint, it wasn't doing
> > that bad. But it was seriously lacking ability to take advantage of sack
> > processing hint when e.g., a new hole appeared, or cumulative ACK arrived.
> >
> > ...Code available in net-2.6.25 might cure those.
> 
> We had been using one of your earlier patches, and still had the
> problem.  I think you've cured the problem with SACK itself, but there
> still seems to be something taking a lot of CPU while recovering from
> the loss. 

I guess if you get a large cumulative ACK, the amount of processing is 
still overwhelming (added DaveM in case he has some idea how to combat it).

Even a simple scenario (this isn't anything fancy at all, it will occur all 
the time): just one loss => the rest of the skbs grow one by one into a single 
very large SACK block (and we do that efficiently for sure) => then the 
fast retransmit gets delivered and a cumulative ACK for the whole orig_window 
arrives => clean_rtx_queue has to do a lot of processing. In this case we 
could optimize the RB-tree cleanup away (by just blanking it all), but still, 
getting rid of all those skbs is going to take a larger moment than I'd 
like to see.

That tree blanking could be extended to cover anything which ACKs more than 
half of the tree, by just replacing the root (and dealing with potential 
recolorization of the root).
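
For what the "blanking" could look like (sketch only: struct rtx_node and an
rbtree-backed retransmit queue are hypothetical here, the rbtree queue being a
net-2.6.25 experiment; rb_first/rb_erase/RB_ROOT are the stock
<linux/rbtree.h> API):

#include <linux/rbtree.h>
#include <linux/skbuff.h>
#include <linux/slab.h>

struct rtx_node {
	struct rb_node	rb;
	struct sk_buff	*skb;
};

static void rtx_tree_blank(struct rb_root *root)
{
	struct rb_root doomed = *root;
	struct rb_node *node;

	/* The live tree is emptied in O(1): no per-node rebalancing on the
	 * ACK fast path. */
	*root = RB_ROOT;

	/* Freeing the skbs themselves is still O(n); this is the part that
	 * remains expensive and could perhaps be deferred or batched. */
	while ((node = rb_first(&doomed)) != NULL) {
		struct rtx_node *rn = rb_entry(node, struct rtx_node, rb);

		rb_erase(node, &doomed);
		kfree_skb(rn->skb);
		kfree(rn);
	}
}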

> It is possible that it was to do with  web100  which we
> have also been running, but I cut out most of the statistics from that
> and still had problems.

No idea what it could do; I haven't looked at web100 yet, though I was 
planning to at some point...

-- 
 i.

Re: [RFC] TCP illinois max rtt aging

2007-12-06 Thread Lachlan Andrew
Greetings Ilpo,

On 04/12/2007, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Dec 2007, Lachlan Andrew wrote:
> >
> > When SACK is active, the per-packet processing becomes more involved,
> > tracking the list of lost/SACKed packets.  This causes a CPU spike
> > just after a loss, which increases the RTTs, at least in my
> > experience.
>
> I suspect that as long as old code was able to use hint, it wasn't doing
> that bad. But it was seriously lacking ability to take advantage of sack
> processing hint when e.g., a new hole appeared, or cumulative ACK arrived.
>
> ...Code available in net-2.6.25 might cure those.

We had been using one of your earlier patches, and still had the
problem.  I think you've cured the problem with SACK itself, but there
still seems to be something taking a lot of CPU while recovering from
the loss.  It is possible that it was to do with web100, which we
have also been running, but I cut out most of the statistics from that
and still had problems.

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820    Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan


Re: [RFC] TCP illinois max rtt aging

2007-12-04 Thread Ilpo Järvinen
On Mon, 3 Dec 2007, Lachlan Andrew wrote:

> On 03/12/2007, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > On Mon, 3 Dec 2007 15:59:23 -0800
> > "Shao Liu" <[EMAIL PROTECTED]> wrote:
> > > And also a question, why the samples when SACK is active are outliers?
> >
> > Any sample with SACK is going to mean a loss or reordering has occurred.
> > So shouldn't the SACK values be useful, but RTT values from retransmits
> > are not useful.
> 
> When SACK is active, the per-packet processing becomes more involved,
> tracking the list of lost/SACKed packets.  This causes a CPU spike
> just after a loss, which increases the RTTs, at least in my
> experience.

I suspect that as long as the old code was able to use the hint, it wasn't 
doing that badly. But it was seriously lacking the ability to take advantage 
of the SACK processing hint when, e.g., a new hole appeared or a cumulative 
ACK arrived.

...Code available in net-2.6.25 might cure those.


-- 
 i.


Re: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Lachlan Andrew
Greetings,

On 03/12/2007, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Dec 2007 15:59:23 -0800
> "Shao Liu" <[EMAIL PROTECTED]> wrote:
> > And also a question, why the samples when SACK is active are outliers?
>
> Any sample with SACK is going to mean a loss or reordering has occurred.
> So shouldn't the SACK values be useful, but RTT values from retransmits
> are not useful.

When SACK is active, the per-packet processing becomes more involved,
tracking the list of lost/SACKed packets.  This causes a CPU spike
just after a loss, which increases the RTTs, at least in my
experience.  This is a separate issue from the fact that it is hard to
get RTT measurements from lost/retransmitted packets themselves.

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820    Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan


Re: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Stephen Hemminger
On Mon, 3 Dec 2007 15:59:23 -0800
"Shao Liu" <[EMAIL PROTECTED]> wrote:

> Hi Stephen and Lachlan,
> 
> Thanks for the discussion. The gradual aging is surely an option. And
> another possibility is that, we compute the RTT just before congestion
> notification, which ideally, represent the full queueing delay + propagation
> delay. We can compute the average of the last M such values, and either use
> the average as maxRTT, or use it as a benchmark to judge whether a sample is
> outlier. How do you think of this idea?

The problem with an average like that would be storing enough values
to be useful and choosing how many to store. Perhaps some form of
weighted sliding average which favors recent values heavily would work
best. Remember that RTTs have a huge noise component, and you are
fighting against the long-tail distribution while trying to see the
queue effects.
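
One way to read that (weights and names made up, just a sketch of a
storage-free alternative to keeping the last M samples):

#include <stdint.h>

struct rtt_est {
	uint32_t avg_peak_us;	/* smoothed "RTT just before congestion" */
};

/* Feed in the RTT sampled right before each congestion notification. */
static void rtt_est_update(struct rtt_est *e, uint32_t peak_rtt_us)
{
	if (!e->avg_peak_us) {
		e->avg_peak_us = peak_rtt_us;
		return;
	}
	/* 3/4 old + 1/4 new: recent congestion epochs dominate, older ones
	 * fade out, and no window of M samples has to be stored. */
	e->avg_peak_us -= e->avg_peak_us >> 2;
	e->avg_peak_us += peak_rtt_us >> 2;
}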

> 
> And also a question, why the samples when SACK is active are outliers?

Any sample with SACK is going to mean a loss or reordering has occurred.
So the SACKed values should still be useful, but RTT values from
retransmits are not.

> 
> For the accuracy of time stamping, I am not very familiar with the
> implementation details. But I can think of two ways, 1) do time stamp in as
> low layer as possible; 2) use as high priority thread to do it as possible.
> For 2), we can use separate threads to do time stamp and to process packets.

Right now the resolution is in microseconds using the hardware clock.
The clock usage costs a little bit, but makes the math more accurate.
It would be worth exploring sensitivity by taking out RTT_STAMP from
the flags field and varying HZ.


RE: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Shao Liu
Hi Stephen and Lachlan,

Thanks for the discussion. The gradual aging is surely an option. And
another possibility is that we compute the RTT just before the congestion
notification, which ideally represents the full queueing delay + propagation
delay. We can compute the average of the last M such values, and either use
the average as maxRTT, or use it as a benchmark to judge whether a sample is
an outlier. What do you think of this idea?

And also a question: why are the samples taken while SACK is active outliers? 

For the accuracy of time stamping, I am not very familiar with the
implementation details. But I can think of two ways: 1) do the time stamping
in as low a layer as possible; 2) use as high-priority a thread as possible
to do it. For 2), we can use separate threads to do the time stamping and to
process the packets.

Thanks and I will let you know more of my thoughts after I go over the
entire code space!
-Shao 


Re: [RFC] TCP illinois max rtt aging

2007-12-03 Thread Lachlan Andrew
Greetings Stephen,

Thanks.  We'll have to play with the rate of ageing.  I used the slower ageing

	if (ca->cnt_rtt > 3) {
		u64 mean_rtt = ca->sum_rtt;
		do_div(mean_rtt, ca->cnt_rtt);

		if (ca->max_rtt > mean_rtt)
			ca->max_rtt -= (ca->max_rtt - mean_rtt) >> 9;
	}

and still found that the max_rtt drops considerably within a congestion epoch.

What would also really help would be getting rid of RTT outliers
somehow.  I ignore RTT measurements when SACK is active:

	if (ca->max_rtt < rtt) {
		struct tcp_sock *tp = tcp_sk(sk);
		if (!tp->sacked_out)	// SACKs cause hi-CPU/hi-RTT. ignore
			ca->max_rtt = rtt;
	}
which helps a lot, but still gets some outliers.  Would it be possible
to time-stamp packets in the hardware interrupt handler, instead of
waiting for the post-processing stage?

Cheers,
Lachlan

On 03/12/2007, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Wed, 28 Nov 2007 21:26:12 -0800
> "Shao Liu" <[EMAIL PROTECTED]> wrote:
>
> > Hi Stephen and Lachlan,
> >
> > Thanks for pointing out and fixing this bug.
> >
> > For the max RTT problem, I have considered it also and I have some ideas on
> > improving it. I also have some other places to improve. I will summarize all
> > my new ideas and send you an update. For me to change it, could you please
> > give me a link to download the latest source code for the whole congestion
> > control module in the Linux implementation, including the general module for
> > all algorithms and the implementations of specific algorithms like
> > TCP-Illinois and H-TCP?
> >
> > Thanks for the help!
> > -Shao
> >
> >
> >
> > -Original Message-
> > From: Stephen Hemminger [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, November 28, 2007 4:44 PM
> > To: Lachlan Andrew
> > Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith;
> > Robert Shorten; netdev@vger.kernel.org
> > Subject: Re: [PATCH] tcp-illinois: incorrect beta usage
> >
> > Lachlan Andrew wrote:
> > > Thanks Stephen.
> > >
> > > A related problem (largely due to the published algorithm itself) is
> > > that Illinois is very aggressive when it over-estimates the maximum
> > > RTT.
> > >
> > > At high load (say 200Mbps and 200ms RTT), a backlog of packets builds
> > > up just after a loss, causing the RTT estimate to become large.  This
> > > makes Illinois think that *all* losses are due to corruption not
> > > congestion, and so only back off by 1/8 instead of 1/2.
> > >
> > > I can't think how to fix this except by better RTT estimation, or
> > > changes to Illinois itself.  Currently, I ignore RTT measurements when
> > >sacked_out != 0and have a heuristic "RTT aging" mechanism, but
> > > that's pretty ugly.
> > >
> > > Cheers,
> > > Lachlan
> > >
> > >
> > Ageing the RTT estimates needs to be done anyway.
> > Maybe something can be reused from H-TCP. The two are closely related.
> >
>
> The following adds gradual aging of max RTT.
>
> --- a/net/ipv4/tcp_illinois.c   2007-11-29 08:58:35.0 -0800
> +++ b/net/ipv4/tcp_illinois.c   2007-11-29 09:37:33.0 -0800
> @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk)
> ca->cnt_rtt = 0;
> ca->sum_rtt = 0;
>
> -   /* TODO: age max_rtt? */
> +	/* add slowly fading memory for maxRTT to accommodate routing changes */
> +	if (ca->max_rtt > ca->base_rtt)
> +		ca->max_rtt = ca->base_rtt
> +			+ (((ca->max_rtt - ca->base_rtt) * 31) >> 5);
>  }
>
>  static void tcp_illinois_init(struct sock *sk)
>


-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820    Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan


[RFC] TCP illinois max rtt aging

2007-12-03 Thread Stephen Hemminger
On Wed, 28 Nov 2007 21:26:12 -0800
"Shao Liu" <[EMAIL PROTECTED]> wrote:

> Hi Stephen and Lachlan,
> 
> Thanks for pointing out and fixing this bug.
> 
> For the max RTT problem, I have considered it also and I have some ideas on
> improving it. I also have some other places to improve. I will summarize all
> my new ideas and send you an update. For me to change it, could you please
> give me a link to download the latest source code for the whole congestion
> control module in the Linux implementation, including the general module for
> all algorithms and the implementations of specific algorithms like
> TCP-Illinois and H-TCP? 
> 
> Thanks for the help!
> -Shao
> 
> 
> 
> -Original Message-
> From: Stephen Hemminger [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, November 28, 2007 4:44 PM
> To: Lachlan Andrew
> Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith;
> Robert Shorten; netdev@vger.kernel.org
> Subject: Re: [PATCH] tcp-illinois: incorrect beta usage
> 
> Lachlan Andrew wrote:
> > Thanks Stephen.
> >
> > A related problem (largely due to the published algorithm itself) is
> > that Illinois is very aggressive when it over-estimates the maximum
> > RTT.
> >
> > At high load (say 200Mbps and 200ms RTT), a backlog of packets builds
> > up just after a loss, causing the RTT estimate to become large.  This
> > makes Illinois think that *all* losses are due to corruption not
> > congestion, and so only back off by 1/8 instead of 1/2.
> >
> > I can't think how to fix this except by better RTT estimation, or
> > changes to Illinois itself.  Currently, I ignore RTT measurements when
> >sacked_out != 0and have a heuristic "RTT aging" mechanism, but
> > that's pretty ugly.
> >
> > Cheers,
> > Lachlan
> >
> >   
> Ageing the RTT estimates needs to be done anyway.
> Maybe something can be reused from H-TCP. The two are closely related.
>

The following adds gradual aging of max RTT.

--- a/net/ipv4/tcp_illinois.c   2007-11-29 08:58:35.0 -0800
+++ b/net/ipv4/tcp_illinois.c   2007-11-29 09:37:33.0 -0800
@@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk)
ca->cnt_rtt = 0;
ca->sum_rtt = 0;
 
-   /* TODO: age max_rtt? */
+   /* add slowly fading memory for maxRTT to accommodate routing changes */
+   if (ca->max_rtt > ca->base_rtt)
+   ca->max_rtt = ca->base_rtt
+   + (((ca->max_rtt - ca->base_rtt) * 31) >> 5);
 }
 
 static void tcp_illinois_init(struct sock *sk)
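
As a rough aside on the rate this picks (back-of-the-envelope arithmetic,
not from the thread): each rtt_reset() scales the excess of max_rtt over
base_rtt by 31/32, i.e. after n resets

	max_rtt_n - base_rtt = (31/32)^n * (max_rtt_0 - base_rtt)

so the excess halves after about ln 2 / ln(32/31) ~= 22 resets.  The ">> 9"
variant Lachlan quoted earlier shrinks its gap (to the mean RTT) by only
1/512 per step and needs roughly 355 steps to halve it, which is why it
behaves as the "slower ageing".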