[RFC PATCH net-2.6.25 uncompilable] [TCP]: Avoid breaking GSOed skbs when SACKed one-by-one (Was: Re: [RFC] TCP illinois max rtt aging)
On Fri, 7 Dec 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Fri, 7 Dec 2007 15:05:59 +0200 (EET) > > > On Fri, 7 Dec 2007, David Miller wrote: > > > > > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > > > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET) > > > > > > > I guess if you get a large cumulative ACK, the amount of processing is > > > > still overwhelming (added DaveM if he has some idea how to combat it). > > > > > > > > Even a simple scenario (this isn't anything fancy at all, will occur > > > > all > > > > the time): Just one loss => rest skbs grow one by one into a single > > > > very large SACK block (and we do that efficiently for sure) => then the > > > > fast retransmit gets delivered and a cumulative ACK for whole > > > > orig_window > > > > arrives => clean_rtx_queue has to do a lot of processing. In this case > > > > we > > > > could optimize RB-tree cleanup away (by just blanking it all) but still > > > > getting rid of all those skbs is going to take a larger moment than I'd > > > > like to see. > > > > > > Yes, it's the classic problem. But it ought to be at least > > > partially masked when TSO is in use, because we'll only process > > > a handful of SKBs. The more effectively TSO batches, the > > > less work clean_rtx_queue() will do. > > > > No, that's not what is going to happen, TSO won't help at all > > because one-by-one SACKs will fragment every single one of them > > (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO > > case, or am I missing something? > > You're of course right, and it's ironic that I wrote the SACK > splitting code so I should have known this :-) > > A possible approach just occurred to me wherein we maintain > the SACK state external to the SKBs so that we don't need to > mess with them at all. > > That would allow us to eliminate the TSO splitting but it would > not remove the general problem of clean_rtx_queue()'s overhead. 
> > I'll try to give some thought to this over the weekend. How about this... ...I've left couple of FIXMEs there still, should be quite simple & straightforward to handle them if this seems viable solution at all. Beware, this doesn't even compile yet because not all parameters are transferred currently (first_sack_index was killed earlier, I need to summon it back for this). Also, I'd need to do the dirty work of kill recv_sack_cache first to make this to not produce, well, interesting effects due to missed SACK blocks... :-) Applies cleanly only after this: [TCP]: Push fack_count calculation deeper into functions -- i. -- [RFC PATCH net-2.6.25 uncompilable] [TCP]: Avoid breaking GSOed skbs when SACKed one-by-one Because receiver reports out-of-order segment one-by-one using SACK, the tcp_fragment may do a lot of unnecessary splits that would be avoided if the sender could see the upcoming future. Not only SACK processing suffers but clean_rtx_queue as well is considerable hit when the corresponding cumulative ACK arrives. Thus implement a local cache for a single skb to avoid enormous splitting efforts while the latest SACK block is still growing. Messy enough, other parts must be made aware of this change as well because the skb state is a bit fuzzy while have not yet marked it in tcp_sacktag_one. Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- include/linux/tcp.h |2 + include/net/tcp.h | 19 net/ipv4/tcp_input.c | 110 ++-- net/ipv4/tcp_output.c |7 +++ 4 files changed, 133 insertions(+), 5 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 56342c3..4fbfa46 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -360,6 +360,8 @@ struct tcp_sock { u32 fackets_out;/* FACK'd packets */ u32 high_seq; /* snd_nxt at onset of congestion */ + u32 sack_pending; /* End seqno of postponed SACK tagging */ + u32 retrans_stamp; /* Timestamp of the last retransmit, * also used in SYN-SENT to remember stamp of * the first SYN. 
*/ diff --git a/include/net/tcp.h b/include/net/tcp.h index 5e6c433..e2b88e3 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -462,6 +462,21 @@ extern void tcp_send_delayed_ack(struct sock *sk); /* tcp_input.c */ extern void tcp_cwnd_application_limited(struct sock *sk); +extern void __tcp_process_postponed_sack(struct sock *sk, struct sk_buff *skb); +extern struct sk_buff *tcp_find_postponed_skb(struct sock *sk); + +static inline void tcp_process_postponed_sack(struct sock *sk) +{ + if (tcp_sk(sk)->sack_pending) + __tcp_process_postponed_sack(sk, tcp_find_postponed_skb(sk)); +} + +static inline void tcp_process_postponed_sack_overlapping(struct sock *sk, + struct sk_buff *skb) +{ + if (tcp_sk(s
Re: [RFC] TCP illinois max rtt aging
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> Date: Fri, 7 Dec 2007 15:05:59 +0200 (EET) > On Fri, 7 Dec 2007, David Miller wrote: > > > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET) > > > > > I guess if you get a large cumulative ACK, the amount of processing is > > > still overwhelming (added DaveM if he has some idea how to combat it). > > > > > > Even a simple scenario (this isn't anything fancy at all, will occur all > > > the time): Just one loss => rest skbs grow one by one into a single > > > very large SACK block (and we do that efficiently for sure) => then the > > > fast retransmit gets delivered and a cumulative ACK for whole orig_window > > > arrives => clean_rtx_queue has to do a lot of processing. In this case we > > > could optimize RB-tree cleanup away (by just blanking it all) but still > > > getting rid of all those skbs is going to take a larger moment than I'd > > > like to see. > > > > > > That tree blanking could be extended to cover anything which ACK more > > > than > > > half of the tree by just replacing the root (and dealing with potential > > > recolorization of the root). > > > > Yes, it's the classic problem. But it ought to be at least > > partially masked when TSO is in use, because we'll only process > > a handful of SKBs. The more effectively TSO batches, the > > less work clean_rtx_queue() will do. > > No, that's not what is going to happen, TSO won't help at all > because one-by-one SACKs will fragment every single one of them > (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO > case, or am I missing something? You're of course right, and it's ironic that I wrote the SACK splitting code so I should have known this :-) A possible approach just occurred to me wherein we maintain the SACK state external to the SKBs so that we don't need to mess with them at all. 
That would allow us to eliminate the TSO splitting but it would not remove the general problem of clean_rtx_queue()'s overhead. I'll try to give some thought to this over the weekend. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] TCP illinois max rtt aging
On Fri, 7 Dec 2007, Ilpo Järvinen wrote:

> On Fri, 7 Dec 2007, David Miller wrote:
>
> > From: "Ilpo Järvinen" <[EMAIL PROTECTED]>
> > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)
> >
> > > I guess if you get a large cumulative ACK, the amount of processing is
> > > still overwhelming (added DaveM if he has some idea how to combat it).
> > >
> > > Even a simple scenario (this isn't anything fancy at all, will occur all
> > > the time): Just one loss => rest skbs grow one by one into a single
> > > very large SACK block (and we do that efficiently for sure) => then the
> > > fast retransmit gets delivered and a cumulative ACK for whole orig_window
> > > arrives => clean_rtx_queue has to do a lot of processing. In this case we
> > > could optimize RB-tree cleanup away (by just blanking it all) but still
> > > getting rid of all those skbs is going to take a larger moment than I'd
> > > like to see.
> > >
> > > That tree blanking could be extended to cover anything which ACKs more
> > > than half of the tree by just replacing the root (and dealing with
> > > potential recolorization of the root).
> >
> > Yes, it's the classic problem. But it ought to be at least
> > partially masked when TSO is in use, because we'll only process
> > a handful of SKBs. The more effectively TSO batches, the
> > less work clean_rtx_queue() will do.
>
> No, that's not what is going to happen, TSO won't help at all
> because one-by-one SACKs will fragment every single one of them
> (see tcp_match_skb_to_sack) :-(. ...So we're back in the non-TSO
> case, or am I missing something?

Hmm... this could be solved, though, by postponing the fragmentation of a
partially SACKed skb while the first sack block can (is likely to) still
grow, which would remove the need for fragmentation. It has some
implications for packet processing: it increases burstiness a bit, and
tcp_max_burst kicks in too easily.

-- 
i.
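A minimal userspace model of the postponement idea above (all names here are hypothetical, not from the kernel; the actual RFC patch earlier in this thread instead records a `sack_pending` end seqno in `tcp_sock`): defer calling tcp_fragment() on a partially SACKed skb as long as the newest SACK block ends strictly inside the skb and can still grow toward snd_nxt, because the block may eventually cover the whole skb and no split will ever be needed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for an skb's sequence range; the real code would look at
 * TCP_SKB_CB(skb)->seq / end_seq. */
struct seg {
	uint32_t start_seq;
	uint32_t end_seq;
};

/*
 * Return 1 if splitting this segment can be postponed: the SACK block
 * ends strictly inside the segment, and there is still outstanding data
 * beyond it, so the block may keep growing one-by-one and eventually
 * cover the whole segment (making the split unnecessary).
 */
int sack_split_deferrable(const struct seg *skb, uint32_t sack_end,
			  uint32_t snd_nxt)
{
	if (sack_end <= skb->start_seq || sack_end >= skb->end_seq)
		return 0;		/* no partial overlap: nothing to defer */
	return sack_end < snd_nxt;	/* block can still grow */
}
```

The burstiness concern in the email is visible in this model too: while the split is deferred, the deferred-SACKed part is not yet marked, so a later cumulative event processes it all at once.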
Re: [RFC] TCP illinois max rtt aging
On Fri, 7 Dec 2007, David Miller wrote: > From: "Ilpo_Järvinen" <[EMAIL PROTECTED]> > Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET) > > > I guess if you get a large cumulative ACK, the amount of processing is > > still overwhelming (added DaveM if he has some idea how to combat it). > > > > Even a simple scenario (this isn't anything fancy at all, will occur all > > the time): Just one loss => rest skbs grow one by one into a single > > very large SACK block (and we do that efficiently for sure) => then the > > fast retransmit gets delivered and a cumulative ACK for whole orig_window > > arrives => clean_rtx_queue has to do a lot of processing. In this case we > > could optimize RB-tree cleanup away (by just blanking it all) but still > > getting rid of all those skbs is going to take a larger moment than I'd > > like to see. > > > > That tree blanking could be extended to cover anything which ACK more than > > half of the tree by just replacing the root (and dealing with potential > > recolorization of the root). > > Yes, it's the classic problem. But it ought to be at least > partially masked when TSO is in use, because we'll only process > a handful of SKBs. The more effectively TSO batches, the > less work clean_rtx_queue() will do. No, that's not what is going to happen, TSO won't help at all because one-by-one SACKs will fragment every single one of them (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO case, or am I missing something? > Web100 just provides statistics and other kinds of connection data > to userspace, all the actual algorithm etc. modifications have been > merged upstream and yanked out of the web100 patch. I was looking > at it the other night and it's frankly totally uninteresting these > days :-) ...Thanks, I'll keep that in my mind when looking... :-) -- i.
Re: [RFC] TCP illinois max rtt aging
From: "Ilpo Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

> I guess if you get a large cumulative ACK, the amount of processing is
> still overwhelming (added DaveM if he has some idea how to combat it).
>
> Even a simple scenario (this isn't anything fancy at all, will occur all
> the time): Just one loss => rest skbs grow one by one into a single
> very large SACK block (and we do that efficiently for sure) => then the
> fast retransmit gets delivered and a cumulative ACK for whole orig_window
> arrives => clean_rtx_queue has to do a lot of processing. In this case we
> could optimize RB-tree cleanup away (by just blanking it all) but still
> getting rid of all those skbs is going to take a larger moment than I'd
> like to see.
>
> That tree blanking could be extended to cover anything which ACKs more than
> half of the tree by just replacing the root (and dealing with potential
> recolorization of the root).

Yes, it's the classic problem. But it ought to be at least partially masked
when TSO is in use, because we'll only process a handful of SKBs. The more
effectively TSO batches, the less work clean_rtx_queue() will do.

When not doing TSO the behavior is super-stupid: we bump reference counts
on the same page multiple times while running over the SKBs, since
consecutive SKBs cover data in different spans of the same page.

The core issue is that we have a poorly behaving data container, and
therefore that's obviously what we need to change.

Conceptually, what we probably need to do is separate the data maintenance
from the SKB objects themselves. There is a blob that maintains the paged
data state for everything in the retransmit queue. SKBs are built and get
the page pointers but don't actually grab references to the pages; the blob
does that, and it keeps track of how many SKB references to each page there
are, non-atomically.

The hardest part is dealing with the page lifetime issues.
Unfortunately, when we trim the rtx queue, references to the clones can
still exist in the driver output path. It's a difficult problem to
overcome, in fact, so in the end my suggestion above might not even be
workable.

> No idea about what it could do, haven't yet looked at web100; I was
> planning to at some point of time...

Web100 just provides statistics and other kinds of connection data to
userspace; all the actual algorithm etc. modifications have been merged
upstream and yanked out of the web100 patch. I was looking at it the other
night and it's frankly totally uninteresting these days :-)
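As a rough userspace sketch of the "blob" idea David describes (entirely hypothetical; nothing like this exists in the tree, and the struct and function names are invented): the retransmit queue, not each skb, holds one real reference per page, plus a non-atomic count of how many skbs currently use that page.

```c
#include <assert.h>

#define MAX_PAGES 16

/* Hypothetical per-retransmit-queue bookkeeping: one slot per page,
 * counting skb users non-atomically.  real_refs models the single
 * reference per page the queue would hold against the page allocator. */
struct rtx_page_blob {
	int skb_users[MAX_PAGES];
	int real_refs;		/* pages currently pinned by this queue */
};

/* An skb starts covering data in page 'i': only the first user pins it. */
void blob_get_page(struct rtx_page_blob *b, int i)
{
	if (b->skb_users[i]++ == 0)
		b->real_refs++;
}

/* An skb stops covering page 'i': the last user unpins it. */
void blob_put_page(struct rtx_page_blob *b, int i)
{
	if (--b->skb_users[i] == 0)
		b->real_refs--;
}
```

Three consecutive skbs spanning the same page then cost one page reference instead of three, which is the saving being discussed; the lifetime problem raised above is exactly that a driver may still hold a clone when the blob's count for a page reaches zero.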
Re: [RFC] TCP illinois max rtt aging
On Thu, 6 Dec 2007, Lachlan Andrew wrote: > On 04/12/2007, Ilpo Järvinen <[EMAIL PROTECTED]> wrote: > > On Mon, 3 Dec 2007, Lachlan Andrew wrote: > > > > > > When SACK is active, the per-packet processing becomes more involved, > > > tracking the list of lost/SACKed packets. This causes a CPU spike > > > just after a loss, which increases the RTTs, at least in my > > > experience. > > > > I suspect that as long as old code was able to use hint, it wasn't doing > > that bad. But it was seriously lacking ability to take advantage of sack > > processing hint when e.g., a new hole appeared, or cumulative ACK arrived. > > > > ...Code available in net-2.6.25 might cure those. > > We had been using one of your earlier patches, and still had the > problem. I think you've cured the problem with SACK itself, but there > still seems to be something taking a lot of CPU while recovering from > the loss. I guess if you get a large cumulative ACK, the amount of processing is still overwhelming (added DaveM if he has some idea how to combat it). Even a simple scenario (this isn't anything fancy at all, will occur all the time): Just one loss => rest skbs grow one by one into a single very large SACK block (and we do that efficiently for sure) => then the fast retransmit gets delivered and a cumulative ACK for whole orig_window arrives => clean_rtx_queue has to do a lot of processing. In this case we could optimize RB-tree cleanup away (by just blanking it all) but still getting rid of all those skbs is going to take a larger moment than I'd like to see. That tree blanking could be extended to cover anything which ACK more than half of the tree by just replacing the root (and dealing with potential recolorization of the root). > It is possible that it was to do with web100 which we > have also been running, but I cut out most of the statistics from that > and still had problems. No idea about what it could do, haven't yet looked web100, I was planning at some point of time... -- i.
Re: [RFC] TCP illinois max rtt aging
Greetings Ilpo,

On 04/12/2007, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Dec 2007, Lachlan Andrew wrote:
> >
> > When SACK is active, the per-packet processing becomes more involved,
> > tracking the list of lost/SACKed packets. This causes a CPU spike
> > just after a loss, which increases the RTTs, at least in my
> > experience.
>
> I suspect that as long as the old code was able to use the hint, it wasn't
> doing that badly. But it was seriously lacking the ability to take
> advantage of the sack processing hint when e.g. a new hole appeared, or a
> cumulative ACK arrived.
>
> ...Code available in net-2.6.25 might cure those.

We had been using one of your earlier patches, and still had the problem.
I think you've cured the problem with SACK itself, but there still seems
to be something taking a lot of CPU while recovering from the loss. It is
possible that it was to do with web100, which we have also been running,
but I cut out most of the statistics from that and still had problems.

Cheers,
Lachlan

--
Lachlan Andrew
Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820  Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan
Re: [RFC] TCP illinois max rtt aging
On Mon, 3 Dec 2007, Lachlan Andrew wrote:

> On 03/12/2007, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > On Mon, 3 Dec 2007 15:59:23 -0800
> > "Shao Liu" <[EMAIL PROTECTED]> wrote:
> > > And also a question, why the samples when SACK is active are outliers?
> >
> > Any sample with SACK is going to mean a loss or reordering has occurred.
> > So shouldn't the SACK values be useful, but RTT values from retransmits
> > are not useful.
>
> When SACK is active, the per-packet processing becomes more involved,
> tracking the list of lost/SACKed packets. This causes a CPU spike
> just after a loss, which increases the RTTs, at least in my
> experience.

I suspect that as long as the old code was able to use the hint, it wasn't
doing that badly. But it was seriously lacking the ability to take
advantage of the sack processing hint when e.g. a new hole appeared, or a
cumulative ACK arrived.

...Code available in net-2.6.25 might cure those.

-- 
i.
Re: [RFC] TCP illinois max rtt aging
Greetings,

On 03/12/2007, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Dec 2007 15:59:23 -0800
> "Shao Liu" <[EMAIL PROTECTED]> wrote:
> > And also a question, why the samples when SACK is active are outliers?
>
> Any sample with SACK is going to mean a loss or reordering has occurred.
> So shouldn't the SACK values be useful, but RTT values from retransmits
> are not useful.

When SACK is active, the per-packet processing becomes more involved,
tracking the list of lost/SACKed packets. This causes a CPU spike just
after a loss, which increases the RTTs, at least in my experience. This is
a separate issue from the fact that it is hard to get RTT measurements
from lost/retransmitted packets themselves.

Cheers,
Lachlan

--
Lachlan Andrew
Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820  Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan
Re: [RFC] TCP illinois max rtt aging
On Mon, 3 Dec 2007 15:59:23 -0800
"Shao Liu" <[EMAIL PROTECTED]> wrote:

> Hi Stephen and Lachlan,
>
> Thanks for the discussion. The gradual aging is surely an option. And
> another possibility is that we compute the RTT just before the congestion
> notification, which ideally represents the full queueing delay +
> propagation delay. We can compute the average of the last M such values
> and either use the average as maxRTT, or use it as a benchmark to judge
> whether a sample is an outlier. What do you think of this idea?

The problem with an average like that would be storing enough values to be
useful and choosing how many to store. Perhaps some form of weighted
sliding average which favors recent values heavily would work best.
Remember that RTTs have a huge noise component and you are fighting
against the long-tail distribution trying to see the queue effects.

> And also a question, why are the samples when SACK is active outliers?

Any sample with SACK is going to mean a loss or reordering has occurred.
So the SACK values should still be useful, but RTT values from retransmits
are not.

> For the accuracy of time stamping, I am not very familiar with the
> implementation details. But I can think of two ways: 1) do the time stamp
> in as low a layer as possible; 2) use as high-priority a thread to do it
> as possible. For 2), we can use separate threads to do the time stamp and
> to process packets.

Right now the resolution is in microseconds using the hardware clock. The
clock usage costs a little bit, but makes the math more accurate. It would
be worth exploring sensitivity by taking out RTT_STAMP from the flags
field and varying HZ.
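One way to read "weighted sliding average which favors recent values heavily" as code (the 1/4 weight and the one-sample-per-congestion-epoch policy are my assumptions for illustration, not anything agreed in this thread):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Feed one max-RTT observation per congestion epoch into a fixed-point
 * EWMA: est += (sample - est) / 4.  A heavy weight on the new sample
 * tracks queue build-up quickly while old outliers fade within a few
 * epochs, without storing the last M values Shao's scheme would need.
 */
uint32_t maxrtt_ewma(uint32_t est, uint32_t sample)
{
	/* avoid signed arithmetic: shift the magnitude of the error */
	if (sample >= est)
		est += (sample - est) >> 2;
	else
		est -= (est - sample) >> 2;
	return est;
}
```

With RTTs in microseconds, a single 200 ms outlier moves a 100 ms estimate up by only 25 ms, and three quiet epochs later most of that bump has decayed again.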
RE: [RFC] TCP illinois max rtt aging
Hi Stephen and Lachlan,

Thanks for the discussion. The gradual aging is surely an option. And
another possibility is that we compute the RTT just before the congestion
notification, which ideally represents the full queueing delay +
propagation delay. We can compute the average of the last M such values
and either use the average as maxRTT, or use it as a benchmark to judge
whether a sample is an outlier. What do you think of this idea?

And also a question, why are the samples when SACK is active outliers?

For the accuracy of time stamping, I am not very familiar with the
implementation details. But I can think of two ways: 1) do the time stamp
in as low a layer as possible; 2) use as high-priority a thread to do it
as possible. For 2), we can use separate threads to do the time stamp and
to process packets.

Thanks, and I will let you know more of my thoughts after I go over the
entire code space!

-Shao

-----Original Message-----
From: Lachlan Andrew [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 03, 2007 3:06 PM
To: Stephen Hemminger
Cc: [EMAIL PROTECTED]; David S. Miller; Herbert Xu; Douglas Leith;
Robert Shorten; netdev@vger.kernel.org
Subject: Re: [RFC] TCP illinois max rtt aging

Greetings Stephen,

Thanks. We'll have to play with the rate of ageing. I used the slower
ageing

	if (ca->cnt_rtt > 3) {
		u64 mean_rtt = ca->sum_rtt;

		do_div(mean_rtt, ca->cnt_rtt);
		if (ca->max_rtt > mean_rtt)
			ca->max_rtt -= (ca->max_rtt - mean_rtt) >> 9;
	}

and still found that the max_rtt drops considerably within a congestion
epoch.

What would also really help would be getting rid of RTT outliers somehow.
I ignore RTT measurements when SACK is active:

	if (ca->max_rtt < rtt) {
		struct tcp_sock *tp = tcp_sk(sk);

		if (!tp->sacked_out)	/* SACKs cause hi-CPU/hi-RTT; ignore */
			ca->max_rtt = rtt;
	}

which helps a lot, but still gets some outliers. Would it be possible to
time-stamp packets in the hardware interrupt handler, instead of waiting
for the post-processing stage?
Cheers, Lachlan On 03/12/2007, Stephen Hemminger <[EMAIL PROTECTED]> wrote: > On Wed, 28 Nov 2007 21:26:12 -0800 > "Shao Liu" <[EMAIL PROTECTED]> wrote: > > > Hi Stephen and Lachlan, > > > > Thanks for pointing out and fixing this bug. > > > > For the max RTT problem, I have considered it also and I have some idea on > > improve it. I also have some other places to improve. I will summarize all > > my new ideas and send you an update. For me to change it, could you please > > give me a link to download to latest source codes for the whole congestion > > control module in Linux implementation, including the general module for all > > algorithms, and the implementation for specific algorithms like TCP-Illinois > > and H-TCP? > > > > Thanks for the help! > > -Shao > > > > > > > > -Original Message- > > From: Stephen Hemminger [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, November 28, 2007 4:44 PM > > To: Lachlan Andrew > > Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith; > > Robert Shorten; netdev@vger.kernel.org > > Subject: Re: [PATCH] tcp-illinois: incorrect beta usage > > > > Lachlan Andrew wrote: > > > Thanks Stephen. > > > > > > A related problem (largely due to the published algorithm itself) is > > > that Illinois is very aggressive when it over-estimates the maximum > > > RTT. > > > > > > At high load (say 200Mbps and 200ms RTT), a backlog of packets builds > > > up just after a loss, causing the RTT estimate to become large. This > > > makes Illinois think that *all* losses are due to corruption not > > > congestion, and so only back off by 1/8 instead of 1/2. > > > > > > I can't think how to fix this except by better RTT estimation, or > > > changes to Illinois itself. Currently, I ignore RTT measurements when > > >sacked_out != 0and have a heuristic "RTT aging" mechanism, but > > > that's pretty ugly. > > > > > > Cheers, > > > Lachlan > > > > > > > > Ageing the RTT estimates needs to be done anyway. 
> > Maybe something can be reused from H-TCP. The two are closely related. > > > > The following adds gradual aging of max RTT. > > --- a/net/ipv4/tcp_illinois.c 2007-11-29 08:58:35.0 -0800 > +++ b/net/ipv4/tcp_illinois.c 2007-11-29 09:37:33.0 -0800 > @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk) > ca->cnt_rtt = 0; > ca->sum_rtt = 0; > > - /* TOD
Re: [RFC] TCP illinois max rtt aging
Greetings Stephen,

Thanks. We'll have to play with the rate of ageing. I used the slower
ageing

	if (ca->cnt_rtt > 3) {
		u64 mean_rtt = ca->sum_rtt;

		do_div(mean_rtt, ca->cnt_rtt);
		if (ca->max_rtt > mean_rtt)
			ca->max_rtt -= (ca->max_rtt - mean_rtt) >> 9;
	}

and still found that the max_rtt drops considerably within a congestion
epoch.

What would also really help would be getting rid of RTT outliers somehow.
I ignore RTT measurements when SACK is active:

	if (ca->max_rtt < rtt) {
		struct tcp_sock *tp = tcp_sk(sk);

		if (!tp->sacked_out)	/* SACKs cause hi-CPU/hi-RTT; ignore */
			ca->max_rtt = rtt;
	}

which helps a lot, but still gets some outliers. Would it be possible to
time-stamp packets in the hardware interrupt handler, instead of waiting
for the post-processing stage?

Cheers,
Lachlan

On 03/12/2007, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Wed, 28 Nov 2007 21:26:12 -0800
> "Shao Liu" <[EMAIL PROTECTED]> wrote:
>
> > Hi Stephen and Lachlan,
> >
> > Thanks for pointing out and fixing this bug.
> >
> > For the max RTT problem, I have considered it also and I have some
> > ideas on improving it. I also have some other places to improve. I will
> > summarize all my new ideas and send you an update. For me to change it,
> > could you please give me a link to download the latest source code for
> > the whole congestion control module in the Linux implementation,
> > including the general module for all algorithms, and the implementation
> > for specific algorithms like TCP-Illinois and H-TCP?
> >
> > Thanks for the help!
> > -Shao
> >
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, November 28, 2007 4:44 PM
> > To: Lachlan Andrew
> > Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith;
> > Robert Shorten; netdev@vger.kernel.org
> > Subject: Re: [PATCH] tcp-illinois: incorrect beta usage
> >
> > Lachlan Andrew wrote:
> > > Thanks Stephen.
> > >
> > > A related problem (largely due to the published algorithm itself) is
> > > that Illinois is very aggressive when it over-estimates the maximum
> > > RTT.
> > >
> > > At high load (say 200Mbps and 200ms RTT), a backlog of packets builds
> > > up just after a loss, causing the RTT estimate to become large. This
> > > makes Illinois think that *all* losses are due to corruption not
> > > congestion, and so only back off by 1/8 instead of 1/2.
> > >
> > > I can't think how to fix this except by better RTT estimation, or
> > > changes to Illinois itself. Currently, I ignore RTT measurements when
> > > sacked_out != 0 and have a heuristic "RTT aging" mechanism, but
> > > that's pretty ugly.
> > >
> > > Cheers,
> > > Lachlan
> >
> > Ageing the RTT estimates needs to be done anyway.
> > Maybe something can be reused from H-TCP. The two are closely related.
>
> The following adds gradual aging of max RTT.

> --- a/net/ipv4/tcp_illinois.c	2007-11-29 08:58:35.0 -0800
> +++ b/net/ipv4/tcp_illinois.c	2007-11-29 09:37:33.0 -0800
> @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk)
> 	ca->cnt_rtt = 0;
> 	ca->sum_rtt = 0;
>
> -	/* TODO: age max_rtt? */
> +	/* add slowly fading memory for maxRTT to accommodate routing changes */
> +	if (ca->max_rtt > ca->base_rtt)
> +		ca->max_rtt = ca->base_rtt
> +			+ (((ca->max_rtt - ca->base_rtt) * 31) >> 5);
>  }
>
>  static void tcp_illinois_init(struct sock *sk)

--
Lachlan Andrew
Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820  Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan
[RFC] TCP illinois max rtt aging
On Wed, 28 Nov 2007 21:26:12 -0800 "Shao Liu" <[EMAIL PROTECTED]> wrote: > Hi Stephen and Lachlan, > > Thanks for pointing out and fixing this bug. > > For the max RTT problem, I have considered it also and I have some idea on > improve it. I also have some other places to improve. I will summarize all > my new ideas and send you an update. For me to change it, could you please > give me a link to download to latest source codes for the whole congestion > control module in Linux implementation, including the general module for all > algorithms, and the implementation for specific algorithms like TCP-Illinois > and H-TCP? > > Thanks for the help! > -Shao > > > > -Original Message- > From: Stephen Hemminger [mailto:[EMAIL PROTECTED] > Sent: Wednesday, November 28, 2007 4:44 PM > To: Lachlan Andrew > Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith; > Robert Shorten; netdev@vger.kernel.org > Subject: Re: [PATCH] tcp-illinois: incorrect beta usage > > Lachlan Andrew wrote: > > Thanks Stephen. > > > > A related problem (largely due to the published algorithm itself) is > > that Illinois is very aggressive when it over-estimates the maximum > > RTT. > > > > At high load (say 200Mbps and 200ms RTT), a backlog of packets builds > > up just after a loss, causing the RTT estimate to become large. This > > makes Illinois think that *all* losses are due to corruption not > > congestion, and so only back off by 1/8 instead of 1/2. > > > > I can't think how to fix this except by better RTT estimation, or > > changes to Illinois itself. Currently, I ignore RTT measurements when > >sacked_out != 0and have a heuristic "RTT aging" mechanism, but > > that's pretty ugly. > > > > Cheers, > > Lachlan > > > > > Ageing the RTT estimates needs to be done anyway. > Maybe something can be reused from H-TCP. The two are closely related. > The following adds gradual aging of max RTT. 
--- a/net/ipv4/tcp_illinois.c	2007-11-29 08:58:35.0 -0800
+++ b/net/ipv4/tcp_illinois.c	2007-11-29 09:37:33.0 -0800
@@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk)
 	ca->cnt_rtt = 0;
 	ca->sum_rtt = 0;
 
-	/* TODO: age max_rtt? */
+	/* add slowly fading memory for maxRTT to accommodate routing changes */
+	if (ca->max_rtt > ca->base_rtt)
+		ca->max_rtt = ca->base_rtt
+			+ (((ca->max_rtt - ca->base_rtt) * 31) >> 5);
 }
 
 static void tcp_illinois_init(struct sock *sk)
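For a feel of the decay rate in the hunk above: each rtt_reset() keeps 31/32 of the excess over base_rtt, so the excess halves roughly every 22 resets (0.96875^22 is about 0.5). A standalone check of the same arithmetic, outside the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Same computation as the rtt_reset() hunk above, as a pure function. */
uint32_t fade_max_rtt(uint32_t max_rtt, uint32_t base_rtt)
{
	if (max_rtt > base_rtt)
		max_rtt = base_rtt + (((max_rtt - base_rtt) * 31) >> 5);
	return max_rtt;
}
```

For example, with base_rtt = 100 and max_rtt = 500 (any units), one reset shrinks the 400 of excess to (400 * 31) >> 5 = 387, giving 487; repeated resets converge toward base_rtt without ever dropping below it.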