Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-21 Thread Gavin McCullagh
On Fri, 21 Dec 2007, David Miller wrote:

> When Gavin respins the patch I'll look at it in the context of submitting
> it as a bug fix.  So Gavin please generate the patch against Linus's
> vanilla GIT tree or net-2.6, your choice.

The existing patch was against Linus' linux-2.6.git from a few days ago so
I've updated my tree and regenerated the patch (below).  Is that the right
one?

I'm just checking through the existing CA modules.  I don't see the rtt
used for RTO anywhere.  This is what I gather they're each using rtt for.

tcp_highspeed.c  doesn't implement .pkts_acked
tcp_hybla.c      doesn't implement .pkts_acked
tcp_scalable.c   doesn't implement .pkts_acked
tcp_bic.c        ignores rtt value from .pkts_acked

tcp_lp.c         seems to ignore rtt value from .pkts_acked (despite setting
                 TCP_CONG_RTT_STAMP for high res rtts -- why?)
tcp_vegas.c      uses high res rtt to measure congestion signal, increase,
                 backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_veno.c       uses high res rtt to measure congestion signal, increase,
                 backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_yeah.c       uses high res rtt to measure congestion signal, increase,
                 backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_illinois.c   uses rtt to scale increase, backoff
                 -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt

tcp_htcp.c       uses rtt to scale increase, backoff
tcp_cubic.c      uses rtt to scale increase, backoff
tcp_westwood.c   scales backoff using rtt

So as far as I can tell, timeout handling is never altered via
pkts_acked(), so I guess this fix only affects westwood, htcp and cubic just
now.

I need to re-read properly, but I think the same problem affects the
microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
yeah, illinois).  I might follow up with another patch which changes the
behaviour where TCP_CONG_RTT_STAMP is set, once I'm more sure of that.

Thanks,
Gavin


Signed-off-by: Gavin McCullagh [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 889c893..6fb7989 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 	u32 cnt = 0;
 	u32 reord = tp->packets_out;
 	s32 seq_rtt = -1;
+	s32 ca_seq_rtt = -1;
 	ktime_t last_ackt = net_invalid_timestamp();
 
 	while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
@@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 				if (sacked & TCPCB_SACKED_RETRANS)
 					tp->retrans_out -= packets_acked;
 				flag |= FLAG_RETRANS_DATA_ACKED;
+				ca_seq_rtt = -1;
 				seq_rtt = -1;
 				if ((flag & FLAG_DATA_ACKED) ||
 				    (packets_acked > 1))
 					flag |= FLAG_NONHEAD_RETRANS_ACKED;
 			} else {
+				ca_seq_rtt = now - scb->when;
 				if (seq_rtt < 0) {
-					seq_rtt = now - scb->when;
+					seq_rtt = ca_seq_rtt;
 					if (fully_acked)
 						last_ackt = skb->tstamp;
 				}
@@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 			    !before(end_seq, tp->snd_up))
 				tp->urg_mode = 0;
 		} else {
+			ca_seq_rtt = now - scb->when;
 			if (seq_rtt < 0) {
-				seq_rtt = now - scb->when;
+				seq_rtt = ca_seq_rtt;
 				if (fully_acked)
 					last_ackt = skb->tstamp;
 			}
@@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 						 net_invalid_timestamp()))
 					rtt_us = ktime_us_delta(ktime_get_real(),
 								last_ackt);
-				else if (seq_rtt > 0)
-					rtt_us = jiffies_to_usecs(seq_rtt);
+				else if (ca_seq_rtt > 0)
+					rtt_us = jiffies_to_usecs(ca_seq_rtt);
 			}
 
 			ca_ops->pkts_acked(sk, pkts_acked, rtt_us);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-19 Thread Gavin McCullagh
Hi,

On Tue, 18 Dec 2007, Gavin McCullagh wrote:

> The last attempt didn't take account of the situation where a timestamp
> wasn't available and tcp_clean_rtx_queue() has to feed both the RTO and the
> congestion avoidance.  This updated patch stores both RTTs, making the
> delayed one available for the RTO and the other (ca_seq_rtt) available for
> congestion control.

I forgot to include some data to show the difference this can make to the
RTT signal:

http://www.hamilton.ie/gavinmc/linux/tcp_clean_rtx_queue.html

Gavin



Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-19 Thread Gavin McCullagh
Hi,

On Wed, 19 Dec 2007, Ilpo Järvinen wrote:

> Isn't it also much better this way in a case where ACK losses happened?
> Taking the longest RTT in that case is clearly questionable as it
> may over-estimate considerably.

Quite so.

> However, another thing to consider is the possibility of this value being
> used in timeout-like fashion in ca modules (I haven't read enough ca
> module code to know if any of them does that), as opposed to just
> determining rtt or a packet's delay, in which case this change seems
> appropriate (most modules do the latter).

I'm not aware of any, but I haven't read them all either.  I would have
thought tp->srtt was the value to use in this instance, but perhaps the
individual timestamps including delack delay are useful.

> Therefore, if a timeout-like module exists one should also add
> TCP_CONG_RTT_STAMP_LONGEST for that particular module and keep using
> seq_rtt for it like previously and use ca_seq_rtt only for others.

Seems reasonable.  I'll add this.

> This part doesn't exist anymore in the development tree. Please base this
> patch (and anything in future) you intend to get included in mainline
> onto net-2.6.25 unless there's a very good reason not to do so, or on
> whatever 2.6.xx is the correct net development tree at that time (if
> one exists). Thanks.

Will do.   I gather I should use the latest net- tree in future when
submitting patches.

Thanks for the helpful comments,

Gavin



Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-18 Thread Gavin McCullagh

The last attempt didn't take account of the situation where a timestamp
wasn't available and tcp_clean_rtx_queue() has to feed both the RTO and the
congestion avoidance.  This updated patch stores both RTTs, making the
delayed one available for the RTO and the other (ca_seq_rtt) available for
congestion control.
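
A minimal sketch of the two-RTT bookkeeping described above, written as plain user-space C rather than kernel code.  The struct and function names (acked_seg, rtt_pair, collect_rtts) are hypothetical and only stand in for the per-skb state that tcp_clean_rtx_queue() walks; the logic mirrors the patch: the first sample is kept for the RTO, the last sample is kept for congestion control, and a retransmitted segment invalidates both.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one newly ACKed segment. */
struct acked_seg {
	unsigned int when;	/* send time, in ticks */
	int retransmitted;	/* non-zero if the segment was ever retransmitted */
};

struct rtt_pair {
	int seq_rtt;	/* first (oldest) sample, includes delack delay: for RTO */
	int ca_seq_rtt;	/* last (newest) sample: for congestion control */
};

/* Walk the segments covered by one ACK, oldest first, and collect both RTTs. */
static struct rtt_pair collect_rtts(const struct acked_seg *segs, size_t n,
				    unsigned int now)
{
	struct rtt_pair r = { -1, -1 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (segs[i].retransmitted) {
			/* Ambiguous sample: discard what has been collected. */
			r.seq_rtt = -1;
			r.ca_seq_rtt = -1;
			continue;
		}
		r.ca_seq_rtt = (int)(now - segs[i].when);	/* always overwrite */
		if (r.seq_rtt < 0)
			r.seq_rtt = r.ca_seq_rtt;		/* keep only the first */
	}
	return r;
}
```

For two segments sent at ticks 100 and 130 and ACKed together at tick 230, this yields seq_rtt = 130 and ca_seq_rtt = 100.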


Signed-off-by: Gavin McCullagh [EMAIL PROTECTED] 


diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 889c893..6fb7989 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 	u32 cnt = 0;
 	u32 reord = tp->packets_out;
 	s32 seq_rtt = -1;
+	s32 ca_seq_rtt = -1;
 	ktime_t last_ackt = net_invalid_timestamp();
 
 	while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
@@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 				if (sacked & TCPCB_SACKED_RETRANS)
 					tp->retrans_out -= packets_acked;
 				flag |= FLAG_RETRANS_DATA_ACKED;
+				ca_seq_rtt = -1;
 				seq_rtt = -1;
 				if ((flag & FLAG_DATA_ACKED) ||
 				    (packets_acked > 1))
 					flag |= FLAG_NONHEAD_RETRANS_ACKED;
 			} else {
+				ca_seq_rtt = now - scb->when;
 				if (seq_rtt < 0) {
-					seq_rtt = now - scb->when;
+					seq_rtt = ca_seq_rtt;
 					if (fully_acked)
 						last_ackt = skb->tstamp;
 				}
@@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 			    !before(end_seq, tp->snd_up))
 				tp->urg_mode = 0;
 		} else {
+			ca_seq_rtt = now - scb->when;
 			if (seq_rtt < 0) {
-				seq_rtt = now - scb->when;
+				seq_rtt = ca_seq_rtt;
 				if (fully_acked)
 					last_ackt = skb->tstamp;
 			}
@@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 						 net_invalid_timestamp()))
 					rtt_us = ktime_us_delta(ktime_get_real(),
 								last_ackt);
-				else if (seq_rtt > 0)
-					rtt_us = jiffies_to_usecs(seq_rtt);
+				else if (ca_seq_rtt > 0)
+					rtt_us = jiffies_to_usecs(ca_seq_rtt);
 			}
 
 			ca_ops->pkts_acked(sk, pkts_acked, rtt_us);




[PATCH/RFC] TCP: use non-delayed ACK for congestion control RTT

2007-12-17 Thread Gavin McCullagh

When a delayed ACK representing two packets arrives, there are two RTT
samples available, one for each packet.  The first (in order of seq number)
will be artificially long due to the delay waiting for the second packet;
the second triggers the ACK and so is not itself delayed.

According to RFC 1323, the SRTT used for RTO calculation should use the
first rtt, so receivers echo the timestamp from the first packet in the
delayed ack.  For congestion control however, it seems measuring delayed
ack delay is not desirable as it varies independently of congestion.

The patch below causes seq_rtt to be updated with any available later
packet rtts which should have less (and hopefully zero) delack delay.  The
lower seq_rtt then gets passed to ca_ops->pkts_acked().

For non-delay based congestion control (cubic, h-tcp), rtt is sometimes
used for rtt-scaling.  In shortening the RTT, this may make them a little
less aggressive.  Delay-based schemes (eg vegas, illinois) should get a
considerably cleaner, more accurate congestion signal, particularly for
small cwnds. The congestion control module can potentially also filter out
bad RTTs due to the delayed ack alarm by looking at the associated cnt
which (where delayed acking is in use) should probably be 1 if the alarm
went off or greater if the ACK was triggered by a packet.

I seem to be undoing a design decision here so perhaps there is some reason
this should not be done?  Comments/explanations appreciated...
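
To make the two samples concrete, here is a small sketch in plain C under assumed numbers.  The struct seg_times and the helper names are hypothetical, and the timings (50 ms each way, segments sent 30 ms apart) are invented for illustration, not measured.  It only shows that the sample for the first segment includes the time the receiver spent waiting for the second one.

```c
#include <stdio.h>

/* Hypothetical timeline for one segment, all times in milliseconds. */
struct seg_times {
	int sent_ms;	/* sender transmitted the segment */
	int arrived_ms;	/* segment reached the receiver */
};

/* RTT sample the sender computes for a segment when the covering ACK arrives. */
static int rtt_sample_ms(struct seg_times seg, int ack_received_ms)
{
	return ack_received_ms - seg.sent_ms;
}

/*
 * Arrival time at the sender of a delayed ACK covering two segments,
 * assuming the receiver ACKs as soon as the second segment arrives and
 * the return path takes return_delay_ms.
 */
static int delayed_ack_received_ms(struct seg_times second, int return_delay_ms)
{
	return second.arrived_ms + return_delay_ms;
}

/* Print both samples for a two-segment delayed ACK. */
static void show_samples(struct seg_times first, struct seg_times second,
			 int return_delay_ms)
{
	int ack_ms = delayed_ack_received_ms(second, return_delay_ms);

	printf("first segment rtt:  %d ms (includes delack wait)\n",
	       rtt_sample_ms(first, ack_ms));
	printf("second segment rtt: %d ms\n", rtt_sample_ms(second, ack_ms));
}
```

With first = {0, 50}, second = {30, 80} and a 50 ms return delay, the ACK reaches the sender at 130 ms, so the first sample is 130 ms and the second is 100 ms, which is the actual path round trip in this made-up scenario.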


Signed-off-by: Gavin McCullagh [EMAIL PROTECTED]


--- a/net/ipv4/tcp_input.c  2007-12-15 00:22:23.0 +
+++ b/net/ipv4/tcp_input.c  2007-12-17 13:35:16.0 +
@@ -2691,11 +2691,9 @@ static int tcp_clean_rtx_queue(struct so
 				    (packets_acked > 1))
 					flag |= FLAG_NONHEAD_RETRANS_ACKED;
 			} else {
-				if (seq_rtt < 0) {
-					seq_rtt = now - scb->when;
-					if (fully_acked)
-						last_ackt = skb->tstamp;
-				}
+				seq_rtt = now - scb->when;
+				if (fully_acked)
+					last_ackt = skb->tstamp;
 				if (!(sacked & TCPCB_SACKED_ACKED))
 					reord = min(cnt, reord);
 			}
@@ -2709,11 +2707,9 @@ static int tcp_clean_rtx_queue(struct so
 			    !before(end_seq, tp->snd_up))
 				tp->urg_mode = 0;
 		} else {
-			if (seq_rtt < 0) {
-				seq_rtt = now - scb->when;
-				if (fully_acked)
-					last_ackt = skb->tstamp;
-			}
+			seq_rtt = now - scb->when;
+			if (fully_acked)
+				last_ackt = skb->tstamp;
 			reord = min(cnt, reord);
 		}
 		tp->packets_out -= packets_acked;



Re: reading the tcp headers within the write queue

2007-12-13 Thread Gavin McCullagh
Hi,

thanks for the swift reply.

On Thu, 13 Dec 2007, David Miller wrote:

> > I'm trying to hack together something which will run through the
> > retransmit queue looking at the tcp headers.
>
> The packets in the retransmit queue are headerless, the
> header only gets added to clones of the retransmit queue
> frames during the actual transmit.

Thought that might be it. I presume there isn't any other residue of the
tcp options elsewhere, that one could look at when the packet gets
acknowledged?  I'm particularly interested in the timestamp.

> And this question belongs on netdev not linux-net.

Oops, sorry.

Gavin



possible bug in tcp_probe

2007-11-13 Thread Gavin McCullagh
Hi,

I'm using linux v2.6.22.6 and tcp_probe with a couple of small
modifications[1]. 

Even with moderately large numbers of flows (16 on the one machine) and
increasingly as I monitor more flows than that, I get strange overflow
problems such as this one:

74.259589763 192.168.2.1 36988 192.168.3.5 5001 0x679c23dc 0x679bc3b4 18 13 9114624 78 76 1 0 64
74.260590660 192.168.2.1 44261 192.168.3.5 5006 0x573bb3ed 0x573b700d 13 9 5254144 155 127 1 0 64
74.261607478 192.168.2.1 44261 192.168.3.5 5006 0x588.066586741 192.168.2.1 33739 192.168.3.5 5009 0xe26d1767 0xe26cf577 2 3 13090816 443 15818 1 0 64
88.066690797 192.168.2.1 33739 192.168.3.5 5009 0xe26d1767 0xe26cfb1f 3 3 13092864 2365 15818 1 0 64
88.067625714 192.168.2.1 59385 192.168.3.5 5012 0x411c1090 0x411bd258 12 9 14578688 2807 15812 1 0 64

As you can see, the third line has been truncated and roughly the next
14 seconds of data are missing, after which output continues as usual.

I don't think my small changes are causing this but perhaps I'm wrong.
Does anyone know what might be causing the above?

Many thanks for any ideas,
Gavin

[1] I have slightly modified tcp_probe to print out information for a range
of ports (instead of one port or all) and to print info from the congestion
avoidance inet_csk_ca struct.  This adds a couple of extra fields to the
end.  If either of these is of interest as a patch I'll happily submit
it.



Re: possible bug in tcp_probe

2007-11-13 Thread Gavin McCullagh
Hi Sangtae,

On Tue, 13 Nov 2007, SANGTAE HA wrote:

> This is fixed in the current version of tcp_probe by Stephen.  Please
> see the below.
>
> You can copy the current version of tcp_probe to your kernel version
> and it should work.

Many thanks, I'll give this a try,

Gavin



Re: [PATCH] bcm43xx: Fix code for spec changes of 2/7/2007

2007-02-27 Thread Gavin McCullagh
Hi,

On Sat, 10 Feb 2007, Matthew Garrett wrote:

> On Sat, Feb 10, 2007 at 06:55:50AM +0100, Michael Buesch wrote:
>
> > It's likely that old cards still work with v4 firmware, but we don't know
> > and it has to be tested.
> >
> > Care to do so?
>
> I'll check the revision of my 4306, but I think it's probably too new to
> be useful, unfortunately...

I have a fairly old 4306 at home, lent to me by my boss, and I've been
struggling to get it working for the past week or two.  bcm43xx-fwcutter
complains that most of the firmwares I try are too old.  When I do find a
firmware which is not too old, the device registers but doesn't seem to
work: iwlist can't see any access points, and when I set the essid it
doesn't associate with the AP.

Can I be helpful as a tester?

Gavin



Re: fixing opt-ack DoS against TCP stack

2007-01-11 Thread Gavin McCullagh
Hi,

[ moving this to netdev as requested ]

On Tue, 09 Jan 2007, Stephen Hemminger wrote:

> Actually, this paper seems to be a zombified version of:
>   http://www.cs.ucsd.edu/~savage/papers/CCR99.pdf

Thanks.  In fairness to them, the emphasis is slightly different: Savage et
al. are more interested in improving a receiver's performance, whereas
Sherwood seems more interested in a DoS attack.  However...

> It is not clear that current Linux systems are prone to the attack for a
> couple of reasons. First, Linux counts packets not bytes, so
> extra ACKs would be ignored. Turning on ABC would also help.

The issue I'm raising is not so much how you could artificially increase
cwnd by dividing ACKs or sending them early.  The issue as I see it is that
if the receiver doesn't send dup acks, the sender never backs off and may
eventually flood its own link.  As the receiver need only ACK the odd
packet, the amplification can be substantial.
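
The effect on the sender can be illustrated with a toy model in plain C.  This is a deliberately crude sketch and not the Linux implementation: run_sender, its parameters and the one-segment-per-round growth rule are assumptions made for illustration.  The window grows by one segment per round, and is halved only if it exceeds the path capacity and the receiver actually signals the loss.

```c
/*
 * Toy AIMD sender (hypothetical model, not kernel code).
 * rounds:                number of round trips to simulate
 * capacity:              window size above which the path drops packets
 * receiver_reports_loss: non-zero if losses are signalled back to the sender
 * Returns the congestion window, in segments, after the last round.
 */
static unsigned int run_sender(unsigned int rounds, unsigned int capacity,
			       int receiver_reports_loss)
{
	unsigned int cwnd = 1;
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		if (cwnd > capacity && receiver_reports_loss)
			cwnd = cwnd / 2;	/* back off on a signalled loss */
		else
			cwnd += 1;		/* additive increase per round */
	}
	return cwnd;
}
```

In this model, run_sender(100, 20, 1) never ends above 21 segments, while run_sender(100, 20, 0) ends at 101 segments because nothing ever tells the sender to back off.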

As this issue is already more or less solved (in the research sense), I'm not
keen to spend a lot of time writing it up.  So, here's a quick write-up
of a small example experiment I ran yesterday.

http://www.hamilton.ie/gavinmc/drop_dupack_attack/

> Lastly, the patch looks like it could cause more problems. It probably would
> break some application and other non-attacking TCP stacks. For this case, IMHO
> we need to wait for more research. If you want to pursue the problem, it needs
> to go through the RFC process.

I must admit I didn't read the patch in detail.  As I understand it, the
fix should (in principle) be compatible with other TCP stacks, which should
just see an odd extra dropped packet and react with a duplicate ack as
usual.

Gavin



fixing opt-ack DoS against TCP stack

2007-01-09 Thread Gavin McCullagh
Hi,

recently, a few of us came up with a novel (or so we thought) DoS attack
against TCP.  We spent some time implementing and testing it and found it
to work worryingly well.

It turns out that we are not the first to come across this attack.  Rob
Sherwood and colleagues in Maryland were a year or two ahead of us.  They
have published a paper entitled "Misbehaving TCP Receivers Can Cause
Internet-Wide Congestion Collapse".

http://www.cs.umd.edu/~capveg/optack/optack-ccs05.pdf
http://www.cs.umd.edu/~capveg/
http://www.kb.cert.org/vuls/id/102014

Linux appears not to have implemented any fix for this vulnerability,
although Rob Sherwood wrote a patch against 2.4.24.

http://www.cs.umd.edu/~capveg/optack/optack.patch

There seems to be a brief mention of it on the fedora-security list but I
can't find much discussion of it in linux circles otherwise.

http://www.spinics.net/linux/fedora/fedora-security/msg00426.html
http://www.securityfocus.com/bid/15468/

Is there some reason that this fix was not accepted or has this just
slipped under people's radars?  Should some fix not be implemented?  The
issue seems even more severe with the larger buffer sizes now in use.

Gavin



Re: [PATCH] fix integer overflow in H-TCP congestion control

2006-10-25 Thread Gavin McCullagh

When using H-TCP with a single flow on a 500Mbit connection (or less
actually), alpha can exceed 65000, so alpha needs to be a u32.
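
The wrap-around can be shown in isolation with a short sketch in standard C.  The helper names and the example value 70000 are hypothetical; the only thing taken from the patch is that alpha is a fixed-point value scaled by << 7 and that it can exceed what 16 bits can hold.

```c
#include <stdint.h>

#define ALPHA_SHIFT 7	/* fixed point: real value = stored value >> 7 */

/* Store a computed alpha in a 16-bit field, as before the patch. */
static uint16_t store_alpha_u16(uint32_t alpha_fixed)
{
	return (uint16_t)alpha_fixed;	/* silently keeps only the low 16 bits */
}

/* Store a computed alpha in a 32-bit field, as after the patch. */
static uint32_t store_alpha_u32(uint32_t alpha_fixed)
{
	return alpha_fixed;
}

/* Integer part of a stored fixed-point alpha. */
static uint32_t alpha_whole(uint32_t stored)
{
	return stored >> ALPHA_SHIFT;
}
```

For a computed value of 70000, the 16-bit field ends up holding 4464 (whole part 34), whereas the 32-bit field keeps 70000 (whole part 546).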

Signed-off-by: Gavin McCullagh [EMAIL PROTECTED]
Signed-off-by: Doug Leith [EMAIL PROTECTED]


diff --git a/net/ipv4/tcp_htcp.c b/net/ipv4/tcp_htcp.c
index 6edfe5e..8072b6d 100644
--- a/net/ipv4/tcp_htcp.c
+++ b/net/ipv4/tcp_htcp.c
@@ -23,7 +23,7 @@ module_param(use_bandwidth_switch, int, 
 MODULE_PARM_DESC(use_bandwidth_switch, "turn on/off bandwidth switcher");
 
 struct htcp {
-	u16	alpha;		/* Fixed point arith, << 7 */
+	u32	alpha;		/* Fixed point arith, << 7 */
 	u8	beta;		/* Fixed point arith, << 7 */
 	u8	modeswitch;	/* Delay modeswitch until we had at least one congestion event */
 	u32	last_cong;	/* Time since last congestion event end */



[PATCH] fix integer overflow in H-TCP congestion control

2006-10-24 Thread Gavin McCullagh

When using H-TCP with a single flow on a 500Mbit connection (or less
actually), alpha can exceed 65000, so alpha needs to be a u32.

Signed-off-by: Gavin McCullagh [EMAIL PROTECTED]
Signed-off-by: Doug Leith [EMAIL PROTECTED]


diff --git a/net/ipv4/tcp_htcp.c b/net/ipv4/tcp_htcp.c
index 6edfe5e..8072b6d 100644
--- a/net/ipv4/tcp_htcp.c
+++ b/net/ipv4/tcp_htcp.c
@@ -23,7 +23,7 @@ module_param(use_bandwidth_switch, int, 
 MODULE_PARM_DESC(use_bandwidth_switch, "turn on/off bandwidth switcher");
 
 struct htcp {
-	u16	alpha;		/* Fixed point arith, << 7 */
+	u32	alpha;		/* Fixed point arith, << 7 */
 	u8	beta;		/* Fixed point arith, << 7 */
 	u8	modeswitch;	/* Delay modeswitch until we had at least one congestion event */
 	u32	last_cong;	/* Time since last congestion event end */
