Re: No pmtu probing on retransmits?

2008-02-03 Thread John Heffner
Andi Kleen wrote: Hello, While looking for something else in tcp_output.c I noticed that MTU probing seems to be only done in tcp_write_xmit (when packets come directly from process context), but not via the timer-driven retransmit path (tcp_retransmit_skb). Is that intentional? It

Re: SO_RCVBUF doesn't change receiver advertised window

2008-01-16 Thread John Heffner
Ritesh Kumar wrote: On 1/16/08, Bill Fink [EMAIL PROTECTED] wrote: On Tue, 15 Jan 2008, Ritesh Kumar wrote: Hi, I am using linux 2.6.20 and am trying to limit the receiver window size for a TCP connection. However, it seems that auto tuning is not turning itself off even after I use the
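For readers following this thread, here is a minimal sketch of how a receiver might pin its buffer size. It assumes a Linux TCP socket; the helper names limit_rcvbuf() and get_rcvbuf() are made up for illustration. Setting SO_RCVBUF explicitly should be done before connect() or listen() to take effect for the connection, and on Linux the value read back is double what was requested because the kernel reserves room for bookkeeping overhead.

```c
#include <sys/socket.h>

/*
 * Request a fixed receive buffer of `bytes` on socket `fd`.
 * Hypothetical helper: returns 0 on success, -1 on error.
 * Call it before connect() or listen().
 */
int limit_rcvbuf(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

/* Read back the buffer size the kernel actually applied, or -1 on error. */
int get_rcvbuf(int fd)
{
    int bytes = 0;
    socklen_t len = sizeof(bytes);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, &len) < 0)
        return -1;
    return bytes;
}
```

The advertised window will generally be smaller than the buffer size reported here, since part of the buffer is set aside for overhead.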

Re: SACK scoreboard

2008-01-09 Thread John Heffner
David Miller wrote: From: John Heffner [EMAIL PROTECTED] Date: Tue, 08 Jan 2008 23:27:08 -0500 I also wonder how much of a problem this is (for now, with window sizes of order 1 packets. My understanding is that the biggest problems arise from O(N^2) time for recovery because every ack

Re: SACK scoreboard

2008-01-09 Thread John Heffner
SANGTAE HA wrote: On Jan 9, 2008 9:56 AM, John Heffner [EMAIL PROTECTED] wrote: I also wonder how much of a problem this is (for now, with window sizes of order 1 packets. My understanding is that the biggest problems arise from O(N^2) time for recovery because every ack was expensive

Re: SACK scoreboard

2008-01-08 Thread John Heffner
David Miller wrote: Ilpo, just trying to keep an old conversation from dying off. Did you happen to read a recent blog posting of mine? http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2007/12/31#tcp_overhead I've been thinking more and more and I think we might be able to get away with

Re: SACK scoreboard

2008-01-08 Thread John Heffner
Andi Kleen wrote: David Miller [EMAIL PROTECTED] writes: The big problem is that recovery from even a single packet loss in a window makes us run kfree_skb() for all the packets in a full window's worth of data when recovery completes. Why exactly is it a problem to free them all at once?

Re: TSO trimming question

2007-12-20 Thread John Heffner
David Miller wrote: From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) [PATCH] [TCP]: Fix TSO deferring I'd say that most of what tcp_tso_should_defer had in between there was dead code because of this. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Re: TCP event tracking via netlink...

2007-12-05 Thread John Heffner
David Miller wrote: Ilpo, I was pondering the kind of debugging one does to find congestion control issues and even SACK bugs and it's currently too painful because there is no standard way to track state changes. I assume you're using something like carefully crafted printk's, kprobes, or even

Re: [PATCH net-2.6 0/3]: Three TCP fixes

2007-12-04 Thread John Heffner
Ilpo Järvinen wrote: ...I'm still to figure out why tcp_cwnd_down uses snd_ssthresh/2 as lower bound even though the ssthresh was already halved, so snd_ssthresh should suffice. I remember this coming up at least once before, so it's probably worth a comment in the code. Rate-halving

Re: [PATCH net-2.6 0/3]: Three TCP fixes

2007-12-04 Thread John Heffner
Ilpo Järvinen wrote: On Tue, 4 Dec 2007, John Heffner wrote: Ilpo Järvinen wrote: ...I'm still to figure out why tcp_cwnd_down uses snd_ssthresh/2 as lower bound even though the ssthresh was already halved, so snd_ssthresh should suffice. I remember this coming up at least once before, so

Re: [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop)

2007-11-21 Thread John Heffner
there with SYN still queued. Use of write_seq check guarantees that there's a valid skb in send_head so I removed the extra check. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Acked-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/tcp_output.c |7 +-- 1 files changed, 1 insertions(+), 6

Re: [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window data available checks fixed

2007-11-21 Thread John Heffner
[EMAIL PROTECTED] Acked-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/tcp_output.c | 17 - 1 files changed, 8 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 30d6737..ff22ce8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4

Re: Fw: [Bug 9189] New: Oops in kernel 2.6.21-rc4 through 2.6.23, page allocation failure

2007-10-19 Thread John Heffner
Stephen Hemminger wrote: Looks like a memory over commit with small machines?? Begin forwarded message: Date: Fri, 19 Oct 2007 01:35:33 -0700 (PDT) From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: [Bug 9189] New: Oops in kernel 2.6.21-rc4 through 2.6.23, page allocation failure [snip]

Re: Question on TSO maximum segment sizes.

2007-10-11 Thread John Heffner
Ben Greear wrote: I just tried turning off my explicit SO_SNDBUF/SO_RCVBUF settings in my app, and the connection ran very poorly through a link with even a small bit of latency (~2-4ms I believe). I often run at full gigabit or faster with latencies of 100+ ms. Can you give a bit more

Re: tcp bw in 2.6

2007-10-02 Thread John Heffner
gigabit machines: $ ./tcpsend -t10 dew Sent 1240415312 bytes in 10.033101 seconds Throughput: 123632294 B/s -John /* * discard.c * A simple discard server. * * Copyright 2003 John Heffner. */ #include <stdio.h> #include <signal.h> #include <unistd.h> #include <string.h> #include <stdlib.h> #include

Re: tcp bw in 2.6

2007-10-02 Thread John Heffner
Larry McVoy wrote: On Tue, Oct 02, 2007 at 06:52:54PM +0800, Herbert Xu wrote: One of my clients also has gigabit so I played around with just that one and it (itanium running hpux w/ broadcom gigabit) can push the load as well. One weird thing is that it is dependent on the direction the data

Re: tcp bw in 2.6

2007-10-02 Thread John Heffner
Larry McVoy wrote: More data, we've conclusively eliminated the card / cpu from the mix. We've got 2 ia64 boxes with e1000 interfaces. One box is running linux 2.6.12 and the other is running hpux 11. I made sure the linux one was running at gigabit and reran the tests from the linux/ia64 =

Re: sk98lin, jumbo frames, and memory fragmentation

2007-10-01 Thread John Heffner
Yes it has this problem. I've observed it in practice on a busy firewall. -John Chris Friesen wrote: Hi all, We're considering some hardware that uses the sk98lin network hardware, and we'll be using jumbo frames. Looking at the driver, when using a 9KB MTU it seems like it would end

Re: [RFC] Make TCP prequeue configurable

2007-09-27 Thread John Heffner
Stephen Hemminger wrote: On Fri, 28 Sep 2007 00:08:33 +0200 Eric Dumazet [EMAIL PROTECTED] wrote: Hi all I am sure some of you are going to tell me that prequeue is not all black :) Thank you [RFC] Make TCP prequeue configurable The TCP prequeue thing is based on old facts, and has

Re: [PATCH] include listenq max/backlog in tcp_info and related reports - correct version/signorder

2007-09-17 Thread John Heffner
Any reason you're overloading tcpi_unacked and tcpi_sacked? It seems that setting idiag_rqueue and idiag_wqueue is sufficient. -John Rick Jones wrote: Return some useful information such as the maximum listen backlog and the current listen backlog in the tcp_info structure and have that

Re: [PATCH] include listenq max/backlog in tcp_info and related reports - correct version/signorder

2007-09-17 Thread John Heffner
Rick Jones wrote: John Heffner wrote: Any reason you're overloading tcpi_unacked and tcpi_sacked? It seems that setting idiag_rqueue and idiag_wqueue is sufficient. Different fields for different structures. The tcp_info struct doesn't have the idiag_mumble, so to get the two values

[PATCH 0/2] Clean up owner field in sock_lock_t

2007-09-11 Thread John Heffner
I don't know why the owner field is a (struct sock_iocb *). I'm assuming it's historical. Can someone check this out? Did I miss some alternate usage? These patches are against net-2.6.24. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL

[PATCH 1/2] [NET] Cleanup: Use sock_owned_by_user() macro

2007-09-11 Thread John Heffner
Changes asserts in sunrpc to use sock_owned_by_user() macro instead of referencing sock_lock.owner directly. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/sunrpc/svcsock.c |2 +- net/sunrpc/xprtsock.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net

[PATCH 2/2] [NET] Change type of owner in sock_lock_t to int, rename

2007-09-11 Thread John Heffner
-by: John Heffner [EMAIL PROTECTED] --- include/net/sock.h |7 +++ net/core/sock.c|6 +++--- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 802c670..5ed9fa4 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -76,10

[PATCH 2/2] [IPROUTE2] ss: parse bare integers are port numbers rather than IP addresses

2007-09-11 Thread John Heffner
Signed-off-by: John Heffner [EMAIL PROTECTED] --- misc/ss.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/misc/ss.c b/misc/ss.c index 5d14f13..d617f6d 100644 --- a/misc/ss.c +++ b/misc/ss.c @@ -953,6 +953,10 @@ void *parse_hostcond(char *addr) memset(a, 0

[PATCH 1/2] [IPROUTE2] Add missing LIBUTIL for dependencies.

2007-09-11 Thread John Heffner
Signed-off-by: John Heffner [EMAIL PROTECTED] --- Makefile |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/Makefile b/Makefile index af0d5e4..7e4605c 100644 --- a/Makefile +++ b/Makefile @@ -29,7 +29,8 @@ LDLIBS += -L../lib -lnetlink -lutil SUBDIRS=lib ip tc misc

Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2

2007-08-30 Thread John Heffner
Rick Jones wrote: Like I said the consumers of this are a trifle, well, anxious :) Just curious, did you or this customer try with F-RTO enabled? Or is this case you're dealing with truly hopeless? -John

Re: [PATCH] make _minimum_ TCP retransmission timeout configurable

2007-08-29 Thread John Heffner
David Miller wrote: From: Rick Jones [EMAIL PROTECTED] Date: Wed, 29 Aug 2007 15:29:03 -0700 David Miller wrote: None of the research folks want to commit to saying a lower value is OK, even though it's quite clear that on a local 10 gigabit link a minimum value of even 200 is absolutely and

Re: [PATCH] make _minimum_ TCP retransmission timeout configurable

2007-08-29 Thread John Heffner
John Heffner wrote: What exactly causes such a huge delay? What is the TCP measured RTO in these circumstances where spurious RTOs happen and a 3 second minimum RTO makes things better? I haven't done a lot of work on wireless myself, but my understanding is that one of the biggest problems

Re: NCR, was [PATCH] make _minimum_ TCP retransmission timeout configurable

2007-08-29 Thread John Heffner
Stephen Hemminger wrote: On Wed, 29 Aug 2007 15:28:12 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: And reading NCR some more, we already have something similar in the form of Alexey's reordering detection, in fact it handles exactly the case NCR supposedly deals with. We do not trigger

Re: [PATCH] make _minimum_ TCP retransmission timeout configurable

2007-08-29 Thread John Heffner
David Miller wrote: From: Rick Jones [EMAIL PROTECTED] Date: Wed, 29 Aug 2007 16:06:27 -0700 I believe the biggest component comes from link-layer retransmissions. There can also be some short outages thanks to signal blocking, tunnels, people with big hats and whatnot that the link-layer

Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2)

2007-08-28 Thread John Heffner
OBATA Noboru wrote: Is it correct that you think my problem can be addressed by either of the following? (1) Make the application timeouts longer. (Steve has shown that making the application timeouts twice the failover detection timeout would be a solution.) Right. Is there something

Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-26 Thread John Heffner
Bill Fink wrote: Here's the before/after delta of the receiver's netstat -s statistics for the TSO enabled case: Ip: 3659898 total packets received 3659898 incoming packets delivered 80050 requests sent out Tcp: 2 passive connection openings 3659897 segments received

Re: Problem with implementation of TCP_DEFER_ACCEPT?

2007-08-24 Thread John Heffner
TJ wrote: Right now Juniper are claiming the issue that brought this to the surface (the bug linked to in my original post) is a problem with the implementation of TCP_DEFER_ACCEPT. My position so far is that the Juniper DX OS is not following the HTTP standard because it doesn't send a request

Re: Problem with implementation of TCP_DEFER_ACCEPT?

2007-08-23 Thread John Heffner
TJ wrote: client SYN server LISTENING client SYN ACK server SYN_RECEIVED (time-out 3s) server: inet_rsk(req)->acked = 1 client ACK server (discarded) client SYN ACK (DUP) server (time-out 6s) client ACK (DUP) server (discarded) client SYN ACK (DUP) server (time-out 12s)
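The time-outs in the trace (3s, 6s, 12s) follow a plain doubling pattern. A rough sketch of that arithmetic, assuming a 3 second initial value as seen in the trace; synack_timeout() is a made-up name for illustration, not a kernel function, and no upper clamp is modeled:

```c
/* Initial SYN-ACK retransmission time-out seen in the trace, in seconds. */
#define SYNACK_TIMEOUT_INIT 3u

/*
 * Time-out that applies after n earlier SYN-ACK transmissions have
 * timed out, assuming simple exponential backoff (3s, 6s, 12s, ...).
 * Only meaningful for small n; larger shifts would overflow.
 */
unsigned int synack_timeout(unsigned int n)
{
    return SYNACK_TIMEOUT_INIT << n;
}
```

With n = 0, 1, 2 this reproduces the three time-outs shown in the trace.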

Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread John Heffner
That sounds right to me. -John Ilpo Järvinen wrote: On Mon, 6 Aug 2007, Ilpo Järvinen wrote: ...Goto logic could be cleaner (somebody has any suggestion for better way to structure it?) ...I could probably move the setting of snd_cwnd earlier to avoid this problem if this seems a valid

Re: TCP's initial cwnd setting correct?...

2007-08-08 Thread John Heffner
I believe the current calculation is correct. The RFC specifies a window of no more than 4380 bytes unless 2*MSS > 4380. If you change the code in this way, then MSS=1461 will give you an initial window of 3*MSS == 4383, violating the spec. Reading the pseudocode in RFC 3390 is a bit
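To make the arithmetic concrete, here is a small sketch of the RFC 3390 upper bound, min(4*MSS, max(2*MSS, 4380 bytes)), next to a segment-count shortcut of the kind being argued against. Both function names are invented for illustration, and the second function is only my reading of the proposed change, not code from the thread.

```c
/*
 * Upper bound on the initial window from RFC 3390, in bytes:
 * min(4*MSS, max(2*MSS, 4380)).
 */
unsigned int rfc3390_initial_window(unsigned int mss)
{
    unsigned int lower = (2 * mss > 4380) ? 2 * mss : 4380;
    unsigned int upper = 4 * mss;

    return (upper < lower) ? upper : lower;
}

/*
 * Hypothetical shortcut that picks a whole number of segments by MSS
 * range (4, 3 or 2). For 1460 < MSS < 2190 it allows 3*MSS, which is
 * more than 4380 bytes.
 */
unsigned int segment_rule_window(unsigned int mss)
{
    unsigned int segs;

    if (mss <= 1095)
        segs = 4;
    else if (mss < 2190)
        segs = 3;
    else
        segs = 2;
    return segs * mss;
}
```

For MSS=1461 the first function returns 4380 while the shortcut returns 4383, which is the violation described above. A sender working in whole segments would have to round 4380 bytes down to two segments in that case.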

Re: [PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)

2007-05-18 Thread John Heffner
Rick Jones wrote: as an aside, tcp_max_ssthresh sounds like the maximum value ssthresh can take on. is that correct, or is this more of a once ssthresh is above this, behave in this new way? If that is the case, while the I don't like it either, but you'll have to talk to Sally Floyd

Re: [PATCH] TCP FIN gets dropped prematurely, results in ack storm

2007-05-01 Thread John Heffner
Benjamin LaHaise wrote: According to your patch, several packets with fin bit might be sent, including one with data. If another host does not receive fin retransmit, then that logic is broken, and it can not be fixed by duplicating fins, I would even say, that remote box should drop second

Re: [PATCH] TCP FIN gets dropped prematurely, results in ack storm

2007-05-01 Thread John Heffner
Benjamin LaHaise wrote: On Tue, May 01, 2007 at 09:41:28PM +0400, Evgeniy Polyakov wrote: Hmm, 2.2 machine in your test seems to behave incorrectly: I am aware of that. However, I think that the loss of certain packets and reordering can result in the same behaviour. What's more, is that

[PATCH 0/0] Re-try changes for PMTUDISC_PROBE

2007-04-18 Thread John Heffner
This backs out the transport layer MTU checks that don't work. As a consequence, I had to back out the PMTUDISC_PROBE patch as well. These patches should fix the problem with ipv6 that the transport layer change tried to address, and re-implement PMTUDISC_PROBE. I think this approach is

[PATCH] Revert [NET] Do pmtu check in transport layer

2007-04-18 Thread John Heffner
This reverts commit 87e927a0583bd4a8ba9e97cd75b58d8aa1c76e37. This idea does not work, as pointed out by Patrick McHardy. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/ip_output.c |4 +--- net/ipv4/raw.c|8 +++- net/ipv6/ip6_output.c | 11 +-- net/ipv6

[PATCH] [NET] MTU discovery check in ip6_fragment()

2007-04-18 Thread John Heffner
Adds a check in ip6_fragment() mirroring ip_fragment() for packets that we can't fragment, and sends an ICMP Packet Too Big message in response. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv6/ip6_output.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff

[PATCH] Revert [NET] Add IP(V6)_PMTUDISC_RPOBE

2007-04-18 Thread John Heffner
This reverts commit d21d2a90b879c0cf159df5944847e6d9833816eb. Must be backed out because commit 87e927a0583bd4a8ba9e97cd75b58d8aa1c76e37 does not work. Signed-off-by: John Heffner [EMAIL PROTECTED] --- include/linux/in.h |1 - include/linux/in6.h |1 - include/linux/skbuff.h

[PATCH] [NET] Add IP(V6)_PMTUDISC_RPOBE

2007-04-18 Thread John Heffner
, like traceroute/tracepath. Signed-off-by: John Heffner [EMAIL PROTECTED] --- include/linux/in.h |1 + include/linux/in6.h |1 + net/ipv4/ip_output.c | 20 +++- net/ipv4/ip_sockglue.c |2 +- net/ipv6/ip6_output.c| 15 --- net/ipv6

[PATCH 2/4] Revert [NET] Do pmtu check in transport layer

2007-04-18 Thread John Heffner
This reverts commit 87e927a0583bd4a8ba9e97cd75b58d8aa1c76e37. This idea does not work, as pointed out by Patrick McHardy. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/ip_output.c |4 +--- net/ipv4/raw.c|8 +++- net/ipv6/ip6_output.c | 11 +-- net/ipv6

[PATCH 1/4] Revert [NET] Add IP(V6)_PMTUDISC_RPOBE

2007-04-18 Thread John Heffner
This reverts commit d21d2a90b879c0cf159df5944847e6d9833816eb. Must be backed out because commit 87e927a0583bd4a8ba9e97cd75b58d8aa1c76e37 does not work. Signed-off-by: John Heffner [EMAIL PROTECTED] --- include/linux/in.h |1 - include/linux/in6.h |1 - include/linux/skbuff.h

[PATCH 4/4] [NET] Add IP(V6)_PMTUDISC_RPOBE

2007-04-18 Thread John Heffner
, like traceroute/tracepath. Signed-off-by: John Heffner [EMAIL PROTECTED] --- include/linux/in.h |1 + include/linux/in6.h |1 + net/ipv4/ip_output.c | 20 +++- net/ipv4/ip_sockglue.c |2 +- net/ipv6/ip6_output.c| 15 --- net/ipv6

[PATCH 3/4] [NET] MTU discovery check in ip6_fragment()

2007-04-18 Thread John Heffner
Adds a check in ip6_fragment() mirroring ip_fragment() for packets that we can't fragment, and sends an ICMP Packet Too Big message in response. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv6/ip6_output.c | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff

Re: [PATCH] [NET] Add IP(V6)_PMTUDISC_RPOBE

2007-04-18 Thread John Heffner
Sorry, forgot the -n flag on git-format-patch. Patches resent with correct sequence numbers. Thanks, -John

Re: TCP connection stops after high load.

2007-04-16 Thread John Heffner
Robert Iakobashvili wrote: Hi John, On 4/15/07, John Heffner [EMAIL PROTECTED] wrote: Robert Iakobashvili wrote: Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and 2.6.20.6 do not. Looking into the tcp /proc entries of 2.6.18.3 versus 2.6.19.5 tcp_rmem and tcp_wmem are the same

Re: TCP connection stops after high load.

2007-04-16 Thread John Heffner
Robert Iakobashvili wrote: Kernels 2.6.19 and 2.6.20 series are effectively broken right now. Don't you wish to patch them? I don't know if this qualifies as an unconditional bug. The commit above was actually a bugfix so that the limits were not higher than total memory on some systems,

Re: bug in tcp?

2007-04-16 Thread John Heffner
Stephen Hemminger wrote: A guess: maybe something related to a PAWS wraparound problem. Does turning off sysctl net.ipv4.tcp_timestamps fix it? That was my first thought too (aside from netfilter), but a failed PAWS check should not result in a reset. -John

Re: TCP connection stops after high load.

2007-04-15 Thread John Heffner
--- 2.6.18.3 12288 16384 24576 2.6.19.5 3072 4096 6144 Isn't it done deliberately by the patch below: commit 9e950efa20dc8037c27509666cba6999da9368e8 Author: John Heffner [EMAIL PROTECTED] Date: Mon Nov 6 23:10:51 2006 -0800 [TCP]: Don't use highmem in tcp hash

Re: [PATCH 1/3] [NET] Do pmtu check in transport layer

2007-04-09 Thread John Heffner
Patrick McHardy wrote: John Heffner wrote: Do the pmtu check at the transport layer (for UDP, ICMP and raw), and send a local error if socket is PMTUDISC_DO and packet is too big. This is actually a pure bugfix for ipv6. For ipv4, it allows us to do pmtu checks in the same way as for ipv6

[PATCH] [iputils] Add documentation for the -l flag.

2007-04-03 Thread John Heffner
--- doc/tracepath.sgml | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/doc/tracepath.sgml b/doc/tracepath.sgml index 71eaa8d..c0f308b 100644 --- a/doc/tracepath.sgml +++ b/doc/tracepath.sgml @@ -15,6 +15,7 @@ traces path to a network host discovering MTU

[PATCH] [iputils] Document -n flag.

2007-04-03 Thread John Heffner
--- doc/tracepath.sgml |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/doc/tracepath.sgml b/doc/tracepath.sgml index c0f308b..1bc83b9 100644 --- a/doc/tracepath.sgml +++ b/doc/tracepath.sgml @@ -15,6 +15,7 @@ traces path to a network host discovering MTU along

[PATCH 2/2] [iputils] Re-probe at same TTL after MTU reduction.

2007-04-03 Thread John Heffner
This fixes a bug that would miss a hop after an ICMP packet too big message, since it would continue to increase the TTL without probing again. --- tracepath.c |6 ++ tracepath6.c |6 ++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/tracepath.c b/tracepath.c index

[PATCH 1/2] [iputils] Fix asymm messages.

2007-04-03 Thread John Heffner
We should only print the asymm messages in tracepath/6 when you receive a TTL expired message, because this is the only time when we'd expect the same number of hops back as our TTL was set to for a symmetric path. --- tracepath.c | 25 - tracepath6.c | 25

[PATCH] ip(7) IP_PMTUDISC_PROBE

2007-03-27 Thread John Heffner
Document new IP_PMTUDISC_PROBE value for IP_MTU_DISCOVER. (Going into 2.6.22). Thanks, -John diff -rU3 man-pages-2.43-a/man7/ip.7 man-pages-2.43-b/man7/ip.7 --- man-pages-2.43-a/man7/ip.7 2006-09-26 09:54:29.0 -0400 +++ man-pages-2.43-b/man7/ip.7 2007-03-27 15:46:18.0
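For anyone wanting to try the new mode from user space, a minimal sketch follows. It assumes a Linux IPv4 socket and uses the socket option as spelled in the Linux headers, IP_MTU_DISCOVER. In probe mode the DF bit is set but the cached path MTU is ignored, which is what a diagnostic tool wants. The helper name set_pmtudisc_probe() and the fallback to IP_PMTUDISC_DO on older kernels are my own choices for illustration, not taken from the patch.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_PMTUDISC_PROBE
#define IP_PMTUDISC_PROBE 3 /* value from linux/in.h, for older libc headers */
#endif

/*
 * Try to put `fd` into probe mode; fall back to IP_PMTUDISC_DO if the
 * kernel rejects it. Returns the mode that was set, or -1 if both fail.
 * Hypothetical helper, not part of any library.
 */
int set_pmtudisc_probe(int fd)
{
    int mode = IP_PMTUDISC_PROBE;

    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) == 0)
        return mode;

    mode = IP_PMTUDISC_DO;
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) == 0)
        return mode;

    return -1;
}
```

A caller can compare the return value against IP_PMTUDISC_PROBE to learn whether the running kernel supports the new mode.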

Re: [PATCH] NET: Add TCP connection abort IOCTL

2007-03-27 Thread John Heffner
Mark Huth wrote: David Miller wrote: From: [EMAIL PROTECTED] (David Griego) Date: Tue, 27 Mar 2007 14:47:54 -0700 Adds an IOCTL for aborting established TCP connections, and is designed to be an HA performance improvement for cleaning up, failure notification, and application

Re: [PATCH] NET: Add TCP connection abort IOCTL

2007-03-27 Thread John Heffner
John Heffner wrote: I also believe this is a useful thing to have. I'm not 100% sure this ioctl is the way to go, but it seems reasonable. This directly corresponds to writing deleteTcb to the tcpConnectionState variable in the TCP MIB (RFC 4022). I don't think it constitutes a protocol

[PATCH 0/3] [NET] MTU discovery changes

2007-03-23 Thread John Heffner
These are a few changes to fix/clean up some of the MTU discovery processing with non-stream sockets, and add a probing mode. See also matching patches to tracepath to take advantage of this. -John

[PATCH 1/3] [NET] Do pmtu check in transport layer

2007-03-23 Thread John Heffner
Do the pmtu check at the transport layer (for UDP, ICMP and raw), and send a local error if socket is PMTUDISC_DO and packet is too big. This is actually a pure bugfix for ipv6. For ipv4, it allows us to do pmtu checks in the same way as for ipv6. Signed-off-by: John Heffner [EMAIL PROTECTED

[PATCH 2/3] [NET] Move DF check to ip_forward

2007-03-23 Thread John Heffner
Do fragmentation check in ip_forward, similar to ipv6 forwarding. Also add a debug printk in the DF check in ip_fragment since we should now never reach it. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/ip_forward.c |8 net/ipv4/ip_output.c |2 ++ 2 files changed

[PATCH 3/3] [NET] Add IP(V6)_PMTUDISC_RPOBE

2007-03-23 Thread John Heffner
, like traceroute/tracepath. Signed-off-by: John Heffner [EMAIL PROTECTED] --- include/linux/in.h |1 + include/linux/in6.h |1 + include/linux/skbuff.h |3 ++- include/net/ip.h |2 +- net/core/skbuff.c|2 ++ net/ipv4/ip_output.c | 14

[PATCH 0/2] [iputils] MTU discovery changes

2007-03-23 Thread John Heffner
These add some changes that make tracepath a little more useful for diagnosing MTU issues. The length flag helps distinguish between MTU black holes and other types of black holes by allowing you to vary the probe packet lengths. Using PMTUDISC_PROBE gives you the same results on each run

[PATCH 2/2] [iputils] Use PMTUDISC_PROBE mode if it exists.

2007-03-23 Thread John Heffner
Signed-off-by: John Heffner [EMAIL PROTECTED] --- tracepath.c | 10 -- tracepath6.c | 10 -- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/tracepath.c b/tracepath.c index 1f901ba..a562d88 100644 --- a/tracepath.c +++ b/tracepath.c @@ -24,6 +24,10

Re: [PATCH] tcp_mem initialization

2007-03-15 Thread John Heffner
David Miller wrote: From: John Heffner [EMAIL PROTECTED] Date: Wed, 14 Mar 2007 17:25:22 -0400 The current tcp_mem initialization gives values that are really too small for systems with ~256-768 MB of memory, and also for systems with larger page sizes (ia64). This patch gives an alternate

Re: SWS for rcvbuf < MTU

2007-03-13 Thread John Heffner
just below full_space/2 for a bit then rises above full_space/2. I've also attached a corrected version of my earlier patch that I think solves the problem you noted. Thanks, -John Do full receiver-side SWS avoidance when rcvbuf < mss. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit

Re: SWS for rcvbuf < MTU

2007-03-03 Thread John Heffner
David Miller wrote: From: John Heffner [EMAIL PROTECTED] Date: Fri, 02 Mar 2007 16:16:39 -0500 Please don't apply the patch I sent. I've been thinking about this a bit harder, and it may not fix this particular problem. (Hard to say without knowing exactly what it is.) As the comment above

Re: SWS for rcvbuf < MTU

2007-03-02 Thread John Heffner
. :) I think this attached patch does the correct SWS avoidance. Thanks, -John Do receiver-side SWS avoidance for rcvbuf < MSS. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit 38d33181c93a28cf7fb2f9f3377305a04636c054 tree 503f8a9de6e78694bae9fc2eb1c9dd5d26a0b5ed parent

Re: SWS for rcvbuf < MTU

2007-03-02 Thread John Heffner
David Miller wrote: From: Alex Sidorenko [EMAIL PROTECTED] Date: Fri, 2 Mar 2007 15:21:58 -0500 they told us that they use small rcvbuf to throttle bandwidth for this application. I explained it would be better to use TC for this purpose. They agreed and will probably redesign their

[PATCH 1/3] TCP sysctl documentation: tcp_moderate_rcvbuf

2007-02-26 Thread John Heffner
Document sysctl tcp_moderate_rcvbuf. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit 4c5fd9d3a9ea8b939aed1afda2ac0fc54e3df592 tree c25c2fd01e076fbb7356a8c37d06d2e22c60f263 parent aef8811abbc9249a2bd59bd2331bbe523df05d17 author John Heffner [EMAIL PROTECTED] Mon, 26 Feb 2007 19:44:58

[PATCH 2/3] TCP sysctl documentation: tcp_no_metrics_save

2007-02-26 Thread John Heffner
Document sysctl tcp_no_metrics_save. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit 17cb799000caef3b2fed28cc5d0601bb2311efa8 tree c27ccf561065b145bc48d0b8dbbaa3c608015e03 parent 4c5fd9d3a9ea8b939aed1afda2ac0fc54e3df592 author John Heffner [EMAIL PROTECTED] Mon, 26 Feb 2007 19:51:50

[PATCH 2/3] TCP sysctl documentation: MTU probing

2007-02-26 Thread John Heffner
Documentation for sysctls tcp_mtu_probing and tcp_base_mss. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit 6da0563572e0a6d0abda9d950f30902844c37862 tree 6f21ae02c11a1340412a926e8e2f568f5ed3b5a8 parent 17cb799000caef3b2fed28cc5d0601bb2311efa8 author John Heffner [EMAIL PROTECTED] Mon

Re: [PATCH 2/2][TCP] YeAH-TCP: limited slow start exported function

2007-02-22 Thread John Heffner
My patch is meant as a replacement for YeAH patch 2/2, not meant to back it out. You do still need the second hunk below. Sorry 'bout that. If you're going to apply YeAH patch 2/2 first, you will also need to remove the declaration of tcp_limited_slow_start() in include/net/tcp.h. Thanks,

[PATCH] fix limited slow start bug

2007-02-22 Thread John Heffner
Fix arithmetic order bug in limited slow start. The subtraction needs to be done before snd_cwnd is incremented. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit 244e7411d99443df7b7ae849ba6ebbec4c2342bc tree e6d5985a22448f59f8bef393542e1d5497ee5684 parent
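A simplified model of the ordering issue, in packet units, may help. This is only a sketch under my own assumptions: the struct, the function name and the per-ACK credit scheme are illustrative and are not copied from the kernel source. Below max_ssthresh each ACK earns a full window of credit (ordinary slow start); above it each ACK earns max_ssthresh/2, so the window grows by at most max_ssthresh/2 packets per round trip, in the spirit of RFC 3742.

```c
/* Minimal congestion-window state for the sketch (hypothetical names). */
struct cwnd_state {
    unsigned int snd_cwnd;     /* congestion window, in packets */
    unsigned int snd_cwnd_cnt; /* credit accumulated toward the next increment */
};

/* Process one ACK under limited slow start. */
void limited_slow_start_ack(struct cwnd_state *s, unsigned int max_ssthresh)
{
    if (max_ssthresh > 0 && s->snd_cwnd > max_ssthresh)
        s->snd_cwnd_cnt += max_ssthresh / 2; /* limited growth */
    else
        s->snd_cwnd_cnt += s->snd_cwnd;      /* ordinary slow start */

    while (s->snd_cwnd_cnt >= s->snd_cwnd) {
        /* Subtract the old window first, then grow the window. */
        s->snd_cwnd_cnt -= s->snd_cwnd;
        s->snd_cwnd++;
    }
}
```

If the two statements in the loop are swapped, the subtraction uses the already incremented window and removes one packet of credit too many. With an unsigned counter, as in this model, that can wrap around when the credit exactly equals the old window.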

Re: [PATCH] fix limited slow start bug

2007-02-22 Thread John Heffner
Ilpo Järvinen wrote: BTW, while looking this patch, I noticed that snd_cwnd_clamp is only u16 while snd_cwnd is u32, which seems rather strange since snd_cwnd is being limited by the clamp value here and there?!?! And tcp_highspeed.c is clearly assuming even more than this (but the problem is

Re: [PATCH 2/2][TCP] YeAH-TCP: limited slow start exported function

2007-02-19 Thread John Heffner
control * This is special case used for fallback as well. Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit 97033fa201705e6cfc68ce66f34ede3277c3d645 tree 5df4607728abce93aa05b31015a90f2ce369abff parent

Re: [PATCH 2/2][TCP] YeAH-TCP: limited slow start exported function

2007-02-19 Thread John Heffner
Angelo P. Castellani wrote: John Heffner wrote: Note the patch is compile-tested only! I can do some real testing if you'd like to apply this Dave. The date you read on the patch is due to the fact I've split this patchset into 2 diff files. This isn't compile-tested only, I've used

Re: [patch 3/3] tcp: remove experimental variants from default list

2007-02-13 Thread John Heffner
This isn't really a reply to anyone in particular, but I wanted to touch on a few points. Reno. As Windows decided to go with Compound TCP, why we want to back to 80's algorithm? It's worth noting that Microsoft is not using Compound TCP by default, except in Beta versions so they can get

[PATCH] apply cwnd rules to FIN packets with data

2007-02-05 Thread John Heffner
to FIN packets that contain data. --- commit af319609eee705e0791a1a58c33b216e8d0254bf tree 5a1afcc506e09f5adfd74efb7e0cbbc82ec4d5b0 parent c0d4d573feed199b16094c072e7cb07afb01c598 author John Heffner [EMAIL PROTECTED] Mon, 05 Feb 2007 16:25:46 -0500 committer John Heffner [EMAIL PROTECTED] Mon, 05

Re: [PATCH] apply cwnd rules to FIN packets with data

2007-02-05 Thread John Heffner
David Miller wrote: From: John Heffner [EMAIL PROTECTED] Date: Mon, 05 Feb 2007 16:58:18 -0500 This is especially important with TSO enabled. Currently, it will send a burst of up to 64k at the end of a connection, even when cwnd is much smaller than 64k. This patch still lets out empty FIN

Re: [PATCH] apply cwnd rules to FIN packets with data

2007-02-05 Thread John Heffner
-off-by: John Heffner [EMAIL PROTECTED] --- commit 89de0d8cb75958b0315c076b31a597143e30f7a4 tree 7e9c321e62729c6ef76e3886fe9edf2ac78a680c parent c0d4d573feed199b16094c072e7cb07afb01c598 author John Heffner [EMAIL PROTECTED] Mon, 05 Feb 2007 18:42:31 -0500 committer John Heffner [EMAIL PROTECTED] Mon

Re: [PATCH] apply cwnd rules to FIN packets with data

2007-02-05 Thread John Heffner
Rick Jones wrote: John Heffner wrote: David Miller wrote: However, I can't think of any reason why the cwnd test should not apply. Care to elaborate here? You can view the FIN special case as an off by one error in the CWND test, it's not going to melt the internet. :-) True, it's

Re: [PATCH] fix up sysctl_tcp_mem initialization

2006-11-15 Thread John Heffner
David Miller wrote: However, I wonder if we want to set this differently than the way this patch does it. Depending on how far off the memory size is from a power of two (exactly equal to a power of two is the worst case), and if total memory 128M, it can be substantially less than 3/4.

[PATCH] fix up sysctl_tcp_mem initialization

2006-11-14 Thread John Heffner
systems). Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit d4ef8c8245c0a033622ce9ba9e25d379475254f6 tree 5377b8af0bac3b92161188e7369a84e472b5acb2 parent ea55b7c31b47edf90132baea9a088da3bbe2bb5c author John Heffner [EMAIL PROTECTED] Tue, 14 Nov 2006 14:53:27 -0500 committer John Heffner [EMAIL

Re: 2.6.19-rc1: Volanomark slowdown

2006-11-07 Thread John Heffner
David Miller wrote: If we don't ACK every two segments, stacks which grow the congestion window based upon packet counting will not grow the congestion window properly when they are sending smaller than MSS sized segments. The only stack I know of that does this currently is linux, and in
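The effect described here can be illustrated with a toy model. It is a deliberately simplified sketch, not the behavior of any real stack: it assumes a sender in slow start that adds one segment to cwnd per ACK received (packet counting) and a receiver that sends one ACK per `ack_every` segments. The function name and parameters are hypothetical.

```c
/*
 * Toy slow-start model with packet counting.
 *
 *   cwnd      - starting congestion window, in segments
 *   ack_every - receiver sends one ACK per this many segments
 *   rtts      - number of round trips to simulate
 *
 * Returns the congestion window after `rtts` round trips.
 */
static unsigned int cwnd_after_rtts(unsigned int cwnd, unsigned int ack_every,
                                    unsigned int rtts)
{
    if (ack_every == 0)
        return cwnd;            /* no ACKs at all: window never grows */

    for (unsigned int i = 0; i < rtts; i++) {
        /* ACKs generated by one full window of segments. */
        unsigned int acks = cwnd / ack_every;

        /* Packet counting: one segment of growth per ACK. */
        cwnd += acks;
    }
    return cwnd;
}
```

Starting from 8 segments, three round trips give 64 with an ACK per segment, 27 with an ACK every two segments, and 15 with an ACK every four. A receiver that counts bytes rather than segments behaves like a larger `ack_every` when the sender's segments are smaller than the MSS, which is the slowdown the message refers to.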

Re: 2.6.19-rc1: Volanomark slowdown

2006-11-07 Thread John Heffner
David Miller wrote: From: John Heffner [EMAIL PROTECTED] Date: Tue, 07 Nov 2006 16:50:33 -0500 The only stack I know of that does this currently is linux, and in doing so does not conform to the spec. ;) Sending to a BSD receiver will result in the same behavior, so the right place to fix

[PATCH] don't use highmem in tcp hash size calculation

2006-11-06 Thread John Heffner
This patch removes consideration of high memory when determining TCP hash table sizes. Taking into account high memory results in tcp_mem values that are too large. Signed-off-by: John Heffner [EMAIL PROTECTED] --- commit ea55b7c31b47edf90132baea9a088da3bbe2bb5c tree

Re: [PATCH] tcp: don't allow unfair congestion control to be built without warning

2006-10-27 Thread John Heffner
I think unfair is a difficult word. Unfair to what? It's true that Scalable TCP is unfair to itself in that flows with unequal shares do not converge, but it's not clear what its interactions are with other congestion control algorithms. It's not clear to me that it's significantly more

Re: [RFC] tcp: setsockopt congestion control autoload

2006-10-26 Thread John Heffner
My reservation in doing this would be that as an administrator, I may want to choose exactly what congestion control is available at any given time. The different congestion control algorithms are not necessarily fair to each other. If the modules are autoloaded, I could still enforce this

Re: [RFC] tcp: setsockopt congestion control autoload

2006-10-26 Thread John Heffner
Hagen Paul Pfeifer wrote: * John Heffner | 2006-10-26 13:29:26 [-0400]: My reservation in doing this would be that as an administrator, I may want to choose exactly what congestion control is available at any given time. The different congestion control algorithms are not necessarily fair

Re: [PATCH] Bound TSO defer time (resend)

2006-10-17 Thread John Heffner
David Miller wrote: From: John Heffner [EMAIL PROTECTED] Date: Tue, 17 Oct 2006 00:18:33 -0400 Stephen Hemminger wrote: On Mon, 16 Oct 2006 20:53:20 -0400 (EDT) John Heffner [EMAIL PROTECTED] wrote: This patch limits the amount of time you will defer sending a TSO segment to less than two

[PATCH] Bound TSO defer time (resend)

2006-10-16 Thread John Heffner
-- Forwarded message -- Date: Mon, 16 Oct 2006 15:55:53 -0400 (EDT) From: John Heffner [EMAIL PROTECTED] To: David Miller [EMAIL PROTECTED] Cc: netdev netdev@vger.kernel.org Subject: [PATCH] Bound TSO defer time This patch limits the amount of time you will defer sending a TSO segment

Re: simplify microsecond rtt sampling

2006-09-28 Thread John Heffner
Here is a corrected patch. Signed-off-by: John Heffner [EMAIL PROTECTED] -static u32 tcp_usrtt(const struct sk_buff *skb) +static u32 tcp_usrtt(struct timeval *tv) { - struct timeval tv, now; + struct timeval now; do_gettimeofday(&now); - skb_get_timestamp(skb, &tv

Re: simplify microsecond rtt sampling

2006-09-28 Thread John Heffner
-off-by: John Heffner [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b5521a9..d0f6bd6 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2228,13 +2228,12 @@ static int tcp_tso_acked(struct sock *sk return acked; } -static u32 tcp_usrtt(const

Re: simplify microsecond rtt sampling

2006-09-27 Thread John Heffner
Okay, this patch is junk (never trust compile-tested code). Will send something better soon. -John John Heffner wrote: About commit 2d2abbab63f6726a147ae61ada39bf2c9ee0db9a: It looks like this patch bypassed the enforcement of Karn's algorithm in tcp_ack_no_tstamp() for the purposes

Re: MTU probing bug?

2006-07-25 Thread John Heffner
David Miller wrote: John, have a look at this code in tcp_write_timeout(): mss = min(sysctl_tcp_base_mss, tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)/2); mss = max(mss, 68 - tp->tcp_header_len); That first line looks like it should be
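To see what the two quoted lines compute, here is a stand-alone sketch of the same arithmetic. It reproduces the code as quoted, not whatever correction the truncated message goes on to propose. The function `fallback_mss` and its parameters are hypothetical; `search_low_mss` stands in for the result of `tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low)`.

```c
/* Small integer helpers standing in for the kernel's min()/max() macros. */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/*
 * Hypothetical stand-alone version of the two quoted lines.
 *
 *   base_mss       - configured base MSS (sysctl_tcp_base_mss in the quote)
 *   search_low_mss - MSS corresponding to the lower probe bound
 *   tcp_header_len - TCP header length, including options
 */
static int fallback_mss(int base_mss, int search_low_mss, int tcp_header_len)
{
    /* First line: the smaller of the base MSS and half the low-bound MSS. */
    int mss = min_int(base_mss, search_low_mss / 2);

    /* Second line: never go below 68 bytes minus the TCP header. */
    return max_int(mss, 68 - tcp_header_len);
}
```

For example, with a base MSS of 512 and a 20-byte header, a low-bound MSS of 1460 yields 512, a low-bound MSS of 600 yields 300, and a low-bound MSS of 40 is clamped up to 48.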
