Re: TCP SACK issue, hung connection, tcpdump included

2007-08-02 Thread Ilpo Järvinen
...I dropped lkml, it's useless to bother them with this network-related 
stuff...

On Thu, 2 Aug 2007, Darryl Miles wrote:
> Ilpo Järvinen wrote:
> > On Tue, 31 Jul 2007, Darryl L. Miles wrote:

[...RFC bashing, snip...] 

>  * The older linux kernel, for not being 100% SACK RFC compliant in its
> implementation?  Not a lot we can do about this now, but if we're able
> to identify that there may be backward compatibility issues with the same
> implementation, that's a useful point to take forward.
>
>  * The newer linux kernel for enabling D-SACK by default when RFC2883
> doesn't even claim a cast iron case for D-SACK to be compatible with any
> 100% RFC compliant SACK implementation.

Are you aware that D-SACK processing and generation has been part of the 
linux kernel TCP since long before the 2.6 series even began... ...and it 
goes even further back than that: 2.4.0 had it too (2.2.0 didn't seem to 
have it; I've never read that one before IIRC :-) ).

> Does Ilpo have a particular vested interest in D-SACK that should be 
> disclosed?

Sure :-). ...my interest was to show you that it's not a bug :-).

> So it is necessary to turn off a TCP option (that is enabled by default)
> to be sure to have reliable TCP connections (that don't lock up) in the
> bugfree Linux networking stack ?  This is absurd.

...You'd have to turn a lot off to be compatible with everything around the 
Internet, and still you would probably fail. Some people have to, e.g., 
turn off window scaling to work around buggy intermediate nodes (NAT boxes 
or some firewalls); there's even a sysctl to work around signed 16-bit 
window arithmetic bugs that's mostly legacy, but I bet you can find hosts 
broken in that area too. Etc. Yet we don't turn those off by default.
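For anyone wanting to inspect such knobs, here is a minimal Python sketch that reads sysctls from /proc/sys. It is only an illustration: the base path is a parameter so it can be pointed at a test directory, and the knob names in the demo are the standard Linux ones, whose availability varies by kernel version.

```python
import os

def read_sysctl(name, base="/proc/sys"):
    """Read a sysctl by dotted name (e.g. "net.ipv4.tcp_dsack").
    Returns the value as a stripped string, or None if the knob
    doesn't exist on this kernel."""
    path = os.path.join(base, *name.split("."))
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

if __name__ == "__main__":
    # Knob names are the standard Linux ones; availability varies by version.
    for knob in ("net.ipv4.tcp_dsack",
                 "net.ipv4.tcp_window_scaling",
                 "net.ipv4.tcp_workaround_signed_windows"):
        print(knob, "=", read_sysctl(knob))
```

Writing `1`/`0` into the same paths (as root) toggles the behavior, which is all that `sysctl -w` does under the hood.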

> If such an option causes such a problem; then that option should not be
> enabled by default. 

...Linux TCP has options enabled by default which are _known_ (at least 
nowadays) to cause bad problems, and many of them are _still_ enabled... 
Browse the archives if you don't believe me... And I'm relatively sure it 
will stay that way in the future, though I'm not the maintainer nor 
"anybody" to make such decisions...

> rather than wallpaper over the cracks with the voodoo of turning things 
> that are enabled by default off.

...I said that because it felt like you kept repeating that the generated 
DSACK block is a bug even though, as you now know, it's a feature, not a 
bug. :-)

> > 2) The ACK got discarded by the SERVER
> 
> I'd thought about that one, it's a possibility.  The server in question
> does have periods of high UDP load on a fair number of UDP sockets at
> once (a few 1000).  Both systems have 2Gb of RAM.  The server has maybe
> just 250Mb of RSS of all apps combined.

...There are three independent signs in the log indicating a discard out
of those three reasons. Whereas your theory _fails_ to explain some behavior 
in the log you presented, e.g., the timestamp not being updated, which 
happens even _before_ the DSACK stuff?!?... I'll formulate it as a question: 
why did neither snd_una advance nor the timestamp update even though a 
cumulative ACK arrived?
You can check for yourself (in the server log):

03:58:56.384503
03:58:56.462583
03:58:56.465707
03:58:56.678546

...I'm hoping SNMPs provide explanation to it.

> The client sent a SACK.  But from understanding more about D-SACK, this
> is a valid D-SACK response, but it appears to confuse the older Linux
> kernel at the server end.

...Are you saying that it's confused by _DSACK_ just because that's the 
only "strange" thing you seem to find in the log? I see other things in 
your log which are exceptional and point elsewhere... Please don't neglect 
them... ...Problems occur already before that DSACK is received by the 
server end.

> Agreed on this.  However discarding data is allowed (providing it is
> intentional discarding, not a bug where the TCP stack is discarding segments it
> shouldn't); TCP should recover providing sufficient packets get through.

But if one end decides to discard everything after time t, TCP _will
not_ recover, because "sufficient packets" won't "get through"... And 
that's what your log is telling me. Yes, discarding is allowed, but that 
wasn't the point; we're more interested here in why it got discarded
than in the allowance of discarding.

> Forgive me if I am mistaken, but while the server reports a checksum
> error, the client did not.  I took this to be a misreporting by tcpdump
> at the server, probably due to the "e1000" network card checksum
> offloading

...That's probably the reason these show up, I agree; I thought that 
myself too. Besides, it wouldn't cause that kind of breakage anyway.

> So the SNMP data would show up intentional discards (due to memory/resource
> issues).  So I'll get some of those too.
> 
> The SNMP stats aren't so useful right now as
> the box has been rebooted since then but I shall attempt to capture
> /proc/net/* data, cause the problem, then capture /proc/net/* data again
> if those numbers can help.

Good, thanks. 
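As a rough illustration of the snapshot-and-diff approach being suggested, here is a hedged Python sketch that parses /proc/net/{netstat,snmp}-style text (pairs of a counter-name header line and a value line, each prefixed by the protocol group and a colon) and reports which counters moved between two captures. The counter names used in the demo are merely examples of the format.

```python
def parse_proc_net_stats(text):
    """Parse /proc/net/{netstat,snmp}-style text: lines come in pairs,
    a header line of counter names and a value line, both prefixed by
    the protocol group name and a colon."""
    stats = {}
    lines = [l for l in text.strip().splitlines() if l]
    for names_line, values_line in zip(lines[::2], lines[1::2]):
        proto, names = names_line.split(":", 1)
        _, values = values_line.split(":", 1)
        stats.setdefault(proto, {}).update(
            zip(names.split(), (int(v) for v in values.split())))
    return stats

def diff_stats(before, after):
    """Return only the counters whose values changed between snapshots."""
    changed = {}
    for proto, counters in before.items():
        delta = {name: after[proto][name] - value
                 for name, value in counters.items()
                 if after.get(proto, {}).get(name, value) != value}
        if delta:
            changed[proto] = delta
    return changed

if __name__ == "__main__":
    try:
        with open("/proc/net/netstat") as f:
            snap = parse_proc_net_stats(f.read())
        print(sorted(snap))  # protocol groups present on this kernel
    except OSError:
        pass  # not running on Linux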

Re: TCP SACK issue, hung connection, tcpdump included

2007-08-02 Thread Darryl Miles

Ilpo Järvinen wrote:

On Tue, 31 Jul 2007, Darryl L. Miles wrote:


I've been able to capture a tcpdump from both ends during the problem and its
my belief there is a bug in 2.6.20.1 (at the client side) in that it issues a
SACK option for an old sequence which the current window being advertised is
beyond it.  This is the most concerning issue as the integrity of the sequence
numbers doesn't seem right (to my limited understanding anyhow).


You probably didn't check the reference I explicitly gave to those who 
are not familiar how DSACK works, just in case you didn't pick it up last 
time, here it is again for you: RFC2883... 


I've now squinted at the D-SACK RFC and understand a little about it.
However, the RFC does make the claim: "This extension is compatible with
current implementations of the SACK option in TCP.  That is, if one of
the TCP end-nodes does not implement this D-SACK extension and the other
TCP end-node does, we believe that this use of the D-SACK extension by
one of the end nodes will not introduce problems."

What if it turns out that is not true for a large enough number of SACK
implementations out there, from the timeframe in which SACK was supported
but D-SACK was not?  Would it be possible to clearly categorise an
implementation as:

 * 100% SACK RFC compliant.  SACK works, and by virtue of the mandatory
requirements written into the previous SACK RFCs this
implementation would never see a problem with receiving D-SACK, even
though the stack itself does not support D-SACK.

 * Mostly SACK RFC compliant.  SACK works, but if it saw D-SACK it would
have problems dealing with it, possibly resulting in fatal TCP
lockups.  Are there mandatory SACK implementation requirements in place
that make it possible to clearly draw the line and state that the 2.6.9
SACK implementation was not RFC compliant?

 * 100% SACK and D-SACK RFC compliant.  Such an implementation was 
written to support D-SACK on top of SACK.




So if there is a problem, whose fault would it be:

 * The original SACK RFCs for not specifying a mandatory course of
action to take which D-SACK exploits.  Thus making the claim in RFC2883 
unsound.


 * The older linux kernel, for not being 100% SACK RFC compliant in its
implementation?  Not a lot we can do about this now, but if we're able
to identify that there may be backward compatibility issues with the same
implementation, that's a useful point to take forward.

 * The newer linux kernel for enabling D-SACK by default when RFC2883
doesn't even claim a cast iron case for D-SACK to be compatible with any
100% RFC compliant SACK implementation.


Does TCP support the concept of vendor-dependent options, that is,
TCP options in a special range that would both identify the
vendor and the vendor-specific option id?  Such a system would allow
Linux to implement a  option, even if the RFC claims one is
not needed.  This would allow moving forward through this era until such 
time as it was officially agreed it was just a Linux problem 
or an RFC problem.  If it's an RFC problem then IANA (or whoever) would 
issue a generic TCP option for it.


If the dump on this problem really does identify a risk/problem, then as
it's between two versions of Linux a vendor-specific option also makes sense.

I don't really want to switch new useful stuff off by default (so it
never gets used); I'm all for experimentation, but not to the point of
failure between default configurations of widely distributed versions of 
the kernel.



So that's the technical approaches I can come up with to discuss.  Does
Ilpo have a particular vested interest in D-SACK that should be disclosed?



However, if DSACKs really
bother you still (though it shouldn't :-)), IIRC I also told you how
you're able to turn it off (tcp_dsack sysctl) but I assure you that it's
not a bug but feature called DSACK [RFC2883], there's _absolutely_ nothing
wrong with it, instead, it would be wrong to _not_ send the below snd_una
SACK in this scenario when tcp_dsack set to 1.


So it is necessary to turn off a TCP option (that is enabled by default)
to be sure to have reliable TCP connections (that don't lock up) in the
bugfree Linux networking stack ?  This is absurd.

If such an option causes such a problem; then that option should not be
enabled by default.  If however the problem is because of a bug then let
us continue to try to isolate the cause rather than wallpaper over the
cracks with the voodoo of turning things that are enabled by default off.

It only makes sense to turn options off when there is a 3rd party
involved (or other means beyond your control) which is affecting
function; the case here is that two Linux kernel stacks are affected and
no 3rd party device has been shown to be affecting function.



There is another concern of why the SERVER performed a retransmission in the
first place, when the tcpdump shows the ack covering it has been seen.


There are only three possible reasons to this thing:

Re: TCP SACK issue, hung connection, tcpdump included

2007-08-02 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Thu, 2 Aug 2007 12:23:23 +0300 (EEST)

> ...Seriously, somebody else than me is probably better in suggesting what 
> could cause the discarding at the SERVER in this case. SNMP stuff Dave was 
> asking could help, you can find them from /proc/net/{netstat,snmp}...

That will also tell us if TCP discarded the packet due to
timestamps tests or similar.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP SACK issue, hung connection, tcpdump included

2007-08-02 Thread Ilpo Järvinen
On Tue, 31 Jul 2007, Darryl L. Miles wrote:

> I've been able to capture a tcpdump from both ends during the problem and its
> my belief there is a bug in 2.6.20.1 (at the client side) in that it issues a
> SACK option for an old sequence which the current window being advertised is
> beyond it.  This is the most concerning issue as the integrity of the sequence
> numbers doesn't seem right (to my limited understanding anyhow).

You probably didn't check the reference I explicitly gave to those who 
are not familiar with how DSACK works; just in case you didn't pick it up 
last time, here it is again for you: RFC2883... However, if DSACKs really 
bother you still (though it shouldn't :-)), IIRC I also told you how 
you're able to turn it off (tcp_dsack sysctl) but I assure you that it's 
not a bug but feature called DSACK [RFC2883], there's _absolutely_ nothing 
wrong with it, instead, it would be wrong to _not_ send the below snd_una 
SACK in this scenario when tcp_dsack set to 1.
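The RFC2883 rule at work here can be sketched in a few lines of Python. This is a simplification for illustration only: real TCP sequence numbers wrap modulo 2**32, which this toy comparison ignores. A SACK block is a D-SACK report if it lies at or below the cumulative ACK, or if the first block is contained in the second.

```python
def is_dsack(cum_ack, sack_blocks):
    """Toy RFC2883 classification (ignores sequence-number wraparound).
    sack_blocks is a list of (start, end) pairs as they appear in the
    SACK option; the first block is a D-SACK report if it lies at or
    below the cumulative ACK, or if it is covered by the second block."""
    if not sack_blocks:
        return False
    start, end = sack_blocks[0]
    if end <= cum_ack:          # duplicate data was already cumulatively ACKed
        return True
    if len(sack_blocks) > 1:    # duplicate inside already-SACKed data
        s2, e2 = sack_blocks[1]
        return s2 <= start and end <= e2
    return False
```

Applied to the ACK from this trace, `is_dsack(28176, [(26016, 27464)])` classifies the "strange" below-snd_una block as a legitimate D-SACK report, not a corrupt SACK.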

> There is another concern of why the SERVER performed a retransmission in the
> first place, when the tcpdump shows the ack covering it has been seen.

There are only three possible reasons for this:
1) The ACK didn't reach the SERVER (your logs prove this to not be the 
case)
2) The ACK got discarded by the SERVER
3) The SERVER (not the client) is buggy and sends an incorrect 
retransmission

...So we have just two options remaining...

> I have made available the full dumps at:
> 
> http://darrylmiles.org/snippets/lkml/20070731/

Thanks for these... Based on a quick check, it is rather clear that the 
SERVER is for some reason discarding the packets it's receiving:

04:11:26.833935 IP CLIENT.43726 > SERVER.ssh: P 4239:4287(48) ack 28176 win 501 

04:11:27.132425 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 
2728 
04:11:27.230081 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501 
> This sequence is interesting from the client side:
>
> 03:58:56.419034 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239
> win 2728  # S1
> 03:58:56.419100 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501
>  # C1
> 03:58:56.422019 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 4239
> win 2728  # S2
> 03:58:56.422078 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501
>  # C2
>
> The above 4 packets look as expect to me.  Then we suddenly see a
> retransmission of 26016:27464.
> 
> 03:58:56.731597 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239
> win 2728  # S3

...Look at the timestamp option on the retransmission:
   <... timestamp 16346128 819458864>

...it tells us what really got received by the TCP. The corresponding ACK 
with matching timestamp is, surprise, surprise, this one:

> 03:58:56.340180 IP CLIENT.43726 > SERVER.ssh: . ack 26016 win 501 
> 

...thus the SERVER has _not_ received but discarded the subsequent 
cumulative ACKs!!! Therefore it's retransmitting from 26016 onward but 
never receives any reply as everything seems to get discarded...
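The timestamp reasoning above can be sketched as a toy Python helper: the TSecr echoed on the server's retransmission is the TSval of the newest client segment the server's TCP actually processed, so scanning the client's ACK train for the last matching TSval shows where the server stopped hearing us. It assumes monotonically increasing TSvals and no wraparound; the values fed to it below are illustrative, with only the echoed 819458864 / ack 26016 pairing taken from the trace.

```python
def last_ack_seen_by_server(client_acks, retrans_tsecr):
    """client_acks: [(tsval, ack_no), ...] in the order the client sent
    them. The TSecr echoed on the server's retransmission names the
    newest client segment the server's TCP actually processed, so the
    last entry with tsval <= retrans_tsecr marks where the server
    stopped receiving (assumes monotonic TSvals, no wraparound)."""
    seen = None
    for tsval, ack_no in client_acks:
        if tsval <= retrans_tsecr:
            seen = (tsval, ack_no)
    return seen
```

Feeding it the client's ACK train and the echoed 819458864 points at the `ack 26016` packet, matching the diagnosis that every later cumulative ACK was discarded by the server.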

There was one bad checksum btw:

> 03:58:56.365662 IP (tos 0x10, ttl  64, id 28685, offset 0, flags [DF], 
> proto 6, length: 764) SERVER.ssh > CLIENT.43726: P [bad tcp cksum 6662 
> (->ef2b)!] 617734888:617735600(712) ack 2634113543 win 2728 
> 


> There are some changes in 2.6.22 that appear to affect TCP SACK handling
> does this fix a known issue ?

There is no such "known issue" :-)... This issue has nothing to do with 
TCP SACK handling, since that code _won't_ even be reached... We can 
verify that from the timestamps. But if you still insist that a SACK below 
snd_una is the issue, please set tcp_dsack to 0 on the CLIENT; you will 
not get them after that, and you can be happy as your out-of-window SACK 
"issue" is then fixed :-)... 

...Seriously, somebody other than me is probably better placed to suggest 
what could cause the discarding at the SERVER in this case. The SNMP stuff 
Dave was asking for could help; you can find it in /proc/net/{netstat,snmp}...


-- 
 i.



Re: TCP SACK issue, hung connection, tcpdump included

2007-07-30 Thread Darryl L. Miles


I've been able to capture a tcpdump from both ends during the problem, 
and it's my belief there is a bug in 2.6.20.1 (at the client side) in 
that it issues a SACK option for an old sequence which the currently 
advertised window is already beyond.  This is the most concerning issue, 
as the integrity of the sequence numbers doesn't seem right (to my 
limited understanding anyhow).


There is another concern: why did the SERVER perform a retransmission in 
the first place, when the tcpdump shows the ack covering it has been seen?



I have made available the full dumps at:

http://darrylmiles.org/snippets/lkml/20070731/


There are some changes in 2.6.22 that appear to affect TCP SACK handling; 
does this fix a known issue ?




This sequence is interesting from the client side:

03:58:56.419034 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 
4239 win 2728  # S1
03:58:56.419100 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501 
 # C1
03:58:56.422019 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 
4239 win 2728  # S2
03:58:56.422078 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501 
 # C2


The above 4 packets look as expected to me.  Then we suddenly see a 
retransmission of 26016:27464.


03:58:56.731597 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 
4239 win 2728  # S3


So the client, instead of discarding the retransmission of the duplicate 
segment, issues a SACK.


03:58:56.731637 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501 
> # C3


In response to this the server is confused ???  It responds to 
sack{26016:27464}, but the client is also saying "wnd 28176".  Wouldn't 
the server expect "wnd < 26016" if there were a segment to retransmit ?


03:58:57.322800 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 
4239 win 2728  # S4





Now viewed from the server side:

03:58:56.365655 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 
4239 win 2728  # S1
03:58:56.365662 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 
4239 win 2728  # S2
03:58:56.374633 IP CLIENT.43726 > SERVER.ssh: . ack 24144 win 488 
 # propagation delay
03:58:56.381630 IP CLIENT.43726 > SERVER.ssh: . ack 25592 win 501 
 # propagation delay
03:58:56.384503 IP CLIENT.43726 > SERVER.ssh: . ack 26016 win 501 
 # propagation delay
03:58:56.462583 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501 
 # C1
03:58:56.465707 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501 
 # C2


The above packets are just as expected.

03:58:56.678546 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 
4239 win 2728  # S3


I guess the above packet is indeed a retransmission of "# S1", but why 
was it retransmitted, when we can clearly see "# C1" above acks this 
segment ?  It is not even as if the retransmission escaped before the 
kernel had time to process the ack, as 200ms elapsed.  CONCERN NUMBER TWO


03:58:56.774778 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501 
> # C3


CONCERN NUMBER ONE: why, in response to that escaped retransmission, was 
a SACK the appropriate response ?  At the time the client sent the 
SACK it had received all data up to 28176, a fact it continues to 
advertise in the "# C3" packet above.


There is nothing wrong with the CLIENT expecting to see a retransmission 
of that segment at this point in time; that is an expected circumstance.


03:58:57.269529 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 
4239 win 2728  # S4






Darryl


Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread Ilpo Järvinen
On Sun, 29 Jul 2007, Willy Tarreau wrote:

> On Sun, Jul 29, 2007 at 12:28:04PM +0300, Ilpo Järvinen wrote:
> 
> > [...snip...]
> > 
> > > BTW, some information are missing. It would have been better if the trace
> > > had been read with tcpdump -Svv. We would have got seq numbers and ttl.
> > > Also, we do not know if there's a firewall between both sides. Sometimes,
> > > some IDS identify attacks in crypted traffic and kill connections. It
> > > might have been the case here, with the connection closed one way on an
> > > intermediate firewall.
> > 
> > Yeah, firewall or some other issue, I'd say it's quite unlikely a bug in 
> > TCP because behavior to both directions indicate client -> sender 
> > blackhole independently of each other...
> 
> It would also be possible that something stupid between both ends simply
> drops packets with the SACK option set. Also something which sometimes
> happen is that some firewalls automatically translate sequence numbers
> but not necessarily SACK values, which could pretty well lead to this
> packet being received but ignored on the server side.

...One can toggle those off with /proc/sys/net/ipv4/tcp_dsack, but I 
suspect DSACKs are not the cause, because these retransmissions aren't
making it through either (there are many of them earlier in the log too; 
I just quoted the easiest ones :-) ):

> > > > 09:36:44.335591 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 
> > > > win 
> > > > 378 
> > > > 09:38:44.351950 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 
> > > > win 
> > > > 378 
> > > > 09:40:44.368172 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 
> > > > win 
> > > > 378 

...there are no SACKs involved in them, yet no cumulative ACK ever 
arrives from SERVER...


-- 
 i.

Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread Willy Tarreau
On Sun, Jul 29, 2007 at 12:28:04PM +0300, Ilpo Järvinen wrote:
(...)
> > > Limitation for 48 byte segments? You have to be kidding... :-) But yes,
> > > it seems that one of the directions is dropping packets for some reason 
> > > though I would not assume MTU limitation... Or did you mean some other 
> > > segment?
> > 
> > No, I was talking about the 1448 bytes segments. But in fact I don't
> > believe it much because the SACKs are always retransmitted just afterwards.
> 
> Ah, but it's ACKed correctly right below it...:
> 
> [...snip...]
> > > > > 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 
> > > > > 2991 
> > > > > win 2728 
> > > > > 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > > > > 
> > > > > 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 
> > > > > 2991 
> > > > > win 2728 
> 
> ...segment below snd_una arrived => snd_una remains 18464, receiver 
> generates a duplicate ACK:
>  
> > > > > 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > > > > <... {12408:13856}>
> 
> The cumulative ACK field of it covers _everything_ below 18464 (i.e., it 
> ACKs them), including the 1448 bytes in 12408:13856... In addition, the 
> SACK block is DSACK information [RFC2883] telling explicitly the address 
> of the received duplicate block. However, if this ACK doesn't reach the 
> SERVER TCP, RTO is triggered and the first not yet cumulatively ACKed 
> segment is retransmitted (I guess cumulative ACKs up to 12408 arrived 
> without problems to the SERVER):

Oh yes, you're damn right. I did not notice that the ACK was higher than
the SACK; I'm more used to reading traces with absolute rather than
relative seq/acks.

So I agree, it is this ACK which is lost between client and server,
reinforcing the supposition about the location of the capture (client side).

> [...snip...]
> 
> > BTW, some information are missing. It would have been better if the trace
> > had been read with tcpdump -Svv. We would have got seq numbers and ttl.
> > Also, we do not know if there's a firewall between both sides. Sometimes,
> > some IDS identify attacks in crypted traffic and kill connections. It
> > might have been the case here, with the connection closed one way on an
> > intermediate firewall.
> 
> Yeah, firewall or some other issue, I'd say it's quite unlikely a bug in 
> TCP because behavior to both directions indicate client -> sender 
> blackhole independently of each other...

It would also be possible that something stupid between both ends simply
drops packets with the SACK option set. Also something which sometimes
happen is that some firewalls automatically translate sequence numbers
but not necessarily SACK values, which could pretty well lead to this
packet being received but ignored on the server side.

I'm pretty sure that the same trace taken on the server side will reveal
the reason for the problem.

Willy



Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread Ilpo Järvinen
On Sun, 29 Jul 2007, Willy Tarreau wrote:

> On Sun, Jul 29, 2007 at 11:26:00AM +0300, Ilpo Järvinen wrote:
> > On Sun, 29 Jul 2007, Willy Tarreau wrote:
> > 
> > > On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> > > > CLIENT = Linux 2.6.20.1-smp [Customer build]
> > > > SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4 
> > > > (Nahant Update 5)]
> > > > 
> > > > The problems start around time index 09:21:39.860302 when the CLIENT 
> > > > issues 
> > > > a TCP packet with SACK option set (seemingly for a data segment which 
> > > > has 
> > > > already been seen) from that point on the connection hangs.
> > 
> > ...That's DSACK and it's being correctly sent. To me, it seems unlikely to 
> > be the cause for this breakage...
> > 
> > > Where was the capture taken ? on CLIENT or on SERVER (I suspect client 
> > > from
> > > the timers) ? 
> > 
> > ...I would guess the same based on SYN timestamps (and from the DSACK 
> > timestamps)...
> > 
> > > A possible, but very unlikely reason would be an MTU limitation
> > > somewhere, because the segment which never gets correctly ACKed is also 
> > > the
> > > largest one in this trace.
> > 
> > Limitation for 48 byte segments? You have to be kidding... :-) But yes,
> > it seems that one of the directions is dropping packets for some reason 
> > though I would not assume MTU limitation... Or did you mean some other 
> > segment?
> 
> No, I was talking about the 1448 bytes segments. But in fact I don't
> believe it much because the SACKs are always retransmitted just afterwards.

Ah, but it's ACKed correctly right below it...:

[...snip...]
> > > > 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 
> > > > 2991 
> > > > win 2728 
> > > > 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > > > 
> > > > 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 
> > > > 2991 
> > > > win 2728 

...segment below snd_una arrived => snd_una remains 18464, receiver 
generates a duplicate ACK:
 
> > > > 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > > > 

The cumulative ACK field of it covers _everything_ below 18464 (i.e., it 
ACKs them), including the 1448 bytes in 12408:13856... In addition, the 
SACK block is DSACK information [RFC2883] telling explicitly the address 
of the received duplicate block. However, if this ACK doesn't reach the 
SERVER TCP, RTO is triggered and the first not yet cumulatively ACKed 
segment is retransmitted (I guess cumulative ACKs up to 12408 arrived 
without problems to the SERVER):

> > > > 09:21:40.453440 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 
> > > > 2991 
> > > > win 2728 
[...snip...]

> BTW, some information are missing. It would have been better if the trace
> had been read with tcpdump -Svv. We would have got seq numbers and ttl.
> Also, we do not know if there's a firewall between both sides. Sometimes,
> some IDS identify attacks in crypted traffic and kill connections. It
> might have been the case here, with the connection closed one way on an
> intermediate firewall.

Yeah, firewall or some other issue, I'd say it's quite unlikely a bug in 
TCP because behavior to both directions indicate client -> sender 
blackhole independently of each other...


-- 
 i.

Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Sun, 29 Jul 2007 11:26:00 +0300 (EEST)

> Is this reproducible? Can you somehow verify that the packets CLIENT is 
> sending are indeed received by the SERVER...?

One possibility is drops due to checksum errors on the receiver, this
tends to pop up from time to time, let's see some SNMP statistics.


Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread Willy Tarreau
On Sun, Jul 29, 2007 at 11:26:00AM +0300, Ilpo Järvinen wrote:
> On Sun, 29 Jul 2007, Willy Tarreau wrote:
> 
> > On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> > > CLIENT = Linux 2.6.20.1-smp [Customer build]
> > > SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4 
> > > (Nahant Update 5)]
> > > 
> > > The problems start around time index 09:21:39.860302 when the CLIENT 
> > > issues 
> > > a TCP packet with SACK option set (seemingly for a data segment which has 
> > > already been seen) from that point on the connection hangs.
> 
> ...That's DSACK and it's being correctly sent. To me, it seems unlikely to 
> be the cause for this breakage...
> 
> > Where was the capture taken ? on CLIENT or on SERVER (I suspect client from
> > the timers) ? 
> 
> ...I would guess the same based on SYN timestamps (and from the DSACK 
> timestamps)...
> 
> > A possible, but very unlikely reason would be an MTU limitation
> > somewhere, because the segment which never gets correctly ACKed is also the
> > largest one in this trace.
> 
> Limitation for 48 byte segments? You have to be kidding... :-) But yes,
> it seems that one of the directions is dropping packets for some reason 
> though I would not assume MTU limitation... Or did you mean some other 
> segment?

No, I was talking about the 1448 bytes segments. But in fact I don't
believe it much because the SACKs are always retransmitted just afterwards.

BTW, some information are missing. It would have been better if the trace
had been read with tcpdump -Svv. We would have got seq numbers and ttl.
Also, we do not know if there's a firewall between both sides. Sometimes,
some IDS identify attacks in crypted traffic and kill connections. It
might have been the case here, with the connection closed one way on an
intermediate firewall.

Regards,
Willy



Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread Jan Engelhardt

On Jul 29 2007 08:45, Willy Tarreau wrote:
>On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
>> CLIENT = Linux 2.6.20.1-smp [Customer build]
>> SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4 
>> (Nahant Update 5)]
>> 
>> The problems start around time index 09:21:39.860302 when the CLIENT issues 
>> a TCP packet with SACK option set (seemingly for a data segment which has 
>> already been seen) from that point on the connection hangs.
>
>Where was the capture taken ? on CLIENT or on SERVER (I suspect client from
>the timers) ? A possible, but very unlikely reason would be an MTU limitation
>somewhere, because the segment which never gets correctly ACKed is also the
>largest one in this trace.[...]

I had something similar (and most strangely, it only seemed to happen
when the source and/or destination address was outside a given subnet). It
boiled down to some switch acting really strangely, God knows why. A
tcpdump on *both* sides (server+client) each showed that the *other* peer
sent a FIN/RST. Well, in the end I settled on setting net.ipv4.tcp_sack=0;
no idea if the broken switch got replaced in the meantime.
Just letting you know that it is not limited to the computers themselves.


Jan
-- 


Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread Ilpo Järvinen
On Sun, 29 Jul 2007, Willy Tarreau wrote:

> On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> > CLIENT = Linux 2.6.20.1-smp [Customer build]
> > SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4 
> > (Nahant Update 5)]
> > 
> > The problems start around time index 09:21:39.860302 when the CLIENT issues 
> > a TCP packet with SACK option set (seemingly for a data segment which has 
> > already been seen) from that point on the connection hangs.

...That's DSACK and it's being correctly sent. To me, it seems unlikely to 
be the cause for this breakage...
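
To illustrate the point: the duplicate ACKs in the trace carry a D-SACK
block per the RFC 2883 receiver rule. A rough sketch of that rule follows,
in Python rather than kernel code, with made-up function names:

```python
# Sketch (not kernel code): the RFC 2883 receiver-side rule that produces
# the D-SACK seen in the trace. When an arriving segment lies entirely
# below rcv_nxt (i.e. it was already received and acked), the first SACK
# block in the outgoing ACK reports exactly the duplicate range.

def dsack_block(seg_start, seg_end, rcv_nxt):
    """Return the (start, end) D-SACK block to place first in the ACK,
    or None if the segment is not a duplicate of already-acked data."""
    if seg_end <= rcv_nxt:  # whole segment below rcv_nxt: a duplicate
        return (seg_start, seg_end)
    return None

# The server keeps retransmitting 12408:13856 while the client has acked
# up to 18464, so the client answers "ack 18464" with D-SACK {12408:13856}.
print(dsack_block(12408, 13856, 18464))   # -> (12408, 13856)
print(dsack_block(18464, 18512, 18464))   # -> None (new data, no D-SACK)
```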

> Where was the capture taken ? on CLIENT or on SERVER (I suspect client from
> the timers) ? 

...I would guess the same based on SYN timestamps (and from the DSACK 
timestamps)...

> A possible, but very unlikely reason would be an MTU limitation
> somewhere, because the segment which never gets correctly ACKed is also the
> largest one in this trace.

Limitation for 48-byte segments? You have to be kidding... :-) But yes,
it does seem that one of the directions is dropping packets for some
reason, though I would not assume an MTU limitation... Or did you mean
some other segment?

> Also, if everything is right with the packets on the server side, then
> it would means that it's the RHEL kernel which is buggy (which does not
> mean that the same bug does not exist in mainline).

...There are two independent signs that the server side is not getting the
packets (the ack field is not advanced, nor do the retransmissions of old
data cease), so IMHO that would be the first thing to rule out.
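
The stall pattern (the same data range retransmitted over and over while
the peer's ack stands still) can also be spotted mechanically. A rough
sketch that counts repeated data ranges in a tcpdump text capture; the
regex assumes the old-style tcpdump output format quoted in this thread:

```python
import re
from collections import Counter

# Match "> DST: <flags> START:END(LEN)" in old-style tcpdump TCP output.
SEG = re.compile(r'> (\S+): \S+ (\d+:\d+\(\d+\))')

def retransmit_counts(lines):
    """Map (destination, data-range) -> number of times that exact
    range was sent. A count > 1 means the range was retransmitted."""
    counts = Counter()
    for line in lines:
        m = SEG.search(line)
        if m:
            counts[m.groups()] += 1
    return counts

# Sample lines in the same format as the trace in this thread:
trace = [
    "09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728",
    "09:21:40.453440 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728",
    "09:21:41.641821 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728",
]
print(retransmit_counts(trace))
# -> Counter({('CLIENT.50727', '12408:13856(1448)'): 3})
```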

> > Full dump available via email.
> > 
> > 
> > 09:12:13.444999 IP CLIENT.50727 > SERVER.ssh: S 2919512080:2919512080(0) 
> > win 5840 
> > 09:12:13.535643 IP SERVER.ssh > CLIENT.50727: S 492909547:492909547(0) ack 
> > 2919512081 win 5792  > 2>
> > 09:12:13.535717 IP CLIENT.50727 > SERVER.ssh: . ack 1 win 46 
> > 
> > 09:12:13.665481 IP SERVER.ssh > CLIENT.50727: P 1:24(23) ack 1 win 1448 
> > 
> > 09:12:13.665895 IP CLIENT.50727 > SERVER.ssh: . ack 24 win 46 
> > 
> > 09:12:13.666044 IP CLIENT.50727 > SERVER.ssh: P 1:23(22) ack 24 win 46 
> > 
> > 09:12:13.776318 IP SERVER.ssh > CLIENT.50727: . ack 23 win 1448 
> > 
> > 
> > ...SNIPPED SUCCESSFUL TRAFFIC...
> > 
> > 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 2991 
> > win 2728 
> > 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:21:40.453440 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:21:40.453495 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:21:41.641821 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:21:41.641938 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:21:44.017552 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:21:44.017622 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:21:48.770051 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:21:48.770099 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:21:58.274061 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:21:58.274180 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:22:17.282035 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:22:17.282153 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:22:55.298955 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:22:55.299023 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:24:11.329853 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:24:11.329923 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:26:11.331578 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> > win 2728 
> > 09:26:11.331699 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> > 
> > 09:26:13.011885 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:26:13.309032 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:26:13.901048 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:26:15.085103 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:26:17.453195 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:26:22.189378 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:26:31.661696 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:26:50.610374 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:27:28.499729 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> > 378 
> > 09:28:11.331256 IP SERVER.ssh > CLIENT.50727: . 12408:

Re: TCP SACK issue, hung connection, tcpdump included

2007-07-29 Thread Willy Tarreau
Hi,

[CCing netdev as it's where people competent on the subject are]

On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> CLIENT = Linux 2.6.20.1-smp [Customer build]
> SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4 
> (Nahant Update 5)]
> 
> The problems start around time index 09:21:39.860302 when the CLIENT issues 
> a TCP packet with SACK option set (seemingly for a data segment which has 
> already been seen) from that point on the connection hangs.

Where was the capture taken ? on CLIENT or on SERVER (I suspect client from
the timers) ? A possible, but very unlikely reason would be an MTU limitation
somewhere, because the segment which never gets correctly ACKed is also the
largest one in this trace. It would be strange, as the SACKs are sent
immediately after, but it is something to consider anyway.

Also, if everything is right with the packets on the server side, then
it would mean that it's the RHEL kernel which is buggy (which does not
mean that the same bug does not exist in mainline).

Regards,
Willy


> Full dump available via email.
> 
> 
> 09:12:13.444999 IP CLIENT.50727 > SERVER.ssh: S 2919512080:2919512080(0) 
> win 5840 
> 09:12:13.535643 IP SERVER.ssh > CLIENT.50727: S 492909547:492909547(0) ack 
> 2919512081 win 5792  2>
> 09:12:13.535717 IP CLIENT.50727 > SERVER.ssh: . ack 1 win 46 
> 
> 09:12:13.665481 IP SERVER.ssh > CLIENT.50727: P 1:24(23) ack 1 win 1448 
> 
> 09:12:13.665895 IP CLIENT.50727 > SERVER.ssh: . ack 24 win 46 
> 
> 09:12:13.666044 IP CLIENT.50727 > SERVER.ssh: P 1:23(22) ack 24 win 46 
> 
> 09:12:13.776318 IP SERVER.ssh > CLIENT.50727: . ack 23 win 1448 
> 
> 
> ...SNIPPED SUCCESSFUL TRAFFIC...
> 
> 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 2991 
> win 2728 
> 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:21:40.453440 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:21:40.453495 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:21:41.641821 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:21:41.641938 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:21:44.017552 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:21:44.017622 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:21:48.770051 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:21:48.770099 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:21:58.274061 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:21:58.274180 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:22:17.282035 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:22:17.282153 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:22:55.298955 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:22:55.299023 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:24:11.329853 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:24:11.329923 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:26:11.331578 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:26:11.331699 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:26:13.011885 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:26:13.309032 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:26:13.901048 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:26:15.085103 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:26:17.453195 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:26:22.189378 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:26:31.661696 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:26:50.610374 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:27:28.499729 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:28:11.331256 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:28:11.331354 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:28:44.286495 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:30:11.330959 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:30:11.331074 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:30:44.298756 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 
> 378 
> 09:32:11.331661 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 
> win 2728 
> 09:32:11.331727 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378 
> 
> 09:32:44.311051 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win