Re: TCP SACK issue, hung connection, tcpdump included
...I dropped lkml, it's useless to bother them with this network-related stuff...

On Thu, 2 Aug 2007, Darryl Miles wrote:
> Ilpo Järvinen wrote:
> > On Tue, 31 Jul 2007, Darryl L. Miles wrote:

[...RFC bashing, snip...]

> * The older linux kernel for not being 100% SACK RFC compliant in its
> implementation? Not a lot we can do about this now, but if we're able
> to identify there may be backward compatibility issues with the same
> implementation, that's a useful point to take forward.
>
> * The newer linux kernel for enabling D-SACK by default when RFC2883
> doesn't even claim a cast-iron case for D-SACK to be compatible with any
> 100% RFC compliant SACK implementation.

Are you aware that D-SACK processing and generation have been part of Linux
kernel TCP since long before the 2.6 series even began... ...and it goes far
beyond that: 2.4.0 had it too (2.2.0 didn't seem to have it; I don't recall
ever reading that one before, IIRC :-)).

> Does Ilpo have a particular vested interest in D-SACK that should be
> disclosed?

Sure :-). ...My interest was to show you that it's not a bug :-).

> So it is necessary to turn off a TCP option (that is enabled by default)
> to be sure to have reliable TCP connections (that don't lock up) in the
> bugfree Linux networking stack? This is absurd.

...You'd have to turn a lot off to be compatible with everything around the
Internet, and still you would probably fail. Some people have to, e.g., turn
off window scaling to work around buggy intermediate nodes (NAT boxes or
some firewalls); there's even a sysctl to work around signed 16-bit window
arithmetic bugs that is mostly legacy, but I bet you can still find hosts
broken in that area too. Etc. Yet we don't turn those off by default.

> If such an option causes such a problem; then that option should not be
> enabled by default.

...Linux TCP has enabled-by-default options which are _known_ (at least
nowadays) to cause bad problems, and many of them are _still_ enabled...
Browse the archives if you don't believe me...
And I'm relatively sure it will do so in the future too, though I'm not the
maintainer nor "anybody" to make such decisions...

> rather than wallpaper over the cracks with the voodoo of turning things
> that are enabled by default off.

...I said that because it felt like you kept repeating that the generated
DSACK block is a bug even though, as you now know, it's a feature, not a
bug. :-)

> > 2) The ACK got discarded by the SERVER
>
> I'd thought about that one, it's a possibility. The server in question
> does have periods of high UDP load on a fair number of UDP sockets at
> once (a few 1000). Both systems have 2Gb of RAM. The server has maybe
> just 250Mb of RSS of all apps combined.

...There are three independent signs in the log to indicate discard out of
these 3 reasons. Whereas your theory _fails_ to explain some behavior in
the log you presented, e.g., the timestamp not being updated, which happens
even _before_ the DSACK stuff?!?... I'll formulate this question: why did
neither snd_una advance nor the timestamp update even though a cumulative
ACK arrived? You can check for yourself (in the server log):

  03:58:56.384503
  03:58:56.462583
  03:58:56.465707
  03:58:56.678546

...I'm hoping the SNMP counters provide an explanation for it.

> The client sent a SACK. But from understanding more about D-SACK, this
> is a valid D-SACK response, but it appears to confuse the older Linux
> kernel at the server end.

...Are you saying that it's confused by _DSACK_ just because that's the
only "strange" thing you seem to find in the log? I see other things in
your log which are exceptional and point elsewhere... Please don't neglect
them... ...Problems occur already before that DSACK is received by the
server end.

> Agreed on this. However discarding data is allowed (providing it is
> intentional discarding, not a bug where the TCP stack is discarding
> segments it shouldn't), TCP should recover providing sufficient packets
> get through.
But if one end decides to discard everything after time t, TCP _will not_
recover because "sufficient packets" won't "get through"... And that's what
your log is telling me. Yes, discarding is allowed, but that wasn't the
point; we're more interested here in why it got discarded rather than the
allowance of discarding.

> Forgive me if I am mistaken, but while the server reports a checksum
> error, the client did not. I took this to be a misreporting by tcpdump
> at the server, probably due to the "e1000" network card checksum
> offloading

...That's probably the reason these show up, I agree; I thought that myself
too. Besides, it wouldn't cause that kind of breakage anyway.

> So the SNMP data would show up intentional discards (due to
> memory/resource issues). So I'll get some of those too.
>
> The SNMP stats aren't so useful right now as the box has been rebooted
> since then, but I shall attempt to capture /proc/net/* data, cause the
> problem, then capture /proc/net/* data again if those numbers can help.

Good, thanks.
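The capture/reproduce/capture procedure agreed above can be sketched in a few
lines. This is an illustrative sketch, not code from the thread; the function
names are invented, and it only assumes the standard two-line
`Proto: names` / `Proto: values` layout shared by /proc/net/snmp and
/proc/net/netstat.

```python
# Illustrative sketch (not from the thread): snapshot /proc/net/{snmp,netstat}
# before and after reproducing the hang, then diff the counters to spot the
# discard reason. parse_proc_net() and diff_counters() are hypothetical names.

def parse_proc_net(text):
    """Parse the 'Proto: names / Proto: values' line pairs into a dict."""
    stats = {}
    lines = text.strip().splitlines()
    for names_line, values_line in zip(lines[::2], lines[1::2]):
        proto, names = names_line.split(":", 1)
        _, values = values_line.split(":", 1)
        stats.setdefault(proto, {}).update(
            zip(names.split(), (int(v) for v in values.split())))
    return stats

def diff_counters(before, after):
    """Return only the counters whose value changed between two snapshots."""
    out = {}
    for proto, counters in after.items():
        for name, value in counters.items():
            delta = value - before.get(proto, {}).get(name, 0)
            if delta:
                out[proto + "." + name] = delta
    return out
```

Reading both files before causing the problem, again afterwards, and printing
`diff_counters(before, after)` would make a rising discard counter (e.g. a
timestamp/PAWS or error counter) stand out immediately.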
Re: TCP SACK issue, hung connection, tcpdump included
Ilpo Järvinen wrote:
> On Tue, 31 Jul 2007, Darryl L. Miles wrote:
> > I've been able to capture a tcpdump from both ends during the problem
> > and it's my belief there is a bug in 2.6.20.1 (at the client side) in
> > that it issues a SACK option for an old sequence which the current
> > window being advertised is beyond. This is the most concerning issue
> > as the integrity of the sequence numbers doesn't seem right (to my
> > limited understanding anyhow).
>
> You probably didn't check the reference I explicitly gave to those who
> are not familiar with how DSACK works; just in case you didn't pick it
> up last time, here it is again for you: RFC2883...

I've now squinted at the D-SACK RFC and understand a little about this.
However, the RFC does make the claim: "This extension is compatible with
current implementations of the SACK option in TCP. That is, if one of the
TCP end-nodes does not implement this D-SACK extension and the other TCP
end-node does, we believe that this use of the D-SACK extension by one of
the end nodes will not introduce problems."

What if it turns out that is not true for a large enough number of SACK
implementations out there, from the timeframe when SACK was supported but
D-SACK was not? Would it be possible to clearly categorise an
implementation as:

* 100% SACK RFC compliant. SACK works, and by virtue of the mandatory
requirements written into the previous SACK RFCs, this implementation would
never see a problem with receiving D-SACK even though the stack itself does
not support D-SACK.

* Mostly SACK RFC compliant. SACK works, but if it saw D-SACK it would have
problems dealing with it, possibly resulting in fatal TCP lockups. Are
there mandatory SACK implementation requirements in place to be able to
clearly draw the line and state that the 2.6.9 SACK implementation was not
RFC compliant?

* 100% SACK and D-SACK RFC compliant. Such an implementation was written to
support D-SACK on top of SACK.
So if there is a problem, whose fault would it be:

* The original SACK RFCs, for not specifying a mandatory course of action
which D-SACK exploits. Thus making the claim in RFC2883 unsound.

* The older linux kernel, for not being 100% SACK RFC compliant in its
implementation? Not a lot we can do about this now, but if we're able to
identify that there may be backward compatibility issues with the same
implementation, that's a useful point to take forward.

* The newer linux kernel, for enabling D-SACK by default when RFC2883
doesn't even claim a cast-iron case for D-SACK to be compatible with any
100% RFC compliant SACK implementation.

Does TCP support the concept of vendor-dependent options, that is, TCP
options in a special range that would identify both the vendor and the
vendor-specific option id? Such a system would allow Linux to implement an
option even if the RFC claims one is not needed. This would allow moving
forward through this era until such point in time when it was officially
agreed it was just a linux problem or an RFC problem. If it's an RFC
problem then IANA (or whoever) would issue a generic TCP option for it. If
the dump on this problem really does identify a risk/problem, then as it's
between 2 versions of linux, a vendor-specific option also makes sense.

I don't really want to switch new useful stuff off by default (so it never
gets used); I'm all for experimentation, but not to the point of failure
between default configurations of widely distributed versions of the
kernel.

So those are the technical approaches I can come up with to discuss. Does
Ilpo have a particular vested interest in D-SACK that should be disclosed?
> However, if DSACKs really bother you still (though they shouldn't :-)),
> IIRC I also told you how you're able to turn them off (tcp_dsack sysctl),
> but I assure you that it's not a bug but a feature called DSACK
> [RFC2883]; there's _absolutely_ nothing wrong with it. Instead, it would
> be wrong to _not_ send the below-snd_una SACK in this scenario when
> tcp_dsack is set to 1.

So it is necessary to turn off a TCP option (that is enabled by default) to
be sure to have reliable TCP connections (that don't lock up) in the
bugfree Linux networking stack? This is absurd.

If such an option causes such a problem, then that option should not be
enabled by default. If however the problem is because of a bug, then let us
continue to try to isolate the cause rather than wallpaper over the cracks
with the voodoo of turning things that are enabled by default off. It only
makes sense to turn options off when there is a 3rd party involved (or
other means beyond your control) which is affecting function; the case here
is that two Linux kernel stacks are affected and no 3rd party device has
been shown to be affecting function.

> > There is another concern of why the SERVER performed a retransmission
> > in the first place, when the tcpdump shows the ack covering it has
> > been seen.
>
> There are only three possible reasons to this thing:
Re: TCP SACK issue, hung connection, tcpdump included
From: "Ilpo Järvinen" <[EMAIL PROTECTED]>
Date: Thu, 2 Aug 2007 12:23:23 +0300 (EEST)

> ...Seriously, somebody else than me is probably better at suggesting what
> could cause the discarding at the SERVER in this case. The SNMP stuff
> Dave was asking for could help; you can find it in
> /proc/net/{netstat,snmp}...

That will also tell us if TCP discarded the packet due to timestamp tests
or similar.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in the
body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP SACK issue, hung connection, tcpdump included
On Tue, 31 Jul 2007, Darryl L. Miles wrote:
> I've been able to capture a tcpdump from both ends during the problem
> and it's my belief there is a bug in 2.6.20.1 (at the client side) in
> that it issues a SACK option for an old sequence which the current
> window being advertised is beyond. This is the most concerning issue as
> the integrity of the sequence numbers doesn't seem right (to my limited
> understanding anyhow).

You probably didn't check the reference I explicitly gave to those who are
not familiar with how DSACK works; just in case you didn't pick it up last
time, here it is again for you: RFC2883...

However, if DSACKs really bother you still (though they shouldn't :-)),
IIRC I also told you how you're able to turn them off (tcp_dsack sysctl),
but I assure you that it's not a bug but a feature called DSACK [RFC2883];
there's _absolutely_ nothing wrong with it. Instead, it would be wrong to
_not_ send the below-snd_una SACK in this scenario when tcp_dsack is set
to 1.

> There is another concern of why the SERVER performed a retransmission in
> the first place, when the tcpdump shows the ack covering it has been
> seen.

There are only three possible reasons for this:

1) The ACK didn't reach the SERVER (your logs prove this not to be the
   case)
2) The ACK got discarded by the SERVER
3) The SERVER (not the client) is buggy and sends an incorrect
   retransmission

...So we have just two options remaining...

> I have made available the full dumps at:
>
> http://darrylmiles.org/snippets/lkml/20070731/

Thanks for these... Based on a quick check, it is rather clear that the
SERVER is for some reason discarding the packets it's receiving:

04:11:26.833935 IP CLIENT.43726 > SERVER.ssh: P 4239:4287(48) ack 28176 win 501
04:11:27.132425 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728
04:11:27.230081 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501

This sequence is interesting from the client side:

> 03:58:56.419034 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S1
> 03:58:56.419100 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501  # C1
> 03:58:56.422019 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 4239 win 2728  # S2
> 03:58:56.422078 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C2
>
> The above 4 packets look as expected to me. Then we suddenly see a
> retransmission of 26016:27464.
>
> 03:58:56.731597 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S3

...Look at this on the retransmission:

  ...timestamp 16346128 819458864...

...it tells us what really got received by that TCP. The corresponding ACK
with a matching timestamp is, surprise, surprise, this one:

> 03:58:56.340180 IP CLIENT.43726 > SERVER.ssh: . ack 26016 win 501

...thus the SERVER has _not_ received but discarded the subsequent
cumulative ACKs!!! Therefore it's retransmitting from 26016 onward but
never receives any reply, as everything seems to get discarded... There
was one bad checksum btw:

> 03:58:56.365662 IP (tos 0x10, ttl 64, id 28685, offset 0, flags [DF],
> proto 6, length: 764) SERVER.ssh > CLIENT.43726: P [bad tcp cksum 6662
> (->ef2b)!] 617734888:617735600(712) ack 2634113543 win 2728

> There are some changes in 2.6.22 that appear to affect TCP SACK
> handling; does this fix a known issue?

There is no such "known issue" :-)... This issue has nothing to do with TCP
SACK handling, since that code _won't_ be reached... We could verify that
from the timestamps. But if you still insist that a SACK below snd_una is
the issue, please turn tcp_dsack to 0 on the CLIENT; you will not get them
after that, and you can be happy as your out-of-window SACK "issue" is then
fixed :-)...

...Seriously, somebody else than me is probably better at suggesting what
could cause the discarding at the SERVER in this case. The SNMP stuff Dave
was asking for could help; you can find it in /proc/net/{netstat,snmp}...

--
 i.
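The timestamp reasoning above can be mechanised. The sketch below is
illustrative (the helper name and the TSval values for the later ACKs are
invented; only 819458864 appears in the log): per RFC 1323 the TSecr a
sender echoes is the TSval of the most recent segment its TCP actually
accepted, so matching the retransmission's TSecr against the client's
outgoing ACKs shows exactly where the server stopped receiving.

```python
# Hypothetical helper: given the client's outgoing ACKs in send order as
# (time, ack_no, tsval) tuples and the TSecr echoed in the server's
# retransmission, find the last ACK the server processed; ACKs sent later
# than that one must have been discarded by the server.

def last_ack_processed(client_acks, retrans_tsecr):
    processed, discarded = None, []
    for t, ack, tsval in client_acks:
        if tsval == retrans_tsecr:
            processed = (t, ack)           # the server echoed this one
        elif tsval > retrans_tsecr:
            discarded.append((t, ack))     # sent later, never echoed
    return processed, discarded
```

Run against the log, this points at "ack 26016" as the last ACK the server
saw, with the "ack 27464" and "ack 28176" ACKs discarded.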
Re: TCP SACK issue, hung connection, tcpdump included
I've been able to capture a tcpdump from both ends during the problem and
it's my belief there is a bug in 2.6.20.1 (at the client side) in that it
issues a SACK option for an old sequence which the current window being
advertised is beyond. This is the most concerning issue as the integrity of
the sequence numbers doesn't seem right (to my limited understanding
anyhow).

There is another concern of why the SERVER performed a retransmission in
the first place, when the tcpdump shows the ack covering it has been seen.

I have made available the full dumps at:

http://darrylmiles.org/snippets/lkml/20070731/

There are some changes in 2.6.22 that appear to affect TCP SACK handling;
does this fix a known issue?

This sequence is interesting from the client side:

03:58:56.419034 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S1
03:58:56.419100 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501  # C1
03:58:56.422019 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 4239 win 2728  # S2
03:58:56.422078 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C2

The above 4 packets look as expected to me. Then we suddenly see a
retransmission of 26016:27464.

03:58:56.731597 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S3

So the client, instead of discarding the retransmission of the duplicate
segment, issues a SACK.

03:58:56.731637 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C3

In response to this the server is confused??? It responds to
sack{26016:27464}, but the client is also saying "wnd 28176". Wouldn't the
server expect "wnd < 26016" so there is a segment to retransmit?

03:58:57.322800 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S4

Now viewed from the server side:

03:58:56.365655 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S1
03:58:56.365662 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 4239 win 2728  # S2
03:58:56.374633 IP CLIENT.43726 > SERVER.ssh: . ack 24144 win 488  # propagation delay
03:58:56.381630 IP CLIENT.43726 > SERVER.ssh: . ack 25592 win 501  # propagation delay
03:58:56.384503 IP CLIENT.43726 > SERVER.ssh: . ack 26016 win 501  # propagation delay
03:58:56.462583 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501  # C1
03:58:56.465707 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C2

The above packets are just as expected.

03:58:56.678546 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S3

I guess the above packet is indeed a retransmission of "# S1", but why was
it retransmitted, when we can clearly see "# C1" above acks this segment?
It is not even as if the retransmission escaped before the kernel had time
to process the ack, as 200ms elapsed. CONCERN NUMBER TWO.

03:58:56.774778 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501  # C3

CONCERN NUMBER ONE: why, in response to that escaped retransmission, was a
SACK the appropriate response? At the time the client sent the SACK it had
received all data up to 28176, a fact it continues to advertise in the
"# C3" packet above. There is nothing wrong with the CLIENT expecting to
see a retransmission of that segment at this point in time; that is an
expected circumstance.

03:58:57.269529 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728  # S4

Darryl
Re: TCP SACK issue, hung connection, tcpdump included
On Sun, 29 Jul 2007, Willy Tarreau wrote:
> On Sun, Jul 29, 2007 at 12:28:04PM +0300, Ilpo Järvinen wrote:
> >
> > [...snip...]
> >
> > > BTW, some information is missing. It would have been better if the
> > > trace had been read with tcpdump -Svv. We would have got seq numbers
> > > and ttl. Also, we do not know if there's a firewall between both
> > > sides. Sometimes, some IDS identify attacks in encrypted traffic and
> > > kill connections. It might have been the case here, with the
> > > connection closed one way on an intermediate firewall.
> >
> > Yeah, firewall or some other issue; I'd say it's quite unlikely a bug
> > in TCP because the behavior in both directions indicates a
> > client -> sender blackhole, independently of each other...
>
> It would also be possible that something stupid between both ends simply
> drops packets with the SACK option set. Also, something which sometimes
> happens is that some firewalls automatically translate sequence numbers
> but not necessarily SACK values, which could pretty well lead to this
> packet being received but ignored on the server side.

...One can toggle those off with /proc/sys/net/ipv4/tcp_dsack, but I
suspect DSACKs are not the cause, because these retransmissions aren't
making it through either (there are many of them also earlier in the log;
I just quoted the easiest ones :-)):

> > > > 09:36:44.335591 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > > > 09:38:44.351950 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > > > 09:40:44.368172 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378

...there are no SACKs involved in them, yet no cumulative ACK ever arrives
from the SERVER...

--
 i.
Re: TCP SACK issue, hung connection, tcpdump included
On Sun, Jul 29, 2007 at 12:28:04PM +0300, Ilpo Järvinen wrote:
(...)
> > > Limitation for 48 byte segments? You have to be kidding... :-) But
> > > yes, it seems that one of the directions is dropping packets for
> > > some reason, though I would not assume an MTU limitation... Or did
> > > you mean some other segment?
> >
> > No, I was talking about the 1448 byte segments. But in fact I don't
> > believe it much because the SACKs are always retransmitted just
> > afterwards.
>
> Ah, but it's ACKed correctly right below it...:
>
> [...snip...]
>
> > > > 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 2991 win 2728
> > > > 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> > > >
> > > > 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
>
> ...segment below snd_una arrived => snd_una remains 18464, receiver
> generates a duplicate ACK:
>
> > > > 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> > > > {12408:13856}
>
> The cumulative ACK field of it covers _everything_ below 18464 (i.e., it
> ACKs them), including the 1448 bytes in 12408:13856... In addition, the
> SACK block is DSACK information [RFC2883] telling explicitly the address
> of the received duplicate block. However, if this ACK doesn't reach the
> SERVER TCP, the RTO is triggered and the first not-yet-cumulatively-ACKed
> segment is retransmitted (I guess cumulative ACKs up to 12408 arrived
> without problems at the SERVER):

Oh yes, you're damn right. I did not notice that the ACK was higher than
the SACK; I'm more used to reading traces with absolute rather than
relative seq/acks. So I agree, it is this ACK which is lost between client
and server, reinforcing the supposition about the location of the capture
(client side).

> [...snip...]
>
> > BTW, some information is missing. It would have been better if the
> > trace had been read with tcpdump -Svv. We would have got seq numbers
> > and ttl. Also, we do not know if there's a firewall between both
> > sides. Sometimes, some IDS identify attacks in encrypted traffic and
> > kill connections. It might have been the case here, with the
> > connection closed one way on an intermediate firewall.
>
> Yeah, firewall or some other issue; I'd say it's quite unlikely a bug in
> TCP because the behavior in both directions indicates a client -> sender
> blackhole, independently of each other...

It would also be possible that something stupid between both ends simply
drops packets with the SACK option set. Also, something which sometimes
happens is that some firewalls automatically translate sequence numbers
but not necessarily SACK values, which could pretty well lead to this
packet being received but ignored on the server side.

I'm pretty sure that the same trace taken on the server side will reveal
the reason for the problem.

Willy
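The middlebox theory above is easy to illustrate. This is a hypothetical
sketch (no real device's behaviour is implied, and the names and delta are
invented): a NAT that shifts the sequence space by a fixed delta but forgets
the SACK option leaves SACK edges that the sender cannot match against
anything it ever sent, so the block is ignored.

```python
# Hypothetical buggy middlebox: it rewrites the ACK field by `delta` but
# leaves the SACK edges untouched, so they fall outside the sender's
# sequence space after translation.

MOD = 1 << 32  # TCP sequence numbers are modulo 2**32

def nat_rewrite(ack, sack_blocks, delta, fix_sack=False):
    new_ack = (ack + delta) % MOD
    if fix_sack:  # what a correct middlebox would do
        sack_blocks = [((l + delta) % MOD, (r + delta) % MOD)
                       for l, r in sack_blocks]
    return new_ack, sack_blocks

def sack_edges_plausible(left, right, snd_una, snd_nxt):
    """A sender only trusts SACK edges within [snd_una, snd_nxt]
    (ignoring the below-snd_una DSACK case and sequence wraparound)."""
    return snd_una <= left < right <= snd_nxt
```

With `fix_sack=False` the translated ACK is fine but the stale SACK edges
fail the plausibility check, matching Willy's "received but ignored"
scenario.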
Re: TCP SACK issue, hung connection, tcpdump included
On Sun, 29 Jul 2007, Willy Tarreau wrote:
> On Sun, Jul 29, 2007 at 11:26:00AM +0300, Ilpo Järvinen wrote:
> > On Sun, 29 Jul 2007, Willy Tarreau wrote:
> > >
> > > On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> > > > CLIENT = Linux 2.6.20.1-smp [Customer build]
> > > > SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS
> > > > release 4 (Nahant Update 5)]
> > > >
> > > > The problems start around time index 09:21:39.860302 when the
> > > > CLIENT issues a TCP packet with the SACK option set (seemingly
> > > > for a data segment which has already been seen); from that point
> > > > on the connection hangs.
> >
> > ...That's DSACK and it's being correctly sent. To me, it seems
> > unlikely to be the cause of this breakage...
> >
> > > Where was the capture taken? On CLIENT or on SERVER (I suspect
> > > client from the timers)?
> >
> > ...I would guess the same based on the SYN timestamps (and on the
> > DSACK timestamps)...
> >
> > > A possible, but very unlikely reason would be an MTU limitation
> > > somewhere, because the segment which never gets correctly ACKed is
> > > also the largest one in this trace.
> >
> > Limitation for 48 byte segments? You have to be kidding... :-) But
> > yes, it seems that one of the directions is dropping packets for some
> > reason, though I would not assume an MTU limitation... Or did you
> > mean some other segment?
>
> No, I was talking about the 1448 byte segments. But in fact I don't
> believe it much because the SACKs are always retransmitted just
> afterwards.

Ah, but it's ACKed correctly right below it...:

[...snip...]

> > > > 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 2991 win 2728
> > > > 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> > > >
> > > > 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728

...segment below snd_una arrived => snd_una remains 18464, receiver
generates a duplicate ACK:

> > > > 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> > > > {12408:13856}

The cumulative ACK field of it covers _everything_ below 18464 (i.e., it
ACKs them), including the 1448 bytes in 12408:13856... In addition, the
SACK block is DSACK information [RFC2883] telling explicitly the address of
the received duplicate block. However, if this ACK doesn't reach the SERVER
TCP, the RTO is triggered and the first not-yet-cumulatively-ACKed segment
is retransmitted (I guess cumulative ACKs up to 12408 arrived without
problems at the SERVER):

> > > > 09:21:40.453440 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728

[...snip...]

> BTW, some information is missing. It would have been better if the trace
> had been read with tcpdump -Svv. We would have got seq numbers and ttl.
> Also, we do not know if there's a firewall between both sides.
> Sometimes, some IDS identify attacks in encrypted traffic and kill
> connections. It might have been the case here, with the connection
> closed one way on an intermediate firewall.

Yeah, firewall or some other issue; I'd say it's quite unlikely a bug in
TCP because the behavior in both directions indicates a client -> sender
blackhole, independently of each other...

--
 i.
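The receiver-side rule described above can be written down compactly. This
is an illustrative sketch of the RFC 2883 detection logic, not the kernel's
actual code, and it ignores sequence-number wraparound: the first SACK block
reports the duplicate, so a first block at or below the cumulative ACK, or
one contained inside the second block, is a D-SACK rather than an
out-of-window bug.

```python
# Illustrative RFC 2883 D-SACK classification (simplified: no seq wraparound).
# sack_blocks: list of (left_edge, right_edge) pairs, first block first.

def is_dsack(ack, sack_blocks):
    if not sack_blocks:
        return False
    left, right = sack_blocks[0]
    if right <= ack:              # duplicate lies below the cumulative ACK
        return True
    if len(sack_blocks) > 1:      # duplicate inside already-SACKed data
        l2, r2 = sack_blocks[1]
        return l2 <= left and right <= r2
    return False
```

For the trace above, `is_dsack(18464, [(12408, 13856)])` holds, which is
exactly why the "ack 18464 ... {12408:13856}" packet is valid and not a
sequence-number integrity problem.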
Re: TCP SACK issue, hung connection, tcpdump included
From: "Ilpo Järvinen" <[EMAIL PROTECTED]>
Date: Sun, 29 Jul 2007 11:26:00 +0300 (EEST)

> Is this reproducible? Can you somehow verify that the packets CLIENT is
> sending are indeed received by the SERVER...?

One possibility is drops due to checksum errors on the receiver; this
tends to pop up from time to time. Let's see some SNMP statistics.
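Checksum-based discards are silent on the wire, which is why only the SNMP
counters can reveal them. For reference, a minimal sketch of the Internet
checksum (RFC 1071) that TCP uses; a receiver silently drops a segment whose
checksum does not verify, bumping only an error counter.

```python
# Minimal RFC 1071 Internet checksum over a byte string. For illustration
# only: real TCP also covers a pseudo-header (addresses, protocol, length).

def inet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # sum 16-bit big-endian words
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                   # one's complement of the sum
```

A segment is accepted when the checksum computed over data-plus-checksum
field verifies (sums to zero after complement), so a single flipped bit in
transit, or a card mis-offloading, is enough to make the receiver drop it.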
Re: TCP SACK issue, hung connection, tcpdump included
On Sun, Jul 29, 2007 at 11:26:00AM +0300, Ilpo Järvinen wrote:
> On Sun, 29 Jul 2007, Willy Tarreau wrote:
> >
> > On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> > > CLIENT = Linux 2.6.20.1-smp [Customer build]
> > > SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4
> > > (Nahant Update 5)]
> > >
> > > The problems start around time index 09:21:39.860302 when the CLIENT
> > > issues a TCP packet with the SACK option set (seemingly for a data
> > > segment which has already been seen); from that point on the
> > > connection hangs.
>
> ...That's DSACK and it's being correctly sent. To me, it seems unlikely
> to be the cause of this breakage...
>
> > Where was the capture taken? On CLIENT or on SERVER (I suspect client
> > from the timers)?
>
> ...I would guess the same based on the SYN timestamps (and on the DSACK
> timestamps)...
>
> > A possible, but very unlikely reason would be an MTU limitation
> > somewhere, because the segment which never gets correctly ACKed is
> > also the largest one in this trace.
>
> Limitation for 48 byte segments? You have to be kidding... :-) But yes,
> it seems that one of the directions is dropping packets for some reason,
> though I would not assume an MTU limitation... Or did you mean some
> other segment?

No, I was talking about the 1448 byte segments. But in fact I don't
believe it much because the SACKs are always retransmitted just
afterwards.

BTW, some information is missing. It would have been better if the trace
had been read with tcpdump -Svv. We would have got seq numbers and ttl.
Also, we do not know if there's a firewall between both sides. Sometimes,
some IDS identify attacks in encrypted traffic and kill connections. It
might have been the case here, with the connection closed one way on an
intermediate firewall.

Regards,
Willy
Re: TCP SACK issue, hung connection, tcpdump included
On Jul 29 2007 08:45, Willy Tarreau wrote:
> On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> > CLIENT = Linux 2.6.20.1-smp [Customer build]
> > SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4
> > (Nahant Update 5)]
> >
> > The problems start around time index 09:21:39.860302 when the CLIENT
> > issues a TCP packet with the SACK option set (seemingly for a data
> > segment which has already been seen); from that point on the
> > connection hangs.
>
> Where was the capture taken? On CLIENT or on SERVER (I suspect client
> from the timers)? A possible, but very unlikely reason would be an MTU
> limitation somewhere, because the segment which never gets correctly
> ACKed is also the largest one in this trace. [...]

I had something similar (and most strangely enough, it only seemed to
happen when the source and/or destination address was outside a given
subnet). It boiled down to some switch acting really strangely, and God
knows why. A tcpdump on *both* sides (server+client) each showed that the
*other* peer sent a FIN/RST. Well, in the end I settled with setting
net.ipv4.tcp_sack=0; no idea if the broken switch got replaced in the
meantime.

Well, just letting you know that it is not limited to computers.

Jan
Re: TCP SACK issue, hung connection, tcpdump included
On Sun, 29 Jul 2007, Willy Tarreau wrote:
> On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> > CLIENT = Linux 2.6.20.1-smp [Customer build]
> > SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4
> > (Nahant Update 5)]
> >
> > The problems start around time index 09:21:39.860302 when the CLIENT issues
> > a TCP packet with SACK option set (seemingly for a data segment which has
> > already been seen); from that point on the connection hangs.

...That's a DSACK, and it's being correctly sent. To me, it seems unlikely
to be the cause of this breakage...

> Where was the capture taken ? on CLIENT or on SERVER (I suspect client from
> the timers) ?

...I would guess the same based on the SYN timestamps (and on the DSACK
timestamps)...

> A possible, but very unlikely reason would be an MTU limitation
> somewhere, because the segment which never gets correctly ACKed is also the
> largest one in this trace.

A limitation for 48-byte segments? You have to be kidding... :-) But yes,
it does seem that one of the directions is dropping packets for some
reason, though I would not assume an MTU limitation... Or did you mean some
other segment?

> Also, if everything is right with the packets on the server side, then
> it would mean that it's the RHEL kernel which is buggy (which does not
> mean that the same bug does not exist in mainline).

...There are two independent signs that the server side is not getting the
packets (the ack field is not advanced, nor do the retransmissions of the
old data cease), so IMHO that would be the first thing to rule out.

> > Full dump available via email.
> >
> > 09:12:13.444999 IP CLIENT.50727 > SERVER.ssh: S 2919512080:2919512080(0) win 5840
> > 09:12:13.535643 IP SERVER.ssh > CLIENT.50727: S 492909547:492909547(0) ack 2919512081 win 5792
> > 09:12:13.535717 IP CLIENT.50727 > SERVER.ssh: . ack 1 win 46
> > 09:12:13.665481 IP SERVER.ssh > CLIENT.50727: P 1:24(23) ack 1 win 1448
> > 09:12:13.665895 IP CLIENT.50727 > SERVER.ssh: . ack 24 win 46
> > 09:12:13.666044 IP CLIENT.50727 > SERVER.ssh: P 1:23(22) ack 24 win 46
> > 09:12:13.776318 IP SERVER.ssh > CLIENT.50727: . ack 23 win 1448
> >
> > ...SNIPPED SUCCESSFUL TRAFFIC...
> >
> > 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 2991 win 2728
> > 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:21:40.453440 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:21:40.453495 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:21:41.641821 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:21:41.641938 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:21:44.017552 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:21:44.017622 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:21:48.770051 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:21:48.770099 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:21:58.274061 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:21:58.274180 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:22:17.282035 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:22:17.282153 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:22:55.298955 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:22:55.299023 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:24:11.329853 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:24:11.329923 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:26:11.331578 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> > 09:26:11.331699 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
> >
> > 09:26:13.011885 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:26:13.309032 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:26:13.901048 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:26:15.085103 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:26:17.453195 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:26:22.189378 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:26:31.661696 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:26:50.610374 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> > 09:27:28.499729 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> >
> > 09:28:11.331256 IP SERVER.ssh > CLIENT.50727: . 12408:
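Ilpo's identification of the client's report as a DSACK follows directly from RFC 2883: a receiver signals reception of a duplicate segment by sending a first SACK block that lies entirely at or below the cumulative ACK (or entirely inside the second SACK block). A minimal sketch of that rule (hypothetical helper, not kernel code; sequence numbers simplified to plain integers, ignoring wraparound):

```python
def is_dsack(cum_ack, sack_blocks):
    """Return True if the first SACK block reports duplicate data (RFC 2883).

    A first block wholly at or below the cumulative ACK, or wholly
    contained inside the second block, is a D-SACK.
    """
    if not sack_blocks:
        return False
    start, end = sack_blocks[0]
    if end <= cum_ack:            # block below the cumulative ACK: duplicate
        return True
    if len(sack_blocks) > 1:      # subset of the second block: duplicate
        s2, e2 = sack_blocks[1]
        return s2 <= start and end <= e2
    return False

# The client's report in the trace: cumulative ACK 18464 with a SACK
# block for the retransmitted 12408:13856 range is a D-SACK.
print(is_dsack(18464, [(12408, 13856)]))   # True
```

The same SACK block with a cumulative ACK below 12408 would be an ordinary SACK, which is why the receiver's behaviour here is correct rather than buggy.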
Re: TCP SACK issue, hung connection, tcpdump included
Hi,

[CCing netdev as it's where the people competent on the subject are]

On Sun, Jul 29, 2007 at 06:59:26AM +0100, Darryl L. Miles wrote:
> CLIENT = Linux 2.6.20.1-smp [Customer build]
> SERVER = Linux 2.6.9-55.ELsmp [Red Hat Enterprise Linux AS release 4
> (Nahant Update 5)]
>
> The problems start around time index 09:21:39.860302 when the CLIENT issues
> a TCP packet with SACK option set (seemingly for a data segment which has
> already been seen); from that point on the connection hangs.

Where was the capture taken? On CLIENT or on SERVER (I suspect client from
the timers)?

A possible, but very unlikely reason would be an MTU limitation somewhere,
because the segment which never gets correctly ACKed is also the largest
one in this trace. It would be strange, as the SACKs are sent immediately
after, but it should be something to consider anyway.

Also, if everything is right with the packets on the server side, then it
would mean that it's the RHEL kernel which is buggy (which does not mean
that the same bug does not exist in mainline).

Regards,
Willy

> Full dump available via email.
>
> 09:12:13.444999 IP CLIENT.50727 > SERVER.ssh: S 2919512080:2919512080(0) win 5840
> 09:12:13.535643 IP SERVER.ssh > CLIENT.50727: S 492909547:492909547(0) ack 2919512081 win 5792
> 09:12:13.535717 IP CLIENT.50727 > SERVER.ssh: . ack 1 win 46
> 09:12:13.665481 IP SERVER.ssh > CLIENT.50727: P 1:24(23) ack 1 win 1448
> 09:12:13.665895 IP CLIENT.50727 > SERVER.ssh: . ack 24 win 46
> 09:12:13.666044 IP CLIENT.50727 > SERVER.ssh: P 1:23(22) ack 24 win 46
> 09:12:13.776318 IP SERVER.ssh > CLIENT.50727: . ack 23 win 1448
>
> ...SNIPPED SUCCESSFUL TRAFFIC...
>
> 09:21:39.490740 IP SERVER.ssh > CLIENT.50727: P 18200:18464(264) ack 2991 win 2728
> 09:21:39.490775 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:21:39.860245 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:21:39.860302 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:21:40.453440 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:21:40.453495 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:21:41.641821 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:21:41.641938 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:21:44.017552 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:21:44.017622 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:21:48.770051 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:21:48.770099 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:21:58.274061 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:21:58.274180 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:22:17.282035 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:22:17.282153 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:22:55.298955 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:22:55.299023 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:24:11.329853 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:24:11.329923 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:26:11.331578 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:26:11.331699 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:26:13.011885 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:26:13.309032 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:26:13.901048 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:26:15.085103 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:26:17.453195 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:26:22.189378 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:26:31.661696 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:26:50.610374 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
> 09:27:28.499729 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
>
> 09:28:11.331256 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:28:11.331354 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:28:44.286495 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
>
> 09:30:11.330959 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:30:11.331074 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:30:44.298756 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win 378
>
> 09:32:11.331661 IP SERVER.ssh > CLIENT.50727: . 12408:13856(1448) ack 2991 win 2728
> 09:32:11.331727 IP CLIENT.50727 > SERVER.ssh: . ack 18464 win 378
>
> 09:32:44.311051 IP CLIENT.50727 > SERVER.ssh: P 2991:3039(48) ack 18464 win
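The retransmission timestamps in the trace follow classic exponential RTO backoff: each retry interval roughly doubles (about 0.59 s, 1.19 s, 2.38 s, 4.75 s, ...) until it is clamped, which is why the server's final retransmissions of 12408:13856 arrive exactly two minutes apart (Linux's TCP_RTO_MAX is 120 s). A sketch of that schedule, with the initial RTO read off the trace:

```python
def rto_schedule(initial_rto, retries, rto_max=120.0):
    """Exponential backoff: the retransmission timeout doubles after every
    unanswered retransmission, clamped at rto_max (Linux TCP_RTO_MAX = 120 s)."""
    intervals, rto = [], initial_rto
    for _ in range(retries):
        intervals.append(rto)
        rto = min(rto * 2, rto_max)
    return intervals

# First retry gap in the trace: 09:21:40.45 - 09:21:39.86 ~= 0.59 s
print(rto_schedule(0.59, 9))
```

The fact that both peers keep retransmitting on this schedule while each one's ACKs never advance the other's state is exactly the "both directions dropping" picture discussed above.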