Re: TCP connection closed without FIN or RST
On Wed, Nov 8, 2017 at 12:29 PM, Eric Dumazet wrote:
> Please do not top post on netdev.

Right - apologies for that.

> On Wed, 2017-11-08 at 11:04 -0500, Vitaly Davidovich wrote:
>> So this issue is somehow related to setting SO_RCVBUF *after*
>> connecting the socket (from the client). The system is configured
>> such that the default rcvbuf size is 1MB, but the code was shrinking
>> this down to 75KB right after connect().
>
> What are you calling default rcvbuf size exactly ?
>
> Is the application doing
>
> s = socket(...);
> ...
> setsockopt(s, SOL_SOCKET, SO_RCVBUF, [100], 4)
> ...
> connect(s, ...)
> setsockopt(s, SOL_SOCKET, SO_RCVBUF, [75000], 4)

Yes, sort of. The application (Java, but nothing fancy here) does
essentially the following:

s = socket(...); // no explicit setting of SO_RCVBUF, so the system
                 // default is picked up (1MB, as tcp_rmem below shows)
connect(s, ...);
// now it goes and sets it
setsockopt(s, SOL_SOCKET, SO_RCVBUF, 75000, ...);
// then it goes to sleep for 15 mins
sleep(...);

The client machine has /proc/sys/net/ipv4/tcp_rmem: 131072 1048576 20971520

>> I think that explains why the window size advertised by the client
>> was much larger than expected. I see that the kernel does not want
>> to shrink the previously advertised window without advancement in
>> the sequence space. So my guess is that the client runs out of
>> buffer and starts dropping packets. Not sure how to further debug
>> this from userspace (systemtap? bpf?) - any tips on that front
>> would be appreciated.
>
> You could provide a packet capture (tcpdump) for a start ;)

I might be able to share that (this is from a private network). In the
meantime, if there's something specific I should look at there, I'd be
happy to do that and report back. I understand that's not ideal, but it
would be faster/easier.
My own observation is that the client's last ACK has a window size of
over 300KB, which I'm pretty sure it doesn't have room for if the
rcvbuf was shrunk after the setsockopt() set it to 75000 (I understand
the kernel actually reserves more than that, but even if it's double,
that's still far less than 300KB).

Needless to say, if I move the setsockopt(s, SOL_SOCKET, SO_RCVBUF,
75000, ...) prior to connect(s, ...), then everything works fine - we
hit a "persist" state, and the server sees the zero-window
advertisements and probes accordingly. I've tried a few other buffer
sizes, including smallish ones like 4KB and 8KB, and they all work (no
real surprise there, but it was more of a sanity check).

The fact that SO_RCVBUF is set after connect() is a bug in the code -
no doubt about it. However, I'm surprised it wedges the stack like
this. Another interesting bit is that if the client isn't put to sleep
but is allowed to read the bytes as they come in, then everything works
fine as well. So it's not like the stack is broken outright - I need to
put the client to sleep to hit this (but it has reproduced 100% of the
time thus far).

Thanks Eric
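A minimal sketch of the ordering fix described above (Python rather than the application's Java, and the function name is mine; 75000 is just the size from this thread): setting SO_RCVBUF before connect() lets the handshake advertise a window the buffer can actually back. Linux roughly doubles the requested value to leave room for kernel bookkeeping overhead, which is the "even if it's double" caveat above.

```python
import socket

def make_receiver_socket(rcvbuf=75000):
    """Create a TCP socket with the receive buffer set BEFORE connect().

    Setting SO_RCVBUF after connect() (the bug in this thread) leaves
    the already-advertised window sized from the 1MB default, which the
    shrunken buffer can no longer honor.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the buffer first, then connect; the handshake will advertise
    # a window derived from this value.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    # On Linux, getsockopt() reports roughly double the requested size
    # (kernel overhead accounting) - still far below the ~348K window
    # observed in the capture.
    effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return s, effective
```

Even with the doubling, 2 * 75000 = 150000 bytes remains well under the 300KB+ window the client last advertised, which matches the observation that the client had nowhere to put the retransmitted segments.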
Re: TCP connection closed without FIN or RST
So this issue is somehow related to setting SO_RCVBUF *after*
connecting the socket (from the client). The system is configured such
that the default rcvbuf size is 1MB, but the code was shrinking this
down to 75KB right after connect(). I think that explains why the
window size advertised by the client was much larger than expected. I
see that the kernel does not want to shrink the previously advertised
window without advancement in the sequence space. So my guess is that
the client runs out of buffer and starts dropping packets. Not sure how
to further debug this from userspace (systemtap? bpf?) - any tips on
that front would be appreciated.

Thanks again for the help.

On Fri, Nov 3, 2017 at 5:33 PM, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 14:28 -0400, Vitaly Davidovich wrote:
>
>> So Eric, while I still have your interest here (although I know it's
>> waning :)), any code pointers to where I might look to see if a
>> specific small-ish rcv buf size may interact poorly with the rest of
>> the stack? Is it possible some buffer was starved in the client
>> stack which prevented it from sending any segments to the server?
>> Maybe the incoming retrans were actually dropped somewhere in the
>> ingress pkt processing and so the stack doesn't know it needs to
>> react to something? Pulling at straws here, but clearly the recv buf
>> size, and a somewhat small one at that, has some play.
>>
>> I checked dmesg (just in case something would pop up there) but
>> didn't observe any warnings or anything interesting.
>
> I believe you could reproduce the issue with packetdrill.
>
> If you can provide a packetdrill file demonstrating the issue, that
> would be awesome ;)
Re: TCP connection closed without FIN or RST
On Fri, Nov 3, 2017 at 1:58 PM, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 13:23 -0400, Vitaly Davidovich wrote:
>> On Fri, Nov 3, 2017 at 12:05 PM, Eric Dumazet wrote:
>> > On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
>> >> Ok, an interesting finding. The client was originally running
>> >> with SO_RCVBUF of 75K (apparently someone decided to set that for
>> >> some unknown reason). I tried the test with a 1MB recv buffer and
>> >> everything works perfectly! The client responds with zero-window
>> >> advertisements, the server just hits the persist condition and
>> >> sends keep-alive probes; the client continues answering with a 0
>> >> window up until it wakes up and starts processing data in its
>> >> receive buffer. At that point, the window opens up and the server
>> >> sends more data. Basically, things look as one would expect in
>> >> this situation :).
>> >>
>> >> /proc/sys/net/ipv4/tcp_rmem is 131072 1048576 20971520. The
>> >> conversation flows normally, as described above, when I change
>> >> the client's recv buf size to 1048576. I also tried 131072, but
>> >> that doesn't work - same retrans/no ACKs situation.
>> >>
>> >> I think this eliminates (right?) any middlebox from the equation.
>> >> Instead, perhaps it's some bad interaction between a low recv buf
>> >> size and either some other TCP setting or TSO mechanics (LRO
>> >> specifically). Still investigating further.
>> >
>> > Just in case, have you tried a more recent linux kernel ?
>>
>> I haven't but will look into that. I was mostly hoping to see if
>> anyone perhaps has seen similar symptoms/behavior and figured out
>> what the root cause is - just a stab in the dark with the
>> well-informed folks on this list :). As of right now, based on the
>> fact that a 1MB recv buffer works, I would surmise the issue is
>> perhaps some poor interaction between a lower recv buffer size and
>> some other tcp settings. But I'm just speculating - will continue
>> investigating, and I'll update this thread if I get to the bottom of
>> it.
>>
>> > I would rather not spend time on some problem that might already
>> > be fixed.
>>
>> Completely understandable - I really appreciate the tips and
>> pointers thus far Eric, they've been helpful in their own right.
>
> I am interested to see if the issue with small sk_rcvbuf is still
> there.
>
> We have an upcoming change to rcvbuf autotuning to not blindly give
> tcp_rmem[2] to all sockets, but use a function based on RTT.
>
> Meaning that local flows could use small sk_rcvbuf instead of
> inflated ones.
>
> And meaning that we could increase tcp_rmem[2] to better match modern
> capabilities (more memory on hosts, larger BDP).

So Eric, while I still have your interest here (although I know it's
waning :)), any code pointers to where I might look to see if a
specific small-ish rcv buf size may interact poorly with the rest of
the stack? Is it possible some buffer was starved in the client stack
which prevented it from sending any segments to the server? Maybe the
incoming retrans were actually dropped somewhere in the ingress pkt
processing and so the stack doesn't know it needs to react to
something? Pulling at straws here, but clearly the recv buf size, and a
somewhat small one at that, has some play.

I checked dmesg (just in case something would pop up there) but didn't
observe any warnings or anything interesting.
Re: TCP connection closed without FIN or RST
On Fri, Nov 3, 2017 at 12:05 PM, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
>> Ok, an interesting finding. The client was originally running with
>> SO_RCVBUF of 75K (apparently someone decided to set that for some
>> unknown reason). I tried the test with a 1MB recv buffer and
>> everything works perfectly! The client responds with zero-window
>> advertisements, the server just hits the persist condition and sends
>> keep-alive probes; the client continues answering with a 0 window up
>> until it wakes up and starts processing data in its receive buffer.
>> At that point, the window opens up and the server sends more data.
>> Basically, things look as one would expect in this situation :).
>>
>> /proc/sys/net/ipv4/tcp_rmem is 131072 1048576 20971520. The
>> conversation flows normally, as described above, when I change the
>> client's recv buf size to 1048576. I also tried 131072, but that
>> doesn't work - same retrans/no ACKs situation.
>>
>> I think this eliminates (right?) any middlebox from the equation.
>> Instead, perhaps it's some bad interaction between a low recv buf
>> size and either some other TCP setting or TSO mechanics (LRO
>> specifically). Still investigating further.
>
> Just in case, have you tried a more recent linux kernel ?

I haven't but will look into that. I was mostly hoping to see if anyone
perhaps has seen similar symptoms/behavior and figured out what the
root cause is - just a stab in the dark with the well-informed folks on
this list :). As of right now, based on the fact that a 1MB recv buffer
works, I would surmise the issue is perhaps some poor interaction
between a lower recv buffer size and some other tcp settings. But I'm
just speculating - will continue investigating, and I'll update this
thread if I get to the bottom of it.

> I would rather not spend time on some problem that might already be
> fixed.

Completely understandable - I really appreciate the tips and pointers
thus far Eric, they've been helpful in their own right.
Re: TCP connection closed without FIN or RST
Ok, an interesting finding. The client was originally running with
SO_RCVBUF of 75K (apparently someone decided to set that for some
unknown reason). I tried the test with a 1MB recv buffer and everything
works perfectly! The client responds with zero-window advertisements,
the server just hits the persist condition and sends keep-alive probes;
the client continues answering with a 0 window up until it wakes up and
starts processing data in its receive buffer. At that point, the window
opens up and the server sends more data. Basically, things look as one
would expect in this situation :).

/proc/sys/net/ipv4/tcp_rmem is 131072 1048576 20971520. The
conversation flows normally, as described above, when I change the
client's recv buf size to 1048576. I also tried 131072, but that
doesn't work - same retrans/no ACKs situation.

I think this eliminates (right?) any middlebox from the equation.
Instead, perhaps it's some bad interaction between a low recv buf size
and either some other TCP setting or TSO mechanics (LRO specifically).
Still investigating further.

On Fri, Nov 3, 2017 at 10:02 AM, Vitaly Davidovich wrote:
> On Fri, Nov 3, 2017 at 9:39 AM, Vitaly Davidovich wrote:
>> On Fri, Nov 3, 2017 at 9:02 AM, Eric Dumazet wrote:
>>> On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
>>>> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>>>> > Hi Eric,
>>>> >
>>>> > Ran a few more tests yesterday with packet captures, including a
>>>> > capture on the client. It turns out that the client stops
>>>> > ack'ing entirely at some point in the conversation - the last
>>>> > advertised client window is not even close to zero (it's
>>>> > actually ~348K). So there's complete radio silence from the
>>>> > client for some reason, even though it does send back ACKs early
>>>> > on in the conversation. So yes, as far as the server is
>>>> > concerned, the client is completely gone and tcp_retries2
>>>> > rightfully breaches eventually once the server retrans go
>>>> > unanswered long enough (and for sufficient times).
>>>> >
>>>> > What's odd though is the packet capture on the client shows the
>>>> > server retrans packets arriving, so it's not like the segments
>>>> > don't reach the client. I'll keep investigating, but if you (or
>>>> > anyone else reading this) knows of circumstances that might
>>>> > cause this, I'd appreciate any tips on where/what to look at.
>>>>
>>>> Might be a middle box issue ? Like a firewall connection tracking
>>>> having some kind of timeout if nothing is sent on one direction ?
>>>>
>>>> What output do you have from client side with :
>>>>
>>>> ss -temoi dst
>>>
>>> It also could be a wrapping issue on TCP timestamps.
>>>
>>> You could try disabling tcp timestamps, and restart the TCP flow.
>>>
>>> echo 0 >/proc/sys/net/ipv4/tcp_timestamps
>>
>> Ok, I will try to do that. Thanks for the tip.
>
> Tried with tcp_timestamps disabled on the client (didn't touch the
> server), but that didn't change the outcome - same issue at the end.
Re: TCP connection closed without FIN or RST
On Fri, Nov 3, 2017 at 9:39 AM, Vitaly Davidovich wrote:
> On Fri, Nov 3, 2017 at 9:02 AM, Eric Dumazet wrote:
>> On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
>>> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>>> > Hi Eric,
>>> >
>>> > Ran a few more tests yesterday with packet captures, including a
>>> > capture on the client. It turns out that the client stops ack'ing
>>> > entirely at some point in the conversation - the last advertised
>>> > client window is not even close to zero (it's actually ~348K). So
>>> > there's complete radio silence from the client for some reason,
>>> > even though it does send back ACKs early on in the conversation.
>>> > So yes, as far as the server is concerned, the client is
>>> > completely gone and tcp_retries2 rightfully breaches eventually
>>> > once the server retrans go unanswered long enough (and for
>>> > sufficient times).
>>> >
>>> > What's odd though is the packet capture on the client shows the
>>> > server retrans packets arriving, so it's not like the segments
>>> > don't reach the client. I'll keep investigating, but if you (or
>>> > anyone else reading this) knows of circumstances that might cause
>>> > this, I'd appreciate any tips on where/what to look at.
>>>
>>> Might be a middle box issue ? Like a firewall connection tracking
>>> having some kind of timeout if nothing is sent on one direction ?
>>>
>>> What output do you have from client side with :
>>>
>>> ss -temoi dst
>>
>> It also could be a wrapping issue on TCP timestamps.
>>
>> You could try disabling tcp timestamps, and restart the TCP flow.
>>
>> echo 0 >/proc/sys/net/ipv4/tcp_timestamps
>
> Ok, I will try to do that. Thanks for the tip.

Tried with tcp_timestamps disabled on the client (didn't touch the
server), but that didn't change the outcome - same issue at the end.
Re: TCP connection closed without FIN or RST
On Fri, Nov 3, 2017 at 9:02 AM, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 06:00 -0700, Eric Dumazet wrote:
>> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>> > Hi Eric,
>> >
>> > Ran a few more tests yesterday with packet captures, including a
>> > capture on the client. It turns out that the client stops ack'ing
>> > entirely at some point in the conversation - the last advertised
>> > client window is not even close to zero (it's actually ~348K). So
>> > there's complete radio silence from the client for some reason,
>> > even though it does send back ACKs early on in the conversation.
>> > So yes, as far as the server is concerned, the client is
>> > completely gone and tcp_retries2 rightfully breaches eventually
>> > once the server retrans go unanswered long enough (and for
>> > sufficient times).
>> >
>> > What's odd though is the packet capture on the client shows the
>> > server retrans packets arriving, so it's not like the segments
>> > don't reach the client. I'll keep investigating, but if you (or
>> > anyone else reading this) knows of circumstances that might cause
>> > this, I'd appreciate any tips on where/what to look at.
>>
>> Might be a middle box issue ? Like a firewall connection tracking
>> having some kind of timeout if nothing is sent on one direction ?
>>
>> What output do you have from client side with :
>>
>> ss -temoi dst
>
> It also could be a wrapping issue on TCP timestamps.
>
> You could try disabling tcp timestamps, and restart the TCP flow.
>
> echo 0 >/proc/sys/net/ipv4/tcp_timestamps

Ok, I will try to do that. Thanks for the tip.
Re: TCP connection closed without FIN or RST
On Fri, Nov 3, 2017 at 9:00 AM, Eric Dumazet wrote:
> On Fri, 2017-11-03 at 08:41 -0400, Vitaly Davidovich wrote:
>> Hi Eric,
>>
>> Ran a few more tests yesterday with packet captures, including a
>> capture on the client. It turns out that the client stops ack'ing
>> entirely at some point in the conversation - the last advertised
>> client window is not even close to zero (it's actually ~348K). So
>> there's complete radio silence from the client for some reason, even
>> though it does send back ACKs early on in the conversation. So yes,
>> as far as the server is concerned, the client is completely gone and
>> tcp_retries2 rightfully breaches eventually once the server retrans
>> go unanswered long enough (and for sufficient times).
>>
>> What's odd though is the packet capture on the client shows the
>> server retrans packets arriving, so it's not like the segments don't
>> reach the client. I'll keep investigating, but if you (or anyone
>> else reading this) knows of circumstances that might cause this, I'd
>> appreciate any tips on where/what to look at.
>
> Might be a middle box issue ? Like a firewall connection tracking
> having some kind of timeout if nothing is sent on one direction ?

Yeah, that's certainly possible although I've not found evidence of
that yet, including asking sysadmins. But it's definitely an avenue I'm
going to walk a bit further down.

> What output do you have from client side with :
>
> ss -temoi dst

I snipped some irrelevant info, like IP addresses, uid, inode number,
etc.

Client before it wakes up - the recvq has been at 125976 for the entire
time it's been sleeping (15 minutes):

State  Recv-Q  Send-Q
ESTAB  125976  0
  skmem:(r151040,rb15,t0,tb15,f512,w0,o0,bl0) ts sack scalable
  wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448 cwnd:10
  send 24.8Mbps rcv_rtt:321786 rcv_space:524140

While the server is on its last retrans timer, the client wakes up and
slurps up its recv buffer:

State  Recv-Q  Send-Q
ESTAB  0       0
  skmem:(r0,rb15,t0,tb15,f151552,w0,o0,bl0) ts sack scalable
  wscale:0,11 rto:208 rtt:4.664/8.781 ato:40 mss:1448 cwnd:10
  send 24.8Mbps rcv_rtt:321786 rcv_space:524140

Here's the ss output from the server right before the last retrans
timer expires and the socket is aborted. Note that this output is after
the client has drained its recv queue (the output right above):

State  Recv-Q  Send-Q
ESTAB  0       925272  timer:(on,14sec,15)
  skmem:(r0,rb10,t0,tb105,f2440,w947832,o0,bl0) ts sack scalable
  wscale:11,0 rto:12 rtt:9.69/16.482 ato:40 mss:1448 cwnd:1
  ssthresh:89 send 1.2Mbps unacked:99 retrans:1/15 lost:99 rcv_rtt:4
  rcv_space:28960

Also worth noting that the server's sendq has been at 925272 the entire
time as well. Does anything stand out here? I guess one thing that
stands out to me (but that could be due to my lack of in-depth
knowledge here) is that the client's rcv_space is significantly larger
than its recvq.

Thanks Eric!
Re: TCP connection closed without FIN or RST
Hi Eric,

Ran a few more tests yesterday with packet captures, including a
capture on the client. It turns out that the client stops ack'ing
entirely at some point in the conversation - the last advertised client
window is not even close to zero (it's actually ~348K). So there's
complete radio silence from the client for some reason, even though it
does send back ACKs early on in the conversation. So yes, as far as the
server is concerned, the client is completely gone and tcp_retries2
rightfully breaches eventually once the server retrans go unanswered
long enough (and for sufficient times).

What's odd though is the packet capture on the client shows the server
retrans packets arriving, so it's not like the segments don't reach the
client. I'll keep investigating, but if you (or anyone else reading
this) knows of circumstances that might cause this, I'd appreciate any
tips on where/what to look at.

Thanks

On Wed, Nov 1, 2017 at 7:06 PM, Eric Dumazet wrote:
> On Wed, 2017-11-01 at 22:22 +, Vitaly Davidovich wrote:
>> Eric,
>>
>> Yes I agree. However the thing I'm still puzzled about is the client
>> application is not reading/draining the recvq - ok, the client tcp
>> stack should start advertising a 0 window size. Does a 0 window size
>> count against the tcp_retries2? Is that what you were alluding to in
>> your first reply?
>
> Every time we receive a (valid) ACK, with a win 0 or not, the counter
> of attempts is cleared, giving the sender the opportunity to send 15
> more probes.
>
>> If it *does* count towards the retries limit then a RST doesn't seem
>> like a bad idea. The client is responding with segments but the user
>> app there just isn't draining the data. Presumably that RST has a
>> good chance of reaching the client and then unblocking the read()
>> there with a peer reset error. Or am I missing something?
>>
>> If it doesn't count towards the limit then I need to figure out why
>> the 0 window size segments weren't being sent by the client.
>
> Yes please :)
>
>> I will try to double check that the client was indeed advertising 0
>> window size. There's nothing special about that machine - it's a
>> 4.1.35 kernel as well. I wouldn't expect the tcp stack there to be
>> unresponsive just because the user app is sleeping.
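Eric's point above - that every valid ACK, zero-window or not, clears the retransmission attempt counter - is what makes the persist state survivable indefinitely. A small hedged sketch (loopback Python, not the original client/server setup) of how a non-reading receiver with a tiny SO_RCVBUF stalls a sender: once the receiver's buffer and the sender's send buffer fill, a non-blocking send() can make no progress, and on the wire the receiver would be advertising a zero window that the sender then probes.

```python
import socket

def fill_until_blocked(rcvbuf=4096):
    """Fill a loopback TCP connection whose receiver never reads.

    Returns how many bytes the sender managed to queue before it could
    make no further progress (roughly its sndbuf plus whatever the
    receiver's small rcvbuf accepted).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    receiver = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Small buffer, set before connect() as discussed in this thread.
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    receiver.connect(srv.getsockname())
    sender, _ = srv.accept()
    sender.setblocking(False)
    sent = 0
    try:
        while True:
            sent += sender.send(b"\x00" * 65536)
    except BlockingIOError:
        # No more room: the receiver's window will close, and a blocking
        # sender would now sit in zero-window probing - where, as Eric
        # notes, each ACKed probe resets the retry counter.
        pass
    for s in (sender, receiver, srv):
        s.close()
    return sent
```

This is only an illustration of buffer exhaustion on one host; reproducing the actual stall in this thread requires the after-connect() SO_RCVBUF ordering and the sleeping reader.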
TCP connection closed without FIN or RST
Hi all,

I'm seeing some puzzling TCP behavior that I'm hoping someone on this
list can shed some light on. Apologies if this isn't the right forum
for this type of question. But here goes anyway :)

I have client and server x86-64 linux machines with the 4.1.35 kernel.
I set up the following test/scenario:

1) Client connects to the server and requests a stream of data. The
   server (written in Java) starts to send data.
2) Client then goes to sleep for 15 minutes (I'll explain why below).
3) Naturally, the server's sendq fills up and it blocks on a write()
   syscall.
4) Similarly, the client's recvq fills up.
5) After 15 minutes the client wakes up and reads the data off the
   socket fairly quickly - the recvq is fully drained.
6) At about the same time, the server's write() fails with ETIMEDOUT.
   The server then proceeds to close() the socket.
7) The client, however, remains forever stuck in its read() call.

When the client is stuck in read(), netstat on the server does not show
the tcp connection - it's gone. On the client, netstat shows the
connection with 0 recv (and send) queue size and in ESTABLISHED state.

I have done a packet capture (using tcpdump) on the server, and
expected to see either a FIN or RST packet sent to the client - neither
is present. What is present, however, is a bunch of retrans from the
server to the client, with what appears to be exponential backoff.
However, the conversation just stops around the time the ETIMEDOUT
error occurred. I do not see any attempt to abort or gracefully shut
down the TCP stream. When I strace the server thread that was blocked
on write(), I do see the ETIMEDOUT error from write(), followed by a
close() on the socket fd.

Would anyone possibly know what could cause this? Or have suggestions
on how to troubleshoot further? In particular, are there any known
cases where a FIN or RST wouldn't be sent after a write() times out due
to too many retrans?

I believe this might be related to the tcp_retries2 behavior (the
system is configured with the default value of 15), where too many
retrans attempts will cause write() to error with a timeout. My
understanding is that this shouldn't do anything to the state of the
socket on its own - it should stay in the ESTABLISHED state. But then
presumably a close() should start the shutdown state machine by sending
a FIN packet to the client and entering FIN_WAIT1 on the server.

Ok, as to why I'm doing a test where the client sleeps for 15 minutes -
this is an attempt at reproducing a problem that I saw with a client
that wasn't sleeping intentionally, but otherwise the situation
appeared to be the same: the server write() blocked and eventually
timed out, the server tcp session was gone, but the client was stuck in
a read() syscall with the tcp session still in ESTABLISHED state.

Thanks a lot ahead of time for any insights/help!
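For concreteness, here is a hedged, simplified sketch of the client side of the scenario above (Python instead of the real client; the function name, host/port handling, and buffer size are illustrative, and the 15-minute sleep is just a parameter):

```python
import socket
import time

def run_client(host, port, rcvbuf=75000, sleep_s=900):
    """Connect, sleep while the peer keeps sending, then drain the socket.

    Mirrors steps 2-5 and 7 of the scenario: during sleep_s the local
    recvq and the server's sendq fill up; after waking, recv() drains
    the buffer and then blocks indefinitely if the server's FIN/RST
    never arrives.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Note: set before connect() here; where the buggy ordering matters
    # is a separate question explored later in this thread.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    s.connect((host, port))
    time.sleep(sleep_s)            # step 2: client goes quiet
    chunks = []
    while True:                    # steps 5/7: drain, then block in recv()
        data = s.recv(65536)
        if not data:               # orderly FIN from the peer
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks)
```

With a well-behaved peer the final recv() returns b"" on FIN; the bug reported here is that no FIN or RST ever arrives, so the client blocks in that loop forever.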