Hello Michael,

Thanks for all this information.
I corrected the point you suggested (closing the parent process's copy of the sockets). I also activated keepalive, with values adapted to my application. I hope this will solve my issue, but as the problem may take several weeks to occur, I will not know immediately whether this was the origin :-)

Many thanks for your help.

Regards,
Brice

On Fri, 13 Nov 2020 at 18:52, Michael Wojcik <michael.woj...@microfocus.com> wrote:
>
> > From: Brice André <br...@famille-andre.be>
> > Sent: Friday, 13 November, 2020 09:13
>
> > > "Does the server parent process close its copy of the conversation socket?"
> >
> > I checked in my code, but it seems not. Is it needed?
>
> You'll want to do it, for a few reasons:
>
> - You'll be leaking descriptors in the server, and eventually it will hit
>   its limit.
>
> - If the child process dies without cleanly closing its end of the conversation,
>   the parent will still have an open descriptor for the socket, so the network
>   stack won't terminate the TCP connection.
>
> - A related problem: if the child just closes its socket without calling
>   shutdown, no FIN will be sent to the client system (because the parent still
>   has its copy of the socket open). The client system will have the connection
>   in one of the termination states (FIN_WAIT, maybe? I don't have my references
>   handy) until it times out.
>
> - A bug in the parent process might cause it to operate on the connected
>   socket, causing unexpected traffic on the connection.
>
> - All such sockets will be inherited by future child processes, and one of them
>   might erroneously perform some operation on one of them. Obviously there
>   could also be a security issue with this, depending on what your application
>   does.
>
> Basically, when a descriptor is "handed off" to a child process by forking, you
> generally want to close it in the parent, unless it's used for parent-child
> communication. (There are some cases where the parent wants to keep it open for
> some reason, but they're rare.)
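The "no FIN until every copy is closed" behaviour described above can be demonstrated with a short, self-contained sketch. This is my own illustration, not code from the thread: it uses Python's `socket` and `os` modules, which are thin wrappers over the same POSIX calls, and `os.dup()` stands in for the extra descriptor copy that `fork()` would leave in the parent.

```python
# The peer only sees EOF once *every* copy of a socket descriptor is
# closed.  os.dup() plays the role of the parent's copy after fork().
import os
import socket

child_end, peer = socket.socketpair()          # AF_UNIX stream pair

parents_copy = os.dup(child_end.fileno())      # the "parent's" copy

child_end.close()                              # the "child" closes its end ...
peer.settimeout(0.2)
try:
    data = peer.recv(1)                        # ... but no EOF arrives yet:
    got_eof_early = (data == b"")              # the other copy holds the
except socket.timeout:                         # connection open, so recv()
    got_eof_early = False                      # just times out with no data
assert not got_eof_early

os.close(parents_copy)                         # last copy closed ...
assert peer.recv(1) == b""                     # ... and now EOF is delivered
peer.close()
```

The same applies to a real TCP conversation socket: the client will not see the connection terminate until the parent closes its duplicate, which is exactly the FIN_WAIT-style hang Michael describes.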
>
> On a similar note, if you exec a different program in the child process (I
> wasn't sure from your description), it's a good idea for the parent to set the
> FD_CLOEXEC option (with fcntl) on its listening socket and any other
> descriptors that shouldn't be passed along to child processes. You could close
> these manually in the child process between the fork and exec, but FD_CLOEXEC
> is often easier to maintain.
>
> For some applications, you might just dup2 the socket over descriptor 0 or
> descriptor 3, depending on whether the child needs access to stdio, and then
> close everything higher.
>
> Closing descriptors not needed by the child process is a good idea even if you
> don't exec, since it can prevent various problems and vulnerabilities that
> result from certain classes of bugs. It's a defensive measure.
>
> The best source for this sort of recommendation, in my opinion, remains W.
> Richard Stevens' /Advanced Programming in the UNIX Environment/. The book is
> old, and Linux isn't UNIX, but I don't know of any better explanation of how
> and why to do things in a UNIX-like OS.
>
> And my favorite source of TCP/IP information is Stevens' /TCP/IP Illustrated/.
>
> > May it explain my problem?
>
> In this case, I don't offhand see how it does, but I may be overlooking
> something.
>
> > I suppose that, if for some reason, the communication with the client is
> > lost (crash of client, loss of network, etc.) and keepalive is not enabled,
> > this may fully explain my problem?
>
> It would give you those symptoms, yes.
>
> > If yes, do you have an idea of why keepalive is not enabled?
>
> The Host Requirements RFC mandates that it be disabled by default. I think the
> primary reasoning for that was to avoid re-establishing virtual circuits (e.g.
> dial-up connections) for long-running connections that had long idle periods.
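Returning to the FD_CLOEXEC suggestion earlier in the quote, here is a small sketch of the fcntl dance (again my own illustration, using Python's `fcntl` module, which wraps the C `fcntl()` call directly):

```python
# Mark a listening socket close-on-exec so it is not inherited across
# exec().  This mirrors the F_GETFD/F_SETFD sequence C code would use.
import fcntl
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

flags = fcntl.fcntl(listener.fileno(), fcntl.F_GETFD)
fcntl.fcntl(listener.fileno(), fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

# Read the flag back to confirm it stuck.
cloexec_set = bool(fcntl.fcntl(listener.fileno(), fcntl.F_GETFD)
                   & fcntl.FD_CLOEXEC)
assert cloexec_set
listener.close()
```

(Note that Python 3.4+ already creates descriptors non-inheritable by default per PEP 446; the explicit sequence above is what C code has to do by hand.)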
>
> Linux may well have a kernel tunable or similar to enable TCP keepalive by
> default, but it seems to be switched off on your system. You'd have to consult
> the documentation for your distribution, I think.
>
> By default (again per the Host Requirements RFC), it takes quite a long time
> for TCP keepalive to detect a broken connection. It doesn't start probing
> until the connection has been idle for 2 hours, and then you have to wait for
> the TCP retransmit timer times the retransmit count to be exhausted -
> typically over 10 minutes. Again, some OSes let you change these defaults, and
> some let you change them on an individual connection.
>
> --
> Michael Wojcik
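For reference, this is roughly how keepalive can be enabled and its timers shortened on an individual connection. This is my sketch, not the code from my application; the `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` option names are Linux-specific (other OSes spell them differently or don't expose them), hence the `hasattr` guard, and the values shown are illustrative rather than recommendations:

```python
# Enable TCP keepalive on one socket and (on Linux) tighten its timers.
# Without the per-socket overrides, the RFC 1122 defaults apply: the
# first probe is sent only after 2 hours of idleness.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-specific option names
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before the connection is dropped

keepalive_on = bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
assert keepalive_on
s.close()
```

With settings like these, a dead peer is detected after roughly idle + interval × count seconds instead of the multi-hour default.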