[ https://issues.apache.org/jira/browse/THRIFT-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Wilson-Brown updated THRIFT-748:
------------------------------------

    Description:
If a Thrift C++ client opens a TSocket, writes some data, then calls fork(), the child process can terminate the parent process's connection by deleting its copy of the parent's TSocket.
In particular, the default setting of lingerOn_ = 1 causes a RST to be sent by close(socket_) in TSocket->close().

Discussion:
This behaviour is identical to the behaviour of unix sockets when SO_LINGER is set (implementations vary).
However, the SO_LINGER default for sockets is off, not on, so TSocket behaves unexpectedly.

This design choice makes it really difficult to program a Thrift client that forks other clients in C++, as the first process to call TSocket->close() terminates all copies of the connection. Every process has to call TSocket->setLinger(0,0) or (1,timeout) before deleting the TSocket, closing the TSocket, or exiting. (This workaround only succeeds with the suggested fix in [#THRIFT-747].)

However, the design choice also prevents deadlock/slowdown issues where a forked process holds open a copy of the parent's Thrift connections. It also makes close non-blocking, which is ideal in a destructor.

The design choice may also be an attempt to implement the "block to send, then close" behaviour described in http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
However, the default linger interval of 0 turns the linger setting into a hard reset.
And in the absence of linger, the kernel can usually send small Thrift messages by itself.
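For reference, the linger settings discussed above correspond to the POSIX SO_LINGER socket option. The following is a minimal sketch on a raw socket, not Thrift code: applyLinger and readLinger are hypothetical helper names introduced for illustration, and the behaviour noted in the comments is the commonly documented one (implementations vary, as noted above).

```cpp
#include <sys/socket.h>
#include <cassert>

// Hypothetical helper (not part of Thrift): apply an SO_LINGER setting
// to a raw socket descriptor. Returns true on success.
//   on = false            : close() returns at once, the kernel keeps
//                           trying to deliver queued data in the background
//   on = true, seconds = 0: close() typically discards unsent data and
//                           resets the connection (RST)
//   on = true, seconds > 0: close() may block up to 'seconds' while
//                           queued data is sent
bool applyLinger(int fd, bool on, int seconds) {
  assert(seconds >= 0);  // negative timeouts are not meaningful here
  struct linger l;
  l.l_onoff = on ? 1 : 0;
  l.l_linger = seconds;
  return setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == 0;
}

// Hypothetical helper: read the current SO_LINGER setting back.
bool readLinger(int fd, struct linger* out) {
  socklen_t len = sizeof(*out);
  return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len) == 0;
}
```

In these terms, the TSocket default described in this issue corresponds to applyLinger(fd, true, 0), and the plain socket default corresponds to applyLinger(fd, false, 0).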

Options:
* Change the default lingerOn to 0 - rely on the kernel to resend a limited number of times
* Change the default lingerVal to > 0 - a large value like INT_MAX would match the default connection, send, and recv 'no timeout' behaviour

TODO:
* Confirm issue on Linux - see attached test code
* Decide if a change to the defaults is needed
* Document workaround after resolution of [#THRIFT-747] - call TSocket->setLinger(0,0) or (1,timeout) if forking

  was:
If a Thrift C++ client opens a TSocket, writes some data, then calls fork(), the child process can terminate the parent process's connection by deleting its copy of the parent's TSocket.
In particular, the default setting of lingerOn_ = 1 causes a RST to be sent by close(socket_) in TSocket->close().

Discussion:
This behaviour is identical to the behaviour of unix sockets when SO_LINGER is set (implementations vary).
However, the SO_LINGER default for sockets is off, not on, so TSocket behaves unexpectedly.

This design choice makes it really difficult to program a Thrift client that forks other clients in C++, as the first process to call TSocket->close() terminates all copies of the connection. Every process has to call TSocket->setLinger(0,0) before deleting the TSocket, closing the TSocket, or exiting. (This workaround only succeeds with the suggested fix in [#THRIFT-747].)

However, the design choice also prevents deadlock/slowdown issues where a forked process holds open a copy of the parent's Thrift connections. It also makes close non-blocking, which is ideal in a destructor.

Options:
Do we want to change the default? What is linger useful for?

TODO:
* Confirm issue on Linux - see attached test code
* Decide if a code change is needed
* Document workaround after resolution of [#THRIFT-747] - call TSocket->setLinger(0,0) if forking


Added notes about article at http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable describing reliable TCP communication

> C++ TSocket default linger setting breaks forked parent process
> ---------------------------------------------------------------
>
>                 Key: THRIFT-748
>                 URL: https://issues.apache.org/jira/browse/THRIFT-748
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.2, 0.3
>         Environment: Cygwin 1.7.1 on Windows XP SP3, Thrift 0.2.0 & r760184 & Trunk
>            Reporter: Tim Wilson-Brown
>            Priority: Trivial
>         Attachments: thrift_linger_example.cpp
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.