I'm running Java servers based on TThreadedSelectorServer and THsHaServer, and both seem to be leaking sockets (Thrift 0.9.0).
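For reference, the setup looks roughly like this. This is only a minimal sketch: MyService and MyServiceHandler stand in for the generated service and my handler, and the port and thread counts here are illustrative rather than the real values.

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.server.TThreadedSelectorServer;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TNonblockingServerSocket;

public class ServerMain {
    public static void main(String[] args) throws Exception {
        // MyService / MyServiceHandler are placeholders for the generated
        // service and its implementation.
        MyService.Processor<MyServiceHandler> processor =
                new MyService.Processor<MyServiceHandler>(new MyServiceHandler());

        // Nonblocking server socket; port is illustrative.
        TNonblockingServerSocket transport = new TNonblockingServerSocket(9090);

        TThreadedSelectorServer.Args serverArgs =
                new TThreadedSelectorServer.Args(transport)
                        .processor(processor)
                        .protocolFactory(new TBinaryProtocol.Factory())
                        .transportFactory(new TFramedTransport.Factory())
                        .selectorThreads(2)
                        .workerThreads(16);

        new TThreadedSelectorServer(serverArgs).serve();
    }
}

The THsHaServer variant is set up the same way, just with THsHaServer.Args instead.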
Googling around for answers, I keep running into https://issues.apache.org/jira/browse/THRIFT-1653, which puts the blame on the server's TCP configuration while acknowledging that a problem in the application layer may also exist (see the last entry). I'd prefer not to mess with the TCP config on this machine because it's used for various tasks, and I did not have these issues with a TThreadPoolServer and TSocket (blocking + TBufferedTransport), or with any non-Thrift server on the same machine.

What happens is I get a bunch of TCP connections stuck in the CLOSE_WAIT state, and they remain there indefinitely. Even more concerning, I get many sockets that don't show up in netstat at all; only lsof shows that they exist. On Linux, lsof lists them as "can't identify protocol". According to https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/ these sockets are in a half-closed state that the Linux kernel doesn't know what to do with.

I'm fairly sure there's a problem with misbehaving clients, but the server should not leak resources because of a client-side bug. My only recourse so far is a cron job that parses the lsof output and restarts the server whenever the socket count gets dangerously close to the "too many open files" limit (8192 in my case).

Any ideas?

--
jules cisek | [email protected]
