My server hangs permanently after hitting this internal exception, thrown from lib/cpp/src/thrift/transport/TBufferTransports.h:

    void consume(uint32_t len) {
      countConsumedMessageBytes(len);
      if (TDB_LIKELY(static_cast<ptrdiff_t>(len) <= rBound_ - rBase_)) {
        rBase_ += len;
      } else {
        throw TTransportException(TTransportException::BAD_ARGS,
                                  "consume did not follow a borrow.");
      }
    }

Once this happens, the server becomes unresponsive: every new client connects and then hangs until TCP times out. The thread sitting in epoll_wait acts as if it is ignoring everything that arrives after the connection is established.

It can take anywhere from 10 minutes to 23 hours of heavy use before the hang occurs, so it is hard to predict, and I have no clear test case (if I had one, I could presumably make it hang immediately).
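
To make the failing check concrete, here is a toy model of the read-pointer arithmetic. It is only a sketch and not Thrift's code: ToyBuffer, its borrow() and remaining() are simplified stand-ins of my own, and I am assuming the only state that matters is the rBase_/rBound_ pair. It shows that the exception fires whenever consume() is asked for more bytes than are left between the two pointers, and that the read position is left unchanged when it does.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-in for a buffered read transport (hypothetical, NOT Thrift's class).
class ToyBuffer {
public:
  explicit ToyBuffer(std::vector<uint8_t> data)
    : data_(std::move(data)),
      rBase_(data_.data()),
      rBound_(data_.data() + data_.size()) {}

  // The pointers refer into data_, so copying would leave them dangling.
  ToyBuffer(const ToyBuffer&) = delete;
  ToyBuffer& operator=(const ToyBuffer&) = delete;

  // Returns a pointer to len readable bytes, or nullptr if fewer remain.
  const uint8_t* borrow(uint32_t len) const {
    return static_cast<std::ptrdiff_t>(len) <= rBound_ - rBase_ ? rBase_ : nullptr;
  }

  // Same bounds check as the snippet above: advance the read pointer,
  // or throw if len exceeds what is left in the buffer.
  void consume(uint32_t len) {
    if (static_cast<std::ptrdiff_t>(len) <= rBound_ - rBase_) {
      rBase_ += len;
    } else {
      throw std::invalid_argument("consume did not follow a borrow.");
    }
  }

  // Bytes still available between the read pointer and the bound.
  std::ptrdiff_t remaining() const { return rBound_ - rBase_; }

private:
  std::vector<uint8_t> data_;
  const uint8_t* rBase_;
  const uint8_t* rBound_;
};
```

With a 4-byte buffer, borrow(4) followed by consume(3) succeeds and leaves one byte; a subsequent consume(2) throws. In this model the only way to reach the throw is for the caller's idea of how many bytes are available to disagree with the buffer's pointers.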