I'm not sure it's the same problem, but the last time I had a hang in the TTransport code, it was caused by a DNS misconfiguration that led to long delays in every function that relies on the DNS resolver.
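In case it helps to rule that out, here is a minimal sketch for timing a single resolver lookup on the affected host. It only uses the POSIX getaddrinfo() call; the names ResolveResult and timeResolve are hypothetical helpers I made up for this mail, not part of Thrift, and the idea that your server resolves names through the system resolver is an assumption on my side.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <chrono>
#include <string>

// Hypothetical helper type (not part of Thrift): result of one timed lookup.
struct ResolveResult {
  int rc;            // getaddrinfo() return code, 0 on success
  long long millis;  // wall-clock time spent inside getaddrinfo()
};

// Hypothetical helper: resolve `host` once through the system resolver
// and report how long the call blocked.
ResolveResult timeResolve(const std::string& host) {
  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6 answers
  hints.ai_socktype = SOCK_STREAM;  // TCP, as a socket-based transport would use

  addrinfo* res = nullptr;
  const auto start = std::chrono::steady_clock::now();
  const int rc = getaddrinfo(host.c_str(), nullptr, &hints, &res);
  const auto end = std::chrono::steady_clock::now();

  if (res != nullptr) {
    freeaddrinfo(res);  // release the result list allocated by getaddrinfo()
  }

  const auto elapsed =
      std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
  return ResolveResult{rc, static_cast<long long>(elapsed.count())};
}
```

Calling timeResolve() with the host names your server and clients actually use, and printing rc (gai_strerror(rc) gives a readable message) and millis, should show whether lookups take seconds rather than milliseconds. If they do, the resolver configuration is worth checking before digging further into the transport code.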

On Sun, 23 May 2021 at 14:12, Buster, James <james.bus...@transunion.com.invalid> wrote:

> My server gets permanently hung after seeing this internal exception, from
> lib/cpp/src/thrift/transport/TBufferTransports.h:
>
>   void consume(uint32_t len) {
>     countConsumedMessageBytes(len);
>     if (TDB_LIKELY(static_cast<ptrdiff_t>(len) <= rBound_ - rBase_)) {
>       rBase_ += len;
>     } else {
>       throw TTransportException(TTransportException::BAD_ARGS, "consume did not follow a borrow.");
>     }
>   }
>
> Once this happens the server becomes unresponsive and all new clients
> connect and then hang until TCP times out.
> The thread stuck in epoll_wait acts as if it's ignoring everything after
> the connection is established. It can take anywhere
> from 10 minutes to 23 hours of heavy use before this hang condition
> occurs, so it's hard to predict and there's no clear
> test case (because if I had one I presumably could make it hang
> immediately).