My server hangs permanently after it hits this internal exception, thrown from
lib/cpp/src/thrift/transport/TBufferTransports.h:

  void consume(uint32_t len) {
    countConsumedMessageBytes(len);
    if (TDB_LIKELY(static_cast<ptrdiff_t>(len) <= rBound_ - rBase_)) {
      rBase_ += len;
    } else {
      throw TTransportException(TTransportException::BAD_ARGS,
                                "consume did not follow a borrow.");
    }
  }

Once this happens, the server becomes unresponsive: new clients can still
connect, but then hang until TCP times out. The thread blocked in epoll_wait
behaves as if it ignores all activity on a connection once it is established.
The hang can take anywhere from 10 minutes to 23 hours of heavy use to appear,
so it's hard to predict, and I have no reliable test case (if I had one, I
could presumably make it hang immediately).
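To make the invariant behind the quoted consume() check concrete, here is a toy model. It assumes that borrow() exposes the bytes between rBase_ and rBound_, and that consume() may only advance over bytes that are still in that window. ToyReadBuffer is a hypothetical class, not Thrift's TBufferBase, and it does not reproduce the hang; it only shows what condition makes consume() throw.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory-buffered read transport. Member names
// mirror rBase_/rBound_ from the quoted code, but this is NOT Thrift's class.
class ToyReadBuffer {
public:
  explicit ToyReadBuffer(std::vector<std::uint8_t> data)
      : buf_(std::move(data)),
        rBase_(buf_.data()),
        rBound_(buf_.data() + buf_.size()) {}

  // Copying would leave rBase_/rBound_ pointing into the source's buffer.
  ToyReadBuffer(const ToyReadBuffer&) = delete;
  ToyReadBuffer& operator=(const ToyReadBuffer&) = delete;

  // If at least *len bytes are readable, sets *len to the number of readable
  // bytes and returns a pointer to them; otherwise returns nullptr.
  const std::uint8_t* borrow(std::uint32_t* len) {
    std::ptrdiff_t avail = rBound_ - rBase_;
    if (static_cast<std::ptrdiff_t>(*len) <= avail) {
      *len = static_cast<std::uint32_t>(avail);
      return rBase_;
    }
    return nullptr;
  }

  // Advances the read pointer. Asking for more bytes than remain in the
  // [rBase_, rBound_) window is the condition the quoted check rejects.
  void consume(std::uint32_t len) {
    if (static_cast<std::ptrdiff_t>(len) <= rBound_ - rBase_) {
      rBase_ += len;
    } else {
      throw std::invalid_argument("consume did not follow a borrow.");
    }
  }

  // Number of bytes still readable.
  std::size_t available() const {
    return static_cast<std::size_t>(rBound_ - rBase_);
  }

private:
  std::vector<std::uint8_t> buf_;  // declared first so it is built before the pointers
  std::uint8_t* rBase_;
  std::uint8_t* rBound_;
};
```

For example, with a 4-byte buffer, borrowing 2 bytes reports 4 readable bytes, and consume(2) succeeds. A following consume(5) throws because only 2 bytes remain, and the read position is left unchanged.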
