I'm not sure how DNS is involved here; this exception looks like an internal
Thrift error. Unfortunately I don't know enough about Thrift internals to know
how this error can occur, and it appears Thrift doesn't properly recover from
the exception, causing the hang I'm seeing. This looks like an exception that
is never expected to happen.

-----Original Message-----
I'm not sure it is the same problem, but the last time I had a hang in the
TTransport layer it was due to a DNS misconfiguration that led to long delays
in all functions that rely on the DNS resolver.

On Sun, 23 May 2021 at 14:12, Buster, James 
<james.bus...@transunion.com.invalid> wrote:

> My server hangs permanently after hitting this internal exception from
> lib/cpp/src/thrift/transport/TBufferTransports.h:
>
>   void consume(uint32_t len) {
>     countConsumedMessageBytes(len);
>     if (TDB_LIKELY(static_cast<ptrdiff_t>(len) <= rBound_ - rBase_)) {
>       rBase_ += len;
>     } else {
>       throw TTransportException(TTransportException::BAD_ARGS,
>                                 "consume did not follow a borrow.");
>     }
>   }
>
> Once this happens the server becomes unresponsive: all new clients
> connect and then hang until the TCP connection times out.
> The thread stuck in epoll_wait acts as if it's ignoring everything
> after the connection is established. It can take anywhere from 10
> minutes to 23 hours of heavy use before this hang occurs, so it's
> hard to predict and there's no clear test case (if I had one, I could
> presumably make it hang immediately).
>
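
For anyone unfamiliar with the borrow/consume contract that the exception message refers to, here is a minimal sketch under stated assumptions. A buffered read transport lets a caller borrow() a pointer to unread bytes without copying, then consume(len) to advance the read position. consume() is only valid for bytes that are actually buffered, which is the check the quoted code enforces. The class BorrowBuffer and its methods are hypothetical. It mimics that check but is not Thrift's TBufferBase: it omits message-size accounting (countConsumedMessageBytes) and slow-path refills, and it says nothing about why the real server hangs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered read transport (NOT Thrift's class).
class BorrowBuffer {
 public:
  explicit BorrowBuffer(std::vector<uint8_t> data)
      : data_(std::move(data)), rBase_(0), rBound_(data_.size()) {}

  // If at least *len unread bytes are buffered, returns a pointer to them
  // (no copy) and sets *len to the number of bytes available.
  // Otherwise returns nullptr and leaves *len unchanged.
  const uint8_t* borrow(uint32_t* len) {
    std::size_t avail = rBound_ - rBase_;
    if (*len > avail) return nullptr;
    *len = static_cast<uint32_t>(avail);
    return data_.data() + rBase_;
  }

  // Marks len borrowed bytes as read; rejects more than is buffered,
  // mirroring the "consume did not follow a borrow." check.
  void consume(uint32_t len) {
    if (len <= rBound_ - rBase_) {
      rBase_ += len;
    } else {
      throw std::invalid_argument("consume did not follow a borrow.");
    }
  }

  std::size_t remaining() const { return rBound_ - rBase_; }

 private:
  std::vector<uint8_t> data_;
  std::size_t rBase_;   // index of the next unread byte
  std::size_t rBound_;  // one past the last buffered byte
};

// Usage: a correct borrow/consume pair, then an over-consume that is rejected.
void demo_borrow_consume() {
  BorrowBuffer buf(std::vector<uint8_t>{1, 2, 3, 4});
  uint32_t want = 2;
  const uint8_t* p = buf.borrow(&want);  // want becomes 4 (all buffered bytes)
  assert(p != nullptr && p[0] == 1 && want == 4);
  buf.consume(2);                        // within what borrow() exposed
  assert(buf.remaining() == 2);
  bool threw = false;
  try {
    buf.consume(3);                      // more than is buffered
  } catch (const std::invalid_argument&) {
    threw = true;
  }
  assert(threw);
}
```

In this sketch a failed consume() throws but leaves the read position untouched. Whether the caller's connection state is still usable after that exception is exactly what the question above is about.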
