Re: DNS Failures

Josh Elser Sun, 09 Nov 2014 16:54:57 -0800

This is a known deficiency that exists in the current API; the implementation tends to retry indefinitely and quickly.

This tends to work well when the services are functioning or failing "normally". If your DNS failure is transient, you should recover automatically, but if it's an extended failure, you'll sit there like you're observing.

It's hard to draw the line between "expected" or recoverable failures and failures that you want to propagate back to your client. I'm not sure if this is something that's planned to be addressed in the new client API or not (https://issues.apache.org/jira/browse/ACCUMULO-2589).
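
To make that trade-off concrete, here is a minimal sketch of one way an application could draw the line itself: retry a hostname lookup a bounded number of times, then propagate the UnknownHostException instead of looping forever. This is not how the Accumulo client behaves and it is not part of any Accumulo API; the class name BoundedResolve, the Resolver hook, and the maxAttempts and pauseMillis parameters are all hypothetical names chosen for illustration. Only java.net.InetAddress.getByName is a real library call.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Accumulo: bounded retry around a DNS lookup.
final class BoundedResolve {

    /** Hypothetical hook so the lookup can be replaced (for example, in tests). */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Tries to resolve the host up to maxAttempts times, sleeping pauseMillis
     * between failed attempts. A transient failure is absorbed by the retries;
     * an extended failure surfaces as the last UnknownHostException.
     */
    public static InetAddress resolveOrFail(String host, int maxAttempts,
                                            long pauseMillis, Resolver resolver)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                // Pause only if another attempt remains.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        // All attempts failed: propagate to the caller rather than hang.
        throw last;
    }

    /** Convenience overload that uses the JVM's normal resolver. */
    public static InetAddress resolveOrFail(String host, int maxAttempts,
                                            long pauseMillis)
            throws UnknownHostException, InterruptedException {
        return resolveOrFail(host, maxAttempts, pauseMillis, InetAddress::getByName);
    }
}
```

A caller could run a check like this against the tablet server hostnames before issuing a scan, and treat the exception as a signal to fail the request. Where to set the attempt limit is exactly the judgment call described above.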


Ariel Valentin wrote:

We have a very peculiar situation, where a DNS failure is causing our
application to hang.

Based on the trace debugging logs it appears that the ThriftScanner
encounters a TTransportException, which was caused by an
UnknownHostException. It seems to then retry a few seconds later.

http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.accumulo/accumulo-core/1.6.0-cdh4.6.0/org/apache/accumulo/core/client/impl/ThriftScanner.java/#124

https://gist.github.com/arielvalentin/794415d1744e52984d0d

After tracing the code a bit, I realized that we could mitigate the
"hanging" by setting a timeout on our scans/writes; however, I would
prefer that the client fail faster if it cannot resolve the
hostnames of the TServers it found in ZooKeeper.

Thoughts? Concerns? Opinions?

Ariel Valentin
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect
