Looks like there is also
https://issues.apache.org/jira/browse/ACCUMULO-1268 which may be relevant.
Josh Elser wrote:
This is a known deficiency that exists in the current API; the
implementation tends to retry indefinitely and quickly.
This tends to work well when the services are functioning or failing
"normally". If your DNS failure is transient, you should recover
automatically; if it's an extended failure, the client keeps retrying
and appears to hang, which is what you're observing.
It's hard to draw the line between "expected" or recoverable failures
and failures that you want to propagate back to your client. I'm not
sure whether this is planned to be addressed in the new client API
(https://issues.apache.org/jira/browse/ACCUMULO-2589).
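
To make the "where to draw the line" question concrete, here is a minimal
sketch of a retry loop that is bounded by a deadline and lets the caller decide
which failures are worth retrying. It is not Accumulo code: BoundedRetry, its
call method, and all of its parameters are hypothetical names used only for
illustration, and it relies on nothing outside the JDK.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of the Accumulo client API. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls op until it succeeds, fails with an exception the caller marked as
     * non-retryable, or the next pause would run past the deadline.
     */
    static <T> T call(Callable<T> op, Predicate<Exception> retryable,
                      Duration maxWait, Duration pause) throws Exception {
        final long deadline = System.nanoTime() + maxWait.toNanos();
        while (true) {
            try {
                return op.call();
            } catch (Exception e) {
                // Never swallow an interrupt, and propagate anything the
                // caller does not consider recoverable.
                if (e instanceof InterruptedException || !retryable.test(e)) {
                    throw e;
                }
                // Give up once another pause would overrun the deadline.
                if (deadline - System.nanoTime() < pause.toNanos()) {
                    throw e;
                }
                Thread.sleep(pause.toMillis());
            }
        }
    }
}
```

A caller who wants name-resolution failures to surface immediately could pass
a predicate such as e -> !(e instanceof java.net.UnknownHostException), while
a caller who expects DNS to come back could pass e -> true and rely on the
deadline alone.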
Ariel Valentin wrote:
We have a very peculiar situation, where a DNS failure is causing our
application to hang.
Based on the trace debugging logs it appears that the ThriftScanner
encounters a TTransportException, which was caused by an
UnknownHostException. It seems to then retry a few seconds later.
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.accumulo/accumulo-core/1.6.0-cdh4.6.0/org/apache/accumulo/core/client/impl/ThriftScanner.java/#124
https://gist.github.com/arielvalentin/794415d1744e52984d0d
After tracing the code a bit, I realized that we could mitigate the
"hanging" by setting a timeout on our scans/writes. However, I would
prefer that the client fail faster if it cannot resolve the hostnames
of the TServers it found in ZooKeeper.
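
As a rough illustration of the fail-fast behaviour described above, an
application could check that the tablet server hostnames resolve before
handing work to the client. This is only a sketch under assumptions: HostCheck,
Resolver, and requireResolvable are hypothetical names, not Accumulo classes,
and how the application obtains the hostname list is left open.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

/** Hypothetical pre-flight check; not part of the Accumulo client API. */
final class HostCheck {
    /** Minimal lookup abstraction so a fake resolver can be used in tests. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private HostCheck() {}

    /** Checks every host with the JVM's default resolver. */
    static void requireResolvable(List<String> hosts) throws UnknownHostException {
        requireResolvable(hosts, InetAddress::getByName);
    }

    /** Throws UnknownHostException for the first host that does not resolve. */
    static void requireResolvable(List<String> hosts, Resolver resolver)
            throws UnknownHostException {
        for (String host : hosts) {
            // The lookup result is discarded; only success or failure matters.
            resolver.resolve(host);
        }
    }
}
```

Calling HostCheck.requireResolvable(hosts) before a scan would surface an
UnknownHostException to the application instead of leaving it inside a retry
loop. Note that the JVM caches lookup results, so a check like this may not
notice a DNS change right away.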
Thoughts? Concerns? Opinions?
Ariel Valentin
e-mail: [email protected]
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect