Hi,
The ZooKeeper Java client sleeps for a random delay of up to 1 second before trying to reconnect (on every attempt except the first connect).
ClientCnxn$SendThread:

    @Override
    public void run() {
        ...
        while (state.isAlive()) {
            try {
                if (!clientCnxnSocket.isConnected()) {
                    if (!isFirstConnect) {
                        try {
                            Thread.sleep(r.nextInt(1000));
                        } catch (InterruptedException e) {
                            LOG.warn("Unexpected exception", e);
                        }

This creates "outages" of up to 1 second (even with a simple retry on
ConnectionLoss) when the cluster is perfectly healthy, for example
during a rolling restart. In our case this can be a problem under high
load, because it causes a spike in the number of requests waiting on a
ZooKeeper operation.
Would it be a better strategy to make at least the first reconnect
attempt immediately? Or is there more to it?
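
To make the question concrete, here is a minimal sketch of the policy I have in mind. It is not a patch against ClientCnxn: the class ReconnectDelay, the method delayMillis and the attempt counter are hypothetical names used only for illustration. The only thing taken from the existing code is the 1000 ms bound from r.nextInt(1000).

```java
import java.util.Random;

/**
 * Hypothetical helper (not part of ZooKeeper): decides how long to sleep
 * before a reconnect attempt. The first attempt after a connection loss
 * is immediate; later attempts keep the random delay of up to 1 second.
 */
final class ReconnectDelay {
    // Same upper bound as r.nextInt(1000) in the snippet above.
    private static final int MAX_JITTER_MS = 1000;

    private ReconnectDelay() {
    }

    /**
     * @param attempt number of reconnect attempts already made since the
     *                connection was lost (0 for the first attempt)
     * @param random  source of randomness for the jitter
     * @return delay in milliseconds, 0 for the first attempt,
     *         otherwise a value in [0, 1000)
     */
    static int delayMillis(int attempt, Random random) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        if (attempt == 0) {
            return 0; // reconnect immediately once
        }
        return random.nextInt(MAX_JITTER_MS); // keep the existing jitter
    }
}
```

The send thread would then call something like Thread.sleep(ReconnectDelay.delayMillis(attempt, r)) and reset the attempt counter after a successful connect. I assume the random delay exists to keep many clients from reconnecting at the same instant, so the sketch keeps it for every attempt after the first.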
Regards,
Sergei