Hi,
The ZooKeeper client waits a random delay of up to 1 second before trying to reconnect:

ClientCnxn$SendThread
        @Override
        public void run() {
            ...
            while (state.isAlive()) {
                try {
                    if (!clientCnxnSocket.isConnected()) {
                        if(!isFirstConnect){
                            try {
                                Thread.sleep(r.nextInt(1000));
                            } catch (InterruptedException e) {
                                LOG.warn("Unexpected exception", e);
                            }

This creates "outages" of up to 1 second (even with a simple retry on ConnectionLoss), even when the cluster is perfectly healthy, for example during a rolling restart. In our case this can be a problem under high load, because it causes a spike in the number of requests waiting on a ZooKeeper operation. Would it be a better strategy to make the first reconnect attempt immediately, at least once? Or is there more to it?
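
To make the question concrete, here is a minimal sketch of the policy I have in mind. It is not ZooKeeper code: the class ReconnectDelay and its methods nextDelayMs() and onConnected() are hypothetical names made up for illustration. The first attempt after a disconnect gets no delay, and later attempts fall back to the random delay of up to 1 second.

```java
import java.util.Random;

/**
 * Hypothetical helper (not part of ZooKeeper): decides how long to pause
 * before each reconnect attempt.
 */
public class ReconnectDelay {
    private final Random random;
    private final int maxDelayMs;
    // Number of reconnect attempts made since the connection was last up.
    private int attemptsSinceDisconnect = 0;

    public ReconnectDelay(Random random, int maxDelayMs) {
        if (maxDelayMs <= 0) {
            throw new IllegalArgumentException("maxDelayMs must be positive");
        }
        this.random = random;
        this.maxDelayMs = maxDelayMs;
    }

    /**
     * Returns 0 for the first attempt after a disconnect, then a random
     * delay in [0, maxDelayMs) for every following attempt.
     */
    public int nextDelayMs() {
        int attempt = attemptsSinceDisconnect++;
        return attempt == 0 ? 0 : random.nextInt(maxDelayMs);
    }

    /** Call once connected, so the next disconnect retries immediately again. */
    public void onConnected() {
        attemptsSinceDisconnect = 0;
    }
}
```

In the loop above, this would replace the unconditional Thread.sleep(r.nextInt(1000)) with Thread.sleep(delay.nextDelayMs()), assuming onConnected() is called whenever the session reconnects. The random delay on later attempts is kept, since I assume it is there to stop clients from all reconnecting at the same moment.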

Regards,
Sergei



