Hello,

I am running some simple tests of the connection state listener behavior. I use a regular 3-node ensemble with 1 node down, and I start/stop a second node to trigger an outage of the ensemble.
I use:
- connection timeout: 18 seconds
- session timeout: 72 seconds
- retry interval: 5 seconds

Case 0: no retry
- the switch SUSPENDED -> LOST takes less than a second
- the background retry goes on for 18 seconds

Case 1: 1 retry
- the switch SUSPENDED -> LOST takes 7 seconds
- the background retry goes on for 41 seconds

Case 2: 2 retries
- the switch SUSPENDED -> LOST takes 12 seconds
- the background retry goes on for 64 seconds

I expected both numbers to match, i.e. I thought we received a LOST event when Curator gave up retrying. But apparently the duration of the background retries is:

connectionTimeout * nbAttempts + retryInterval * (nbAttempts - 1), where nbAttempts = nbRetries + 1

Why is it tied to the connectionTimeout, given that the connection fails well before that (cases 0, 1 and 2 all reach the LOST state in less than 18 seconds)?

According to http://curator.apache.org/errors.html, LOST means that "the connection is confirmed to be lost." So the LOST state is when I lose my ephemeral nodes, for example. Is that correct? If so, why does the SUSPENDED -> LOST delay depend on whether we have 0, 1 or 2 retries?

Thanks for your insights,
Benjamin
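The empirical fit for the background-retry duration can be checked against the three measured cases with a small Java sketch. The class and method names are hypothetical and not part of Curator. The formula is only a fit to these tests, not documented Curator behavior. The sketch counts attempts as retries + 1 and assumes a full connection timeout per attempt, plus one retry interval between consecutive attempts.

```java
// Hypothetical helper, not part of Curator: compares an empirical fit for the
// total background-retry duration with the durations measured in cases 0, 1 and 2.
public class RetryDurationFit {

    // Assumed fit: each attempt (the initial one plus each retry) waits the full
    // connection timeout, and one retry interval separates consecutive attempts.
    static int expectedBackgroundSeconds(int connectionTimeoutSec, int retryIntervalSec, int retries) {
        int attempts = retries + 1;
        return connectionTimeoutSec * attempts + retryIntervalSec * (attempts - 1);
    }

    public static void main(String[] args) {
        int connectionTimeout = 18;    // seconds, as configured in the tests
        int retryInterval = 5;         // seconds, as configured in the tests
        int[] measured = {18, 41, 64}; // observed background-retry durations for 0, 1, 2 retries
        for (int retries = 0; retries < measured.length; retries++) {
            int expected = expectedBackgroundSeconds(connectionTimeout, retryInterval, retries);
            System.out.printf("retries=%d expected=%ds measured=%ds%n",
                    retries, expected, measured[retries]);
        }
    }
}
```

Running it prints matching expected and measured values for all three cases. Under this counting, the background-retry duration grows with the connection timeout, while the SUSPENDED -> LOST delays (under 1 s, 7 s, 12 s) do not.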
