I see. So, this is a slow network. You’d like a heuristic that puts Curator 
into SUSPENDED mode when the network performance drops. Sounds interesting to 
me. 

-JZ

From: Jeremy Stribling
Reply: Jeremy Stribling [email protected]
Date: February 27, 2014 at 11:15:46 AM
To: Jordan Zimmerman [email protected], [email protected] 
[email protected]
Subject: Re: adding a "network timeout" to curator?

Please correct me if I'm wrong, but I thought Curator went into SUSPENDED mode 
when it gets a Disconnected state event from its ZK client.  That is not 
necessarily the same as a network issue, because that ZK keepalive could be 
stuck in the ZK server processing queue, blocked on a slow disk.  What I'm 
proposing would be a true, network-only timeout that could be used to declare a 
client disconnected quickly if there's a network issue, without having to 
reduce the ZK session timeout so low that a slow disk would cause false 
negatives.  Does that make sense?

Jeremy

On 02/26/2014 09:25 PM, Jordan Zimmerman wrote:
Curator should already go into SUSPENDED when there is a connection issue, 
right? How would this be different?

-JZ

From: Jeremy Stribling
Reply: [email protected] [email protected]
Date: February 26, 2014 at 7:56:26 AM
To: [email protected] [email protected]
Subject: adding a "network timeout" to curator?

Hi all,

I started a thread on the ZK list a while back about timeouts in ZK.
You can find it in the archives here:

http://mail-archives.apache.org/mod_mbox/zookeeper-user/201309.mbox/%[email protected]%3E

The basic idea is that when ZK is running on a node with slow disks
(e.g., in a VM), you might want to set your session timeout to a long
value (e.g., 30 seconds or 60 seconds), but still detect network
timeouts quickly. On that thread, Michi proposed using 'ruok' commands
from the client to test network connectivity, along with the normal
client pings happening in the background to detect server slowness.

I was wondering if this would make sense to provide as part of the
Curator Framework or Client. There could be some background thread
sending 'ruok' commands to whatever server the client is connected to,
and going into SUSPENDED (or LOST?) mode when it hits a timeout or gets
a failure back. We might be able to implement something like that here
and contribute it back, if it sounds interesting to other people and we
can agree on a design. Any thoughts?

Jeremy
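
For illustration, the background 'ruok' check proposed above could be sketched roughly as follows. This is a minimal sketch and not part of Curator: the class name RuokProbe, the method names, and the onFailure callback are hypothetical, and how a failure would map onto SUSPENDED or LOST is exactly the design question left open in the thread. It relies only on ZooKeeper's four-letter 'ruok' command, which a healthy server answers with 'imok'.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not a Curator or ZooKeeper API.
public class RuokProbe {

    /**
     * Sends "ruok" to host:port and returns true only if the server
     * replies "imok". The connect and each blocking read are individually
     * bounded by timeoutMs; any I/O error or timeout counts as a failure.
     */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read up to four bytes; the server closes the socket after replying.
            InputStream in = socket.getInputStream();
            byte[] reply = new byte[4];
            int read = 0;
            while (read < reply.length) {
                int n = in.read(reply, read, reply.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == reply.length
                    && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Covers connection refused, connect timeout, and read timeout.
            return false;
        }
    }

    /**
     * Probes the server every periodMs on a background thread and runs
     * onFailure whenever a probe fails. What onFailure does (for example,
     * signalling a SUSPENDED-like state) is left to the caller.
     */
    public static ScheduledExecutorService start(
            String host, int port, int timeoutMs, long periodMs, Runnable onFailure) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            if (!isReachable(host, port, timeoutMs)) {
                onFailure.run();
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

A caller would pick a probe timeout well below the session timeout, e.g. RuokProbe.start(host, 2181, 2000, 1000, callback) alongside a 30-second session, and call shutdownNow() on the returned scheduler when the client closes. Two caveats: the probe would need to follow whichever server the client is currently connected to, and some ZooKeeper versions may require four-letter commands to be explicitly enabled on the server.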
