Re: Weird ephemeral node issue
Thanks for the explanation! I suggest this goes on the wiki. To summarize:

- the client only finds out about session expiration events when it reconnects to the cluster; if zk tells a client that its session is expired, the ephemerals that correspond to that session will already be cleaned up
- deletion of an ephemeral file due to loss of client connection will occur after the client gets a connection loss
- deletion of an ephemeral file will precede delivery of a session expiration event to the owner

So session expiration means two things here: the server view (ephemeral cleanup) and the client view (event delivery), and there is no guarantee how long it will take in between, correct? I guess the confusion arises from the documentation, which doesn't distinguish these two concepts. E.g. in the javadoc http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html

"An ephemeral node will be removed by the ZooKeeper automatically when the session associated with the creation of the node expires."

This is actually referring to the server view, not the client view.

On Wed, Aug 18, 2010 at 1:12 AM, Ted Dunning wrote:

> Uncharacteristically, I think that Ben's comments could use a little bit of
> amplification.
>
> First, ZK is designed with certain guarantees in mind and almost all
> operational characteristics flow logically from these guarantees.
>
> The guarantee that Ben mentioned here in passing is that if a client gets
> session expiration, it is *guaranteed* that the ephemerals have been cleaned
> up. This guarantee is what drives the notification of session expiration
> after reconnection, since while the client is disconnected, it cannot know if
> the cluster is operating correctly or not and thus cannot know if the
> ephemerals have been cleaned up yet. The only way to have certain knowledge
> that the cluster has cleaned up the ephemerals is to get back in touch with
> an operating cluster.
>
> The client is not completely in the dark.
> As Ben implied, it can know that
> the cluster is unavailable (it got a ConnectionLoss event, after all).
> While the cluster is unavailable and before it gets a session expiration
> notification, the client can go into safe mode.
>
> The moral of this story is that to get the most out of ZK, it is best to
> adopt the same guarantee-based design process that drove ZK in the first
> place. The first step is that you have to decide what guarantees you
> want to provide and then work from ZK's guarantees to get to yours.
>
> In the classic leader-election use of ZK, the key guarantee that we want is:
>
> - the number of leaders is less than or equal to 1
>
> Note that you can't guarantee that the number == 1, because other stuff
> could happen. This has nothing to do with ZK.
>
> The pertinent ZK guarantees are:
>
> - an ephemeral file can only be created by a single session
>
> - deletion of an ephemeral file due to loss of client connection will occur
> after the client gets a connection loss
>
> - deletion of an ephemeral file will precede delivery of a session
> expiration event to the owner
>
> Phrased in terms of CSP-like constructs, the client has events BecomeMaster,
> EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
> according to this grammar:
>
> client := (
>   (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
>   | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
>   | Crash
> )*
>
> To get the guarantees that we want, we can require the client to only do
> BecomeMaster after it creates an ephemeral file and require it to either
> Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
> deleted. The only way that we can do that is to immediately do
> EnterSafeMode on connection loss and then do RelinquishMaster on session
> expiration or ExitSafeMode on connection restored.
> It is involved, but you
> can actually do a proof of correctness from this that shows that your
> guarantee will be honored even in the presence of ZK or the client crashing
> or being partitioned.
>
> On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed wrote:
>
> > there are two things to keep in mind when thinking about this issue:
> >
> > 1) if a zk client is disconnected from the cluster, the client is
> > essentially in limbo. because the client cannot talk to a server it cannot
> > know if its session is still alive. it also cannot close its session.
> >
> > 2) the client only finds out about session expiration events when the
> > client reconnects to the cluster. if zk tells a client that its session is
> > expired, the ephemerals that correspond to that session will already be
> > cleaned up.
> >
> > one of the main design points about zk is that zk only gives correct
> > information. if zk cannot give correct information, it basically says "i
> > don't know". connection loss exceptions and disconnected states are
> > basically "i don't know".
> >
> > generally applications we design go into a "safe" mode, meaning they may
> > serve reads but reject changes, when disconnected from zk and only kill
> > themselves when they find out their session has expired.
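The safe-mode discipline Ted describes can be sketched as a small state machine. This is only an illustration of the event grammar above; the class and method names are mine, and in a real client these transitions would be driven by ZooKeeper watcher events (Disconnected / SyncConnected / Expired) rather than plain method calls:

```python
# Illustrative sketch of the BecomeMaster / EnterSafeMode / ExitSafeMode /
# RelinquishMaster state machine from the discussion above. Names are
# hypothetical, not a real ZooKeeper client API.

class LeaderClient:
    def __init__(self):
        self.is_master = False
        self.safe_mode = False

    def become_master(self):
        # Only call this after successfully creating the ephemeral
        # leader znode (the single-session creation guarantee).
        self.is_master = True
        self.safe_mode = False

    def on_connection_loss(self):
        # EnterSafeMode immediately: the ephemeral may or may not still
        # exist, so we must stop acting as master until we know.
        if self.is_master:
            self.safe_mode = True

    def on_reconnected(self):
        # ExitSafeMode: the session survived, the ephemeral is still ours.
        if self.is_master:
            self.safe_mode = False

    def on_session_expired(self):
        # RelinquishMaster: the ephemeral is guaranteed gone by now.
        self.is_master = False
        self.safe_mode = False

    def may_act_as_master(self):
        return self.is_master and not self.safe_mode
```

The key property, matching the grammar: between a connection loss and either a reconnect or an expiration, `may_act_as_master()` is false, so the "number of leaders <= 1" guarantee is preserved.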
Re: Weird ephemeral node issue
Hi Vishal,

It is in the prod env, and the process has been restarted already :-(. I checked the zookeeper log file (loglevel=ERROR); it is empty.

Here is the ZK config:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/home/admin/TimeTunnel2/zookeeper/zoo
# the port at which the clients will connect
clientPort=32181
server.1=tt2config019072.cm3:32888:33888
server.2=tt2config023132.cm3:32888:33888
server.3=tt2config024079.cm3:32888:33888
server.4=tt2config017052.cm4:32888:33888
server.5=tt2config021101.cm4:32888:33888

Hardware/software config:
Processors: 4 x Xeon E5410 2.33GHz
Memory: 4GB
Network: eth0: 00:16:3e:17:13:48 // Gigabit network
OS: RHEL Server 5.4 (Tikanga), Linux 2.6.18-164.el5xen x86_64, 64-bit
ZooKeeper Release 3.2.2

On Tue, Aug 17, 2010 at 8:40 PM, Vishal K wrote:

> Hi Qing,
>
> Can you list the znodes from the monitor and from the node that the monitor
> is restarting (run zkCli.sh on both machines)?
> I am curious to see if the node that did not receive the SESSION_EXPIRED
> event still has the znode in its database.
> Also can you describe your setup? Can you send out logs and the zoo.cfg file?
> Thanks.
>
> -Vishal
>
> On Tue, Aug 17, 2010 at 3:31 AM, Qing Yan wrote:
>
> > Forgot to mention: the process looks fine, normal memory footprint and CPU
> > usage, and generates expected results; the only thing missing is the
> > ephemeral node in ZK.
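As a side note for anyone reading the config above: initLimit and syncLimit are measured in ticks, not milliseconds, so the effective windows work out as below (a quick illustrative calculation, nothing more):

```python
# The limits in zoo.cfg are counted in ticks; converting the values
# above to milliseconds shows the effective time windows.
tick_time_ms = 2000
init_limit_ticks = 10
sync_limit_ticks = 5

init_window_ms = tick_time_ms * init_limit_ticks  # follower initial-sync window
sync_window_ms = tick_time_ms * sync_limit_ticks  # request/ack window

print(init_window_ms, sync_window_ms)  # 20000 10000
```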
Re: ZK monitoring
You should also take a look at ZOOKEEPER-744 [1] and ZOOKEEPER-799 [2]. The archive from 799 contains ready-to-use scripts for monitoring ZooKeeper using Ganglia, Nagios and Cacti. Let me know if you need more help.

[1] https://issues.apache.org/jira/browse/ZOOKEEPER-744
[2] https://issues.apache.org/jira/browse/ZOOKEEPER-799

On Tue, Aug 17, 2010 at 9:50 PM, Jun Rao wrote:

> Hi,
>
> Is there a way to see the current leader and a list of followers from a
> single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
> commands) only provides info local to a node.
>
> Thanks,
>
> Jun

--
Andrei Savu
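For reference, the 'mntr' four-letter command proposed in ZOOKEEPER-744 emits one key/value pair per line, tab-separated, which is what makes it easy to script against. A minimal parsing sketch (the sample output below is illustrative, not captured from a real server):

```python
# Sketch: parse tab-separated 'mntr' output (ZOOKEEPER-744) into a dict.
def parse_mntr(raw):
    stats = {}
    for line in raw.strip().splitlines():
        key, _, value = line.partition("\t")
        stats[key] = value
    return stats

# Illustrative sample, not real server output.
sample = (
    "zk_version\t3.4.0\n"
    "zk_server_state\tfollower\n"
    "zk_znode_count\t4\n"
)
print(parse_mntr(sample)["zk_server_state"])  # follower
```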
Re: ZK monitoring
It's not possible. You need to query all the servers in order to know which one is the current leader. It should be pretty simple to implement this by parsing the output of the 'stat' 4-letter command.

On Tue, Aug 17, 2010 at 9:50 PM, Jun Rao wrote:

> Hi,
>
> Is there a way to see the current leader and a list of followers from a
> single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
> commands) only provides info local to a node.
>
> Thanks,
>
> Jun

--
Andrei Savu
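A rough sketch of that approach: probe each server with 'stat' and look for the "Mode:" line. The helper names and server list are illustrative; error handling here just skips unreachable servers:

```python
import socket

# Sketch: find the leader by sending 'stat' to every ensemble member
# and parsing the "Mode:" line from each response.
def four_letter_word(host, port, cmd, timeout=5.0):
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(cmd.encode())
        data = b""
        while chunk := s.recv(4096):
            data += chunk
    return data.decode()

def mode_from_stat(stat_output):
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None

def find_leader(servers):
    for host, port in servers:
        try:
            if mode_from_stat(four_letter_word(host, port, "stat")) == "leader":
                return host
        except OSError:
            continue  # server unreachable; keep probing the rest
    return None

# Example (hypothetical hosts):
# find_leader([("zk1.example.com", 2181), ("zk2.example.com", 2181)])
```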
ZK monitoring
Hi, Is there a way to see the current leader and a list of followers from a single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter commands) only provides info local to a node. Thanks, Jun
Re: A question about Watcher
All servers keep a copy - so you can shut down the zk service entirely (all servers) and restart it, and the sessions are maintained.

Patrick

On 08/16/2010 06:34 PM, Qian Ye wrote:

Thanks Mahadev and Benjamin, it seems that I've got some misunderstanding about the client. I will check it out.

Another relevant question: I noticed that the master zookeeper server keeps track of all the client sessions connected to every zookeeper server in the same cluster. So when a slave zookeeper server fails, the clients it served can switch to another zookeeper server and keep their old sessions (the new zookeeper server can get the session information from the master). My question is, if the master fails, does that mean some session information will definitely be lost? thx~

On Tue, Aug 17, 2010 at 12:40 AM, Benjamin Reed wrote:

the client does keep track of the watches that it has outstanding. when it reconnects to a new server it tells the server what it is watching for and the last view of the system that it had.

ben

On 08/16/2010 09:28 AM, Qian Ye wrote:

Thanks for the explanation. Since the watches can be preserved when the client switches the zookeeper server it connects to, does that mean all the watch information will be saved on all the zookeeper servers? I didn't find any indication in the source that the client can hold the watch information.

On Tue, Aug 17, 2010 at 12:21 AM, Ted Dunning wrote:

I should correct this. The watchers will deliver a session expiration event, but since the connection is closed at that point no further events will be delivered and the cluster will remove them. This is as good as the watchers disappearing.

On Mon, Aug 16, 2010 at 9:20 AM, Ted Dunning wrote:

The other is session expiration. Watchers do not survive this. This happens when a client does not provide timely evidence that it is alive and is marked as having disappeared by the cluster.
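A toy model of the client-side bookkeeping Ben describes: the client library remembers its outstanding watches plus the last view (zxid) it saw, and replays that to whichever server it reconnects to. Class and field names here are illustrative, not the real client internals:

```python
# Illustrative sketch of client-side watch bookkeeping: the client,
# not the ensemble, is the durable record of which watches exist, and
# it re-registers them on reconnect. Names are hypothetical.
class WatchRegistry:
    def __init__(self):
        self.data_watches = set()
        self.child_watches = set()
        self.last_zxid = 0  # last view of the system the client saw

    def watch_data(self, path):
        self.data_watches.add(path)

    def watch_children(self, path):
        self.child_watches.add(path)

    def reconnect_payload(self):
        # On reconnect, the client sends its watch lists plus last_zxid
        # so the new server can fire watches for changes it missed.
        return {
            "relative_zxid": self.last_zxid,
            "data_watches": sorted(self.data_watches),
            "child_watches": sorted(self.child_watches),
        }
```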
Re: client failure detection in ZK
Generally it should be determined by your requirements around failure detection/recovery. The higher you set it, the less susceptible you are to intermittent failures (brief network outages, say, or GC pauses on the client). However, this means that it takes longer to discover/recover from a real failure. The lower you set it, the faster you'll discover/recover from a real failure, but you also have the potential to see more "false positives". Setting this really depends on your use case(s) -- your application requirements. Typically I see between 5 and 30 seconds being used.

Patrick

On 08/17/2010 08:51 AM, Jun Rao wrote:

Thanks. Also, suppose that I know the average network latency, what's the rule of thumb for setting the value of the session timeout?

Jun

On Mon, Aug 16, 2010 at 1:55 PM, Patrick Hunt wrote:

The session timeout is used for this:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions

Patrick

On 08/16/2010 01:47 PM, Jun Rao wrote:

Hi, What config parameters in ZK determine how soon a failed client is detected?

Thanks, Jun
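One more constraint worth knowing when picking a value: by default the server clamps the client's requested timeout to between 2 and 20 ticks (the minSessionTimeout/maxSessionTimeout defaults), so with tickTime=2000 anything outside 4-40 seconds gets adjusted during session negotiation. A small sketch of that clamping:

```python
# Sketch of the default session-timeout negotiation: the server bounds
# the requested timeout by [2 * tickTime, 20 * tickTime].
def negotiated_session_timeout(requested_ms, tick_time_ms=2000):
    lo = 2 * tick_time_ms
    hi = 20 * tick_time_ms
    return max(lo, min(hi, requested_ms))

print(negotiated_session_timeout(1000))    # 4000 (raised to the floor)
print(negotiated_session_timeout(30000))   # 30000 (within range)
print(negotiated_session_timeout(120000))  # 40000 (capped at the ceiling)
```

So a client asking for a 2-minute timeout against a default-configured ensemble actually gets 40 seconds; checking the negotiated value after connecting avoids surprises.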
Re: Weird ephemeral node issue
Uncharacteristically, I think that Ben's comments could use a little bit of amplification.

First, ZK is designed with certain guarantees in mind and almost all operational characteristics flow logically from these guarantees.

The guarantee that Ben mentioned here in passing is that if a client gets session expiration, it is *guaranteed* that the ephemerals have been cleaned up. This guarantee is what drives the notification of session expiration after reconnection: while the client is disconnected, it cannot know if the cluster is operating correctly or not, and thus cannot know if the ephemerals have been cleaned up yet. The only way to have certain knowledge that the cluster has cleaned up the ephemerals is to get back in touch with an operating cluster.

The client is not completely in the dark. As Ben implied, it can know that the cluster is unavailable (it got a ConnectionLoss event, after all). While the cluster is unavailable and before it gets a session expiration notification, the client can go into safe mode.

The moral of this story is that to get the most out of ZK, it is best to adopt the same guarantee-based design process that drove ZK in the first place. The first step is that you have to decide what guarantees you want to provide and then work from ZK's guarantees to get to yours.

In the classic leader-election use of ZK, the key guarantee that we want is:

- the number of leaders is less than or equal to 1

Note that you can't guarantee that the number == 1, because other stuff could happen. This has nothing to do with ZK.
The pertinent ZK guarantees are:

- an ephemeral file can only be created by a single session
- deletion of an ephemeral file due to loss of client connection will occur after the client gets a connection loss
- deletion of an ephemeral file will precede delivery of a session expiration event to the owner

Phrased in terms of CSP-like constructs, the client has events BecomeMaster, EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur according to this grammar:

client := (
  (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
  | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
  | Crash
)*

To get the guarantees that we want, we can require the client to only do BecomeMaster after it creates an ephemeral file, and require it to do either Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is deleted. The only way we can do that is to immediately do EnterSafeMode on connection loss and then do RelinquishMaster on session expiration or ExitSafeMode on connection restored. It is involved, but you can actually do a proof of correctness from this that shows that your guarantee will be honored even in the presence of ZK or the client crashing or being partitioned.

On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed wrote:

> there are two things to keep in mind when thinking about this issue:
>
> 1) if a zk client is disconnected from the cluster, the client is
> essentially in limbo. because the client cannot talk to a server it cannot
> know if its session is still alive. it also cannot close its session.
>
> 2) the client only finds out about session expiration events when the
> client reconnects to the cluster. if zk tells a client that its session is
> expired, the ephemerals that correspond to that session will already be
> cleaned up.
>
> one of the main design points about zk is that zk only gives correct
> information. if zk cannot give correct information, it basically says "i
> don't know".
> connection loss exceptions and disconnected states are
> basically "i don't know".
>
> generally applications we design go into a "safe" mode, meaning they may
> serve reads but reject changes, when disconnected from zk and only kill
> themselves when they find out their session has expired.
>
> ben
>
> ps - session information is replicated to all zk servers, so if a leader
> dies, all replicas know the sessions that are currently active and their
> timeouts.
>
> On 08/16/2010 09:03 PM, Ted Dunning wrote:
>
>> Ben or somebody else will have to repeat some of the detailed logic for
>> this, but it has to do with the fact that you can't be sure what has
>> happened during the network partition. One possibility is the one you
>> describe, but another is that the partition happened because a majority
>> of the ZK cluster lost power and you can't see the remaining nodes.
>> Those nodes will continue to serve any files in a read-only fashion. If
>> the partition involves you losing contact with the entire cluster at the
>> same time a partition of the cluster into a quorum and a minority
>> happens, then your ephemeral files could continue to exist at least
>> until the breach in the cluster itself is healed.
>>
>> Suffice it to say that there are only a few strategies that leave you
>> with a coherent picture of the universe. Importantly, you shouldn't
>> assume that the ephemerals will disappear at the same time as the
>> session expiration event is delivered.
Re: Weird ephemeral node issue
there are two things to keep in mind when thinking about this issue:

1) if a zk client is disconnected from the cluster, the client is essentially in limbo. because the client cannot talk to a server it cannot know if its session is still alive. it also cannot close its session.

2) the client only finds out about session expiration events when the client reconnects to the cluster. if zk tells a client that its session is expired, the ephemerals that correspond to that session will already be cleaned up.

one of the main design points about zk is that zk only gives correct information. if zk cannot give correct information, it basically says "i don't know". connection loss exceptions and disconnected states are basically "i don't know".

generally applications we design go into a "safe" mode, meaning they may serve reads but reject changes, when disconnected from zk and only kill themselves when they find out their session has expired.

ben

ps - session information is replicated to all zk servers, so if a leader dies, all replicas know the sessions that are currently active and their timeouts.

On 08/16/2010 09:03 PM, Ted Dunning wrote:

Ben or somebody else will have to repeat some of the detailed logic for this, but it has to do with the fact that you can't be sure what has happened during the network partition. One possibility is the one you describe, but another is that the partition happened because a majority of the ZK cluster lost power and you can't see the remaining nodes. Those nodes will continue to serve any files in a read-only fashion. If the partition involves you losing contact with the entire cluster at the same time a partition of the cluster into a quorum and a minority happens, then your ephemeral files could continue to exist at least until the breach in the cluster itself is healed.

Suffice it to say that there are only a few strategies that leave you with a coherent picture of the universe.
Importantly, you shouldn't assume that the ephemerals will disappear at the same time as the session expiration event is delivered.

On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan wrote:

Ouch, is this the current ZK behavior? This is unexpected. If the client gets partitioned from the ZK cluster, he should get notified and take some action (e.g. commit suicide); otherwise how do you tell whether an ephemeral node is really up or down? Zombies can create synchronization nightmares..

On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright wrote:

Another possible cause for this that I ran into recently with the C client - you don't get the session expired notification until you are reconnected to the quorum and it informs you the session is lost. If you get disconnected and can't reconnect, you won't get the notification. Personally I think the client API should track the session expiration time locally and inform you once it's expired.

On Aug 16, 2010 2:09 AM, "Qing Yan" wrote:

Hi Ted,

Do you mean a GC problem can prevent delivery of the SESSION_EXPIRED event? Hum... so you have met this problem before? I didn't see any OOM though, will look into it more.

On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning wrote:

I am assuming that y...
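Dave's suggestion - tracking the expiration time locally - could look roughly like the sketch below. This is an illustrative model, not the real client API, and the result is necessarily an estimate: only the cluster knows the authoritative expiry, so the local deadline can only say "probably expired".

```python
import time

# Illustrative sketch of local session-expiry tracking: remember the
# deadline implied by the last successful server contact, and treat
# the session as (probably) dead once the deadline passes. The cluster
# remains the authoritative source of the real expiration.
class SessionTracker:
    def __init__(self, session_timeout_s, clock=time.monotonic):
        self.timeout = session_timeout_s
        self.clock = clock
        self.deadline = clock() + session_timeout_s

    def touch(self):
        # Call on every successful heartbeat/response from a server.
        self.deadline = self.clock() + self.timeout

    def probably_expired(self):
        return self.clock() > self.deadline
```

An application could use `probably_expired()` while disconnected to decide when staying in safe mode is no longer enough and it should kill itself, rather than waiting indefinitely for a reconnect that may never happen.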
Re: Weird ephemeral node issue
Hi Qing,

Can you list the znodes from the monitor and from the node that the monitor is restarting (run zkCli.sh on both machines)? I am curious to see if the node that did not receive the SESSION_EXPIRED event still has the znode in its database. Also can you describe your setup? Can you send out logs and the zoo.cfg file? Thanks.

-Vishal

On Tue, Aug 17, 2010 at 3:31 AM, Qing Yan wrote:

> Forgot to mention: the process looks fine, normal memory footprint and CPU
> usage, and generates expected results; the only thing missing is the
> ephemeral node in ZK.
Re: Weird ephemeral node issue
Forgot to mention: the process looks fine, normal memory footprint and CPU usage, and generates expected results; the only thing missing is the ephemeral node in ZK.