Re: Weird ephemeral node issue

2010-08-17 Thread Qing Yan
Thanks for the explanation! I suggest this goes to the wiki.


the client only finds out about session expiration events when the client
reconnects to the cluster. if zk tells a client that its session is expired,
the ephemerals that correspond to that session will already be cleaned up.

- deletion of an ephemeral file due to loss of client connection will occur
after the client gets a connection loss

- deletion of an ephemeral file will precede delivery of a session
expiration event to the owner


So session expiration means two things here: the server view (ephemeral
cleanup) and the client view (event delivery), and there is
no guarantee how long the gap between them will be, correct?

I guess the confusion arises from the documentation, which doesn't distinguish
these two concepts, e.g. in the javadoc
http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html

An ephemeral node will be removed by the ZooKeeper automatically when the
session associated with the creation of the node expires.

It is actually referring to the server view, not the client view.



On Wed, Aug 18, 2010 at 1:12 AM, Ted Dunning  wrote:

> Uncharacteristically, I think that Ben's comments could use a little bit of
> amplification.
>
> First, ZK is designed with certain guarantees in mind and almost all
> operational characteristics flow logically from these guarantees.
>
> The guarantee that Ben mentioned here in passing is that if a client gets
> session expiration, it is *guaranteed* that the ephemerals have been
> cleaned
> up.  This guarantee is what drives the notification of session expiration
> after reconnection since while the client is disconnected, it cannot know
> if
> the cluster is operating correctly or not and thus cannot know if the
> ephemerals have been cleaned up yet.  The only way to have certain
> knowledge
> that the cluster has cleaned up the ephemerals is to get back in touch with
> an operating cluster.
>
> The client is not completely in the dark.  As Ben implied, it can know that
> the cluster is unavailable (it got a ConnectionLoss event, after all).
>  While the cluster is unavailable and before it gets a session expiration
> notification, the client can go into safe mode.
>
> The moral of this story is that to get the most out of ZK, it is best to
> adopt the same guarantee based design process that drove ZK in the first
> place.  The first step is that you have to decide what guarantees that you
> want to provide and then work from ZK's guarantees to get to yours.
>
> In the classic leader-election use of ZK, the key guarantee that we want
> is:
>
> - the number of leaders is less than or equal to 1
>
> Note that you can't guarantee that the number == 1, because other stuff
> could happen.  This has nothing to do with ZK.
>
> The pertinent ZK guarantees are:
>
> - an ephemeral file can only be created by a single session
>
> - deletion of an ephemeral file due to loss of client connection will occur
> after the client gets a connection loss
>
> - deletion of an ephemeral file will precede delivery of a session
> expiration event to the owner
>
> Phrased in terms of CSP-like constructs, the client has events
> BecomeMaster,
> EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
> according to this grammar:
>
> client := (
>    (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
>  | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
>  | Crash
>  )*
>
> To get the guarantees that we want, we can require the client to only do
> BecomeMaster after it creates an ephemeral file and require it to either
> Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
> deleted.  The only way that we can do that is to immediately do
> EnterSafeMode on connection loss and then do RelinquishMaster on session
> expiration or ExitSafeMode on connection restored.  It is involved, but you
> can actually do a proof of correctness from this that shows that your
> guarantee will be honored even in the presence of ZK or the client crashing
> or being partitioned.
>
>
>
> On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed 
> wrote:
>
> > there are two things to keep in mind when thinking about this issue:
> >
> > 1) if a zk client is disconnected from the cluster, the client is
> > essentially in limbo. because the client cannot talk to a server it
> cannot
> > know if its session is still alive. it also cannot close its session.
> >
> > 2) the client only finds out about session expiration events when the
> > client reconnects to the cluster. if zk tells a client that its session
> is
> > expired, the ephemerals that correspond to that session will already be
> > cleaned up.
> >
> > one of the main design points about zk is that zk only gives correct
> > information. if zk cannot give correct information, it basically says "i
> > don't know". connection loss exceptions and disconnected states are
> > basically "i don't know".
> >
> > generally applications we design go into a "safe" mode, meaning they may
> > serve reads but reject changes, when disconnected from zk and only kill
> > themselves when they find out their session has expired.

Re: Weird ephemeral node issue

2010-08-17 Thread Qing Yan
Hi Vishal,

  It is in the prod env; the process has been restarted already :-(. I
checked the zookeeper log file (loglevel=ERROR), and it is empty.

Here is the ZK config:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/home/admin/TimeTunnel2/zookeeper/zoo
# the port at which the clients will connect
clientPort=32181

server.1=tt2config019072.cm3:32888:33888
server.2=tt2config023132.cm3:32888:33888
server.3=tt2config024079.cm3:32888:33888
server.4=tt2config017052.cm4:32888:33888
server.5=tt2config021101.cm4:32888:33888
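For reference, here is a quick sketch of the timing windows this config implies. The 2x/20x multipliers are ZooKeeper's default bounds for negotiated session timeouts (min = 2 * tickTime, max = 20 * tickTime); initLimit and syncLimit are expressed in ticks:

```python
# Timing windows implied by the zoo.cfg above.
tick_time_ms = 2000
init_limit_ticks = 10
sync_limit_ticks = 5

# Negotiated session timeouts are clamped to [2*tickTime, 20*tickTime]
# by default on the server side.
min_session_timeout_ms = 2 * tick_time_ms    # 4000 ms
max_session_timeout_ms = 20 * tick_time_ms   # 40000 ms

# Quorum-internal limits are in ticks.
init_limit_ms = init_limit_ticks * tick_time_ms   # 20000 ms
sync_limit_ms = sync_limit_ticks * tick_time_ms   # 10000 ms

print(min_session_timeout_ms, max_session_timeout_ms,
      init_limit_ms, sync_limit_ms)
```

So a client of this ensemble can hold a session timeout anywhere from 4 to 40 seconds, regardless of what it requests.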


Hardware/software Config:

Processors: 4 x Xeon E5410 2.33GHz
Memory: 4GB
Network: eth0: 00:16:3e:17:13:48 // Gigabit network
OS: RHEL Server 5.4 (Tikanga), Linux 2.6.18-164.el5xen
x86_64, 64-bit
Zookeeper Release 3.2.2



On Tue, Aug 17, 2010 at 8:40 PM, Vishal K  wrote:

> Hi Qing,
>
> Can you list the znodes from the monitor and from the node that the monitor
> is restarting (run zkCli.sh on both machines).
> I am curious to see if the node that did not receive the SESSION_EXPIRED
> event still has the znode in its database.
> Also, can you describe your setup? Can you send out logs and the zoo.cfg file.
> Thanks.
>
> -Vishal
> On Tue, Aug 17, 2010 at 3:31 AM, Qing Yan  wrote:
>
> > Forgot to mention: the process looks fine, normal memory footprint and cpu
> > usage, generates expected results; the only thing missing is
> > the ephemeral node in ZK.
> >
>


Re: ZK monitoring

2010-08-17 Thread Andrei Savu
You should also take a look at ZOOKEEPER-744 [1] and ZOOKEEPER-799 [2]

The archive from 799 contains ready-to-use scripts for monitoring
ZooKeeper using Ganglia, Nagios and Cacti.

Let me know if you need more help.

[1] https://issues.apache.org/jira/browse/ZOOKEEPER-744
[2] https://issues.apache.org/jira/browse/ZOOKEEPER-799

On Tue, Aug 17, 2010 at 9:50 PM, Jun Rao  wrote:
> Hi,
>
> Is there a way to see the current leader and a list of followers from a
> single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
> commands) only provides info local to a node.
>
> Thanks,
>
> Jun
>



-- Andrei Savu


Re: ZK monitoring

2010-08-17 Thread Andrei Savu
It's not possible. You need to query all the servers in order to know
who is the current leader.

It should be pretty simple to implement this by parsing the output
from the 'stat' 4-letter command.

On Tue, Aug 17, 2010 at 9:50 PM, Jun Rao  wrote:
> Hi,
>
> Is there a way to see the current leader and a list of followers from a
> single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
> commands) only provides info local to a node.
>
> Thanks,
>
> Jun
>



-- Andrei Savu


ZK monitoring

2010-08-17 Thread Jun Rao
Hi,

Is there a way to see the current leader and a list of followers from a
single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
commands) only provides info local to a node.

Thanks,

Jun


Re: A question about Watcher

2010-08-17 Thread Patrick Hunt
All servers keep a copy - so you can shut down the zk service entirely 
(all servers) and restart it, and the sessions are maintained.


Patrick

On 08/16/2010 06:34 PM, Qian Ye wrote:

Thx Mahadev and Benjamin, it seems that I've got some misunderstanding about
the client. I will check it out.

Another relevant question. I noticed that the master zookeeper server keeps
track of all the client sessions which connect to every zookeeper server in
the same cluster. So when a slave zookeeper server fails, the clients it
served can switch to another zookeeper server and keep their old sessions
(the new zookeeper server can get the session information from the master).
My question is, if the master fails, does that mean some session
information will definitely be lost?

thx~

On Tue, Aug 17, 2010 at 12:40 AM, Benjamin Reed  wrote:


the client does keep track of the watches that it has outstanding. when it
reconnects to a new server it tells the server what it is watching for and
the last view of the system that it had.

ben


On 08/16/2010 09:28 AM, Qian Ye wrote:


thx for the explanation. Since the watchers can be preserved when the client
switches the zookeeper server it connects to, does that mean all the watcher
information will be saved on all the zookeeper servers? I didn't find
anything in the client source showing that it can hold the watcher
information.


On Tue, Aug 17, 2010 at 12:21 AM, Ted Dunning
  wrote:




I should correct this.  The watchers will deliver a session expiration
event, but since the connection is closed at that point no further
events will be delivered and the cluster will remove them.  This is as
good
as the watchers disappearing.

On Mon, Aug 16, 2010 at 9:20 AM, Ted Dunning
wrote:




The other is session expiration.  Watchers do not survive this.  This
happens when a client does not provide timely
evidence that it is alive and is marked as having disappeared by the
cluster.



















Re: client failure detectionin ZK

2010-08-17 Thread Patrick Hunt
Generally it should be determined by your requirements around failure 
detection/recovery.


The higher you set it, the less susceptible you are to intermittent failures 
(brief network outages, say, or GC pauses on the client). However, 
this means that it takes longer to discover/recover from a real failure.


The lower you set it, the faster you'll discover/recover from a real 
failure, but you also have the potential to see more "false positives".


Setting this really depends on your use case(s) -- your application 
requirements. Typically I see between 5 and 30 seconds being used.


Patrick

On 08/17/2010 08:51 AM, Jun Rao wrote:

Thanks. Also, supposing I know the average network latency, what's
the rule of thumb for setting the session timeout?

Jun

On Mon, Aug 16, 2010 at 1:55 PM, Patrick Hunt  wrote:

The session timeout is used for this:

http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions

Patrick


On 08/16/2010 01:47 PM, Jun Rao wrote:

Hi,

What config parameters in ZK determine how soon a failed client
is detected?
Thanks,

Jun




Re: Weird ephemeral node issue

2010-08-17 Thread Ted Dunning
Uncharacteristically, I think that Ben's comments could use a little bit of
amplification.

First, ZK is designed with certain guarantees in mind and almost all
operational characteristics flow logically from these guarantees.

The guarantee that Ben mentioned here in passing is that if a client gets
session expiration, it is *guaranteed* that the ephemerals have been cleaned
up.  This guarantee is what drives the notification of session expiration
after reconnection since while the client is disconnected, it cannot know if
the cluster is operating correctly or not and thus cannot know if the
ephemerals have been cleaned up yet.  The only way to have certain knowledge
that the cluster has cleaned up the ephemerals is to get back in touch with
an operating cluster.

The client is not completely in the dark.  As Ben implied, it can know that
the cluster is unavailable (it got a ConnectionLoss event, after all).
 While the cluster is unavailable and before it gets a session expiration
notification, the client can go into safe mode.

The moral of this story is that to get the most out of ZK, it is best to
adopt the same guarantee based design process that drove ZK in the first
place.  The first step is that you have to decide what guarantees that you
want to provide and then work from ZK's guarantees to get to yours.

In the classic leader-election use of ZK, the key guarantee that we want is:

- the number of leaders is less than or equal to 1

Note that you can't guarantee that the number == 1, because other stuff
could happen.  This has nothing to do with ZK.

The pertinent ZK guarantees are:

- an ephemeral file can only be created by a single session

- deletion of an ephemeral file due to loss of client connection will occur
after the client gets a connection loss

- deletion of an ephemeral file will precede delivery of a session
expiration event to the owner

Phrased in terms of CSP-like constructs, the client has events BecomeMaster,
EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
according to this grammar:

client := (
   (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
 | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
 | Crash
 )*

To get the guarantees that we want, we can require the client to only do
BecomeMaster after it creates an ephemeral file and require it to either
Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
deleted.  The only way that we can do that is to immediately do
EnterSafeMode on connection loss and then do RelinquishMaster on session
expiration or ExitSafeMode on connection restored.  It is involved, but you
can actually do a proof of correctness from this that shows that your
guarantee will be honored even in the presence of ZK or the client crashing
or being partitioned.
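The state machine described above can be sketched in a few lines of ZooKeeper-free Python. The event names mirror the grammar (BecomeMaster, EnterSafeMode, and so on); in real code these transitions would be driven from the ZooKeeper watcher callbacks, which this sketch deliberately abstracts away:

```python
# Client-side leader-election state machine, following the grammar:
# BecomeMaster only after our ephemeral node exists; EnterSafeMode on
# connection loss; ExitSafeMode on reconnect; RelinquishMaster on expiry.

FOLLOWER, MASTER, SAFE_MODE = "follower", "master", "safe_mode"

class MasterStateMachine:
    def __init__(self):
        self.state = FOLLOWER

    def on_ephemeral_created(self):
        # BecomeMaster: only permitted once our ephemeral file exists.
        if self.state == FOLLOWER:
            self.state = MASTER

    def on_connection_loss(self):
        # EnterSafeMode immediately: while disconnected we cannot know
        # whether the ephemeral still exists.
        if self.state == MASTER:
            self.state = SAFE_MODE

    def on_reconnected(self):
        # ExitSafeMode: the session survived, the ephemeral is still ours.
        if self.state == SAFE_MODE:
            self.state = MASTER

    def on_session_expired(self):
        # RelinquishMaster: ZK guarantees the ephemeral is gone by now.
        self.state = FOLLOWER

m = MasterStateMachine()
m.on_ephemeral_created()   # BecomeMaster
m.on_connection_loss()     # EnterSafeMode
m.on_session_expired()     # RelinquishMaster
print(m.state)             # follower
```

Note the ordering guarantee doing the work here: because the ephemeral's deletion precedes delivery of the expiration event, `on_session_expired` can safely assume mastership is already lost on the server side.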



On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed  wrote:

> there are two things to keep in mind when thinking about this issue:
>
> 1) if a zk client is disconnected from the cluster, the client is
> essentially in limbo. because the client cannot talk to a server it cannot
> know if its session is still alive. it also cannot close its session.
>
> 2) the client only finds out about session expiration events when the
> client reconnects to the cluster. if zk tells a client that its session is
> expired, the ephemerals that correspond to that session will already be
> cleaned up.
>
> one of the main design points about zk is that zk only gives correct
> information. if zk cannot give correct information, it basically says "i
> don't know". connection loss exceptions and disconnected states are
> basically "i don't know".
>
> generally applications we design go into a "safe" mode, meaning they may
> serve reads but reject changes, when disconnected from zk and only kill
> themselves when they find out their session has expired.
>
> ben
>
> ps - session information is replicated to all zk servers, so if a leader
> dies, all replicas know the sessions that are currently active and their
> timeouts.
>
> On 08/16/2010 09:03 PM, Ted Dunning wrote:
>
>> Ben or somebody else will have to repeat some of the detailed logic for
>> this, but it has
>> to do with the fact that you can't be sure what has happened during the
>> network partition.
>> One possibility is the one you describe, but another is that the partition
>> happened because
>> a majority of the ZK cluster lost power and you can't see the remaining
>> nodes.  Those nodes
>> will continue to serve any files in a read-only fashion.  If the partition
>> involves you losing
>> contact with the entire cluster at the same time a partition of the
>> cluster
>> into a quorum and
>> a minority happens, then your ephemeral files could continue to exist at
>> least until the breach
>> in the cluster itself is healed.
>>
>> Suffice it to say that there are only a few strategies that leave you with
>> a coherent picture of the universe.  Importantly, you shouldn't assume that
>> the ephemerals will disappear at the same time as the session expiration
>> event is delivered.

Re: Weird ephemeral node issue

2010-08-17 Thread Benjamin Reed

there are two things to keep in mind when thinking about this issue:

1) if a zk client is disconnected from the cluster, the client is 
essentially in limbo. because the client cannot talk to a server it 
cannot know if its session is still alive. it also cannot close its session.


2) the client only finds out about session expiration events when the 
client reconnects to the cluster. if zk tells a client that its session 
is expired, the ephemerals that correspond to that session will already 
be cleaned up.


one of the main design points about zk is that zk only gives correct 
information. if zk cannot give correct information, it basically says "i 
don't know". connection loss exceptions and disconnected states are 
basically "i don't know".


generally applications we design go into a "safe" mode, meaning they may 
serve reads but reject changes, when disconnected from zk and only kill 
themselves when they find out their session has expired.


ben

ps - session information is replicated to all zk servers, so if a leader 
dies, all replicas know the sessions that are currently active and their 
timeouts.


On 08/16/2010 09:03 PM, Ted Dunning wrote:

Ben or somebody else will have to repeat some of the detailed logic for
this, but it has
to do with the fact that you can't be sure what has happened during the
network partition.
One possibility is the one you describe, but another is that the partition
happened because
a majority of the ZK cluster lost power and you can't see the remaining
nodes.  Those nodes
will continue to serve any files in a read-only fashion.  If the partition
involves you losing
contact with the entire cluster at the same time a partition of the cluster
into a quorum and
a minority happens, then your ephemeral files could continue to exist at
least until the breach
in the cluster itself is healed.

Suffice it to say that there are only a few strategies that leave you with a
coherent picture
of the universe.  Importantly, you shouldn't assume that the ephemerals will
disappear at
the same time as the session expiration event is delivered.

On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan  wrote:

Ouch, is this the current ZK behavior? This is unexpected: if the
client gets partitioned from the ZK cluster, it should
get notified and take some action (e.g. commit suicide); otherwise how
can you tell whether an ephemeral node is really
up or down? Zombies can create synchronization nightmares..

On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright  wrote:

Another possible cause for this that I ran into recently with the c
client -
you don't get the session expired notification until you are reconnected
to the quorum and it informs you the session is lost.  If you get
disconnected and can't reconnect you won't get the notification.
Personally I think the client api should track the session expiration
time locally and inform you once it's expired.

On Aug 16, 2010 2:09 AM, "Qing Yan"  wrote:

Hi Ted,

  Do you mean a GC problem can prevent delivery of the SESSION EXPIRE event?
Hum...so you have met this problem before?
I didn't see any OOM though, will look into it more.

On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning wrote:

I am assuming that y...




Re: Weird ephemeral node issue

2010-08-17 Thread Vishal K
Hi Qing,

Can you list the znodes from the monitor and from the node that the monitor
is restarting (run zkCli.sh on both machines).
I am curious to see if the node that did not receive the SESSION_EXPIRED
event still has the znode in its database.
Also, can you describe your setup? Can you send out logs and the zoo.cfg file.
Thanks.

-Vishal
On Tue, Aug 17, 2010 at 3:31 AM, Qing Yan  wrote:

> Forgot to mention: the process looks fine, normal memory footprint and cpu
> usage, generates expected results; the only thing missing is
> the ephemeral node in ZK.
>


Re: Weird ephemeral node issue

2010-08-17 Thread Qing Yan
Forgot to mention: the process looks fine, normal memory footprint and cpu
usage, generates expected results; the only thing missing is
the ephemeral node in ZK.