Re: Weird ephemeral node issue

2010-08-30 Thread Patrick Hunt
Rather than the wiki, it would be great to get this into the docs. Would you
mind creating a JIRA?
https://issues.apache.org/jira/browse/ZOOKEEPER

Thanks,

Patrick

On Tue, Aug 17, 2010 at 8:29 PM, Qing Yan  wrote:

> Thanks for the explaination! I suggest this goes to the wiki..
>
> 
> the client only finds out about session expiration events when the client
> reconnects to the cluster. if zk tells a client that its session is
> expired,
> the ephemerals that correspond to that session will already be cleaned up.
>
> - deletion of an ephemeral file due to loss of client connection will occur
> after the client gets a connection loss
>
> - deletion of an ephemeral file will precede delivery of a session
> expiration event to the owner
> 
>
> So session expirations means two things here : server view(ephemeral clean
> up) & client view(event delivery) , there are
> no guarantee how long it will take in between, correct?
>
> I guess the confusion rises from the documention which doesn't distinguish
> these two concepts, e.g. in the javadoc
> http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html
>
> An ephemeral node will be removed by the ZooKeeper automatically when the
> session associated with the creation of the node expires.
>
> It is actually refering to the server view not the client view.
>
>
>
> On Wed, Aug 18, 2010 at 1:12 AM, Ted Dunning 
> wrote:
>
> > Uncharacteristically, I think that Ben's comments could use a little bit
> of
> > amplification.
> >
> > First, ZK is designed with certain guarantees in mind and almost all
> > operational characteristics flow logically from these guarantees.
> >
> > The guarantee that Ben mentioned here in passing is that if a client gets
> > session expiration, it is *guaranteed* that the ephemerals have been
> > cleaned
> > up.  This guarantee is what drives the notification of session expiration
> > after reconnection since while the client is disconnected, it cannot know
> > if
> > the cluster is operating correctly or not and thus cannot know if the
> > ephemerals have been cleaned up yet.  The only way to have certain
> > knowledge
> > that the cluster has cleaned up the ephemerals is to get back in touch
> with
> > an operating cluster.
> >
> > The client is not completely in the dark.  As Ben implied, it can know
> that
> > the cluster is unavailable (it got a ConnectionLoss event, after all).
> >  While the cluster is unavailable and before it gets a session expiration
> > notification, the client can go into safe mode.
> >
> > The moral of this story is that to get the most out of ZK, it is best to
> > adopt the same guarantee based design process that drove ZK in the first
> > place.  The first step is that you have to decide what guarantees that
> you
> > want to provide and then work from ZK's guarantees to get to yours.
> >
> > In the classic leader-election use of ZK, the key guarantee that we want
> > is:
> >
> > - the number of leaders is less than or equal to 1
> >
> > Note that you can't guarantee that the number == 1, because other stuff
> > could happen.  This has nothing to do with ZK.
> >
> > The pertinent ZK guarantees are:
> >
> > - an ephemeral file can only be created by a single session
> >
> > - deletion of an ephemeral file due to loss of client connection will
> occur
> > after the client gets a connection loss
> >
> > - deletion of an ephemeral file will precede delivery of a session
> > expiration event to the owner
> >
> > Phrased in terms of CSP-like constructs, the client has events
> > BecomeMaster,
> > EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
> > according to this grammar:
> >
> > client := (
> >   (BecomeMaster; (EnterSafeMode; ExitSafeMode)*;
> > EnterSafeMode?; RelinquishMaster)
> >  | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
> >  | Crash
> >  )*
> >
> > To get the guarantees that we want, we can require the client to only do
> > BecomeMaster after it creates an ephemeral file and require it to either
> > Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
> > deleted.  The only way that we can do that is to immediately do
> > EnterSafeMode on connection loss and then do RelinquishMaster on session
> > expiration or ExitSafeMode on connection restored.  It is involved, but
> you
> > can actually do a proof of correctness from this that shows that your
> > guarantee will be honored even in the presence of ZK or the client
> crashing
> > or being partitioned.
> >
> >
> >
> > On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed 
> > wrote:
> >
> > > there are two things to keep in mind when thinking about this issue:
> > >
> > > 1) if a zk client is disconnected from the cluster, the client is
> > > essentially in limbo. because the client cannot talk to a server it
> > cannot
> > > know if its session is still alive. it also cannot close its session.
> > >
> > > 2) the client only finds out about ses

Re: Weird ephemeral node issue

2010-08-17 Thread Qing Yan
Thanks for the explanation! I suggest this goes on the wiki.


the client only finds out about session expiration events when the client
reconnects to the cluster. if zk tells a client that its session is expired,
the ephemerals that correspond to that session will already be cleaned up.

- deletion of an ephemeral file due to loss of client connection will occur
after the client gets a connection loss

- deletion of an ephemeral file will precede delivery of a session
expiration event to the owner


So session expiration means two things here: the server view (ephemeral
cleanup) and the client view (event delivery), and there is no guarantee
how long the gap between the two will be, correct?

I guess the confusion arises from the documentation, which doesn't distinguish
these two concepts, e.g. in the javadoc
http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html

An ephemeral node will be removed by the ZooKeeper automatically when the
session associated with the creation of the node expires.

It is actually referring to the server view, not the client view.



On Wed, Aug 18, 2010 at 1:12 AM, Ted Dunning  wrote:

> Uncharacteristically, I think that Ben's comments could use a little bit of
> amplification.
>
> First, ZK is designed with certain guarantees in mind and almost all
> operational characteristics flow logically from these guarantees.
>
> The guarantee that Ben mentioned here in passing is that if a client gets
> session expiration, it is *guaranteed* that the ephemerals have been
> cleaned
> up.  This guarantee is what drives the notification of session expiration
> after reconnection since while the client is disconnected, it cannot know
> if
> the cluster is operating correctly or not and thus cannot know if the
> ephemerals have been cleaned up yet.  The only way to have certain
> knowledge
> that the cluster has cleaned up the ephemerals is to get back in touch with
> an operating cluster.
>
> The client is not completely in the dark.  As Ben implied, it can know that
> the cluster is unavailable (it got a ConnectionLoss event, after all).
>  While the cluster is unavailable and before it gets a session expiration
> notification, the client can go into safe mode.
>
> The moral of this story is that to get the most out of ZK, it is best to
> adopt the same guarantee based design process that drove ZK in the first
> place.  The first step is that you have to decide what guarantees that you
> want to provide and then work from ZK's guarantees to get to yours.
>
> In the classic leader-election use of ZK, the key guarantee that we want
> is:
>
> - the number of leaders is less than or equal to 1
>
> Note that you can't guarantee that the number == 1, because other stuff
> could happen.  This has nothing to do with ZK.
>
> The pertinent ZK guarantees are:
>
> - an ephemeral file can only be created by a single session
>
> - deletion of an ephemeral file due to loss of client connection will occur
> after the client gets a connection loss
>
> - deletion of an ephemeral file will precede delivery of a session
> expiration event to the owner
>
> Phrased in terms of CSP-like constructs, the client has events
> BecomeMaster,
> EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
> according to this grammar:
>
> client := (
>   (BecomeMaster; (EnterSafeMode; ExitSafeMode)*;
> EnterSafeMode?; RelinquishMaster)
>  | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
>  | Crash
>  )*
>
> To get the guarantees that we want, we can require the client to only do
> BecomeMaster after it creates an ephemeral file and require it to either
> Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
> deleted.  The only way that we can do that is to immediately do
> EnterSafeMode on connection loss and then do RelinquishMaster on session
> expiration or ExitSafeMode on connection restored.  It is involved, but you
> can actually do a proof of correctness from this that shows that your
> guarantee will be honored even in the presence of ZK or the client crashing
> or being partitioned.
>
>
>
> On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed 
> wrote:
>
> > there are two things to keep in mind when thinking about this issue:
> >
> > 1) if a zk client is disconnected from the cluster, the client is
> > essentially in limbo. because the client cannot talk to a server it
> cannot
> > know if its session is still alive. it also cannot close its session.
> >
> > 2) the client only finds out about session expiration events when the
> > client reconnects to the cluster. if zk tells a client that its session
> is
> > expired, the ephemerals that correspond to that session will already be
> > cleaned up.
> >
> > one of the main design points about zk is that zk only gives correct
> > information. if zk cannot give correct information, it basically says "i
> > don't know". connection loss exceptions and disconnected states are
> > basically "i don't know".
> >
> > generally applications w

Re: Weird ephemeral node issue

2010-08-17 Thread Qing Yan
Hi Vishal,

  It is in the prod env and the process has already been restarted :-(. I
checked the ZooKeeper log file (loglevel=ERROR) and it is empty.

Here is the ZK config:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/home/admin/TimeTunnel2/zookeeper/zoo
# the port at which the clients will connect
clientPort=32181

server.1=tt2config019072.cm3:32888:33888
server.2=tt2config023132.cm3:32888:33888
server.3=tt2config024079.cm3:32888:33888
server.4=tt2config017052.cm4:32888:33888
server.5=tt2config021101.cm4:32888:33888


Hardware/software Config:

Processors: 4 x Xeon E5410 2.33GHz
Memory: 4GB
Network: eth0: 00:16:3e:17:13:48 // Gigabit network
OS: RHEL Server 5.4 (Tikanga), Linux 2.6.18-164.el5xen x86_64, 64-bit
Zookeeper Release 3.2.2
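
As a rough aid for reading the config above: the ZooKeeper programmer's guide
describes the server as bounding a client's requested session timeout to
between 2 and 20 times tickTime. The sketch below shows what that would mean
for tickTime=2000. The class and method names are hypothetical, and it assumes
those default bounds apply to this release.

```java
// Hypothetical helper (not part of ZooKeeper): applies the documented rule
// that the negotiated session timeout is kept within
// [2 * tickTime, 20 * tickTime].
final class SessionTimeoutBounds {

    // Returns the timeout the server would be expected to grant, in ms.
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000; // tickTime from the zoo.cfg above
        System.out.println(negotiate(1000, tickTime));   // 4000: raised to the minimum
        System.out.println(negotiate(30000, tickTime));  // 30000: already within bounds
        System.out.println(negotiate(120000, tickTime)); // 40000: capped at the maximum
    }
}
```

Under that assumption, any timeout requested against this ensemble would land
between 4 and 40 seconds, which is the window a long client pause has to fit
inside before the session is expired.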



On Tue, Aug 17, 2010 at 8:40 PM, Vishal K  wrote:

> Hi Qing,
>
> Can you list the znodes from the monitor and from the node that the monitor
> is restarting (run zkCli.sh on both machines).
> I am curious to see if the node that did not receive the SESSION_EXPIRED
> event still has the znode in its database.
> Also can you describe your setiup? Can you send out logs and zoo.cfg file.
> Thanks.
>
> -Vishal
> On Tue, Aug 17, 2010 at 3:31 AM, Qing Yan  wrote:
>
> > Forget to mention:  the process looks fine,  nomal memory foot print and
> > cpu
> > usage, generate expected results, only thing is missing
> > the ephermenal node in ZK.
> >
>


Re: Weird ephemeral node issue

2010-08-17 Thread Ted Dunning
Uncharacteristically, I think that Ben's comments could use a little bit of
amplification.

First, ZK is designed with certain guarantees in mind and almost all
operational characteristics flow logically from these guarantees.

The guarantee that Ben mentioned here in passing is that if a client gets
session expiration, it is *guaranteed* that the ephemerals have been cleaned
up.  This guarantee is what drives the notification of session expiration
after reconnection since while the client is disconnected, it cannot know if
the cluster is operating correctly or not and thus cannot know if the
ephemerals have been cleaned up yet.  The only way to have certain knowledge
that the cluster has cleaned up the ephemerals is to get back in touch with
an operating cluster.

The client is not completely in the dark.  As Ben implied, it can know that
the cluster is unavailable (it got a ConnectionLoss event, after all).
 While the cluster is unavailable and before it gets a session expiration
notification, the client can go into safe mode.

The moral of this story is that to get the most out of ZK, it is best to
adopt the same guarantee-based design process that drove ZK in the first
place.  The first step is to decide what guarantees you want to provide
and then work from ZK's guarantees to get to yours.

In the classic leader-election use of ZK, the key guarantee that we want is:

- the number of leaders is less than or equal to 1

Note that you can't guarantee that the number == 1, because other stuff
could happen.  This has nothing to do with ZK.

The pertinent ZK guarantees are:

- an ephemeral file can only be created by a single session

- deletion of an ephemeral file due to loss of client connection will occur
after the client gets a connection loss

- deletion of an ephemeral file will precede delivery of a session
expiration event to the owner

Phrased in terms of CSP-like constructs, the client has events BecomeMaster,
EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
according to this grammar:

client := (
   (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
 | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
 | Crash
 )*

To get the guarantees that we want, we can require the client to only do
BecomeMaster after it creates an ephemeral file and require it to either
Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
deleted.  The only way that we can do that is to immediately do
EnterSafeMode on connection loss and then do RelinquishMaster on session
expiration or ExitSafeMode on connection restored.  It is involved, but you
can actually do a proof of correctness from this that shows that your
guarantee will be honored even in the presence of ZK or the client crashing
or being partitioned.
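
The grammar above can be checked mechanically. A minimal sketch in Java,
assuming each event is encoded as a single letter so that the grammar becomes
a regular expression. The letter codes, the enum and the class name are all
invented here for illustration; none of them come from ZooKeeper.

```java
import java.util.regex.Pattern;

// Client events from the grammar, each with a one-letter code
// (the codes are an illustrative assumption, not part of ZooKeeper).
enum ClientEvent {
    BECOME_MASTER('B'),
    ENTER_SAFE_MODE('E'),
    EXIT_SAFE_MODE('X'),
    RELINQUISH_MASTER('R'),
    CRASH('C');

    final char code;

    ClientEvent(char code) {
        this.code = code;
    }
}

final class MasterTraceGrammar {
    // client := ( B (E X)* E? R  |  B (E X)* E? C  |  C )*
    private static final Pattern GRAMMAR =
            Pattern.compile("(?:B(?:EX)*E?[RC]|C)*");

    // Returns true if the complete event trace is a sentence of the grammar.
    static boolean isValid(ClientEvent... trace) {
        StringBuilder encoded = new StringBuilder();
        for (ClientEvent event : trace) {
            encoded.append(event.code);
        }
        return GRAMMAR.matcher(encoded).matches();
    }

    public static void main(String[] args) {
        // Master, one safe-mode round trip, safe mode again, then relinquish.
        System.out.println(isValid(
                ClientEvent.BECOME_MASTER,
                ClientEvent.ENTER_SAFE_MODE,
                ClientEvent.EXIT_SAFE_MODE,
                ClientEvent.ENTER_SAFE_MODE,
                ClientEvent.RELINQUISH_MASTER)); // true

        // Becoming master twice without relinquishing or crashing in between.
        System.out.println(isValid(
                ClientEvent.BECOME_MASTER,
                ClientEvent.BECOME_MASTER)); // false
    }
}
```

This only validates complete traces of a single client. The proof of
correctness mentioned above would additionally have to relate these events to
the creation and deletion of the ephemeral file.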



On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed  wrote:

> there are two things to keep in mind when thinking about this issue:
>
> 1) if a zk client is disconnected from the cluster, the client is
> essentially in limbo. because the client cannot talk to a server it cannot
> know if its session is still alive. it also cannot close its session.
>
> 2) the client only finds out about session expiration events when the
> client reconnects to the cluster. if zk tells a client that its session is
> expired, the ephemerals that correspond to that session will already be
> cleaned up.
>
> one of the main design points about zk is that zk only gives correct
> information. if zk cannot give correct information, it basically says "i
> don't know". connection loss exceptions and disconnected states are
> basically "i don't know".
>
> generally applications we design go into a "safe" mode, meaning they may
> serve reads but reject changes, when disconnected from zk and only kill
> themselves when they find out their session has expired.
>
> ben
>
> ps - session information is replicated to all zk servers, so if a leader
> dies, all replicas know the sessions that are currently active and their
> timeouts.
>
> On 08/16/2010 09:03 PM, Ted Dunning wrote:
>
>> Ben or somebody else will have to repeat some of the detailed logic for
>> this, but it has
>> to do with the fact that you can't be sure what has happened during the
>> network partition.
>> One possibility is the one you describe, but another is that the partition
>> happened because
>> a majority of the ZK cluster lost power and you can't see the remaining
>> nodes.  Those nodes
>> will continue to serve any files in a read-only fashion.  If the partition
>> involves you losing
>> contact with the entire cluster at the same time a partition of the
>> cluster
>> into a quorum and
>> a minority happens, then your ephemeral files could continue to exist at
>> least until the breach
>> in the cluster itself is healed.
>>
>> Suffice it to say that there are only a few strategies that leave you with
>> a
>> coherent picture
>> of the universe.  Importantly, you should

Re: Weird ephemeral node issue

2010-08-17 Thread Benjamin Reed

there are two things to keep in mind when thinking about this issue:

1) if a zk client is disconnected from the cluster, the client is 
essentially in limbo. because the client cannot talk to a server it 
cannot know if its session is still alive. it also cannot close its session.


2) the client only finds out about session expiration events when the 
client reconnects to the cluster. if zk tells a client that its session 
is expired, the ephemerals that correspond to that session will already 
be cleaned up.


one of the main design points about zk is that zk only gives correct 
information. if zk cannot give correct information, it basically says "i 
don't know". connection loss exceptions and disconnected states are 
basically "i don't know".


generally applications we design go into a "safe" mode, meaning they may 
serve reads but reject changes, when disconnected from zk and only kill 
themselves when they find out their session has expired.


ben

ps - session information is replicated to all zk servers, so if a leader 
dies, all replicas know the sessions that are currently active and their 
timeouts.
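
A minimal sketch of the "safe" mode described above, in Java. SessionState and
SafeModeStore are hypothetical names made up for this example; in the Java
client the three states would roughly correspond to the
Watcher.Event.KeeperState values SyncConnected, Disconnected and Expired, and
a real application would feed them in from its Watcher.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the connection states a ZK watcher reports.
enum SessionState { CONNECTED, DISCONNECTED, EXPIRED }

// Toy application state that serves reads but rejects changes while the
// client is disconnected, and asks to be shut down once the session expires.
final class SafeModeStore {
    private final Map<String, String> data = new HashMap<>();
    private SessionState state = SessionState.CONNECTED;

    // Called whenever the connection state changes. Expiry is terminal.
    synchronized void onSessionState(SessionState newState) {
        if (state != SessionState.EXPIRED) {
            state = newState;
        }
    }

    // Reads are still served in safe mode (possibly stale).
    synchronized String read(String key) {
        return data.get(key);
    }

    // Changes are only accepted while connected.
    synchronized void write(String key, String value) {
        if (state != SessionState.CONNECTED) {
            throw new IllegalStateException("safe mode: rejecting change while " + state);
        }
        data.put(key, value);
    }

    // True once the session is known to have expired.
    synchronized boolean shouldShutDown() {
        return state == SessionState.EXPIRED;
    }

    public static void main(String[] args) {
        SafeModeStore store = new SafeModeStore();
        store.write("owner", "app-1");
        store.onSessionState(SessionState.DISCONNECTED);
        System.out.println(store.read("owner")); // app-1
        try {
            store.write("owner", "app-2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        store.onSessionState(SessionState.EXPIRED);
        System.out.println(store.shouldShutDown()); // true
    }
}
```

The point of the design is that "disconnected" is treated as "i don't know":
nothing irreversible happens until zk gives a definite answer one way or the
other.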


On 08/16/2010 09:03 PM, Ted Dunning wrote:

Ben or somebody else will have to repeat some of the detailed logic for
this, but it has
to do with the fact that you can't be sure what has happened during the
network partition.
One possibility is the one you describe, but another is that the partition
happened because
a majority of the ZK cluster lost power and you can't see the remaining
nodes.  Those nodes
will continue to serve any files in a read-only fashion.  If the partition
involves you losing
contact with the entire cluster at the same time a partition of the cluster
into a quorum and
a minority happens, then your ephemeral files could continue to exist at
least until the breach
in the cluster itself is healed.

Suffice it to say that there are only a few strategies that leave you with a
coherent picture
of the universe.  Importantly, you shouldn't assume that the ephemerals will
disappear at
the same time as the session expiration event is delivered.

On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan  wrote:

> Ouch, is this the current ZK behavior? This is unexpected, if the
> client get partitioned from ZK cluster, he should
> get notified and take some action(e.g. commit suicide) otherwise how
> to tell a ephemeral node is really
> up or down? Zombie can create synchronization nightmares..
>
> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright  wrote:
> > Another possible cause for this that I ran into recently with the c client -
> > you don't get the session expired notification until you are reconnected to
> > the quorum and it informs you the session is lost.  If you get disconnected
> > and can't reconnect you won't get the notification.  Personally I think the
> > client api should track the session expiration time locally and information
> > you once it's expired.
> >
> > On Aug 16, 2010 2:09 AM, "Qing Yan"  wrote:
> >
> > Hi Ted,
> >
> >  Do you mean GC problem can prevent delivery of SESSION EXPIRE event?
> > Hum...so you have met this problem before?
> > I didn't see any OOM though, will look into it more.
> >
> > On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning  wrote:
> >> I am assuming that y...


Re: Weird ephemeral node issue

2010-08-17 Thread Vishal K
Hi Qing,

Can you list the znodes from the monitor and from the node that the monitor
is restarting (run zkCli.sh on both machines)?
I am curious to see if the node that did not receive the SESSION_EXPIRED
event still has the znode in its database.
Also, can you describe your setup? Can you send the logs and the zoo.cfg file?
Thanks.

-Vishal
On Tue, Aug 17, 2010 at 3:31 AM, Qing Yan  wrote:

> Forget to mention:  the process looks fine,  nomal memory foot print and
> cpu
> usage, generate expected results, only thing is missing
> the ephermenal node in ZK.
>


Re: Weird ephemeral node issue

2010-08-17 Thread Qing Yan
Forgot to mention: the process looks fine, with normal memory footprint and
CPU usage, and it generates the expected results. The only thing missing is
the ephemeral node in ZK.


Re: Weird ephemeral node issue

2010-08-16 Thread Qing Yan
I understand no strategy will work perfectly in all circumstances; we
just need better documentation so developers can make correct
assumptions. Previously I assumed delivery of the session expiration event
and the ephemeral's disappearance would occur together: not at exactly the
same time, but within some definite time frame...

BTW, here is the thread dump of the zombie client:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.3-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x54aad800 nid=0x6813 waiting on condition [0x..0x]
   java.lang.Thread.State: RUNNABLE

"IPC Client (47) connection to hdpnn/10.249.54.101:9000 from taobao" daemon prio=10 tid=0x2aaadc31c800 nid=0x67f8 in Object.wait() [0x427fa000..0x427faa90]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x2f9360e0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:396)
        - locked <0x2f9360e0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)

"ResponseProcessor for block blk_-7997360194615811589_639843163" daemon prio=10 tid=0x54aae000 nid=0x67ec runnable [0x429fc000..0x429fcd10]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x2f9b1de0> (a sun.nio.ch.Util$1)
        - locked <0x2f9b1dc8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x2f9818a0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2318)

"DataStreamer for file /group/tbads/TimeTunnel2/merge_pv/20100815/02/35/tt2yunti2.sds.cnz.alimama.com/43_040500a8-5aa3-4816-9ab5-31ffd70bf899.log.tmp block blk_-7997360194615811589_639843163" daemon prio=10 tid=0x549cc400 nid=0x67c9 in Object.wait() [0x423f6000..0x423f6c90]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x2f6faf80> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2166)
        - locked <0x2f6faf80> (a java.util.LinkedList)

"LeaseChecker" daemon prio=10 tid=0x54692800 nid=0x5882 waiting on condition [0x428fb000..0x428fbb90]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:978)
        at java.lang.Thread.run(Thread.java:619)

"DestroyJavaVM" prio=10 tid=0x2aaac022d000 nid=0x585f waiting on condition [0x..0x415c9d00]
   java.lang.Thread.State: RUNNABLE

"Thread-5" prio=10 tid=0x2aaac022b800 nid=0x5880 waiting on condition [0x426f9000..0x426f9a90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x2f6e8460> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:37)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:28)
        at org.apache.zookeeper.recipes.lock.ProtocolSupport.retryOperation(ProtocolSupport.java:120)
        at com.taobao.timetunnel2.cluster.zookeeper.opera

Re: Weird ephemeral node issue

2010-08-16 Thread Ted Dunning
Ben or somebody else will have to repeat some of the detailed logic for
this, but it has
to do with the fact that you can't be sure what has happened during the
network partition.
One possibility is the one you describe, but another is that the partition
happened because
a majority of the ZK cluster lost power and you can't see the remaining
nodes.  Those nodes
will continue to serve any files in a read-only fashion.  If the partition
involves you losing
contact with the entire cluster at the same time a partition of the cluster
into a quorum and
a minority happens, then your ephemeral files could continue to exist at
least until the breach
in the cluster itself is healed.

Suffice it to say that there are only a few strategies that leave you with a
coherent picture
of the universe.  Importantly, you shouldn't assume that the ephemerals will
disappear at
the same time as the session expiration event is delivered.

On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan  wrote:

> Ouch, is this the current ZK behavior? This is unexpected, if the
> client get partitioned from ZK cluster, he should
> get notified and take some action(e.g. commit suicide) otherwise how
> to tell a ephemeral node is really
> up or down? Zombie can create synchronization nightmares..
>
>
>
> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright  wrote:
> > Another possible cause for this that I ran into recently with the c
> client -
> > you don't get the session expired notification until you are reconnected
> to
> > the quorum and it informs you the session is lost.  If you get
> disconnected
> > and can't reconnect you won't get the notification.  Personally I think
> the
> > client api should track the session expiration time locally and
> information
> > you once it's expired.
> >
> > On Aug 16, 2010 2:09 AM, "Qing Yan"  wrote:
> >
> > Hi Ted,
> >
> >  Do you mean GC problem can prevent delivery of SESSION EXPIRE event?
> > Hum...so you have met this problem before?
> > I didn't see any OOM though, will look into it more.
> >
> >
> > On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning 
> wrote:
> >> I am assuming that y...
> >
>


Re: Weird ephemeral node issue

2010-08-16 Thread Qing Yan
Ouch, is this the current ZK behavior? This is unexpected. If the
client gets partitioned from the ZK cluster, it should get notified
and take some action (e.g. commit suicide); otherwise, how can anyone
tell whether an ephemeral node's owner is really up or down?
Zombies can create synchronization nightmares.



On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright  wrote:
> Another possible cause for this that I ran into recently with the c client -
> you don't get the session expired notification until you are reconnected to
> the quorum and it informs you the session is lost.  If you get disconnected
> and can't reconnect you won't get the notification.  Personally I think the
> client api should track the session expiration time locally and information
> you once it's expired.
>
> On Aug 16, 2010 2:09 AM, "Qing Yan"  wrote:
>
> Hi Ted,
>
>  Do you mean GC problem can prevent delivery of SESSION EXPIRE event?
> Hum...so you have met this problem before?
> I didn't see any OOM though, will look into it more.
>
>
> On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning  wrote:
>> I am assuming that y...
>


Re: Weird ephemeral node issue

2010-08-16 Thread Ted Dunning
No.  I meant that GC can cause your client to appear to be unresponsive
until the session expires.

Can you post some ZK server logs?  And some client GC logs?

On Sun, Aug 15, 2010 at 11:08 PM, Qing Yan  wrote:

> Hi Ted,
>
>  Do you mean GC problem can prevent delivery of SESSION EXPIRE event?
> Hum...so you have met this problem before?
> I didn't see any OOM though, will look into it more.
>
> On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning 
> wrote:
> > I am assuming that you are using ZK from java.
> >
> > Very likely you are having GC problems.
> >
> > Turn on verbose GC logging and see what is happening.  You may also want
> to
> > change the session timeout values.
> >
> > It is very common for the use of ZK to highlight problems that you didn't
> > know that you had.
> >
> > On Sun, Aug 15, 2010 at 8:51 PM, Qing Yan  wrote:
> >
> >> We started using ZK in production recently and run into some problems.
> >> The user case is simple we have a central
> >> monitor checks the ephermenal nodes created by distributed apps, if
> >> the node dissappear, corresponding app will get restarted. Each app
> >> will also handle SESSION_EXPIRE by shutting itself down...
> >>
> >> Whats happening now is sometimes the central monitor will try to
> >> restart the app, in the mean time the app runs fine and sees no sign
> >> of SESSION_EXPIRED. Any clue what's going on here?
> >>
> >> Thanks
> >>
> >
>


Re: Weird ephemeral node issue

2010-08-16 Thread Dave Wright
Another possible cause for this, which I ran into recently with the C client:
you don't get the session expired notification until you are reconnected to
the quorum and it informs you the session is lost.  If you get disconnected
and can't reconnect, you won't get the notification.  Personally I think the
client API should track the session expiration time locally and inform
you once it's expired.
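
A minimal sketch of that idea at the application level, in Java. The class
LocalSessionDeadline is hypothetical and not part of any ZooKeeper client. It
assumes the application is told when it is disconnected and reconnected, and
it can only presume expiry: the server counts from the last message it heard,
which is earlier than the moment the client notices the disconnect, and the
cluster itself may not have been able to expire anything yet.

```java
// Hypothetical helper: starts a local clock on disconnect and reports when a
// full session timeout has passed without a reconnect. This is a local
// presumption only, not an authoritative expiry from the ZK cluster.
final class LocalSessionDeadline {
    private final long sessionTimeoutMs;
    private long disconnectedAtMs = -1; // -1 means "currently connected"

    LocalSessionDeadline(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Record the first disconnect; repeated notifications keep the original time.
    synchronized void onDisconnected(long nowMs) {
        if (disconnectedAtMs < 0) {
            disconnectedAtMs = nowMs;
        }
    }

    // A successful reconnect to the same session clears the local clock.
    synchronized void onReconnected() {
        disconnectedAtMs = -1;
    }

    // True if we have been disconnected for at least one session timeout.
    synchronized boolean presumedExpired(long nowMs) {
        return disconnectedAtMs >= 0 && nowMs - disconnectedAtMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        LocalSessionDeadline deadline = new LocalSessionDeadline(30_000);
        deadline.onDisconnected(1_000);
        System.out.println(deadline.presumedExpired(20_000)); // false: 19 s elapsed
        System.out.println(deadline.presumedExpired(31_000)); // true: 30 s elapsed
        deadline.onReconnected();
        System.out.println(deadline.presumedExpired(99_000)); // false: connected again
    }
}
```

Timestamps are passed in rather than read from the clock so the logic can be
tested; a caller would normally use a monotonic source such as
System.nanoTime() converted to milliseconds.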

On Aug 16, 2010 2:09 AM, "Qing Yan"  wrote:

Hi Ted,

 Do you mean GC problem can prevent delivery of SESSION EXPIRE event?
Hum...so you have met this problem before?
I didn't see any OOM though, will look into it more.


On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning  wrote:
> I am assuming that y...


Re: Weird ephemeral node issue

2010-08-15 Thread Qing Yan
Hi Ted,

  Do you mean a GC problem can prevent delivery of the SESSION_EXPIRED event?
Hmm... so you have met this problem before?
I didn't see any OOM though; I will look into it more.

On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning  wrote:
> I am assuming that you are using ZK from java.
>
> Very likely you are having GC problems.
>
> Turn on verbose GC logging and see what is happening.  You may also want to
> change the session timeout values.
>
> It is very common for the use of ZK to highlight problems that you didn't
> know that you had.
>
> On Sun, Aug 15, 2010 at 8:51 PM, Qing Yan  wrote:
>
>> We started using ZK in production recently and run into some problems.
>> The user case is simple we have a central
>> monitor checks the ephermenal nodes created by distributed apps, if
>> the node dissappear, corresponding app will get restarted. Each app
>> will also handle SESSION_EXPIRE by shutting itself down...
>>
>> Whats happening now is sometimes the central monitor will try to
>> restart the app, in the mean time the app runs fine and sees no sign
>> of SESSION_EXPIRED. Any clue what's going on here?
>>
>> Thanks
>>
>


Re: Weird ephemeral node issue

2010-08-15 Thread Ted Dunning
I am assuming that you are using ZK from java.

Very likely you are having GC problems.

Turn on verbose GC logging and see what is happening.  You may also want to
change the session timeout values.

It is very common for the use of ZK to highlight problems that you didn't
know that you had.

On Sun, Aug 15, 2010 at 8:51 PM, Qing Yan  wrote:

> We started using ZK in production recently and run into some problems.
> The user case is simple we have a central
> monitor checks the ephermenal nodes created by distributed apps, if
> the node dissappear, corresponding app will get restarted. Each app
> will also handle SESSION_EXPIRE by shutting itself down...
>
> Whats happening now is sometimes the central monitor will try to
> restart the app, in the mean time the app runs fine and sees no sign
> of SESSION_EXPIRED. Any clue what's going on here?
>
> Thanks
>


Weird ephemeral node issue

2010-08-15 Thread Qing Yan
We started using ZK in production recently and have run into some problems.
The use case is simple: we have a central monitor that checks the ephemeral
nodes created by distributed apps; if a node disappears, the corresponding
app gets restarted. Each app also handles SESSION_EXPIRED by shutting
itself down...

What's happening now is that sometimes the central monitor tries to
restart an app while that app is running fine and sees no sign
of SESSION_EXPIRED. Any clue what's going on here?

Thanks