Re: Zookeeper outage recap & questions

2010-06-30 Thread Flavio Junqueira
Hi Travis,

Do you think it would be possible for you to open a jira and upload your logs?

Thanks,
-Flavio

On Jul 1, 2010, at 8:13 AM, Travis Crawford wrote:

> Hey zookeepers -
>
> We just experienced a total zookeeper outage, and here's a quick
> post-mortem of the issue, and some questions about preventing it going
> forward. Quick overview of the setup:
>
> - RHEL5 2.6.18 kernel
> - Zookeeper 3.3.0
> - ulimit raised to 65k files
> - 3 cluster members
> - 4-5k connections in steady-state
> - Primarily C and python clients, plus some java
>
> In chronological order, the issue manifested itself as an alert about RW
> tests failing. Logs were full of too many files errors, and the output
> of netstat showed lots of CLOSE_WAIT and SYN_RECV sockets. CPU was
> 100%. Application logs showed lots of connection timeouts. This
> suggests an event happened that caused applications to dogpile on
> Zookeeper, and eventually the CLOSE_WAIT timeout caused file handles
> to run out and basically game over.
>
> I looked through lots of logs (clients+servers) and did not see a
> clear indication of what happened. Graphs show a sudden decrease in
> network traffic when the outage began, zookeeper goes cpu bound, and
> runs out of file descriptors.
>
> Clients are primarily a couple thousand C clients using default
> connection parameters, and a couple thousand python clients using
> default connection parameters.
>
> Digging through Jira we see two issues that probably contributed to this outage:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-662
> https://issues.apache.org/jira/browse/ZOOKEEPER-517
>
> Both are tagged for the 3.4.0 release. Anyone know if that's still the
> case, and when 3.4.0 is roughly scheduled to ship?
>
> Thanks!
> Travis

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300
fax (408) 349 3301

Zookeeper outage recap & questions

2010-06-30 Thread Travis Crawford
Hey zookeepers -

We just experienced a total zookeeper outage, and here's a quick
post-mortem of the issue, and some questions about preventing it going
forward. Quick overview of the setup:

- RHEL5 2.6.18 kernel
- Zookeeper 3.3.0
- ulimit raised to 65k files
- 3 cluster members
- 4-5k connections in steady-state
- Primarily C and python clients, plus some java

In chronological order, the issue manifested itself as an alert about RW
tests failing. Logs were full of too many files errors, and the output
of netstat showed lots of CLOSE_WAIT and SYN_RECV sockets. CPU was
100%. Application logs showed lots of connection timeouts. This
suggests an event happened that caused applications to dogpile on
Zookeeper, and eventually the CLOSE_WAIT timeout caused file handles
to run out and basically game over.
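
For anyone who wants to put numbers on the CLOSE_WAIT / SYN_RECV observation, here is a minimal sketch that tallies socket states from the text printed by `netstat -tan`. The function name and the sample rows are made up for illustration; they are not taken from the outage logs.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP socket states in the text printed by `netstat -tan`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have six columns: proto, recv-q, send-q, local, foreign, state.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Illustrative sample only (hypothetical addresses, not real log data).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:2181           10.0.0.7:41000          CLOSE_WAIT
tcp        0      0 10.0.0.1:2181           10.0.0.8:41001          CLOSE_WAIT
tcp        0      0 10.0.0.1:2181           10.0.0.9:41002          SYN_RECV
tcp        0      0 10.0.0.1:2181           10.0.0.5:41003          ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

Sampling this periodically and graphing the CLOSE_WAIT count next to the open file count would show whether half-closed sockets are what is eating the descriptors.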

I looked through lots of logs (clients+servers) and did not see a
clear indication of what happened. Graphs show a sudden decrease in
network traffic when the outage began, zookeeper goes cpu bound, and
runs out of file descriptors.

Clients are primarily a couple thousand C clients using default
connection parameters, and a couple thousand python clients using
default connection parameters.
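
One generic way to soften a dogpile from thousands of clients is to randomize the delay before each reconnect attempt. The sketch below shows capped exponential backoff with full jitter. It is an illustration of the general technique only, not a description of what the ZooKeeper C or python clients do internally, and the `base` and `cap` values are assumptions.

```python
import random


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a randomized delay in seconds before reconnect attempt `attempt`.

    `attempt` is 0-based. The upper bound doubles with each attempt until it
    reaches `cap`, and the actual delay is drawn uniformly below that bound so
    that clients which failed at the same moment do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, round(backoff_delay(attempt), 3))
```

Application code that wraps its own connect loop around the client could sleep for `backoff_delay(n)` between attempts.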

Digging through Jira we see two issues that probably contributed to this outage:

https://issues.apache.org/jira/browse/ZOOKEEPER-662
https://issues.apache.org/jira/browse/ZOOKEEPER-517

Both are tagged for the 3.4.0 release. Anyone know if that's still the
case, and when 3.4.0 is roughly scheduled to ship?

Thanks!
Travis


Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Ted Dunning
I think that you are correct, but a real ZK person should answer this.

On Wed, Jun 30, 2010 at 4:48 PM, Bryan Thompson  wrote:

> For example, if a client registers a watch, and a state change which would
> trigger that watch occurs _after_ the client has successfully registered the
> watch with the zookeeper quorum, is it possible that the client would not
> observe the watch trigger due to communication failure, etc., even while the
> client's session remains valid?  It sounds like the answer is "no" per the
> timeliness guarantee.  Is that correct?
>
>


Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Ted Dunning
Yes.  That is true.  In particular, your link to a server (or the server
itself) can fail causing your client to switch to a different ZK server and
retry there.  This can and often does happen without you knowing.

On Wed, Jun 30, 2010 at 4:48 PM, Bryan Thompson  wrote:

> With regard to timeliness:
>
> > The clients view of the system is guaranteed to be up-to-date within a
> > certain time bound. (On the order of tens of seconds.) Either system
> > changes will be seen by a client within this bound, or the client will
> > detect a service outage.
>
> This seems to imply that there are retries for transient communication
> failures.  Is that true?
>
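
The failover behaviour described above can be modelled roughly as follows: cycle through the server list until one accepts the connection or the session timeout has elapsed. This is a toy model under stated assumptions, not the client library's actual code; `try_connect`, `now`, `sleep`, and the host names in the usage note are all hypothetical.

```python
import itertools


def reconnect(servers, try_connect, session_timeout, now, sleep, pause=1.0):
    """Try servers round-robin until one connects or the session would expire.

    Returns the server that accepted the connection, or None if
    `session_timeout` seconds passed first (the session is then presumed
    expired and must be re-established from scratch).
    """
    deadline = now() + session_timeout
    for server in itertools.cycle(servers):
        if now() >= deadline:
            return None
        if try_connect(server):
            return server
        sleep(pause)
    return None  # only reached when `servers` is empty
```

With `time.monotonic` and `time.sleep` passed as `now` and `sleep`, a call such as `reconnect(["zk1:2181", "zk2:2181"], my_connect, 30.0, time.monotonic, time.sleep)` keeps trying for up to 30 seconds. The application only sees a problem if no server is reached before the deadline.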


RE: Guaranteed message delivery until session timeout?

2010-06-30 Thread Bryan Thompson
Ted,

Yes, that is clear.  I was looking for this:

> On some failures (communication errors, timeouts, etc) the client will not 
> know if the update has applied or not. We take steps to minimize the 
> failures, but the only guarantee is only present with successful return codes.

With regard to timeliness:

> The clients view of the system is guaranteed to be up-to-date within a 
> certain time bound. (On the order of tens of seconds.) Either system changes 
> will be seen by a client within this bound, or the client will detect a 
> service outage.

This seems to imply that there are retries for transient communication 
failures.  Is that true?

For example, if a client registers a watch, and a state change which would 
trigger that watch occurs _after_ the client has successfully registered the 
watch with the zookeeper quorum, is it possible that the client would not 
observe the watch trigger due to communication failure, etc., even while the 
client's session remains valid?  It sounds like the answer is "no" per the 
timeliness guarantee.  Is that correct?

Thanks,
Bryan


From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Wednesday, June 30, 2010 7:38 PM
To: Patrick Hunt
Cc: zookeeper-user@hadoop.apache.org; Bryan Thompson
Subject: Re: Guaranteed message delivery until session timeout?

Also this:

Once an update has been applied, it will persist from that time forward until a
client overwrites the update. This guarantee has two corollaries:

- If a client gets a successful return code, the update will have been applied.
  On some failures (communication errors, timeouts, etc) the client will not
  know if the update has applied or not. We take steps to minimize the
  failures, but the only guarantee is only present with successful return
  codes. (This is called the monotonicity condition in Paxos.)
- Any updates that are seen by the client, through a read request or successful
  update, will never be rolled back when recovering from server failures.

I think that the clear implications here are:

a) if you get a successful return code and no session expiration, your 
ephemeral file is there

b) if the ephemeral file is created, you might not get the successful return 
code (due to connection loss), but the ephemeral file might continue to exist 
(because connection loss != session loss)

c) if you get a failure return code, your ephemeral file was not created

On Wed, Jun 30, 2010 at 4:33 PM, Patrick Hunt <ph...@apache.org> wrote:
in particular see "timeliness" 
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkGuarantees



Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Ted Dunning
Also this:

Once an update has been applied, it will persist from that time forward
until a client overwrites the update. This guarantee has two corollaries:

- If a client gets a successful return code, the update will have been
  applied. On some failures (communication errors, timeouts, etc) the client
  will not know if the update has applied or not. We take steps to minimize
  the failures, but the only guarantee is only present with successful return
  codes. (This is called the *monotonicity condition* in Paxos.)
- Any updates that are seen by the client, through a read request or
  successful update, will never be rolled back when recovering from server
  failures.


I think that the clear implications here are:

a) if you get a successful return code and no session expiration, your
ephemeral file is there

b) if the ephemeral file is created, you might not get the successful
return code (due to connection loss), but the ephemeral file might continue
to exist (because connection loss != session loss)

c) if you get a failure return code, your ephemeral file was not created

On Wed, Jun 30, 2010 at 4:33 PM, Patrick Hunt  wrote:

> in particular see "timeliness"
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkGuarantees
>
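
Cases a) to c) above can be written down as a small classifier over the return code of an ephemeral create. This is a sketch under assumptions: the constant names and numeric values are meant to mirror the C client's error enum, only a few codes are handled explicitly, and treating every other error as "not created" is a simplification of case c).

```python
from enum import Enum

# Assumed to match the C client's error codes (zookeeper.h).
ZOK = 0
ZCONNECTIONLOSS = -4
ZOPERATIONTIMEOUT = -7
ZSESSIONEXPIRED = -112


class CreateOutcome(Enum):
    CREATED = "created"            # a) success: the ephemeral node is there
    UNKNOWN = "unknown"            # b) the create may or may not have applied
    SESSION_GONE = "session gone"  # any ephemeral node of this session is deleted
    NOT_CREATED = "not created"    # c) definite failure


def classify_create(rc: int) -> CreateOutcome:
    """Map a create() return code to what the caller can safely conclude."""
    if rc == ZOK:
        return CreateOutcome.CREATED
    if rc in (ZCONNECTIONLOSS, ZOPERATIONTIMEOUT):
        # Connection loss is not session loss: the node might exist.
        return CreateOutcome.UNKNOWN
    if rc == ZSESSIONEXPIRED:
        return CreateOutcome.SESSION_GONE
    return CreateOutcome.NOT_CREATED
```

On `UNKNOWN` the caller has to find out for itself, for example by checking for the node after reconnecting, before deciding whether to retry the create.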


Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Patrick Hunt

On 06/30/2010 09:37 AM, Ted Dunning wrote:
> Which API are you talking about?  C?
>
> I think that the difference between connection loss and session expiration
> might mess you up slightly in your disjunction here.
>
> On Wed, Jun 30, 2010 at 7:45 AM, Bryan Thompson  wrote:
>
> > I am wondering what guarantees (if any) zookeeper provides for reliable
> > messaging for operation return codes up to a session timeout.  Basically, I

in particular see "timeliness"
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkGuarantees

> > would like to know whether a zookeeper client can rely on observing the
> > return code for a successful operation which creates an ephemeral (or
> > ephemeral sequential) znode -or- have a guarantee that its session was timed
> > out and the ephemeral znode destroyed.  That is, does zookeeper provide

Any ephemeral node(s) associated with a session will be deleted when the
session is invalidated (session expiration or client close request).


Patrick


RE: Guaranteed message delivery until session timeout?

2010-06-30 Thread Bryan Thompson
Ted,

You are correct.  This is a resend.  The apache mail server had some hiccups 
and did not subscribe me until just a few minutes ago and the archives appear 
to be out of date so I had no means to verify the delivery of my message, which 
I had sent this morning but before having positive confirmation that I was 
subscribed.

My apologies for the resend. Any answers to the original message would be 
appreciated.

Thanks,
Bryan 

> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
> Sent: Wednesday, June 30, 2010 6:41 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Guaranteed message delivery until session timeout?
> 
> Isn't this the same question that you sent this morning?
> 
> On Wed, Jun 30, 2010 at 3:36 PM, Bryan Thompson wrote:
>
> > Hello,
> >
> > I am wondering what guarantees (if any) zookeeper provides for
> > reliable messaging for operation return codes up to a session timeout.
> > Basically, I would like to know whether a zookeeper client can rely on
> > observing the return code for a successful operation which creates an
> > ephemeral (or ephemeral sequential) znode -or- have a guarantee that
> > its session was timed out and the ephemeral znode destroyed.  That is,
> > does zookeeper provide guaranteed delivery of the operation return
> > code unless the session is invalidated by a timeout?
> >
> > Thanks,
> > Bryan
> >
> 

Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Ted Dunning
Isn't this the same question that you sent this morning?

On Wed, Jun 30, 2010 at 3:36 PM, Bryan Thompson  wrote:

> Hello,
>
> I am wondering what guarantees (if any) zookeeper provides for reliable
> messaging for operation return codes up to a session timeout.  Basically, I
> would like to know whether a zookeeper client can rely on observing the
> return code for a successful operation which creates an ephemeral (or
> ephemeral sequential) znode -or- have a guarantee that its session was timed
> out and the ephemeral znode destroyed.  That is, does zookeeper provide
> guaranteed delivery of the operation return code unless the session is
> invalidated by a timeout?
>
> Thanks,
> Bryan
>


Guaranteed message delivery until session timeout?

2010-06-30 Thread Bryan Thompson
Hello,

I am wondering what guarantees (if any) zookeeper provides for reliable 
messaging for operation return codes up to a session timeout.  Basically, I 
would like to know whether a zookeeper client can rely on observing the return 
code for a successful operation which creates an ephemeral (or ephemeral 
sequential) znode -or- have a guarantee that its session was timed out and the 
ephemeral znode destroyed.  That is, does zookeeper provide guaranteed delivery 
of the operation return code unless the session is invalidated by a timeout?

Thanks,
Bryan


Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Ted Dunning
Which API are you talking about?  C?

I think that the difference between connection loss and session expiration
might mess you up slightly in your disjunction here.

On Wed, Jun 30, 2010 at 7:45 AM, Bryan Thompson  wrote:

> Hello,
>
> I am wondering what guarantees (if any) zookeeper provides for reliable
> messaging for operation return codes up to a session timeout.  Basically, I
> would like to know whether a zookeeper client can rely on observing the
> return code for a successful operation which creates an ephemeral (or
> ephemeral sequential) znode -or- have a guarantee that its session was timed
> out and the ephemeral znode destroyed.  That is, does zookeeper provide
> guaranteed delivery of the operation return code unless the session is
> invalidated by a timeout?
>
> Thanks,
> Bryan
>


Guaranteed message delivery until session timeout?

2010-06-30 Thread Bryan Thompson
Hello,

I am wondering what guarantees (if any) zookeeper provides for reliable 
messaging for operation return codes up to a session timeout.  Basically, I 
would like to know whether a zookeeper client can rely on observing the return 
code for a successful operation which creates an ephemeral (or ephemeral 
sequential) znode -or- have a guarantee that its session was timed out and the 
ephemeral znode destroyed.  That is, does zookeeper provide guaranteed delivery 
of the operation return code unless the session is invalidated by a timeout?

Thanks,
Bryan