RE: Locks based on ephemeral nodes - Handling network outage correctly

2011-10-14 Thread Fournier, Camille F.
Pretty much all of the Java client wrappers out there in the wild have some sort of a retry loop around operations, to make some of this easier to deal with. But they don't to my knowledge deal with the situation of knowing whether an operation succeeded in the case of a disconnect (it is
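
As a rough illustration of the retry loop this message describes, here is a minimal sketch in Python rather than Java. `ConnectionLoss` and `with_retries` are hypothetical names invented for this example and do not come from any ZooKeeper client library. The sketch also shows the limitation the message raises: a retry loop alone cannot tell whether the interrupted operation was applied on the server.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a client library's connection-loss error."""


def with_retries(operation, attempts=3, delay_seconds=0.0):
    """Call operation() until it returns, retrying only on ConnectionLoss."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionLoss as error:
            # The request may or may not have been applied on the server;
            # the retry loop by itself has no way of knowing which.
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise last_error


# Usage: a fake operation that fails twice and then succeeds.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionLoss("simulated disconnect")
    return "data"


result = with_retries(flaky_read)
```

Blind retries like this are only safe for idempotent operations such as reads. Retrying a create after a disconnect may produce a duplicate or a spurious "node exists" error.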

RE: Locks based on ephemeral nodes - Handling network outage correctly

2011-10-14 Thread Fournier, Camille F.
, this is a case that a disconnection can be handled generically. -JZ On 10/14/11 7:20 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: Pretty much all of the Java client wrappers out there in the wild have some sort of a retry loop around operations, to make some of this easier to deal

RE: ZooKeeper performance

2011-10-03 Thread Fournier, Camille F.
It's pretty easy to set up a zk-smoketest to simulate what you are doing. We can't answer this question without knowing how big the data you're writing is, etc. I would recommend testing it out yourself on realistic data sizes.

RE: ephemeral node not removed after the client session is long gone

2011-09-27 Thread Fournier, Camille F.
So, the node was created by 0x13220b93e610550 at 12:17:56, then that session closed at 12:17:57, the node did not delete, and a bunch of other sessions later tried to create the node. These sessions got nodeexists failures I presume? Forgive the block of text I'm going to write instead of

RE: zookeeper cluster spanning datacenters

2011-09-22 Thread Fournier, Camille F.
We spread our ZKs across 3 data centers and in fact, these data centers are split across global regions (2 or 4 in one region, one in a remote region). To keep throughput up (and note that the throughput you have to worry about is only write throughput), we always ensure that the master is in

RE: zookeeper cluster spanning datacenters

2011-09-22 Thread Fournier, Camille F.
We have a monitor process that runs 'stat' against the remote ZK and if it returns leader, kills the process. -Original Message- From: Damu R [mailto:damu.devn...@gmail.com] Sent: Thursday, September 22, 2011 12:47 PM To: user@zookeeper.apache.org Subject: Re: zookeeper cluster
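
A monitor of the kind described here could be sketched roughly as follows. The `stat` four-letter command and its `Mode:` line are part of ZooKeeper; the function names, the sample text, and the decision helper are assumptions made for this illustration, and the action taken on a leader (which process to kill, and how) is left to the caller.

```python
import socket


def four_letter_word(host, port, command=b"stat", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line, or None if it is absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def is_leader(stat_output):
    """True when the server reports itself as the ensemble leader."""
    return parse_mode(stat_output) == "leader"


# Illustrative, trimmed reply text (not captured from a real server).
SAMPLE_LEADER = "Outstanding: 0\nZxid: 0x100000002\nMode: leader\nNode count: 4\n"
SAMPLE_FOLLOWER = SAMPLE_LEADER.replace("leader", "follower")
```

In a real monitor, `is_leader(four_letter_word(host, 2181))` would run on a timer against the remote server. The port shown is ZooKeeper's conventional client port and may differ in a given deployment.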

RE: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in case of unstable network

2011-09-21 Thread Fournier, Camille F.
This is expected. In cases where the network becomes unstable, it is the responsibility of the client writer to handle disconnected events appropriately and check to verify whether nodes they tried to write around the time of these events did or did not succeed. It makes writing a Generic
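
One common way to do the verification this message asks for is to put a client-chosen unique token in the node name, then list the parent's children after reconnecting and look for that token. The sketch below assumes that naming scheme; `make_prefix` and `find_own_node` are hypothetical helpers, and the child names are made-up examples of sequential node names.

```python
import uuid


def make_prefix(client_id=None):
    """Build a node-name prefix that is unique to this client attempt."""
    client_id = client_id or uuid.uuid4().hex
    return f"{client_id}-lock-"


def find_own_node(children, prefix):
    """Return this client's node among `children`, or None if the create never landed."""
    matches = sorted(name for name in children if name.startswith(prefix))
    return matches[0] if matches else None


# Usage: after a disconnect during create, list the children and check.
prefix = make_prefix("client-a")
children_after_reconnect = [
    "client-b-lock-0000000007",
    "client-a-lock-0000000008",
]
mine = find_own_node(children_after_reconnect, prefix)
missing = find_own_node(["client-b-lock-0000000007"], prefix)
```

If `find_own_node` returns a name, the create succeeded before the connection dropped and must not be repeated. If it returns None, the create can be retried.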

RE: Lock recipes and the lock path

2011-09-20 Thread Fournier, Camille F.
Well, if you are locking member IDs that are generated strings that will never be reused, the best you can do right now is clean those up after a period of time. If, on the other hand, your member IDs are meaningful, it's likely you will want to reuse these locations in the future when you

RE: zk keeps disconnecting and reconnecting

2011-08-29 Thread Fournier, Camille F.
Did anyone ever check resetting watches at client reconnect on a client with a chroot? Looking at the code, we store the watches associated with the non-chroot path, but they are set by the original request prepending chroot to the request. However, it looks like the SetWatches request on

RE: zk keeps disconnecting and reconnecting

2011-08-29 Thread Fournier, Camille F.
and reconnecting Camille, Do you think we should put the fix in 3.3.4? I think 3.4 might take a while to stabilize, so 3.3.4 would be a good release to get this in. Thoughts? mahadev On Aug 29, 2011, at 10:50 AM, Fournier, Camille F. wrote: Well, it causes the problem you are seeing. If you set any

RE: Zookeeper on two clusters?

2011-08-26 Thread Fournier, Camille F.
As long as the two clusters can ping each other, just set up a single ZK cluster spread across the two, 3 or 5 nodes (total, not per cluster). Note that if these two clusters are used for business continuity purposes (spread across 2 data centers), you still risk the outage of the zookeeper if
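
The snippet is cut off, but the quorum arithmetic behind spreading one odd-sized ensemble across two sites can be sketched independently. ZooKeeper needs a strict majority of the ensemble to stay available; the function names and site labels below are illustrative assumptions.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def survives_loss(nodes_per_site, lost_site):
    """True if the ensemble keeps a majority after losing every node in lost_site."""
    total = sum(nodes_per_site.values())
    remaining = total - nodes_per_site[lost_site]
    return remaining >= quorum_size(total)


# Usage: a 3-node ensemble split 2 + 1 across two sites.
layout = {"site_a": 2, "site_b": 1}
lose_a = survives_loss(layout, "site_a")
lose_b = survives_loss(layout, "site_b")
```

With any split of an odd-sized ensemble across exactly two sites, one site holds the majority, so losing that site takes the ensemble down even though the other site is healthy.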

RE: Zookeeper on two clusters?

2011-08-26 Thread Fournier, Camille F.
, Fournier, Camille F. camille.fourn...@gs.com wrote: As long as the two clusters can ping each other, just set up a single ZK cluster spread across the two, 3 or 5 nodes (total, not per cluster). Note that if these two clusters are used for business continuity purposes (spread

RE: devops/admin/client question: What do you do when you rollback?

2011-08-05 Thread Fournier, Camille F.
letting you see time go backwards.  Your situation is different of course. On Thu, Aug 4, 2011 at 7:05 PM, Fournier, Camille F. camille.fourn...@gs.com wrote: Right now the server just detects that the zxid is wrong, and calls close on the client. The client logs: 15:01:47,593 - INFO

RE: devops/admin/client question: What do you do when you rollback?

2011-08-05 Thread Fournier, Camille F.
that thinks that it is in quorum then things are not good. The definition of thinks it is in quorum is problematic of course. On Fri, Aug 5, 2011 at 10:57 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: Oh blah, of course it won't be b/w compatible, because all the older clients would

RE: devops/admin/client question: What do you do when you rollback?

2011-08-05 Thread Fournier, Camille F.
to a dev discussion) C -Original Message- From: Fournier, Camille F. [Tech] Sent: Friday, August 05, 2011 11:57 AM To: 'user@zookeeper.apache.org' Subject: RE: devops/admin/client question: What do you do when you rollback? Hmmm. I thought I had another way around this but I don't. We really

devops/admin/client question: What do you do when you rollback?

2011-08-04 Thread Fournier, Camille F.
We had an issue here the other day where the ZK servers were running poorly, and in an effort to get them healthy again we ended up rolling back the cluster state. While this was, in retrospect, not the right solution to the problem we were facing, it brought up another problem. Namely, that

RE: devops/admin/client question: What do you do when you rollback?

2011-08-04 Thread Fournier, Camille F.
: Thursday, August 04, 2011 1:51 PM To: user@zookeeper.apache.org Subject: Re: devops/admin/client question: What do you do when you rollback? On Thu, Aug 4, 2011 at 10:29 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: We had an issue here the other day where the ZK servers were running

Re: help on Zookeeper code walk through?

2011-07-18 Thread Fournier, Camille F.
If the zk cluster doesn't get pings from your existing master, the zk client on that master should see a disconnected state event, not a node deletion event. Upon seeing that event, it should stop acting as master until such time as it can determine whether it has reconnected and is still
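
The behaviour described here, pausing mastership on a disconnect and resuming only after re-verifying, might be modelled as a small state holder like the one below. The state names mirror the Java client's `KeeperState` values (`Disconnected`, `SyncConnected`, `Expired`); the `MasterGuard` class and its methods are hypothetical and written only for this sketch.

```python
class MasterGuard:
    """Tracks whether this process may currently act as master."""

    def __init__(self):
        self.acting_as_master = False
        self.must_recheck = False

    def elected(self):
        """Record that this process won the election."""
        self.acting_as_master = True
        self.must_recheck = False

    def on_state(self, state):
        """React to a connection state event from the client library."""
        if state == "Disconnected":
            # Stop acting as master; the session may still be alive.
            self.acting_as_master = False
            self.must_recheck = True
        elif state == "Expired":
            # The session is gone, so any ephemeral master node is gone too.
            self.acting_as_master = False
            self.must_recheck = False
        # "SyncConnected" alone does not restore mastership; recheck() does.

    def recheck(self, still_owns_master_node):
        """After reconnecting, resume only if the master node is still ours."""
        if self.must_recheck:
            self.acting_as_master = still_owns_master_node
            self.must_recheck = False


# Usage: disconnect, reconnect, then confirm ownership before resuming.
guard = MasterGuard()
guard.elected()
guard.on_state("Disconnected")
paused = guard.acting_as_master
guard.on_state("SyncConnected")
guard.recheck(still_owns_master_node=True)
resumed = guard.acting_as_master
```

The application would compute `still_owns_master_node` itself, for example by checking that the master znode still exists and was created by its own session.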

RE: libzookeeper_mt and GDB

2011-07-18 Thread Fournier, Camille F.
ZooKeeper can't possibly know that you are in GDB unless you have a special message that you send to the server that says I'm in a debugger now, please don't expire me. You might be able to hack something in to do this, but do you really want to? I think the second idea is best. If you are a

RE: help on Zookeeper code walk through?

2011-07-18 Thread Fournier, Camille F.
thread keeps going happily along . though this is a very slight possibility, theoretically it is still possible. or am I missing something? Thanks Yang On Mon, Jul 18, 2011 at 6:51 AM, Fournier, Camille F. camille.fourn...@gs.com wrote: If the zk cluster doesn't get pings from your existing

RE: help on Zookeeper code walk through?

2011-07-18 Thread Fournier, Camille F.
) should be guaranteed to be higher than the time needed for application thread to detect the Disconnect event from ZK client. the latter time value can be due to GC pause, thread scheduling delay etc. right? thanks Yang On Mon, Jul 18, 2011 at 9:30 AM, Fournier, Camille F. camille.fourn...@gs.com

RE: Debian packager orphaning ZooKeeper

2011-06-15 Thread Fournier, Camille F.
Again, you seem too easily offended when touching on the subject, which makes Thomas' points valid. No it doesn't. He is allowed to be offended and it has no bearing on the truth of the matter. This isn't a Shakespearian play. C -Original Message- From: Gustavo Niemeyer

RE: lost ZK events across datacenters

2011-06-06 Thread Fournier, Camille F. [Tech]
)? Thanks, Jun On Fri, Jun 3, 2011 at 9:56 AM, Jun Rao jun...@gmail.com wrote: The log doesn't have any state changing entries around the time the watcher is triggered, in all clients. Jun On Fri, Jun 3, 2011 at 9:32 AM, Fournier, Camille F. [Tech] camille.fourn...@gs.com wrote: Any state

RE: Memory leak in zookeeper 3.3.2 and 3.3.3?

2011-06-01 Thread Fournier, Camille F. [Tech]
I would bet that Ted is right about this, but if you're still having problems and want to put the yourkit profile up somewhere I could take a look later today. C -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Wednesday, June 01, 2011 2:07 AM To:

Re: Importance of latency in a global deployment

2011-05-04 Thread Fournier, Camille F. [Tech]
Global clusters will affect writes greatly, and may also affect your client reads in an indirect manner. Writes, having to traverse from one region to another for purposes of voting, will be slowed down considerably by the ping time between regions. If you did a three node deployment in the

RE: znode metadata consistency

2011-04-12 Thread Fournier, Camille F. [Tech]
I have some ideas of where to look on this, happy to do it in the next day or two if no one else wants to look. Please do open a jira even though you can't reproduce it consistently yet, just so we have somewhere to track the efforts. C -Original Message- From: Jeremy Stribling

RE: Using ZK for real-time group membership notification

2011-03-20 Thread Fournier, Camille F. [Tech]
To do this right you probably need messaging queues. I'd research the various MQ solutions out there. They are built to handle exactly this sort of issue. You could try to implement it via a ZK distributed queue plus some sort of crazy transaction logic in each process (so that a document to

RE: Task/Job distribution using ZooKeeper

2011-03-07 Thread Fournier, Camille F. [Tech]
Have you checked out the distributed queue recipe? It is what I have used to implement a solution to a similar problem. http://hadoop.apache.org/zookeeper/docs/r3.3.2/recipes.html Are the jobs worker-specific, or can all workers handle all jobs? The distributed queue protocol is very nice and