in some sense the children will "own" the parent. the nice thing about it is that it isn't tied to any particular session, so we don't have to worry about weird cases like owners going away or switching ownership.


On 03/23/2010 03:04 AM, Dominic Williams wrote:
Hi, that would work nicely.

Who would own the parent node after the session that created the initial
pair exits (assume additional children exist)?

On 23 March 2010 02:42, Benjamin Reed wrote:

let me put out an idea that we have kicked around for a while: ephemeral
containers. the idea is that the znode disappears if it doesn't have
children. you would create the znode with create("/path", data, acl,
EPHEMERAL_CONTAINER). this would result in the creation of two znodes: /path and
/path/child. (we have to create it with a child otherwise it immediately

i think this mechanism would address your need in a way that is easy to
implement and use. it would also allow you to do a cool barrier
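
To make the proposed semantics concrete, here is a minimal in-memory sketch of the ephemeral container idea. It is not the ZooKeeper API: the class ToyTree, its methods, and the session ids are all made up for illustration, and EPHEMERAL_CONTAINER is only the mode proposed in this thread. The assumption modelled is that the container has no owning session and disappears once its last ephemeral child is gone.

```python
class ToyTree:
    """In-memory stand-in for a znode tree; not the ZooKeeper API."""

    def __init__(self):
        self.nodes = {}          # path -> owning session id (None for a container)
        self.containers = set()  # paths created as ephemeral containers

    def create_ephemeral_container(self, path, session):
        # Hypothetical EPHEMERAL_CONTAINER create: makes /path plus a first
        # child owned by the caller, so the container is not empty at birth.
        self.containers.add(path)
        self.nodes[path] = None
        return self.create_child(path, session, "child")

    def create_child(self, parent, session, name):
        if parent not in self.nodes:
            raise KeyError(parent)
        child = f"{parent}/{name}"
        self.nodes[child] = session
        return child

    def children(self, path):
        prefix = path + "/"
        return [p for p in self.nodes
                if p.startswith(prefix) and "/" not in p[len(prefix):]]

    def close_session(self, session):
        # Ephemeral children die with the session that created them...
        for p in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[p]
        # ...and a container left without children disappears as well.
        for c in list(self.containers):
            if not self.children(c):
                self.containers.discard(c)
                self.nodes.pop(c, None)


tree = ToyTree()
tree.create_ephemeral_container("/barrier", "s1")
tree.create_child("/barrier", "s2", "b")
tree.close_session("s1")
print("/barrier" in tree.nodes)   # True: the child owned by s2 keeps it alive
tree.close_session("s2")
print("/barrier" in tree.nodes)   # False: last child gone, container removed
```

Because the container is never tied to a session in this sketch, the session that created it can exit without any ownership hand-off.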


On 03/22/2010 10:37 AM, Patrick Hunt wrote:

Dominic Williams wrote:

What I'd suggest might work:
- when the session that created the parent ends, ownership of the parent
could either be transferred to the owner/session that created the oldest
child, or instead ownership could be transferred to some kind of nominal
system session (which would delete the parent once the last ephemeral

There may be some issues with idempotency here, also it could require
extensive locking which drives up operation latencies (essentially
"recursive delete"). It sounds possible, but someone would have to take
a closer look as to the technical challenges involved.

Our general philosophy is to keep things as simple as possible wrt api,
semantics, implementation, etc... Distributed communication is hard and
while we handle a lot of the issues for you it's still complex.
Following our philosophy generally makes the easy things simple and the
hard things possible. Additionally, it reduces the number of bugs that we
have in the implementation itself (both user and service code).

I don't wish to discourage you as much as provide insight/background
into some of our decisions.



On 22 March 2010 16:44, Patrick Hunt wrote:

Dominic Williams wrote:

1/ If a node crashes or something else goes wrong, you leave behind
persistent nodes. Over time these will grow and grow, rather like the
tmp folders used to fill with files under Windows

That's true. One either needs to use ephemerals or use persistent and
a "garbage collector" (implicit or explicit gc). In most cases it's
preferable to use the ephemeral.

2/ Persistent nodes = nasty scalability *bottleneck* because you're
having to write to disk somewhere.

This is not actually how ZK works. All znodes regardless of
persistent/ephemeral are written to disk persistently. Ephemeral nodes
are tied to the session that created them. As long as the session is alive
the ephemeral node is alive. Sessions themselves are persistently/reliably
stored by the ZK cluster. This allows you to shut down the entire cluster
and restart it; all sessions/ephemerals will be maintained. Sessions can
move from server to server (if say network connectivity to server A fails, or
server A itself fails then the client will move to server B). The session
and all ephemerals are maintained (well, as long as the client moves within
the expiration timeout value).
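
The session behaviour described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: ToyCluster, heartbeat, and expire are invented names, time is a plain number passed in by the caller, and nothing here reflects how ZooKeeper actually stores or replicates sessions.

```python
class ToyCluster:
    """Toy model of cluster-wide session state; not the ZooKeeper API."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}    # session id -> time of the last heartbeat
        self.server_of = {}    # session id -> server currently serving it
        self.ephemerals = {}   # path -> owning session id

    def heartbeat(self, session, server, now):
        # Any server may receive the heartbeat. The session is tracked by
        # the cluster as a whole, so moving from A to B keeps it alive.
        self.last_seen[session] = now
        self.server_of[session] = server

    def create_ephemeral(self, path, session):
        if session not in self.last_seen:
            raise KeyError(f"unknown session {session}")
        self.ephemerals[path] = session

    def expire(self, now):
        # Drop sessions that stayed silent longer than the timeout,
        # together with every ephemeral node they own.
        dead = {s for s, t in self.last_seen.items() if now - t > self.timeout}
        for s in dead:
            del self.last_seen[s]
            del self.server_of[s]
        self.ephemerals = {p: s for p, s in self.ephemerals.items()
                           if s not in dead}
        return dead


cluster = ToyCluster(timeout=10)
cluster.heartbeat("s1", "A", now=0)
cluster.create_ephemeral("/lock", "s1")
cluster.heartbeat("s1", "B", now=8)   # client moved from server A to server B
cluster.expire(now=15)
print("/lock" in cluster.ephemerals)  # True: reconnected within the timeout
cluster.expire(now=30)
print("/lock" in cluster.ephemerals)  # False: session expired, ephemeral removed
```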

To avoid this I'm actually thinking of writing a locking system where you
work out the existing chain not by enumerating sequential children, but by
looking at the contents of each temporary lock node to see what it is
waiting on. But... that's quite horrible. Was wondering whether there is
some technical reason why ephemeral nodes can't have children?

There are a few cases to think about.

1) obviously ephemeral nodes can't have persistent children, this just
doesn't make sense

2) ephemeral nodes have an owner - the session that created them. so it
would also not make sense (in my mind at least) to have an ephemeral
with another ephemeral /foo/bar with a different owner.

3) so you are left with "ephemerals can be a child of an ephemeral with
the same owner".

4) there are also issues of order. in particular what is the "deletion
order" depth first or breadth first, etc...
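
As a rough sketch of cases 1 to 4, the rule "an ephemeral may only have ephemeral children with the same owner" and one possible deletion order can be written down as follows. Both functions are hypothetical helpers, not ZooKeeper code. The depth-first order shown is just one of the options mentioned in case 4, picked here because it never leaves a child without its parent.

```python
def child_allowed(parent_ephemeral, parent_owner, child_ephemeral, child_owner):
    """Return True if the child could be created under the rules in cases 1-3."""
    if not parent_ephemeral:
        return True                      # persistent parents accept any child
    if not child_ephemeral:
        return False                     # case 1: no persistent child of an ephemeral
    return parent_owner == child_owner   # cases 2 and 3: same owner only


def depth_first_deletion_order(tree, root):
    """tree maps a path to its list of child paths; children come before parents."""
    order = []
    for child in tree.get(root, []):
        order.extend(depth_first_deletion_order(tree, child))
    order.append(root)
    return order


print(child_allowed(True, "s1", False, None))   # False (case 1)
print(child_allowed(True, "s1", True, "s2"))    # False (case 2)
print(child_allowed(True, "s1", True, "s1"))    # True  (case 3)

tree = {"/foo": ["/foo/bar", "/foo/baz"], "/foo/bar": ["/foo/bar/x"]}
print(depth_first_deletion_order(tree, "/foo"))
# ['/foo/bar/x', '/foo/bar', '/foo/baz', '/foo']
```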

I believe the answer so far has been "we don't do this because it's
complicated and we haven't seen any use cases that require it." In the
cases I've seen so far there was either a misunderstanding of how zk worked,
or a simpler way available.

Does that make sense? Thoughts?

