Let me put out an idea that we have kicked around for a while: ephemeral containers. The idea is that the znode disappears if it doesn't have children. You would create the znode with create("/path", data, acl, EPHEMERAL_CONTAINER); this would result in the creation of two znodes: /path and /path/child. (We have to create it with a child; otherwise it immediately disappears.)
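To make the proposed "disappears when childless" rule concrete, here is a minimal in-memory sketch. It is a toy model under stated assumptions, not ZooKeeper code: `ContainerTree` and its methods are hypothetical names, only the container znodes themselves are tracked, and EPHEMERAL_CONTAINER is the mode proposed above rather than an existing flag.

```python
# Hypothetical in-memory model of the proposed EPHEMERAL_CONTAINER rule.
# None of these names are ZooKeeper APIs; they only illustrate the semantics.


class ContainerTree:
    def __init__(self):
        # container path -> set of child names currently under it
        self.children = {}

    def create_container(self, path):
        """Create /path together with /path/child, since an empty container vanishes at once."""
        if path in self.children:
            raise ValueError(f"{path} already exists")
        self.children[path] = {"child"}
        return [path, path + "/child"]

    def create_child(self, path, name):
        if path not in self.children:
            raise KeyError(f"no container at {path}")
        self.children[path].add(name)

    def delete_child(self, path, name):
        kids = self.children[path]
        kids.remove(name)
        # The defining rule: a container with no children disappears.
        if not kids:
            del self.children[path]

    def exists(self, path):
        return path in self.children


tree = ContainerTree()
print(tree.create_container("/barrier"))  # ['/barrier', '/barrier/child']
tree.create_child("/barrier", "worker-1")
tree.delete_child("/barrier", "child")
print(tree.exists("/barrier"))  # True: worker-1 is still there
tree.delete_child("/barrier", "worker-1")
print(tree.exists("/barrier"))  # False: last child gone, container removed
```

The usage lines hint at the barrier idea: the container lives exactly as long as some participant is still registered under it.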

I think this mechanism would address your need in a way that is easy to implement and use. It would also allow you to do a cool barrier implementation!


On 03/22/2010 10:37 AM, Patrick Hunt wrote:
Dominic Williams wrote:
What I'd suggest might work:
- when the session that created the parent ends, ownership of the parent
could either be transferred to the owner/session that created the oldest
child, or instead ownership could be transferred to some kind of nominal
system session (which would delete the parent once the last ephemeral child

There may be some issues with idempotency here; it could also require
extensive locking, which drives up operation latencies (it is essentially a
"recursive delete"). It sounds possible, but someone would have to take
a closer look at the technical challenges involved.

Our general philosophy is to keep things as simple as possible with respect
to the API, semantics, implementation, etc. Distributed communication is hard,
and while we handle a lot of the issues for you, it's still complex.
Following this philosophy generally makes the easy things simple and the
hard things possible; it also reduces the number of bugs in the
implementation itself (both user and service code).

I don't wish to discourage you as much as provide insight/background
into some of our decisions.



On 22 March 2010 16:44, Patrick Hunt <ph...@apache.org> wrote:

Dominic Williams wrote:

1/ If a node crashes or something else goes wrong, you leave behind
persistent nodes. Over time these will grow and grow, rather like the way old
tmp folders used to fill up with files under Windows.

That's true. One either needs to use ephemeral nodes, or use persistent nodes
and have a "garbage collector" (implicit or explicit GC). In most cases it's
preferable to use ephemerals.

2/ Persistent nodes = a nasty scalability *bottleneck*, because you have
to write to disk somewhere.

This is not actually how ZK works. All znodes, whether persistent or
ephemeral, are written to disk persistently. Ephemeral nodes are
tied to the session that created them: as long as the session is alive, the
ephemeral node is alive. Sessions themselves are persistently and reliably
stored by the ZK cluster. This allows you to shut down the entire cluster and
restart it; all sessions and ephemerals will be maintained. Sessions can move
from server to server (if, say, network connectivity to server A fails, or
server A itself fails, the client will move to server B). The session
and all its ephemerals are maintained (as long as the client moves within
the expiration timeout).
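A simplified model of the lifetime rules described above may help: ephemerals belong to a session, the session survives a move from server A to server B as long as the client is heard from within the timeout, and expiry deletes the session's ephemerals. This is a sketch under assumptions: `Ensemble` and its methods are hypothetical, time is passed in explicitly, and disk persistence and real expiry bookkeeping are not modeled.

```python
# Hypothetical model of session-owned ephemerals; not ZooKeeper's API.
# Time is an explicit argument (seconds) so the behaviour is deterministic.


class Ensemble:
    def __init__(self):
        self.sessions = {}    # session id -> (timeout, last time client was heard from)
        self.ephemerals = {}  # path -> owning session id
        self.next_id = 1

    def open_session(self, timeout, now):
        sid = self.next_id
        self.next_id += 1
        self.sessions[sid] = (timeout, now)
        return sid

    def heartbeat(self, sid, server, now):
        # The server is irrelevant here: session state belongs to the whole
        # ensemble, so moving from server A to server B simply renews it.
        timeout, _ = self.sessions[sid]
        self.sessions[sid] = (timeout, now)

    def create_ephemeral(self, sid, path):
        self.ephemerals[path] = sid

    def expire_sessions(self, now):
        # A session expires only if nothing was heard within its timeout;
        # expiring it removes every ephemeral it owns.
        for sid, (timeout, last) in list(self.sessions.items()):
            if now - last > timeout:
                del self.sessions[sid]
                for path in [p for p, owner in self.ephemerals.items() if owner == sid]:
                    del self.ephemerals[path]


zk = Ensemble()
s = zk.open_session(timeout=10, now=0)
zk.create_ephemeral(s, "/lock")
zk.heartbeat(s, server="A", now=5)
zk.heartbeat(s, server="B", now=12)  # A failed; client reached B in time
zk.expire_sessions(now=20)
print("/lock" in zk.ephemerals)  # True
zk.expire_sessions(now=30)
print("/lock" in zk.ephemerals)  # False
```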

To avoid this I'm actually thinking of writing a locking system where you
work out the existing chain not by enumerating sequential children, but by
looking at the contents of each temporary lock node to see what it is
waiting on. But... that's quite horrible. I was wondering whether there is
some technical reason why ephemeral nodes can't have children?
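For contrast with the content-inspection scheme described above, the lock recipe in the ZooKeeper documentation does enumerate sequential children: each client creates an ephemeral sequential node under a persistent lock node, lists the children, and watches only the one with the next-lower sequence number. The sketch below shows just that "which node do I wait on?" step; the `lock-` prefix and function names are hypothetical, and the ZooKeeper calls themselves are omitted.

```python
# Sketch of the predecessor-selection step from the documented lock recipe.
# ZooKeeper appends a 10-digit, zero-padded counter to sequential node names.


def sequence_number(name):
    return int(name[-10:])


def node_to_watch(my_node, children):
    """Return None if my_node holds the lock, else the predecessor to watch."""
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(my_node)
    return None if index == 0 else ordered[index - 1]


kids = ["lock-0000000007", "lock-0000000003", "lock-0000000005"]
print(node_to_watch("lock-0000000003", kids))  # None: lock held
print(node_to_watch("lock-0000000007", kids))  # lock-0000000005
```

Watching only the predecessor means each release wakes a single waiter rather than every client.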

There are a few cases to think about.

1) Obviously ephemeral nodes can't have persistent children; this just
doesn't make sense.

2) Ephemeral nodes have an owner: the session that created them. So it
would also not make sense (in my mind at least) to have an ephemeral /foo
with another ephemeral /foo/bar that has a different owner.

3) So you are left with "ephemerals can be a child of an ephemeral with the
same owner".

4) There are also issues of order; in particular, what is the "deletion
order": depth first, breadth first, etc.?
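Points 1-4 can be condensed into a small hypothetical check plus one possible deletion order. ZooKeeper itself does not allow ephemeral nodes to have children, so this only models the rules discussed here. The sketch assumes a post-order (children-before-parent) walk, since ZooKeeper refuses to delete a node that still has children; `child_allowed` and `deletion_order` are illustrative names.

```python
# Hypothetical encoding of points 1-3 and a depth-first deletion order (point 4).

PERSISTENT, EPHEMERAL = "persistent", "ephemeral"


def child_allowed(parent_kind, parent_owner, child_kind, child_owner):
    if parent_kind == PERSISTENT:
        return True                     # anything may live under a persistent node
    if child_kind == PERSISTENT:
        return False                    # point 1: no persistent child of an ephemeral
    return parent_owner == child_owner  # points 2-3: ephemeral child of the same session only


def deletion_order(tree, root):
    """Post-order walk: every child is deleted before its parent."""
    order = []
    for child in tree.get(root, []):
        order.extend(deletion_order(tree, child))
    order.append(root)
    return order


tree = {"/foo": ["/foo/bar", "/foo/baz"], "/foo/bar": ["/foo/bar/qux"]}
print(deletion_order(tree, "/foo"))
# ['/foo/bar/qux', '/foo/bar', '/foo/baz', '/foo']
print(child_allowed(EPHEMERAL, 1, EPHEMERAL, 2))  # False
```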

I believe the answer so far has been "we don't do this because it's fairly
complicated and we haven't seen any use cases that require it." In the cases
I've seen so far, there was either a misunderstanding of how ZK worked or a
simpler way was available.

Does that make sense? Thoughts?

