RE: NodeChildrenChanged WatchedEvent

2009-05-11 Thread Benjamin Reed
good summary ted. just to add a bit. another motivation for the current design 
is what scott had mentioned earlier: not sending a flood of changes when the 
value of a node is changing rapidly. implicit in this is the fact that we do 
not send the value in the events. not only would sending it make the events much 
more heavyweight, but it also leads to bad programming practices (see the faq). 
since we don't send data in the events, sending 3 data changed events in a 
row is the same as just sending the last data changed event.
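
A minimal sketch of that coalescing effect, in plain Java. It does not use the ZooKeeper client: OneShotNode, getData and setData are hypothetical stand-ins, and the only assumption is that a watch fires at most once and carries no value. Three rapid writes then produce a single notification, and one re-read returns the latest value.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory node with a one-shot data watch. NOT the ZooKeeper API:
// every name here is a hypothetical stand-in for illustration only.
final class OneShotNode {
    private String value;
    private Runnable watch; // at most one pending watch, cleared when it fires

    OneShotNode(String initial) {
        this.value = initial;
    }

    // Read the current value and arm the watch in one step (null = no watch).
    synchronized String getData(Runnable watcher) {
        this.watch = watcher;
        return value;
    }

    // Change the value. The pending watch fires at most once and carries no data.
    void setData(String newValue) {
        Runnable toFire;
        synchronized (this) {
            value = newValue;
            toFire = watch;
            watch = null;
        }
        if (toFire != null) {
            toFire.run();
        }
    }
}

final class CoalescingDemo {
    // Performs three rapid writes; returns how many notifications were delivered
    // and appends the value the client reads afterwards to 'seen'.
    static int run(List<String> seen) {
        OneShotNode node = new OneShotNode("v0");
        int[] notifications = {0};
        node.getData(() -> notifications[0]++); // arm the watch once
        node.setData("v1");
        node.setData("v2");
        node.setData("v3");
        seen.add(node.getData(null)); // one re-read picks up the latest value
        return notifications[0];
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int count = run(seen);
        System.out.println(count + " notification(s), value read: " + seen);
    }
}
```

The client never sees v1 or v2, which is fine as long as it only cares about the current state.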

i also agree with ted about the wrappers. unless they are used to implement a 
new construct, usually they just introduce bugs. however, there are two things 
i want to point out. first, the current exception handling ranges from a pain 
to, in the case of create() with SEQUENTIAL and EPHEMERAL, almost impossible, 
so we want to make connection recovery a bit more sophisticated: when a 
connection goes down, the client and server figure out what happened to the 
pending requests so that we never need to error them out with the "i have no 
idea what happened" exception, aka CONNECTION LOSS. second, higher level 
constructs in the form of recipes are great! for more sophisticated constructs 
it is great to have things implemented once and thoroughly debugged.

ben

ps - one other clarification: in ZK 3, the watches are still tracked locally. 
it's just that in ZK 3 the client now has the ability to tell the server what 
it was watching and what was the last thing seen when it reconnects. the server 
can then figure out which watches were missed and need to be retriggered and 
which watches need to be reregistered.
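
A rough sketch of that reconnect decision, in plain Java. This is a model only, not the ZooKeeper implementation: WatchRecovery, reconcile, and the idea of comparing a per-node "last modified" transaction id against the client's last seen transaction id are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of reconnect-time watch recovery. The names and the
// per-node "last modified transaction id" bookkeeping are assumptions.
final class WatchRecovery {
    final List<String> retrigger = new ArrayList<>();  // changed while the client was away
    final List<String> reregister = new ArrayList<>(); // unchanged, watch is simply set again

    // watchedPaths: paths the client reports it was watching.
    // lastSeenTxn:  last transaction id the client saw before the disconnect.
    // lastModified: server-side map from path to the txn id of its latest change.
    static WatchRecovery reconcile(List<String> watchedPaths,
                                   long lastSeenTxn,
                                   Map<String, Long> lastModified) {
        WatchRecovery result = new WatchRecovery();
        for (String path : watchedPaths) {
            Long modified = lastModified.get(path);
            if (modified == null || modified > lastSeenTxn) {
                // Node is gone or changed after the client's last view: the
                // client missed an event, so the watch is fired now.
                result.retrigger.add(path);
            } else {
                result.reregister.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> lastModified = new TreeMap<>();
        lastModified.put("/config", 5L);
        lastModified.put("/members", 12L);
        WatchRecovery r = reconcile(List.of("/config", "/members", "/gone"), 10L, lastModified);
        System.out.println("retrigger=" + r.retrigger + " reregister=" + r.reregister);
    }
}
```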
 
__
From: Ted Dunning [ted.dunn...@gmail.com]
Sent: Saturday, May 09, 2009 1:06 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: NodeChildrenChanged WatchedEvent

Making things better is always good.

I have found that in practice, most wrappers of ZK lead to serious errors
and should be avoided like the plague.  This particular use case is not a
big deal for me to code correctly (in Java, anyway) and I do it all the
time.

It may be that the no-persistent-watch policy was partly an artifact of the
ZK 1 and ZK 2 situation where ZK avoided keeping much of anything around per
session other than ephemeral files.  This has changed in ZK 3 and it might
be plausible to have more persistent watches.

On the other hand, I believe that Ben purposely avoided having this type of
watch to automatically throttle the number of notifications to be equal to
the rate at which the listener can handle them.  Having seen a number of
systems that didn't throttle this way up close and personal, I have lots of
empathy with that position.  Since I don't have any issue with looking
for changes, I would tend to just go with whatever Ben suggests.  His
opinions (largely based on watching people code with ZK) are pretty danged
good.

On Sat, May 9, 2009 at 12:37 PM, Scott Carey sc...@richrelevance.com wrote:

 What I am suggesting are higher level constructs that do these repeated
 mundane tasks for you to handle those use cases where the verbosity of the
 API is a hindrance to quality and productivity.




--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)


Re: NodeChildrenChanged WatchedEvent

2009-05-09 Thread Ted Dunning
Making things better is always good.

I have found that in practice, most wrappers of ZK lead to serious errors
and should be avoided like the plague.  This particular use case is not a
big deal for me to code correctly (in Java, anyway) and I do it all the
time.

It may be that the no-persistent-watch policy was partly an artifact of the
ZK 1 and ZK 2 situation where ZK avoided keeping much of anything around per
session other than ephemeral files.  This has changed in ZK 3 and it might
be plausible to have more persistent watches.

On the other hand, I believe that Ben purposely avoided having this type of
watch to automatically throttle the number of notifications to be equal to
the rate at which the listener can handle them.  Having seen a number of
systems that didn't throttle this way up close and personal, I have lots of
empathy with that position.  Since I don't have any issue with looking
for changes, I would tend to just go with whatever Ben suggests.  His
opinions (largely based on watching people code with ZK) are pretty danged
good.

On Sat, May 9, 2009 at 12:37 PM, Scott Carey sc...@richrelevance.com wrote:

 What I am suggesting are higher level constructs that do these repeated
 mundane tasks for you to handle those use cases where the verbosity of the
 API is a hindrance to quality and productivity.




-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)


Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Patrick Hunt
Javier, also note that the subsequent getChildren you mention in your 
original email is usually not entirely superfluous given that you 
generally want to watch the parent node for further changes, and a 
getChildren is required to set that watch.


Patrick

Benjamin Reed wrote:

i'm adding a faq on this right now. it's a rather common request.

we could put in the name of the node that is changing. indeed, we did in 
the first cut of zookeeper, but then we found that every instance of 
programs that used this resulted in bugs, so we removed it.


here is the problem:

you do a getChildren(), an event comes in that foo is deleted, and 
right afterwards goo gets deleted, but you aren't going to get that 
event since the watch already fired on the previous delete and you haven't 
done another getChildren(). this almost always results in an error, so much 
so that we don't even give people the rope.


ben

Javier Vegas wrote:

Hi, I am starting to implement Zookeeper as an arbiter for a high
performance client-server service, it is working really well but I
have a question. When my Watcher receives a
NodeChildrenChanged event, is there any way of getting from the event
the path for the child that changed? The WatchedEvent javadoc says
that it includes exactly what happened but all I am able to extract
is a vague NodeChildrenChanged type. What I am doing now to figure
out the path of the new child is to do a new getChildren and compare
the new children list with the old children list, but that seems a
waste of time and bandwidth if my node has lots of children and is
watched by a lot of zookeepers (which will be in prod). If I can
somehow get the path of the added/deleted child from the
WatchedEvent, it will make my life easier and my Zookeeper-powered
system much more simple, robust and scalable. Any suggestions?

Thanks,

Javier Vegas
  




Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Javier Vegas
Sorry, what I meant is issuing the new method watchChildren() on the
parent node (basically the same as getChildren() but returning just a
boolean instead of a list of children, because I already know the
paths of the original children and the ones that were added/deleted so
I don't need the list again). I wasn't thinking (yet) about
grandchildren, but if I want to watch for them, I will need to do an
initial getChildren() on the new child that NodeChildrenChanged told
me about, followed by a watchChildren() after each event. Does this
make sense?

Javier

On Fri, May 8, 2009 at 1:23 PM, Patrick Hunt ph...@apache.org wrote:
 Javier, also note that the subsequent getChildren you mention in your
 original email is usually not entirely superfluous given that you generally
 want to watch the parent node for further changes, and a getChildren is
 required to set that watch.

 Patrick






Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Javier Vegas
Maybe the name I selected is confusing; watchForChildrenChanges() is
more descriptive than watchChildren(). The first indicates that I am
setting a watch for children changes; the old name kind of implies I
am watching for changes on the children of the node, which is not what
I want.

Javier

On Fri, May 8, 2009 at 1:31 PM, Javier Vegas jav...@beboinc.com wrote:
 Sorry, what I meant is issuing the new method watchChildren() on the
 parent node (basically the same as getChildren() but returning just a
 boolean instead of a list of children, because I already know the
 paths of the original children and the ones that were added/deleted so
 I don't need the list again). I wasn't thinking (yet) about
 grandchildren, but if I want to watch for them, I will need to do an
 initial getChildren() on the new child that NodeChildrenChanged told
 me about, followed by a watchChildren() after each event. Does this
 make sense?

 Javier







Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Scott Carey
It won't work, because when you call watchChildren() you don't actually know
the list of children from the previous getChildren() if the initial watch
fired.

The initial watch may have fired because child X was added, but by the time
you get that message and call your watchChildren(), child Y and Z may have
been added as well, and you won't get events for that.

So, the pattern is to call getChildren() with a watch, save the list, then
when the event fires you call getChildren() again and set a watch, do a diff
of the result with the previous result to calculate what was added or
removed, do your app specific things as a result, and save the new state for
when the next watch fires.
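
The pattern above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not ZooKeeper client code: ChildSource is a hypothetical stand-in for the call that lists children and sets the watch in one request, and ChildListener and ChildTracker are made-up names. The demo delivers the single event late, after two deletions, to show that the diff still reports both.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a client call that returns the child list AND
// sets a one-shot watch in the same request.
interface ChildSource {
    List<String> getChildren(String path, Runnable watcher);
}

// Hypothetical application callback receiving only the computed difference.
interface ChildListener {
    void childrenChanged(Set<String> added, Set<String> removed);
}

final class ChildTracker {
    private final ChildSource source;
    private final String path;
    private final ChildListener listener;
    private Set<String> known = new HashSet<>(); // state saved from the last read

    ChildTracker(ChildSource source, String path, ChildListener listener) {
        this.source = source;
        this.path = path;
        this.listener = listener;
    }

    // Call once at startup; the watch callback calls it again on every event.
    synchronized void refresh() {
        // Re-read and re-arm the watch together, then diff against saved state.
        Set<String> current = new HashSet<>(source.getChildren(path, this::refresh));
        Set<String> added = new TreeSet<>(current);
        added.removeAll(known);
        Set<String> removed = new TreeSet<>(known);
        removed.removeAll(current);
        known = current;
        if (!added.isEmpty() || !removed.isEmpty()) {
            listener.childrenChanged(added, removed);
        }
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>(List.of("foo", "goo", "hoo"));
        Runnable[] pendingWatch = new Runnable[1];
        // Fake source: remembers the watch so the demo can deliver it late.
        ChildSource fake = (p, w) -> {
            pendingWatch[0] = w;
            return new ArrayList<>(children);
        };
        ChildTracker tracker = new ChildTracker(fake, "/parent",
                (added, removed) -> System.out.println("added=" + added + " removed=" + removed));
        tracker.refresh();
        children.remove("foo"); // would fire the one-shot watch
        children.remove("goo"); // no second event; the watch is already spent
        pendingWatch[0].run();  // the single event arrives; the diff shows both
    }
}
```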


On 5/8/09 1:31 PM, Javier Vegas jav...@beboinc.com wrote:

 Sorry, what I meant is issuing the new method watchChildren() on the
 parent node (basically the same as getChildren() but returning just a
 boolean instead of a list of children, because I already know the
 paths of the original children and the ones that were added/deleted so
 I don't need the list again). I wasn't thinking (yet) about
 grandchildren, but if I want to watch for them, I will need to do an
 initial getChildren() on the new child that NodeChildrenChanged told
 me about, followed by a watchChildren() after each event. Does this
 make sense?
 
 Javier
 
 
 
 
 



Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Scott Carey
This reminds me of a feature request:

Persistent watch --

This type of watch would stick around, and doesn't have the gap-in-time
problem that the other watchers do.  This is particularly useful for exists
and children -- less so for data.

Such a watch could delay and merge certain changes together to avoid the
possible performance issue (e.g. report the children changed in the last 5ms,
not each change).  It could even just be a client side ZooKeeper API thing that
re-registered and kept state to do it for you.

I wrote an entire wrapper API called NodeWatcher that tracks a node, keeps
its last seen state, registers watches, and does the diff on data and
children -- only passing along real changes to the client in a specific
callback (e.g. childrenAdded(List children)) not a generic ZK event.  The
client to this class does not need to deal with any error conditions at all,
it only gets a callback when the NodeWatcher knows something has changed.  A
separate "session is broken" sequence is initiated elsewhere (another lower
level wrapper class), and a client can _optionally_ decide to listen for
that event based on its needs.  I've found that in about 4 out of 5 use
cases, clients don't even need to know that.


I'm also building a TreeWatcher that can do this cascaded to all children of
a node.

Both of these things should really be part of the base API IMO.  All the
garbage low level error condition details that make ZK a pain in the butt to
use are needed for locks, queues, and other synchronization primitives, but
are not needed for "watch this node for changes to children or data, and tell
me what changed".

(this is the basis for an event-driven distributed coordination system --
for many event types all that is required is notification, queues and locks
don't apply; for the parts that need it more error handling is exposed to
the client)




On 5/8/09 1:01 PM, Benjamin Reed br...@yahoo-inc.com wrote:

 i'm adding a faq on this right now. it's a rather common request.
 
 we could put in the name of the node that is changing. indeed, we did in
 the first cut of zookeeper, but then we found that every instance of
 programs that used this resulted in bugs, so we removed it.
 
 here is the problem:
 
 you do a getChildren(), an event comes in that foo is deleted, and
 right afterwards goo gets deleted, but you aren't going to get that
 event since the watch already fired on the previous delete and you haven't
 done another getChildren(). this almost always results in an error, so much
 so that we don't even give people the rope.
 
 ben
 
  
 
 



Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Javier Vegas
Ok, thanks for convincing me that I need to do another getChildren()
after each event. My initial plan was to put thousands of children
under the same node, but it seems I will need to organize them in some
kind of hierarchical structure. I will need to watch lots of nodes, not
just one, but each getChildren after a watch change will return a
small list instead of a humongous one.
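
One possible way to lay out such a hierarchy, sketched in plain Java. Everything here is an assumption for illustration: the BucketedPaths helper, the "bucket-NNN" naming, and the choice of hashing the child name into a fixed number of intermediate nodes. A watcher would then watch each bucket node, and a getChildren on one bucket returns only that bucket's children.

```java
import java.util.Locale;

// Hypothetical helper: spreads children across a fixed number of intermediate
// "bucket" nodes so each child listing after a watch event stays short.
final class BucketedPaths {
    private final String parent;
    private final int buckets;

    BucketedPaths(String parent, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        this.parent = parent;
        this.buckets = buckets;
    }

    // Path of the intermediate node that holds the given child.
    String bucketFor(String child) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(child.hashCode(), buckets);
        return String.format(Locale.ROOT, "%s/bucket-%03d", parent, index);
    }

    // Full path of the child under its bucket.
    String pathFor(String child) {
        return bucketFor(child) + "/" + child;
    }

    public static void main(String[] args) {
        BucketedPaths layout = new BucketedPaths("/services", 16);
        System.out.println(layout.pathFor("worker-42"));
    }
}
```

The trade-off is more watches (one per bucket) in exchange for smaller lists per event.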

Your input was extremely helpful and fast, thank you!

Javier

On Fri, May 8, 2009 at 1:38 PM, Scott Carey sc...@richrelevance.com wrote:
 It won't work, because when you call watchChildren() you don't actually know
 the list of children from the previous getChildren() if the initial watch
 fired.

 The initial watch may have fired because child X was added, but by the time
 you get that message and call your watchChildren(), child Y and Z may have
 been added as well, and you won't get events for that.

 So, the pattern is to call getChildren() with a watch, save the list, then
 when the event fires you call getChildren() again and set a watch, do a diff
 of the result with the previous result to calculate what was added or
 removed, do your app specific things as a result, and save the new state for
 when the next watch fires.










Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Ted Dunning
On Fri, May 8, 2009 at 1:31 PM, Javier Vegas jav...@beboinc.com wrote:

 Sorry, what I meant is issuing the new method watchChildren() on the
 parent node (basically the same as getChildren() but returning just a
 boolean instead of a list of children, because I already know the
 paths of the original children and the ones that were added/deleted so
 I don't need the list again).


You need to analyze this very much more carefully in light of Ben's comment.


 I wasn't thinking (yet) about
 grandchildren, but if I want to watch for them, I will need to do an
 initial getChildren() on the new child that NodeChildrenChanged told
 me about, followed by a watchChildren() after each event. Does this
 make sense?


That is close.

The watch has to be set when you do the getChildren to avoid having a crack
that a change could fall into between the getChildren and setting the watch.
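
A small sketch of that crack, in plain Java. ToyParent is a hypothetical in-memory node, not the ZooKeeper API; watchOnly models the proposed watchChildren() and listAndWatch models reading the children and setting the watch in one step.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parent node with a one-shot child watch. NOT the ZooKeeper API:
// all names are hypothetical and exist only to show the race.
final class ToyParent {
    private final List<String> children = new ArrayList<>();
    private Runnable watch;

    synchronized List<String> list() {
        return new ArrayList<>(children);
    }

    // Models the proposed watchChildren(): sets a watch, returns no list.
    synchronized void watchOnly(Runnable w) {
        watch = w;
    }

    // Reads the children and sets the watch as one atomic step.
    synchronized List<String> listAndWatch(Runnable w) {
        watch = w;
        return new ArrayList<>(children);
    }

    void add(String child) {
        Runnable toFire;
        synchronized (this) {
            children.add(child);
            toFire = watch;
            watch = null;
        }
        if (toFire != null) {
            toFire.run();
        }
    }
}

final class CrackDemo {
    // Two separate steps: a change lands between the read and the watch.
    static boolean separateStepsNotified() {
        ToyParent node = new ToyParent();
        boolean[] notified = {false};
        List<String> snapshot = node.list(); // client believes there are no children
        node.add("x");                       // the change falls into the crack
        node.watchOnly(() -> notified[0] = true);
        return notified[0];                  // the stale snapshot is never corrected
    }

    // One step: any later change is guaranteed to fire the watch.
    static boolean atomicNotified() {
        ToyParent node = new ToyParent();
        boolean[] notified = {false};
        node.listAndWatch(() -> notified[0] = true);
        node.add("x");
        return notified[0];
    }

    public static void main(String[] args) {
        System.out.println("separate steps notified: " + separateStepsNotified());
        System.out.println("atomic read+watch notified: " + atomicNotified());
    }
}
```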