Re: getting created child on NodeChildrenChanged event

2010-09-07 Thread Patrick Hunt
It is good to keep things simple, but we have seen some requests related to
the client API for children use cases that seem reasonable. In particular,
handling large numbers of children efficiently (a queue, say) is currently a
problem. We've seen proposals on this before, but no one has followed through
with them. I personally think there's room for improvement; perhaps the
current client API is too simple:

https://issues.apache.org/jira/browse/ZOOKEEPER-423

Patrick

On Fri, Sep 3, 2010 at 11:18 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Todd,
  We have always tried to err on the side of keeping things lightweight and
 the API simple. The only way you would be able to do this is with sequential
 creates.

 1. Create nodes like /queueelement-$i, where i is a monotonically increasing
 number. You could use ZooKeeper's sequential flag to do this.

 2. When deleting a node, you would remove the node and create a deleted-node
 marker at

 /deletedqueueelements/queueelement-$i

 2.1 On notification, you would go to /deletedqueueelements/ and find out
 which ones were deleted.

 The above only works if you are OK with monotonically increasing, unique
 queue element names.

 3. The above method lets clients see the deltas using
 /deletedqueueelements, which can be garbage collected by some cleanup process
 (you can be smarter about this as well).

 Would something like this work?


 Thanks
 mahadev



Re: getting created child on NodeChildrenChanged event

2010-09-04 Thread Mahadev Konar
Hi Todd,
  We have always tried to err on the side of keeping things lightweight and
the API simple. The only way you would be able to do this is with sequential
creates.

1. Create nodes like /queueelement-$i, where i is a monotonically increasing
number. You could use ZooKeeper's sequential flag to do this.

2. When deleting a node, you would remove the node and create a deleted-node
marker at

/deletedqueueelements/queueelement-$i

2.1 On notification, you would go to /deletedqueueelements/ and find out
which ones were deleted.

The above only works if you are OK with monotonically increasing, unique
queue element names.

3. The above method lets clients see the deltas using
/deletedqueueelements, which can be garbage collected by some cleanup process
(you can be smarter about this as well).
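
The scheme in steps 1 to 3 can be sketched roughly as follows. This is only an in-memory stand-in, not ZooKeeper client code: `FakeTree`, `create_sequential`, `new_since`, and `garbage_collect` are hypothetical names invented for illustration. The zero-padded 10-digit suffix mimics the names ZooKeeper generates for sequential nodes.

```python
class FakeTree:
    """In-memory stand-in (hypothetical) for the two parent znodes in the scheme."""

    def __init__(self):
        self.queue = {}       # name -> payload, plays the role of /queueelement-$i
        self.deleted = set()  # names recorded under /deletedqueueelements
        self._next = 0        # next sequence number to hand out

    def create_sequential(self, payload):
        # Step 1: names carry a monotonically increasing, zero-padded suffix.
        name = f"queueelement-{self._next:010d}"
        self._next += 1
        self.queue[name] = payload
        return name

    def delete(self, name):
        # Step 2: remove the element and leave a marker for it.
        del self.queue[name]
        self.deleted.add(name)

    def garbage_collect(self, names):
        # Step 3: a cleanup process drops markers that are no longer needed.
        self.deleted -= set(names)


def sequence_of(name):
    """Extract the integer sequence number from an element name."""
    return int(name.rsplit("-", 1)[1])


def new_since(tree, last_seen):
    """Live elements whose sequence number is greater than last_seen, oldest first."""
    return sorted(n for n in tree.queue if sequence_of(n) > last_seen)
```

A client would remember the highest sequence number it has processed, call `new_since` after each notification to find additions, and read `tree.deleted` (step 2.1) to find removals. An element that is created and deleted between two reads shows up only in `tree.deleted`.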

Would something like this work?


Thanks
mahadev


On 8/31/10 3:55 PM, Todd Nine t...@spidertracks.co.nz wrote:

 Hi Dave,
   Thanks for the response.  I understand your point about missed events
 during a watch reset period.  I may be off, but here is the functionality I
 was thinking of.  I'm not sure if the ZK internal versioning process could
 possibly support something like this.
 
 1. A watch is placed on children.
 2. The event is fired to the client.  The client receives the Stat
 object as part of the event, reflecting the state of the node when the
 event was created.  We'll call this Stat A, with version 1.
 3. The client performs processing.  Meanwhile the node has several
 children changed. Versions are incremented to version 2 and version 3.
 4. The client resets the watch.
 5. A node is added.
 6. The event is fired to the client.  The client receives Stat B, with
 version 4.
 7. The client calls deltaChildren(Stat A, Stat B).
 8. ZooKeeper returns the nodes added between the two Stats, and also the
 nodes deleted between them.
 
 This would handle the missed event problem since the client would have
 the 2 states it needs to compare.  It also allows clients dealing with
 large data sets to only deal with the delta over time (like a git
 replay).  Our number of queues could get quite large, and I'm concerned
 that keeping my previous event's children in a set to perform the delta
 may become quite memory and processor intensive.  Would a feature like
 this be possible without overcomplicating the ZooKeeper core?
 
 
 Thanks,
 Todd
 
 
 



Re: getting created child on NodeChildrenChanged event

2010-08-31 Thread Dave Wright
Hi Todd -
The general explanation for why ZooKeeper doesn't pass the event information
with the event notification is that a watch is only triggered once, and thus
a single notification may stand for multiple events. For example, if you do a
GetChildren() and set a watch, and then multiple children are added at about
the same time, the first one triggers a notification, but the second (and
later) ones do not. When you do another GetChildren() request to get the list
and reset the watch, you'll see all the changed nodes; however, if you had
just been told about the first change in the notification, you would have
missed the others.
To do what you want, you would really need persistent watches that
send notifications every time a change occurs and don't need to be reset, so
you can't miss events. That isn't the design that was chosen for ZooKeeper,
and I don't think it's likely to be implemented.
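
A minimal sketch of the one-shot behavior described above, using an in-memory stand-in rather than a real ZooKeeper client. `FakeParent`, `get_children`, and `create_child` are hypothetical names made up for illustration; only the fire-once semantics come from the explanation above.

```python
class FakeParent:
    """In-memory stand-in (hypothetical) for a znode with one-shot child watches."""

    def __init__(self):
        self.children = set()
        self._watchers = []

    def get_children(self, watcher=None):
        # Passing a watcher mirrors "get the list and set a watch".
        if watcher is not None:
            self._watchers.append(watcher)
        return set(self.children)

    def create_child(self, name):
        self.children.add(name)
        # One-shot semantics: pending watchers fire once and are then cleared.
        fired, self._watchers = self._watchers, []
        for notify in fired:
            notify()


def demo():
    parent = FakeParent()
    notifications = []
    known = parent.get_children(watcher=lambda: notifications.append("changed"))

    # Three children arrive before the client has re-registered its watch.
    for name in ("unit-1", "unit-2", "unit-3"):
        parent.create_child(name)

    # Only one notification was delivered, but re-reading the list shows
    # every change; the client computes the delta itself.
    current = parent.get_children(watcher=lambda: notifications.append("changed"))
    return notifications, sorted(current - known)
```

Calling `demo()` yields a single notification alongside all three new children. Had the notification carried only the name of the child that triggered it, the other two would have gone unnoticed.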

-Dave Wright

On Tue, Aug 31, 2010 at 3:49 AM, Todd Nine t...@spidertracks.co.nz wrote:

 Hi all,
  I'm writing a distributed queue monitoring class for our leader node in
 the cluster.  We queue messages per input hardware device, and each queue
 is then assigned to the node with the least load in our cluster.  To do this,
 I maintain two persistent znodes with the following layout.

 data queue

 /dataqueue/devices/unit id/data packet

 processing follower

 /dataqueue/nodes/node name/unit id

 The queue monitor watches for changes on the path /dataqueue/devices.
 When the first packet from a unit is received, the queue writer creates
 the queue with the unit id.  This triggers the watch event on the monitoring
 class, which in turn creates the znode under the path of the least loaded
 node.  That path is watched for child node creation, and the node creates a
 queue consumer to consume messages from the new queue.


 Our list of queues can become quite large, and I would prefer not to
 maintain a list of queues I have assigned and then perform a delta when the
 event fires to determine which queues are new and caused the watch event. I
 can't really use sequenced nodes and keep track of my last read position,
 because I don't want to iterate over the list of queues to determine which
 sequenced node belongs to the current unit id (it would require full
 iteration, which really doesn't save me any reads).  Is it possible to
 create a watch that returns the path and Stat of the child node that caused
 the event to fire?

 Thanks,
 Todd



Re: getting created child on NodeChildrenChanged event

2010-08-31 Thread Todd Nine
Hi Dave,
  Thanks for the response.  I understand your point about missed events
during a watch reset period.  I may be off, but here is the functionality I
was thinking of.  I'm not sure if the ZK internal versioning process could
possibly support something like this.

1. A watch is placed on children.
2. The event is fired to the client.  The client receives the Stat
object as part of the event, reflecting the state of the node when the
event was created.  We'll call this Stat A, with version 1.
3. The client performs processing.  Meanwhile the node has several
children changed. Versions are incremented to version 2 and version 3.
4. The client resets the watch.
5. A node is added.
6. The event is fired to the client.  The client receives Stat B, with
version 4.
7. The client calls deltaChildren(Stat A, Stat B).
8. ZooKeeper returns the nodes added between the two Stats, and also the
nodes deleted between them.

This would handle the missed event problem since the client would have
the 2 states it needs to compare.  It also allows clients dealing with
large data sets to only deal with the delta over time (like a git
replay).  Our number of queues could get quite large, and I'm concerned
that keeping my previous event's children in a set to perform the delta
may become quite memory and processor intensive.  Would a feature like
this be possible without overcomplicating the ZooKeeper core?
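
To make steps 7 and 8 concrete, here is one way the proposed call could behave, sketched against an in-memory change log. Nothing here exists in ZooKeeper: `VersionedChildren` and `delta_children` are hypothetical, and the sketch assumes the server keeps a per-parent log of child changes keyed by version, which would itself need trimming in any real design.

```python
class VersionedChildren:
    """Hypothetical server-side structure: a child set plus a change log."""

    def __init__(self):
        self.version = 0
        self.children = set()
        self._log = []  # entries are (version, "add" or "delete", name)

    def add(self, name):
        self.children.add(name)
        self.version += 1
        self._log.append((self.version, "add", name))
        return self.version

    def delete(self, name):
        self.children.remove(name)
        self.version += 1
        self._log.append((self.version, "delete", name))
        return self.version

    def delta_children(self, version_a, version_b):
        """Net added and deleted names for changes in (version_a, version_b]."""
        added, deleted = set(), set()
        for version, op, name in self._log:
            if not (version_a < version <= version_b):
                continue
            if op == "add":
                if name in deleted:
                    deleted.discard(name)  # deleted then re-added: no net change
                else:
                    added.add(name)
            else:
                if name in added:
                    added.discard(name)    # added then deleted: no net change
                else:
                    deleted.add(name)
        return added, deleted
```

With Stat A at version 1 and Stat B at version 4, `delta_children(1, 4)` would return everything that changed in versions 2 through 4, including the changes made while the watch was not set.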
 

Thanks,
Todd

On Tue, 2010-08-31 at 09:23 -0400, Dave Wright wrote:

 Hi Todd -
 The general explanation for why ZooKeeper doesn't pass the event information
 with the event notification is that a watch is only triggered once, and thus
 a single notification may stand for multiple events. For example, if you do a
 GetChildren() and set a watch, and then multiple children are added at about
 the same time, the first one triggers a notification, but the second (and
 later) ones do not. When you do another GetChildren() request to get the list
 and reset the watch, you'll see all the changed nodes; however, if you had
 just been told about the first change in the notification, you would have
 missed the others.
 To do what you want, you would really need persistent watches that
 send notifications every time a change occurs and don't need to be reset, so
 you can't miss events. That isn't the design that was chosen for ZooKeeper,
 and I don't think it's likely to be implemented.
 
 -Dave Wright
 