Re: getting created child on NodeChildrenChanged event
It is good to keep things simple, but we have seen some requests related to the client API for children use cases that seem reasonable. In particular, handling large numbers of children efficiently is currently a problem (for a queue, say). We've seen proposals on this before; no one has followed through with them. I personally think there's room for improvement; perhaps the current client API is too simple:

https://issues.apache.org/jira/browse/ZOOKEEPER-423

Patrick

On Fri, Sep 3, 2010 at 11:18 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi Todd,

We have always tried to lean on the side of keeping things lightweight and the API simple. The only way you would be able to do this is with sequential creates.

1. Create nodes like /queueelement-$i, where i is a monotonically increasing number. You could use the sequential flag of ZooKeeper to do this.
2. When deleting a node, you would remove the node and create a deleted node at /deletedqueueelements/queueelement-$i.
2.1 On notification you would go to /deletedqueueelements/ and find out which ones were deleted.

The above only works if you are OK with monotonically increasing, unique queue elements.

3. The above method allows clients to see the deltas using /deletedqueueelements, which can be garbage collected by some cleanup process (you can be smarter about this as well).

Would something like this work?

Thanks
mahadev
Re: getting created child on NodeChildrenChanged event
Hi Todd,

We have always tried to lean on the side of keeping things lightweight and the API simple. The only way you would be able to do this is with sequential creates.

1. Create nodes like /queueelement-$i, where i is a monotonically increasing number. You could use the sequential flag of ZooKeeper to do this.
2. When deleting a node, you would remove the node and create a deleted node at /deletedqueueelements/queueelement-$i.
2.1 On notification you would go to /deletedqueueelements/ and find out which ones were deleted.

The above only works if you are OK with monotonically increasing, unique queue elements.

3. The above method allows clients to see the deltas using /deletedqueueelements, which can be garbage collected by some cleanup process (you can be smarter about this as well).

Would something like this work?

Thanks
mahadev

On 8/31/10 3:55 PM, Todd Nine t...@spidertracks.co.nz wrote:

Hi Dave,

Thanks for the response. I understand your point about missed events during a watch reset period. I may be off, but here is the functionality I was thinking of. I'm not sure if the ZK internal versioning process could possibly support something like this.

1. A watch is placed on children.
2. The event is fired to the client. The client receives the Stat object as part of the event, reflecting the state of the node when the event was created. We'll call this Stat A, with version 1.
3. The client performs processing. Meanwhile the node has several children changed. Versions are incremented to version 2 and version 3.
4. Client resets the watch.
5. A node is added.
6. The event is fired to the client. Client receives Stat B with version 4.
7. Client calls deltaChildren(Stat A, Stat B).
8. ZooKeeper returns the nodes added between the stats, and also the nodes deleted between the stats.

This would handle the missed event problem, since the client would have the 2 states it needs to compare. It also allows clients dealing with large data sets to only deal with the delta over time (like a git replay). Our number of queues could get quite large, and I'm concerned that keeping my previous event's children in a set to perform the delta may become quite memory and processor intensive.

Would a feature like this be possible without overcomplicating the ZooKeeper core?

Thanks,
Todd
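Mahadev's sequential-create scheme could be sketched roughly as follows. This is a toy in-memory model, not ZooKeeper client code: `ToyTree`, `read_delta`, and `last_seen_seq` are hypothetical names, the `/queue` parent path is made up for illustration, and the rule for which deletion markers matter to a reader (only those at or below its last-seen sequence number) is an assumption rather than something stated in the thread.

```python
QUEUE = "/queue"                      # hypothetical parent path for live elements
DELETED = "/deletedqueueelements"     # marker path from Mahadev's step 2


class ToyTree:
    """In-memory stand-in for a znode tree; NOT a ZooKeeper client."""

    def __init__(self):
        self.children = {QUEUE: set(), DELETED: set()}
        self._next_seq = 0

    def create_sequential(self, prefix):
        # Mimics a sequential create: append a zero-padded, increasing counter.
        name = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self.children[QUEUE].add(name)
        return name

    def delete_with_marker(self, name):
        # Step 2: remove the element and leave a marker under the deleted path.
        self.children[QUEUE].remove(name)
        self.children[DELETED].add(name)


def sequence_of(name):
    """Extract the numeric suffix from a name like queueelement-0000000002."""
    return int(name.rsplit("-", 1)[1])


def read_delta(tree, last_seen_seq):
    """Return (added, deleted) relative to the reader's last-seen sequence."""
    # New elements are the live ones numbered above the last-seen sequence.
    added = sorted(n for n in tree.children[QUEUE]
                   if sequence_of(n) > last_seen_seq)
    # Assumption: only markers for elements the reader already knew about matter.
    deleted = sorted(n for n in tree.children[DELETED]
                     if sequence_of(n) <= last_seen_seq)
    return added, deleted


tree = ToyTree()
first = tree.create_sequential("queueelement-")
second = tree.create_sequential("queueelement-")
last_seen = sequence_of(second)       # the reader has processed up to here

third = tree.create_sequential("queueelement-")
tree.delete_with_marker(first)

added, deleted = read_delta(tree, last_seen)
print(added, deleted)
```

The reader only has to remember one integer instead of the full child set, but it still lists every child on each pass, so this trades memory for reads rather than eliminating the scan.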
Re: getting created child on NodeChildrenChanged event
Hi Todd -

The general explanation for why ZooKeeper doesn't pass the event information with the event notification is that an event notification is only triggered once, and thus may indicate multiple events. For example, if you do a GetChildren and set a watch, and then multiple children are added at about the same time, the first one triggers a notification, but the second (or later) ones do not. When you do another GetChildren() request to get the list and reset the watch, you'll see all the changed nodes; however, if you had just been told about the first change in the notification, you would have missed the others.

To do what you are wanting, you would really need persistent watches that send notifications every time a change occurs and don't need to be reset, so you can't miss events. That isn't the design that was chosen for ZooKeeper and I don't think it's likely to be implemented.

-Dave Wright

On Tue, Aug 31, 2010 at 3:49 AM, Todd Nine t...@spidertracks.co.nz wrote:

Hi all,

I'm writing a distributed queue monitoring class for our leader node in the cluster. We're queueing messages per input hardware device; this queue is then assigned to the node with the least load in our cluster. To do this, I maintain 2 persistent znodes with the following format:

data queue: /dataqueue/devices/unit id/data packet
processing follower: /dataqueue/nodes/node name/unit id

The queue monitor watches for changes on the path /dataqueue/devices. When the first packet from a unit is received, the queue writer will create the queue with the unit id. This triggers the watch event on the monitoring class, which in turn creates the znode for the path with the least loaded node. This path is watched for child node creation, and the node creates a queue consumer to consume messages from the new queue.

Our list of queues can become quite large, and I would prefer not to maintain a list of queues I have assigned and then perform a delta when the event fires to determine which queues are new and caused the watch event. I can't really use sequenced nodes and keep track of my last read position, because I don't want to iterate over the list of queues to determine which sequenced node belongs to the current unit id (it would require full iteration, which really doesn't save me any reads).

Is it possible to create a watch to return the path and Stat of the child node that caused the event to fire?

Thanks,
Todd
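The coalescing Dave describes can be illustrated with a small in-memory model. This is only a sketch of one-shot watch semantics under simplified assumptions (single client, synchronous delivery); `ToyParent` and its methods are hypothetical and are not the ZooKeeper client API.

```python
class ToyParent:
    """Toy model of a znode with one-shot child watches; not the ZooKeeper API."""

    def __init__(self):
        self.children = set()
        self._watchers = []

    def get_children(self, watcher=None):
        # Reading the child list optionally registers a one-shot watch.
        if watcher is not None:
            self._watchers.append(watcher)
        return set(self.children)

    def add_child(self, name):
        self.children.add(name)
        # Fire and clear: each registered watch is delivered at most once.
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback()


notifications = []
parent = ToyParent()

# Initial read plus watch registration; the client remembers what it saw.
known = parent.get_children(watcher=lambda: notifications.append("changed"))

# Three children arrive before the client gets around to re-reading.
for name in ("unit-1", "unit-2", "unit-3"):
    parent.add_child(name)

# Re-read (and re-arm the watch), then diff against the remembered set.
current = parent.get_children(watcher=lambda: notifications.append("changed"))
added = current - known

print(len(notifications), sorted(added))
```

Only the first addition produces a notification, yet the re-read sees all three new children, which is why the notification itself cannot reliably name "the" child that changed and the client ends up diffing against a remembered set.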
Re: getting created child on NodeChildrenChanged event
Hi Dave,

Thanks for the response. I understand your point about missed events during a watch reset period. I may be off, but here is the functionality I was thinking of. I'm not sure if the ZK internal versioning process could possibly support something like this.

1. A watch is placed on children.
2. The event is fired to the client. The client receives the Stat object as part of the event, reflecting the state of the node when the event was created. We'll call this Stat A, with version 1.
3. The client performs processing. Meanwhile the node has several children changed. Versions are incremented to version 2 and version 3.
4. Client resets the watch.
5. A node is added.
6. The event is fired to the client. Client receives Stat B with version 4.
7. Client calls deltaChildren(Stat A, Stat B).
8. ZooKeeper returns the nodes added between the stats, and also the nodes deleted between the stats.

This would handle the missed event problem, since the client would have the 2 states it needs to compare. It also allows clients dealing with large data sets to only deal with the delta over time (like a git replay). Our number of queues could get quite large, and I'm concerned that keeping my previous event's children in a set to perform the delta may become quite memory and processor intensive.

Would a feature like this be possible without overcomplicating the ZooKeeper core?

Thanks,
Todd

On Tue, 2010-08-31 at 09:23 -0400, Dave Wright wrote:

Hi Todd -

The general explanation for why ZooKeeper doesn't pass the event information with the event notification is that an event notification is only triggered once, and thus may indicate multiple events. For example, if you do a GetChildren and set a watch, and then multiple children are added at about the same time, the first one triggers a notification, but the second (or later) ones do not. When you do another GetChildren() request to get the list and reset the watch, you'll see all the changed nodes; however, if you had just been told about the first change in the notification, you would have missed the others.

To do what you are wanting, you would really need persistent watches that send notifications every time a change occurs and don't need to be reset, so you can't miss events. That isn't the design that was chosen for ZooKeeper and I don't think it's likely to be implemented.

-Dave Wright
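Todd's proposed deltaChildren(Stat A, Stat B) is a feature request, not an existing ZooKeeper call. One way to picture what it would need on the server side is a per-parent change log indexed by child version. The sketch below is purely hypothetical: `ChildLog`, `delta_children`, and the idea that every child change bumps the version by exactly one are assumptions made for illustration.

```python
class ChildLog:
    """Hypothetical per-parent log of child changes, indexed by version."""

    def __init__(self):
        # Entry i is the change that moved the parent from version i to i + 1.
        self._log = []

    @property
    def version(self):
        return len(self._log)

    def add(self, name):
        self._log.append(("add", name))
        return self.version

    def remove(self, name):
        self._log.append(("remove", name))
        return self.version

    def delta_children(self, from_version, to_version):
        """Net children added and removed after from_version, up to to_version."""
        added, removed = set(), set()
        for op, name in self._log[from_version:to_version]:
            if op == "add":
                if name in removed:
                    removed.discard(name)   # removed then re-added: cancels out
                else:
                    added.add(name)
            else:
                if name in added:
                    added.discard(name)     # added then removed: cancels out
                else:
                    removed.add(name)
        return added, removed


log = ChildLog()
stat_a = log.add("unit-1")     # version 1: the client's first event (Stat A)
log.add("unit-2")              # version 2: changed while the client was busy
log.remove("unit-1")           # version 3: changed while the client was busy
stat_b = log.add("unit-3")     # version 4: the client's second event (Stat B)

added, removed = log.delta_children(stat_a, stat_b)
print(sorted(added), sorted(removed))
```

The client then keeps only two version numbers instead of the full child set, while the server pays for it by retaining the change log, which is the kind of added core complexity the question asks about.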