Re: getChildren() when the number of children is very large
On Tue, Jul 20, 2010 at 8:47 PM, André Oriani wrote:

> Ted, just to clarify. By file you mean znode, right?

Yes.

> So you are advising me to try an atomic append to znodes by first calling
> getData and then trying to conditionally set the data, using the version
> information obtained in the previous step?

Exactly.
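The read-then-conditionally-set pattern Ted confirms here is optimistic concurrency control on a single znode's version. Below is a minimal in-memory sketch of those semantics in Python — `FakeZnode`, `BadVersionError`, and `atomic_append` are made-up stand-ins for illustration, not the real ZooKeeper client API (the real calls would be `getData()` followed by `setData()` with the expected version, retrying on a bad-version error):

```python
class BadVersionError(Exception):
    """Raised when the expected version no longer matches the znode's version
    (analogous to ZooKeeper's BadVersionException)."""

class FakeZnode:
    """In-memory stand-in for a znode: data plus a version bumped on every write."""
    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def get_data(self):
        # Returns (data, version), like getData() returning the Stat's version.
        return self.data, self.version

    def set_data(self, data, expected_version):
        # Conditional write: fails unless the caller saw the latest version.
        if expected_version != self.version:
            raise BadVersionError()
        self.data = data
        self.version += 1

def atomic_append(znode, entry, max_retries=10):
    """Append `entry` by re-reading and retrying on version conflicts."""
    for _ in range(max_retries):
        data, version = znode.get_data()
        try:
            znode.set_data(data + entry, version)
            return True
        except BadVersionError:
            continue  # someone else wrote first; re-read and retry
    return False

znode = FakeZnode()
atomic_append(znode, b"update-1;")
atomic_append(znode, b"update-2;")
print(znode.data)  # b'update-1;update-2;'
```

If two writers race, the loser's conditional write fails the version check and its loop re-reads the newer data before retrying, so no append is lost.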
Re: getChildren() when the number of children is very large
Ted, just to clarify. By file you mean znode, right? So you are advising me to try an atomic append to znodes by first calling getData and then trying to conditionally set the data, using the version information obtained in the previous step?

Thanks,
André
Re: getChildren() when the number of children is very large
Creating a new znode for each update isn't really necessary. Just create a file that will contain all of the updates for the next snapshot, and do atomic updates to add to the list of updates belonging to that snapshot. When you complete the snapshot, you will create a new file. After a time you can delete the old snapshot lists, since they are now redundant. This will leave only a few snapshot files in your directory, and getChildren will be fast.

Getting the contents of the file will give you a list of transactions to apply, and when you are done with those, you can get the file again to pick up any new ones before considering yourself up to date. The snapshot file doesn't need to contain the updates themselves; instead, it can contain pointers to other znodes that actually contain the updates.

I think that the tendency to use file creation as the basic atomic operation is a holdover from the days when we used filesystems that way. With ZK, file updates are ordered and atomic, and you know that you updated the right version, which makes many uses of directory updates much less natural.
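The "get the file again to pick up any new ones" step amounts to a catch-up loop: read the snapshot's update list, apply the unseen entries, and repeat until a read shows nothing new. A hedged sketch of that loop, with the snapshot znode's contents modeled as a plain Python list (`read_updates` stands in for fetching and parsing the znode's data; names are illustrative, not ZooKeeper API):

```python
def catch_up(read_updates, apply_update):
    """Repeatedly fetch the snapshot's update list and apply anything new,
    until a read returns no unseen entries; then we are up to date."""
    applied = 0
    while True:
        updates = read_updates()          # e.g. parse getData() on the snapshot znode
        if applied >= len(updates):
            return applied                # nothing new since last read: up to date
        for u in updates[applied:]:
            apply_update(u)
        applied = len(updates)

# Simulated snapshot list that grows while we are catching up.
snapshot = ["op-1", "op-2"]
applied_ops = []

def read_updates():
    # Simulate a writer appending one more update during the first read.
    if len(snapshot) == 2:
        snapshot.append("op-3")
    return list(snapshot)

count = catch_up(read_updates, applied_ops.append)
print(applied_ops)  # ['op-1', 'op-2', 'op-3']
```

The loop terminates as soon as one full read cycle observes no new entries, which is exactly the "up to date" condition Ted describes.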
getChildren() when the number of children is very large
Hi folks,

I was considering using ZooKeeper to implement a replication protocol, due to its global ordering guarantee. In my case, operations are logged by creating persistent sequential znodes. Knowing the name of the last applied znode, backups can identify pending operations and apply them in order. Because I want to allow backups to join the system at any time, I will not delete a znode before a checkpoint. Thus, I can end up with thousands of child nodes, and consequently ZooKeeper.getChildren() calls might be very expensive, since a huge list of nodes will be returned.

I thought of using another znode to store the name of the last created znode. So if the last applied znode was op-11 and the last created znode was op-14, I would try to read op-12 and op-13. However, in order to protect against partial failure, I have to encode some extra information in the name of each znode. Thus it is not possible to predict their names (they'll be of the form op-<string>-<sequence>). Consequently, I will have to call getChildren() anyway.

Has somebody faced the same issue? Has anybody found a better solution? I was thinking of extending the ZooKeeper code to have some kind of indexed access to child znodes, but I don't know how easy or wise that would be.

Thanks,
André
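One detail worth noting about the scheme above: ZooKeeper appends a 10-digit, zero-padded, monotonically increasing counter to a sequential znode's name, so creation order can be recovered from the numeric suffix even when the rest of the name is unpredictable. A backup can therefore find its pending operations by sorting the result of getChildren() on that suffix and skipping everything up to the last applied name. A small illustrative sketch (the op-<string> names are hypothetical):

```python
def pending_ops(children, last_applied):
    """Return child names created after `last_applied`, in creation order.
    Relies on the zero-padded sequence suffix that ZooKeeper appends to
    sequential znodes, so numeric order on the suffix is creation order."""
    def seq(name):
        return int(name.rsplit("-", 1)[1])   # trailing 10-digit counter
    ordered = sorted(children, key=seq)
    if last_applied is None:
        return ordered
    last = seq(last_applied)
    return [c for c in ordered if seq(c) > last]

children = ["op-abc-0000000012", "op-xyz-0000000011", "op-def-0000000014"]
print(pending_ops(children, "op-xyz-0000000011"))
# ['op-abc-0000000012', 'op-def-0000000014']
```

This still requires fetching the full child list, which is exactly the cost the thread is trying to avoid — hence Ted's suggestion to keep the update list inside a single znode's data instead of in the directory.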