Re: getChildren() when the number of children is very large

2010-07-21 Thread Ted Dunning
On Tue, Jul 20, 2010 at 8:47 PM, André Oriani wrote:

> Ted, just to clarify: by "file" you mean znode, right?


Yes.


> So you are advising me to try an atomic append to the znode by first
> calling getData and then trying to conditionally set the data using the
> version information obtained in the previous step?

Exactly.
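
The read-then-conditional-write pattern confirmed above can be sketched as follows. This is a minimal illustration in Python against a hypothetical in-memory stand-in: `FakeZnode`, `BadVersionError`, and `atomic_append` are invented names, not ZooKeeper APIs. With the real Java client, the corresponding calls would be `getData` (which fills in a `Stat` carrying the version) and `setData(path, data, version)`, which fails with a bad-version error if the znode changed in between.

```python
import threading


class BadVersionError(Exception):
    """Hypothetical error: the caller's expected version is stale."""


class FakeZnode:
    """In-memory stand-in for one znode. NOT the ZooKeeper client."""

    def __init__(self, data=b""):
        self._lock = threading.Lock()
        self._data = data
        self._version = 0

    def get_data(self):
        # Return the data together with the version it was read at.
        with self._lock:
            return self._data, self._version

    def set_data(self, data, expected_version):
        # Conditional write: succeeds only if nobody wrote since the read.
        with self._lock:
            if expected_version != self._version:
                raise BadVersionError(expected_version)
            self._data = data
            self._version += 1
            return self._version


def atomic_append(znode, entry, max_retries=100):
    """Append one entry, retrying whenever a concurrent writer got in first."""
    for _ in range(max_retries):
        data, version = znode.get_data()
        try:
            return znode.set_data(data + entry + b"\n", version)
        except BadVersionError:
            continue  # someone else updated the znode; re-read and retry
    raise RuntimeError("gave up after too many version conflicts")
```

A writer that loses the race never overwrites the winner's entry; it simply re-reads and appends to the newer contents. The retry limit is an arbitrary choice for the sketch.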


Re: getChildren() when the number of children is very large

2010-07-20 Thread André Oriani
Ted, just to clarify: by "file" you mean znode, right? So you are advising me
to try an atomic append to the znode by first calling getData and then trying
to conditionally set the data using the version information obtained in
the previous step?

Thanks,
André


Re: getChildren() when the number of children is very large

2010-07-20 Thread Ted Dunning
Creating a new znode for each update isn't really necessary. Just create a
file that will contain all of the updates for the next snapshot and do
atomic updates to add to the list of updates belonging to that snapshot.
When you complete the snapshot, you create a new file. After a time,
you can delete the old snapshot lists since they are now redundant. This
will leave only a few snapshot files in your directory, and getChildren will
be fast. Getting the contents of the file gives you a list of
transactions to apply; when you are done with those, you can get the file
again to pick up any new ones before considering yourself up to date. The
snapshot file doesn't need to contain the updates themselves; it
can instead contain pointers to other znodes that actually hold the
updates.
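
The catch-up procedure described above (read the list, apply it, read again until nothing new appears) might look roughly like the sketch below. It assumes the update list is append-only, so a backup only needs to remember how many entries it has applied. `catch_up`, `read_update_list`, and `apply_update` are hypothetical names; in a real deployment the read would be a getData call on the snapshot file.

```python
def catch_up(read_update_list, apply_update, applied_count=0):
    """Apply pending updates until a fresh read shows nothing new.

    read_update_list: callable returning the current list of updates
                      (or pointers to updates) for the snapshot.
    apply_update:     callable that applies a single update.
    applied_count:    number of entries this backup has already applied.
    Returns the total number of entries applied so far.
    """
    while True:
        updates = read_update_list()
        # Assumes an append-only list: everything past the prefix is new.
        pending = updates[applied_count:]
        if not pending:
            return applied_count  # a fresh read had nothing new
        for update in pending:
            apply_update(update)
        applied_count += len(pending)
```

The loop ends only after a read that returns no unseen entries, which is the "up to date" condition in the paragraph above.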

I think that the tendency to use file creation as the basic atomic operation
is a holdover from the days when we used filesystems that way. With ZK, file
updates are ordered and atomic, and you know that you updated the right
version, which makes many uses of directory updates much less natural.


getChildren() when the number of children is very large

2010-07-20 Thread André Oriani
Hi folks,

I was considering using ZooKeeper to implement a replication protocol because
of its global ordering guarantee. In my case, operations are logged by creating
persistent sequential znodes. Knowing the name of the last applied znode,
backups can identify pending operations and apply them in order. Because I
want to allow backups to join the system at any time, I will not delete a
znode before a checkpoint. Thus, I can end up with thousands of child
nodes, and consequently ZooKeeper.getChildren() calls might be very expensive,
since a huge list of nodes will be returned.
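
To make the cost concrete, here is a rough sketch of how a backup could pick out pending operations from a child listing. It relies on the fact that ZooKeeper appends a 10-digit, zero-padded counter to the names of sequential znodes, and that getChildren makes no promise about ordering. The function name and the `op-` example names are hypothetical.

```python
def pending_operations(children, last_applied):
    """Return child names newer than last_applied, in sequence order.

    children:     names as returned by a child listing (unordered).
    last_applied: name of the last operation this backup applied.
    """
    def seq(name):
        # Sequential znodes end in a 10-digit zero-padded counter.
        return int(name[-10:])

    threshold = seq(last_applied)
    newer = [name for name in children if seq(name) > threshold]
    return sorted(newer, key=seq)
```

Note that the whole listing must still be fetched and scanned before any filtering happens, which is exactly the expense in question when there are thousands of children.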

I thought of using another znode to store the name of the last created znode.
So if the last applied znode was op-11 and the last created znode was op-14,
I would try to read op-12 and op-13. However, in order to protect against
partial failure, I have to encode some extra information (I am using
-) in the name of znodes. Thus it is
not possible to predict their names (they'll be op--). Consequently, I will have to call
getChildren() anyway.

Has anybody faced the same issue or found a better solution?
I was thinking of extending the ZooKeeper code to provide some kind of indexed
access to child znodes, but I don't know how easy or sensible that would be.

Thanks,
André