Concurrent reads and writes on BookKeeper
Sorry, I forgot the subject on my last message :|

Hi all,
I was considering BookKeeper to implement a replicated server application having one primary server as writer and many backup servers reading from BookKeeper concurrently. The last documentation I had access to says "This writer has to execute a close ledger operation before any other client can read from it." So readers cannot read any entry on the ledger, even the already committed ones, until the writer stops writing to the ledger, i.e., closes it. Is my understanding right? Should I then use ZooKeeper directly to achieve what I want?

Thanks for the attention,
André Oriani
Re: Concurrent reads and writes on BookKeeper
Well Flavio, it is an extremely simple prototype where a primary broadcasts updates on a single integer to backups. So we are going to have (n-1) reads for every write in a cluster of size n. I think sequential nodes in ZooKeeper are fine for now, but I may revisit that if things begin to get more complex.

Tks a lot,
André Oriani

> Hi Andre,
>
> To guarantee that two clients that read from a ledger will read the same
> sequence of entries, we need to make sure that there is agreement on the
> end of the sequence. A client is still able to read from an open ledger,
> though. We have an open jira about informing clients of the progress of
> an open ledger (ZOOKEEPER-462), but we haven't reached agreement on it
> yet. Some folks think that it is best that each application use the
> mechanism it finds best. One option is to have the writer write
> periodically to a ZooKeeper znode to inform of its progress.
>
> I would need to know more detail of your application before recommending
> you to stick with BookKeeper or switch to ZooKeeper. If your workload is
> dominated by writes, then BookKeeper might be a better option.
>
> -Flavio
>
> On May 19, 2010, at 1:29 AM, André Oriani wrote:
>
>> Sorry, I forgot the subject on my last message :|
>>
>> Hi all,
>> I was considering BookKeeper to implement a replicated server
>> application having one primary server as writer and many backup servers
>> reading from BookKeeper concurrently. The last documentation I had
>> access to says "This writer has to execute a close ledger operation
>> before any other client can read from it." So readers cannot read any
>> entry on the ledger, even the already committed ones, until the writer
>> stops writing to the ledger, i.e., closes it. Is my understanding
>> right? Should I then use ZooKeeper directly to achieve what I want?
>>
>> Thanks for the attention,
>> André Oriani
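Flavio's suggestion above (the writer periodically advertising its progress, e.g. in a ZooKeeper znode, so readers of a still-open ledger know the agreed-upon end of the sequence) can be sketched as plain bookkeeping logic. This is only an illustration of the idea under that assumption; the class and method names are invented here and are not part of the BookKeeper API.

```java
// Sketch only: models the "writer advertises its progress" idea from the
// thread. The writer periodically publishes the id of its last confirmed
// entry (for instance by writing it to a znode); a reader treats every
// entry up to that id as safe to read even while the ledger is open.
public class ProgressTracker {
    private long lastAdvertised = -1; // -1 = writer has confirmed nothing yet
    private long nextToRead = 0;      // the reader's cursor

    /** Called when the reader sees a new value in the progress znode. */
    public void writerAdvertised(long lastConfirmedEntryId) {
        if (lastConfirmedEntryId > lastAdvertised) {
            lastAdvertised = lastConfirmedEntryId;
        }
    }

    /** True if the reader may safely fetch its next entry. */
    public boolean canReadNext() {
        return nextToRead <= lastAdvertised;
    }

    /** Returns the id of the entry the reader should fetch next, and advances. */
    public long takeNext() {
        if (!canReadNext()) {
            throw new IllegalStateException("entry not yet confirmed by writer");
        }
        return nextToRead++;
    }
}
```

The reader never runs ahead of what the writer has explicitly confirmed, which is exactly the agreement-on-the-end-of-the-sequence property the thread discusses.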
Are Watchers executed sequentially or in parallel?
Hi,

Are Watchers executed sequentially or in parallel? Suppose I want to monitor the children of a znode for any modification. I don't want the same watcher to be re-executed while it is still executing.

1)

public class ChildrenWatcher implements Watcher {
    public void process(WatchedEvent event) {
        // get children and install watcher
        List<String> children = zk.getChildren(path, this);
        // process children
    }
}

2)

public class ChildrenWatcher implements Watcher {
    public void process(WatchedEvent event) {
        // get children
        List<String> children = zk.getChildren(path, null);
        // process children
        // install watcher
        zk.getChildren(path, null);
    }
}

Do both snippets achieve the goal, or just snippet number 2?

Tks,
André
Re: Are Watchers executed sequentially or in parallel?
Thanks Ben. It was a copy and paste mistake. So that means method process() must return as soon as possible.

On Tue, Jun 29, 2010 at 11:48, Benjamin Reed wrote:

> watchers are executed sequentially and in order. there is one dispatch
> thread that invokes the watch callbacks.
>
> ben
>
> ps - in 2) you do not install a watch.
>
> On 06/29/2010 06:13 AM, André Oriani wrote:
>
>> Hi,
>>
>> Are Watchers executed sequentially or in parallel? Suppose I want to
>> monitor the children of a znode for any modification. I don't want the
>> same watcher to be re-executed while it is still executing. [snippets
>> quoted above] Do both snippets achieve the goal, or just snippet
>> number 2?
>>
>> Tks,
>> André
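Since all watch callbacks are invoked from a single dispatch thread, a process() that blocks stalls every other watcher in the client. A common pattern, sketched below, is to hand the real work off to an executor and return immediately. The event is represented by a plain String stand-in (rather than ZooKeeper's WatchedEvent) so the sketch is self-contained; the offload pattern, not the types, is the point.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: keep the watch callback cheap by queueing work to a
// single-threaded executor. Using one worker thread also preserves the
// sequential, in-order handling that the dispatch thread guarantees.
public class OffloadingWatcher {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger handled = new AtomicInteger();

    /** Called from ZooKeeper's dispatch thread; returns immediately. */
    public void process(String event) {
        worker.submit(() -> handle(event));
    }

    /** Runs on the worker thread; this is where you would re-install the
     *  watch and process the children list. */
    private void handle(String event) {
        handled.incrementAndGet(); // stand-in for the real work
    }

    public int handledCount() {
        return handled.get();
    }

    public void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```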
BookKeeper Doubts
Hi,

I was not sure I had understood the behavior of BookKeeper from the documentation, so I made a little program, reproduced below, to see what BookKeeper looks like in action. Assuming my code is correct (you never know when your code has some nasty obvious bug that only someone other than you can see), I could draw the following conclusions:

1) Only the creator can add entries to a ledger, even though you can open the ledger, get a handle, and call addEntry on it. No exception is thrown. In other words, you cannot open a ledger for append.

2) Readers are able to see only the entries that were added to a ledger before someone opened it for reading. If you want to ensure readers will see all the entries, you must add all entries before any reader attempts to read from the ledger.

Could someone please tell me if those conclusions are correct or if I am mistaken? In the latter case, could that person also tell me what is wrong?

Thanks a lot for the attention and the patience with this BookKeeper newbie,
André

package br.unicamp.zooexp.booexp;

import java.io.IOException;
import java.util.Enumeration;

import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.zookeeper.KeeperException;

public class BookTest {

    public static void main(String... args) throws IOException,
            InterruptedException, KeeperException, BKException {
        BookKeeper bk = new BookKeeper("127.0.0.1");
        LedgerHandle lh = bk.createLedger(DigestType.CRC32, "123".getBytes());
        long lh_id = lh.getId();
        lh.addEntry("Teste".getBytes());
        lh.addEntry("Test2".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);

        lh.addEntry("Test3".getBytes());
        LedgerHandle lh1 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);
        lh.addEntry("Test4".getBytes());
        lh.addEntry("Test5".getBytes());
        lh.addEntry("Test6".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);
        Enumeration<LedgerEntry> seq = lh.readEntries(0, lh.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        lh.close();

        lh1.addEntry("Test7".getBytes());
        lh1.addEntry("Test8".getBytes());
        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);
        seq = lh1.readEntries(0, lh1.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        lh1.close();

        LedgerHandle lh2 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        lh2.addEntry("Test9".getBytes());
        System.out.printf("Got %d entries for lh2\n", lh2.getLastAddConfirmed() + 1);
        seq = lh2.readEntries(0, lh2.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        bk.halt();
    }
}

Output:

Got 2 entries for lh
Got 3 entries for lh1
Got 6 entries for lh
Teste
Test2
Test3
Test4
Test5
Test6
Got 3 entries for lh1
Teste
Test2
Test3
Got 3 entries for lh2
Teste
Test2
Test3
Re: BookKeeper Doubts
I filed ZOOKEEPER-824 and ZOOKEEPER-825.

tks,
André

On Mon, Jul 19, 2010 at 19:44, Benjamin Reed wrote:

> you have concluded correctly.
>
> 1) bookkeeper was designed for a process to use as a write-ahead log, so
> as a simplifying assumption we assume a single writer to a log. we
> should be throwing an exception if you try to write to a handle that you
> obtained using openLedger. can you open a jira for that?
>
> 2) this is mostly true, there are some exceptions. the creator of a
> ledger can read entries even though the ledger is still being written
> to. we would like to add the ability for a reader to assert the last
> entry in a ledger and read up to that entry, but this is not yet in the
> code.
>
> 3) there is one other bug you are seeing: before a ledger can be read,
> it must be closed. as your code shows, a process can open a ledger for
> reading while it is still being written to, which causes an implicit
> close that is not detected by the writer.
>
> this is a nice test case :) thanx
> ben
>
> On 07/17/2010 05:02 PM, André Oriani wrote:
>
>> Hi,
>>
>> I was not sure I had understood the behavior of BookKeeper from the
>> documentation, so I made a little program, reproduced below, to see
>> what BookKeeper looks like in action. Assuming my code is correct, I
>> could draw the following conclusions:
>>
>> 1) Only the creator can add entries to a ledger, even though you can
>> open the ledger, get a handle, and call addEntry on it. No exception is
>> thrown. In other words, you cannot open a ledger for append.
>>
>> 2) Readers are able to see only the entries that were added to a ledger
>> before someone opened it for reading. If you want to ensure readers
>> will see all the entries, you must add all entries before any reader
>> attempts to read from the ledger.
>>
>> Could someone please tell me if those conclusions are correct or if I
>> am mistaken? In the latter case, could that person also tell me what is
>> wrong?
>>
>> Thanks a lot for the attention and the patience with this BookKeeper
>> newbie,
>> André
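Ben's point (3), which matches the test program's output, is that opening a ledger for reading effectively freezes what that reader can see at the moment of opening, while the unaware writer keeps appending. A toy in-memory model of that behavior is sketched below; ToyLedger and ReaderHandle are invented names for illustration and are not the BookKeeper API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a reader that "opens" the ledger captures a snapshot ending
// at the last entry present at open time; the writer keeps appending and
// never learns that a reader came along (the "implicit close" the thread
// describes, as seen from the reader's side).
public class ToyLedger {
    private final List<String> entries = new ArrayList<>();

    public void addEntry(String data) {
        entries.add(data);
    }

    /** Opening for read captures the current end of the ledger. */
    public ReaderHandle openForRead() {
        return new ReaderHandle(entries.size());
    }

    public class ReaderHandle {
        private final int snapshotSize; // number of entries visible to this reader

        ReaderHandle(int snapshotSize) {
            this.snapshotSize = snapshotSize;
        }

        public List<String> readAll() {
            return new ArrayList<>(entries.subList(0, snapshotSize));
        }
    }
}
```

This reproduces the shape of the test program's output: a reader that opened after three entries keeps seeing exactly three, no matter how many the writer adds afterwards.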
getChildren() when the number of children is very large
Hi folks,

I was considering using ZooKeeper to implement a replication protocol due to its global order guarantee. In my case, operations are logged by creating persistent sequential znodes. Knowing the name of the last applied znode, backups can identify pending operations and apply them in order. Because I want to allow backups to join the system at any time, I will not delete a znode before a checkpoint. Thus, I can end up with thousands of child nodes, and consequently ZooKeeper.getChildren() calls might be very costly, since a huge list of node names will be returned.

I thought of using another znode to store the name of the last created znode. So if the last applied znode was op-11 and the last created znode was op-14, I would try to read op-12 and op-13. However, in order to protect against partial failures, I have to encode some extra information in the names of the znodes, so it is not possible to predict their full names. Consequently, I will have to call getChildren() anyway.

Has somebody faced the same issue? Has anybody found a better solution? I was thinking of extending ZooKeeper's code to have some kind of indexed access to child znodes, but I don't know how easy/clever that is.

Thanks,
André
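The scheme described above (persistent sequential znodes plus a remembered last-applied sequence number) boils down to list-filtering logic once getChildren() has returned. A sketch, assuming ZooKeeper's standard sequential-znode convention of a zero-padded 10-digit counter appended to the name (any extra information encoded before the counter is ignored here):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: given the (unordered) child list returned by getChildren() and
// the sequence number of the last operation a backup applied, compute the
// pending operations in apply order.
public class PendingOps {

    /** ZooKeeper appends a 10-digit sequence number to sequential znodes. */
    static long sequenceOf(String znodeName) {
        return Long.parseLong(znodeName.substring(znodeName.length() - 10));
    }

    /** Children come back unordered; return names with seq > lastApplied, sorted. */
    public static List<String> pendingAfter(List<String> children, long lastApplied) {
        List<String> pending = new ArrayList<>();
        for (String child : children) {
            if (sequenceOf(child) > lastApplied) {
                pending.add(child);
            }
        }
        pending.sort((a, b) -> Long.compare(sequenceOf(a), sequenceOf(b)));
        return pending;
    }
}
```

Note this does not address the cost concern in the message (the whole child list is still transferred); it only shows the ordering step a backup would run afterwards.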
Re: getChildren() when the number of children is very large
Ted, just to clarify: by file you mean znode, right? So you are advising me to do an atomic append to a znode by first calling getData and then conditionally setting the data, using the version information obtained in the previous step?

Thanks,
André

On Tue, Jul 20, 2010 at 23:52, Ted Dunning wrote:

> Creating a new znode for each update isn't really necessary. Just create
> a file that will contain all of the updates for the next snapshot and do
> atomic updates to add to the list of updates belonging to that snapshot.
> When you complete the snapshot, you will create a new file. After a time
> you can delete the old snapshot lists since they are now redundant. This
> will leave only a few snapshot files in your directory and getChildren
> will be fast. Getting the contents of the file will give you a list of
> transactions to apply, and when you are done with those, you can get the
> file again to get any new ones before considering yourself to be up to
> date. The snapshot file doesn't need to contain the updates themselves,
> but instead can contain pointers to other znodes that would actually
> contain the updates.
>
> I think that the tendency to use file creation as the basic atomic
> operation is a holdover from days when we used filesystems that way.
> With ZK, file updates are ordered, atomic, and you know that you updated
> the right version, which makes many uses of directory updates much less
> natural.
>
> On Tue, Jul 20, 2010 at 7:26 PM, André Oriani <
> ra078...@students.ic.unicamp.br> wrote:
>
>> Hi folks,
>>
>> I was considering using ZooKeeper to implement a replication protocol
>> due to its global order guarantee. In my case, operations are logged by
>> creating persistent sequential znodes. Knowing the name of the last
>> applied znode, backups can identify pending operations and apply them
>> in order. Because I want to allow backups to join the system at any
>> time, I will not delete a znode before a checkpoint. Thus, I can end up
>> with thousands of child nodes, and consequently ZooKeeper.getChildren()
>> calls might be very costly, since a huge list of node names will be
>> returned.
>>
>> I thought of using another znode to store the name of the last created
>> znode. So if the last applied znode was op-11 and the last created
>> znode was op-14, I would try to read op-12 and op-13. However, in order
>> to protect against partial failures, I have to encode some extra
>> information in the names of the znodes, so it is not possible to
>> predict their full names. Consequently, I will have to call
>> getChildren() anyway.
>>
>> Has somebody faced the same issue? Has anybody found a better solution?
>> I was thinking of extending ZooKeeper's code to have some kind of
>> indexed access to child znodes, but I don't know how easy/clever that
>> is.
>>
>> Thanks,
>> André
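The conditional-update pattern discussed in this exchange (read the znode's data and version, write new data only if the version is unchanged, retry on conflict) can be sketched against a tiny in-memory stand-in for a versioned znode. VersionedNode and AtomicAppend are invented names for the sketch; the version-checked setData mirrors ZooKeeper's conditional setData(path, data, expectedVersion), but this code is runnable without a server.

```java
// Sketch: optimistic "atomic append" using a version check, modeled on
// ZooKeeper's conditional setData.
public class AtomicAppend {

    /** Minimal stand-in for a znode: data plus a version bumped on every write. */
    public static class VersionedNode {
        private String data = "";
        private int version = 0;

        public synchronized String getData() { return data; }
        public synchronized int getVersion() { return version; }

        /** Succeeds only if expectedVersion matches, like conditional setData. */
        public synchronized boolean setData(String newData, int expectedVersion) {
            if (expectedVersion != version) {
                return false; // somebody else wrote in between; caller must retry
            }
            data = newData;
            version++;
            return true;
        }
    }

    /** Read-modify-write loop: retry until our conditional update wins. */
    public static void append(VersionedNode node, String update) {
        while (true) {
            int v = node.getVersion();
            String current = node.getData();
            if (node.setData(current + update + "\n", v)) {
                return;
            }
        }
    }
}
```

Two writers racing on the same node can both loop here, but each individual append either lands on exactly the data it read or is retried, so no update is lost, which is the property that makes the single-snapshot-znode design workable.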