Concurrent reads and writes on BookKeeper

2010-05-18 Thread André Oriani
Sorry, I forgot the subject on my last message :|

 Hi all,
 I was considering BookKeeper to implement a server-replicated
 application with one primary server as the writer and many backup servers
 reading from BookKeeper concurrently. The latest documentation I had
 access to says "This writer has to execute a close ledger operation before
 any other client can read from it." So readers cannot read any entry on
 the ledger, not even the already committed ones, until the writer stops
 writing to the ledger, i.e., closes it. Is my understanding right? Should
 I then use ZooKeeper directly to achieve what I want?


 Thanks for the attention,
 André Oriani








Re: Concurrent reads and writes on BookKeeper

2010-05-20 Thread André Oriani

Well Flavio, it is an extremely simple prototype in which a primary broadcasts
updates to a single integer to the backups. So we are going to have (n-1) reads
for every write in a cluster of size n. I think sequential nodes in ZooKeeper
are fine for now, but I may revisit that decision if things
begin to get more complex.

Tks a lot,
André Oriani

> Hi Andre, To guarantee that two clients that read from a ledger will
> read the same sequence of entries, we need to make sure that there is
> agreement on the end of the sequence. A client is still able to read
> from an open ledger, though. We have an open jira about informing
> clients of the progress of an open ledger (ZOOKEEPER-462), but we
> haven't reached agreement on it yet. Some folks think it is best to
> let each application use whatever mechanism suits it. One option is
> to have the writer periodically write to a ZooKeeper znode to report
> its progress.
>
> I would need to know more details about your application before
> recommending that you stick with BookKeeper or switch to ZooKeeper. If
> your workload is dominated by writes, then BookKeeper might be a
> better option.
>
> -Flavio
>
> On May 19, 2010, at 1:29 AM, André Oriani wrote:
>
>> Sorry, I forgot the subject on my last message :|
>>
>> Hi all,
>> I was considering BookKeeper to implement a server-replicated
>> application with one primary server as the writer and many backup
>> servers reading from BookKeeper concurrently. The latest documentation
>> I had access to says "This writer has to execute a close ledger
>> operation before any other client can read from it." So readers cannot
>> read any entry on the ledger, not even the already committed ones,
>> until the writer stops writing to the ledger, i.e., closes it. Is my
>> understanding right? Should I then use ZooKeeper directly to achieve
>> what I want?
>>
>>
>> Thanks for the attention,
>> André Oriani
>>
>>
>>
>>
>>
>>
>
>

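A minimal sketch of the progress-znode idea Flavio mentions above, assuming a
hypothetical /ledger-progress znode and the standard ZooKeeper Java client (the
path and the update policy are illustrative, not part of the BookKeeper API):

import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Sketch: the writer periodically publishes how far it has written, so
// readers of the still-open ledger know which entries are safe to read.
public class ProgressPublisher {

    private static final String PROGRESS_PATH = "/ledger-progress"; // hypothetical znode

    private final ZooKeeper zk;
    private final LedgerHandle lh;

    public ProgressPublisher(ZooKeeper zk, LedgerHandle lh) throws Exception {
        this.zk = zk;
        this.lh = lh;
        if (zk.exists(PROGRESS_PATH, false) == null) {
            zk.create(PROGRESS_PATH, new byte[0], Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
        }
    }

    // Called by the writer, e.g. every few entries or every few seconds.
    public void publish() throws KeeperException, InterruptedException {
        byte[] progress = Long.toString(lh.getLastAddConfirmed()).getBytes();
        zk.setData(PROGRESS_PATH, progress, -1); // -1 = write regardless of version
    }
}

Backups would read (or watch) this znode to learn how far they may safely read
in the still-open ledger.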



Are Watchers executed sequentially or in parallel?

2010-06-29 Thread André Oriani
Hi,

Are Watchers executed sequentially or in parallel? Suppose I want to
monitor the children of a znode for any modification. I don't want the same
watcher to be re-executed while it is still executing.



1)

public class ChildrenWatcher implements Watcher {

    // zk and path are fields initialized elsewhere
    public void process(WatchedEvent event) {
        try {
            // get children and install watcher
            List<String> children = zk.getChildren(path, this);

            // process children

        } catch (Exception e) {
            // handle KeeperException / InterruptedException
        }
    }
}



2)

public class ChildrenWatcher implements Watcher {

    // zk and path are fields initialized elsewhere
    public void process(WatchedEvent event) {
        try {
            // get children
            List<String> children = zk.getChildren(path, null);

            // process children

            // install watcher
            zk.getChildren(path, null);
        } catch (Exception e) {
            // handle KeeperException / InterruptedException
        }
    }
}



Do both snippets achieve the goal, or only snippet number 2?


Tks,
André


Re: Are Watchers executed sequentially or in parallel?

2010-06-29 Thread André Oriani
Thanks, Ben. It was a copy-and-paste mistake. So that means the process()
method must return as soon as possible.

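One way to keep process() fast is to hand the real work to a separate
single-threaded executor. A sketch of that idea, assuming zk and path are
supplied by the application:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ChildrenWatcher implements Watcher {

    private final ZooKeeper zk;
    private final String path;
    // single worker thread: batches are processed one at a time and in order
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public ChildrenWatcher(ZooKeeper zk, String path) {
        this.zk = zk;
        this.path = path;
    }

    public void process(WatchedEvent event) {
        try {
            // re-arm the watch and fetch the children on the event thread ...
            final List<String> children = zk.getChildren(path, this);
            // ... but do the actual processing elsewhere so process() returns fast
            worker.submit(new Runnable() {
                public void run() {
                    handle(children);
                }
            });
        } catch (Exception e) {
            // handle KeeperException / InterruptedException
        }
    }

    private void handle(List<String> children) {
        // application-specific processing of the child list
    }
}

Because the executor has only one worker thread, the child lists are still
handled sequentially, while the ZooKeeper dispatch thread is never blocked for
long.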
On Tue, Jun 29, 2010 at 11:48, Benjamin Reed  wrote:

watchers are executed sequentially and in order. there is one dispatch
thread that invokes the watch callbacks.

ben

ps - in 2) you do not install a watch.

On 06/29/2010 06:13 AM, André Oriani wrote:

Hi,

Are Watchers executed sequentially or in parallel? Suppose I want to
monitor the children of a znode for any modification. I don't want the same
watcher to be re-executed while it is still executing.

1)

public class ChildrenWatcher implements Watcher {

    // zk and path are fields initialized elsewhere
    public void process(WatchedEvent event) {
        try {
            // get children and install watcher
            List<String> children = zk.getChildren(path, this);

            // process children

        } catch (Exception e) {
            // handle KeeperException / InterruptedException
        }
    }
}

2)

public class ChildrenWatcher implements Watcher {

    // zk and path are fields initialized elsewhere
    public void process(WatchedEvent event) {
        try {
            // get children
            List<String> children = zk.getChildren(path, null);

            // process children

            // install watcher
            zk.getChildren(path, null);
        } catch (Exception e) {
            // handle KeeperException / InterruptedException
        }
    }
}

Do both snippets achieve the goal, or only snippet number 2?

Tks,
André


BookKeeper Doubts

2010-07-17 Thread André Oriani
Hi,


I was not sure whether I had understood the behavior of BookKeeper from the
documentation, so I wrote a little program, reproduced below, to see what
BookKeeper looks like in action. Assuming my code is correct (you never
know when your code has some nasty obvious bug that only someone other than
you can see), I could draw the following conclusions:

1) Only the creator can add entries to a ledger, even though you can open
the ledger, get a handle, and call addEntry on it without any exception
being thrown. In other words, you cannot open a ledger for append.

2) Readers can see only the entries that had been added to a ledger before
someone opened it for reading. If you want to ensure that readers will see
all the entries, you must add all of them before any reader attempts to
read from the ledger.

Could someone please tell me if those conclusions are correct or if I am
mistaken? In the latter case, could you also tell me what is wrong?

Thanks a lot for your attention and patience with this BookKeeper newbie,
André




package br.unicamp.zooexp.booexp;

import java.io.IOException;
import java.util.Enumeration;

import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.zookeeper.KeeperException;

public class BookTest {

    public static void main(String... args) throws IOException,
            InterruptedException, KeeperException, BKException {

        BookKeeper bk = new BookKeeper("127.0.0.1");

        // create a ledger and add a few entries through the creator's handle
        LedgerHandle lh = bk.createLedger(DigestType.CRC32, "123".getBytes());
        long lh_id = lh.getId();
        lh.addEntry("Teste".getBytes());
        lh.addEntry("Test2".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);

        lh.addEntry("Test3".getBytes());

        // open the same ledger for reading while it is still being written to
        LedgerHandle lh1 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);

        // keep writing through the creator's handle
        lh.addEntry("Test4".getBytes());
        lh.addEntry("Test5".getBytes());
        lh.addEntry("Test6".getBytes());
        System.out.printf("Got %d entries for lh\n", lh.getLastAddConfirmed() + 1);

        Enumeration<LedgerEntry> seq = lh.readEntries(0, lh.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        lh.close();

        // try to append through the handle obtained from openLedger
        lh1.addEntry("Test7".getBytes());
        lh1.addEntry("Test8".getBytes());
        System.out.printf("Got %d entries for lh1\n", lh1.getLastAddConfirmed() + 1);

        seq = lh1.readEntries(0, lh1.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }
        lh1.close();

        // open the ledger once more and try to append through the new handle
        LedgerHandle lh2 = bk.openLedger(lh_id, DigestType.CRC32, "123".getBytes());
        lh2.addEntry("Test9".getBytes());
        System.out.printf("Got %d entries for lh2 \n", lh2.getLastAddConfirmed() + 1);

        seq = lh2.readEntries(0, lh2.getLastAddConfirmed());
        while (seq.hasMoreElements()) {
            System.out.println(new String(seq.nextElement().getEntry()));
        }

        bk.halt();
    }
}


Output:

Got 2 entries for lh
Got 3 entries for lh1
Got 6 entries for lh
Teste
Test2
Test3
Test4
Test5
Test6
Got 3 entries for lh1
Teste
Test2
Test3
Got 3 entries for lh2
Teste
Test2
Test3

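For comparison, a minimal sketch of the pattern in conclusion 2, where the
writer adds every entry and closes the ledger before any reader opens it
(same API as the program above, inside a similar main method; error handling
omitted):

BookKeeper bk = new BookKeeper("127.0.0.1");

// writer: add everything, then close
LedgerHandle writer = bk.createLedger(DigestType.CRC32, "123".getBytes());
long id = writer.getId();
writer.addEntry("entry-1".getBytes());
writer.addEntry("entry-2".getBytes());
writer.close(); // readers opened after this point see every entry

// reader: open the closed ledger and read it all
LedgerHandle reader = bk.openLedger(id, DigestType.CRC32, "123".getBytes());
Enumeration<LedgerEntry> entries = reader.readEntries(0, reader.getLastAddConfirmed());
while (entries.hasMoreElements()) {
    System.out.println(new String(entries.nextElement().getEntry()));
}
reader.close();
bk.halt();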

Re: BookKeeper Doubts

2010-07-19 Thread André Oriani
I filed ZOOKEEPER-824 and ZOOKEEPER-825.

tks,
André

On Mon, Jul 19, 2010 at 19:44, Benjamin Reed  wrote:

> you have concluded correctly.
>
> 1) bookkeeper was designed for a process to use as a write-ahead log, so as
> a simplifying assumption we assume a single writer to a log. we should be
> throwing an exception if you try to write to a handle that you obtained
> using openLedger. can you open a jira for that?
>
> 2) this is mostly true; there are some exceptions. the creator of a ledger
> can read entries even though the ledger is still being written to. we would
> like to add the ability for a reader to assert the last entry in a ledger
> and read up to that entry, but this is not yet in the code.
>
> 3) there is one other bug you are seeing: before a ledger can be read, it
> must be closed. as your code shows, a process can open a ledger for reading
> while it is still being written to, which causes an implicit close that is
> not detected by the writer.
>
> this is a nice test case :) thanx
> ben
>
>
> On 07/17/2010 05:02 PM, André Oriani wrote:
>
>> Hi,
>>
>>
>> I was not sure whether I had understood the behavior of BookKeeper from
>> the documentation, so I wrote a little program, reproduced below, to see
>> what BookKeeper looks like in action. Assuming my code is correct (you
>> never know when your code has some nasty obvious bug that only someone
>> other than you can see), I could draw the following conclusions:
>>
>> 1) Only the creator can add entries to a ledger, even though you can
>> open the ledger, get a handle, and call addEntry on it without any
>> exception being thrown. In other words, you cannot open a ledger for
>> append.
>>
>> 2) Readers can see only the entries that had been added to a ledger
>> before someone opened it for reading. If you want to ensure that readers
>> will see all the entries, you must add all of them before any reader
>> attempts to read from the ledger.
>>
>> Could someone please tell me if those conclusions are correct or if I am
>> mistaken? In the latter case, could you also tell me what is wrong?
>>
>> Thanks a lot for your attention and patience with this BookKeeper
>> newbie,
>> André
>>

getChildren() when the number of children is very large

2010-07-20 Thread André Oriani
Hi folks,

I was considering using ZooKeeper to implement a replication protocol because
of its global order guarantee. In my case, operations are logged by creating
persistent sequential znodes. Knowing the name of the last applied znode,
backups can identify pending operations and apply them in order. Because I
want to allow backups to join the system at any time, I will not delete a
znode before a checkpoint. Thus, I can end up with thousands of child nodes,
and consequently ZooKeeper.getChildren() calls might be very expensive since
a huge list of nodes will be returned.

I thought of using another znode to store the name of the last created znode.
So if the last applied znode was op-11 and the last created znode was op-14, I
would try to read op-12 and op-13. However, in order to protect against
partial failures, I have to encode some extra information in the names of the
znodes. Thus it is not possible to predict their names, and consequently I
will have to call getChildren() anyway.

Has anybody faced the same issue? Has anybody found a better solution?
I was thinking of extending the ZooKeeper code to have some kind of indexed
access to child znodes, but I don't know how easy or wise that is.

Thanks,
André
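A minimal sketch of the logging step described above, assuming a hypothetical
/oplog parent znode (the parent path and the payload format are illustrative):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Sketch: the primary logs each operation as a persistent sequential child
// of /oplog; ZooKeeper appends a monotonically increasing counter to the
// name, which gives the backups the global order of operations.
public class OperationLog {

    private final ZooKeeper zk;

    public OperationLog(ZooKeeper zk) {
        this.zk = zk;
    }

    public String append(byte[] operation) throws Exception {
        // returns the actual znode name, e.g. /oplog/op-0000000014
        return zk.create("/oplog/op-", operation,
                Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
    }
}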


Re: getChildren() when the number of children is very large

2010-07-20 Thread André Oriani
Ted, just to clarify: by file you mean znode, right? So you are advising me
to do an atomic append to a znode by first calling getData and then
conditionally setting the data using the version information obtained in
the previous step?

Thanks,
André
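A sketch of that read-modify-write loop, assuming a hypothetical
/snapshot-updates znode whose data holds the growing list of updates; the
conditional setData succeeds only if nobody else wrote the znode since the
getData, otherwise the loop retries:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Sketch of a conditional append: read the znode and its version, append
// locally, and write back only if the version has not changed in between.
public class AtomicAppend {

    private static final String PATH = "/snapshot-updates"; // hypothetical znode

    private final ZooKeeper zk;

    public AtomicAppend(ZooKeeper zk) {
        this.zk = zk;
    }

    public void append(byte[] update) throws KeeperException, InterruptedException {
        while (true) {
            Stat stat = new Stat();
            byte[] current = zk.getData(PATH, false, stat);
            byte[] merged = new byte[current.length + update.length];
            System.arraycopy(current, 0, merged, 0, current.length);
            System.arraycopy(update, 0, merged, current.length, update.length);
            try {
                // fails with BADVERSION if someone else wrote in between
                zk.setData(PATH, merged, stat.getVersion());
                return;
            } catch (KeeperException.BadVersionException e) {
                // lost the race; re-read and try again
            }
        }
    }
}

The version check is what makes the append atomic: a concurrent writer causes
a BadVersionException, and the loop simply re-reads and retries.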

On Tue, Jul 20, 2010 at 23:52, Ted Dunning  wrote:

> Creating a new znode for each update isn't really necessary.  Just create a
> file that will contain all of the updates for the next snapshot and do
> atomic updates to add to the list of updates belonging to that snapshot.
> When you complete the snapshot, you will create a new file.  After a time
> you can delete the old snapshot lists since they are now redundant.  This
> will leave only a few snapshot files in your directory and getChildren will
> be fast.  Getting the contents of the file will give you a list of
> transactions to apply and when you are done with those, you can get the
> file again to get any new ones before considering yourself to be up to
> date.  The snapshot file doesn't need to contain the updates themselves,
> but instead can contain pointers to other znodes that would actually
> contain the updates.
>
> I think that the tendency to use file creation as the basic atomic
> operation is a holdover from days when we used filesystems that way.  With
> ZK, file updates are ordered and atomic, and you know that you updated the
> right version, which makes many uses of directory updates much less
> natural.
>
> On Tue, Jul 20, 2010 at 7:26 PM, André Oriani <
> ra078...@students.ic.unicamp.br> wrote:
>
> > Hi folks,
> >
> > I was considering using ZooKeeper to implement a replication protocol
> > because of its global order guarantee. In my case, operations are logged
> > by creating persistent sequential znodes. Knowing the name of the last
> > applied znode, backups can identify pending operations and apply them in
> > order. Because I want to allow backups to join the system at any time, I
> > will not delete a znode before a checkpoint. Thus, I can end up with
> > thousands of child nodes, and consequently ZooKeeper.getChildren() calls
> > might be very expensive since a huge list of nodes will be returned.
> >
> > I thought of using another znode to store the name of the last created
> > znode. So if the last applied znode was op-11 and the last created znode
> > was op-14, I would try to read op-12 and op-13. However, in order to
> > protect against partial failures, I have to encode some extra information
> > in the names of the znodes. Thus it is not possible to predict their
> > names, and consequently I will have to call getChildren() anyway.
> >
> > Has anybody faced the same issue? Has anybody found a better solution?
> > I was thinking of extending the ZooKeeper code to have some kind of
> > indexed access to child znodes, but I don't know how easy or wise that
> > is.
> >
> > Thanks,
> > André
> >
>