Nope, not using ZK; that would not scale down to the cell level. You'll probably have to stare at the code in MultiVersionConsistencyControl for a while (I know I had to).
The basic flow of a write operation is this:

1. lock the row
2. persist the change to the write-ahead log
3. get a "writenumber" from MVCC (this is basically a timestamp)
4. apply the change to the memstore (using that write number)
5. advance the readpoint (the maximum timestamp of changes that reads will see); this is the point where readers see the change
6. unlock the row
7. (when the memstore is full, flush it to a new disk file; this is done asynchronously and is not really important here, although it has some complicated implications when the flush happens while readers are still reading from an old read point)

The above is sometimes relaxed for idempotent operations.

-- Lars

----- Original Message -----
From: Mohit Anchlia <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Cc:
Sent: Thursday, December 1, 2011 3:03 PM
Subject: Re: Atomicity questions

Thanks. I'll try and take a look, but I haven't worked with ZooKeeper before. Does it use ZooKeeper for any of the ACID functionality?

On Thu, Dec 1, 2011 at 2:55 PM, lars hofhansl <[email protected]> wrote:
> Hi Mohit,
>
> the best way to study this is to look at MultiVersionConsistencyControl.java
> (since you are asking how this is handled internally).
>
> In a nutshell, this ensures that read operations don't see writes that are not
> completed, by (1) defining a thread read point that is rolled forward only
> after an operation has completed and (2) assigning a special timestamp (not the
> timestamp that you set from the client API) to all KeyValues.
>
> -- Lars
>
>
> ----- Original Message -----
> From: Mohit Anchlia <[email protected]>
> To: [email protected]
> Cc:
> Sent: Thursday, December 1, 2011 2:22 PM
> Subject: Atomicity questions
>
> I have some questions about ACID after reading this page:
> http://hbase.apache.org/acid-semantics.html
>
> - Atomicity point 5: a row must either be "a=1,b=1,c=1" or
> "a=2,b=2,c=2" and must never be something like "a=1,b=2,c=1".
>
> How is this handled internally in HBase such that the above is possible?
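To make the write path and the read point concrete, here is a minimal Java sketch of the mechanism described in this thread. Everything in it is hypothetical: `SimpleMvcc`, `SimpleRow` and `MvccSketch` are illustrative names, not HBase's `MultiVersionConsistencyControl` API; the WAL is simulated by an in-memory list; and step 7 (flushing) and the idempotent-operation relaxations are left out. It also assumes that the read point only moves past a write once every earlier write has completed, so a reader sees either all of a row update or none of it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo entry point (run with: java MvccSketch.java).
class MvccSketch {
    public static void main(String[] args) {
        SimpleMvcc mvcc = new SimpleMvcc();
        SimpleRow row = new SimpleRow(mvcc);
        row.put(Map.of("a", "1", "b", "1", "c", "1"));
        row.put(Map.of("a", "2", "b", "2", "c", "2"));
        System.out.println(row.get()); // {a=2, b=2, c=2}
    }
}

// Hypothetical, simplified MVCC: hands out write numbers and tracks the read point.
class SimpleMvcc {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed; // set once the writer has applied its change
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final ArrayDeque<WriteEntry> pending = new ArrayDeque<>(); // in-flight writes, oldest first
    private long nextWriteNumber = 1;
    private long readPoint = 0; // highest write number readers may see

    // Step 3: obtain a write number and queue the write as in-flight.
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(nextWriteNumber++);
        pending.addLast(e);
        return e;
    }

    // Step 5: mark the write done, then roll the read point forward past every
    // leading completed write; an older unfinished write holds it back.
    synchronized void complete(WriteEntry e) {
        e.completed = true;
        while (!pending.isEmpty() && pending.peekFirst().completed) {
            readPoint = pending.pollFirst().writeNumber;
        }
    }

    synchronized long readPoint() { return readPoint; }
}

// Hypothetical single row: each column keeps every version tagged with its write number.
class SimpleRow {
    private static final class Cell {
        final long writeNumber;
        final String value;
        Cell(long writeNumber, String value) { this.writeNumber = writeNumber; this.value = value; }
    }

    private final ReentrantLock rowLock = new ReentrantLock();
    private final Map<String, List<Cell>> memstore = new TreeMap<>();
    private final List<String> wal = new ArrayList<>(); // stand-in for the write-ahead log
    private final SimpleMvcc mvcc;

    SimpleRow(SimpleMvcc mvcc) { this.mvcc = mvcc; }

    void put(Map<String, String> values) {
        rowLock.lock();                                  // 1. lock the row
        try {
            wal.add(values.toString());                  // 2. persist to the WAL (simulated)
            SimpleMvcc.WriteEntry e = mvcc.begin();      // 3. get a write number
            synchronized (memstore) {                    // 4. apply to the memstore
                for (Map.Entry<String, String> kv : values.entrySet()) {
                    memstore.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                            .add(new Cell(e.writeNumber, kv.getValue()));
                }
            }
            mvcc.complete(e);                            // 5. advance the read point
        } finally {
            rowLock.unlock();                            // 6. unlock the row
        }
    }

    // Readers see, per column, the newest version at or below the current read point.
    // Versions are appended in write-number order because begin() and the append
    // both run under the row lock.
    Map<String, String> get() {
        long rp = mvcc.readPoint();
        Map<String, String> result = new TreeMap<>();
        synchronized (memstore) {
            for (Map.Entry<String, List<Cell>> col : memstore.entrySet()) {
                String visible = null;
                for (Cell c : col.getValue()) {
                    if (c.writeNumber <= rp) visible = c.value;
                }
                if (visible != null) result.put(col.getKey(), visible);
            }
        }
        return result;
    }
}
```

Because readers filter on the MVCC read point rather than on the client-supplied timestamp, a `get()` that races with a `put()` of "a=2,b=2,c=2" returns either the old row or the new one, never a mix such as "a=1,b=2,c=1".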
