Inline. On Thu, May 5, 2011 at 6:03 AM, Eric Burin des Roziers <[email protected]> wrote: > Hi, > > I am currently looking at adding a transactional consistency aspect to HBase > and had 2 questions: > > 1. My understanding is that when the client performs an operation (put, > delete, incr), it is sent to the region server which delegates it to > different region servers, which in turn puts it in the WAL and the MemStore > in that region. At some point later, the MemStore is flushed to disk (into > the HFiles). The WAL is essentially there as a way to recover the data in > case the machine crashes, hence loosing data stored in its MemCache, but not > yet store on disk. Once the data is available in the MemStore (but not yet > in HFiles), do scans and gets 'see' that data? Is the data duplicated in the > MemStore across 3 region servers? If a region server crashes, can I get into > a situation where a scan can return a partial data set without the client > being aware of it?
Only one region server serves a region at a time, if that region server crashes then the data is available on other Datanodes but it's not available to the client until the WAL is replayed and the region is reopened. So no stale data. > > 2. The Hbase-trx package implements transactions by effectively creating a > WAL per transaction (THLog) and 'flushing' it to the main WAL (HLog) on > commit. But, flushing this THLog will take a time window (however small it > is). If a scan (or get) is performed during that window, could I get into a > situation where I see part of the committed transaction (some rows but not > others since they have not been flushed yet)? Why did the HBase-trx decide > to go with a THLog, instead of leveraging the KeyValue versioning? I think you are confusing multi-row transactions and single row transactions. In pure HBase, every single row transaction is ACID. You can learn more here http://hbase.apache.org/acid-semantics.html The trx package does multi-row transactions. > > I am thinking of implementing a transaction isolation/consistency mechanism > by storing a unique transaction id as the version when doing a put (instead > of the current millis) and passing invalid transaction ids to scans/get > letting them know to fetch a previous version (with a valid transaction id) > for cells that have been updated by a non-committed transaction. Are there > any reasons for not going with this approach? > So just to be sure, were my previous answers good enough to answer your question, or are you trying to implement something like the HBase-trx? J-D
