Thanks Stack,  I hadn't read the percolator paper (doing it now).  I think I am 
not describing my question properly.  Basically, based on the hbase-trx 
implementation, when the transaction commits, there is a time window where a 
Get() might read partial rows, since hbase-trx implements snapshot isolation by 
writing records to a different location (than the actual HTable) before the 
commit().  In the percolator paper, cell versions are used to provide snapshot 
isolation, and a Get() reads as of a given timestamp.
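
To make the as-of read concrete, here is a rough sketch in plain Python.  It is 
a toy in-memory model, not HBase client code and not the percolator protocol 
itself (which, as far as I understand it, also needs locks and commit records). 
The names VersionedStore, put and get_as_of are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedStore:
    """Toy multi-version cell store: every write is kept under its timestamp."""

    def __init__(self):
        # (row, column) -> sorted timestamps, plus a parallel list of values
        self._ts = defaultdict(list)
        self._vals = defaultdict(list)

    def put(self, row, column, value, ts):
        key = (row, column)
        i = bisect_right(self._ts[key], ts)  # keep versions ordered by timestamp
        self._ts[key].insert(i, ts)
        self._vals[key].insert(i, value)

    def get_as_of(self, row, column, as_of_ts):
        """Return the newest value written at or before as_of_ts, else None."""
        key = (row, column)
        i = bisect_right(self._ts.get(key, []), as_of_ts)
        return self._vals[key][i - 1] if i > 0 else None


store = VersionedStore()
store.put("row1", "cf:amount", 100, ts=10)
store.put("row2", "cf:amount", 200, ts=10)

# A later transaction (timestamp 20) has only written row1 so far.
store.put("row1", "cf:amount", 150, ts=20)

# A reader pinned to timestamp 15 ignores the half-written transaction.
snapshot = [store.get_as_of(r, "cf:amount", 15) for r in ("row1", "row2")]
print(snapshot)
```

The point of the sketch is only that a reader pinned to an older timestamp 
never sees the newer, partially written versions; choosing that timestamp 
safely is the hard part and is not shown here.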

Another unrelated question: when a region server fails, does the client (while 
doing a get/scan) get notified (exception)?  Basically, I want to ensure that 
an operation (such as a rollup/aggregate) does not compute the wrong amounts 
due to missing data.

Thanks again for your help,
-Eric


________________________________
From: Stack <[email protected]>
To: [email protected]; Eric Burin des Roziers <[email protected]>
Sent: Thursday, May 5, 2011 10:05 PM
Subject: Re: put to WAL and scan/get operation concurrency

On Thu, May 5, 2011 at 11:43 AM, Eric Burin des Roziers
<[email protected]> wrote:
> Hi Jean-Daniel,
>
> Yes, I need to have a multi-row transactional aware HBase for the types of 
> processing I need to do.  I need to avoid having partial rows available and I 
> am in the process of selecting a way to implement such a transaction 
> isolation.  I currently have 2 choices: (1) use the HBase-trx or (2) 
> implement my own leveraging the versioning that HBase provides.  In light of 
> this I wanted to understand the inner workings of HBase a little more.


You have read the megastore and percolator papers?  They discuss x-row
transactions.


> For example, I want to understand if scans read data from the MemStore even 
> if it has not yet been flushed to the HFiles.

It does.

> HBase replicates the data 3 times (depending on your configs).  Does it do 
> that as well for the MemStore?

The data in memstore is first put in the WAL which is replicated three times.



> Say the client wants to insert 10 lines which happen to fall across 2 
> regions.  If region 2 fails, then another client will still be able to read 
> the rows inserted in region 1, but not region 2.  Since HBase replicates data 
> to other servers, region 2 lines could be available on other servers, right?
>

Would suggest you read the bigtable paper.  It'll answer most of your
questions more eloquently than I can (To answer your question, only
one region serves a specific piece of data.  It depends on your
transaction implementation as to whether the half written data is
readable by the client).


> The second aspect that I would like to understand is the implementation of 
> the HBase-trx.  It seems that I can still have a failure point when the 
> transactional WAL (THLog) flushes the data to the main WAL.  Using the above 
> example, I can get into a situation where I will only be able to read a 
> subset of the 10 lines initially inserted.  Is that right?
>

I think, pardon me if I'm reading this wrong, you have begun on a
wrong foot so your question doesn't add up right.

St.Ack
