Hi,
2011/5/6 Jean-Daniel Cryans <[email protected]>

> As I said before the regions aren't replicated, it is not the right
> way to see it.
> The data for those regions is replicated, but only 1 region server
> does the management of that data.
>

So does that mean there is no "scaling for reads" (i.e., a higher replica
count -> better read throughput)?

Thanks

> If a RS crashes, like I said before, the data is unavailable until the
> logs are replayed. In the context of a MR (or any other client
> context, because it's really the same) the maps or reducers that are
> reading data from HBase will be blocked (specifically in the
> HConnectionManager code, it is transparent) until the region that
> contains the row it's trying to get to is made available again. That's
> the strong consistency guarantee. The only case where your client will
> see an exception is if the retries are exhausted, in which case you'll
> see a RetriesExhaustedException.
>
> J-D
>
> On Fri, May 6, 2011 at 7:47 AM, Eric Burin des Roziers
> <[email protected]> wrote:
> > So, just to make sure I understand, there is a chance that, a MapReduce
> > job does not get all the data without being aware of it, because a region
> > server crashed? Wouldn't HBase use a replicated region instead? And if the
> > region server crashed during the job scan, shouldn't it get an exception,
> > right?
> > Thanks,
> > -Eric
> >
> > ________________________________
> > From: Stack <[email protected]>
> > To: [email protected]; Eric Burin des Roziers <[email protected]>
> > Sent: Friday, May 6, 2011 4:37 PM
> > Subject: Re: put to WAL and scan/get operation concurrency
> >
> > On Fri, May 6, 2011 at 1:45 AM, Eric Burin des Roziers
> > <[email protected]> wrote:
> >> Thanks Stack, I hadn't read the percolator paper (doing it now). I
> >> think I am not describing my question properly. Basically, based on the
> >> hbase-trx implementation, when the transaction commits, there is a time
> >> window where a Get() might read partial rows since it implements the
> >> snapshot isolation by writing records to a different location (than the
> >> actual HTable) before the commit(). In the percolator paper, cell versions
> >> are used as snapshot isolation and uses an as-of timestamp when doing a
> >> Get().
> >>
> >
> > That could be the case (I had a bit of a notion of how hbase-trx
> > worked -- once -- but its been flushed w/ a while now). Want to ask
> > over on the hbase-trx github project? James will likely know.
> >
> >> Another unrelated question: when a region server fails, does the client
> >> (while doing a get/scan) get notified (exception)? Basically, I want to
> >> ensure that an operation (such as a rollup/aggregate) does not compute the
> >> wrong amounts due to missing data.
> >>
> >
> > The client? No. Not natively. RegionServers do register themselves
> > in zk. A trx-client could register a zk watcher on regionservers dir
> > in zk. Then you'd get notification of RS death. If you go this route
> > and thousands or tens of thousands of clients, you might want to do a
> > bit of research around how it'll scale.
> >
> > St.Ack
>
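
For readers following the thread, here is a rough sketch of the client-side behaviour J-D describes: a read blocks and retries while the region is unavailable, and only surfaces an exception once the retries run out. This is not HBase code. The class RetrySketch, the method withRetries, and both exception types are hypothetical stand-ins (the real RetriesExhaustedException is a checked exception thrown from inside the HBase client), and the fixed pause between attempts is an assumption made to keep the example short.

```java
import java.util.function.Supplier;

public class RetrySketch {
    /** Hypothetical stand-in for "the region is not being served right now". */
    public static class RegionUnavailableException extends RuntimeException {
        public RegionUnavailableException(String msg) { super(msg); }
    }

    /** Hypothetical, unchecked stand-in for HBase's RetriesExhaustedException. */
    public static class RetriesExhaustedSketchException extends RuntimeException {
        public RetriesExhaustedSketchException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /**
     * Runs the read, retrying while the region is unavailable. The caller
     * only ever sees the value or, after maxAttempts failures, an exception.
     */
    public static <T> T withRetries(int maxAttempts, long pauseMillis, Supplier<T> read) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RegionUnavailableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.get();
            } catch (RegionUnavailableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(pauseMillis); // wait before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        break; // stop retrying if the thread was interrupted
                    }
                }
            }
        }
        throw new RetriesExhaustedSketchException(
            "gave up after at most " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated read: the region is "offline" for the first two attempts.
        String value = withRetries(5, 10L, () -> {
            if (++calls[0] < 3) {
                throw new RegionUnavailableException("region offline");
            }
            return "row-value";
        });
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

The point of the sketch is that a job either gets the row or fails loudly; it does not silently skip data because a region server went away.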
