Re: put to WAL and scan/get operation concurrency

pob Fri, 06 May 2011 11:20:05 -0700

Hi,



2011/5/6 Jean-Daniel Cryans <[email protected]>

> As I said before the regions aren't replicated, it is not the right
> way to see it.
>
> The data for those regions is replicated, but only 1 region server
> does the management of that data.
>

So does that mean there isn't any "scaling for reads" (meaning more
replicas -> better read throughput)?


Thanks


> If a RS crashes, like I said before, the data is unavailable until the
> logs are replayed. In the context of a MR (or any other client
> context, because it's really the same) the maps or reducers that are
> reading data from HBase will be blocked (specifically in the
> HConnectionManager code, it is transparent) until the region that
> contains the row it's trying to get to is made available again. That's
> the strong consistency guarantee. The only case where your client will
> see an exception is if the retries are exhausted, in which case you'll
> see a RetriesExhaustedException.
>
> J-D
>
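
To check my understanding of the retry behaviour described above, here is a rough sketch in plain Java. It does not use the HBase client at all: RetrySketch, RegionUnavailable and RetriesExhausted are made-up names, and the attempt count and pause are arbitrary assumptions, not HBase defaults. It only shows the shape of "block and retry until the region is back, otherwise fail with an exception".

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins; this does not use the HBase client API.
class RetrySketch {
    /** Thrown by a call while the target region is offline (made-up type). */
    static class RegionUnavailable extends Exception {}

    /** Plays the role of RetriesExhaustedException in this sketch. */
    static class RetriesExhausted extends Exception {
        RetriesExhausted(int attempts) {
            super("gave up after " + attempts + " attempts");
        }
    }

    /** Runs the call, pausing and retrying while the region is unavailable. */
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long pauseMillis)
            throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (RegionUnavailable e) {
                // The caller is simply blocked here; it never sees this failure.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        // Only when every attempt failed does the caller see an exception.
        throw new RetriesExhausted(maxAttempts);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate a region that comes back online on the third attempt.
        String value = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new RegionUnavailable();
            }
            return "row-value";
        }, 5, 10);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

If that is right, a scan either returns the data once the region is reassigned or fails loudly, so it should never silently skip rows.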
> On Fri, May 6, 2011 at 7:47 AM, Eric Burin des Roziers
> <[email protected]> wrote:
> > So, just to make sure I understand, there is a chance that, a MapReduce
> > job does not get all the data without being aware of it, because a region
> > server crashed?  Wouldn't HBase use a replicated region instead?  And if the
> > region server crashed during the job scan, shouldn't it get an exception,
> > right?
> > Thanks,
> > -Eric
> >
> >
> >
> > ________________________________
> > From: Stack <[email protected]>
> > To: [email protected]; Eric Burin des Roziers <[email protected]>
> > Sent: Friday, May 6, 2011 4:37 PM
> > Subject: Re: put to WAL and scan/get operation concurrency
> >
> > On Fri, May 6, 2011 at 1:45 AM, Eric Burin des Roziers
> > <[email protected]> wrote:
> >> Thanks Stack,  I hadn't read the percolator paper (doing it now).  I
> >> think I am not describing my question properly.  Basically, based on the
> >> hbase-trx implementation, when the transaction commits, there is a time
> >> window where a Get() might read partial rows since it implements the
> >> snapshot isolation by writing records to a different location (than the
> >> actual HTable) before the commit().  In the percolator paper, cell versions
> >> are used as snapshot isolation and uses an as-of timestamp when doing a
> >> Get().
> >>
> >
> > That could be the case (I had a bit of a notion of how hbase-trx
> > worked -- once -- but its been flushed w/ a while now).  Want to ask
> > over on the hbase-trx github project?  James will likely know.
> >
> >> Another unrelated question: when a region server fails, does the client
> >> (while doing a get/scan) get notified (exception)?  Basically, I want to
> >> ensure that an operation (such as a rollup/aggregate) does not compute the
> >> wrong amounts due to missing data.
> >>
> >
> > The client?  No.  Not natively.  RegionServers do register themselves
> > in zk.  A trx-client could register a zk watcher on regionservers dir
> > in zk.  Then you'd get notification of RS death.  If you go this route
> > and have thousands or tens of thousands of clients, you might want to do
> > a bit of research around how it'll scale.
> >
> > St.Ack
>
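
PS: on the Percolator point quoted above (cell versions plus an as-of timestamp on Get()), this is how I picture the read side, as a toy in-memory model. VersionedCell and getAsOf are hypothetical names of mine, not HBase or Percolator API, and the sketch ignores locking and commit entirely.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one cell; not the HBase or Percolator API.
class VersionedCell {
    // timestamp -> value, kept sorted so the newest version at or below
    // a given bound can be looked up directly
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Newest value whose timestamp is <= asOf, or null if none exists yet. */
    String getAsOf(long asOf) {
        Map.Entry<Long, String> entry = versions.floorEntry(asOf);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedCell balance = new VersionedCell();
        balance.put(10L, "100");
        balance.put(20L, "250");
        // A reader whose snapshot was taken at 15 ignores the write at 20.
        System.out.println(balance.getAsOf(15L));
        // A later reader sees the newer version.
        System.out.println(balance.getAsOf(25L));
    }
}
```

A reader pinned to one timestamp keeps seeing the same version however many newer writes arrive, which is the property the partial-row window in hbase-trx seems to lack.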
