Please read the papers. You'll understand the architecture better that way.
On Aug 9, 2012, at 1:48 PM, Lin Ma <[email protected]> wrote: Thank you Amandeep, So I can simply understand in this way (logically), there do exist multiple region servers for the same region, but they are working in active-passive mode, when at one time only one active server is active? Correct? regards, Lin On Thu, Aug 9, 2012 at 2:04 PM, Amandeep Khurana <[email protected]> wrote: > Correct. You are limited to the throughput of a single region server while > interacting with a particular region. This throughput limitation is > typically handled by designing your keys such that your data is distributed > well across the cluster. > Having multiple region servers serve a single region gets you into the land > of maintaining consistency across copies, which is challenging. It might be > doable but that's not the design choice Bigtable (and hence HBase) made > initially. > > On Thu, Aug 9, 2012 at 11:04 AM, Lin Ma <[email protected]> wrote: > > > Thanks > > > > "only a single RegionServer ever hosts a region at once" -- I know HDFS > > have multiple copies for the same file. Is region server works in > > active-passive way, i.e. even if there are multiple copies, only one > region > > server could serve? If so, will it be bottleneck, supposing the traffic > to > > that region is too high? > > > > regards, > > Lin > > > > On Thu, Aug 9, 2012 at 11:09 AM, Bryan Beaudreault < > > [email protected] > > > wrote: > > > > > Actual data backing hbase is replicated, but that is handled by HDFS. > > Yes, > > > if you lose an hdfs datanode, clients (in this case the client is > hbase) > > > move to the next node in the pipeline. > > > > > > However, only a single RegionServer ever hosts a region at once. If > the > > > RegionServer dies, there is a period where the master must notice the > > > regions are unhosted and move them to other regionservers. During that > > > period, data is inaccessible or modifiable. > > > > > > On Wed, Aug 8, 2012 at 10:32 PM, Lin Ma <[email protected]> wrote: > > > > > > > Thank you Lars. > > > > > > > > Is the same data store duplicated copy across region server? If so, > if > > > one > > > > primary server for the region dies, client just need to read from the > > > > secondary server for the same region. Why there is data is > unavailable > > > > time? > > > > > > > > BTW: please feel free to correct me for any wrong knowledge about > > HBase. > > > > > > > > regards, > > > > Lin > > > > > > > > On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl <[email protected]> > > > wrote: > > > > > > > > > After a write completes the next read (regardless of the location > it > > is > > > > > issued from) will see the latest value. > > > > > This is because at any given time exactly RegionServer is > responsible > > > for > > > > > a specific Key > > > > > (through assignment of key ranges to regions and regions to > > > > RegionServers). > > > > > > > > > > > > > > > As Mohit said, the trade off is that data is unavailable if a > > > > RegionServer > > > > > dies until another RegionServer picks up the regions (and by > > extension > > > > the > > > > > key range) > > > > > > > > > > -- Lars > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > From: Lin Ma <[email protected]> > > > > > To: [email protected] > > > > > Cc: > > > > > Sent: Wednesday, August 8, 2012 8:47 AM > > > > > Subject: Re: consistency, availability and partition pattern of > HBase > > > > > > > > > > And consistency is not sacrificed? i.e. all distributed clients' > > update > > > > > will results in sequential / real time update? Once update is done > by > > > one > > > > > client, all other client could see results immediately? > > > > > > > > > > regards, > > > > > Lin > > > > > > > > > > On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia < > > [email protected] > > > > > >wrote: > > > > > > > > > > > I think availability is sacrificed in the sense that if region > > server > > > > > > fails clients will have data inaccessible for the time region > comes > > > up > > > > on > > > > > > some other server, not to confuse with data loss. > > > > > > > > > > > > Sent from my iPad > > > > > > > > > > > > On Aug 7, 2012, at 11:56 PM, Lin Ma <[email protected]> wrote: > > > > > > > > > > > > > Thank you Wei! > > > > > > > > > > > > > > Two more comments, > > > > > > > > > > > > > > 1. How about Hadoop's CAP characters do you think about? > > > > > > > 2. For your comments, if HBase implements "per key sequential > > > > > > consistency", > > > > > > > what are the missing characters for consistency? Cross-key > update > > > > > > > sequences? Could you show me an example about what you think > are > > > > > missed? > > > > > > > thanks. > > > > > > > > > > > > > > regards, > > > > > > > Lin > > > > > > > > > > > > > > On Wed, Aug 8, 2012 at 12:18 PM, Wei Tan <[email protected]> > > wrote: > > > > > > > > > > > > > >> Hi Lin, > > > > > > >> > > > > > > >> In the CAP theorem > > > > > > >> Consistency stands for atomic consistency, i.e., each CRUD > > > operation > > > > > > >> occurs sequentially in a global, real-time clock > > > > > > >> Availability means each server if not partitioned can accept > > > > requests > > > > > > >> > > > > > > >> Partition means network partition > > > > > > >> > > > > > > >> As far as I understand (although I do not see any official > > > > > > documentation), > > > > > > >> HBase achieved "per key sequential consistency", i.e., for a > > > > specific > > > > > > key, > > > > > > >> there is an agreed sequence, for all operations on it. This is > > > > weaker > > > > > > than > > > > > > >> strong or sequential consistency, but stronger than "eventual > > > > > > >> consistency". > > > > > > >> > > > > > > >> BTW: CAP was proposed by Prof. Eric Brewer... > > > > > > >> http://en.wikipedia.org/wiki/Eric_Brewer_%28scientist%29 > > > > > > >> > > > > > > >> Best Regards, > > > > > > >> Wei > > > > > > >> > > > > > > >> Wei Tan > > > > > > >> Research Staff Member > > > > > > >> IBM T. J. Watson Research Center > > > > > > >> 19 Skyline Dr, Hawthorne, NY 10532 > > > > > > >> [email protected]; 914-784-6752 > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> From: Lin Ma <[email protected]> > > > > > > >> To: [email protected], > > > > > > >> Date: 08/07/2012 09:30 PM > > > > > > >> Subject: consistency, availability and partition > pattern > > of > > > > > HBase > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> Hello guys, > > > > > > >> > > > > > > >> According to the notes by Werner*, "*He presented the CAP > > theorem, > > > > > which > > > > > > >> states that of three properties of shared-data systems—data > > > > > consistency, > > > > > > >> system availability, and tolerance to network partition—only > two > > > can > > > > > be > > > > > > >> achieved at any given time." => > > > > > > >> > > > > > > > http://www.allthingsdistributed.com/2008/12/eventually_consistent.html > > > > > > >> > > > > > > >> But it seems HBase could achieve all of the 3 features at the > > same > > > > > time. > > > > > > >> Does it mean HBase breaks the rule by Werner. :-) > > > > > > >> > > > > > > >> If not, which one is sacrificed -- consistency (by using > HDFS), > > > > > > >> availability (by using Zookeeper) or partition (by using > region > > / > > > > > column > > > > > > >> family) ? And why? > > > > > > >> > > > > > > >> regards, > > > > > > >> Lin > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > >
