Hi Jean,

Thank you for your reply. So the RS is a completely different entity from the datanode? How does the RS serve the data? I can see the region directories in HDFS, so the same region must be on 3 datanodes, right? Then which regionserver gets to serve that region? Is it chosen completely at random? And if I ask that region server for all keys from that region, will they have to come from the same HDFS datanode?

As far as I understand, in HDFS, if I stream a file, I get the data from a single datanode (usually the one closest to the client). So in HBase, if I ask for all keys in region reg1, do I get all the keys from the datanode that is closest to the client?

Thanks for your time,
Nick

On Mon, Jun 18, 2012 at 11:53 PM, Jean-Daniel Cryans <[email protected]> wrote:
> A region is only served by 1 region server, and since HBase uses the
> HDFS client it doesn't have a view of the blocks layout. HBase
> currently doesn't even know about replication, it asks to read a file
> and gets some data coming from somewhere (that somewhere is determined
> by HDFS).
>
> Hope this helps,
>
> J-D
>
> On Mon, Jun 18, 2012 at 11:16 AM, IGZ Nick <[email protected]> wrote:
> > Hi folks,
> >
> > Here is how I understand the scan flow (A regular sequential scan from
> > key A to key B):
> > - Zookeeper is contacted for the RegionServer that has the -ROOT-
> >   regions.
> > - The -ROOT- RS is contacted and it gets you the RS for .META.
> > - The .META. is contacted, and it will give you all regions for keys
> >   from A to B - e.g, A to A1 resides in reg1, A1 to A2 in reg2, A2 to B
> >   in reg3.
> >
> > Now if HDFS replication is set to 3, there must be 3 RS which will have
> > reg1, and likewise for reg2 and reg3. So how does the client figure out
> > which RS to go to? Or am I completely wrong here?
> >
> > As a follow up, if reg3 is present in RS1, RS2 and RS3, then does the
> > client get all the data from A1 to A2 from a single RS or is there some
> > sort of splitting like A1 to A11 can come from RS1, A11 to A12 from RS2
> > and A12 to A2 from RS3. That would be faster, right? Put another way, if
> > my scan consists of only one region, which is hosted on three
> > RegionServers, does the data come in from all 3 RS's or just one of them?
> >
> > Thanks a lot,
> > Nick
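
The point made in the quoted reply, that each region is served by exactly one region server no matter what the HDFS replication factor is, can be pictured with a toy lookup table. This is only a sketch under stated assumptions: it does not use the HBase client API, and the `META` table, the server names `rs1` to `rs3`, and the function `regions_for_scan` are hypothetical names made up for illustration. The region boundaries (A, A1, A2, B) are the ones from the example in the thread.

```python
from bisect import bisect_right

# Hypothetical toy catalog, sorted by region start key:
# (region start key, region name, the ONE region server serving it).
# HDFS replication does not appear here at all: it is a property of the
# underlying files, not of the region-to-server assignment.
META = [
    ("A", "reg1", "rs1"),
    ("A1", "reg2", "rs2"),
    ("A2", "reg3", "rs3"),
]


def regions_for_scan(start_key, stop_key, meta=META):
    """Return (region, server) pairs overlapping the range [start_key, stop_key)."""
    starts = [row[0] for row in meta]
    # Index of the last region whose start key is <= start_key.
    first = max(bisect_right(starts, start_key) - 1, 0)
    plan = []
    for region_start, region, server in meta[first:]:
        if region_start >= stop_key:
            break  # this region begins at or after the end of the scan
        plan.append((region, server))
    return plan


if __name__ == "__main__":
    print(regions_for_scan("A", "B"))      # scan spanning all three regions
    print(regions_for_scan("A11", "A12"))  # scan that stays inside one region
```

In this model a scan confined to one region yields exactly one (region, server) pair, so there is nothing to split across servers; which datanode replica the bytes come from is decided underneath, by HDFS.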
