Thanks J-D!

On Tue, Jun 19, 2012 at 12:31 AM, Jean-Daniel Cryans <[email protected]> wrote:
> On Mon, Jun 18, 2012 at 11:49 AM, IGZ Nick <[email protected]> wrote:
> > Okay. Let me ask a more specific example. Say I have 3 contiguous regions,
> > all served by one RS. So if I do a scan which gets data from each of the
> > regions, then everything has to come through this RS, which would be slow.
>
> Why would it be slow? Because you have to scan sequentially? You have
> different options here depending on your use case, but mainly if you
> need to go faster you can do multiple scans in parallel. That's how it
> works when MR'ing a table.
>
> > Or is there any optimization such that contiguous regions don't end up
> > being served by the same regionserver?
>
> No, AFAIK there's no reason to do it.
>
> J-D
>
> >
> > On Tue, Jun 19, 2012 at 12:11 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> On Mon, Jun 18, 2012 at 11:34 AM, IGZ Nick <[email protected]> wrote:
> >> > Hi Jean,
> >> >
> >> > Thank you for your reply. So RS is a completely different entity when
> >> > compared to the datanode?
> >>
> >> Totally.
> >>
> >> > How does RS serve the data?
> >>
> >> That's HBase 101, I recommend you read the guide
> >> http://hbase.apache.org/book/book.html or the book
> >> http://ofps.oreilly.com/titles/9781449396107/ or the bigtable paper.
> >>
> >> > I can view the
> >> > region directories in HDFS. So the same region must be on 3 datanodes,
> >> > right?
> >>
> >> Yep.
> >>
> >> > Then which regionserver gets to serve that region?
> >>
> >> HBase 101, but in short the master decides that.
> >>
> >> > Is it a
> >> > completely random regionserver?
> >>
> >> The master uses a few heuristics.
> >>
> >> > And if I ask that region server for all
> >> > keys from that region, will it have to come from the same HDFS datanode?
> >>
> >> Depends if the data is there, if it is then it will be served locally
> >> else it will be fetched. It doesn't really matter to the region server
> >> since the HDFS client handles it transparently.
> >>
> >> > As
> >> > far as I understand, in HDFS, if I stream a file, then I get the data from
> >> > a single datanode (the one closest to the client, usually). So, in HBase, I
> >> > ask for all keys in region reg1, then I get all the keys from the datanode
> >> > that is closest to the client?
> >>
> >> Yep
> >>
> >> J-D
> >>
>
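
For anyone reading this later, the "multiple scans in parallel" idea from the thread can be sketched roughly as below. This is only an illustration and does not use the HBase client: ROWS is an in-memory stand-in for a table, REGIONS is a made-up set of three contiguous key ranges, and scan/parallel_scan are hypothetical helper names. With a real cluster, each worker would open its own scanner over one start/stop key range instead.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a table: (row_key, value) pairs sorted by key.
ROWS = [(f"row{i:03d}", i) for i in range(30)]
KEYS = [key for key, _ in ROWS]

# Hypothetical boundaries of three contiguous regions, as [start, stop) key
# ranges. An empty string means "open-ended" on that side.
REGIONS = [("", "row010"), ("row010", "row020"), ("row020", "")]


def scan(start, stop):
    """Sequentially read rows with start <= key < stop (empty stop = to the end)."""
    lo = bisect_left(KEYS, start)
    hi = bisect_left(KEYS, stop) if stop else len(KEYS)
    return ROWS[lo:hi]


def parallel_scan(regions, workers=3):
    """Run one scan per key range concurrently and return rows in region order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input ranges.
        parts = pool.map(lambda bounds: scan(*bounds), regions)
        return [row for part in parts for row in part]


if __name__ == "__main__":
    rows = parallel_scan(REGIONS)
    print(len(rows), rows[0], rows[-1])
```

Splitting along region boundaries is the same partitioning a MapReduce job over a table uses, so each worker only ever reads from one region at a time.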
