For question #4, see also http://hbase.apache.org/book.html#regions.arch.locality
Cheers

On Sun, Jan 19, 2014 at 10:49 PM, Bharath Vissapragada <[email protected]> wrote:

> For question #3, the block size Lars talks about is the block size inside an
> HFile, which is different from the HDFS block size. Look at
> http://hbase.apache.org/book/apes03.html . An HFile is indexed as blocks to
> facilitate random access to data, so that we can skip unnecessary disk
> blocks during gets/scans. The smaller the HFile block size, the better the
> random read performance. You can see the detailed HFile layout in that link.
>
> For question #4, you are correct: since the data resides on HDFS, each
> region server has access to all the store files (they just use the HDFS API
> to read them). The reason they are still available after a (RS+datanode)
> crash is the replication in HDFS. The store files still have valid
> replicas, and the namenode tries to maintain the replication factor by
> re-replicating them eventually.
>
>
> On Mon, Jan 20, 2014 at 12:08 PM, Ted Yu <[email protected]> wrote:
>
> > For question #1, there is a load balancer in HMaster which does the job of
> > balancing region load.
> >
> > For number 2, the daughter regions stay on the same server as the parent
> > after the split. Later, one or both of them may be moved to other region
> > servers.
> >
> > Cheers
> >
> > On Jan 19, 2014, at 10:27 PM, Bill Q <[email protected]> wrote:
> >
> > > Hi,
> > > I am trying to get more information about HBase. I would appreciate
> > > answers to these few questions. Thanks a lot.
> > >
> > > 1. About load balancing: does HMaster monitor overloaded or lightly
> > > loaded HRegionServers, and move some regions from the hot HRegionServer
> > > to lightly loaded ones (with or without adding new servers to the
> > > cluster, respectively)?
> > >
> > > 2. About region splitting: when splitting a region, will the newly
> > > created regions stay on the current HRegionServer, or will HMaster
> > > assign some new HRegionServers to take the two newly created regions?
> > >
> > > 3. About HFile size: Lars mentioned here
> > > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> > > that the HFile size defaults to 64k. How does this work while the
> > > default HDFS block is 64M/128M? Would the small HFile size waste lots of
> > > space on HDFS?
> > >
> > > 4. About data locality: if an HRegionServer fails, the HMaster would
> > > assign a new HRegionServer to take its place. But should this new
> > > HRegionServer have access to the storeFiles? I assumed that's how it
> > > works, by using HDFS's data replication. But after some reading, I got
> > > confused. It seems that the new HRegionServer can work without the
> > > storeFiles data being local. How does this work at all?
> > >
> > > Many thanks.
> > >
> > >
> > > Bill
>
> --
> Bharath Vissapragada
> <http://www.cloudera.com>
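
To make the answer to question #3 more concrete, here is a toy sketch of what "indexed as blocks" means. It is not HBase code: the names build_block_index and get are hypothetical, the cells are plain (key, value) byte pairs, and the index is just a sorted list of each block's first key. The only numbers taken from the thread are the 64k HFile block size and the 64M HDFS block size.

```python
from bisect import bisect_right

BLOCK_SIZE = 64 * 1024              # HFile block size from the thread (64k)
HDFS_BLOCK_SIZE = 64 * 1024 * 1024  # one of the HDFS block sizes mentioned (64M)


def build_block_index(sorted_cells, block_size=BLOCK_SIZE):
    """Group sorted (key, value) cells into blocks of roughly block_size bytes.

    Returns (blocks, first_keys), where first_keys[i] is the first key
    stored in blocks[i]. Hypothetical helper, not an HBase API.
    """
    blocks, first_keys = [], []
    current, current_bytes = [], 0
    for key, value in sorted_cells:
        # Close the current block once it has reached the target size.
        if current and current_bytes >= block_size:
            blocks.append(current)
            current, current_bytes = [], 0
        if not current:
            first_keys.append(key)
        current.append((key, value))
        current_bytes += len(key) + len(value)
    if current:
        blocks.append(current)
    return blocks, first_keys


def get(blocks, first_keys, key):
    """Look up key by scanning only the single block that could hold it."""
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    for k, v in blocks[i]:
        if k == key:
            return v
    return None
```

In this model a single file holds many 64k blocks, and a lookup touches one of them after a binary search over the index. The 64k figure is therefore the granularity of reads inside the file, not the size of the file itself, and HDFS_BLOCK_SIZE // BLOCK_SIZE such blocks fit in one 64M HDFS block.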
