bq. under heavy load by serving to hot regions

Did you mean 'two hot regions'? If so, the master will move one of them to another RS.
Cheers

On Mon, Jan 20, 2014 at 6:17 AM, Bill Q <[email protected]> wrote:

> Hi Ted and Bharath,
> Thanks a lot for the replies.
>
> For question #1, if there is a RS under heavy load by serving to hot
> regions, the HMaster will move one of the two regions to another RS, or
> HMaster will split both of them and move the newly created halves to
> other RSs?
>
> For question #3, does this mean that a HFile has many 64k blocks, but
> itself is around 64M (or 128M)?
>
> Many thanks.
>
> Bill
>
> On Mon, Jan 20, 2014 at 1:49 AM, Bharath Vissapragada <[email protected]> wrote:
>
> > For question #3, the block size Lars talks about is the block size
> > inside a HFile, which is different from the HDFS block size. Look at
> > http://hbase.apache.org/book/apes03.html . HFile is indexed as blocks
> > to facilitate random access to data so that we can skip unnecessary
> > disk blocks during gets/scans. The smaller the HFile block size, the
> > better the random read performance. You can see the detailed HFile
> > layout in that link.
> >
> > For question #4, you are correct: since the data resides on HDFS, each
> > region server has access to all the storefiles (they just use the HDFS
> > API to read them). The reason they are still available after a
> > (RS+datanode) crash is because of the replication in HDFS. The store
> > files still have valid replicas, and the namenode tries to maintain
> > the replication factor by re-replicating them eventually.
> >
> > On Mon, Jan 20, 2014 at 12:08 PM, Ted Yu <[email protected]> wrote:
> >
> > > For question #1, there is a load balancer in HMaster which does the
> > > job of balancing region load.
> > >
> > > For number 2, the daughter regions stay on the same server as the
> > > parent after split. Later one or both of them may be moved to other
> > > region servers.
> > >
> > > Cheers
> > >
> > > On Jan 19, 2014, at 10:27 PM, Bill Q <[email protected]> wrote:
> > >
> > > > Hi,
> > > > I am trying to get more information about HBase. I would
> > > > appreciate some answers to these few questions. Thanks a lot.
> > > >
> > > > 1. About load balancing: does HMaster monitor overloaded or low
> > > > loaded HRegionServers, and move some regions from the hot
> > > > HRegionServer to low loaded ones (with or without adding new
> > > > servers into the cluster, respectively)?
> > > >
> > > > 2. About region splitting: when splitting a region, will the newly
> > > > created regions stay on the current HRegionServer, or will HMaster
> > > > assign some new HRegionServers to take the newly created two
> > > > regions?
> > > >
> > > > 3. About HFile size: Lars mentioned here
> > > > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> > > > that the HFile size is default to 64k. How does this work while
> > > > the default HDFS block is 64M/128M? Would the small HFile size
> > > > waste lots of space on HDFS?
> > > >
> > > > 4. About data locality: if a HRegionServer fails, the HMaster
> > > > would assign a new HRegionServer to take its place. But should
> > > > this new HRegionServer have access to the storeFiles? I assumed
> > > > that's how it works by using HDFS's data replication. But after
> > > > some reading, I got confused. It seems that the new HRegionServer
> > > > can work without the storeFiles data at local. How does this work
> > > > at all?
> > > >
> > > > Many thanks.
> > > >
> > > > Bill
> >
> > --
> > Bharath Vissapragada
> > <http://www.cloudera.com>
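
To put rough numbers on question #3, here is a minimal Python sketch of the two block sizes discussed above. It only does the arithmetic: the 128 MB store file size is a hypothetical example, the function names are made up for illustration, and the count ignores the index, meta, and trailer sections of a real HFile as well as the fact that HFile blocks are only approximately the configured size.

```python
# Illustrative arithmetic only; real sizes depend on table and cluster config.
KIB = 1024
MIB = 1024 * KIB

HFILE_BLOCK_SIZE = 64 * KIB   # HFile block size mentioned in the thread
HDFS_BLOCK_SIZE = 128 * MIB   # one of the HDFS block sizes mentioned in the thread


def hfile_blocks(file_bytes, block_bytes=HFILE_BLOCK_SIZE):
    """Approximate number of HFile blocks in a file (ceiling division)."""
    return -(-file_bytes // block_bytes)


def hdfs_blocks(file_bytes, block_bytes=HDFS_BLOCK_SIZE):
    """Number of HDFS blocks the same file spans (ceiling division)."""
    return -(-file_bytes // block_bytes)


if __name__ == "__main__":
    store_file = 128 * MIB  # hypothetical store file size
    print("HFile blocks:", hfile_blocks(store_file))
    print("HDFS blocks:", hdfs_blocks(store_file))
```

Under these assumptions a single store file holds on the order of a couple of thousand HFile blocks while occupying one HDFS block, which is why the two settings do not conflict: the HFile block is the unit of indexing and random reads inside the file, not the unit of storage on HDFS.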
