OK, I found some references. I was actually asking about HBase's default load balancer. From googling, it seems it only evens out the number of regions across region servers, while the distribution of regions is random.
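
To picture what "even counts, random placement" means, here is a minimal Python sketch. It is not HBase's balancer code, only an illustration of the behaviour as I understand it; `assign_regions` is a made-up helper, and the figures (100 TB of data, 10 GB per region, 100 servers) are the ones from this thread, using decimal units.

```python
import random

def assign_regions(num_regions, num_servers, seed=0):
    """Hypothetical illustration: shuffle the region ids, then deal them out
    round-robin so every server gets the same count but arbitrary regions."""
    rng = random.Random(seed)
    regions = list(range(num_regions))
    rng.shuffle(regions)
    assignment = {server: [] for server in range(num_servers)}
    for i, region in enumerate(regions):
        assignment[i % num_servers].append(region)
    return assignment

GB_PER_TB = 1000                      # decimal units, as used in the thread
num_regions = 100 * GB_PER_TB // 10   # 100 TB at 10 GB per region
assignment = assign_regions(num_regions, num_servers=100)

# Region counts are identical on every server...
print({len(regions) for regions in assignment.values()})
# ...but the regions on any one server are scattered over the key space.
print(sorted(assignment[0])[:10])
```

With these numbers every server ends up holding 100 regions, yet the regions on a given server are not neighbours in rowkey order.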
I also found a good load balancer implementation:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html

Thanks for the help JM! :)

Jianshi


On Tue, Aug 19, 2014 at 2:31 PM, lars hofhansl <[email protected]> wrote:

> I'd change the max file size to 20GB. That'd give you 5000 regions for
> 100TB.
>
>
> ________________________________
> From: Jianshi Huang <[email protected]>
> To: [email protected]
> Sent: Monday, August 18, 2014 12:22 PM
> Subject: Re: How are split files distributed across Region servers?
>
>
> Hi JM,
>
> By "make the range bigger", do you mean making it multiple
> regions/splits?
>
> I will probably have >100TB of data, and I think the default split file
> size is 10GB. So can I assume each of my 100 machines will be assigned
> 100 *random* regions?
>
> Where can I find the implementation details or settings for region
> assignment?
>
> Jianshi
>
>
> On Mon, Aug 18, 2014 at 8:48 PM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
> > Hi Jianshi,
> >
> > A region server can host more than one region. So if you pre-split your
> > table correctly based on your access usage, in the end all the servers
> > should be used evenly.
> >
> > If you have about 30% of your range which is not used, just make sure
> > that this range is bigger so that in the end it will have the same load
> > as the others.
> >
> > JM
> >
> >
> > 2014-08-18 2:08 GMT-04:00 Jianshi Huang <[email protected]>:
> >
> > > Hi JM,
> > >
> > > If the region boundaries will not change, does that mean:
> > >
> > > If my data access pattern has skews (say a certain part (30%) of my
> > > data will almost never be used), then a proportion (30%) of my servers
> > > will always be idle?
> > >
> > > Does a region server have to have a contiguous rowkey range?
> > >
> > > Jianshi
> > >
> > >
> > > On Sat, Aug 16, 2014 at 2:46 AM, Jean-Marc Spaggiari <
> > > [email protected]> wrote:
> > >
> > > > Hi Jianshi,
> > > >
> > > > Not sure I get your question.
> > > >
> > > > Can I rephrase it?
> > > >
> > > > So you have 10 regions, and each of those regions has 10 HFiles. Then
> > > > you run a major compaction on the table. Correct?
> > > >
> > > > Then you will end up with:
> > > >
> > > > reg1:[files:1]
> > > > reg2:[files:2]
> > > > reg3:[files:3]
> > > > ...
> > > >
> > > > Region boundaries will not change. But each region will now have a
> > > > single underlying file.
> > > >
> > > > HTH,
> > > >
> > > > JM
> > > >
> > > >
> > > > 2014-08-15 1:53 GMT-04:00 Jianshi Huang <[email protected]>:
> > > >
> > > > > Say I have 100 split files on 10 region servers, and I did a major
> > > > > compaction.
> > > > >
> > > > > Will these split files be distributed like this:
> > > > > reg1: [splits 1,2,..,10]
> > > > > reg2: [splits 11,12,...,20]
> > > > > ...
> > > > >
> > > > > Or like this:
> > > > > reg1: [splits: 1, 11, 21, ... , 91]
> > > > > reg2: [splits: 2, 12, 22, ... , 92]
> > > > > ...
> > > > >
> > > > > And if I want to specify the locality and the stride of split
> > > > > files, how can I do it in HBase?
> > > > >
> > > > >
> > > > > --
> > > > > Jianshi Huang
> > > > >
> > > > > LinkedIn: jianshi
> > > > > Twitter: @jshuang
> > > > > Github & Blog: http://huangjs.github.com/
> > > > >
> > > >
> > >
> > >
> > > --
> > > Jianshi Huang
> > >
> > > LinkedIn: jianshi
> > > Twitter: @jshuang
> > > Github & Blog: http://huangjs.github.com/
> > >
> >
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
